Title: | Detection of post-transcriptional modifications in high throughput sequencing data |
---|---|
Description: | RNAmodR provides classes and workflows for loading/aggregation data from high througput sequencing aimed at detecting post-transcriptional modifications through analysis of specific patterns. In addition, utilities are provided to validate and visualize the results. The RNAmodR package provides a core functionality from which specific analysis strategies can be easily implemented as a seperate package. |
Authors: | Felix G.M. Ernst [aut, cre] , Denis L.J. Lafontaine [ctb, fnd] |
Maintainer: | Felix G.M. Ernst <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.21.0 |
Built: | 2024-12-30 04:24:57 UTC |
Source: | https://github.com/bioc/RNAmodR |
The aggregate
function is defined for each
SequenceData
object and can be used
directly on a SequenceData
object or
indirectly via a Modifier
object.
For the letter the call is redirect to the
SequenceData
object, the result summarized
as defined for the individual Modifier
class and stored in the
aggregate
slot of the Modifier
object. The data is then used
for subsequent tasks, such as search for modifications and visualization of
the results.
The summarization is implemented in the aggregateData
for each type of
Modifier
class. The stored data from the aggregate
slot can be
retrieved using the getAggregateData
function.
Whether the aggrgeated data is already present in the aggregate
slot
can be checked using the hasAggregateData
function.
For SequenceDataSet
, SequenceDataList
and ModfierSet
classes wrapper of the aggregate
function exist as well.
aggregate(x, ...) aggregateData(x, ...) getAggregateData(x) hasAggregateData(x) ## S4 method for signature 'SequenceData' aggregate(x, condition = c()) ## S4 method for signature 'SequenceData' aggregateData(x, condition) ## S4 method for signature 'SequenceDataSet' aggregate(x, condition = "Treated") ## S4 method for signature 'SequenceDataList' aggregate(x, condition = "Treated") ## S4 method for signature 'Modifier' aggregate(x, force = FALSE) ## S4 method for signature 'Modifier' aggregateData(x) ## S4 method for signature 'Modifier' getAggregateData(x) ## S4 method for signature 'Modifier' hasAggregateData(x) ## S4 method for signature 'ModifierSet' aggregate(x, force = FALSE)
aggregate(x, ...) aggregateData(x, ...) getAggregateData(x) hasAggregateData(x) ## S4 method for signature 'SequenceData' aggregate(x, condition = c()) ## S4 method for signature 'SequenceData' aggregateData(x, condition) ## S4 method for signature 'SequenceDataSet' aggregate(x, condition = "Treated") ## S4 method for signature 'SequenceDataList' aggregate(x, condition = "Treated") ## S4 method for signature 'Modifier' aggregate(x, force = FALSE) ## S4 method for signature 'Modifier' aggregateData(x) ## S4 method for signature 'Modifier' getAggregateData(x) ## S4 method for signature 'Modifier' hasAggregateData(x) ## S4 method for signature 'ModifierSet' aggregate(x, force = FALSE)
x |
a |
... |
additional arguments |
condition |
character value, which selects, for which condition the data
should be aggregated. One of the following values: |
force |
whether to recreate the aggregated data, if it is already stored
inside the |
aggregate
: for SequenceData
object the aggregated data
is returned as a SplitDataFrameList
with an element per transcript,
whereas for a Modifier
the modified input object is returned,
containing the aggregated data, which can be accessed using
getAggregateData
.
getAggregateData
: only for Modifier
: a
SplitDataFrameList
with an element per transcript is returned. If the
aggregated data is not stored in the object, it is generated on the fly, but
does not persist.
hasAggregateData
: TRUE or FALSE. Does the Modifier
object already contain aggregated data?
If 'x' is a
SequenceData
a
SplitDataFrameList
with elments per transcript.
SequenceDataSet
or
SequenceDataList
a SimpleList
with SplitDataFrameList
as elements.
Modifier
or
ModifierSet
an updated Modifier
object. The data can be accessed by using the aggregateData
function.
data(e5sd,package="RNAmodR") data(msi,package="RNAmodR") # modify() triggers the search for modifications in the data contained in # the Modifier or ModifierSet object sdfl <- aggregate(e5sd) mi <- aggregate(msi[[1]])
data(e5sd,package="RNAmodR") data(msi,package="RNAmodR") # modify() triggers the search for modifications in the data contained in # the Modifier or ModifierSet object sdfl <- aggregate(e5sd) mi <- aggregate(msi[[1]])
To compare data of different samples, a
ModifierSet
can be used. To select the data
alongside the transcripts and their positions a
GRanges
or a
GRangesList
needs to be provided.
In case of a GRanges
object, the parent column must match the
transcript names as defined by the out put of ranges(x)
, whereas in
case of a GRangesList
the element names must match the transcript
names.
compare(x, name, pos = 1L, ...) compareByCoord(x, coord, ...) plotCompare(x, name, pos = 1L, normalize, ...) plotCompareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet' compare(x, name, pos = 1L, normalize, ...) ## S4 method for signature 'ModifierSet,GRanges' compareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet,GRangesList' compareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet' plotCompare(x, name, pos = 1L, normalize, ...) ## S4 method for signature 'ModifierSet,GRanges' plotCompareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet,GRangesList' plotCompareByCoord(x, coord, normalize, ...)
compare(x, name, pos = 1L, ...) compareByCoord(x, coord, ...) plotCompare(x, name, pos = 1L, normalize, ...) plotCompareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet' compare(x, name, pos = 1L, normalize, ...) ## S4 method for signature 'ModifierSet,GRanges' compareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet,GRangesList' compareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet' plotCompare(x, name, pos = 1L, normalize, ...) ## S4 method for signature 'ModifierSet,GRanges' plotCompareByCoord(x, coord, normalize, ...) ## S4 method for signature 'ModifierSet,GRangesList' plotCompareByCoord(x, coord, normalize, ...)
x |
a |
name |
Only for |
pos |
Only for |
... |
optional parameters:
|
coord |
coordinates of position to subset to. Either a |
normalize |
either a single logical or character value. If it is a
character, it must match one of the names in the |
compareByCoord
returns a
DataFrame
and
plotCompareByCoord
returns a ggplot
object, which can be
modified further. The DataFrame
contains columns per sample as well
as the columns names
, positions
and mod
incorporated
from the coord
input. If coord
contains a column
Activity
this is included in the results as well.
data(msi,package="RNAmodR") # constructing a GRanges obejct to mark positive positions mod <- modifications(msi) coord <- unique(unlist(mod)) coord$score <- NULL coord$sd <- NULL # return a DataFrame compareByCoord(msi,coord) # plot the comparison as a heatmap plotCompareByCoord(msi,coord)
data(msi,package="RNAmodR") # constructing a GRanges obejct to mark positive positions mod <- modifications(msi) coord <- unique(unlist(mod)) coord$score <- NULL coord$sd <- NULL # return a DataFrame compareByCoord(msi,coord) # plot the comparison as a heatmap plotCompareByCoord(msi,coord)
CoverageSequenceData
implements
SequenceData
to contain and aggregate the
coverage of reads per position along the transcripts.
CoverageSequenceData
contains one column per data file named using the
following naming convention coverage.condition.replicate
.
aggregate
calculates the mean and sd for samples in the control
and treated
condition separatly.
CoverageSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) CoverageSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'CoverageSequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'CoverageSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'CoverageSequenceData' getDataTrack(x, name, ...)
CoverageSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) CoverageSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'CoverageSequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'CoverageSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'CoverageSequenceData' getDataTrack(x, name, ...)
df , ranges , sequence , replicate
|
inputs for creating a
|
condition |
For |
bamfiles , annotation , seqinfo , grl , sequences , param , args , ...
|
See
|
x |
a |
name |
For |
a CoverageSequenceData
object
# Construction of a CoverageSequenceData objectobject library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) csd <- CoverageSequenceData(files, annotation = annotation, sequences = sequences)
# Construction of a CoverageSequenceData objectobject library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) csd <- CoverageSequenceData(files, annotation = annotation, sequences = sequences)
The End5SequenceData
/End3SequenceData
/EndSequenceData
classes aggregate the counts of read ends at each position along a
transcript. End5SequenceData
/End3SequenceData
classes aggregate
either the 5'-end or 3'-end, the EndSequenceData
aggregates both.
All three classes contain one column per data file named using the following
naming convention (end5/end3/end).condition.replicate
.
aggregate
calculates the mean and sd for samples in the control
and treated
condition separatly.
End5SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) End3SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) EndSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) End5SequenceData(bamfiles, annotation, sequences, seqinfo, ...) End3SequenceData(bamfiles, annotation, sequences, seqinfo, ...) EndSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'End5SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature ## 'End3SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature ## 'EndSequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'End5SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'End3SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'EndSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'EndSequenceData' getDataTrack(x, name, ...) ## S4 method for signature 'End5SequenceData' getDataTrack(x, name, ...) ## S4 method for signature 'End3SequenceData' getDataTrack(x, name, ...)
End5SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) End3SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) EndSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) End5SequenceData(bamfiles, annotation, sequences, seqinfo, ...) End3SequenceData(bamfiles, annotation, sequences, seqinfo, ...) EndSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'End5SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature ## 'End3SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature ## 'EndSequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'End5SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'End3SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'EndSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'EndSequenceData' getDataTrack(x, name, ...) ## S4 method for signature 'End5SequenceData' getDataTrack(x, name, ...) ## S4 method for signature 'End3SequenceData' getDataTrack(x, name, ...)
df , ranges , sequence , replicate
|
inputs for creating a
|
condition |
For |
bamfiles , annotation , seqinfo , grl , sequences , param , args , ...
|
See
|
x |
a |
name |
For |
a End5SequenceData
, a End3SequenceData
or a
EndSequenceData
object
# Construction of a End5SequenceData object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) e5sd <- End5SequenceData(files, annotation = annotation, sequences = sequences)
# Construction of a End5SequenceData object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) e5sd <- End5SequenceData(files, annotation = annotation, sequences = sequences)
The Modifier
class is a virtual class, which provides the central
functionality to search for post-transcriptional RNA modification patterns in
high throughput sequencing data.
Each subclass has to implement the following functions:
Slot nucleotide
: Either "RNA" or "DNA". For conveniance the
subclasses RNAModifier
and DNAModifier
are already available
and can be inherited from.
Function aggregateData
: used for specific data
aggregation
Function findMod
: used for specific search for
modifications
Optionally the function settings<-
can be
implemented to store additional arguments, which the base class does not
recognize.
Modifier
objects are constructed centrally by calling
Modifier()
with a className
matching the specific class to be
constructed. This will trigger the immediate analysis, if find.mod
is
not set to FALSE
.
Modifier(className, x, annotation, sequences, seqinfo, ...) ## S4 method for signature 'SequenceData' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'SequenceDataSet' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'SequenceDataList' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'character' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'list' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'BamFileList' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... )
Modifier(className, x, annotation, sequences, seqinfo, ...) ## S4 method for signature 'SequenceData' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'SequenceDataSet' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'SequenceDataList' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'character' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'list' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'BamFileList' Modifier( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... )
className |
The name of the class which should be constructed. |
x |
the input which can be of the following types
|
annotation |
annotation data, which must match the information contained
in the BAM files. This parameter is only required if |
sequences |
sequences matching the target sequences the reads were
mapped onto. This must match the information contained in the BAM files.
TThis parameter is only required if |
seqinfo |
An optional |
... |
Additional otpional parameters:
All additional options must be named and will be passed to the
|
a Modifier
object of type className
nucleotide
a character
value, which needs to contain "RNA" or
"DNA"
mod
a character
value, which needs to contain one or more
elements from the alphabet of a
ModRNAString
or
ModDNAString
class.
score
the main score identifier used for visualizations
dataType
the class name(s) of the SequenceData
class used
bamfiles
the input bam files as BamFileList
condition
conditions along the BamFileList
: Either
control
or treated
replicate
replicate number along the BamFileList
for each of the
condition types.
data
The sequence data object: Either a SequenceData
,
SequenceDataSet
or a SequenceDataList
object, if more than one
dataType
is used.
aggregate
the aggregated data as a SplitDataFrameList
modifications
the found modifications as a GRanges
object
settings
arguments used for the analysis as a list
aggregateValidForCurrentArguments
TRUE
or FALSE
whether
the aggregate data was constructed with the current arguments
modificationsValidForCurrentArguments
TRUE
or FALSE
whether the modifications were found with the current arguments
Modifier
objects can be created in two ways, either by providing a
list of bamfiles or
SequenceData
/SequenceDataSet
/SequenceDataList
objects,
which match the structure in dataType()
.
dataType()
can be a character
vector or a list
of
character
vectors and depending on this the input files have to
follow this structure:
a single character
: a SequenceData
is
constructed/expected.
a character
vector: a SequenceDataSet
is
constructed/expected.
a list
of character
vectors: a SequenceDataList
is constructed/expected.
The cases for a SequenceData
or SequenceDataSet
are straight
forward, since the input remains the same. The last case is special, since it
is a hypothetical option, in which bam files from two or more different
methods have to be combined to reliably detect a single modification (The
elements of a SequenceDataList
don't have to be created from the
bamfiles, whereas from a SequenceDataSet
they have to be).
For this example a list
of character
vectors is expected.
Each element must be named according to the names of dataType()
and
contain a character
vector for creating a SequenceData
object.
All additional options must be named and will be passed to the
settings
function and onto the SequenceData
objects, if x
is not a SequenceData
object or a list of
SequenceData
objects.
For the Modifier
and ModifierSet
classes a number of functions
are implemented to access the data stored by the object.
The validAggregate
and validModification
functions check if
settings
have been modified, after the data was
loaded. This potentially invalidates them. To update the data, run the
aggregate
or the modify
function.
bamfiles(x) mainScore(x) modifierType(x) modType(x) dataType(x) sequenceData(x) sequences(x, ...) validAggregate(x) validModification(x) ## S4 method for signature 'Modifier' show(object) ## S4 method for signature 'Modifier' bamfiles(x) ## S4 method for signature 'Modifier' conditions(object) ## S4 method for signature 'Modifier' mainScore(x) ## S4 method for signature 'Modifier' modifierType(x) ## S4 method for signature 'Modifier' modType(x) ## S4 method for signature 'Modifier' dataType(x) ## S4 method for signature 'Modifier' names(x) ## S4 method for signature 'Modifier' ranges(x) ## S4 method for signature 'Modifier' replicates(x) ## S4 method for signature 'Modifier' seqinfo(x) ## S4 method for signature 'Modifier' seqtype(x) ## S4 method for signature 'Modifier' sequenceData(x) ## S4 method for signature 'Modifier' sequences(x, modified = FALSE) ## S4 method for signature 'Modifier' validAggregate(x) ## S4 method for signature 'Modifier' validModification(x) ## S4 method for signature 'ModifierSet' show(object) ## S4 method for signature 'ModifierSet' bamfiles(x) ## S4 method for signature 'ModifierSet' conditions(object) ## S4 method for signature 'ModifierSet' mainScore(x) ## S4 method for signature 'ModifierSet' modifications(x, perTranscript = FALSE) ## S4 method for signature 'ModifierSet' modifierType(x) ## S4 method for signature 'ModifierSet' modType(x) ## S4 method for signature 'ModifierSet' dataType(x) ## S4 method for signature 'ModifierSet' ranges(x) ## S4 method for signature 'ModifierSet' replicates(x) ## S4 method for signature 'ModifierSet' seqinfo(x) ## S4 method for signature 'ModifierSet' seqtype(x) ## S4 method for signature 'ModifierSet' sequences(x, modified = FALSE)
bamfiles(x) mainScore(x) modifierType(x) modType(x) dataType(x) sequenceData(x) sequences(x, ...) validAggregate(x) validModification(x) ## S4 method for signature 'Modifier' show(object) ## S4 method for signature 'Modifier' bamfiles(x) ## S4 method for signature 'Modifier' conditions(object) ## S4 method for signature 'Modifier' mainScore(x) ## S4 method for signature 'Modifier' modifierType(x) ## S4 method for signature 'Modifier' modType(x) ## S4 method for signature 'Modifier' dataType(x) ## S4 method for signature 'Modifier' names(x) ## S4 method for signature 'Modifier' ranges(x) ## S4 method for signature 'Modifier' replicates(x) ## S4 method for signature 'Modifier' seqinfo(x) ## S4 method for signature 'Modifier' seqtype(x) ## S4 method for signature 'Modifier' sequenceData(x) ## S4 method for signature 'Modifier' sequences(x, modified = FALSE) ## S4 method for signature 'Modifier' validAggregate(x) ## S4 method for signature 'Modifier' validModification(x) ## S4 method for signature 'ModifierSet' show(object) ## S4 method for signature 'ModifierSet' bamfiles(x) ## S4 method for signature 'ModifierSet' conditions(object) ## S4 method for signature 'ModifierSet' mainScore(x) ## S4 method for signature 'ModifierSet' modifications(x, perTranscript = FALSE) ## S4 method for signature 'ModifierSet' modifierType(x) ## S4 method for signature 'ModifierSet' modType(x) ## S4 method for signature 'ModifierSet' dataType(x) ## S4 method for signature 'ModifierSet' ranges(x) ## S4 method for signature 'ModifierSet' replicates(x) ## S4 method for signature 'ModifierSet' seqinfo(x) ## S4 method for signature 'ModifierSet' seqtype(x) ## S4 method for signature 'ModifierSet' sequences(x, modified = FALSE)
x , object
|
a |
... |
Additional arguments. |
modified |
For |
perTranscript |
|
modifierType
: a character vector with the appropriate class
Name of a Modifier
.
modType
: a character vector with the modifications detected by
the Modifier
class.
seqtype
: a single character value defining if either
"RNA" or "DNA" modifications are detected by the Modifier
class.
mainScore
: a character vector.
sequenceData
: a SequenceData
object.
modifications
: a GRanges
or GRangesList
object
describing the found modifications.
seqinfo
: a Seqinfo
object.
sequences
: a RNAStingSet
object.
ranges
: a GRangesList
object with each element per
transcript.
bamfiles
: a BamFileList
object.
validAggregate
: TRUE
or FALSE
. Checks if current
settings are the same for which the data was aggregate
validModification
: TRUE
or FALSE
. Checks if
current settings are the same for which modification were found
data(msi,package="RNAmodR") mi <- msi[[1]] modifierType(mi) # The class name of the Modifier object modifierType(msi) seqtype(mi) modType(mi) mainScore(mi) sequenceData(mi) modifications(mi) # general accessors seqinfo(mi) sequences(mi) ranges(mi) bamfiles(mi)
data(msi,package="RNAmodR") mi <- msi[[1]] modifierType(mi) # The class name of the Modifier object modifierType(msi) seqtype(mi) modType(mi) mainScore(mi) sequenceData(mi) modifications(mi) # general accessors seqinfo(mi) sequences(mi) ranges(mi) bamfiles(mi)
The ModifierSet
class allows multiple
Modifier
objects to be created from the same
annotation and sequence data varying only the bam input files.
In addition the comparison of samples is also done via calling functions on
the ModifierSet
objects.
The ModifierSet
is a virtual class, which derives from the
SimpleList
class with the slot elementType = "Modifier"
. The
ModifierSet
class has to be implemented for each specific analysis.#'
ModifierSet(className, x, annotation, sequences, seqinfo, ...) ## S4 method for signature 'list' ModifierSet( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'character' ModifierSet( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'BamFileList' ModifierSet( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'Modifier' ModifierSet(className, x, annotation, sequences, seqinfo, ...)
ModifierSet(className, x, annotation, sequences, seqinfo, ...) ## S4 method for signature 'list' ModifierSet( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'character' ModifierSet( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'BamFileList' ModifierSet( className, x, annotation = NULL, sequences = NULL, seqinfo = NULL, ... ) ## S4 method for signature 'Modifier' ModifierSet(className, x, annotation, sequences, seqinfo, ...)
className |
The name of the class which should be constructed. |
x |
the input which can be of the following types
|
annotation |
annotation data, which must match the information contained
in the BAM files. This is parameter is only required, if |
sequences |
sequences matching the target sequences the reads were
mapped onto. This must match the information contained in the BAM files. This
is parameter is only required, if |
seqinfo |
An optional |
... |
Additional otpional parameters:
All other arguments will be passed onto the |
a ModifierSet
object of type className
The input files have to be provided as a list
of elements. Each
element in itself must be valid for the creation of Modifier
object (Have a look at the man page for more details) and must be named.
SequenceData
The modify
function executes the search for modifications for a
Modifier
class. Usually this is done
automatically during construction of a Modifier
object.
When the modify
functions is called, the aggregated data is checked
for validity for the current settings and the search for modifications is
performed using the findMod
. The results are stored in the
modification
slot of the Modifier
object, which is returned by
modify
. The results can be accessed via the modifications()
function.
findMod
returns the found modifications as a GRanges
object and has to be implemented for each individual Modifier
class.
modifications(x, ...) modify(x, ...) findMod(x) ## S4 method for signature 'Modifier' modifications(x, perTranscript = FALSE) ## S4 method for signature 'Modifier' modify(x, force = FALSE) ## S4 method for signature 'Modifier' findMod(x) ## S4 method for signature 'ModifierSet' modify(x, force = FALSE)
modifications(x, ...) modify(x, ...) findMod(x) ## S4 method for signature 'Modifier' modifications(x, perTranscript = FALSE) ## S4 method for signature 'Modifier' modify(x, force = FALSE) ## S4 method for signature 'Modifier' findMod(x) ## S4 method for signature 'ModifierSet' modify(x, force = FALSE)
x |
a |
... |
additional arguments |
perTranscript |
For |
force |
force to run |
modify
: the updated Modifier
object.
modifications
: the modifications found as a GRanges
object.
data(msi,package="RNAmodR") # modify() triggers the search for modifications in the data contained in # the Modifier or ModifierSet object mi <- modify(msi[[1]])
data(msi,package="RNAmodR") # modify() triggers the search for modifications in the data contained in # the Modifier or ModifierSet object mi <- modify(msi[[1]])
Inosine can be detected in RNA-Seq data by the conversion of A positions to
G. This conversion is detected by ModInosine
and used to search for
Inosine positions. dataType
is "PileupSequenceData"
.
Only samples labeled with the condition treated
are used for this
analysis, since the A to G conversion is common feature among the reverse
transcriptases usually emploied. Let us know, if that is not the case, and
the class needs to be modified.
Further information on Functions
of
ModInosine
.
ModInosine(x, annotation, sequences, seqinfo, ...) ModSetInosine(x, annotation = NA, sequences = NA, seqinfo = NA, ...)
ModInosine(x, annotation, sequences, seqinfo, ...) ModSetInosine(x, annotation = NA, sequences = NA, seqinfo = NA, ...)
x |
the input which can be of the different types depending on whether
a |
annotation |
annotation data, which must match the information contained
in the BAM files. This is parameter is only required, if |
sequences |
sequences matching the target sequences the reads were
mapped onto. This must match the information contained in the BAM files. This
is parameter is only required, if |
seqinfo |
An optional |
... |
Optional arguments overwriting default values, which are
|
ModInosine
score: the scores for reported Inosine positions are
between 0 and 1. They are calculated as the relative amount of called G bases
((G / N)
) and only saved for genomic A positions.
a ModInosine
or ModSetInosine
object
Felix G.M. Ernst [aut]
# construction of ModInosine object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) mi <- ModInosine(files,annotation = annotation ,sequences = sequences) # construction of ModSetInosine object ## Not run: files <- list("SampleSet1" = c(treated = RNAmodR.Data.example.wt.1(), treated = RNAmodR.Data.example.wt.2(), treated = RNAmodR.Data.example.wt.3()), "SampleSet2" = c(treated = RNAmodR.Data.example.bud23.1(), treated = RNAmodR.Data.example.bud23.2()), "SampleSet3" = c(treated = RNAmodR.Data.example.trm8.1(), treated = RNAmodR.Data.example.trm8.2())) msi <- ModSetInosine(files, annotation = annotation, sequences = sequences) ## End(Not run)
# construction of ModInosine object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) mi <- ModInosine(files,annotation = annotation ,sequences = sequences) # construction of ModSetInosine object ## Not run: files <- list("SampleSet1" = c(treated = RNAmodR.Data.example.wt.1(), treated = RNAmodR.Data.example.wt.2(), treated = RNAmodR.Data.example.wt.3()), "SampleSet2" = c(treated = RNAmodR.Data.example.bud23.1(), treated = RNAmodR.Data.example.bud23.2()), "SampleSet3" = c(treated = RNAmodR.Data.example.trm8.1(), treated = RNAmodR.Data.example.trm8.2())) msi <- ModSetInosine(files, annotation = annotation, sequences = sequences) ## End(Not run)
All of the functions of Modifier
and
the ModifierSet
classes are
inherited by the ModInosine
and ModSetInosine
classes.
Check below for the specifically implemented functions.
## S4 replacement method for signature 'ModInosine' settings(x) <- value ## S4 method for signature 'ModInosine' aggregateData(x) ## S4 method for signature 'ModInosine' findMod(x) ## S4 method for signature 'ModInosine' getDataTrack(x, name, type, ...) ## S4 method for signature 'ModInosine,GRanges' plotDataByCoord(x, coord, type = "score", window.size = 15L, ...) ## S4 method for signature 'ModInosine' plotData(x, name, from = 1L, to = 30L, type = "score", ...) ## S4 method for signature 'ModSetInosine,GRanges' plotDataByCoord(x, coord, type = "score", window.size = 15L, ...) ## S4 method for signature 'ModSetInosine' plotData(x, name, from = 1L, to = 30L, type = "score", ...)
## S4 replacement method for signature 'ModInosine' settings(x) <- value ## S4 method for signature 'ModInosine' aggregateData(x) ## S4 method for signature 'ModInosine' findMod(x) ## S4 method for signature 'ModInosine' getDataTrack(x, name, type, ...) ## S4 method for signature 'ModInosine,GRanges' plotDataByCoord(x, coord, type = "score", window.size = 15L, ...) ## S4 method for signature 'ModInosine' plotData(x, name, from = 1L, to = 30L, type = "score", ...) ## S4 method for signature 'ModSetInosine,GRanges' plotDataByCoord(x, coord, type = "score", window.size = 15L, ...) ## S4 method for signature 'ModSetInosine' plotData(x, name, from = 1L, to = 30L, type = "score", ...)
x |
a |
value |
See |
coord , name , from , to , type , window.size , ...
|
See
|
ModInosine
specific arguments for plotData:
colour.bases
- a named character vector of length = 4
for the colours of the individual bases. The names are expected to be
c("G","A","U","C")
settings
See settings
.
aggregate
See aggregate
.
modify
See modify
.
getDataTrack
a list of
DataTrack
objects. See
plotDataByCoord
.
plotData
See plotDataByCoord
.
plotDataByCoord
See plotDataByCoord
.
data(msi,package="RNAmodR") mi <- msi[[1]] settings(mi) ## Not run: aggregate(mi) modify(mi) ## End(Not run) getDataTrack(mi, "1", mainScore(mi))
data(msi,package="RNAmodR") mi <- msi[[1]] settings(mi) ## Not run: aggregate(mi) modify(mi) ## End(Not run) getDataTrack(mi, "1", mainScore(mi))
These functions are not intended for general use, but are used for additional package development.
x , data , seqdata , sequence , args
|
internally used arguments |
The NormEnd5SequenceData
/NormEnd3SequenceData
aggregate the counts of read ends (Either 5' or 3') at each position along a
transcript. In addition, the number of counts are then normalized to the
length of the transcript and to the overlapping reads.
Both classes contain three columns per data file named using the
following naming convention (normend5/normend3).condition.replicate
.
The three columns are distinguished by additional identifiers ends
,
norm.tx
and norm.ol
.
aggregate
calculates the mean and sd for samples in the control
and treated
condition separatly. Similar to the stored results for
each of the two conditions six columns are returned (three for mean and sd
each) ending in ends
, tx
and ol
.
NormEnd5SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) NormEnd3SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) NormEnd5SequenceData(bamfiles, annotation, sequences, seqinfo, ...) NormEnd3SequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'NormEnd5SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature ## 'NormEnd3SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'NormEnd5SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'NormEnd3SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'NormEnd5SequenceData' getDataTrack(x, name, ...) ## S4 method for signature 'NormEnd3SequenceData' getDataTrack(x, name, ...)
NormEnd5SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) NormEnd3SequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) NormEnd5SequenceData(bamfiles, annotation, sequences, seqinfo, ...) NormEnd3SequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'NormEnd5SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature ## 'NormEnd3SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'NormEnd5SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'NormEnd3SequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'NormEnd5SequenceData' getDataTrack(x, name, ...) ## S4 method for signature 'NormEnd3SequenceData' getDataTrack(x, name, ...)
df , ranges , sequence , replicate
|
inputs for creating a
|
condition |
For |
bamfiles , annotation , seqinfo , grl , sequences , param , args , ...
|
See
|
x |
a |
name |
For |
a NormEnd5SequenceData
or NormEnd3SequenceData
object
# Construction of a NormEnd5SequenceData object ## Not run: library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) ne5sd <- NormEnd5SequenceData(files, annotation = annotation, sequences = sequences) ## End(Not run)
# Construction of a NormEnd5SequenceData object ## Not run: library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) ne5sd <- NormEnd5SequenceData(files, annotation = annotation, sequences = sequences) ## End(Not run)
The PileupSequenceData
aggregates the pileup of called bases per
position.
PileupSequenceData
contains five columns per data file named using the
following naming convention pileup.condition.replicate
. The five
columns are distinguished by additional identifiers -
, G
,
A
, T
and C
.
aggregate
calculates the mean and sd for each nucleotide in the
control
and treated
condition separatly. The results are then
normalized to a row sum of 1.
PileupSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) PileupSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'PileupSequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'PileupSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'PileupSequenceData' getDataTrack(x, name, ...) pileupToCoverage(x) ## S4 method for signature 'PileupSequenceData' pileupToCoverage(x)
PileupSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) PileupSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'PileupSequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'PileupSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'PileupSequenceData' getDataTrack(x, name, ...) pileupToCoverage(x) ## S4 method for signature 'PileupSequenceData' pileupToCoverage(x)
df , ranges , sequence , replicate
|
inputs for creating a
|
condition |
For |
bamfiles , annotation , seqinfo , grl , sequences , param , args , ...
|
See
|
x |
a |
name |
For |
a PileupSequenceData
object
# Construction of a PileupSequenceData object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) psd <- PileupSequenceData(files, annotation = annotation, sequences = sequences)
# Construction of a PileupSequenceData object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) psd <- PileupSequenceData(files, annotation = annotation, sequences = sequences)
SequenceData
,
SequenceDataSet
, SequenceDataList
, Modifier
or
ModifierSet
object.With the plotData
and plotDataByCoord
functions data
from a SequenceData
, SequenceDataSet
, SequenceDataList
,
Modifier
or ModifierSet
object can be visualized.
Internally the functionality of the Gviz
package is used. For each
SequenceData
and Modifier
class the getDataTrack
is
implemented returning a DataTrack
object
from the Gviz
package.
Positions to be visualized are selected by defining a genomic coordinate,
for which x
has to contain data.
plotData(x, name, from = 1L, to = 30L, type, ...) plotDataByCoord(x, coord, type, window.size = 15L, ...) getDataTrack(x, name, ...) ## S4 method for signature 'Modifier,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'Modifier' plotData( x, name, from, to, type = NA, showSequenceData = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'Modifier' getDataTrack(x, name = name, ...) ## S4 method for signature 'ModifierSet,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'ModifierSet' plotData( x, name, from, to, type = NA, showSequenceData = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'SequenceData,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'SequenceData' plotData( x, name, from, to, perTranscript = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'SequenceData' getDataTrack(x, name = name, ...) ## S4 method for signature 'SequenceDataList' getDataTrack(x, name = name, ...) ## S4 method for signature 'SequenceDataList,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'SequenceDataList' plotData( x, name, from, to, perTranscript = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'SequenceDataSet' getDataTrack(x, name = name, ...) ## S4 method for signature 'SequenceDataSet,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'SequenceDataSet' plotData( x, name, from, to, perTranscript = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... )
plotData(x, name, from = 1L, to = 30L, type, ...) plotDataByCoord(x, coord, type, window.size = 15L, ...) getDataTrack(x, name, ...) ## S4 method for signature 'Modifier,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'Modifier' plotData( x, name, from, to, type = NA, showSequenceData = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'Modifier' getDataTrack(x, name = name, ...) ## S4 method for signature 'ModifierSet,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'ModifierSet' plotData( x, name, from, to, type = NA, showSequenceData = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'SequenceData,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'SequenceData' plotData( x, name, from, to, perTranscript = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'SequenceData' getDataTrack(x, name = name, ...) ## S4 method for signature 'SequenceDataList' getDataTrack(x, name = name, ...) ## S4 method for signature 'SequenceDataList,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'SequenceDataList' plotData( x, name, from, to, perTranscript = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... ) ## S4 method for signature 'SequenceDataSet' getDataTrack(x, name = name, ...) ## S4 method for signature 'SequenceDataSet,GRanges' plotDataByCoord(x, coord, type = NA, window.size = 15L, ...) ## S4 method for signature 'SequenceDataSet' plotData( x, name, from, to, perTranscript = FALSE, showSequence = TRUE, showAnnotation = FALSE, ... )
x |
a |
name |
Only for |
from |
Only for |
to |
Only for |
type |
the data type of data show as data tracks. |
... |
optional parameters:
|
coord |
coordinates of a positions to subset to as a
|
window.size |
integer value for the number of positions on the left and
right site of the selected positions included in the plotting (default:
|
showSequenceData |
|
showSequence |
|
showAnnotation |
|
perTranscript |
|
a plot send to the active graphic device
data(msi,package="RNAmodR") plotData(msi[[1]], "2", from = 10L, to = 45L) ## Not run: plotData(msi, "2", from = 10L, to = 45L) ## End(Not run)
data(msi,package="RNAmodR") plotData(msi[[1]], "2", from = 10L, to = 45L) ## Not run: plotData(msi, "2", from = 10L, to = 45L) ## End(Not run)
Modifier
and ModifierSet
objectsplotROC
streamlines labeling, prediction, performance and plotting
functions to test the peformance of a Modifier
object and the data
analyzed via the functionallity from the ROCR
package.
The data from x
will be labeled as positive using the coord
arguments. The other arguments will be passed on to the specific ROCR
functions.
By default the prediction.args
include three values:
measure = "tpr"
x.measure = "fpr"
score = mainScore(x)
The remaining arguments are not predefined.
plotROC(x, coord, ...) ## S4 method for signature 'Modifier' plotROC( x, coord, score = NULL, prediction.args = list(), performance.args = list(), plot.args = list() ) ## S4 method for signature 'ModifierSet' plotROC( x, coord, score = NULL, prediction.args = list(), performance.args = list(), plot.args = list() )
plotROC(x, coord, ...) ## S4 method for signature 'Modifier' plotROC( x, coord, score = NULL, prediction.args = list(), performance.args = list(), plot.args = list() ) ## S4 method for signature 'ModifierSet' plotROC( x, coord, score = NULL, prediction.args = list(), performance.args = list(), plot.args = list() )
x |
a |
coord |
coordinates of position to label as positive. Either a
|
... |
additional arguments |
score |
the score identifier to subset to, if multiple scores are available. |
prediction.args |
arguments which will be used for calling
|
performance.args |
arguments which will be used for calling
|
plot.args |
arguments which will be used for calling |
a plot send to the active graphic device
Tobias Sing, Oliver Sander, Niko Beerenwinkel, Thomas Lengauer (2005): "ROCR: visualizing classifier performance in R." Bioinformatics 21(20):3940-3941 DOI: 10.1093/bioinformatics/bti623
data(msi,package="RNAmodR") # constructing a GRanges obejct to mark positive positions mod <- modifications(msi) coord <- unique(unlist(mod)) coord$score <- NULL coord$sd <- NULL # plotting a TPR vs. FPR plot per ModInosine object plotROC(msi[[1]],coord) # plotting a TPR vs. FPR plot per ModSetInosine object plotROC(msi,coord)
data(msi,package="RNAmodR") # constructing a GRanges obejct to mark positive positions mod <- modifications(msi) coord <- unique(unlist(mod)) coord$score <- NULL coord$sd <- NULL # plotting a TPR vs. FPR plot per ModInosine object plotROC(msi[[1]],coord) # plotting a TPR vs. FPR plot per ModSetInosine object plotROC(msi,coord)
ProtectedEndSequenceData
implements
SequenceData
to contain and aggregate the
start and ends of reads per position along a transcript.
ProtectedEndSequenceData
offsets the start position by -1 to align the
information on the 5'-3'-phosphate bonds to one position. The
ProtectedEndSequenceData
class is implemented specifically as required
for the RiboMethSeq
method.
The objects of type ProtectedEndSequenceData
contain three columns per
data file named using the following naming convention
protectedend.condition.replicate
.
aggregate
calculates the mean and sd for samples in the control
and treated
condition separatly.
ProtectedEndSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) ProtectedEndSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'ProtectedEndSequenceData, ## BamFileList, ## GRangesList, ## XStringSet, ## ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'ProtectedEndSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'ProtectedEndSequenceData' getDataTrack(x, name, ...)
ProtectedEndSequenceDataFrame( df, ranges, sequence, replicate, condition, bamfiles, seqinfo ) ProtectedEndSequenceData(bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature ## 'ProtectedEndSequenceData, ## BamFileList, ## GRangesList, ## XStringSet, ## ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'ProtectedEndSequenceData' aggregateData(x, condition = c("Both", "Treated", "Control")) ## S4 method for signature 'ProtectedEndSequenceData' getDataTrack(x, name, ...)
df , ranges , sequence , replicate
|
inputs for creating a
|
condition |
For |
bamfiles , annotation , seqinfo , grl , sequences , param , args , ...
|
See
|
x |
a |
name |
For |
a ProtectedEndSequenceData
object
# Construction of a ProtectedEndSequenceData object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) pesd <- ProtectedEndSequenceData(files, annotation = annotation, sequences = sequences)
# Construction of a ProtectedEndSequenceData object library(RNAmodR.Data) library(rtracklayer) annotation <- GFF3File(RNAmodR.Data.example.man.gff3()) sequences <- RNAmodR.Data.example.man.fasta() files <- c(treated = RNAmodR.Data.example.wt.1()) pesd <- ProtectedEndSequenceData(files, annotation = annotation, sequences = sequences)
Post-transcriptional modifications can be found abundantly in rRNA and tRNA and can be detected classically via several strategies. However, difficulties arise if the identity and the position of the modified nucleotides is to be determined at the same time. Classically, a primer extension, a form of reverse transcription (RT), would allow certain modifications to be accessed by blocks during the RT changes or changes in the cDNA sequences. Other modification would need to be selectively treated by chemical reactions to influence the outcome of the reverse transcription.
With the increased availability of high throughput sequencing, these classical methods were adapted to high throughput methods allowing more RNA molecules to be accessed at the same time. With these advances post-transcriptional modifications were also detected on mRNA. Among these high throughput techniques are for example Pseudo-Seq (Carlile et al. 2014), RiboMethSeq (Birkedal et al. 2015) and AlkAnilineSeq (Marchand et al. 2018) each able to detect a specific type of modification from footprints in RNA-Seq data prepared with the selected methods.
Since similar pattern can be observed from some of these techniques, overlaps of the bioinformatical pipeline already are and will become more frequent with new emerging sequencing techniques.
RNAmodR
implements classes and a workflow to detect
post-transcriptional RNA modifications in high throughput sequencing data. It
is easily adaptable to new methods and can help during the phase of initial
method development as well as more complex screenings.
Briefly, from the SequenceData
, specific subclasses are derived for
accessing specific aspects of aligned reads, e.g. 5’-end positions or pileup
data. With this a Modifier
class can be used to detect specific
patterns for individual types of modifications. The SequenceData
classes can be shared by different Modifier
classes allowing easy
adaptation to new methods.
Felix G M Ernst [aut], Denis L.J. Lafontaine [ctb]
- Carlile TM, Rojas-Duran MF, Zinshteyn B, Shin H, Bartoli KM, Gilbert WV (2014): "Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells." Nature 515 (7525), P. 143–146. DOI: 10.1038/nature13802.
- Birkedal U, Christensen-Dalsgaard M, Krogh N, Sabarinathan R, Gorodkin J, Nielsen H (2015): "Profiling of ribose methylations in RNA by high-throughput sequencing." Angewandte Chemie (International ed. in English) 54 (2), P. 451–455. DOI: 10.1002/anie.201408362.
- Marchand V, Ayadi L, __Ernst FGM__, Hertler J, Bourguignon-Igel V, Galvanin A, Kotter A, Helm M, __Lafontaine DLJ__, Motorin Y (2018): "AlkAniline-Seq: Profiling of m7 G and m3 C RNA Modifications at Single Nucleotide Resolution." Angewandte Chemie (International ed. in English) 57 (51), P. 16785–16790. DOI: 10.1002/anie.201810946.
The RNAmodR.RiboMethSeq
and RNAmodR.AlkAnilineSeq
package.
The following datasets are contained in the RNAmodR package. They are used in the man page examples.
data(msi) data(sds) data(sdl) data(psd) data(e5sd) data(e3sd) data(esd) data(csd) data(ne3sd) data(ne5sd) data(pesd)
data(msi) data(sds) data(sdl) data(psd) data(e5sd) data(e3sd) data(esd) data(csd) data(ne3sd) data(ne5sd) data(pesd)
msi a ModSetInosine
instance
sds a SequenceDataSet
instance
sdl a SequenceDataList
instance
psd a PileupSequenceData
instance
e5sd a End5SequenceData
instance
e3sd a End3SequenceData
instance
esd a EndSequenceData
instance
csd a CoverageSequenceData
instance
ne3sd a NormEnd3SequenceData
instance
ne5sd a NormEnd5SequenceData
instance
pesd a ProtectedEndSequenceData
instance
An object of class SequenceDataSet
of length 2.
An object of class SequenceDataList
of length 3.
An object of class PileupSequenceData
of dimension 100 x 101 x 15 x 15.
An object of class End5SequenceData
of dimension 100 x 101 x 3 x 3.
An object of class End3SequenceData
of dimension 100 x 101 x 3 x 3.
An object of class EndSequenceData
of dimension 100 x 101 x 3 x 3.
An object of class CoverageSequenceData
of dimension 100 x 101 x 3 x 3.
An object of class NormEnd3SequenceData
of dimension 100 x 101 x 9 x 9.
An object of class NormEnd5SequenceData
of dimension 100 x 101 x 9 x 9.
An object of class ProtectedEndSequenceData
of dimension 100 x 101 x 3 x 3.
These functions are not intended for general use, but are used for additional package development.
getData
is used to load data into a
SequenceData
object and must be
implented for all SequenceData
classes. The results must match the
requirements outlined in the value section.
In addition the following functions should be implemented for complete functionality:
aggregateData
for each SequenceData
and Modifier
class.
See also aggregateData
findMod
for each Modifier
class. See also
findMod
.
plotData
/plotDataByCoord
for each Modifier
and ModifierSet
class. See also
plotData
.
The following helper function can be called from within findMod
to
construct a coordinate for each modification found:
constructModRanges
constructs a GRanges
object describing the
location, type and associated scores of a modification.
constructModRanges
is typically called from the modify
function, which must be implemented for all
Modifier
classes.
constructModRanges(range, data, modType, scoreFun, source, type) getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'GRanges,DataFrame' constructModRanges(range, data, modType, scoreFun, source, type)
constructModRanges(range, data, modType, scoreFun, source, type) getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'GRanges,DataFrame' constructModRanges(range, data, modType, scoreFun, source, type)
range |
for |
data |
for |
modType |
for |
scoreFun |
for |
source |
for |
type |
for |
x |
for |
bamfiles |
for |
grl |
for |
sequences |
for |
param |
for |
args |
for |
getData
: returns a list with elements per BamFile in
bamfiles
. Elements can be
IntegerList
,
NumericList
or a
CompressedSplitDataFrameList
. The
data in the elements must be order by increasing positions numbers. However,
names and rownames will be discarded.
constructModRanges
: returns a GRanges
object with
genomic coordinates of modified nucleotides in the associated transcripts.
# new SequenceData class setClass(Class = "ExampleSequenceData", contains = "SequenceData", prototype = list(minQuality = 5L)) ExampleSequenceData <- function(bamfiles, annotation, sequences, seqinfo, ...){ RNAmodR:::SequenceData("Example", bamfiles = bamfiles, annotation = annotation, sequences = sequences, seqinfo = seqinfo, ...) } setMethod("getData", signature = c(x = "ExampleSequenceData", bamfiles = "BamFileList", grl = "GRangesList", sequences = "XStringSet", param = "ScanBamParam"), definition = function(x, bamfiles, grl, sequences, param, args){ ### } ) setMethod("aggregateData", signature = c(x = "ExampleSequenceData"), function(x, condition = c("Both","Treated","Control")){ ### } ) setMethod( f = "getDataTrack", signature = c(x = "ExampleSequenceData"), definition = function(x, name, ...) { ### } ) # new Modifier class setClass("ModExample", contains = "Modifier", prototype = list(mod = "X", score = "score", dataType = "ExampleSequenceData")) ModExample <- function(x, annotation, sequences, seqinfo, ...){ RNAmodR:::Modifier("ModExample", x = x, annotation = annotation, sequences = sequences, seqinfo = seqinfo, ...) } setMethod(f = "aggregateData", signature = c(x = "ModExample"), definition = function(x, force = FALSE){ # Some data with element per transcript } ) setMethod("findMod", signature = c(x = "ModExample"), function(x){ # an element per modification found. } ) setMethod( f = "getDataTrack", signature = signature(x = "ModExample"), definition = function(x, name, type, ...) { } ) setMethod( f = "plotDataByCoord", signature = signature(x = "ModExample", coord = "GRanges"), definition = function(x, coord, type = "score", window.size = 15L, ...) { } ) setMethod( f = "plotData", signature = signature(x = "ModExample"), definition = function(x, name, from, to, type = "score", ...) { } ) # new ModifierSet class setClass("ModSetExample", contains = "ModifierSet", prototype = list(elementType = "ModExample")) ModSetExample <- function(x, annotation, sequences, seqinfo, ...){ RNAmodR:::ModifierSet("ModExample", x = x, annotation = annotation, sequences = sequences, seqinfo = seqinfo, ...) } setMethod( f = "plotDataByCoord", signature = signature(x = "ModSetExample", coord = "GRanges"), definition = function(x, coord, type = "score", window.size = 15L, ...) { } ) setMethod( f = "plotData", signature = signature(x = "ModSetExample"), definition = function(x, name, from, to, type = "score", ...) { } )
# new SequenceData class setClass(Class = "ExampleSequenceData", contains = "SequenceData", prototype = list(minQuality = 5L)) ExampleSequenceData <- function(bamfiles, annotation, sequences, seqinfo, ...){ RNAmodR:::SequenceData("Example", bamfiles = bamfiles, annotation = annotation, sequences = sequences, seqinfo = seqinfo, ...) } setMethod("getData", signature = c(x = "ExampleSequenceData", bamfiles = "BamFileList", grl = "GRangesList", sequences = "XStringSet", param = "ScanBamParam"), definition = function(x, bamfiles, grl, sequences, param, args){ ### } ) setMethod("aggregateData", signature = c(x = "ExampleSequenceData"), function(x, condition = c("Both","Treated","Control")){ ### } ) setMethod( f = "getDataTrack", signature = c(x = "ExampleSequenceData"), definition = function(x, name, ...) { ### } ) # new Modifier class setClass("ModExample", contains = "Modifier", prototype = list(mod = "X", score = "score", dataType = "ExampleSequenceData")) ModExample <- function(x, annotation, sequences, seqinfo, ...){ RNAmodR:::Modifier("ModExample", x = x, annotation = annotation, sequences = sequences, seqinfo = seqinfo, ...) } setMethod(f = "aggregateData", signature = c(x = "ModExample"), definition = function(x, force = FALSE){ # Some data with element per transcript } ) setMethod("findMod", signature = c(x = "ModExample"), function(x){ # an element per modification found. } ) setMethod( f = "getDataTrack", signature = signature(x = "ModExample"), definition = function(x, name, type, ...) { } ) setMethod( f = "plotDataByCoord", signature = signature(x = "ModExample", coord = "GRanges"), definition = function(x, coord, type = "score", window.size = 15L, ...) { } ) setMethod( f = "plotData", signature = signature(x = "ModExample"), definition = function(x, name, from, to, type = "score", ...) { } ) # new ModifierSet class setClass("ModSetExample", contains = "ModifierSet", prototype = list(elementType = "ModExample")) ModSetExample <- function(x, annotation, sequences, seqinfo, ...){ RNAmodR:::ModifierSet("ModExample", x = x, annotation = annotation, sequences = sequences, seqinfo = seqinfo, ...) } setMethod( f = "plotDataByCoord", signature = signature(x = "ModSetExample", coord = "GRanges"), definition = function(x, coord, type = "score", window.size = 15L, ...) { } ) setMethod( f = "plotData", signature = signature(x = "ModSetExample"), definition = function(x, name, from, to, type = "score", ...) { } )
The SequenceData
class is implemented to contain data on each position
along transcripts and holds the corresponding annotation data and
nucleotide sequence of these transcripts. To access this data several
functions
are available. The
SequenceData
class is a virtual class, from which specific classes can
be extended. Currently the following classes are implemented:
The annotation and sequence data can be accessed through the functions
ranges
and sequences
, respectively. Beaware, that the data is
always provided according to genomic positions with increasing
rownames
, but the sequence is given as the actual sequence of the
transcript. Therefore, it is necessary to treat the minus strand accordingly.
The SequenceData
class is derived from the
CompressedSplitDataFrameList
class
with additional slots for annotation and sequence data. Some functionality is
not inherited and might not available to full extend, e.g.relist
.
SequenceDataFrame
The SequenceDataFrame
class is a virtual class and contains data for
positions along a single transcript. In addition to being used for returning
elements from a SequenceData
object, the SequenceDataFrame class is
used to store the unlisted data within a
SequenceData
object. Therefore, a matching
SequenceData
and SequenceDataFrame
class must be implemented.
The SequenceDataFrame
class is derived from the
DataFrame
class.
Subsetting of a SequenceDataFrame
returns a SequenceDataFrame
or
DataFrame
, if it is subset by a column or row, respectively. The
drop
argument is ignored for column subsetting.
## S4 method for signature 'SequenceData' cbind(..., deparse.level = 1) ## S4 method for signature 'SequenceData' rbind(..., deparse.level = 1) SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'character,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'character,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'TxDb,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'TxDb,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GRangesList,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GRangesList,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GFF3File,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GFF3File,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'character,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GFF3File,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'TxDb,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GRangesList,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
## S4 method for signature 'SequenceData' cbind(..., deparse.level = 1) ## S4 method for signature 'SequenceData' rbind(..., deparse.level = 1) SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'character,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'character,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'TxDb,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'TxDb,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GRangesList,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GRangesList,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GFF3File,BSgenome' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GFF3File,character' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'character,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GFF3File,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'TxDb,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...) ## S4 method for signature 'GRangesList,FaFile' SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
... |
Optional arguments overwriting default values. Not all
|
deparse.level |
See |
dataType |
The prefix for construction the class name of the
|
bamfiles |
the input which can be of the following types
|
annotation |
annotation data, which must match the information contained in the BAM files. |
sequences |
sequences matching the target sequences the reads were mapped onto. This must match the information contained in the BAM files. |
seqinfo |
optional |
A SequenceData object
sequencesType
a character
value for the class name of
sequences
. Either RNAStringSet
, ModRNAStringSet
,
DNAStringSet
or ModDNAStringSet
.
minQuality
a integer
value describing a threshold of the minimum
quality of reads to be used.
The SequenceData
, SequenceDataSet
, SequenceDataList
and
SequenceDataFrame
classes share functionality. Have a look at the
elements listed directly below.
replicates(x) ## S4 method for signature 'SequenceDataFrame' show(object) ## S4 method for signature 'SequenceDataFrame' conditions(object) ## S4 method for signature 'SequenceDataFrame' bamfiles(x) ## S4 method for signature 'SequenceDataFrame' dataType(x) ## S4 method for signature 'SequenceDataFrame' ranges(x) ## S4 method for signature 'SequenceDataFrame' replicates(x) ## S4 method for signature 'SequenceDataFrame' seqinfo(x) ## S4 method for signature 'SequenceDataFrame' seqinfo(x) ## S4 method for signature 'SequenceDataFrame' seqtype(x) ## S4 replacement method for signature 'SequenceDataFrame' seqtype(x) <- value ## S4 method for signature 'SequenceDataFrame' sequences(x) ## S4 method for signature 'SequenceData' show(object) ## S4 method for signature ## 'SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'SequenceData' bamfiles(x) ## S4 method for signature 'SequenceData' conditions(object) ## S4 method for signature 'SequenceData' ranges(x) ## S4 method for signature 'SequenceData' replicates(x) ## S4 method for signature 'SequenceData' seqinfo(x) ## S4 method for signature 'SequenceData' sequences(x) ## S4 method for signature 'SequenceData' seqtype(x) ## S4 replacement method for signature 'SequenceData' seqtype(x) <- value ## S4 method for signature 'SequenceData' dataType(x) ## S4 method for signature 'SequenceDataSet' show(object) ## S4 method for signature 'SequenceDataSet' bamfiles(x) ## S4 method for signature 'SequenceDataSet' conditions(object) ## S4 method for signature 'SequenceDataSet' names(x) ## S4 method for signature 'SequenceDataSet' ranges(x) ## S4 method for signature 'SequenceDataSet' replicates(x) ## S4 method for signature 'SequenceDataSet' seqinfo(x) ## S4 method for signature 'SequenceDataSet' seqtype(x) ## S4 replacement method for signature 'SequenceDataSet' seqtype(x) <- value ## S4 method for signature 'SequenceDataSet' sequences(x) ## S4 method for signature 'SequenceDataList' show(object) ## S4 method for signature 'SequenceDataList' bamfiles(x) ## S4 method for signature 'SequenceDataList' conditions(object) ## S4 method for signature 'SequenceDataList' names(x) ## S4 method for signature 'SequenceDataList' ranges(x) ## S4 method for signature 'SequenceDataList' replicates(x) ## S4 method for signature 'SequenceDataList' seqinfo(x) ## S4 method for signature 'SequenceDataList' seqtype(x) ## S4 replacement method for signature 'SequenceDataList' seqtype(x) <- value ## S4 method for signature 'SequenceDataList' sequences(x)
replicates(x) ## S4 method for signature 'SequenceDataFrame' show(object) ## S4 method for signature 'SequenceDataFrame' conditions(object) ## S4 method for signature 'SequenceDataFrame' bamfiles(x) ## S4 method for signature 'SequenceDataFrame' dataType(x) ## S4 method for signature 'SequenceDataFrame' ranges(x) ## S4 method for signature 'SequenceDataFrame' replicates(x) ## S4 method for signature 'SequenceDataFrame' seqinfo(x) ## S4 method for signature 'SequenceDataFrame' seqinfo(x) ## S4 method for signature 'SequenceDataFrame' seqtype(x) ## S4 replacement method for signature 'SequenceDataFrame' seqtype(x) <- value ## S4 method for signature 'SequenceDataFrame' sequences(x) ## S4 method for signature 'SequenceData' show(object) ## S4 method for signature ## 'SequenceData,BamFileList,GRangesList,XStringSet,ScanBamParam' getData(x, bamfiles, grl, sequences, param, args) ## S4 method for signature 'SequenceData' bamfiles(x) ## S4 method for signature 'SequenceData' conditions(object) ## S4 method for signature 'SequenceData' ranges(x) ## S4 method for signature 'SequenceData' replicates(x) ## S4 method for signature 'SequenceData' seqinfo(x) ## S4 method for signature 'SequenceData' sequences(x) ## S4 method for signature 'SequenceData' seqtype(x) ## S4 replacement method for signature 'SequenceData' seqtype(x) <- value ## S4 method for signature 'SequenceData' dataType(x) ## S4 method for signature 'SequenceDataSet' show(object) ## S4 method for signature 'SequenceDataSet' bamfiles(x) ## S4 method for signature 'SequenceDataSet' conditions(object) ## S4 method for signature 'SequenceDataSet' names(x) ## S4 method for signature 'SequenceDataSet' ranges(x) ## S4 method for signature 'SequenceDataSet' replicates(x) ## S4 method for signature 'SequenceDataSet' seqinfo(x) ## S4 method for signature 'SequenceDataSet' seqtype(x) ## S4 replacement method for signature 'SequenceDataSet' seqtype(x) <- value ## S4 method for signature 'SequenceDataSet' sequences(x) ## S4 method for signature 'SequenceDataList' show(object) ## S4 method for signature 'SequenceDataList' bamfiles(x) ## S4 method for signature 'SequenceDataList' conditions(object) ## S4 method for signature 'SequenceDataList' names(x) ## S4 method for signature 'SequenceDataList' ranges(x) ## S4 method for signature 'SequenceDataList' replicates(x) ## S4 method for signature 'SequenceDataList' seqinfo(x) ## S4 method for signature 'SequenceDataList' seqtype(x) ## S4 replacement method for signature 'SequenceDataList' seqtype(x) <- value ## S4 method for signature 'SequenceDataList' sequences(x)
x , object
|
a |
value |
a new |
bamfiles |
a |
grl |
a |
sequences |
a |
param |
a |
args |
a list of addition arguments |
seqinfo
: a Seqinfo
object ().
sequences
: a RNAStingSet
object or a RNAString
object for a SequenceDataFrame
.
ranges
: a GRangesList
object with each element per
transcript or a GRanges
object for a SequenceDataFrame
.
bamfiles
: a BamFileList
object or a SimpleList of
BamFileList
objects for a SequenceDataList
.
data(e5sd,package="RNAmodR") # general accessors seqinfo(e5sd) sequences(e5sd) ranges(e5sd) bamfiles(e5sd)
data(e5sd,package="RNAmodR") # general accessors seqinfo(e5sd) sequences(e5sd) ranges(e5sd) bamfiles(e5sd)
The SequenceDataFrame
class is a virtual class and contains data for
positions along a single transcript. In addition to being used for returning
elements from a SequenceData
object, the SequenceDataFrame class is
used to store the unlisted data within a
SequenceData
object. Therefore, a matching
SequenceData
and SequenceDataFrame
class must be implemented.
The SequenceDataFrame
class is derived from the
DataFrame
class. To follow the
functionallity in the S4Vectors
package, SequenceDataFrame
implements the concept, whereas SequenceDFrame
is the implementation
for in-memory data representation from which some specific
*SequenceDataFrame
class derive from, e.g.
CoverageSequenceData
.
Subsetting of a SequenceDataFrame
returns a SequenceDataFrame
or
DataFrame
, if it is subset by a column or row, respectively. The
drop
argument is ignored for column subsetting.
## S4 method for signature 'SequenceDataFrame' cbind(..., deparse.level = 1) ## S4 method for signature 'SequenceDataFrame,ANY,ANY,ANY' x[i, j, ..., drop = TRUE]
## S4 method for signature 'SequenceDataFrame' cbind(..., deparse.level = 1) ## S4 method for signature 'SequenceDataFrame,ANY,ANY,ANY' x[i, j, ..., drop = TRUE]
x , i , j , ... , drop , deparse.level
|
arguments used for
|
A SequenceDataFrame
object or if subset to row a
DataFrame
ranges
a GRanges
object each element describing a transcript including its element. The
GRanges
is constructed from the unlisted results of the
exonsBy(x, by="tx")
function.
If during construction a GRangesList
is provided instead of a
character value pointing to a gff3 file or a TxDb
object, it must have
a comparable structure.
sequence
a XString
of
type sequencesType
from the parent
SequenceData
object.
condition
conditions along the
BamFileList
: Either control
or treated
replicate
replicate number along the BamFileList
for each of the
condition types.
bamfiles
the input bam files as
BamFileList
seqinfo
a Seqinfo
describing
the avialable/used chromosomes.
for an example see
ProtectedEndSequenceData
and for more information see SequenceData
data(e5sd,package="RNAmodR") # A SequenceDataFrame can is usually constructed by subsetting from # a SequenceData object sdf <- e5sd[[1]] # Its also used to store the unlisted data in a SequenceData object sdf <- unlist(e5sd) # should probably only used internally e5sd <- relist(sdf,e5sd)
data(e5sd,package="RNAmodR") # A SequenceDataFrame can is usually constructed by subsetting from # a SequenceData object sdf <- e5sd[[1]] # Its also used to store the unlisted data in a SequenceData object sdf <- unlist(e5sd) # should probably only used internally e5sd <- relist(sdf,e5sd)
The SequenceDataList
class is used to hold SequenceData
or
SequenceDataSet
objects as its elements. It is derived from the
List
class.
The SequenceDataList
is used to hold data from different sets of
aligned reads. This allows multiple methods to be aggregated into one
modification detection strategy. Annotation and sequence data must be the
same for all elements, however the bam files can be different.
SequenceDataList(...)
SequenceDataList(...)
... |
The elements to be included in the |
a SequenceDataList
data(psd,package="RNAmodR") data(e5sd,package="RNAmodR") sdl <- SequenceDataList(SequenceDataSet(psd,e5sd),e5sd)
data(psd,package="RNAmodR") data(e5sd,package="RNAmodR") sdl <- SequenceDataList(SequenceDataSet(psd,e5sd),e5sd)
The SequenceDataSet
class is used to hold SequenceData
objects
as its elements. It is derived from the
List
class.
The SequenceDataSet
is used to hold different data types from the of
same aligned reads. The same dataset can be used to generate multiple sets of
data types. Bam files, annotation and sequence data must be the same for all
elements.
SequenceDataSet(...)
SequenceDataSet(...)
... |
The elements to be included in the |
a SequenceDataSet
data(psd,package="RNAmodR") data(e5sd,package="RNAmodR") sdl <- SequenceDataSet(psd,e5sd)
data(psd,package="RNAmodR") data(e5sd,package="RNAmodR") sdl <- SequenceDataSet(psd,e5sd)
A Gviz
compatible
SequenceTrack
for showing modified
DNA sequences.
ModDNASequenceTrack(sequence, chromosome, genome, name = "SequenceTrack", ...) ## S4 method for signature 'SequenceModDNAStringSetTrack' seqnames(x) ## S4 method for signature 'SequenceModDNAStringSetTrack' seqlevels(x)
ModDNASequenceTrack(sequence, chromosome, genome, name = "SequenceTrack", ...) ## S4 method for signature 'SequenceModDNAStringSetTrack' seqnames(x) ## S4 method for signature 'SequenceModDNAStringSetTrack' seqlevels(x)
sequence |
A |
chromosome , genome , name , ...
|
See
|
x |
A |
a SequenceModDNAStringSetTrack
object
sequence
A ModDNAStringSet
object
seq <- ModDNAStringSet(c(chr1 = paste0(alphabet(ModDNAString()), collapse = ""))) st <- ModDNASequenceTrack(seq) Gviz::plotTracks(st, chromosome = "chr1",from = 1L, to = 20L)
seq <- ModDNAStringSet(c(chr1 = paste0(alphabet(ModDNAString()), collapse = ""))) st <- ModDNASequenceTrack(seq) Gviz::plotTracks(st, chromosome = "chr1",from = 1L, to = 20L)
A Gviz
compatible
SequenceTrack
for showing modified
RNA sequences.
ModRNASequenceTrack(sequence, chromosome, genome, name = "SequenceTrack", ...) ## S4 method for signature 'SequenceModRNAStringSetTrack' seqnames(x) ## S4 method for signature 'SequenceModRNAStringSetTrack' seqlevels(x)
ModRNASequenceTrack(sequence, chromosome, genome, name = "SequenceTrack", ...) ## S4 method for signature 'SequenceModRNAStringSetTrack' seqnames(x) ## S4 method for signature 'SequenceModRNAStringSetTrack' seqlevels(x)
sequence |
A |
chromosome , genome , name , ...
|
See
|
x |
A |
a SequenceModRNAStringSetTrack
object
sequence
A ModRNAStringSet
object
seq <- ModRNAStringSet(c(chr1 = paste0(alphabet(ModRNAString()), collapse = ""))) st <- ModRNASequenceTrack(seq) Gviz::plotTracks(st, chromosome = "chr1",from = 1L, to = 20L)
seq <- ModRNAStringSet(c(chr1 = paste0(alphabet(ModRNAString()), collapse = ""))) st <- ModRNASequenceTrack(seq) Gviz::plotTracks(st, chromosome = "chr1",from = 1L, to = 20L)
Modifier
objectsDepending on data prepation, quality and desired stringency of a modification
strategy, settings for cut off parameters or other variables may need to be
adjusted. This should be rarely the case, but a function for changing these
settings, is implemented as the... settings
function.
For changing values the input can be either a list
or something
coercible to a list
. Upon changing a setting, the validity of the
value in terms of type(!) and dimensions will be checked.
If settings have been modified after the data was loaded, the data is
potentially invalid. To update the data, run the aggregate
or the
modify
function.
settings(x, name = NULL) settings(x, name) <- value ## S4 method for signature 'Modifier' settings(x, name = NULL) ## S4 replacement method for signature 'Modifier' settings(x) <- value ## S4 method for signature 'ModifierSet' settings(x, name = NULL) ## S4 replacement method for signature 'ModifierSet' settings(x) <- value
settings(x, name = NULL) settings(x, name) <- value ## S4 method for signature 'Modifier' settings(x, name = NULL) ## S4 replacement method for signature 'Modifier' settings(x) <- value ## S4 method for signature 'ModifierSet' settings(x, name = NULL) ## S4 replacement method for signature 'ModifierSet' settings(x) <- value
x |
a |
name |
name of the setting to be returned or set |
value |
value of the setting to be set |
If name
is omitted, settings
returns a list of all settings.
If name
is set, settings
returns a single settings or
NULL
, if a value for name
is not available.
data(msi,package="RNAmodR") mi <- msi[[1]] # returns a list of all settings settings(mi) # accesses a specific setting settings(mi,"minCoverage") # modification of setting settings(mi) <- list(minCoverage = 11L)
data(msi,package="RNAmodR") mi <- msi[[1]] # returns a list of all settings settings(mi) # accesses a specific setting settings(mi,"minCoverage") # modification of setting settings(mi) <- list(minCoverage = 11L)
stats
returns information about reads used in the RNAmodR analysis.
Three modes are available depending on which type of object is provided. If a
SequenceData
object is provided, a
BamFile
or
BamFileList
must be provided as well. If a
Modifier
object is used, the bam files
returned from the bamfiles
function are used. This is also the case,
if a ModifierSet
object is used.
stats(x, file, ...) ## S4 method for signature 'SequenceData,BamFile' stats(x, file, ...) ## S4 method for signature 'SequenceData,BamFileList' stats(x, file, ...) ## S4 method for signature 'Modifier,missing' stats(x) ## S4 method for signature 'ModifierSet,missing' stats(x)
stats(x, file, ...) ## S4 method for signature 'SequenceData,BamFile' stats(x, file, ...) ## S4 method for signature 'SequenceData,BamFileList' stats(x, file, ...) ## S4 method for signature 'Modifier,missing' stats(x) ## S4 method for signature 'ModifierSet,missing' stats(x)
x |
a |
file |
a |
... |
optional parameters used as stated
|
a DataFrame
, DataFrameList
or SimpleList
with
the results in aggregated form
library(RNAmodR.Data) library(rtracklayer) sequences <- RNAmodR.Data.example.AAS.fasta() annotation <- GFF3File(RNAmodR.Data.example.AAS.gff3()) files <- list("SampleSet1" = c(treated = RNAmodR.Data.example.wt.1(), treated = RNAmodR.Data.example.wt.2(), treated = RNAmodR.Data.example.wt.3()), "SampleSet2" = c(treated = RNAmodR.Data.example.bud23.1(), treated = RNAmodR.Data.example.bud23.2()), "SampleSet3" = c(treated = RNAmodR.Data.example.trm8.1(), treated = RNAmodR.Data.example.trm8.2())) msi <- ModSetInosine(files, annotation = annotation, sequences = sequences) # smallest chunk of information stats(sequenceData(msi[[1L]]),bamfiles(msi[[1L]])[[1L]]) # partial information stats(sequenceData(msi[[1L]]),bamfiles(msi[[1L]])) # the whole stats stats(msi)
library(RNAmodR.Data) library(rtracklayer) sequences <- RNAmodR.Data.example.AAS.fasta() annotation <- GFF3File(RNAmodR.Data.example.AAS.gff3()) files <- list("SampleSet1" = c(treated = RNAmodR.Data.example.wt.1(), treated = RNAmodR.Data.example.wt.2(), treated = RNAmodR.Data.example.wt.3()), "SampleSet2" = c(treated = RNAmodR.Data.example.bud23.1(), treated = RNAmodR.Data.example.bud23.2()), "SampleSet3" = c(treated = RNAmodR.Data.example.trm8.1(), treated = RNAmodR.Data.example.trm8.2())) msi <- ModSetInosine(files, annotation = annotation, sequences = sequences) # smallest chunk of information stats(sequenceData(msi[[1L]]),bamfiles(msi[[1L]])[[1L]]) # partial information stats(sequenceData(msi[[1L]]),bamfiles(msi[[1L]])) # the whole stats stats(msi)
SequenceData
, SequenceDataSet
,
SequenceDataList
, Modifier
or ModifierSet
object.With the subsetByCoord
function data from a SequenceData
,
SequenceDataSet
, SequenceDataList
, Modifier
or
ModifierSet
object can be subset to positions as defined in
coord
.
If coord
contains a column mod
and x
is a
Modifier
object, it will be filtered to identifiers matching the
modType
of x
. To disable this
behaviour remove the column mod
from coord
or set type =
NA
labelByCoord
functions similarly. It will return a
SplitDataFrameList
, which matches the dimensions of the aggregated
data plus the labels
column, which contains logical values to indicate
selected positions.
subsetByCoord(x, coord, ...) labelByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet' subset(x, name, pos = 1L, ...) ## S4 method for signature 'ModifierSet,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'SplitDataFrameList,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceData' subset(x, name, pos = 1L, ...) ## S4 method for signature 'SequenceData,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceData,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet' subset(x, name, pos = 1L, ...) ## S4 method for signature 'SequenceDataSet,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList' subset(x, name, pos = 1L, ...) ## S4 method for signature 'SequenceDataList,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceData,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceData,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList,GRangesList' labelByCoord(x, coord, ...)
subsetByCoord(x, coord, ...) labelByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet' subset(x, name, pos = 1L, ...) ## S4 method for signature 'ModifierSet,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'Modifier,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'ModifierSet,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'SplitDataFrameList,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceData' subset(x, name, pos = 1L, ...) ## S4 method for signature 'SequenceData,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceData,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet' subset(x, name, pos = 1L, ...) ## S4 method for signature 'SequenceDataSet,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList' subset(x, name, pos = 1L, ...) ## S4 method for signature 'SequenceDataList,GRanges' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList,GRangesList' subsetByCoord(x, coord, ...) ## S4 method for signature 'SequenceData,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceData,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataSet,GRangesList' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList,GRanges' labelByCoord(x, coord, ...) ## S4 method for signature 'SequenceDataList,GRangesList' labelByCoord(x, coord, ...)
x |
a |
coord |
coordinates of position to subset to. Either a |
... |
Optional parameters:
|
name |
Optional: Limit results to one specific transcript. |
pos |
Optional: Limit results to a specific position. |
If 'x' is a
SequenceData
or
Modifier
: a SplitDataFrameList
with elments per transcript.
SequenceDataSet
,
SequenceDataList
or
ModifierSet
: a SimpleList
of
SplitDataFrameList
with elments per transcript.
data(msi,package="RNAmodR") mod <- modifications(msi) coord <- unique(unlist(mod)) coord$score <- NULL coord$sd <- NULL subsetByCoord(msi,coord)
data(msi,package="RNAmodR") mod <- modifications(msi) coord <- unique(unlist(mod)) coord$score <- NULL coord$sd <- NULL subsetByCoord(msi,coord)