Title: | Analysis of Transposable Elements |
---|---|
Description: | Quantify expression of transposable elements (TEs) from RNA-seq data through different methods, including ERVmap, TEtranscripts and Telescope. A common interface is provided to use each of these methods, which consists of building a parameter object, calling the quantification function with this object and getting a SummarizedExperiment object as output container of the quantified expression profiles. The implementation allows one to quantify TEs and gene transcripts in an integrated manner. |
Authors: | Beatriz Calvo-Serra [aut], Robert Castelo [aut, cre] |
Maintainer: | Robert Castelo <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.13.0 |
Built: | 2024-10-30 03:41:00 UTC |
Source: | https://github.com/bioc/atena |
The atena package provides a complete re-implementation in R of three existing methods for the quantification of transposable element (TE) expression in order to facilitate its integration into Bioconductor workflows for the analysis of RNA-seq data. The three methods are TEtranscripts (Jin et al. (2015)), ERVmap (Tokuyama et al. (2018)) and Telescope (Bendall et al.(2019)).
The main functions are:
TEtranscriptsParam
- build parameter objects of the class TEtranscriptsParam-class
for the TEtranscripts expression quantification method
ERVmapParam
- build parameter objects of the class ERVmapParam-class
for the ERVmap expression quantification method
TelescopeParam
- build parameter objects of the class TelescopeParam-class
for the Telescope expression quantification method
qtex
- call the TE expression quantification method using a previously built parameter object
For detailed information on usage, see the package vignette, by typing
vignette("atena")
.
All questions and bug reports should be posted to the Bioconductor Support Site:
https://support.bioconductor.org
The code of the development version of the package is available at the GitHub repository:
https://github.com/functionalgenomics/atena
Maintainer: Beatriz Calvo-Serra [email protected]
Authors:
Robert Castelo [email protected]
Jin Y et al. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 2015;31(22):3593-3599. DOI: https://doi.org/10.1093/bioinformatics/btv422
Tokuyama M et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. PNAS. 2018;115(50):12565-12572. DOI: https://doi.org/10.1073/pnas.1814589115
Bendall et al. Telescope: characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLOS Comp. Biol. 2019;15(9):e1006453. DOI: https://doi.org/10.1371/journal.pcbi.1006453
Useful links:
The annotaTEs()
function fetches RepeatMasker UCSC transposable
element (TE) annotations using
AnnotationHub and parses them.
annotaTEs( genome = "hg38", parsefun = rmskidentity, verbose = TRUE, AHid = NULL, ... )
annotaTEs( genome = "hg38", parsefun = rmskidentity, verbose = TRUE, AHid = NULL, ... )
genome |
The genome version of the desired RepeatMasker annotations (e.g. "hg38"). |
parsefun |
A
|
verbose |
(Default |
AHid |
AnnotationHub unique identifier, of the form AH12345, of an
object with TE annotations. This is an optional argument to
specify a concrete AnnotationHub resource, for instance
when more there is more than one RepeatMasker annotation
available for a specific genome version. If |
... |
Arguments passed to |
Given a specific genome version, the annotaTEs()
function fetches
RepeatMasker annotations from UCSC Genome Browser using the
AnnotationHub package. Since RepeatMasker not only
provides TE annotations but also low complexity DNA sequences and other
types of repeats, a specific parsefun
can be set to parse these
annotations (e.g. rmskbasicparser
or a user-defined function). If no
parsing is required, parsefun
can be set to rmskidentity
.
A GRanges object with transposable element annotations.
rmskid <- annotaTEs(genome="hg19", parsefun=rmskidentity) rmskid
rmskid <- annotaTEs(genome="hg19", parsefun=rmskidentity) rmskid
Getter functions of TE classes from parsed RepeatMasker annotations.
getLTRs( annot, relLength = 0.9, fullLength = TRUE, partial = FALSE, soloLTR = FALSE, otherLTR = FALSE, returnMask = FALSE ) getLINEs(annot, relLength = 0.9, returnMask = FALSE) getSINEs(annot, relLength = 0.9, returnMask = FALSE) getDNAtransposons(annot, relLength = 0.9, returnMask = FALSE)
getLTRs( annot, relLength = 0.9, fullLength = TRUE, partial = FALSE, soloLTR = FALSE, otherLTR = FALSE, returnMask = FALSE ) getLINEs(annot, relLength = 0.9, returnMask = FALSE) getSINEs(annot, relLength = 0.9, returnMask = FALSE) getDNAtransposons(annot, relLength = 0.9, returnMask = FALSE)
annot |
A ['GRanges'] or ['GRangesList'] object obtained with the function 'annotaTES()', using either ['OneCodeToFindThemAll'] or ['rmskatenaparser'] as RepeatMasker parser functions. Alternatively, if 'annot' is a ['QuantifyParam'] or a ['SummarizedExperiment'] object produced by the 'qtex()' function, this function will attempt to extract the corresponding annotations from inside those objects. |
relLength |
(Default 0.9) Numeric value that can take values between 0
to 1. Sets the minimum relative length required for
features. Elements with a lower relative length than
|
fullLength |
(Default TRUE) Logical value on whether reconstructed full-length LTR TEs (elements with structure LTR - internal region - LTR) should be selected. |
partial |
(Default FALSE) Logical value on whether partially reconstructed LTR TEs should be selected (structure LTR - internal region or internal region - LTR). |
soloLTR |
(Default FALSE) Logical value on whether solo LTRs should be selected. Note that only fragments unambiguously identified as LTRs thanks to the identification of their equivalent internal region are considered as LTRs. |
otherLTR |
(Default FALSE) Logical value on whether other TEs from the LTR class, not included in any of the previous three categories, should be selected. These include TEs from LTR class that cannot be unambiguously identified as LTR o internal region, and thus cannot be reconstructed into partial or full-length elements; as well as solo internal regions. |
returnMask |
(Default FALSE) Logical value indicating whether a subset
of the input annotations should be returned (default) or
a logical mask of the same length as the input annotations
where |
Retrieves annotations from the TE class corresponding to the getter function,
using RepeatMasker annotations after parsing them with the
OneCodeToFindThemAll()
or rmskatenaparser()
function. The
relLength
parameter can be used to filter out elements with a lower
relative length. Further parameters can be used to fine-tune the type of
elements to be reported.
A GRangesList object with annotations from class corresponding to the getter function (LTRs, LINEs, SINEs or DNA transposons).
rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE) rmskatLTR <- getLTRs(rmskat, relLength=0.95, fullLength=TRUE, partial=TRUE) rmskatLTR rmskat_line <- getLINEs(rmskat, relLength=0.95) rmskat_sine <- getSINEs(rmskat, relLength=0.95) rmskat_DNAtrans <- getDNAtransposons(rmskat, relLength=0.95)
rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE) rmskatLTR <- getLTRs(rmskat, relLength=0.95, fullLength=TRUE, partial=TRUE) rmskatLTR rmskat_line <- getLINEs(rmskat, relLength=0.95) rmskat_sine <- getSINEs(rmskat, relLength=0.95) rmskat_DNAtrans <- getDNAtransposons(rmskat, relLength=0.95)
This is a class for storing parameters to quantify TE (and gene) expression using the atena method. It is a subclass of the 'QuantifyParam-class'.
Build an object of the class atenaParam
.
atenaParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, strandMode = 1L, ignoreStrand = FALSE, fragments = TRUE, pi_prior = 0L, theta_prior = 0L, em_epsilon = 1e-07, maxIter = 100L, reassign_mode = "exclude", conf_prob = 0.9, verbose = TRUE ) ## S4 method for signature 'atenaParam' show(object)
atenaParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, strandMode = 1L, ignoreStrand = FALSE, fragments = TRUE, pi_prior = 0L, theta_prior = 0L, em_epsilon = 1e-07, maxIter = 100L, reassign_mode = "exclude", conf_prob = 0.9, verbose = TRUE ) ## S4 method for signature 'atenaParam' show(object)
bfl |
A |
teFeatures |
A |
aggregateby |
Character vector with column names from the annotation
to be used to aggregate quantifications. By default, this is an empty
vector, which means that the names of the input |
ovMode |
Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). Ambiguous alignments (alignments overlapping > 1 feature) are not counted. |
geneFeatures |
(Default NULL) A |
singleEnd |
(Default TRUE) Logical value indicating if reads are single
( |
strandMode |
(Default 1) Numeric vector which can take values 0, 1 or
2.
The strand mode is a per-object switch on
|
ignoreStrand |
(Default FALSE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When |
fragments |
(Default TRUE) A logical; applied to paired-end data only.
When |
pi_prior |
(Default 0) A positive numeric object indicating the prior
on pi. The same prior can be specified for all features setting
|
theta_prior |
(Default 0) A positive numeric object indicating the
prior on Q. The same prior can be specified for all features setting
|
em_epsilon |
(Default 1e-7) A numeric scalar indicating the EM Algorithm Epsilon cutoff. |
maxIter |
A positive integer scalar storing the maximum number of
iterations of the EM SQUAREM algorithm (Du and Varadhan, 2020). Default
is 100 and this value is passed to the |
reassign_mode |
(Default 'exclude') Character vector indicating
reassignment mode after EM step.
Available methods are 'exclude' (reads with more than one best
assignment are excluded from the final counts), 'choose' (when reads have
more than one best assignment, one of them is randomly chosen), 'average'
(the read count is divided evenly among the best assignments) and 'conf'
(only assignments that exceed a certain threshold -defined by
|
conf_prob |
(Default 0.9) Minimum probability for high confidence assignment. |
verbose |
(Default |
object |
A atenaParam object. |
This is the constructor function for objects of the class
atenaParam-class
. This type of object is the input to the
function qtex()
for quantifying expression of transposable
elements, which will call the atena method with this type of object. The
atena method uses a multiple '__no_feature' approach in which as many
'__no_feature' features as different overlapping patterns of multimapping
reads in the overlapping matrix are used to represent alignments mapping
outside annotations.
A atenaParam object.
singleEnd
(Default TRUE) Logical value indicating if reads are single
(TRUE
) or paired-end (FALSE
).
strandMode
(Default 1) Numeric vector which can take values 0, 1 or 2.
The strand mode is a per-object switch on
GAlignmentPairs
objects that controls the behavior of the strand getter. See
GAlignmentPairs
class for further detail. If singleEnd = TRUE
, then strandMode
is ignored.
ignoreStrand
(Default FALSE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When ignoreStrand = FALSE
, an aligned read
is considered to be overlapping an annotated feature as long as they
have a non-empty intersecting genomic range on the same strand, while when
ignoreStrand = TRUE
the strand is not considered.
fragments
(Default TRUE) A logical; applied to paired-end data only.
When fragments=FALSE
, the read-counting method only counts
‘mated pairs’ from opposite strands (non-ambiguous properly paired reads),
while when fragments=TRUE
same-strand pairs, singletons, reads with
unmapped pairs and other ambiguous or not properly paired fragments
are also counted (see "Pairing criteria" in
readGAlignments()
).
For further details see
summarizeOverlaps()
.
pi_prior
(Default 0) A positive numeric object indicating the prior
on pi. The same prior can be specified for all features setting
pi_prior
as a scalar, or each feature can have a specific prior by
setting pi_prior
as a vector with names()
corresponding to
all feature names. Setting a pi prior is equivalent to adding n unique
reads.
theta_prior
(Default 0) A positive numeric object indicating the
prior on Q. The same prior can be specified for all features setting
theta_prior
as a scalar, or each feature can have a specific prior by
setting theta_prior
as a vector with names()
corresponding to
all feature names. Equivalent to adding n non-unique reads.
em_epsilon
(Default 1e-7) A numeric scalar indicating the EM Algorithm Epsilon cutoff.
maxIter
A positive integer scalar storing the maximum number of
iterations of the EM SQUAREM algorithm (Du and Varadhan, 2020). Default
is 100 and this value is passed to the maxiter
parameter of the
squarem()
function.
reassign_mode
(Default 'exclude') Character vector indicating
reassignment mode after EM step.
Available methods are 'exclude' (reads with more than one best
assignment are excluded from the final counts), 'choose' (when reads have
more than one best assignment, one of them is randomly chosen), 'average'
(the read count is divided evenly among the best assignments) and 'conf'
(only assignments that exceed a certain threshold -defined by
conf_prob
parameter- are accepted, then the read count is
proportionally divided among the assignments above conf_prob
).
conf_prob
(Default 0.9) Minimum probability for high confidence assignment.
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for the atena method atpar <- atenaParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) atpar
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for the atena method atpar <- atenaParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) atpar
This is a class for storing parameters provided to the ERVmap algorithm. It is a subclass of the 'QuantifyParam-class'.
Build an object of the class ERVmapParam
ERVmapParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, ignoreStrand = TRUE, strandMode = 1L, fragments = !singleEnd, maxMismatchRate = 0.02, suboptimalAlignmentTag = "auto", suboptimalAlignmentCutoff = 5, geneCountMode = "all", verbose = TRUE ) ## S4 method for signature 'ERVmapParam' show(object)
ERVmapParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, ignoreStrand = TRUE, strandMode = 1L, fragments = !singleEnd, maxMismatchRate = 0.02, suboptimalAlignmentTag = "auto", suboptimalAlignmentCutoff = 5, geneCountMode = "all", verbose = TRUE ) ## S4 method for signature 'ERVmapParam' show(object)
bfl |
A |
teFeatures |
A |
aggregateby |
Character vector with column names in the annotation
to be used to aggregate quantifications. By default, this is an empty
vector, which means that the names of the input |
ovMode |
Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). Ambiguous alignments (alignments overlapping > 1 feature) are addressed as in the original ERVmap algorithm. |
geneFeatures |
(Default NULL) A |
singleEnd |
(Default TRUE) Logical value indicating if reads are single
( |
ignoreStrand |
(Default TRUE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When |
strandMode |
(Default 1) Numeric vector which can take values 0, 1 or
2.
The strand mode is a per-object switch on
|
fragments |
(Default not |
maxMismatchRate |
(Default 0.02) Numeric value storing the maximum mismatch rate employed by the ERVmap algorithm to discard aligned reads whose rate of sum of hard and soft clipping or whose rate of the edit distance over the genome reference to the length of the read is above this threshold. |
suboptimalAlignmentTag |
(Default "auto") Character string storing the
tag name in the BAM files that stores the suboptimal alignment score used in
the third filter of ERVmap; see
Tokuyama et al. (2018).
The default, |
suboptimalAlignmentCutoff |
(Default 5) Numeric value storing the
cutoff above which the difference between the alignment score and the
suboptimal alignment score is considered sufficiently large to retain the
alignment. When this value is set to |
geneCountMode |
(Default |
verbose |
(Default |
object |
A ERVmapParam object. |
This is the constructor function for objects of the class
ERVmapParam-class
. This type of object is the input to the
function qtex()
for quantifying expression of transposable
elements using the ERVmap method
Tokuyama et al. (2018). The
ERVmap algorithm processes reads following conservative filtering criteria
to provide reliable raw count data for each TE.
A ERVmapParam object.
readMapper
The name of the software used to align reads, obtained from the BAM file header.
singleEnd
(Default FALSE) Logical value indicating if reads are single
(TRUE
) or paired-end (FALSE
).
strandMode
(Default 1) Numeric vector which can take values 0, 1 or 2.
The strand mode is a per-object switch on
GAlignmentPairs
objects that controls the behavior of the strand getter. See
GAlignmentPairs
class for further detail. If singleEnd = TRUE
, then
strandMode
#' is ignored.
ignoreStrand
(Default TRUE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and TEs in the annotations. When ignore_strand = FALSE
, only those
reads which overlap the TE and are on the same strand are counted. On the
contrary, when ignore_strand = TRUE
, any read overlapping an element
in teFeatures
is counted regardless of the strand.
fragments
(Default not singleEnd
) A logical; applied to
paired-end data only. When fragments=TRUE
, the read-counting
method in the original ERVmap algorithm is applied: each mate of a
paired-end read is counted (including ambiguous and not properly paired
reads). When
fragments=FALSE
, if the two mates of a paired-end read map to the
same element, they are counted as a single hit and singletons, reads with
unmapped pairs and other ambiguous or not properly paired fragments are
not counted (see "Pairing criteria" in
readGAlignments()
).
maxMismatchRate
(Default 0.02) Numeric value storing the maximum mismatch rate employed by the ERVmap algorithm to discard aligned reads whose rate of sum of hard and soft clipping, or of the edit distance over the genome reference, to the length of the read is above this threshold.
suboptimalAlignmentTag
(Default "auto") Character string storing the
tag name in the BAM files that stores the suboptimal alignment score used in
the third filter of ERVmap; see Tokuyama et al. (2018). The default,
suboptimalAlignmentTag="auto"
, assumes that either the BAM files were
generated by BWA and include a tag called XS
that stores the
suboptimal alignment score or, if the XS
tag is not available, then
it uses the available secondary alignments to implement an analogous
approach to that of the third ERVmap filter. When
suboptimalAlignmentTag="none"
, it also performs the latter approach
even when the tag XS
is available.
When this parameter is different from "auto"
and "none"
, a tag
with the given name is used to extract the suboptimal alignment score.
The absence of that tag will prompt an error.
suboptimalAlignmentCutoff
(Default 5) Numeric value storing the cutoff
above which the difference between the alignment score and the suboptimal
alignment score is considered sufficiently large to retain the alignment.
When this value is set to NA
, then the filtering step based on
suboptimal alignment scores is skipped.
geneCountMode
(Default "all") Character string indicating if the ERVmap read filters applied to quantify TEs expression should also be applied when quantifying gene expression ("ervmap") or not ("all"), in which case all primary alignments mapping to genes are counted.
Tokuyama M et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. PNAS. 2018;115(50):12565-12572. DOI: https://doi.org/10.1073/pnas.1814589115
Tokuyama M et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. PNAS. 2018;115(50):12565-12572. DOI: https://doi.org/10.1073/pnas.1814589115
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for ERVmap empar <- ERVmapParam(bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE, suboptimalAlignmentCutoff=NA) empar
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for ERVmap empar <- ERVmapParam(bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE, suboptimalAlignmentCutoff=NA) empar
OneCodeToFindThemAll parser of RepeatMasker annotations
OneCodeToFindThemAll( gr, dictionary = NULL, fuzzy = FALSE, strict = FALSE, insert = -1, BPPARAM = SerialParam(progressbar = TRUE) )
OneCodeToFindThemAll( gr, dictionary = NULL, fuzzy = FALSE, strict = FALSE, insert = -1, BPPARAM = SerialParam(progressbar = TRUE) )
gr |
A GRanges object with RepeatMasker annotations from AnnotationHub |
dictionary |
(Default NULL) When NULL, a dictionary is built based on names of repeats. If not, a data.frame with equivalences LTR - internal regions created by the user, where first column should be the name of the internal region and the second column should be the LTR(s). When more than one LTR, these should be separated by ":". |
fuzzy |
(Default FALSE) A logical; if TRUE, the search for equivalences between internal parts and LTRs to reconstruct LTR class transposable elements is less stringent, allowing more matches between corresponding subparts. This option can increase the proportion of false positives (incorrectly reconstructed LTR class TEs). |
strict |
(Default FALSE) A logical; if TRUE, the 80-80 rule is applied, i.e. only copies with more than 80 and more than 80 bp long are reported. |
insert |
(Default -1) An integer. When |
BPPARAM |
See |
Implementation of One code to find them all
(Bailly-Bechet et al. 2014).
Parses RepeatMasker annotations from UCSC by assembling together fragments
from the same transposable elemenet (TE) that are close enough (determined
by the insert
parameter). For TEs from the LTR class, the parser
tries to reconstruct full-length, when possible, or partial TEs following
the LTR - internal region - LTR structure. Equivalences between internal
regions and flanking LTRs can be set by the user with the dictionary
parameter or can be obtained by the parser. In this last case, the
fuzzy
parameter determines the level of stringency when searching
for LTR - internal region equivalences.
A GRangesList object.
Bailly-Bechet et al. "One code to find them all": a perl tool to conveniently parse RepeatMasker output files. Mobile DNA. 2014;5(1):1-15. DOI: https://doi.org/10.1186/1759-8753-5-13
## Not run: rmskoc <- annotaTEs(genome="dm6", parsefun=OneCodeToFindThemAll, fuzzy=FALSE, strict=FALSE) ## End(Not run)
## Not run: rmskoc <- annotaTEs(genome="dm6", parsefun=OneCodeToFindThemAll, fuzzy=FALSE, strict=FALSE) ## End(Not run)
The following functions control the way in which overlaps between aligned reads and annotated features are resolved when an aligned read overlaps more than one feature on the same locus:
ovUnion(reads, features, ignoreStrand, inter.feature = TRUE) ovIntersectionStrict(reads, features, ignoreStrand, inter.feature = TRUE)
ovUnion(reads, features, ignoreStrand, inter.feature = TRUE) ovIntersectionStrict(reads, features, ignoreStrand, inter.feature = TRUE)
reads |
A |
features |
A |
ignoreStrand |
(Default FALSE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When |
inter.feature |
When TRUE, ambiguous alignments (alignments
overlapping > 1 features) are removed and not counted. When
|
ovUnion()
: (default)
ovIntersectionStrict()
:
User supplied: a function taking the same parameters as the
previous three functions and returning a
Hits
object.
They take the following parameters:
These functions are given to the mode
parameter of the
qtex()
function and are similar to the functions
Union()
and
IntersectionStrict()
from the
GenomicAlignments
package, with the difference that instead
of returning counts of reads overlapping annotated features, they
return the actual overlaps, because the counting is deferred to other
algorithms that follow some specific strategy when a read maps to
more than one feature. For this same reason, these functions lack
the inter.feature
argument found in the corresponding functions
from the GenomicAlignments
package.
A Hits
object; see the Hits-class
manual page.
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for Telescope tspar <- TelescopeParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) ## quantify expression using the 'ovIntersectionStrict()' mode function tsquant <- qtex(tspar, mode=ovIntersectionStrict)
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for Telescope tspar <- TelescopeParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) ## quantify expression using the 'ovIntersectionStrict()' mode function tsquant <- qtex(tspar, mode=ovIntersectionStrict)
The qtex()
method quantifies transposable element expression.
## S4 method for signature 'ERVmapParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, verbose = 1, BPPARAM = SerialParam(progressbar = ifelse(verbose == 1, TRUE, FALSE)) ) ## S4 method for signature 'TEtranscriptsParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, BPPARAM = SerialParam(progressbar = TRUE) ) ## S4 method for signature 'TelescopeParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, auxiliaryFeatures = FALSE, BPPARAM = SerialParam(progressbar = TRUE) ) ## S4 method for signature 'atenaParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, auxiliaryFeatures = FALSE, BPPARAM = SerialParam(progressbar = TRUE) )
## S4 method for signature 'ERVmapParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, verbose = 1, BPPARAM = SerialParam(progressbar = ifelse(verbose == 1, TRUE, FALSE)) ) ## S4 method for signature 'TEtranscriptsParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, BPPARAM = SerialParam(progressbar = TRUE) ) ## S4 method for signature 'TelescopeParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, auxiliaryFeatures = FALSE, BPPARAM = SerialParam(progressbar = TRUE) ) ## S4 method for signature 'atenaParam' qtex( x, phenodata = NULL, mode = ovUnion, yieldSize = 1000000L, auxiliaryFeatures = FALSE, BPPARAM = SerialParam(progressbar = TRUE) )
x |
An
|
phenodata |
A |
mode |
One of the pre-defined overlapping methods such as
|
yieldSize |
Field inherited from |
verbose |
(Default 1). When |
BPPARAM |
An object of a BiocParallelParam subclass
to configure the parallel execution of the code. By default,
a SerialParam object is used, which does not use
any parallelization, with the flag |
auxiliaryFeatures |
(Default |
Giving some AtenaParam
object sub-class as input, the
qtex()
method quantifies the expression of transposable
elements (TEs). The particular algorithm to perform the
quantification will be selected depending on the specific
sub-class of input AtenaParam
object, see argument
x
above.
A SummarizedExperiment object.
Jin Y et al. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 2015;31(22):3593-3599. DOI: https://doi.org/10.1093/bioinformatics/btv422
Tokuyama M et al. ERVmap analysis reveals genome-wide transcription of human endogenous retroviruses. PNAS, 115(50):12565-12572, 2018. https://doi.org/10.1073/pnas.1814589115
Bendall ML et al. Telescope: characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLOS Computational Biology, 15:e1006453, 2019. https://doi.org/10.1371/journal.pcbi.1006453
TEtranscriptsParam
ERVmapParam
TelescopeParam
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for Telescope tspar <- TelescopeParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) ## quantify expression qts <- qtex(tspar)
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for Telescope tspar <- TelescopeParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) ## quantify expression qts <- qtex(tspar)
This is a virtual class from which other classes are derived for storing parameters provided to quantification methods of transposable elements from RNA-seq data.
## S4 method for signature 'QuantifyParam' path(object) ## S4 method for signature 'QuantifyParam' features(x)
## S4 method for signature 'QuantifyParam' path(object) ## S4 method for signature 'QuantifyParam' features(x)
object |
A QuantifyParam object. |
x |
A QuantifyParam object. |
path()
: Filesystem paths to the BAM files in the input
parameter object.
features()
: The GenomicRanges
or
GenomicRangesList
object with the features in the input parameter
object.
bfl
A BamFileList object.
features
A GRanges object.
aggregateby
Character vector with column names in the annotation to be used to aggregate quantifications.
ovMode
Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). In the TEtranscripts, ERVmap and Telescope methods ambiguous alignments (alignments overlapping > 1 feature) are addressed differently depending on the method. In the atena method, those overlaps are not counted.
ERVmapParam-class
TelescopeParam-class
TEtranscriptsParam-class
atenaParam-class
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for TEtranscripts ttpar <- TEtranscriptsParam(bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) ## just check that the parameter object belongs to the expected classes is(ttpar, "QuantifyParam") is(ttpar, "TEtranscriptsParam")
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for TEtranscripts ttpar <- TEtranscriptsParam(bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) ## just check that the parameter object belongs to the expected classes is(ttpar, "QuantifyParam") is(ttpar, "TEtranscriptsParam")
atena annotation parser of RepeatMasker annotations
rmskatenaparser(gr, strict = FALSE, insert = 1000)
rmskatenaparser(gr, strict = FALSE, insert = 1000)
gr |
A GRanges object with RepeatMasker annotations from AnnotationHub |
strict |
(Default FALSE) A logical; if TRUE, the 80-80 rule is applied, i.e. only copies with more than 80 and more than 80 bp long are reported. |
insert |
(Default 1000L) An integer > 0. Fragments are assembled
together if the distance between their closest extremities
is equal or less than |
atena annotation parser of RepeatMasker annotations.
Parses RepeatMasker annotations from UCSC by assembling together fragments
from the same transposable element (TE) that are close enough (determined
by the insert
parameter). For TEs from the LTR class, the parser
tries to reconstruct full-length, when possible, or partial TEs following
the LTR - internal region - LTR structure. Equivalences between LTR and
internal regions are found by, first, identifying LTR regions (those with
the "LTR" substring in their name) and internal regions (those with a
suffix such as "-int", "-I", etc.). Then, LTR are assigned to internal
regions for which the comparison of the two names are has a higher number
of equal consecutive characters.
A GRangesList object.
rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE) rmskat
rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE) rmskat
Parser of RepeatMasker annotations
rmskbasicparser(gr)
rmskbasicparser(gr)
gr |
A GRanges object with RepeatMasker annotations from AnnotationHub |
Parses annotations by removing low complexity regions, simple repeats, satellites, rRNA, scRNA, snRNA, srpRNA and tRNA. Also removes TEs with a strand different than "+" or "-". Modifies "repFamily" and "repClass" columns when a "?" is present or when they are defined as "Unknown" or "Other". Finally, assigns a unique id to each TE instance by adding the suffix "_dup" plus a number at the end of the "repName".
A GRanges object.
rmskba <- annotaTEs(genome="dm6", parsefun=rmskbasicparser) rmskba
rmskba <- annotaTEs(genome="dm6", parsefun=rmskbasicparser) rmskba
Identity function for parsefun
rmskidentity(gr)
rmskidentity(gr)
gr |
A GRanges object. |
Identity function: returns the GRanges object without any modification.
A GRanges object.
rmskid <- annotaTEs(genome="dm6", parsefun=rmskidentity) rmskid
rmskid <- annotaTEs(genome="dm6", parsefun=rmskidentity) rmskid
This is a class for storing parameters provided to the Telescope algorithm.
Build an object of the class TelescopeParam
.
TelescopeParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, strandMode = 1L, ignoreStrand = FALSE, fragments = FALSE, minOverlFract = 0.2, pi_prior = 0L, theta_prior = 0L, em_epsilon = 1e-07, maxIter = 100L, reassign_mode = "exclude", conf_prob = 0.9, verbose = TRUE ) ## S4 method for signature 'TelescopeParam' show(object)
TelescopeParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, strandMode = 1L, ignoreStrand = FALSE, fragments = FALSE, minOverlFract = 0.2, pi_prior = 0L, theta_prior = 0L, em_epsilon = 1e-07, maxIter = 100L, reassign_mode = "exclude", conf_prob = 0.9, verbose = TRUE ) ## S4 method for signature 'TelescopeParam' show(object)
bfl |
A |
teFeatures |
A |
aggregateby |
Character vector with column names from the annotation
to be used to aggregate quantifications. By default, this is an empty
vector, which means that the names of the input |
ovMode |
Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). Ambiguous alignments (alignments overlapping > 1 feature) are addressed as in the original Telescope method: the overlap with the longest overlapping length is kept. |
geneFeatures |
(Default NULL) A |
singleEnd |
(Default TRUE) Logical value indicating if reads are single
( |
strandMode |
(Default 1) Numeric vector which can take values 0, 1 or
2.
The strand mode is a per-object switch on
|
ignoreStrand |
(Default FALSE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When |
fragments |
(Default FALSE) A logical; applied to paired-end data only.
When |
minOverlFract |
(Default 0.2) A numeric scalar. |
pi_prior |
(Default 0) A positive integer scalar indicating the prior on pi. This is equivalent to adding n unique reads. |
theta_prior |
(Default 0) A positive integer scalar storing the prior on Q. Equivalent to adding n non-unique reads. |
em_epsilon |
(Default 1e-7) A numeric scalar indicating the EM Algorithm Epsilon cutoff. |
maxIter |
A positive integer scalar storing the maximum number of
iterations of the EM SQUAREM algorithm (Du and Varadhan, 2020). Default
is 100 and this value is passed to the |
reassign_mode |
(Default 'exclude') Character vector indicating
reassignment mode after EM step.
Available methods are 'exclude' (reads with more than one best
assignment are excluded from the final counts), 'choose' (when reads have
more than one best assignment, one of them is randomly chosen), 'average'
(the read count is divided evenly among the best assignments) and 'conf'
(only assignments that exceed a certain threshold -defined by
|
conf_prob |
(Default 0.9) Minimum probability for high confidence assignment. |
verbose |
(Default |
object |
A TelescopeParam object. |
This is the constructor function for objects of the class
TelescopeParam-class
. This type of object is the input to the
function qtex()
for quantifying expression of transposable
elements, which will call the Telescope algorithm
Bendall et al. (2019)
with this type of object.
A TelescopeParam object.
singleEnd
(Default TRUE) Logical value indicating if reads are single
(TRUE
) or paired-end (FALSE
).
strandMode
(Default 1) Numeric vector which can take values 0, 1 or 2.
The strand mode is a per-object switch on
GAlignmentPairs
objects that controls the behavior of the strand getter. See
GAlignmentPairs
class for further detail. If singleEnd = TRUE
, then strandMode
is ignored.
ignoreStrand
(Default FALSE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When ignoreStrand = FALSE
, an aligned read
is considered to be overlapping an annotated feature as long as they
have a non-empty intersecting genomic range on the same strand, while when
ignoreStrand = TRUE
the strand is not considered.
fragments
(Default FALSE) A logical; applied to paired-end data only.
When fragments=FALSE
, the read-counting method only counts
‘mated pairs’ from opposite strands (non-ambiguous properly paired reads),
while when fragments=TRUE
same-strand pairs, singletons, reads with
unmapped pairs and other ambiguous or not properly paired fragments
are also counted (see "Pairing criteria" in
readGAlignments()
). fragments=TRUE
is equivalent to the original Telescope algorithm. For further details see
summarizeOverlaps()
.
minOverlFract
(Default 0.2) A numeric scalar. minOverlFract
is multiplied by the read length and the resulting value is used to
discard alignments for which the overlapping length (number of base
pairs the alignment and the feature overlap) is lower. When no minimum
overlap is required, set minOverlFract = 0
.
pi_prior
(Default 0) A positive integer scalar indicating the prior on pi. This is equivalent to adding n unique reads.
theta_prior
(Default 0) A positive integer scalar storing the prior on Q. Equivalent to adding n non-unique reads.
em_epsilon
(Default 1e-7) A numeric scalar indicating the EM Algorithm Epsilon cutoff.
maxIter
A positive integer scalar storing the maximum number of
iterations of the EM SQUAREM algorithm (Du and Varadhan, 2020). Default
is 100 and this value is passed to the maxiter
parameter of the
squarem()
function.
reassign_mode
(Default 'exclude') Character vector indicating
reassignment mode after EM step.
Available methods are 'exclude' (reads with more than one best
assignment are excluded from the final counts), 'choose' (when reads have
more than one best assignment, one of them is randomly chosen), 'average'
(the read count is divided evenly among the best assignments) and 'conf'
(only assignments that exceed a certain threshold -defined by
conf_prob
parameter- are accepted, then the read count is
proportionally divided among the assignments above conf_prob
).
conf_prob
(Default 0.9) Minimum probability for high confidence assignment.
Bendall et al. Telescope: characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLOS Comp. Biol. 2019;15(9):e1006453. DOI: https://doi.org/10.1371/journal.pcbi.1006453
Bendall et al. Telescope: characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLOS Comp. Biol. 2019;15(9):e1006453. DOI: https://doi.org/10.1371/journal.pcbi.1006453
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for Telescope tspar <- TelescopeParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) tspar
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) ## build a parameter object for Telescope tspar <- TelescopeParam(bfl=bamfiles, teFeatures=rmskLTR, singleEnd=TRUE, ignoreStrand=TRUE) tspar
This is a class for storing parameters provided to the TEtranscripts algorithm. It is a subclass of the 'QuantifyParam-class'.
Build an object of the class TEtranscriptsParam
TEtranscriptsParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, ignoreStrand = FALSE, strandMode = 1L, fragments = TRUE, tolerance = 1e-04, maxIter = 100L, verbose = TRUE ) ## S4 method for signature 'TEtranscriptsParam' show(object)
TEtranscriptsParam( bfl, teFeatures, aggregateby = character(0), ovMode = "ovUnion", geneFeatures = NULL, singleEnd = TRUE, ignoreStrand = FALSE, strandMode = 1L, fragments = TRUE, tolerance = 1e-04, maxIter = 100L, verbose = TRUE ) ## S4 method for signature 'TEtranscriptsParam' show(object)
bfl |
a character string vector of BAM file names. |
teFeatures |
A |
aggregateby |
Character vector with column names from the annotation
to be used to aggregate quantifications. By default, this is an empty
vector, which means that the names of the input |
ovMode |
Character vector indicating the overlapping mode. Available options are: "ovUnion" (default) and "ovIntersectionStrict", which implement the corresponding methods from HTSeq (https://htseq.readthedocs.io/en/release_0.11.1/count.html). Ambiguous alignments (alignments overlapping > 1 feature) are addressed as in the original TEtranscripts method. |
geneFeatures |
(Default NULL) A |
singleEnd |
(Default TRUE) Logical value indicating if reads are single
( |
ignoreStrand |
(Default FALSE) Logical value that defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When |
strandMode |
(Default 1) Numeric vector which can take values 0, 1 or
2.
The strand mode is a per-object switch on
|
fragments |
(Default TRUE) Logical value applied to paired-end data
only. In both cases ( |
tolerance |
A positive numeric scalar storing the minimum tolerance
above which the SQUAREM algorithm (Du and Varadhan, 2020) keeps iterating.
Default is |
maxIter |
A positive integer scalar storing the maximum number of
iterations of the SQUAREM algorithm (Du and Varadhan, 2020). Default
is 100 and this value is passed to the |
verbose |
(Default |
object |
A TEtranscriptsParam object. |
This is the constructor function for objects of the class
TEtranscriptsParam-class
. This type of object is the input to the
function qtex()
for quantifying expression of transposable
elements using the TEtranscripts method
Jin et al. (2015). The
TEtranscripts algorithm quantifies TE expression by using an EM algorithm
to optimally distribute ambiguously mapped reads.
A TEtranscriptsParam object.
singleEnd
(Default FALSE) Logical value indicating if reads are single
(TRUE
) or paired-end (FALSE
).
ignoreStrand
(Default FALSE) A logical which defines if the strand
should be taken into consideration when computing the overlap between reads
and annotated features. When ignoreStrand = FALSE
, an aligned read
will be considered to be overlapping an annotated feature as long as they
have a non-empty intersecting genomic ranges on the same strand, while when
ignoreStrand = TRUE
the strand will not be considered.
strandMode
(Default 1) Numeric vector which can take values 0, 1 or 2.
The strand mode is a per-object switch on
GAlignmentPairs
objects that controls the behavior of the strand getter. See
GAlignmentPairs
class for further detail. If singleEnd = TRUE
, then use either
strandMode = NULL
or do not specify the strandMode
parameter.
fragments
(Default TRUE) A logical; applied to paired-end data only.
In both cases (fragments=FALSE
and fragments=TRUE
), the
read-counting method discards not properly paired reads. Moreover,
when fragments=FALSE
, only non-ambiguous properly paired reads are
counted. When fragments=TRUE
, ambiguous reads are also counted
(see "Pairing criteria" in readGAlignments()
).
fragments=TRUE
is equivalent to
the behavior of the TEtranscripts algorithm. For further details see
summarizeOverlaps()
.
tolerance
A positive numeric scalar storing the minimum tolerance
above which the SQUAREM algorithm (Du and Varadhan, 2020) keeps iterating.
Default is 1e-4
and this value is passed to the tol
parameter
of the squarem()
function.
maxIter
A positive integer scalar storing the maximum number of
iterations of the SQUAREM algorithm (Du and Varadhan, 2020). Default
is 100 and this value is passed to the maxiter
parameter of the
squarem()
function.
Jin Y et al. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 2015;31(22):3593-3599. DOI: https://doi.org/10.1093/bioinformatics/btv422
Jin Y et al. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 2015;31(22):3593-3599. DOI: https://doi.org/10.1093/bioinformatics/btv422
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) library(TxDb.Dmelanogaster.UCSC.dm6.ensGene) txdb <- TxDb.Dmelanogaster.UCSC.dm6.ensGene txdb_genes <- genes(txdb) ## build a parameter object for TEtranscripts ttpar <- TEtranscriptsParam(bamfiles, teFeatures=rmskLTR, geneFeatures=txdb_genes, singleEnd=TRUE, ignoreStrand=TRUE, aggregateby="repName") ttpar
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) ## Not run: ## use the following two instructions to fetch annotations, they are here ## commented out to enable running this example quickly when building and ## checking the package rmskat <- annotaTEs(genome="dm6", parsefun=rmskatenaparser, strict=FALSE, insert=500) rmskLTR <- getLTRs(rmskat, relLength=0.8, fullLength=TRUE, partial=TRUE, otherLTR=TRUE) ## End(Not run) ## DO NOT TYPE THIS INSTRUCTION, WHICH JUST LOADS A PRE-COMPUTED ANNOTATION ## YOU SHOULD USE THE INSTRUCTIONS ABOVE TO FETCH ANNOTATIONS rmskLTR <- readRDS(system.file("extdata", "rmskatLTRrlen80flenpartoth.rds", package="atena")) library(TxDb.Dmelanogaster.UCSC.dm6.ensGene) txdb <- TxDb.Dmelanogaster.UCSC.dm6.ensGene txdb_genes <- genes(txdb) ## build a parameter object for TEtranscripts ttpar <- TEtranscriptsParam(bamfiles, teFeatures=rmskLTR, geneFeatures=txdb_genes, singleEnd=TRUE, ignoreStrand=TRUE, aggregateby="repName") ttpar