---
title: "An introduction to the atena package"
author:
- name: Beatriz Calvo-Serra
  affiliation:
  - &id Dept. of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
  email: beatriz.calvo@upf.edu
- name: Robert Castelo
  affiliation: *id
  email: robert.castelo@upf.edu
package: "`r pkg_ver('atena')`"
abstract: >
  The `atena` package provides methods to quantify the expression of transposable elements within R and Bioconductor.
vignette: >
  %\VignetteIndexEntry{An introduction to the atena package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    number_sections: true
bibliography: bibliography.bib
---

```{r setup, echo=FALSE}
library(knitr)

options(width=80)

knitr::opts_chunk$set(
  collapse=TRUE,
  comment="")
```

# What are transposable elements

Transposable elements (TEs) are autonomous mobile genetic elements. They are
DNA sequences that have, or once had, the ability to mobilize within the genome
either directly or through an RNA intermediate [@payer2019transposable]. TEs
can be categorized into two classes based on the intermediate substrate
propagating insertions (RNA or DNA). Class I TEs, also called retrotransposons,
first transcribe an RNA copy that is then reverse transcribed to cDNA before
inserting in the genome. In turn, these can be divided into long terminal repeat
(LTR) retrotransposons, which refer to endogenous retroviruses (ERVs), and
non-LTR retrotransposons, which include long interspersed element class 1
(LINE-1 or L1) and short interspersed elements (SINEs). Class II TEs, also known
as DNA transposons, directly excise themselves from one location before
reinsertion. TEs are further split into families and subfamilies depending on
various structural features [@goerner2018computational; @guffanti2018novel].

Most TEs have lost the capacity for generating new insertions over their
evolutionary history and are now fixed in the human population. Their insertions
have resulted in a complex distribution of interspersed repeats comprising
almost half (50%) of the human genome [@payer2019transposable].

TE expression has been observed in association with physiological processes in
a wide range of species, including humans where it has been described to be
important in early embryonic pluripotency and development. Moreover, aberrant TE
expression has been associated with diseases such as cancer, neurodegenerative
disorders, and infertility [@payer2019transposable].

# Currently available methods for quantifying TE expression

The study of TE expression faces one main challenge: given their repetitive
nature, the majority of TE-derived reads map to multiple regions of the genome
and these multi-mapping reads are consequently discarded in standard RNA-seq
data processing pipelines. For this reason, specific software packages for the
quantification of TE expression have been developed [@goerner2018computational],
such as TEtranscripts [@jin2015tetranscripts], ERVmap [@tokuyama2018ervmap] and
Telescope [@bendall2019telescope]. The main differences between these three
methods are the following: 

* [TEtranscripts](https://github.com/mhammell-laboratory/TEtranscripts)
  [@jin2015tetranscripts] reassigns multi-mapping reads to TEs proportionally
  to their relative abundance, which is estimated using an
  expectation-maximization (EM) algorithm.

* [ERVmap](https://github.com/mtokuyama/ERVmap) [@tokuyama2018ervmap] is based
  on selective filtering of multi-mapping reads. It applies filters that consist
  in discarding reads when the ratio of sum of hard and soft clipping to the
  length of the read (base pair) is greater than or equal to 0.02, the ratio of
  the edit distance to the sequence read length (base pair) is greater or equal
  to 0.02 and/or the difference between the alignment score from BWA (field AS)
  and the suboptimal alignment score from BWA (field XS) is less than 5.

* [Telescope](https://github.com/mlbendall/telescope) [@bendall2019telescope]
  reassigns multi-mapping reads to TEs using their relative abundance, which
  like in TEtranscripts, is also estimated using an EM algorithm. The main
  differences with respect to TEtranscripts are: (1) Telescope works with an
  additional parameter for each TE that estimates the proportion of
  multi-mapping reads that need to be reassigned to that TE; (2) that
  reassignment parameter is optimized during the EM algorithm jointly with the
  TE relative abundances, using a Bayesian maximum a posteriori (MAP) estimate
  that allows one to use prior values on these two parameters; and (3) using
  the final estimates on these two parameters, multi-mapping reads can be
  flexibly reassigned to TEs using different strategies, where the default one
  is to assign a multi-mapping read to the TE with largest estimated abundance
  and discard those multi-mapping reads with ties on those largest abundances.

Because these tools were only available outside R and Bioconductor, the `atena`
package provides a complete re-implementation in R of these three methods to
facilitate the integration of TE expression quantification into Bioconductor
workflows for the analysis of RNA-seq data.

# TEs annotations

Another challenge in TE expression quantification is the lack of complete TE
annotations due to the difficulty to correctly place TEs in genome assemblies
[@goerner2018computational]. One of the main sources of TE annotations are
RepeatMasker annotations, available for instance at the RepeatMasker track of
the UCSC Genome Browser. `atena` can fetch RepeatMasker annotations with the
function `annotaTEs()` and flexibly parse them by using a parsing function
provided through the parameter `parsefun`. Examples of `parsefun` included in
`atena` are:

* `rmskidentity()`: returns RepeatMasker annotations without any modification.
* `rmskbasicparser()`: filters out non-TE repeats and elements without strand
  information from RepeatMasker annotations. Then assigns a unique id to each 
  elements based on their repeat name.
* `OneCodeToFindThemAll()`: implementation of the "One Code To Find Them All"
  algorithm by @bailly2014one, for parsing RepeatMasker output files.
* `rmskatenaparser()`: attempts to reconstruct fragmented TEs by assembling 
  together fragments from the same TE that are close enough. For LTR class TEs,
  tries to reconstruct full-length and partial TEs following the LTR - internal
  region - LTR structure.

Both, the `rmskatenaparser()` and `OneCodeToFindThemAll()` parser functions
attempt to address the annotation fragmentation present in the output files of
the RepeatMasker software (i.e. presence of multiple hits, such as
homology-based matches, corresponding to a unique copy of an element). This is
highly frequent for TEs of the LTR class, where the consensus sequences are
split separately into the LTR and internal regions, causing RepeatMasker to
also report these two regions of the TE as two separate elements. These two
functions try to identify these and other cases of fragmented annotations and
assemble them together into single elements. To do so, the assembled elements
must satisfy certain criteria. These two parser functions differ in those
criteria, as well as in the approach for finding equivalences between LTR and
internal regions to reconstruct LTR retrotransposons. The `rmskatenaparser()`
function is also much faster than `OneCodeToFindThemAll()`.

## Retrieving and parsing TE annotations

As an example, let's retrieve TE annotations for *Drosophila melanogaster* 
*dm6* genome version. By setting `rmskidentity()` as argument to the
`parsefun` parameter, RepeatMasker annotations are retrieved intact as a
`GRanges` object.

```{r, message=FALSE, warning=FALSE}
library(atena)
library(BiocParallel)

rmskann <- annotaTEs(genome="dm6", parsefun=rmskidentity)
rmskann
```

We can see that we obtained annotations for `r length(rmskann)` elements. Now,
let's fetch the same RepeatMasker annotations, but process them using the
`OneCodeToFindThemAll` parser function [@bailly2014one]. We set the parameter
`strict=FALSE` to avoid applying a filter of minimum 80% identity with the
consensus sequence and minimum 80 bp length. The `insert` parameter is set to
500, meaning that two elements with the same name are merged if they are closer
than 500 bp in the annotations. The `BPPARAM` parameter allows one to run
calculations in parallel using the functionality of the
[BiocParallel](https://bioconductor.org/packages/BiocParallel) Bioconductor
package. In this particular example, we are setting the `BPPARAM` parameter
to `SerialParam(progress=FALSE)` to disable parallel calculations and progress
reporting, but a common setting if we want to run calculations in parallel
would be `BPPARAM=Multicore(workers=ncores, progress=TRUE)`, which would use
`ncores` parallel threads of execution and report progress on the calculations.

```{r, message=FALSE, warning=FALSE}
teann <- annotaTEs(genome="dm6", parsefun=OneCodeToFindThemAll, strict=FALSE,
                   insert=500, BPPARAM=SerialParam(progress=FALSE))
length(teann)
teann[1]
```

As expected, we get a lower number of elements in the annotations, because
repeats that are not TEs have been removed. Furthermore, some fragmented
regions of TEs have been assembled together.

This time, the resulting `teann` object is of class `GRangesList`. Each
element of the list represents an assembled TE containing a `GRanges` object of
length one, if the TE could not be not assembled with another element, or of
length greater than one, if two or more fragments were assembled together into a
single TE.

We can get more information of the parsed annotations by accessing the 
metadata columns with `mcols()`:

```{r}
mcols(teann)
```

There is information about the reconstruction status of the TE (*Status*
column), the relative length of the reconstructed TE (*RelLength*) and the
repeat class of the TE (*Class*). The relative length is calculated by adding
the length (in base pairs) of all fragments from the same assembled TE, and
dividing that sum by the length (in base pairs) of the consensus sequence. For
full-length and partially reconstructed LTR TEs, the consensus sequence length
used is the one resulting from adding twice the consensus sequence length of
the long terminal repeat (LTR) and the one from the corresponding internal
region. For solo-LTRs, the consensus sequence length of the long terminal
repeat is used.

We can get an insight into the composition of the assembled annotations using
the information from the *status* column. Let's look at the absolute
frequencies of the status and class of TEs in the annotations.

```{r comparsedann, message=FALSE, height=6, width=10, out.width="800px", fig.cap="Composition of parsed TE annotations.", echo=FALSE}
library(RColorBrewer)

pal1 <- brewer.pal(6, "Pastel2")
pal2 <- brewer.pal(length(unique(mcols(teann)$Class)), "Set2")

par(mfrow = c(1,2), mar = c(5,4,3,1))
bp1 <- barplot(table(mcols(teann)$Status), col = pal1, border = "black",
        main = "TEs by status", cex.axis=0.8, xaxt = "n")
grid(nx=NA, ny=NULL)
axis(1, at=bp1, labels = FALSE, las=1, line=0, lwd = 0, lwd.ticks = 1) 
par(xpd=TRUE)
text(x= bp1[, 1] - 0.3, y = 10, labels=names(table(mcols(teann)$Status)), 
     srt=40, offset = 1.7, cex = 0.8, pos = 1)
par(xpd=FALSE)

bp2 <- barplot(table(mcols(teann)$Class), col = pal2, border = "black",
        main = "TEs by class", cex.axis=0.8, xaxt = "n",
        ylim = c(0,max(table(mcols(teann)$Status))))
grid(nx=NA, ny=NULL)
axis(1, at=bp2, labels = FALSE, las=1, line=0, lwd = 0, lwd.ticks = 1) 
par(xpd=TRUE)
text(x= bp2[, 1] - 0.1, y = 10, labels=names(table(mcols(teann)$Class)), 
     srt=35, offset = 1.2, cex = 0.8, pos = 1)
par(xpd=FALSE)
```

Here, *full-lengthLTR* are reconstructed LTR retrotransposons following the
LTR - internal region (int) - LTR structure. Partially reconstructed LTR TEs
are *partialLTR_down* (internal region followed by a downstream LTR) and
*partialLTR_up* (LTR upstream of an internal region). *int* and *LTR*
correspond to internal and solo-LTR regions, respectively. Finally, the
*noLTR* refers to TEs of other classes (not LTR), as well as TEs of class LTR
which could not be identified as either internal or long terminal repeat
regions based on their name.

Moreover, the `atena` package provides getter functions to retrieve TEs of a
specific class, using a specific relative length threshold. Those TEs with
higher relative lengths are more likely to have intact open reading frames,
making them more interesting for expression quantification and functional
analyses. For example, to get LINEs with a minimum of 0.9 relative length, we
can use the `getLINEs()` function. We use the TE annotations in `teann` we
obtained before and set the `relLength` to 0.9.

```{r}
rmskLINE <- getLINEs(teann, relLength=0.9)
length(rmskLINE)
rmskLINE[1]
```

To get LTR retrotransposons, we can use the function `getLTRs()`. This function
also allows to get one or more specific types of reconstructed TEs. To get
full-length, partial LTRs and other fragments that could not be reconstructed,
we can:

```{r}
rmskLTR <- getLTRs(teann, relLength=0.8, fullLength=TRUE, partial=TRUE,
                   otherLTR=TRUE)
length(rmskLTR)
rmskLTR[1]
```

To obtain DNA transposons and SINEs, one can use the `getDNAtransposons()` and
`getSINEs()` functions, respectively.

# TE expression quantification

Quantification of TE expression with `atena` consists in the following two
steps:

1. Building of a parameter object for one of the available quantification
   methods.

2. Calling the TE expression quantification method `qtex()` using the
   previously built parameter object.
   
The dataset that will be used to illustrate how to quantify TE expression with
`atena` is a published RNA-seq dataset of _Drosophila melanogaster_ available
at the National Center for Biotechnology Information (NCBI) Gene Expression
Omnibus (GEO) under accession
[GSE47006](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47006)). 
The two selected samples are: a piwi knockdown and a piwi control (GSM1142845
and GSM1142844). These files have been subsampled. The piwi-associated
silencing complex (piRISC) silences TEs in the _Drosophila_ ovary, hence the
knockdown of piwi causes the de-repression of TEs. Here we show how the
expression of full-length LTR retrotransposons present in `rmskLTR` can be
easily quantified using `atena`.

## Building a parameter object

A parameter object is build calling a specific function for the quantification
method we want to use. Independenty of each method, all parameter object
constructor functions require that the first two arguments specify the BAM
files and the TE annotation, respectively.

### ERVmap

To use the ERVmap method in `atena` we should first build an object of the
class `ERVmapParam` using the function `ERVmapParam()`. The `singleEnd`
parameter is set to `TRUE` since the example BAM files are single-end. The
`ignoreStrand` parameter works analogously to the same parameter in the
function `summarizeOverlaps()` from package `r Biocpkg("GenomicAlignments")`
and should be set to `TRUE` whenever the RNA library preparation protocol was
stranded.

One of the filters applied by the ERVmap method compares the alignment score of
a given primary alignment, stored in the `AS` tag of a SAM record, to the
largest alignment score among every other secondary alignment, known as the
suboptimal alignment score. The original ERVmap software assumes that input BAM
files are generated using the Burrows-Wheeler Aligner (BWA) software
[@li2009fast], which stores suboptimal alignment scores in the `XS` tag.
Although `AS` is an optional tag, most short-read aligners provide this tag
with alignment scores in BAM files. However, the suboptimal alignment score,
stored in the `XS` tag by BWA, is either stored in a different tag or not
stored at all by other short-read aligner software, such as STAR
[@dobin2013star].

To enable using ERVmap on BAM files produced by short-read aligner software
other than BWA, `atena` allows the user to set the argument
`suboptimalAlignmentTag` to one of the following three possible values:

* The name of a tag different to `XS` that stores the suboptimal alignment
  score.

* The value "none", which will trigger the calculation of the suboptimal
  alignment score by searching for the largest value stored in the `AS` tag
  among all available secondary alignments.

* The value "auto" (default), by which `atena` will first extract the name of
  the short-read aligner software from the BAM file and if that software is
  BWA, then suboptimal alignment scores will be obtained from the `XS` tag.
  Otherwise, it will trigger the calculation previously explained for
  `suboptimalAlignemntTag="none"`.

Finally, this filter is applied by comparing the difference between alignment
and suboptimal alignment scores to a cutoff value, which by default is 5 but
can be modified using the parameter `suboptimalAlignmentCutoff`. The default
value 5 is the one employed in the original ERVmap software that assumes the
BAM file was generated with BWA and for which lower values are interpreted as
"equivalent to second best match has one or more mismatches than the best
match" [@tokuyama2018ervmap, pg. 12571]. From a different perspective, in BWA
the mismatch penalty has a value of 4 and therefore, a
`suboptimalAlignmentCutoff` value of 5 only retains those reads where the
suboptimal alignment has at least 1 mismatch more than the best match.
Therefore, the `suboptimalAlignmentCutoff` value is specific to the short-read
mapper software and we recommend to set this value according to the mismatch
penalty of that software. Another option is to set
`suboptimalAlignmentCutoff=NA`, which prevents the filtering of reads based on
this criteria, as set in the following example.

```{r}
bamfiles <- list.files(system.file("extdata", package="atena"),
                       pattern="*.bam", full.names=TRUE)
empar <- ERVmapParam(bamfiles, 
                     teFeatures=rmskLTR, 
                     singleEnd=TRUE, 
                     ignoreStrand=TRUE, 
                     suboptimalAlignmentCutoff=NA)
empar
```

In the case of paired-end BAM files (`singleEnd=FALSE`), two additional
arguments can be specified, `strandMode` and `fragments`:

* `strandMode` defines the behavior of the strand getter when internally
  reading the BAM files with the `GAlignmentPairs()` function. See the help
  page of `strandMode` in the `r Biocpkg("GenomicAlignments")` package for
  further details.
 
* `fragments` controls how read filtering and counting criteria are applied to
  the read mates in a paired-end read. To use the original ERVmap algorithm
  [@tokuyama2018ervmap] one should set `fragments=TRUE` (default when
  `singleEnd=FALSE`), which filters and counts each mate of a paired-end read
  independently (i.e., two read mates overlapping the same feature count twice
  on that feature, treating paired-end reads as if they were single-end). On
  the other hand, when `fragments=FALSE`, if the two read mates pass the
  filtering criteria and overlap the same feature, they count once on that
  feature. If either read mate fails to pass the filtering criteria, then both
  read mates are discarded.

An additional functionality with respect to the original ERVmap software is the
integration of gene and TE expression quantification. The original ERVmap
software doesn't quantify TE and gene expression coordinately and this can
potentially lead to counting twice reads that simultaneously overlap a gene and
a TE. In `atena`, gene expression is quantified based on the approach used in
the TEtranscripts software [@jin2015tetranscripts]: unique reads are preferably
assigned to genes, whereas multi-mapping reads are preferably assigned to TEs.

In case that a unique read does not overlap a gene or a multi-mapping read does
not overlap a TE, `atena` searches for overlaps with TEs or genes,
respectively. Given the different treatment of unique and multi-mapping reads,
`atena` requires the information regarding the _unique_ or _multi-mapping_
status of a read. This information is obtained from the presence of secondary
alignments in the BAM file or, alternatively, from the `NH` tag in the BAM file
(number of reported alignments that contain the query in the current SAM
record). Therefore, either secondary alignments or the `NH` tag need to be
present for gene expression quantification.

The original ERVmap approach does not discard any read overlapping gene
annotations. However, this can be changed using the parameter `geneCountMode`,
which by default `geneCountMode="all"` and follows the behavior in the original
ERVmap method. On the contrary, by setting `geneCountMode="ervmap"`, `atena`
also applies the filtering criteria employed to quantify TE expression to the
reads overlapping gene annotations.

Finally, `atena` also allows one to aggregate TE expression quantifications. By
default, the names of the input `GRanges` or `GRangesList` object given in the
`teFeatures` parameter are used to aggregate quantifications. However, the
`aggregateby` parameter can be used to specify other column names in the
feature annotations to be used to aggregate TE counts, for example at the
sub-family level.

### Telescope

To use the Telescope method for TE expression quantification, the
`TelescopeParam()` function is used to build a parameter object of the class
`TelescopeParam`.

As in the case of `ERVmapParam()`, the `aggregateby` argument, which should be
a character vector of column names in the annotation, determines the columns to
be used to aggregate TE expression quantifications. This way, `atena` provides
not only quantifications at the subfamily level, but also allows to quantify
TEs at the desired level (family, class, etc.), including locus based
quantifications. For such a use case, the object with the TE annotations should
include a column with unique identifiers for each TE locus and the
`aggregateby` argument should specify the name of that column. When
`aggregateby` is not specified, the `names()` of the object containing TE
annotations are used to aggregate quantifications.

Here, TE quantifications will be aggregated according to the `names()` of the
`rmskLTR` object.

```{r}
bamfiles <- list.files(system.file("extdata", package="atena"),
                       pattern="*.bam", full.names=TRUE)
tspar <- TelescopeParam(bfl=bamfiles, 
                        teFeatures=rmskLTR, 
                        singleEnd=TRUE, 
                        ignoreStrand=TRUE)
tspar
```

In case of paired-end data (`singleEnd=FALSE`), the argument usage is similar
to that of `ERVmapParam()`. In relation to the BAM file, Telescope follows the
same approach as the ERVmap method: when `fragments=FALSE`, only _mated read
pairs_ from opposite strands are considered, while when `fragments=TRUE`,
same-strand pairs, singletons, reads with unmapped pairs and other fragments
are also considered by the algorithm. However, there is one important
difference with respect to the counting approach followed by ERVmap: when
`fragments=TRUE` _mated read pairs_ mapping to the same element are counted
once, whereas in the ERVmap method they are counted twice.

As in the ERVmap method from `atena`, the gene expression quantification method
in Telescope is based on the approach of the TEtranscripts software
[@jin2015tetranscripts]. This way, `atena` provides the possibility to
integrate TE expression quantification by Telescope with gene expression
quantification. As in the case of the ERVmap method implemented in `atena`,
either secondary alignments or the `NH` tag are required for gene expression
quantification.

### TEtranscripts

Finally, the third method available is TEtranscripts. First, the
`TEtranscriptsParam()` function is called to build a parameter object of the
class `TEtranscriptsParam`. The usage of the `aggregateby` argument is the same
as in `TelescopeParam()` and `ERVmapParam()`. Locus based quantifications in
the TEtranscripts method from `atena` is possible because the TEtranscripts
algorithm actually computes TE quantifications at the locus level and then sums
up all instances of each TE subfamily to provide expression at the subfamily
level. By avoiding this last step, `atena` can provide TE expression
quantification at the locus level using the TEtranscripts method. For such a
use case, the object with the TE annotations should include a column with
unique identifiers for each TE and the `aggregateby` argument should specify
the name of that column.

In this example, the `aggregateby` argument will be set to
`aggregateby="repName"` in order to aggregate quantifications at the repeat
name level. Moreover, gene expression will also be quantified. To do so, gene
annotations are loaded from a *TxDb* object.

```{r, message=FALSE}
library(TxDb.Dmelanogaster.UCSC.dm6.ensGene)

txdb <- TxDb.Dmelanogaster.UCSC.dm6.ensGene
gannot <- exonsBy(txdb, by="gene")
length(gannot)
```

```{r}
bamfiles <- list.files(system.file("extdata", package="atena"),
                       pattern="*.bam", full.names=TRUE)
ttpar <- TEtranscriptsParam(bamfiles, 
                            teFeatures=rmskLTR,
                            geneFeatures=gannot,
                            singleEnd=TRUE, 
                            ignoreStrand=TRUE, 
                            aggregateby="repName")
ttpar
```

For paired-end data, where would set `singleEnd=FALSE`, the `fragments`
parameter has the same purpose as in `TelescopeParam()`. We can also
extract the TEs and gene combined feature set using the `features()`
function on the parameter object. A metadata column called `isTE` is added
to enable distinguishing TEs from gene annotations.

```{r}
features(ttpar)
mcols(features(ttpar))
table(mcols(features(ttpar))$isTE)
```

Regarding gene expression quantification, `atena` has implemented the approach
of the original TEtranscripts software [@jin2015tetranscripts]. As in the case
of the ERVmap and Telescope methods from `atena`, either secondary alignments
or the `NH` tag are required.

Following the gene annotation processing present in the TEtranscripts
algorithm, in case that `geneFeatures` contains a metadata column named "type",
only the elements with `type="exon"` are considered for quantification. If
those elements are grouped through a `GRangesList` object, then counts are
aggregated at the level of those `GRangesList` elements, such as genes or
transcripts. This also applies to the ERVmap and Telescope methods implemented
in `atena` when gene features are present. Let's see an example of this
processing:

```{r}
## Create a toy example of gene annotations
geneannot <- GRanges(seqnames=rep("2L", 8),
                     ranges=IRanges(start=c(1,20,45,80,110,130,150,170),
                                    width=c(10,20,35,10,5,15,10,25)),
                     strand="*", 
                     type=rep("exon",8))
names(geneannot) <- paste0("gene",c(rep(1,3),rep(2,4),rep(3,1)))
geneannot
ttpar2 <- TEtranscriptsParam(bamfiles, 
                             teFeatures=rmskLTR, 
                             geneFeatures=geneannot, 
                             singleEnd=TRUE, 
                             ignoreStrand=TRUE)
mcols(features(ttpar2))
features(ttpar2)[!mcols(features(ttpar2))$isTE]
```

### Quantifying expression

Finally, to quantify TE expression we call the `qtex()` method using one of the
previously defined parameter objects (`ERVmapParam`, `TEtranscriptsParam` or
`TelescopeParam`) according to the quantification method we want to use. As with
the `OneCodeToFindThemAll()` function described before, here we can also use the
`BPPARAM` parameter to perform calculations in parallel.

The `qtex()` method returns a `SummarizedExperiment` object containing the
resulting quantification of expression in an assay slot. Additionally, when a
`data.frame`, or `DataFrame`, object storing phenotypic data is passed to the
`qtex()` function through the `phenodata` parameter, this will be included as
column data in the resulting `SummarizedExperiment` object and the row names of
these phenotypic data will be set as column names in the output
`SummarizedExperiment` object.

In the current example, the call to quantify TE expression using the ERVmap
method would be the following:

```{r, results='hide'}
emq <- qtex(empar)
```
```{r}
emq
colSums(assay(emq))
```

In the case of the Telescope method, the call would be as follows:

```{r, results='hide'}
tsq <- qtex(tspar)
```
```{r}
tsq
colSums(assay(tsq))
```

For the TEtranscripts method, TE expression is quantified by using the
following call:

```{r, results='hide'}
ttq <- qtex(ttpar)
```
```{r}
ttq
colSums(assay(ttq))
```

As mentioned, TE expression quantification is provided at the repeat name
level.

# Accesing expression quantifications and metadata

The `qtex()` function returns a `SummarizedExperiment` object that, on the
one hand, stores the quantified expression in its assay data.

```{r}
head(assay(ttq))
```

On the other hand, it contains metadata about the features that may be useful
to select subsets of the quantified data and extract and explore the feature
annotations, using the function `rowData()` on this `SummarizedExperiment`
object.

```{r}
rowData(ttq)
```

Because we have aggregated quantifications by `RepName` the number of TE
quantified features has been substantially reduced with respect to the original
number of TE features.

```{r}
table(rowData(ttq)$isTE)
```

Let's say we want to select full-length LTRs features, this could be a way of
doing it.

```{r}
temask <- rowData(ttq)$isTE
fullLTRs <- rowData(ttq)$Status == "full-lengthLTR"
fullLTRs <- (sapply(fullLTRs, sum, na.rm=TRUE) == 1) &
            (lengths(rowData(ttq)$Status) == 1)
sum(fullLTRs)
rowData(ttq)[fullLTRs, ]
```

Note also that since we restricted expression quantification to LTRs, we do
have only quantification for that TE class.

```{r}
table(rowData(ttq)$Class[temask])
```

# Session information

```{r session_info, cache=FALSE}
sessionInfo()
```

# References