To install the CircSeqAlignTk package (Sun and Cao 2024), start R (≥ 4.2) and run the following steps:
if (!requireNamespace('BiocManager', quietly = TRUE))
install.packages('BiocManager')
BiocManager::install('CircSeqAlignTk')
Note that to install the latest version of the CircSeqAlignTk package, the latest version of R is required.
CircSeqAlignTk is designed for end-to-end RNA-Seq data analysis of
circular genome sequences, from alignment to visualization. The whole
processes will generate many files including genome sequence indexes,
and intermediate and final alignment results. Thus, it is recommended to
specify a working directory to save these files. Here, for convenience
in package development and validation, we use a temporary folder which
is automatically arranged by the tempdir
function as the
working directory.
However, instead of using a temporary folder, users can specify a folder on the desktop or elsewhere, depending on the analysis project. For example:
Viroids are composed of 246-401 nt, single-stranded circular non-coding RNAs (Hull 2014; R. Flores et al. 2015; Gago-Zachert 2016). Sequencing small RNAs from viroid-infected plants could offer insights regarding the mechanisms of infection and eventually help prevent these infections in plants. The common workflow for analyzing such data involves the following steps: (i) limit read-length between 21 and 24 nt, as small RNAs derived from viroids are known to be in this range, (ii) align these reads to viroid genome sequences, and (iii) visualize the coverage of alignment to identify the pathogenic region of the viroid. This section demonstrates the workflow using a sample RNA-Seq dataset. It includes workflow from the FASTQ format file to the visualization of the analyzed results, for analyzing small RNA-seq data sequenced from viroid-infected plants.
The FASTQ format file used in this section is attached in the CircSeqAlignTk
package and can be obtained using the system.file
function.
This FASTQ format file contains 29,178 sequence reads of small RNAs that
were sequenced from a tomato plant infected with the potato spindle
tuber viroid (PSTVd) isolate Cen-1 (FR851463).
The genome sequence of PSTVd isolate Cen-1 in FASTA format can be
downloaded from GenBank or ENA using the
accession number FR851463. It is also attached in the CircSeqAlignTk
package, and can be obtained using the system.file
function.
To ensure alignment quality, trimming adapter sequences from the sequence reads is required, because most sequence reads in this FASTQ format file contain adapters with sequence “AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC”. Here, we use AdapterRemoval (Schubert, Lindgreen, and Orlando (2016)) implemented in the Rbowtie2 package (Wei et al. 2018) to trim the adapters before aligning the sequence reads. Note that the length of small RNAs derived from viroids is known to be in the range of 21–24 nt. Therefore, we set an argument to remove sequence reads with lengths outside this range after adapter removal.
library(R.utils)
library(Rbowtie2)
adapter <- 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
# decompressed the gzip file for trimming to avoid errors from `remove_adapters`
gunzip(fq, destname = file.path(ws, 'srna.fq'), overwrite = TRUE, remove = FALSE)
trimmed_fq <- file.path(ws, 'srna_trimmed.fq')
params <- '--maxns 1 --trimqualities --minquality 30 --minlength 21 --maxlength 24'
remove_adapters(file1 = file.path(ws, 'srna.fq'),
adapter1 = adapter,
adapter2 = NULL,
output1 = trimmed_fq,
params,
basename = file.path(ws, 'AdapterRemoval.log'),
overwrite = TRUE)
After obtaining the cleaned FASTQ format file (i.e.,
srna_trimmed.fq.gz
), we build index files and perform
alignment using the build_index
and
align_reads
functions implemented in the CircSeqAlignTk
package. To precisely align the reads to the circular genome sequence of
the viroid, the alignment is performed in two stages.
ref_index <- build_index(input = genome_seq,
output = file.path(ws, 'index'))
aln <- align_reads(input = trimmed_fq,
index = ref_index,
output = file.path(ws, 'align_results'))
The index files are stored in a directory specified by the
output
argument of the build_index
function.
The intermediate files (e.g., FASTQ format files used as inputs) and
alignment results (e.g., BAM format files) are stored in the directory
specified by the output
argument of the
align_reads
function. BAM format files with the suffixes of
.clean.t1.bam
and .clean.t2.bam
are the final
results obtained after alignment. Refer to the sections
@ref(generation-of-reference-sequences) and @ref(alignment) for a
detailed description of each of the files generated by each
function.
The alignment coverage can be summarized with the
calc_coverage
function. This function loads the alignment
results (i.e., *.clean.t1.bam
and
*.clean.t2.bam
), calculates alignment coverage from these
BAM format files, and combines them into two data frames according to
the aligned strands.
alncov <- calc_coverage(aln)
head(get_slot_contents(alncov, 'forward')) # alignment coverage in forward strand
## L21 L22 L23 L24
## [1,] 13 12 1 1
## [2,] 13 12 1 1
## [3,] 13 12 1 1
## [4,] 13 13 1 1
## [5,] 13 13 1 1
## [6,] 13 13 1 1
## L21 L22 L23 L24
## [1,] 7 5 0 1
## [2,] 7 5 0 1
## [3,] 7 5 0 1
## [4,] 7 5 0 1
## [5,] 7 5 0 1
## [6,] 7 5 0 1
The alignment coverage can be then visualized using the
plot
function (Figure @ref(fig:quickStartVisualization)).
The scale of the upper and lower directions indicate alignment coverage
of the forward and reverse strands, respectively.
Circular genome sequences are generally represented as linear sequences in the FASTA format during analysis. Consequently, sequence reads obtained from organelles or organisms with circular genome sequences can be aligned anywhere, including at both ends of the sequence represented in the FASTA format. Using existing alignment tools such as Bowtie2 (Langmead and Salzberg 2012) and HISAT2 (Kim et al. 2019) to align such sequence reads onto circular sequences may fail, because these tools are designed to align sequence reads to linear genome sequences and their implementation does not assume that a single read can be aligned to both ends of a linear sequence. To solve this problem, that is, allowing reads to be aligned to both ends, the CircSeqAlignTk package implements a two-stage alignment process (Figure @ref(fig:packageImplementation)), using these existing alignment tools (Bowtie2 and HISAT2).
To prepare for the two-stage alignment process, two types of
reference sequences are generated from the same circular genome
sequence. The type 1 reference sequence is a linear sequence generated
by cutting a circular sequence at an arbitrary location. The type 2
reference is generated by restoring the type 1 reference sequence into a
circular sequence and cutting the circle at the opposite position to
type 1 reference sequence. The type 1 reference sequence is the input
genome sequence itself, while the type 2 reference sequence is newly
created (by the build_index
function).
Once the two reference sequences are generated, the sequence reads are aligned to the two types of reference sequences in two stages: (i) aligning all sequence reads onto the type 1 reference sequences, and (ii) collecting the unaligned sequence reads and aligning them to the type 2 reference. Alignment can be performed with Bowtie2 or HISAT2 depending on the options specified by the user.
The build_index
function is designed to generate type 1
and type 2 reference sequences for alignment. This function has two
required arguments, input
and output
which are
used for specifying a file path to a genome sequence in FASTA format and
a directory path to save the generated type 1 and type 2 reference
sequences, respectively. The type 1 and type 2 reference sequences are
saved in files refseq.t1.fa
and refseq.t2.fa
in FASTA format, respectively.
Following the generation of reference sequences, The
build_index
function then creates index files for each
reference sequence for alignment. The index files are saved with the
prefix refseq.t1.*
and refseq.t2.*
. They
correspond to the type 1 and 2 reference sequences (i.e.,
refseq.t1.fa
and refseq.t2.fa
), respectively.
The extension of index files depends on the alignment tool.
Two alignment tools (Bowtie2
and HISAT2) can be
specified for creating index files through the aligner
argument. If Bowtie2
is specified, then the extension is .bt2
or
.bt2l
; if HISAT2 is specified,
then the extension is .ht2
or .ht2l
. By
default, Bowtie2
is used. The build_index
function first attempts to call
the specified alignment tool directly installed on the operation system;
however, if the tool is not installed, the function will then attempt to
call the bowtie2_build
or hisat2_build
functions implemented in Rbowtie2
or Rhisat2
packages for indexing.
For example, to generate reference sequences and index files for
alignment against the viroid PSTVd isolate Cen-1 (FR851463) using Bowtie2/Rbowtie2,
we set the argument input
to the FASTA format file
containing the sequence of FR851463 and execute the
build_index
function. The generated index files will be
saved into the folder specified by the argument output
.
genome_seq <- system.file(package = 'CircSeqAlignTk', 'extdata', 'FR851463.fa')
ref_index <- build_index(input = genome_seq, output = file.path(ws, 'index'))
The function returns a CircSeqAlignTkRefIndex
class
object that contains the file path to type 1 and 2 reference sequences
and corresponding index files. The data structure of
CircSeqAlignTkRefIndex
can be verified using the
str
function.
## Formal class 'CircSeqAlignTkRefIndex' [package "CircSeqAlignTk"] with 6 slots
## ..@ name : chr "FR851463"
## ..@ seq : chr "CGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGAGCAGGAAAGAAAAAAGAATTGCGGCTCGGAGGAGCGCTTCAGGGATCCCCGGGGAAACCTGGAGCG"| __truncated__
## ..@ length : int 361
## ..@ fasta : chr [1:2] "/tmp/RtmpprKR4V/index/refseq.t1.fa" "/tmp/RtmpprKR4V/index/refseq.t2.fa"
## ..@ index : chr [1:2] "/tmp/RtmpprKR4V/index/refseq.t1" "/tmp/RtmpprKR4V/index/refseq.t2"
## ..@ cut_loc: num 180
The file path to type 1 and type 2 reference sequences,
refseq.type1.fa
and refseq.type2.fa
, can be
checked through the @fasta
slot using the
get_slot_contents
function.
## [1] "/tmp/RtmpprKR4V/index/refseq.t1.fa" "/tmp/RtmpprKR4V/index/refseq.t2.fa"
The file path (prefix) to the index files,
refseq.t1.*.bt2
and refseq.t2.*.bt2
, can be
checked through @index
slot.
## [1] "/tmp/RtmpprKR4V/index/refseq.t1" "/tmp/RtmpprKR4V/index/refseq.t2"
Note that, users can simply use the slot
function or
@
operator to access these slot contents instead of using
the get_slot_contents
function. For example,
As mentioned previously, the type 2 reference is generated by
restoring the type 1 reference sequence to a circular sequence and
cutting the circular sequence at the opposite position of type 1. The
cutting position based on the type 1 reference sequence coordinate can
be checked from the @cut_loc
slot.
## [1] 180
By default, Bowtie2/Rbowtie2
is used for indexing. This can be changed to HISAT2/Rhisat2
using the aligner
argument.
The align_reads
function is used to align sequence reads
onto a circular genome sequence. This function requires three arguments:
input
, index
, and output
, which
are used to specify a file path to RNA-seq reads in FASTQ format, a
CircSeqAlignTkRefIndex
class object generated by the
build_index
function, and a directory path to save the
intermediate and final results, respectively.
This function aligns sequence reads within the two-stage alignment process
described above. Thus, it (i) aligns reads to the type 1 reference
sequence (i.e., refseq.t1.fa
) and (ii) collects the
unaligned reads and aligns them with the type 2 reference sequence
(i.e., refseq.t2.fa
).
Two alignment tools (Bowtie2
and HISAT2) can be
specified for sequence read alignment. By default, Bowtie2
is used, and it can be changed with the alinger
argument.
Similar to the build_index
function, the
align_reads
function first attempts to call the specified
alignment tool directly installed on the operation system; however, if
the tool is not installed, the function then attempts to call the
bowtie2_build
or hisat2_build
function
implemented in Rbowtie2
or Rhisat2
packages for alignment.
The following example is aligning RNA-Seq reads in FASTQ format
(fq
) on the reference index (ref_index
) of
PSTVd isolate Cen-1 (FR851463) which was generated at the section
@ref(generation-of-reference-sequences). The alignment results will be
stored into the folder specified by the argument
output
.
fq <- system.file(package = 'CircSeqAlignTk', 'extdata', 'srna.fq.gz')
# trimming the adapter sequences if needed before alignment, omitted here.
aln <- align_reads(input = fq,
index = ref_index,
output = file.path(ws, 'align_results'))
This function returns a CircSeqAlignTkAlign
class object
containing the path to the intermediate files and final alignment
results.
## Formal class 'CircSeqAlignTkAlign' [package "CircSeqAlignTk"] with 6 slots
## ..@ input_fastq: chr "/tmp/Rtmpm9tc7H/Rinst1e071d5f7f8a/CircSeqAlignTk/extdata/srna.fq.gz"
## ..@ fastq : chr [1:2] "/tmp/Rtmpm9tc7H/Rinst1e071d5f7f8a/CircSeqAlignTk/extdata/srna.fq.gz" "/tmp/RtmpprKR4V/align_results/srna.t2.fq.gz"
## ..@ bam : chr [1:2] "/tmp/RtmpprKR4V/align_results/srna.t1.bam" "/tmp/RtmpprKR4V/align_results/srna.t2.bam"
## ..@ clean_bam : chr [1:2] "/tmp/RtmpprKR4V/align_results/srna.clean.t1.bam" "/tmp/RtmpprKR4V/align_results/srna.clean.t2.bam"
## ..@ stats :'data.frame': 4 obs. of 5 variables:
## .. ..$ n_reads : num [1:4] 29178 29012 166 30
## .. ..$ aligned_fwd : num [1:4] 89 22 89 21
## .. ..$ aligned_rev : num [1:4] 77 9 77 9
## .. ..$ unaligned : num [1:4] 29012 28981 0 0
## .. ..$ unsorted_reads: num [1:4] 0 0 0 0
## ..@ reference :Formal class 'CircSeqAlignTkRefIndex' [package "CircSeqAlignTk"] with 6 slots
## .. .. ..@ name : chr "FR851463"
## .. .. ..@ seq : chr "CGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGAGCAGGAAAGAAAAAAGAATTGCGGCTCGGAGGAGCGCTTCAGGGATCCCCGGGGAAACCTGGAGCG"| __truncated__
## .. .. ..@ length : int 361
## .. .. ..@ fasta : chr [1:2] "/tmp/RtmpprKR4V/index/refseq.t1.fa" "/tmp/RtmpprKR4V/index/refseq.t2.fa"
## .. .. ..@ index : chr [1:2] "/tmp/RtmpprKR4V/index/refseq.t1" "/tmp/RtmpprKR4V/index/refseq.t2"
## .. .. ..@ cut_loc: num 180
The alignment results are saved as BAM format files in the specified
directory with the suffixes of *.t1.bam
and
*.t2.bam
. The original alignment results may contain
mismatches. Hence, this function performs filtering to remove alignment
with the mismatches over the specified value from the BAM format file.
Filtering results for *.t1.bam
and *.t2.bam
are saved as *.clean.t1.bam
and
*.clean.t2.bam
, respectively. The path to the original and
filtered BAM format files can be checked using @bam
and
@clean_bam
slots, respectively.
## [1] "/tmp/RtmpprKR4V/align_results/srna.t1.bam"
## [2] "/tmp/RtmpprKR4V/align_results/srna.t2.bam"
## [1] "/tmp/RtmpprKR4V/align_results/srna.clean.t1.bam"
## [2] "/tmp/RtmpprKR4V/align_results/srna.clean.t2.bam"
The alignment statistics (for example, number of input sequence
reads, number of aligned reads) can be checked using the
@stats
slot.
## n_reads aligned_fwd aligned_rev unaligned unsorted_reads
## srna.t1.bam 29178 89 77 29012 0
## srna.t2.bam 29012 22 9 28981 0
## srna.clean.t1.bam 166 89 77 0 0
## srna.clean.t2.bam 30 21 9 0 0
By default, the align_read
function allows a single
mismatch in the alignment of each read (i.e.,
n_mismatch = 1
). To forbid a mismatch or allow more
mismatches, assign 0
or a large number to the
n_mismatch
argument.
aln <- align_reads(input = fq,
index = ref_index,
output = file.path(ws, 'align_results'),
n_mismatch = 0)
The number of threads for alignment can be specified using the
n_threads
argument. Setting a large number of threads (but
not exceeding the computer limits) can accelerate the speed of
alignment.
aln <- align_reads(input = fq,
index = ref_index,
output = file.path(ws, 'align_results'),
n_threads = 4)
Additional arguments to be directly passed on to the alignment tool
can be specified with the add_args
argument. For example,
to increase the alignment sensitivity, we set the maximum number of
mismatches to 1 and the length of seed substrings for alignment to 20
during the process of the Bowtie2 multiseed
alignment. See the Bowtie2
website to find additional parameters of Bowtie2.
aln <- align_reads(input = fq,
index = ref_index,
output = file.path(ws, 'align_results'),
add_args = '-L 20 -N 1')
To use HISAT2/Rhisat2,
assign hisat2
to the aligner
argument.
Summarization and visualization of the alignment results can be
performed with the calc_coverage
and plot
functions, respectively. The calc_coverage
function
calculates alignment coverage from the two BAM files,
*.clean.t1.bam
and *.clean.t2.bam
, generated
by the align_reads
function.
This function returns a CircSeqAlignTkCoverage
class
object. Alignment coverage of the reads aligned in the forward and
reverse strands are stored in the @forward
and
@reverse
slots, respectively, as a data frame.
## L21 L22 L23 L24
## [1,] 12 8 1 0
## [2,] 12 8 1 0
## [3,] 12 8 1 0
## [4,] 12 9 1 0
## [5,] 13 9 1 0
## [6,] 13 9 1 0
## L21 L22 L23 L24
## [1,] 5 4 0 0
## [2,] 5 4 0 0
## [3,] 5 4 0 0
## [4,] 5 4 0 0
## [5,] 5 4 0 0
## [6,] 5 4 0 0
Coverage can be visualized with an area chart using the
plot
function. In the chart, the upper and lower directions
of the y-axis represent the alignment coverage of reads with forward and
reverse strands, respectively.
To plot alignment coverage of the reads with a specific length,
assign the targeted length to the read_lengths
argument.
As the plot
function returns a ggplot2 class object, we
can use additional functions implemented in the ggplot2
package (Wickham 2016) to decorate the
chart, for example:
The CircSeqAlignTk
package implements the generate_fastq
function to generate
synthetic sequence reads in FASTQ format to mimic RNA-Seq data sequenced
from organelles or organisms with circular genome sequences. This
function is intended for the use of developers, to help them evaluate
the performance of alignment tools, new alignment algorithms, and new
workflows.
To generate synthetic sequence reads with default parameters and save
them into a file named synthetic_reads.fq.gz
in
GZIP-compressed FASTQ format, run the following command. By default, it
generates 10,000 reads.
This function returns a CircSeqAlignTkSim
class object
whose data structure can be checked with the str
function,
as follows:
## Formal class 'CircSeqAlignTkSim' [package "CircSeqAlignTk"] with 6 slots
## ..@ seq : chr "CGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGAGCAGGAAAGAAAAAAGAATTGCGGCTCGGAGGAGCGCTTCAGGGATCCCCGGGGAAACCTGGAGCG"| __truncated__
## ..@ adapter : chr "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
## ..@ read_info:'data.frame': 10000 obs. of 8 variables:
## .. ..$ mean : num [1:10000] 341 74 227 65 341 239 341 239 239 341 ...
## .. ..$ std : num [1:10000] 4 4 4 4 4 3 4 3 3 4 ...
## .. ..$ strand: chr [1:10000] "+" "+" "+" "+" ...
## .. ..$ prob : num [1:10000] 0.108 0.195 0.11 0.15 0.108 ...
## .. ..$ start : num [1:10000] 703 436 589 422 701 602 699 598 598 701 ...
## .. ..$ end : num [1:10000] 726 458 609 443 724 625 722 621 621 722 ...
## .. ..$ sRNA : chr [1:10000] "TGGAACCGCAGTTGGTTCCTCGGA" "AGGAGCGCTTCAGGGATCCCCGG" "CCCTCGCCCCCTTTGCGCTGT" "GAATTGCGGCTCGGAGGAGCGC" ...
## .. ..$ length: num [1:10000] 24 23 21 22 24 24 24 24 24 22 ...
## ..@ peak :'data.frame': 8 obs. of 4 variables:
## .. ..$ mean : num [1:8] 74 324 341 239 227 23 75 65
## .. ..$ std : num [1:8] 4 3 4 3 4 5 3 4
## .. ..$ strand: chr [1:8] "+" "+" "+" "+" ...
## .. ..$ prob : num [1:8] 0.1947 0.0762 0.1079 0.1342 0.1105 ...
## ..@ coverage :Formal class 'CircSeqAlignTkCoverage' [package "CircSeqAlignTk"] with 3 slots
## .. .. ..@ forward : num [1:361, 1:4] 56 30 10 3 0 0 0 0 0 0 ...
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : NULL
## .. .. .. .. ..$ : chr [1:4] "L21" "L22" "L23" "L24"
## .. .. ..@ reverse : num [1:361, 1:4] 0 0 0 0 0 0 0 0 0 0 ...
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : NULL
## .. .. .. .. ..$ : chr [1:4] "L21" "L22" "L23" "L24"
## .. .. ..@ .figdata:'data.frame': 2888 obs. of 4 variables:
## .. .. .. ..$ position : int [1:2888] 1 2 3 4 5 6 7 8 9 10 ...
## .. .. .. ..$ read_length: Factor w/ 4 levels "21","22","23",..: 1 1 1 1 1 1 1 1 1 1 ...
## .. .. .. ..$ coverage : num [1:2888] 56 30 10 3 0 0 0 0 0 0 ...
## .. .. .. ..$ strand : chr [1:2888] "+" "+" "+" "+" ...
## ..@ fastq : chr "/tmp/RtmpprKR4V/synthetic_reads.fq.gz"
The parameters for generating the peaks of alignment coverage can be
checked using @peak
slot.
## mean std strand prob
## 1 74 4 + 0.19467997
## 2 324 3 + 0.07618699
## 3 341 4 + 0.10791345
## 4 239 3 + 0.13421258
## 5 227 4 + 0.11047903
## 6 23 5 + 0.04168474
The parameters for sequence-read sampling can be checked using the
@read_info
slot. The first four columns (i.e.,
mean
, std
, strand
, and
prob
) represent peak information used for sampling sequence
reads; the next two columns (i.e., start
and
end
) are the exact start and end position of the sampled
sequence reads, respectively; and the last two columns (i.e.,
sRNA
and length
) are the nucleotides and
length of the sampled sequence reads.
## [1] 10000 8
## mean std strand prob start end sRNA length
## 1 341 4 + 0.1079135 703 726 TGGAACCGCAGTTGGTTCCTCGGA 24
## 2 74 4 + 0.1946800 436 458 AGGAGCGCTTCAGGGATCCCCGG 23
## 3 227 4 + 0.1104790 589 609 CCCTCGCCCCCTTTGCGCTGT 21
## 4 65 4 + 0.1496360 422 443 GAATTGCGGCTCGGAGGAGCGC 22
## 5 341 4 + 0.1079135 701 724 CTTGGAACCGCAGTTGGTTCCTCG 24
## 6 239 3 + 0.1342126 602 625 TGCGCTGTCGCTTCGGCTACTACC 24
The alignment coverage of the synthetic sequence reads are stored in
the @coverage
slot as a CircSeqAlignTkCoverage
class object. This can be visualized using the plot
function.
## L21 L22 L23 L24
## [1,] 56 125 212 446
## [2,] 30 88 166 376
## [3,] 10 43 116 291
## [4,] 3 14 63 206
## [5,] 0 1 28 122
## [6,] 0 0 6 44
## L21 L22 L23 L24
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 0
## [4,] 0 0 0 0
## [5,] 0 0 0 0
## [6,] 0 0 0 0
To change the number of sequence reads that need to be generated, use
the n
argument in the generate_reads
function.
By default, the generate_reads
function generates
sequence reads from the genome sequence of the viroid PSTVd isolate
Cen-1 (FR851463). To
change the seed genome sequence for sequence read sampling, users can
specify a sequence as characters or as a file path to the FASTA format
file containing a sequence using the seq
argument.
genome_seq <- 'CGGAACTAAACTCGTGGTTCCTGTGGTTCACACCTGACCTCCTGACAAGAAAAGAAAAAAGAAGGCGGCTCGGAGGAGCGCTTCAGGGATCCCCGGGGAAACCTGGAGCGAACTGGCAAAAAAGGACGGTGGGGAGTGCCCAGCGGCCGACAGGAGTAATTCCCGCCGAAACAGGGTTTTCACCCTTCCTTTCTTCGGGTGTCCTTCCTCGCGCCCGCAGGACCACCCCTCGCCCCCTTTGCGCTGTCGCTTCGGCTACTACCCGGTGGAAACAACTGAAGCTCCCGAGAACCGCTTTTTCTCTATCTTACTTGCTCCGGGGCGAGGGTGTTTAGCCCTTGGAACCGCAGTTGGTTCCT'
sim <- generate_reads(seq = genome_seq,
output = file.path(ws, 'synthetic_reads.fq.gz'))
By default, generate_reads
function generates sequence
reads with the adapter sequence “AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC”. To
change the adapter sequence, specify a sequence as characters or as a
file path to a FASTA format file containing a adapter sequence using the
adapter
argument. For example, the following scripts
generate reads with 150 nt, containing the adapter sequence
“AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA”.
adapter <- 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
sim <- generate_reads(adapter = adapter,
output = file.path(ws, 'synthetic_reads.fq.gz'),
read_length = 150)
In contrast, to generate sequence reads without adapter sequences,
run the generate_reads
function with
adapter = NA
.
sim <- generate_reads(adapter = NA,
output = file.path(ws, 'synthetic_reads.fq.gz'),
read_length = 150)
The generate_reads
function also implements a process
that introduces several mismatches into the reads after sequence-read
sampling. To introduce a single mismatch for each sequence read with a
probability of 0.05, set the mismatch_prob
argument to
0.05
.
To allow a single sequence read to have multiple mismatches, assign
multiple values to the mismatch_prob
argument. For example,
using the following scripts, the function first generates 10,000 reads.
Then, introduce the first mismatches against all sequence reads with the
probability of 0.05. This will generate approximately 500 (i.e., 10,000
x 0.05) sequence reads containing a mismatch. Next, the function
introduces a second mismatch against the sequence reads with a single
mismatch with the probability of 0.1. Thus, this will generate
approximately 50 (i.e., 500 x 0.1) sequence reads containing two
mismatches.
sim <- generate_reads(output = file.path(ws, 'synthetic_reads.fq.gz'),
mismatch_prob = c(0.05, 0.1))
In addition, the generate_reads
provide some
groundbreaking arguments, srna_length
and
peaks
, to specify the length and strand of sequence reads
and the positions of peaks of the alignment coverage, respectively.
Using these arguments allows users to generate synthetic sequence reads
that are very close to the real small RNA-Seq data sequenced from
viroid-infected plants. The following is an example of how to use these
arguments:
peaks <- data.frame(
mean = c( 0, 25, 70, 90, 150, 240, 260, 270, 330, 350),
std = c( 5, 5, 5, 5, 10, 5, 5, 1, 2, 1),
strand = c( '+', '+', '-', '-', '+', '+', '-', '+', '+', '-'),
prob = c(0.10, 0.10, 0.18, 0.05, 0.03, 0.18, 0.15, 0.10, 0.06, 0.05)
)
srna_length <- data.frame(
length = c( 21, 22, 23, 24),
prob = c(0.45, 0.40, 0.10, 0.05)
)
sim <- generate_reads(n = 1e4,
output = file.path(ws, 'synthetic_reads.fq.gz'),
srna_length = srna_length,
peaks = peaks)
In the synthetic data generated by the generate_reads
function, every peak contains a relatively equal proportion of sequence
reads with different sequence read lengths (Figure
@ref(fig:simSetLengthAndPeaks)). However, in real data, composition of
the reads differs from peak to peak. The CircSeqAlignTk
package provides a merge
function to generate such
synthetic data. This feature can be used, to first generate multiple
synthetic data with various features with the
generate_reads
function and then merge these synthetic data
with the merge
function.
peaks_1 <- data.frame(
mean = c( 100, 150, 250, 300),
std = c( 5, 5, 5, 5),
strand = c( '+', '+', '-', '-'),
prob = c(0.25, 0.25, 0.40, 0.05)
)
srna_length_1 <- data.frame(
length = c( 21, 22),
prob = c(0.45, 0.65)
)
sim_1 <- generate_reads(n = 1e4,
output = file.path(ws, 'synthetic_reads_1.fq.gz'),
srna_length = srna_length_1,
peaks = peaks_1)
peaks_2 <- data.frame(
mean = c( 50, 200, 300),
std = c( 5, 5, 5),
strand = c( '+', '-', '+'),
prob = c(0.80, 0.10, 0.10)
)
srna_length_2 <- data.frame(
length = c( 21, 22, 23),
prob = c(0.10, 0.10, 0.80)
)
sim_2 <- generate_reads(n = 1e3,
output = file.path(ws, 'synthetic_reads_2.fq.gz'),
srna_length = srna_length_2,
peaks = peaks_2)
peaks_3 <- data.frame(
mean = c( 80, 100, 220, 270),
std = c( 5, 5, 1, 2),
strand = c( '-', '+', '+', '-'),
prob = c( 0.20, 0.30, 0.20, 0.30)
)
srna_length_3 <- data.frame(
length = c( 19, 20, 21, 22),
prob = c(0.30, 0.30, 0.20, 0.20)
)
sim_3 <- generate_reads(n = 5e3,
output = file.path(ws, 'synthetic_reads_3.fq.gz'),
srna_length = srna_length_3,
peaks = peaks_3)
# merge the three data sets
sim <- merge(sim_1, sim_2, sim_3,
output = file.path(ws, 'synthetic_reads.fq.gz'))
From Figure @ref(fig:simMergeMultiSimObjects), it can be seen that the lengths of the sequence reads that constitute the peaks vary from peak to peak. For example, the first peak of the forward strand is mainly composed of sequence reads with a length of 23 nt, and the third peak of the forward strand is mainly composed of sequence reads with lengths of 21 nt and 22 nt.
Here we show how to use the CircSeqAlignTk
package to evaluate the performance of the workflow, from aligning
sequence reads to calculating alignment coverage, as shown in the Quick Start section. First, to validate that the
workflow is working correctly, we generate sequence reads without
adapter sequences and mismatches using the generate_reads
function and apply the workflow to these synthetic reads.
sim <- generate_reads(adapter = NA,
mismatch_prob = 0,
output = file.path(ws, 'synthetic_reads.fq.gz'))
genome_seq <- system.file(package = 'CircSeqAlignTk', 'extdata', 'FR851463.fa')
ref_index <- build_index(input = genome_seq,
output = file.path(ws, 'index'))
aln <- align_reads(input = file.path(ws, 'synthetic_reads.fq.gz'),
index = ref_index,
output = file.path(ws, 'align_results'))
alncov <- calc_coverage(aln)
The true alignment coverage of this synthetic data is stored in the
@coverage
slot of the sim
variable, whereas
the predicted alignment coverage is stored in the alncov
variable. Here, we can calculate the root mean squared error (RMSE)
between the true and predicted values for validation.
# coverage of reads in forward strand
fwd_pred <- get_slot_contents(alncov, 'forward')
fwd_true <- get_slot_contents(get_slot_contents(sim, 'coverage'), 'forward')
sqrt(sum((fwd_pred - fwd_true) ^ 2) / length(fwd_true))
## [1] 0
# coverage of reads in reverse strand
rev_pred <- get_slot_contents(alncov, 'reverse')
rev_true <- get_slot_contents(get_slot_contents(sim, 'coverage'), 'reverse')
sqrt(sum((rev_pred - rev_true) ^ 2) / length(rev_true))
## [1] 0
We found that there was no error (i.e., RMSE = 0) between the true and predicted values of alignment coverage. Thus, we confirmed that the workflow presented in the CircSeqAlignTk package works perfectly when the reads do not contain adapter sequences or mismatches.
Next, we evaluate the performance of this workflow under conditions similar to those of real RNA-seq data by concatenating adapter sequences and introducing mismatches into the reads. We first generate synthetic sequence reads with a length of 150 nt that contain at most two mismatches and have adapter sequences.
Next, we follow the Quick Start chapter to trim the adapter sequences, perform alignment, and calculate the alignment coverage.
library(R.utils)
library(Rbowtie2)
# quality control
params <- '--maxns 1 --trimqualities --minquality 30 --minlength 21 --maxlength 24'
remove_adapters(file1 = file.path(ws, 'synthetic_reads.fq'),
adapter1 = get_slot_contents(sim, 'adapter'),
adapter2 = NULL,
output1 = file.path(ws, 'srna_trimmed.fq'),
params,
basename = file.path(ws, 'AdapterRemoval.log'),
overwrite = TRUE)
# alignment
genome_seq <- system.file(package = 'CircSeqAlignTk', 'extdata', 'FR851463.fa')
ref_index <- build_index(input = genome_seq,
output = file.path(ws, 'index'))
aln <- align_reads(input = file.path(ws, 'srna_trimmed.fq'),
index = ref_index,
output = file.path(ws, 'align_results'),
n_mismatch = 2)
# calculate alignment coverage
alncov <- calc_coverage(aln)
We then calculate the RMSE between the true and the predicted values of the alignment coverage.
# coverage of reads in forward strand
fwd_pred <- get_slot_contents(alncov, 'forward')
fwd_true <- get_slot_contents(get_slot_contents(sim, 'coverage'), 'forward')
sqrt(sum((fwd_pred - fwd_true) ^ 2) / length(fwd_true))
## [1] 2.510504
# coverage of reads in reverse strand
rev_pred <- get_slot_contents(alncov, 'reverse')
rev_true <- get_slot_contents(get_slot_contents(sim, 'coverage'), 'reverse')
sqrt(sum((rev_pred - rev_true) ^ 2) / length(rev_true))
## [1] 6.93215
If the sequence reads contained adapter sequences and mismatches,
some sequence reads failed to align. Therefore, the coverage calculated
from this alignment result (i.e., aln
) showed errors from
the true alignment coverage (i.e.,
get_slot_contents(sim, 'coverage')
).
Synthetic sequence reads for various scenarios can be generated by
repeating the generate_reads
function. These synthetic
sequence reads can be used to evaluate the workflow, from aligning reads
to calculating alignment coverage as shown in the Quick Start chapter, more reliably. Given below
is an example for generating 10 sets of synthetic sequence reads,
performing alignment, and calculating alignment coverage for performance
evaluation. Note that two mismatches are introduced here with
probabilities of 0.1 and 0.2, respectively; and adapter sequences are
added until the length of the reads reaches 150 nt.
library(R.utils)
library(Rbowtie2)
params <- '--maxns 1 --trimqualities --minquality 30 --minlength 21 --maxlength 24'
genome_seq <- system.file(package = 'CircSeqAlignTk', 'extdata', 'FR851463.fa')
ref_index <- build_index(input = genome_seq,
output = file.path(ws, 'index'))
fwd_rmse <- rev_rmse <- rep(NA, 10)
for (i in seq(fwd_rmse)) {
# prepare file names and directory to store the simulation results
simset_dpath <- file.path(ws, paste0('sim_tries_', i))
dir.create(simset_dpath)
syn_fq <- file.path(simset_dpath, 'synthetic_reads.fq')
trimmed_syn_fq <- file.path(simset_dpath, 'srna_trimmed.fq')
align_result <- file.path(simset_dpath, 'align_results')
fig_coverage <- file.path(simset_dpath, 'alin_coverage.png')
# generate synthetic reads
set.seed(i)
sim <- generate_reads(mismatch_prob = c(0.1, 0.2),
output = syn_fq)
# quality control
remove_adapters(file1 = syn_fq,
adapter1 = get_slot_contents(sim, 'adapter'),
adapter2 = NULL,
output1 = trimmed_syn_fq,
params,
basename = file.path(ws, 'AdapterRemoval.log'),
overwrite = TRUE)
# alignment
aln <- align_reads(input = trimmed_syn_fq,
index = ref_index,
output = align_result,
n_mismatch = 2)
# calculate alignment coverage
alncov <- calc_coverage(aln)
# calculate RMSE
fwd_pred <- get_slot_contents(alncov, 'forward')
fwd_true <- get_slot_contents(get_slot_contents(sim, 'coverage'), 'forward')
fwd_rmse[i] <- sqrt(sum((fwd_pred - fwd_true) ^ 2) / length(fwd_true))
rev_pred <- get_slot_contents(alncov, 'reverse')
rev_true <- get_slot_contents(get_slot_contents(sim, 'coverage'), 'reverse')
rev_rmse[i] <- sqrt(sum((rev_pred - rev_true) ^ 2) / length(rev_true))
}
## forward reverse
## 1 4.598115 2.356990
## 2 5.325811 2.220373
## 3 4.912320 3.166108
## 4 5.074036 3.010140
## 5 3.031689 6.275585
## 6 2.821195 4.571834
## 7 4.332925 4.563874
## 8 2.726956 4.652837
## 9 4.185369 3.121390
## 10 4.105516 3.995843
The RMSE between the true (i.e., simulation condition) and predicted coverage for the sequence reads in forward strand and reverse strand are 4.11 ± 0.95 and 3.79 ± 1.25, respectively. The result indicates that performance of this workflow is worse when the sequence reads contain up to two mismatches as compared with no mismatches (i.e., RMSE = 0 as shown in the section @ref(performance-evaluation-with-the-synthetic-data)). To examine detailed changes in performance, users can change the number of mismatches and the probabilities of mismatches to estimate how the performance changes.
Viroids are the smallest infectious pathogens. Most variants of viroids infect plants without being toxic, while some variants occasionally result in extensive damage such as stunting, leaf deformation, leaf necrosis, fruit distortion, and even plant death (Ricardo Flores et al. 2005). A single outbreak of toxic viroid infection can cause tremendous economic and agricultural damage (Soliman et al. 2012; Sastry 2013).
The damage caused by viroids to plants is thought to occur during the replication process of the viroid that infects the plants. Viroids are composed of 246–401 nt single-stranded circular non-coding RNAs (Hull 2014; R. Flores et al. 2015; Gago-Zachert 2016). Replication of viroids depends on their host plants through an RNA-based rolling circle process. This process generates double-stranded RNAs (dsRNAs) as intermediate products. The dsRNAs are cut into 21–24 nt fragments called small interfering RNAs (siRNAs) or microRNAs (miRNAs) by Dicer, a bidentate RNase III-like enzyme. siRNAs or miRNAs are then transferred to the RNA-induced silencing complex (RISC), which acts as a functional intermediate for RNA interference (RNAi). Consequently, RNAi causes mRNA cleavage and translational attenuation, resulting in a disease in the host plant.
Sequencing of small RNAs, including viroid-derived small RNAs (vd-sRNAs), siRNAs, and miRNAs from viroid-infected plants could offer insights regarding the mechanism of infection and eventually help in preventing plant damage. The common workflow for analyzing such sequencing data is to (i) limit the read-length (e.g., between 21 and 24 nt), (ii) align these reads to viroid genome sequences, and (iii) visualize coverage of alignment to identify the pathogenic region in the viroid.
Adkar-Purushothama et al. reported viroid-host interactions by infecting potato spindle tuber viroid (PSTVd) RG1 variant in tomato plants (Adkar-Purushothama, Iyer, and Perreault 2017). In their study, small RNAs were sequenced from viroid-infected tomato plants to investigate the expression profiles (i.e., alignment coverage) of vd-sRNAs. In this case study, we demonstrate the manner in which such expression profiles can be calculated using the CircSeqAlignTk package.
First, we prepare a directory to store the initial data,
intermediate, and final results. Then, we use the
download.file
function to download the genome sequence of
PSTVd RG1 and small RNA-Seq data of viroid-infected tomato plants that
are registered in GenBank with accession number U23058 and gene
expression omnibus (GEO) with accession number GSE70166,
respectively. The downloaded genome sequence is saved as
U23058.fa
and the small RNA-Seq data is saved as
GSM1717894_PSTVd_RG1.txt.gz
by running the following
scripts:
library(utils)
project_dpath <- tempdir()
dir.create(project_dpath)
options(timeout = 60 * 10)
download.file(url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=U23058&rettype=fasta&retmode=text',
destfile = file.path(project_dpath, 'U23058.fa'))
download.file(url = 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE70166&format=file',
destfile = file.path(project_dpath, 'GSE70166.tar'))
untar(file.path(project_dpath, 'GSE70166.tar'),
exdir = project_dpath)
Following the preparation, we specify the genome sequence of the
viroid (i.e., U23058.fa
) to build index files, and align
the reads of the small RNA-Seq data
(GSM1717894_PSTVd_RG1.txt.gz
) against the viroid genome.
Note that this process may take a few minutes, depending on machine
power.
genome_seq <- file.path(project_dpath, 'U23058.fa')
fq <- file.path(project_dpath, 'GSM1717894_PSTVd_RG1.txt.gz')
ref_index <- build_index(input = genome_seq,
output = file.path(project_dpath, 'index'))
aln <- align_reads(input = fq, index = ref_index,
output = file.path(project_dpath, 'align_results'))
The number of sequence reads that can align with the viroid genome sequences can be checked using the following script. From the alignment results saved in the cleaned BAM format file, we can see that the numbers of 71,545 + 15,500 = 87,045 and 11,994 + 1,571 = 13,565 reads in forward and reverse strands that were successfully aligned to the viroid genome sequences, respectively.
## n_reads aligned_fwd aligned_rev unaligned
## GSM1717894_PSTVd_RG1.txt.t1.bam 730499 71586 12000 646913
## GSM1717894_PSTVd_RG1.txt.t2.bam 646913 15504 1572 629837
## GSM1717894_PSTVd_RG1.txt.clean.t1.bam 83539 71545 11994 0
## GSM1717894_PSTVd_RG1.txt.clean.t2.bam 17071 15500 1571 0
## unsorted_reads
## GSM1717894_PSTVd_RG1.txt.t1.bam 0
## GSM1717894_PSTVd_RG1.txt.t2.bam 0
## GSM1717894_PSTVd_RG1.txt.clean.t1.bam 0
## GSM1717894_PSTVd_RG1.txt.clean.t2.bam 0
calc_coverage
and plot
can be used to
calculate and visualize the alignment coverage.
## L21 L22 L23 L24
## [1,] 10832 4201 119 351
## [2,] 10832 4201 119 351
## [3,] 10849 4204 125 354
## [4,] 11029 4827 127 363
## [5,] 14050 4852 135 404
## [6,] 14061 4887 142 414
## L21 L22 L23 L24
## [1,] 866 610 36 59
## [2,] 866 610 36 59
## [3,] 868 610 36 59
## [4,] 870 611 36 60
## [5,] 869 609 36 60
## [6,] 862 609 35 60
We can confirm that the results with the CircSeqAlignTk package are the same as the results shown in Figure 1B of the original paper (Adkar-Purushothama, Iyer, and Perreault 2017) based on the above figure @ref(fig:tutorialViroidCoverage).
The CircSeqAlignTk package also provides a graphical user interface (GUI) mode. To use the GUI mode, we can run the following script:
In the GUI mode, users are required to set a working directory (default is the current directory), a FASTQ file for small RNA-Seq data, and a FASTA file of the viroid genome sequence. After setting up the working directory and files, users can click the “Run” buttons to start the FASTQ quality control and alignment process. If required, the parameters of quality control and alignment can be adjusted. The results will be displayed in the application and saved in the working directory.
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.1 Rbowtie2_2.13.0 R.utils_2.12.3
## [4] R.oo_1.27.0 R.methodsS3_1.8.2 CircSeqAlignTk_1.9.0
## [7] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 sys_3.4.3
## [3] jsonlite_1.8.9 magrittr_2.0.3
## [5] GenomicFeatures_1.59.1 farver_2.1.2
## [7] rmarkdown_2.29 fs_1.6.5
## [9] BiocIO_1.17.1 zlibbioc_1.52.0
## [11] vctrs_0.6.5 memoise_2.0.1
## [13] Rsamtools_2.23.1 RCurl_1.98-1.16
## [15] htmltools_0.5.8.1 S4Arrays_1.7.1
## [17] progress_1.2.3 curl_6.0.1
## [19] SparseArray_1.7.2 shinyFiles_0.9.3
## [21] sass_0.4.9 bslib_0.8.0
## [23] htmlwidgets_1.6.4 httr2_1.0.7
## [25] plotly_4.10.4 cachem_1.1.0
## [27] buildtools_1.0.0 GenomicAlignments_1.43.0
## [29] igraph_2.1.1 mime_0.12
## [31] lifecycle_1.0.4 pkgconfig_2.0.3
## [33] Matrix_1.7-1 R6_2.5.1
## [35] fastmap_1.2.0 GenomeInfoDbData_1.2.13
## [37] MatrixGenerics_1.19.0 shiny_1.9.1
## [39] digest_0.6.37 colorspace_2.1-1
## [41] ShortRead_1.65.0 AnnotationDbi_1.69.0
## [43] S4Vectors_0.45.2 GenomicRanges_1.59.1
## [45] RSQLite_2.3.8 hwriter_1.3.2.1
## [47] labeling_0.4.3 filelock_1.0.3
## [49] fansi_1.0.6 httr_1.4.7
## [51] abind_1.4-8 compiler_4.4.2
## [53] withr_3.0.2 bit64_4.5.2
## [55] BiocParallel_1.41.0 DBI_1.2.3
## [57] biomaRt_2.63.0 Rhisat2_1.23.0
## [59] rappdirs_0.3.3 DelayedArray_0.33.2
## [61] rjson_0.2.23 tools_4.4.2
## [63] httpuv_1.6.15 glue_1.8.0
## [65] restfulr_0.0.15 promises_1.3.2
## [67] grid_4.4.2 generics_0.1.3
## [69] gtable_0.3.6 tidyr_1.3.1
## [71] data.table_1.16.2 hms_1.1.3
## [73] xml2_1.3.6 utf8_1.2.4
## [75] XVector_0.47.0 BiocGenerics_0.53.3
## [77] pillar_1.9.0 stringr_1.5.1
## [79] later_1.4.1 dplyr_1.1.4
## [81] BiocFileCache_2.15.0 lattice_0.22-6
## [83] rtracklayer_1.67.0 bit_4.5.0
## [85] deldir_2.0-4 tidyselect_1.2.1
## [87] maketools_1.3.1 Biostrings_2.75.1
## [89] knitr_1.49 IRanges_2.41.1
## [91] SummarizedExperiment_1.37.0 stats4_4.4.2
## [93] xfun_0.49 Biobase_2.67.0
## [95] matrixStats_1.4.1 stringi_1.8.4
## [97] UCSC.utils_1.3.0 lazyeval_0.2.2
## [99] yaml_2.3.10 evaluate_1.0.1
## [101] codetools_0.2-20 interp_1.1-6
## [103] tibble_3.2.1 BiocManager_1.30.25
## [105] cli_3.6.3 xtable_1.8-4
## [107] munsell_0.5.1 jquerylib_0.1.4
## [109] Rcpp_1.0.13-1 GenomeInfoDb_1.43.2
## [111] dbplyr_2.5.0 png_0.1-8
## [113] XML_3.99-0.17 RUnit_0.4.33
## [115] parallel_4.4.2 blob_1.2.4
## [117] prettyunits_1.2.0 latticeExtra_0.6-30
## [119] jpeg_0.1-10 SGSeq_1.41.0
## [121] bitops_1.0-9 pwalign_1.3.0
## [123] txdbmaker_1.3.1 viridisLite_0.4.2
## [125] scales_1.3.0 purrr_1.0.2
## [127] crayon_1.5.3 rlang_1.1.4
## [129] KEGGREST_1.47.0 shinyjs_2.1.0