The rhinotypeR package is designed to automate the
genotyping of rhinoviruses using the VP4/2 genomic region. Having worked
on rhinoviruses for a few years, I noticed that assigning genotypes
after sequencing was particularly laborious, and needed several manual
interventions. We, therefore, developed this package to address this
challenge by streamlining the process.
It provides:
pairwiseDistances(),
overallMeanDistance())assignTypes())plotFrequency(),
plotDistances(), plotTree(),
SNPeek(), plotAA())alignToRefs() for optional alignment in
R; deleteMissingDataSites() for global gap deletion)getPrototypeSeqs() if you prefer
to align externally)This vignette walks through two practical workflows:
A. Align externally (Your tool of choice) with
getPrototypeSeqs() → merge with new data, align with tool
of choice → import → assignTypes().
B. Align in R (with packaged prototypes) using
alignToRefs() → assignTypes().
Tip: Genotype assignment relies on distances to
prototype strains and requires that those prototypes are present in the
alignment you pass to assignTypes(). The
alignToRefs() helper makes this easy by appending packaged
prototypes and aligning jointly.
You can install rhinotypeR from Bioconductor using:
plotAA())alignToRefs()) or outside R and then import
(e.g., with Biostrings::readDNAStringSet())Prefer your own alignment tool? Use getPrototypeSeqs()
to export the packaged prototypes to your local device. These should
then be combined with the newly generated sequences, aligned using
suitable software, curated and imported into R. There are a multitude of
existing alignment software including MAFFT and MUSCLE.
For example, to download to the Desktop directory:
getPrototypeSeqs("~/Desktop")
# You will get "~/Desktop/RVRefs.fasta"
# 2) Combine RVRefs.fasta with your new sequences and align in your tool (e.g., MAFFT).
# 3) Manually curate (trim to VP4/2 span, resolve poor columns), then save as FASTA.Use the Biostrings package to read the curated alignment FASTA file into R:
Use this when you want a fully scripted path in R.
alignToRefs() appends packaged references and runs
msa::msa() using ClustalW/ClustalOmega/MUSCLE. It can
optionally crop the final alignment to the non-gap span of a chosen
prototype (trimToRef = TRUE, default; anchor with
refName).
# Use package dataset: rhinovirusVP4
# Align user sequences + prototypes in R (choose method)
aln <- alignToRefs(
seqData = rhinovirusVP4,
method = "ClustalW", # or "ClustalOmega", "Muscle"
trimToRef = TRUE, # crop to reference non-gap span
refName = "JN855971.1_A107" # default anchor; change if desired
)
alnWhy trimming and gap handling matter:
trimToRef = TRUE crops the alignment to the exact span
covered by a chosen prototype — helpful when user reads spill over the
target regiondeleteGapsGlobally() performs global deletion (drop any
column with a gap in any sequence). This is different from pairwise
deletion used within distance calculations. Use with care — this
modifies the alignment# Input A
res <- assignTypes(aln_curated, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105)## Computing: [========================================] 100% (done)
## query assignedType distance reference
## 1 MT177836.1 unassigned 0.11190476 AY040242.1_B97
## 2 MT177837.1 unassigned 0.11190476 AY040242.1_B97
## 3 MT177838.1 B99 0.08809524 AF343652.1_B99
## 4 MT177793.1 B42 0.07380952 AY016404.1_B42
## 5 MT177794.1 B106 0.05617978 KP736587.1_B106
## 6 MT177795.1 B106 0.05912596 KP736587.1_B106
# OR similarly for input B
res <- assignTypes(aln, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105)
head(res)Assigning types: rules and outputs
assignTypes() compares each query to prototypes and
returns:
query (sequence ID)assignedType (or “unassigned”)distance (to nearest prototype — always reported, even
if unassigned)reference (prototype accession/name used)Thresholds: We default to
threshold = 0.105 (~10.5%) in line with common practice for
VP4/2 (A/C) and close to B; adjust per your analysis or species.
# Distances among all sequences
d <- pairwiseDistances(
fastaData = rhinovirusVP4,
model = "IUPAC",
deleteGapsGlobally = FALSE # set TRUE to apply global deletion inside the function
)## Computing: [========================================] 100% (done)
The distance matrix looks like:
## MT177836.1 MT177837.1 MT177838.1 MT177793.1 MT177794.1 MT177795.1
## MT177836.1 0.0000000 0.0000000 0.2380952 0.2119048 0.1938202 0.2113022
## MT177837.1 0.0000000 0.0000000 0.2380952 0.2119048 0.1938202 0.2113022
## MT177838.1 0.2380952 0.2380952 0.0000000 0.1476190 0.2191011 0.2260442
## MT177793.1 0.2119048 0.2119048 0.1476190 0.0000000 0.2050562 0.2186732
## MT177794.1 0.1938202 0.1938202 0.2191011 0.2050562 0.0000000 0.0000000
## MT177796.1
## MT177836.1 0.21190476
## MT177837.1 0.21190476
## MT177838.1 0.22380952
## MT177793.1 0.21666667
## MT177794.1 0.01404494
To visualize the pairwise distances using a heatmap or a phylogenetic tree:
## Computing: [========================================] 100% (done)
## [1] 0.3077591
## Simple tree (from distances)
sampled_distances <- d[1:30, 1:30]
plotTree(sampled_distances, hang = -1, cex = 0.6,
main = "A simple tree", xlab = "", ylab = "Genetic distance")The SNPeek() function visualizes single nucleotide
polymorphisms (SNPs) in the sequences, with a selected sequence acting
as the reference. To specify the reference sequence, move it to the
bottom of the alignment before importing into R. Substitutions are
color-coded by nucleotide:
SNPeek() (nucleotide) and plotAA()
(protein) support zooming and highlighting. To show all sequence names,
increase the plotting device height (RStudio plot pane or
png(height=...)).
# AA view (requires an AAStringSet)
## Option 1 -- read an aligned amino acid sequence
aa_seq <- Biostrings::readAAStringSet(system.file("extdata", "test_translated.fasta", package = "rhinotypeR"))
plotAA(aa_seq)## Option 2 -- translating DNA
# Remove gaps before translate()
aln_no_gaps <- Biostrings::DNAStringSet(
gsub("-", "", as.character(rhinovirusVP4))
)
#translate
aa <- Biostrings::translate(aln_no_gaps)
aa_aln <- msa::msa(aa) # align as plotAA expects equal width
aa_aln <- as(aa_aln, "AAStringSet") # transform to AAStringSet
plotAA(aa_aln)Tip: show_only_highlighted = TRUE
focuses on chosen rows while keeping mismatch context relative to the
reference.
assignTypes() will
error if prototypes are not present in your alignment. Use Option A
(alignToRefs()) or Option B
(getPrototypeSeqs() + external align).trimToRef = TRUE and select an appropriate
refName that spans your intended VP4/2 region.deleteMissingDataSites() (global deletion), or use a
pairwise deletion model where appropriate in distance routines.highlight_* options and
show_only_highlighted = TRUE.The rhinotypeR package simplifies the process of
genotyping rhinoviruses and analyzing their genetic data. By automating
various steps and providing visualization tools, it enhances the
efficiency and accuracy of rhinovirus epidemiological studies.
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rhinotypeR_1.7.0 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.3.2 sass_0.4.10 generics_0.1.4
## [4] stringi_1.8.7 lattice_0.22-9 digest_0.6.39
## [7] magrittr_2.0.5 evaluate_1.0.5 grid_4.6.0
## [10] iterators_1.0.14 fastmap_1.2.0 seqinr_4.2-44
## [13] foreach_1.5.2 doParallel_1.0.17 jsonlite_2.0.0
## [16] ape_5.8-1 BiocManager_1.30.27 purrr_1.2.2
## [19] Biostrings_2.81.2 ade4_1.7-24 codetools_0.2-20
## [22] jquerylib_0.1.4 cli_3.6.6 rlang_1.2.0
## [25] crayon_1.5.3 XVector_0.53.0 cachem_1.1.0
## [28] yaml_2.3.12 tools_4.6.0 parallel_4.6.0
## [31] dplyr_1.2.1 BiocGenerics_0.59.6 msa_1.45.0
## [34] buildtools_1.0.0 vctrs_0.7.3 R6_2.6.1
## [37] stats4_4.6.0 lifecycle_1.0.5 stringr_1.6.0
## [40] pwalign_1.9.1 Seqinfo_1.3.0 S4Vectors_0.51.3
## [43] IRanges_2.47.2 MASS_7.3-65 pkgconfig_2.0.3
## [46] bslib_0.11.0 pillar_1.11.1 glue_1.8.1
## [49] Rcpp_1.1.1-1.1 xfun_0.57 tibble_3.3.1
## [52] GenomicRanges_1.65.0 tidyselect_1.2.1 sys_3.4.3
## [55] knitr_1.51 htmltools_0.5.9 nlme_3.1-169
## [58] rmarkdown_2.31 maketools_1.3.2 compiler_4.6.0
## [61] MSA2dist_1.17.0