| Title: | Rhinovirus genotyping |
|---|---|
| Description: | "rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies. |
| Authors: | Martha Luka [aut, cre] (ORCID: <https://orcid.org/0000-0001-6217-4426>), Ruth Nanjala [aut], Winfred Gatua [aut], Wafaa M. Rashed [aut], Olaitan Awe [aut] |
| Maintainer: | Martha Luka <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.7.0 |
| Built: | 2026-05-29 10:14:19 UTC |
| Source: | https://github.com/bioc/rhinotypeR |
Combines imported sequences with the prototype rhinovirus references shipped
in rhinotypeR. Performs a multiple sequence alignment using ClustalW,
ClustalOmega, or MUSCLE. The aligned result is returned as a
DNAStringSet with equal-width sequences, suitable for downstream
distance calculations (e.g., pairwiseDistances).
If trimToRef = TRUE, the final alignment is cropped to the
non-gap span of a chosen reference sequence.
alignToRefs( seqData, method = c("ClustalW", "ClustalOmega", "Muscle"), trimToRef = TRUE, refName = "JN855971.1_A107", ... )alignToRefs( seqData, method = c("ClustalW", "ClustalOmega", "Muscle"), trimToRef = TRUE, refName = "JN855971.1_A107", ... )
seqData |
A |
method |
Character; one of |
trimToRef |
|
refName |
|
... |
Additional arguments forwarded to |
Packaged prototypes are loaded from
system.file("extdata","prototypes.fasta", package = "rhinotypeR").
User sequences are appended and jointly aligned with msa::msa. The
returned DNAStringSet has equal width. When trimToRef = TRUE,
the alignment is cropped to the range covered by non-gap positions in
refName; internal gaps within that span are preserved. For global
removal of gap-containing columns across all sequences, see
deleteMissingDataSites().
If you prefer to align/curate outside R, use
getPrototypeSeqs to copy the prototype references to disk and
import your curated alignment later.
A Biostrings::DNAStringSet of aligned sequences (user + prototypes),
all with identical width; if trimToRef = TRUE, cropped to the chosen
reference’s non-gap span.
Method selection: msa::msa expects a single method
string; this wrapper enforces that with match.arg().
Homology: Ensure sequences are homologous and from the same genomic region; MSA quality degrades on mixed loci.
Result class: The function returns a DNAStringSet
(not a DNAMultipleAlignment).
getPrototypeSeqs, pairwiseDistances,
assignTypes
seqs_path <- system.file("extdata", "test.fasta", package = "rhinotypeR") seqs <- Biostrings::readDNAStringSet(seqs_path) aln_trim <- alignToRefs(seqs, method = "ClustalW", trimToRef = TRUE) # Keep full joint alignment (no trimming) aln_full <- alignToRefs(seqs, method = "ClustalW", trimToRef = FALSE) # Use a different prototype as trimming anchor # aln_sel <- alignToRefs(seqs, method = "Muscle", trimToRef = TRUE, # refName = "AF343653.1_B27")seqs_path <- system.file("extdata", "test.fasta", package = "rhinotypeR") seqs <- Biostrings::readDNAStringSet(seqs_path) aln_trim <- alignToRefs(seqs, method = "ClustalW", trimToRef = TRUE) # Keep full joint alignment (no trimming) aln_full <- alignToRefs(seqs, method = "ClustalW", trimToRef = FALSE) # Use a different prototype as trimming anchor # aln_sel <- alignToRefs(seqs, method = "Muscle", trimToRef = TRUE, # refName = "AF343653.1_B27")
Compares query rhinovirus sequences to prototype reference strains using pairwise genetic distances and assigns a genotype when the nearest distance is below a user-specified threshold. If the nearest distance exceeds the threshold, the sequence is returned as '"unassigned"' **but the actual nearest distance and prototype reference are still reported**. This enables users to inspect borderline cases and optionally apply relaxed thresholds (e.g., up to 0.11) to account for evolving viral diversity or intra-type variation.
assignTypes( fastaData, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105 )assignTypes( fastaData, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105 )
fastaData |
A [Biostrings::DNAStringSet-class] containing an **aligned** VP4/2 nucleotide alignment. The alignment must include the rhinovirus prototype sequences (same accessions as shipped with the package). If your alignment does not already include prototypes, you have two options:
|
model |
Character string specifying the substitution model used by [pairwiseDistances()] for distance calculation. Defaults to '"IUPAC"'. |
deleteGapsGlobally |
Logical. If 'TRUE', sites containing gaps ('-') are removed across all sequences before distance calculation. Default: 'FALSE'. |
threshold |
Numeric. Distance threshold for genotype assignment. Defaults to '0.105' (approximately 10.5 McIntyre et al. (2013). Use '0.095' for species B if desired. |
- Prototype accessions are distributed with the package and loaded from 'inst/extdata/prototypes.rda' (object 'prototypes'). - The function checks that all prototypes are present in 'fastaData'. If not, it stops with guidance to use [alignToRefs()] or [getPrototypeSeqs()]. - Distances are computed via [pairwiseDistances()] on the provided alignment. - Above-threshold queries are labeled '"unassigned"' while still reporting the nearest prototype and its distance to support interpretation and threshold adjustment.
A 'data.frame' with columns:
Query sequence name.
Assigned rhinovirus type, or '"unassigned"'.
Nearest pairwise distance to any prototype (reported even when unassigned).
Prototype accession corresponding to the nearest distance.
McIntyre, C. L., et al. (2013). Proposals for the classification of human rhinovirus species A, B and C into genotypically assigned types. Journal of General Virology, 94(8), 1791–1806.
[alignToRefs()] for in-R alignment with packaged prototypes; [getPrototypeSeqs()] to export prototypes for external alignment; [pairwiseDistances()] for distance calculation.
if (interactive()) { # Load example alignment shipped with the package test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Assign types # Note: The input must include the prototype sequences. # If your alignment does not already contain prototypes, # use alignToRefs() before calling assignTypes() OR # use getPrototypeSeqs() to download the prototype sequences, # combine with your new sequences and align before importing to R try({ res <- assignTypes(fastaD, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105) head(res) }) }if (interactive()) { # Load example alignment shipped with the package test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Assign types # Note: The input must include the prototype sequences. # If your alignment does not already contain prototypes, # use alignToRefs() before calling assignTypes() OR # use getPrototypeSeqs() to download the prototype sequences, # combine with your new sequences and align before importing to R try({ res <- assignTypes(fastaD, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105) head(res) }) }
Computes the absolute number of single-nucleotide differences (SNPs) between all pairs of sequences in an aligned [Biostrings::DNAStringSet-class] object. Optionally removes any alignment columns that contain gaps or ambiguous bases before counting.
countSNPs(fastaData, deleteGapsGlobally = FALSE)countSNPs(fastaData, deleteGapsGlobally = FALSE)
fastaData |
A [Biostrings::DNAStringSet-class] of aligned DNA sequences. Sequences must have identical width. |
deleteGapsGlobally |
Logical. If 'TRUE', sites containing gaps ('-' or '.') or ambiguous bases ('"N"', '"?"') are removed across all sequences before SNP counting. Default: 'FALSE'. |
A square integer matrix of SNP counts with sequence names as row and column dimnames.
[pairwiseDistances()] for distances; [SNPeek()], [plotAA()] for viz.
if (interactive()) { aln <- Biostrings::DNAStringSet(c(Seq1="ATGC", Seq2="ATGT", Seq3="ATGA")) countSNPs(aln) countSNPs(aln, deleteGapsGlobally = FALSE) }if (interactive()) { aln <- Biostrings::DNAStringSet(c(Seq1="ATGC", Seq2="ATGT", Seq3="ATGA")) countSNPs(aln) countSNPs(aln, deleteGapsGlobally = FALSE) }
Copies the packaged rhinovirus prototype strains (‘prototypes.fasta’) from rhinotypeR into a user-specified directory as ‘RVRefs.fasta’. Downloading to local storage is **not required** to use the package.
getPrototypeSeqs(destinationFolder, overwrite = TRUE)getPrototypeSeqs(destinationFolder, overwrite = TRUE)
destinationFolder |
|
overwrite |
|
Users have two equivalent workflows:
In-R workflow (no download): use
alignToRefs to align your sequences against the
packaged reference prototypes directly in R, then run
assignTypes on the resulting alignment.
External-tools workflow (with download): use
getPrototypeSeqs to save ‘RVRefs.fasta’ locally,
combine it with your new sequences, and perform alignment
using your preferred external tool. You can then bring the aligned
FASTA back into R for genotype assignment.
Internally, this function uses system.file to locate the
packaged prototype file and file.copy to copy it into the
specified directory. If you are following the in-R workflow, you can skip this
function entirely and proceed with alignToRefs followed by
assignTypes.
Prints a message noting the destination directory.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
if (interactive()) { # --- In-R workflow (no download) --- # aln <- alignToRefs(query_fasta = "my_new_sequences.fasta", method = "Muscle") # typesDF <- assignTypes(aln) # --- External-tools workflow (with download) --- dest_dir <- tempdir() # specify a destination directory getPrototypeSeqs(destinationFolder = dest_dir) # writes RVRefs.fasta # Now combine RVRefs.fasta with your new sequences and align using # external tools (e.g., MAFFT, MUSCLE). Then read the aligned FASTA # back into R and proceed with assignTypes(). list.files(dest_dir) }if (interactive()) { # --- In-R workflow (no download) --- # aln <- alignToRefs(query_fasta = "my_new_sequences.fasta", method = "Muscle") # typesDF <- assignTypes(aln) # --- External-tools workflow (with download) --- dest_dir <- tempdir() # specify a destination directory getPrototypeSeqs(destinationFolder = dest_dir) # writes RVRefs.fasta # Now combine RVRefs.fasta with your new sequences and align using # external tools (e.g., MAFFT, MUSCLE). Then read the aligned FASTA # back into R and proceed with assignTypes(). list.files(dest_dir) }
Computes the overall mean pairwise distance between all sequences in an alignment. Optionally deletes columns containing gaps across all sequences before computing the distances.
overallMeanDistance( fastaData, model = "IUPAC", deleteGapsGlobally = FALSE, ... )overallMeanDistance( fastaData, model = "IUPAC", deleteGapsGlobally = FALSE, ... )
fastaData |
A DNAStringSet containing aligned nucleotide sequences. |
model |
Character string specifying the substitution model to use.
Defaults to '"IUPAC"'. Other options are passed to |
deleteGapsGlobally |
Logical; if |
... |
Additional arguments passed to |
The function computes all pairwise distances using
dnastring2dist, extracts the distance matrix,
and reports the mean of the off-diagonal elements.
A single numeric value giving the overall mean pairwise distance across all sequence comparisons (excluding diagonal/self comparisons).
pairwiseDistances, assignTypes
# Load example alignment test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Compute mean distance (using default IUPAC model) overallMeanDistance(fastaD) # Compute mean distance with gaps deleted overallMeanDistance(fastaD, deleteGapsGlobally = TRUE)# Load example alignment test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Compute mean distance (using default IUPAC model) overallMeanDistance(fastaD) # Compute mean distance with gaps deleted overallMeanDistance(fastaD, deleteGapsGlobally = TRUE)
Calculates pairwise genetic distances between all sequences in an aligned [Biostrings::DNAStringSet-class] object using [MSA2dist::dnastring2dist()]. Supports global gap deletion or default pairwise gap masking.
pairwiseDistances(fastaData, model = "IUPAC", deleteGapsGlobally = FALSE, ...)pairwiseDistances(fastaData, model = "IUPAC", deleteGapsGlobally = FALSE, ...)
fastaData |
A [Biostrings::DNAStringSet-class] object containing aligned DNA sequences. Sequences must all be the same width (e.g., from [msa::msa()] or [alignToRefs()]). |
model |
A character string specifying the substitution model to use. Defaults to '"IUPAC"', which accounts for ambiguous bases. Other models are passed to [ape::dist.dna()] through [MSA2dist::dnastring2dist()]. |
deleteGapsGlobally |
Logical. If 'TRUE', all columns containing at least one gap ('-') are removed prior to distance calculation (global gap deletion). If 'FALSE' (default), gaps are handled pairwise during distance calculation. |
... |
Additional arguments passed to [ape::dist.dna()] via [MSA2dist::dnastring2dist()]. |
- If 'deleteGapsGlobally = TRUE', all gapped sites are removed across the entire alignment using [deleteMissingDataSites()]. - Otherwise, gaps are masked on a pairwise basis by [MSA2dist::dnastring2dist()].
A numeric matrix of pairwise genetic distances, with row and column names corresponding to the sequence names in 'fastaData'.
* [countSNPs()] for converting distances to SNP counts. * [deleteMissingDataSites()] for global gap trimming.
if (interactive()) { aln <- DNAStringSet(c( Seq1 = "ATGC", Seq2 = "ATGT", Seq3 = "ATGA" )) # Default: IUPAC model with pairwise gap masking pairwiseDistances(aln) # Jukes and Cantor model pairwiseDistances(aln, model = "JC69") }if (interactive()) { aln <- DNAStringSet(c( Seq1 = "ATGC", Seq2 = "ATGT", Seq3 = "ATGA" )) # Default: IUPAC model with pairwise gap masking pairwiseDistances(aln) # Jukes and Cantor model pairwiseDistances(aln, model = "JC69") }
'plotAA()' plots per–position amino acid differences between an aligned set of protein sequences and a chosen reference. It reuses the same cache and fast redraw machinery as [SNPeek()], with an amino acid color map grouping residues into biochemical classes:
* Positively charged (Arg, His, Lys) – red * Negatively charged (Asp, Glu) – blue * Polar uncharged (Ser, Thr, Asn, Gln) – green * Nonpolar / aromatic / special (all others) – gold * Other/unknown (X, gaps) – grey
By default (no 'xlim'/'center'+'window'), the function plots the full alignment span and all sequences. Users can zoom to regions of interest and optionally emphasize specific sequence IDs without hiding the others.
plotAA( aln_aa_set, ref_name = NULL, xlim = NULL, center = NULL, window = NULL, showLegend = FALSE, colorMapAA = c(R = "red2", H = "red2", K = "red2", D = "blue2", E = "blue2", S = "green3", T = "green3", N = "green3", Q = "green3", A = "gold", V = "gold", I = "gold", L = "gold", M = "gold", F = "gold", W = "gold", P = "gold", G = "gold", Y = "gold", C = "gold"), colorFallback = "grey50", highlight_seqs = NULL, highlight_bg = grDevices::adjustcolor("#FFE082", alpha.f = 0.5), lwd_normal = 1, lwd_highlight = 2, seg_half_height = 0.35, y_cex = 0.8, line_col = "grey20", show_only_highlighted = FALSE )plotAA( aln_aa_set, ref_name = NULL, xlim = NULL, center = NULL, window = NULL, showLegend = FALSE, colorMapAA = c(R = "red2", H = "red2", K = "red2", D = "blue2", E = "blue2", S = "green3", T = "green3", N = "green3", Q = "green3", A = "gold", V = "gold", I = "gold", L = "gold", M = "gold", F = "gold", W = "gold", P = "gold", G = "gold", Y = "gold", C = "gold"), colorFallback = "grey50", highlight_seqs = NULL, highlight_bg = grDevices::adjustcolor("#FFE082", alpha.f = 0.5), lwd_normal = 1, lwd_highlight = 2, seg_half_height = 0.35, y_cex = 0.8, line_col = "grey20", show_only_highlighted = FALSE )
aln_aa_set |
A 'Biostrings::AAStringSet' of aligned amino acid sequences (all sequences must be the same width). |
ref_name |
Character scalar; the sequence name in 'aln_aa_set' to use as reference. If 'NULL', the last sequence is used. |
xlim |
Optional integer length-2 vector giving the protein window to display, e.g. 'c(5, 20)'. If omitted, the entire span is shown. |
center, window
|
Optional integers; alternatively specify a zoom by center position and window width (e.g., 'center = 150, window = 50'). Ignored if 'xlim' is provided. |
showLegend |
Logical; if 'TRUE', draws a compact legend for the five amino acid classes. Default: 'FALSE'. |
colorMapAA |
Named character vector mapping amino acids to colors. The default assigns one color per residue class (see Details). |
colorFallback |
Color used for any symbol not present in 'colorMapAA' (e.g., 'X', '-'). Default: '"grey50"'. |
highlight_seqs |
Optional character vector of sequence IDs (row names) to visually emphasize in the plot (adds a translucent row band and thicker mismatch strokes). Sequences are still shown unless 'show_only_highlighted = TRUE'. |
highlight_bg |
Fill color for emphasized rows (semi-transparent recommended). Default: translucent amber. |
lwd_normal, lwd_highlight
|
Line widths for mismatch segments in normal vs highlighted rows. Defaults: '1' and '2'. |
seg_half_height |
Vertical half-height of each mismatch segment (in row units). Default: '0.35'. |
y_cex |
Expansion factor for y-axis (sequence labels). Default: '0.8'. |
line_col |
Baseline axis/line color. Default: '"grey20"'. |
show_only_highlighted |
Logical; if 'TRUE', display only the rows whose IDs are listed in 'highlight_seqs'. Default: 'FALSE'. |
Internally, 'plotAA()' calls [compute_cache()] with an amino acid–specific color map, then passes the result to [plot_window()] for fast drawing. The returned cache can be reused for repeated zooms without recomputation.
When 'showLegend = TRUE', the legend shows only five biochemical classes (positively charged, negatively charged, polar uncharged, nonpolar/aromatic, and other).
Invisibly returns the precomputed cache (class '"SNPeekCache"') used for plotting. The primary output is a base R plot drawn to the active device.
[SNPeek()] for the nucleotide version; [compute_cache()], [plot_window()] for internals.
Other visualization:
SNPeek(),
plotDistances(),
plotFrequency(),
plotTree()
# Create a protein alignment aa <- Biostrings::AAStringSet(c( Ref = "MTEYKLVVVGYKL", S1 = "MTEYKLVILVVVG", S2 = "MTEYKLVVV-LVV" )) # 1) Full span, default reference (last sequence) plotAA(aa) # 2) Choose a reference and zoom plotAA(aa, ref_name = "Ref", xlim = c(3, 8)) # 3) Highlight a sequence of interest plotAA(aa, ref_name = "Ref", xlim = c(3, 8), highlight_seqs = "S1", showLegend = TRUE)# Create a protein alignment aa <- Biostrings::AAStringSet(c( Ref = "MTEYKLVVVGYKL", S1 = "MTEYKLVILVVVG", S2 = "MTEYKLVVV-LVV" )) # 1) Full span, default reference (last sequence) plotAA(aa) # 2) Choose a reference and zoom plotAA(aa, ref_name = "Ref", xlim = c(3, 8)) # 3) Highlight a sequence of interest plotAA(aa, ref_name = "Ref", xlim = c(3, 8), highlight_seqs = "S1", showLegend = TRUE)
Generates a heatmap from a pairwise distance matrix, typically produced by
pairwiseDistances.
plotDistances(distancesMatrix)plotDistances(distancesMatrix)
distancesMatrix |
A numeric matrix of pairwise distances, usually
produced by |
The function uses the base R heatmap function to visualize distances.
Row and column clustering are disabled to preserve the input ordering.
Colors are drawn from heat.colors(256). The scale is set to "none"
so the distances are displayed directly, not normalized by row or column.
Invisibly returns the object from stats::heatmap
(a list with components such as rowInd and colInd).
The primary output is the heatmap drawn to the active device.
Other visualization:
SNPeek(),
plotAA(),
plotFrequency(),
plotTree()
# Example using built-in dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Compute pairwise distances dist_mat <- pairwiseDistances(fastaD) # Plot heatmap of distances plotDistances(dist_mat)# Example using built-in dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Compute pairwise distances dist_mat <- pairwiseDistances(fastaD) # Plot heatmap of distances plotDistances(dist_mat)
Generates a barplot showing the frequency of assigned rhinovirus types. Optionally displays a legend indicating species classification (A, B, C, Other).
plotFrequency( assignedTypesDF, showLegend = FALSE, sort = FALSE, las = 2, cex_names = 0.8 )plotFrequency( assignedTypesDF, showLegend = FALSE, sort = FALSE, las = 2, cex_names = 0.8 )
assignedTypesDF |
A data frame, typically the output of |
showLegend |
Logical; if |
sort |
Logical; if |
las |
Integer; axis label style for type names (defaults to 2 = perpendicular for readability). |
cex_names |
Numeric; character expansion for x-axis names. Default 0.8. |
Types are grouped into species via the first letter of the type token
after stripping common prefixes like "RV-" or "RV".
Any "unassigned" (any case) is labeled as species "Other".
Colors:
A = blue
B = red
C = green
Other = grey
Invisibly returns a data frame with columns:
assignedType, count, species.
Other visualization:
SNPeek(),
plotAA(),
plotDistances(),
plotTree()
test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) assigned <- try(assignTypes(fastaD), silent = TRUE) if (!inherits(assigned, "try-error")) { plotFrequency(assigned, showLegend = TRUE) }test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) assigned <- try(assignTypes(fastaD), silent = TRUE) if (!inherits(assigned, "try-error")) { plotFrequency(assigned, showLegend = TRUE) }
Generates a simple phylogenetic tree/dendrogram from a pairwise distance matrix using hierarchical clustering (complete linkage by default).
plotTree(distance_matrix, ...)plotTree(distance_matrix, ...)
distance_matrix |
A numeric, symmetric matrix of pairwise distances (zeros on the diagonal). |
... |
Additional arguments passed to |
The function converts the matrix to a dist object,
performs hclust with method = "complete", and
plots the resulting dendrogram.
Invisibly returns the hclust object.
pairwiseDistances, plotDistances
Other visualization:
SNPeek(),
plotAA(),
plotDistances(),
plotFrequency()
# Example using built-in dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Compute pairwise distances dist_mat <- pairwiseDistances(fastaD) # Plot tree plotTree(dist_mat, lwd = 2)# Example using built-in dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") fastaD <- Biostrings::readDNAStringSet(test) # Compute pairwise distances dist_mat <- pairwiseDistances(fastaD) # Plot tree plotTree(dist_mat, lwd = 2)
The rhinotypeR package provides a toolkit for automating rhinovirus genotyping using the VP4/2 genomic region—streamlining alignment, genotype assignment, and visualization.
assignTypes: Assign sequences to rhinovirus genotypes
based on distances to prototype strains. Returns query ID, assigned type,
distance to nearest prototype, and reference used.
pairwiseDistances: Calculates pairwise genetic distances
between all sequences using specified evolutionary models (e.g., IUPAC).
overallMeanDistance: Computes the overall mean genetic
distance across all sequences as a measure of diversity.
alignToRefs: Aligns user sequences with packaged prototype
references using ClustalW, ClustalOmega, or MUSCLE. Optionally crops
the alignment to a reference span.
getPrototypeSeqs: Exports packaged prototype sequences
to a local directory for external alignment workflows.
deleteMissingDataSites: Performs global gap deletion
(removes any column containing gaps in any sequence).
plotDistances: Creates a heatmap of pairwise genetic distances.
plotTree: Generates a simple phylogenetic tree from a distance matrix.
SNPeek: Visualizes single nucleotide polymorphisms (SNPs)
with color-coded nucleotides (A=green, T=red, C=blue, G=yellow).
plotAA: Displays amino acid differences with zooming and highlighting.
Format: FASTA; DNA for VP4/2 (AA only for plotAA).
Region: VP4/2; sequences must be homologous.
Length/QC: Recommended 350 bp (typical 420 bp) and trimmed to the VP4/2 span.
Export prototypes: getPrototypeSeqs("~/Desktop")
Combine prototypes with your sequences
Align externally (e.g., MAFFT https://mafft.cbrc.jp/alignment/server/)
Import alignment: Biostrings::readDNAStringSet("alignment.fasta")
Assign genotypes: assignTypes(alignment)
# Align sequences with packaged prototypes aln <- alignToRefs( seqData = rhinovirusVP4, method = "ClustalW", trimToRef = TRUE, refName = "JN855971.1_A107" ) # Assign genotypes res <- assignTypes( aln, model = "IUPAC", deleteGapsGlobally = FALSE, threshold = 0.105 ) # Visualize results plotFrequency(res)
threshold: Default 0.105 (approximately 10.5%) for VP4/2 region, following proposals in McIntyre et al. (2013).
trimToRef: When TRUE, crops alignment to non-gap span of chosen prototype (helpful when reads extend beyond target region).
deleteGapsGlobally: Global deletion vs. pairwise deletion (distance models).
model: Evolutionary model for distance calculation (e.g., "IUPAC").
rhinovirusPrototypesVP4: VP4/2 prototype references as a
Biostrings::DNAStringSet (mirrors ‘inst/extdata/prototypes.fasta’).
Reported by McIntyre et al. (2013), Journal of General Virology, 94(8), 1791–1806.
doi:10.1099/vir.0.053686-0
rhinovirusVP4: A rhinovirus VP4/2 alignment as a
Biostrings::DNAStringSet for examples/tests; derived from
Luka et al. (2020), Open Forum Infectious Diseases, 7(10), ofaa385.
doi:10.1093/ofid/ofaa385
library(rhinotypeR) data(rhinovirusVP4, package = "rhinotypeR") aln <- alignToRefs( seqData = rhinovirusVP4, method = "ClustalW", trimToRef = TRUE ) res <- assignTypes(aln, model = "IUPAC") head(res) d <- pairwiseDistances(rhinovirusVP4, model = "IUPAC") plotDistances(d) sampled <- d[1:30, 1:30] plotTree(sampled, main = "Rhinovirus VP4/2 Tree") SNPeek(rhinovirusVP4) aa <- Biostrings::translate(rhinovirusVP4) plotAA(aa)
Prototypes missing error: Ensure prototypes are present. Use
alignToRefs() or getPrototypeSeqs().
Misaligned regions: Use trimToRef = TRUE and choose an
appropriate refName.
Gap issues: Consider global deletion with deleteMissingDataSites()
or pairwise deletion via distance models.
Truncated plots: Increase device height or use
show_only_highlighted = TRUE.
Maintainer: Martha Luka [email protected] (ORCID)
Authors:
Ruth Nanjala
Winfred Gatua
Wafaa M. Rashed
Olaitan Awe
alignToRefs, assignTypes, pairwiseDistances,
overallMeanDistance, plotDistances,
plotTree, SNPeek, plotAA,
getPrototypeSeqs
A FASTA bundle of rhinovirus prototype sequences used as references for genotyping (VP4/2 region). Provided as a convenience dataset for examples, vignettes, and tests.
data(rhinovirusPrototypesVP4)data(rhinovirusPrototypesVP4)
A [Biostrings::DNAStringSet-class] containing near-aligned VP4/2 prototype sequences. Sequence names are accessions (optionally with type labels).
These sequences mirror the prototypes shipped in ‘inst/extdata/prototypes.fasta’. You may use this object directly in workflows (e.g., to append to user sequences before alignment) or export prototypes to disk with [getPrototypeSeqs()].
McIntyre, C. L., Knowles, N. J., & Simmonds, P. (2013). Proposals for the classification of human rhinovirus species A, B and C into genotypically assigned types.
McIntyre, C. L., Knowles, N. J., & Simmonds, P. (2013). Proposals for the classification of human rhinovirus species A, B and C into genotypically assigned types. *Journal of General Virology*, 94(8), 1791–1806. doi:10.1099/vir.0.053686-0
data(rhinovirusPrototypesVP4) rhinovirusPrototypesVP4data(rhinovirusPrototypesVP4) rhinovirusPrototypesVP4
A VP4/2 alignment used in examples and tests. Useful for demonstrating 'alignToRefs()', 'pairwiseDistances()', and 'assignTypes()'.
data(rhinovirusVP4)data(rhinovirusVP4)
A [Biostrings::DNAStringSet-class] where all sequences have equal width (aligned). Names are sequence IDs.
This dataset is provided for illustration of the rhinotypeR workflow. For related cohort data and background context on respiratory viruses in a coastal Kenya school cohort, see the sources and references below.
Luka, M. M., et al. (2020). Molecular Epidemiology of Human Rhinovirus From 1-Year Surveillance Within a School Setting in Rural Coastal Kenya.
Luka, M. M., et al. (2020). Molecular Epidemiology of Human Rhinovirus
From 1-Year Surveillance Within a School Setting in Rural Coastal Kenya.
*Open Forum Infectious Diseases*, 7(10), ofaa385.
doi:10.1093/ofid/ofaa385
Adema, I. et al. (2020).Surveillance of respiratory viruses among children attending
a primary school in rural coastal Kenya (Wellcome Open Research, 5:63, v2).
doi:10.12688/wellcomeopenres.15703.2
data(rhinovirusVP4) rhinovirusVP4 # d <- pairwiseDistances(rhinovirusVP4, model = "IUPAC")data(rhinovirusVP4) rhinovirusVP4 # d <- pairwiseDistances(rhinovirusVP4, model = "IUPAC")
'SNPeek()' renders per–position nucleotide differences between an aligned set of DNA sequences and a chosen reference using a cached pre-computation for fast redraws. With no 'xlim'/'center'+'window', it plots the full genome span and **all** sequences. Users can zoom regions and optionally emphasize specific sequence IDs with(out) hiding the others.
SNPeek( aln_set, ref_name = NULL, xlim = NULL, center = NULL, window = NULL, showLegend = FALSE, colorMapNT = c(A = "green3", T = "red2", C = "blue2", G = "gold"), colorFallback = "grey50", highlight_seqs = NULL, highlight_bg = grDevices::adjustcolor("#FFE082", alpha.f = 0.5), lwd_normal = 1, lwd_highlight = 2, seg_half_height = 0.35, y_cex = 0.8, line_col = "grey20", show_only_highlighted = FALSE )SNPeek( aln_set, ref_name = NULL, xlim = NULL, center = NULL, window = NULL, showLegend = FALSE, colorMapNT = c(A = "green3", T = "red2", C = "blue2", G = "gold"), colorFallback = "grey50", highlight_seqs = NULL, highlight_bg = grDevices::adjustcolor("#FFE082", alpha.f = 0.5), lwd_normal = 1, lwd_highlight = 2, seg_half_height = 0.35, y_cex = 0.8, line_col = "grey20", show_only_highlighted = FALSE )
aln_set |
A ['Biostrings::DNAStringSet'] of aligned sequences (all sequences must have identical width). |
ref_name |
Character; the sequence name in 'aln_set' to use as reference. If 'NULL', the last sequence is used. |
xlim |
Optional integer length-2 vector giving the genomic window, e.g. 'c(100, 200)'. If omitted, the entire span is shown. |
center, window
|
Optionally specify a zoom by center position and window width (e.g., 'center = 5012, window = 600'). Ignored if 'xlim' is provided. |
showLegend |
Logical; if 'TRUE', draws a legend for the nucleotide color map. Default: 'FALSE'. |
colorMapNT |
Named character vector mapping nucleotides to colors. Default: 'c(A = "green3", T = "red2", C = "blue2", G = "gold")'. |
colorFallback |
Color for any non-ATCG symbol (e.g., 'N', '-'). Default: '"grey50"'. |
highlight_seqs |
Optional character vector of sequence IDs (row names) to emphasize (adds a translucent row band and thicker mismatch strokes). All sequences are still shown unless 'show_only_highlighted = TRUE'. |
highlight_bg |
Fill color for emphasized rows (use 'grDevices::adjustcolor()' for transparency). Default: 'adjustcolor("#FFE082", alpha.f = 0.5)'. |
lwd_normal, lwd_highlight
|
Line widths for mismatch segments in normal vs highlighted rows. Defaults: '1' and '2'. |
seg_half_height |
Vertical half-height of each mismatch segment (row units). Default: '0.35'. |
y_cex |
Expansion factor for y-axis (sequence labels). Default: '0.8'. |
line_col |
Baseline axis/line color. Default: '"grey20"'. |
show_only_highlighted |
Logical; if 'TRUE', display only rows whose IDs are in 'highlight_seqs'. Default: 'FALSE'. |
Internally, 'SNPeek()' builds a reusable cache of mismatch positions and color indices versus the reference, then subsets to the requested window before plotting for speed. The cache is returned invisibly so you can reuse it for interactive workflows without recomputation.
Invisibly returns the precomputed cache (class '"SNPeekCache"') used for plotting. The primary output is a base R plot.
'plotAA()' for the amino-acid analogue.
Other visualization:
plotAA(),
plotDistances(),
plotFrequency(),
plotTree()
fasta_file <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") aln <- Biostrings::readDNAStringSet(fasta_file) # Full span SNPeek(aln) # Zoom + highlight SNPeek(aln, xlim = c(100, 200), highlight_seqs = c("MT177780.1", "MT177798.1"), showLegend = TRUE)fasta_file <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") aln <- Biostrings::readDNAStringSet(fasta_file) # Full span SNPeek(aln) # Zoom + highlight SNPeek(aln, xlim = c(100, 200), highlight_seqs = c("MT177780.1", "MT177798.1"), showLegend = TRUE)