Introduction to rhinotypeR

Background

The rhinotypeR package is designed to simplify the genotyping of rhinoviruses using the VP4/2 genomic region. Having worked on rhinoviruses for a few years, I noticed that assigning genotypes after sequencing was particularly laborious and required several manual interventions. We therefore developed this package to streamline the process: it lets a user download prototype sequences, calculate pairwise genetic distances, and compare those distances to prototype strains for genotype assignment. It also provides visualization options such as frequency plots and simple phylogenetic trees.

Usage

Installing the package

You can install rhinotypeR from Bioconductor using:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("rhinotypeR")

Loading the package

library(rhinotypeR)

Example Workflow

  1. Download prototype sequences:

The getPrototypeSeqs function downloads the prototype sequences required for genotyping. These should then be combined with the newly generated sequences, aligned using suitable alignment software, and imported into R. For example, to download to the Desktop directory, one can run:

getPrototypeSeqs("~/Desktop")
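The combining step mentioned above happens outside rhinotypeR. As a minimal sketch, assuming both files are plain-text FASTA, they can be concatenated in base R before being passed to an aligner. The file names and toy sequences below are hypothetical stand-ins, not the names that getPrototypeSeqs writes.

```r
# Hypothetical file names; replace them with the files actually on disk.
prototype_fasta <- file.path(tempdir(), "prototypes.fasta")
new_fasta       <- file.path(tempdir(), "new_sequences.fasta")
combined_fasta  <- file.path(tempdir(), "combined_unaligned.fasta")

# Toy stand-ins so that the sketch runs on its own.
writeLines(c(">AF343653.1_B26", "ATGGGCGCTCAA"), prototype_fasta)
writeLines(c(">sample_1", "ATGGGAGCTCAG"), new_fasta)

# FASTA is plain text, so concatenating the lines of both files is enough.
combined <- c(readLines(prototype_fasta), readLines(new_fasta))
writeLines(combined, combined_fasta)

# Count records by counting header lines.
n_records <- sum(startsWith(combined, ">"))
n_records
```

The combined file still needs to be aligned with external software before it is read into R.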
  2. Read sequences:

Use the Biostrings package to read FASTA files containing sequence data. readDNAStringSet extracts the sequences and their headers; store the result in an object for downstream analysis.

sequences <- Biostrings::readDNAStringSet(system.file("extdata", "input_aln.fasta", package="rhinotypeR"))
  3. Visualize SNPs:

The SNPeek function visualizes single nucleotide polymorphisms (SNPs) in the sequences, with a selected sequence acting as the reference. To specify the reference sequence, move it to the bottom of the alignment before importing it into R. Substitutions are color-coded by nucleotide:

A = green

T = red

C = blue

G = yellow

SNPeek(sequences)
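To illustrate what "SNPs relative to the reference" means, here is a base-R sketch that lists the alignment positions at which each sequence differs from the last sequence in the alignment. It is an illustration on a toy alignment with made-up names, not the code SNPeek uses internally.

```r
# Toy alignment (equal-length strings); the last entry plays the reference.
aln <- c(seq_1 = "ATGCATGC",
         seq_2 = "ATGAATGT",
         ref   = "ATGCATGC")

# Split each sequence into single characters: one matrix row per sequence.
chars <- do.call(rbind, strsplit(aln, ""))

ref_row <- chars[nrow(chars), ]
queries <- chars[-nrow(chars), , drop = FALSE]

# For every query, list the alignment positions that differ from the reference.
snp_positions <- lapply(seq_len(nrow(queries)), function(i) {
  which(queries[i, ] != ref_row)
})
names(snp_positions) <- rownames(queries)
snp_positions
```

Here seq_1 matches the reference everywhere, while seq_2 differs at positions 4 and 8.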

  4. Calculate Pairwise Distances:

The pairwiseDistances function calculates genetic distances between sequences, using a specified evolutionary model.

distances <- pairwiseDistances(sequences, model = "p-distance", gapDeletion = TRUE)

The distance matrix looks like:

##                AF343653.1_B26 MT177836.1 MT177837.1 AY040242.1_B97
## AF343653.1_B26      0.0000000  0.2435897  0.2435897      0.2243590
## MT177836.1          0.2435897  0.0000000  0.0000000      0.1185897
## MT177837.1          0.2435897  0.0000000  0.0000000      0.1185897
## AY040242.1_B97      0.2243590  0.1185897  0.1185897      0.0000000
## AF343654.1_B27      0.2147436  0.1698718  0.1698718      0.1794872
##                AF343654.1_B27 AY040239.1_B93 AY040240.1_B84
## AF343653.1_B26      0.2147436      0.2435897      0.2115385
## MT177836.1          0.1698718      0.1506410      0.1923077
## MT177837.1          0.1698718      0.1506410      0.1923077
## AY040242.1_B97      0.1794872      0.1634615      0.2083333
## AF343654.1_B27      0.0000000      0.1185897      0.1891026
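For intuition, the p-distance between two aligned sequences is the proportion of compared sites at which they differ. The sketch below computes it in base R on a toy alignment. It assumes that a site is skipped whenever either of the two sequences has a gap there, which may not match how the gapDeletion argument is implemented in pairwiseDistances.

```r
# p-distance between two aligned sequences of equal length.
# Assumption: sites with a gap ("-") in either sequence are ignored.
p_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  keep <- x != "-" & y != "-"
  sum(x[keep] != y[keep]) / sum(keep)
}

# Toy alignment with hypothetical sequence names.
aln <- c(s1 = "ATGCATGCAT",
         s2 = "ATGAATGCAT",
         s3 = "AT-CATGGAT")

# Fill a symmetric matrix with all pairwise distances.
idx <- seq_along(aln)
d <- outer(idx, idx, Vectorize(function(i, j) p_distance(aln[i], aln[j])))
dimnames(d) <- list(names(aln), names(aln))
round(d, 4)
```

In this toy case s1 and s2 differ at 1 of 10 sites (0.1), and s1 and s3 differ at 1 of the 9 sites that remain after the gapped position is dropped.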
  5. Assign Genotypes:

The assignTypes function assigns genotypes to the sequences by comparing genetic distances to prototype strains.

genotypes <- assignTypes(sequences, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)

head(genotypes)
##                 query assignedType   distance       reference
## MT177836.1 MT177836.1   unassigned         NA  AY040242.1_B97
## MT177837.1 MT177837.1   unassigned         NA  AY040242.1_B97
## MT177838.1 MT177838.1          B99 0.08974359  AF343652.1_B99
## MT177793.1 MT177793.1          B42 0.08012821  AY016404.1_B42
## MT177794.1 MT177794.1         B106 0.05769231 KP736587.1_B106
## MT177795.1 MT177795.1         B106 0.05769231 KP736587.1_B106
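The threshold logic can be sketched in base R as follows: find the nearest prototype for each query and keep the assignment only if the distance is below the threshold. The distances are made up, the strict `<` comparison is an assumption, and the code is an illustration rather than the implementation inside assignTypes.

```r
# Hypothetical query-by-prototype distances (rows = queries, columns = prototypes).
d <- matrix(c(0.090, 0.210,
              0.190, 0.080,
              0.150, 0.119),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("q1", "q2", "q3"),
                            c("AF343652.1_B99", "AY016404.1_B42")))

threshold <- 0.105

# Nearest prototype for each query and the corresponding distance.
nearest_idx  <- apply(d, 1, which.min)
nearest_dist <- d[cbind(seq_len(nrow(d)), nearest_idx)]
nearest_ref  <- colnames(d)[nearest_idx]

# Assumption: a type is assigned only when the distance is strictly below the threshold.
assigned <- nearest_dist < threshold

result <- data.frame(
  query        = rownames(d),
  # The type label is taken as the text after the last underscore in the prototype name.
  assignedType = ifelse(assigned, sub(".*_", "", nearest_ref), "unassigned"),
  distance     = ifelse(assigned, nearest_dist, NA_real_),
  reference    = nearest_ref
)
result
```

As in the output above, a query whose nearest prototype is farther than the threshold is reported as unassigned with an NA distance, while the nearest reference is still shown.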
  6. Plot Results:

The plotFrequency function visualizes the frequency of assigned genotypes. This function uses the output of assignTypes as input.

plotFrequency(genotypes)
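If the underlying counts are needed as numbers rather than as a plot, they can be tabulated directly from the assignedType column. The sketch below uses a small hypothetical data frame with the same column names as the assignTypes output shown earlier; the base-R barplot is only a stand-in and does not reproduce the styling of plotFrequency.

```r
# Hypothetical genotype calls with the same columns as the assignTypes output.
genotype_calls <- data.frame(
  query        = paste0("q", 1:6),
  assignedType = c("B99", "B42", "B106", "B106", "unassigned", "B42")
)

# Count how many sequences were assigned to each type, most frequent first.
type_counts <- sort(table(genotype_calls$assignedType), decreasing = TRUE)
type_counts

# Simple base-R bar chart of the counts.
barplot(type_counts, las = 2, ylab = "Number of sequences")
```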

The plotDistances function visualizes pairwise genetic distances in a heatmap. This function uses the output of pairwiseDistances as input.

plotDistances(distances)

The plotTree function plots a simple phylogenetic tree. This function uses the output of pairwiseDistances as input.

# Sub-sample the first 30 sequences to keep the tree readable
sampled_distances <- distances[1:30,1:30]

plotTree(sampled_distances, hang = -1, cex = 0.6, main = "A simple tree", xlab = "", ylab = "Genetic distance")
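A distance-based tree of this kind can also be drawn with base R alone. The sketch below builds a dendrogram from a small hypothetical distance matrix using hierarchical clustering. Whether plotTree uses hclust internally, and with which linkage method, is an assumption here; the hang and cex arguments merely suggest a similar approach.

```r
# Small hypothetical symmetric distance matrix.
ids <- c("seq_A", "seq_B", "seq_C")
d <- matrix(c(0.00, 0.10, 0.24,
              0.10, 0.00, 0.22,
              0.24, 0.22, 0.00),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Hierarchical clustering on the distances (complete linkage chosen for illustration).
tree <- hclust(as.dist(d), method = "complete")

# Draw the dendrogram with the same graphical arguments as the plotTree call above.
plot(tree, hang = -1, cex = 0.6, main = "A simple tree",
     xlab = "", ylab = "Genetic distance")
```

The two closest sequences (seq_A and seq_B) are joined first, and seq_C joins at the larger distance.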

Conclusion

The rhinotypeR package simplifies the process of genotyping rhinoviruses and analyzing their genetic data. By automating the distance calculation and prototype comparison steps and providing visualization tools, it reduces the manual work involved in rhinovirus epidemiological studies.

Session Info

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rhinotypeR_1.1.0
## 
## loaded via a namespace (and not attached):
##  [1] crayon_1.5.3            httr_1.4.7              cli_3.6.3              
##  [4] knitr_1.49              rlang_1.1.4             xfun_0.49              
##  [7] UCSC.utils_1.3.0        generics_0.1.3          jsonlite_1.8.9         
## [10] S4Vectors_0.45.2        buildtools_1.0.0        Biostrings_2.75.1      
## [13] htmltools_0.5.8.1       maketools_1.3.1         sys_3.4.3              
## [16] sass_0.4.9              stats4_4.4.2            rmarkdown_2.29         
## [19] evaluate_1.0.1          jquerylib_0.1.4         fastmap_1.2.0          
## [22] GenomeInfoDb_1.43.2     IRanges_2.41.1          lifecycle_1.0.4        
## [25] compiler_4.4.2          XVector_0.47.0          digest_0.6.37          
## [28] R6_2.5.1                GenomeInfoDbData_1.2.13 bslib_0.8.0            
## [31] tools_4.4.2             zlibbioc_1.52.0         BiocGenerics_0.53.3    
## [34] cachem_1.1.0