```{r setup, include=FALSE}
options(repos = c(CRAN = "https://cran.rstudio.com/"))
```
---
title: "Introduction to rhinotypeR"
output: BiocStyle::html_document
vignette: >
%\VignetteIndexEntry{Introduction to rhinotypeR}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Background
The `rhinotypeR` package is designed to simplify the genotyping of rhinoviruses using the VP4/2 genomic region. Having worked on rhinoviruses for a few years, I noticed that assigning genotypes after sequencing was particularly laborious, and needed several manual interventions. We, therefore, developed this package to address this challenge by streamlining the process by enabling a user to download prototype sequences, calculate genetic pairwise distances, and compare the distances to prototype strains for genotype assignment. It also provides visualization options such as frequency plots and simple phylogenetic trees.
## Usage
### Installing the package
You can install rhinotypeR from BioConductor using
```{r eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rhinotypeR")
```
### Loading the package
```{r}
library(rhinotypeR)
```
### Example Workflow
1. Download prototype sequences:
The `getPrototypeSeqs` function downloads the prototype sequences required for genotyping. These should the be combined with the newly generated sequences, aligned using a suitable software, and imported into R. For example, to download to the Desktop directory, one can run:
```{r eval=FALSE}
getPrototypeSeqs("~/Desktop")
```
2. Read sequences:
Use the Biostrings package to read FASTA files containing sequence data. This extracts the sequence data and headers information and should be stored into an object for downstream analysis.
```{r}
sequences <- Biostrings::readDNAStringSet(system.file("extdata", "input_aln.fasta", package="rhinotypeR"))
```
3. Visualize SNPs:
The `SNPeek` function visualizes single nucleotide polymorphisms (SNPs) in the sequences, with a select sequence acting as the reference. To specify the reference sequences, move it to the bottom of the alignment before importing into R. Substitutions are color-coded by the nucleotide i.e.,
A = green
T = red
C = blue
G = yellow
```{r}
SNPeek(sequences)
```
4. Calculate Pairwise Distances:
The `pairwiseDistances` function calculates genetic distances between sequences, using a specified evolutionary model.
```{r}
distances <- pairwiseDistances(sequences, model = "p-distance", gapDeletion = TRUE)
```
The distance matrix looks like:
```{r, echo=FALSE}
distances[1:5, 1:7]
```
5. Assign Genotypes:
The assignTypes function assigns genotypes to the sequences by comparing genetic distances to prototype strains.
```{r}
genotypes <- assignTypes(sequences, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
head(genotypes)
```
6. Plot Results:
The `plotFrequency` function visualizes the frequency of assigned genotypes. This function uses the output of `assignTypes` as input.
```{r}
plotFrequency(genotypes)
```
The `plotDistances` function visualizes pairwise genetic distances in a heatmap. This function uses the output of `pairwiseDistances` as input.
```{r}
plotDistances(distances)
```
The `plotTree` function plots a simple phylogenetic tree. This function uses the output of `pairwiseDistances` as input.
```{r}
# sub-sample
sampled_distances <- distances[1:30,1:30]
plotTree(sampled_distances, hang = -1, cex = 0.6, main = "A simple tree", xlab = "", ylab = "Genetic distance")
```
## Conclusion
The rhinotypeR package simplifies the process of genotyping rhinoviruses and analyzing their genetic data. By automating various steps and providing visualization tools, it enhances the efficiency and accuracy of rhinovirus epidemiological studies.
## Session Info
```{r}
sessionInfo()
```