Title: | Rhinovirus genotyping |
---|---|
Description: | "rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies. |
Authors: | Martha Luka [aut, cre] , Ruth Nanjala [aut], Winfred Gatua [aut], Wafaa M. Rashed [aut], Olaitan Awe [aut] |
Maintainer: | Martha Luka <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2024-10-31 04:24:11 UTC |
Source: | https://github.com/bioc/rhinotypeR |
Rapidly assigns genotypes to input sequences. The input fasta file should include the prototype strains, which can be downloaded using getPrototypeSeqs().
assignTypes(fastaData, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
assignTypes(fastaData, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
fastaData |
The fasta data to be processed. |
model |
The evolutionary model to be used. Default is set to "p-distance". |
gapDeletion |
Logical. If TRUE, gaps are deleted. Default is TRUE. |
threshold |
The distance threshold for genotype assignment. Default is 0.105. |
This function compares input sequences against prototype strains using a specified evolutionary model and assigns genotypes based on predefined distance thresholds.
A list with the assigned genotypes and their distances to the reference sequences.
Ensure the input fasta file includes the necessary prototype strains.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
getPrototypeSeqs
, pairwiseDistances
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Run command fastaD <- Biostrings::readDNAStringSet(test) assignTypes(fastaD, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Run command fastaD <- Biostrings::readDNAStringSet(test) assignTypes(fastaD, model = "p-distance", gapDeletion = TRUE, threshold = 0.105)
Counts single nucleotide polymorphisms across input sequences and produces an output matrix.
countSNPs(fastaData, gapDeletion = TRUE)
countSNPs(fastaData, gapDeletion = TRUE)
fastaData |
The fasta data to be processed. |
gapDeletion |
Whether or not to delete positions with gaps. Default is set to TRUE. |
This function counts the number of single nucleotide polymorphisms (SNPs) across the provided sequences.
A matrix with the SNP counts for each sequence.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "test.fasta", package = "rhinotypeR") # Run the function fastaData <- Biostrings::readDNAStringSet(test) countSNPs(fastaData)
# Load the dataset test <- system.file("extdata", "test.fasta", package = "rhinotypeR") # Run the function fastaData <- Biostrings::readDNAStringSet(test) countSNPs(fastaData)
Download rhinovirus prototype strains into a local directory. These sequences should be combined with and aligned alongside newly generated sequences before being imported into R for genotype assignment.
getPrototypeSeqs(destinationFolder, overwrite = TRUE)
getPrototypeSeqs(destinationFolder, overwrite = TRUE)
destinationFolder |
Provide a path that will act as an output folder for the prototype sequences. The default is set to "output", but you will need to create it prior to running the command in your current directory. |
overwrite |
Logical value indicating whether to overwrite existing files in the destination folder.
The default is |
This function downloads rhinovirus prototype strains and saves them in the specified folder.
If overwrite = TRUE
, existing files with the same name in the destination folder
will be replaced.
A character vector of the downloaded file paths.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
if (interactive()) { # Specify a destination directory dest_dir <- tempdir() # Create a temporary directory for the example # Download prototypes with default overwrite = TRUE getPrototypeSeqs(destinationFolder = dest_dir) # List downloaded files list.files(dest_dir) }
if (interactive()) { # Specify a destination directory dest_dir <- tempdir() # Create a temporary directory for the example # Download prototypes with default overwrite = TRUE getPrototypeSeqs(destinationFolder = dest_dir) # List downloaded files list.files(dest_dir) }
Estimates the overall mean distance of input sequences.
overallMeanDistance(fastaData, model="p-distance", gapDeletion=TRUE)
overallMeanDistance(fastaData, model="p-distance", gapDeletion=TRUE)
fastaData |
The fasta file used here is the output from the Biostrings function 'Biostrings::readDNAStringSet'. |
model |
The evolutionary model used to calculate distances. Default is set to "p-distance". |
gapDeletion |
Whether or not to delete positions with gaps. Default is set to TRUE. |
This function estimates the overall mean genetic distance of input sequences using the specified evolutionary model.
A numeric value representing the overall mean distance.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Usage fastaData <- Biostrings::readDNAStringSet(test) overallMeanDistance(fastaData, model="p-distance")
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Usage fastaData <- Biostrings::readDNAStringSet(test) overallMeanDistance(fastaData, model="p-distance")
Estimates pairwise distances across input sequences using a specified evolutionary model.
pairwiseDistances(fastaData, model = "p-distance", gapDeletion = TRUE)
pairwiseDistances(fastaData, model = "p-distance", gapDeletion = TRUE)
fastaData |
The fasta data to be processed. |
model |
The evolutionary model used to calculate distances. Default is set to "p-distance". |
gapDeletion |
Whether or not to delete positions with gaps. Default is set to TRUE. |
This function calculates the pairwise genetic distances between sequences using the specified evolutionary model.
A matrix of pairwise distances.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Example usage fastaD <- Biostrings::readDNAStringSet(test) pairwiseDistances(fastaData = fastaD, model = "p-distance", gapDeletion = TRUE)
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Example usage fastaD <- Biostrings::readDNAStringSet(test) pairwiseDistances(fastaData = fastaD, model = "p-distance", gapDeletion = TRUE)
Plots amino acid substitutions with a specified sequence as the reference. The input is an amino acid fasta file (translated DNA sequences). To specify the reference sequence, move it to the bottom of the alignment. Changes are colored by the class of amino acid: Red = Positively charged, Blue = Negatively charged, Green = Polar, Yellow = Non-polar.
plotAA(AAfastaFile, showLegend = FALSE)
plotAA(AAfastaFile, showLegend = FALSE)
AAfastaFile |
The file path to the input amino acid sequences in fasta format. |
showLegend |
Logical indicating whether to show the legend. Default is FALSE. |
This function visualizes amino acid substitutions in a given set of sequences with color-coded classes.
A plot object showing the amino acid substitutions.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "test.translated.fasta", package = "rhinotypeR") # Usage testData <- Biostrings::readAAStringSet(test) plotAA(testData)
# Load the dataset test <- system.file("extdata", "test.translated.fasta", package = "rhinotypeR") # Usage testData <- Biostrings::readAAStringSet(test) plotAA(testData)
Visualizes pairwise genetic distances in a heatmap. This function uses the output of pairwiseDistances() as input.
plotDistances(distancesMatrix)
plotDistances(distancesMatrix)
distancesMatrix |
A matrix of pairwise genetic distances from the function |
This function creates a heatmap to visualize the pairwise genetic distances between sequences.
A heatmap plot object.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Example usage fastaD <- Biostrings::readDNAStringSet(test) distancesMatrix <- pairwiseDistances(fastaD, "p-distance") plotDistances(distancesMatrix)
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Example usage fastaD <- Biostrings::readDNAStringSet(test) distancesMatrix <- pairwiseDistances(fastaD, "p-distance") plotDistances(distancesMatrix)
Plots the frequency of assigned genotypes.
This function uses the output of assignTypes()
as input.
plotFrequency(assignedTypesDF, showLegend = FALSE)
plotFrequency(assignedTypesDF, showLegend = FALSE)
assignedTypesDF |
A dataframe from the function |
showLegend |
Logical indicating whether to show the legend. Default is FALSE. |
This function visualizes the frequency of assigned genotypes based of the newly generated data.
A plot showing the frequency of assigned genotypes.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Run queryFastaData <- Biostrings::readDNAStringSet(test) df <- assignTypes(queryFastaData, "p-distance") plotFrequency(df)
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Run queryFastaData <- Biostrings::readDNAStringSet(test) df <- assignTypes(queryFastaData, "p-distance") plotFrequency(df)
Plots a simple phylogenetic tree using the genetic distances estimated by
pairwiseDistances()
and allows for customization using additional arguments.
plotTree(distance_matrix, ...)
plotTree(distance_matrix, ...)
distance_matrix |
Distance matrix from the function |
... |
Additional parameters passed to the |
This function visualizes a phylogenetic tree based on the calculated pairwise genetic distances. Users can customize the appearance of the plot by providing additional parameters through the ...
argument, which are passed directly to the plot()
function.
A plot object representing the phylogenetic tree.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Example usage fastaD <- Biostrings::readDNAStringSet(test) pdistances <- pairwiseDistances(fastaD, "p-distance") plotTree(pdistances, col = "blue", lwd = 2)
# Load the dataset test <- system.file("extdata", "input_aln.fasta", package = "rhinotypeR") # Example usage fastaD <- Biostrings::readDNAStringSet(test) pdistances <- pairwiseDistances(fastaD, "p-distance") plotTree(pdistances, col = "blue", lwd = 2)
The rhinotypeR package provides tools and functions that streamline the genotyping of rhinoviruses using the VP4/2 region.
This package includes functions to calculate pairwise distances and visualization, specifically tailored for rhinovirus data. It also provides additional utilities for working with sequence data in FASTA format such as simple phylogenetic trees.
getPrototypeSeqs
: Download prototype sequences into local machine.
pairwiseDistances
: Calculates pairwise distances using a specified evolutionary model.
assignTypes
: Assigns sequences to their respective rhinovirus genotypes.
plotTree
: Plots a simple phylogenetic tree.
To get started with rhinotypeR, download the prototype sequences and combine these with your newly generated VP4/2 sequences and align using a suitable tool. Then import the curated alignment into R:
\# Download prototype sequences getPrototypeSeqs("path/to/destination") \# Read sequences from a FASTA file sequences <- Biostrings::readDNAStringSet("../inst/extdata/input_aln.fasta") \# Perform sequence analysis distance_matrix <- pairwiseDistances(sequences) \# Assign to genotypes assigned_types <- assignTypes(distance_matrix) \# Simple phylogenetic tree plotTree(distance_matrix)
For more detailed examples and usage, refer to the package vignettes:
vignette(package = "rhinotypeR") vignette("rhinotypeR", package = "rhinotypeR")
Maintainer: Martha Luka [email protected] (ORCID)
Authors:
Ruth Nanjala
Winfred Gatua
Wafaa M. Rashed
Olaitan Awe
https://github.com/omicscodeathon/rhinotypeR
Visualizes single nucleotide polymorphisms (SNPs) relative to a specified reference sequence. To specify the reference, manually move it to the bottom of the alignment. Substitutions are color-coded by nucleotide: A = green, T = red, C = blue, G = yellow.
SNPeek(fastaData, showLegend = FALSE)
SNPeek(fastaData, showLegend = FALSE)
fastaData |
The fasta file used here is the output from the function 'Biostrings::readDNAStringSet'. |
showLegend |
Logical indicating whether to show the legend. Default is FALSE. |
This function visualizes SNPs in the provided sequence data, using a color-coding scheme for different nucleotides.
A plot showing the SNPs relative to a user specified reference sequence.
Martha Luka, Ruth Nanjala, Wafaa Rashed, Winfred Gatua, Olaitan Awe
# Load the dataset test <- system.file("extdata", "test.fasta", package = "rhinotypeR") fastaData <- Biostrings::readDNAStringSet(test) SNPeek(fastaData, showLegend = FALSE)
# Load the dataset test <- system.file("extdata", "test.fasta", package = "rhinotypeR") fastaData <- Biostrings::readDNAStringSet(test) SNPeek(fastaData, showLegend = FALSE)