Title: | Quasispecies Diversity |
---|---|
Description: | Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data. |
Authors: | Mercedes Guerrero-Murillo [cre, aut] , Josep Gregori i Font [aut] |
Maintainer: | Mercedes Guerrero-Murillo <[email protected]> |
License: | GPL-2 |
Version: | 1.25.0 |
Built: | 2025-01-17 04:23:26 UTC |
Source: | https://github.com/bioc/QSutils |
Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data.
The DESCRIPTION file:
Package: | QSutils |
Type: | Package |
Title: | Quasispecies Diversity |
Version: | 1.25.0 |
Date: | 2024-04-23 |
Authors@R: | c(person("Mercedes", "Guerrero-Murillo", email="[email protected]", role=c("cre", "aut"), comment = c(ORCID = "0000-0002-5556-2460")), person("Josep", "Gregori i Font", role=c ("aut"), comment = c(ORCID = "0000-0002-4253-8015"))) |
Depends: | R (>= 3.5), Biostrings, pwalign, BiocGenerics, methods |
Imports: | ape, stats, psych |
Encoding: | UTF-8 |
Description: | Set of utility functions for viral quasispecies analysis with NGS data. Most functions are equally useful for metagenomic studies. There are three main types: (1) data manipulation and exploration—functions useful for converting reads to haplotypes and frequencies, repairing reads, intersecting strand haplotypes, and visualizing haplotype alignments. (2) diversity indices—functions to compute diversity and entropy, in which incidence, abundance, and functional indices are considered. (3) data simulation—functions useful for generating random viral quasispecies data. |
License: | GPL-2 |
biocViews: | Software, Genetics, DNASeq, GeneticVariability, Sequencing, Alignment, SequenceMatching, DataImport |
VignetteBuilder: | knitr |
Suggests: | BiocStyle, knitr, rmarkdown, ggplot2 |
NeedsCompilation: | no |
Packaged: | 2018-05-28 14:56:11 UTC; mervhir |
RoxygenNote: | 6.0.1 |
Config/pak/sysreqs: | libssl-dev |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/QSutils |
RemoteRef: | HEAD |
RemoteSha: | 4274cf1c2863ded612e426410921d3d1a3052386 |
Author: | Mercedes Guerrero-Murillo [cre, aut] (<https://orcid.org/0000-0002-5556-2460>), Josep Gregori i Font [aut] (<https://orcid.org/0000-0002-4253-8015>) |
Maintainer: | Mercedes Guerrero-Murillo <[email protected]> |
Mercedes Guerrero-Murillo [cre, aut] (<https://orcid.org/0000-0002-5556-2460>), Josep Gregori i Font [aut] (<https://orcid.org/0000-0002-4253-8015>)
Maintainer: Mercedes Guerrero-Murillo <[email protected]>
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83.doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
Collapse summarizes aligned reads into haplotypes with their frequencies. Recollapse is used to update the collapse after some type of manipulation may have resulted in duplicate haplotypes.
Collapse(seqs) Recollapse(seqs,nr)
Collapse(seqs) Recollapse(seqs,nr)
seqs |
DNAStringSet or AAStringSet object with the sequences to collapse. |
nr |
Vector with the haplotype counts. |
Recollapse is used when haplotypes may become equivalent after some type of manipulation. It removes duplicate sequences and updates their frequencies.
Collapse and Recollapse return a list with two elements.
nr |
Vector of the haplotype counts. |
hseqs |
DNAStringSet or AAStringSet with the haplotype sequence. |
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
# Load raw reads. filepath<-system.file("extdata","Toy.GapsAndNs.fna", package="QSutils") reads <- readDNAStringSet(filepath) # Collapse this reads into haplotypes lstCollapsed <- Collapse(reads) lstCorrected<-CorrectGapsAndNs(lstCollapsed$hseqs[2:length(lstCollapsed$hseqs)], lstCollapsed$hseqs[[1]]) #Add again the most abundant haplotype. lstCorrected<- c(lstCollapsed$hseqs[1],lstCorrected) lstCorrected # Recollapse the corrected haplotypes lstRecollapsed<-Recollapse(lstCorrected,lstCollapsed$nr) lstRecollapsed
# Load raw reads. filepath<-system.file("extdata","Toy.GapsAndNs.fna", package="QSutils") reads <- readDNAStringSet(filepath) # Collapse this reads into haplotypes lstCollapsed <- Collapse(reads) lstCorrected<-CorrectGapsAndNs(lstCollapsed$hseqs[2:length(lstCollapsed$hseqs)], lstCollapsed$hseqs[[1]]) #Add again the most abundant haplotype. lstCorrected<- c(lstCollapsed$hseqs[1],lstCorrected) lstCorrected # Recollapse the corrected haplotypes lstRecollapsed<-Recollapse(lstCorrected,lstCollapsed$nr) lstRecollapsed
ConsSeq determines the consensus sequence from a set of haplotypes.
ConsSeq(seqs, w=NULL)
ConsSeq(seqs, w=NULL)
seqs |
DNAStringSet or AAStringSet object with the haplotype sequences. |
w |
An optional numeric vector with the haplotype counts. |
The most frequent nucleotide or amino acid at each position is taken. No IUPAC ambiguity codes are considered; in the case of ties, the consensus nucleotide is decided randomly.
Character vector with the consensus sequence.
Mercedes Guerrero-Murillo and Josep Gregori
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") ConsSeq(lst$hseqs,lst$nr)
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") ConsSeq(lst$hseqs,lst$nr)
Corrects positions in a DNAStringSet or AAStringSet of aligned haplotypes, replacing gaps and Ns (indeterminates) with the nucleotide or amino acid from the corresponding position in the reference sequence.
CorrectGapsAndNs(hseqs, ref.seq)
CorrectGapsAndNs(hseqs, ref.seq)
hseqs |
DNAStringSet or AAStringSet object with the alignment to correct. |
ref.seq |
Character vector with the reference sequence of the alignment. |
DNAStringSet or AAStringSet object with the sequences corrected. Duplicate
haplotypes may arise as a consequence of this operation.
See Recollapse
.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
# Create a random reference sequence. ref.seq <-GetRandomSeq(50) ref.seq # Create an alignment with gaps and Ns. symb <- c(".","-","N") nseqs <- 12 p <- c(0.9,0.06,0.04) hseqs <- matrix(sample(symb,50*nseqs,replace=TRUE,prob=p),ncol=50) hseqs <- apply(hseqs,1,paste,collapse="") hseqs hseqs <- DNAStringSet(hseqs) # Apply the function and visualize the result. cseqs <- CorrectGapsAndNs(hseqs,as.character(ref.seq)) c(ref.seq,as.character(cseqs))
# Create a random reference sequence. ref.seq <-GetRandomSeq(50) ref.seq # Create an alignment with gaps and Ns. symb <- c(".","-","N") nseqs <- 12 p <- c(0.9,0.06,0.04) hseqs <- matrix(sample(symb,50*nseqs,replace=TRUE,prob=p),ncol=50) hseqs <- apply(hseqs,1,paste,collapse="") hseqs hseqs <- DNAStringSet(hseqs) # Apply the function and visualize the result. cseqs <- CorrectGapsAndNs(hseqs,as.character(ref.seq)) c(ref.seq,as.character(cseqs))
Computes the nearest cluster to a given sequence.
DBrule(grpDist, hr, oDist, g.names = NULL)
DBrule(grpDist, hr, oDist, g.names = NULL)
grpDist |
Distances between reference sequences. |
hr |
Factor or a vector of integers that contains the type or subtype for each reference sequence. |
oDist |
Distance from the sequence to be classified to the reference sequences. |
g.names |
Type or subtype names to classify the sequence. |
List with three elements:
Phi2 |
Vector with the distances to each cluster. |
DB.rule |
The index of the nearest cluster. |
Type |
Name of the nearest cluster. |
Mercedes Guerrero-Murillo and Josep Gregori
Caballero A, Gregori J, Homs M, Tabernero D, Gonzalez C, Quer J, Blasi M, Casillas R, Nieto L, Riveiro-Barciela M, Esteban R, Buti M, Rodriguez-Frias F. Complex Genotype Mixtures Analyzed by Deep Sequencing in Two Different Regions of Hepatitis B Virus. PLoS One. 2015 Dec 29;10(12):e0144816. doi: 10.1371/journal.pone.0144816. eCollection 2015. PubMed PMID: 26714168; PubMed Central PMCID: PMC4695080.
# Load haplotype to be genotyped. filepath<-system.file("extdata","Unknown-Genotype.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") hseq <- lst$hseq[1] # Load genotype references. filepath_geno<-system.file("extdata","GenotypeStandards_A-H.fas", package="QSutils") RefSeqs <- readDNAStringSet(filepath_geno) # Compute pairwise distances. dm <- as.matrix(DNA.dist(c(hseq,RefSeqs),model="K80")) # Distances between genotype RefSeqs dgrp <- dm[-1,-1] grp <- factor(substr(rownames(dgrp),1,1)) hr <- as.integer(grp) # Distance of the query haplotype to the reference sequence. d <- dm[1,-1] # Genotyping by the DB rule. dsc <- DBrule(dgrp,hr,d,levels(grp)) dsc
# Load haplotype to be genotyped. filepath<-system.file("extdata","Unknown-Genotype.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") hseq <- lst$hseq[1] # Load genotype references. filepath_geno<-system.file("extdata","GenotypeStandards_A-H.fas", package="QSutils") RefSeqs <- readDNAStringSet(filepath_geno) # Compute pairwise distances. dm <- as.matrix(DNA.dist(c(hseq,RefSeqs),model="K80")) # Distances between genotype RefSeqs dgrp <- dm[-1,-1] grp <- factor(substr(rownames(dgrp),1,1)) hr <- as.integer(grp) # Distance of the query haplotype to the reference sequence. d <- dm[1,-1] # Genotyping by the DB rule. dsc <- DBrule(dgrp,hr,d,levels(grp)) dsc
Generates a set of diverging haplotypes from the given DNA sequence. The haplotypes produced share a pattern of divergence with an increasing number of mutations.
Diverge(vm, seq)
Diverge(vm, seq)
vm |
Vector with number of diverging mutations to be generated. |
seq |
Reference sequence from which to generate the variants. |
max(vm)
Positions in the given sequence are randomly generated.
A substitution is also randomly produced for each of these positions.
A haplotype is generated for each element in vm
, so that it contains
vm[i]
substitutions of those previously generated.
Character string vector with the segregating haplotypes generated.
Mercedes Guerrero-Murillo and Josep Gregori
set.seed(123) m1 <- GetRandomSeq(50) hpl <- Diverge(3:6,m1) DottedAlignment(DNAStringSet(hpl))
set.seed(123) m1 <- GetRandomSeq(50) hpl <- Diverge(3:6,m1) DottedAlignment(DNAStringSet(hpl))
Function to compute a matrix of pairwise distances from DNA sequences using a
model of DNA evolution. It relies on the dist.dna()
function in the
APE package.
DNA.dist(seqs, model = "raw", gamma = FALSE, pairwise.deletion = FALSE)
DNA.dist(seqs, model = "raw", gamma = FALSE, pairwise.deletion = FALSE)
seqs |
DNAStringSet object with the aligned haplotypes. |
model |
Evolutionary model to compute genetic distance by default "raw", but "N", "TS", "TV", "JC69", "K80", "F81", "K81", "F84", "BH87", "T92","TN93", "GG95", "logdet", "paralin", "indel", or "indelblock" can also be used. |
gamma |
Gamma parameter possibly used to apply a correction to the distances or FALSE (by default). |
pairwise.deletion |
A logical indicating whether to delete sites with missing data (gaps) in a pairwise manner. The default is to delete sites with at least one missing datum in all sequences. |
Object of class "dist" with pairwise distances.
Mercedes Guerrero-Murillo and Josep Gregori
Paradis E., Claude J. and Strimmer K., APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004, 20, 289-290
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") dst <- DNA.dist(lst$hseqs,model="N") dst
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") dst <- DNA.dist(lst$hseqs,model="N") dst
Given an alignment, it takes the first sequence as reference, and depicts all equivalences in the alignment as dots, leaving only the variants with respect to the reference.
DottedAlignment(hseqs)
DottedAlignment(hseqs)
hseqs |
DNAStringSet or AAStringSet with haplotype sequences. |
A character string vector of the alignment, with dots in the conserved positions.
Mercedes Guerrero-Murillo and Josep Gregori
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") strs <- DottedAlignment(lst$hseqs) # Create a data frame to visualize the result. data.frame(Hpl=strs,stringsAsFactors=FALSE)
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") strs <- DottedAlignment(lst$hseqs) # Create a data frame to visualize the result. data.frame(Hpl=strs,stringsAsFactors=FALSE)
Diversity indices are influenced to a greater or lesser degree by the sample size on which they are computed. This function helps to minimize the bias inherent to sample size. First the vector of abundances is scaled to a smaller sample size, then all haplotypes with abundances below a given threshold are excluded with high confidence.
DSFT(nr, size, p.cut = 0.002, conf = 0.95)
DSFT(nr, size, p.cut = 0.002, conf = 0.95)
nr |
Vector of observed haplotype counts. |
size |
Size to downsample. |
p.cut |
Abundance threshold. |
conf |
Confidence in trimming. |
Vector of logicals, with false the haplotypes to be excluded.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Generate viral quasispecies abundance data. set.seed(123) n <- 2000 y <- geom.series(n,0.8)+geom.series(n,0.0004) nr.pop <- round(y*1e7) # Get a sample of 10000 reads from this population. sz2 <- 10000 nr.sz2 <- table(sample(length(nr.pop),size=sz2,replace=TRUE,p=nr.pop)) # Filter out haplotypes below 0.1%. thr <- 0.1 fl <- nr.sz2>=sz2*thr/100 nr.sz2 <- nr.sz2[fl] Shannon(nr.sz2) #0.630521 # DSFT to 5000 reads. sz1 <- 5000 fl <- DSFT(nr.sz2,sz1) nr.sz2 <- nr.sz2[fl] # Compute size corrected Shannon entropy. Shannon(nr.sz2) #0.6189798
# Generate viral quasispecies abundance data. set.seed(123) n <- 2000 y <- geom.series(n,0.8)+geom.series(n,0.0004) nr.pop <- round(y*1e7) # Get a sample of 10000 reads from this population. sz2 <- 10000 nr.sz2 <- table(sample(length(nr.pop),size=sz2,replace=TRUE,p=nr.pop)) # Filter out haplotypes below 0.1%. thr <- 0.1 fl <- nr.sz2>=sz2*thr/100 nr.sz2 <- nr.sz2[fl] Shannon(nr.sz2) #0.630521 # DSFT to 5000 reads. sz1 <- 5000 fl <- DSFT(nr.sz2,sz1) nr.sz2 <- nr.sz2[fl] # Compute size corrected Shannon entropy. Shannon(nr.sz2) #0.6189798
Computes the Functional Attribute Diversity as the sum of elements in the pairwise distance matrix.
FAD(dst)
FAD(dst)
dst |
A "dist" object or a symmetrical matrix with pairwise distances. |
A value that corresponds to the Functional Attribute Diversity. The sum of matrix elements.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Compute the DNA distance matrix. dst <- DNA.dist(lst$hseqs,model="N") FAD(dst)
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Compute the DNA distance matrix. dst <- DNA.dist(lst$hseqs,model="N") FAD(dst)
Function to simulate haplotype abundances in the quasispecies.
fn.ab(n, h = 10000, r = 0.5,fn="pcf")
fn.ab(n, h = 10000, r = 0.5,fn="pcf")
n |
Number of counts to compute. |
h |
Highest abundance value. |
r |
A number to compute the abundance. See details. |
fn |
Character indicating which function to use to compute the abundances "pf","pcf" or "dfp", see details. By default "pcf". |
The abundances computed as a power of fractions, when fn is "pf", are computed according to the following equation, taking the integer part:
The lower r
, the faster the decrease in abundance, r
is in the
range 0 < r < 1.
With "pcf" the abundances are computed by a power of decreasing fractions, as counts, according to the following equation, taking the integer part:
The higher r,
the faster the decrease in abundances. In this case
r
corresponds to the power of the function, a value larger than 0,
usually in the range 0.5 < r < 4.
If fn is equal to "dfp", the abundances are computed by increasing root powers according to the following equation,taking the integer part:
Numeric vector with n
decreasing counts, where the first element
equals h
, and no element is lower than 1.
Mercedes Guerrero-Murillo and Josep Gregori
geom.series
,GetRandomSeq
,
GenerateVars
,Diverge
# Simulate a quasispecies alignment. m1 <- GetRandomSeq(50) v1 <- GenerateVars(m1,50,2,c(10,1)) qs <- c(m1,v1) w_pf <- fn.ab(length(qs),h=1000,r=1.5,fn="pf") w_pf w_pcf <- fn.ab(length(qs),h=1000,r=1.5,fn="pcf") w_pcf w_dfp <- fn.ab(length(qs),h=1000,fn="dfp") w_dfp
# Simulate a quasispecies alignment. m1 <- GetRandomSeq(50) v1 <- GenerateVars(m1,50,2,c(10,1)) qs <- c(m1,v1) w_pf <- fn.ab(length(qs),h=1000,r=1.5,fn="pf") w_pf w_pcf <- fn.ab(length(qs),h=1000,r=1.5,fn="pcf") w_pcf w_dfp <- fn.ab(length(qs),h=1000,fn="dfp") w_dfp
Computes the nucleotide or amino acid frequency at each position in the alignment.
FreqMat(seqs,nr=NULL)
FreqMat(seqs,nr=NULL)
seqs |
DNAStringSet or AAStringSet with the aligned haplotype sequences. |
nr |
An optional numeric vector with the haplotype counts. |
Matrix with the frequency of each nucleotide or amino acid in each position. A (4 x n) or (20 x n) matrix, where n is the alignment length.
Mercedes Guerrero-Murillo and Josep Gregori
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Frequencies only in the alignment. FreqMat(lst$hseqs) # Also taking into account haplotype frequencies. FreqMat(lst$hseqs,lst$nr)
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Frequencies only in the alignment. FreqMat(lst$hseqs) # Also taking into account haplotype frequencies. FreqMat(lst$hseqs,lst$nr)
Function to generate a set of variants for a given DNA sequence.
GenerateVars(seq, nhpl, max.muts, p.muts)
GenerateVars(seq, nhpl, max.muts, p.muts)
seq |
A character string with a DNA sequence from which to generate the variants. |
nhpl |
Number of haplotypes to generate. |
max.muts |
Maximum number of mutations in each sequence. |
p.muts |
Vector of length |
Given a DNA sequence, nhpl
variant haplotypes are generated at random,
with a maximum of max.muts
substitutions each. The probability of the
number of mutations in each haplotype generated is given by the vector
p.muts
. The positions of the mutations in each haplotype are independent
and random.
A character vector with nhpl
haplotype variants of the seq
sequence.
Mercedes Guerrero-Murillo and Josep Gregori
set.seed(123) m1 <- GetRandomSeq(50) GenerateVars(m1,50,2,c(10,1))
set.seed(123) m1 <- GetRandomSeq(50) GenerateVars(m1,50,2,c(10,1))
Fasta file with a set of well characterized sequences belonging to each
HBV genotype. See the QSutils vignette:
vignette("QSutils", package = "QSutils")
.
Fasta file format. Each sequence starts with the symbol ">" followed by the sequence ID. Subsequent lines correspond to the nucleotide sequences or peptide sequences.
filepath<-system.file("extdata","GenotypeStandards_A-H.fas", package="QSutils") lstRefs <- ReadAmplSeqs(filepath,type="DNA") RefSeqs <- lstRefs$hseq
filepath<-system.file("extdata","GenotypeStandards_A-H.fas", package="QSutils") lstRefs <- ReadAmplSeqs(filepath,type="DNA") RefSeqs <- lstRefs$hseq
Function to simulate haplotype abundances in the quasispecies by geometric series.
geom.series(n,p=0.001)
geom.series(n,p=0.001)
n |
Number of frequencies to compute. |
p |
Numeric parameter of the geometric function. |
The abundances, as counts, are computed according to the following equation:
The lower r
, the faster the decrease in abundances.
Numeric vector with n
decreasing counts.
Mercedes Guerrero-Murillo and Josep Gregori
GetRandomSeq
, GenerateVars
,
Diverge
# Simulate a quasispecies alignment. m1 <- GetRandomSeq(50) v1 <- GenerateVars(m1,50,2,c(10,1)) qs <- c(m1,v1) w <- geom.series(100,0.8)
# Simulate a quasispecies alignment. m1 <- GetRandomSeq(50) v1 <- GenerateVars(m1,50,2,c(10,1)) qs <- c(m1,v1) w <- geom.series(100,0.8)
GetInfProfile computes the information content at each position of an alignment.
GetInfProfile(seqs,nr=NULL)
GetInfProfile(seqs,nr=NULL)
seqs |
DNAStringSet or AAStringSet with the haplotype alignment. |
nr |
An optional numeric vector with the haplotype counts to take into account the information content of each position in the alignment. |
Returns a numeric vector whose length is equal to the length of the alignment. Each value corresponds to the information content of each position in the alignment.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Load the alignment. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Compute the alignment's IC profile. GetInfProfile(lst$hseqs) # Also taking into account haplotype frequencies. GetInfProfile(lst$hseqs,lst$nr)
# Load the alignment. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Compute the alignment's IC profile. GetInfProfile(lst$hseqs) # Also taking into account haplotype frequencies. GetInfProfile(lst$hseqs,lst$nr)
Reads aligned amplicon sequences with abundance data, filters at a given minimum abundance, and sorts by mutations and abundance.
GetQSData(flnm,min.pct=0.1,type="DNA")
GetQSData(flnm,min.pct=0.1,type="DNA")
flnm |
Fasta file with haplotype sequences and their frequencies. The header of each haplotype in the fasta file is composed of an ID followed by a vertical bar "|" followed by the read counts, and eventually followed by another vertical bar and additional information (eg, Hpl.2.0001|15874|25.2). |
min.pct |
Minimum abundance, in %, to filter the reads. Defaults to 0.1%. |
type |
Character string specifying the type of the sequences in the fasta file. This must be one of "DNA" or "AA". It is "DNA" by default. |
The fasta file is loaded and the haplotype abundances, as counts, are taken
from the header of each sequence. Haplotypes with abundances below
min.pct
% are filtered out. The haplotypes are then sorted: first,
by decreasing order of the number of mutations with respect to the dominant
haplotype, and second, by decreasing order of abundances. The haplotypes are
then renamed according to the pattern Hpl.n.xxxx
, where n
represents the number of mutations, and xxxx
the abundance order within
the mutation number.
Returns a list with three elements.
bseqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
nr |
Vector of haplotype counts. |
nm |
Vector of number of mutations of each haplotype with respect to the dominant (most frequent) haplotype. |
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst<-GetQSData(filepath,min.pct=0.1,type="DNA") lst
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst<-GetQSData(filepath,min.pct=0.1,type="DNA") lst
Creates a random DNA sequence of a given length.
GetRandomSeq(seq.len)
GetRandomSeq(seq.len)
seq.len |
The sequence length. |
A character string representing a DNA sequence.
Mercedes Guerrero-Murillo and Josep Gregori
set.seed(123) GetRandomSeq(50)
set.seed(123) GetRandomSeq(50)
GiniSimpson calculates the unbiased estimator, GiniSimpsonVar computes Gini-Simpson asymp- totic variance, and GiniSimpsonMVUE calculates the minimum variance unbiased estimator of the Gini-Simpson index.
GiniSimpson(w) GiniSimpsonMVUE(w) GiniSimpsonVar(w)
GiniSimpson(w) GiniSimpsonMVUE(w) GiniSimpsonVar(w)
w |
Vector of observed counts or frequencies. |
A value that corresponds to the Gini-Simpson diversity index.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# A vector of haplotype counts. nr <- c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Gini-Simpson index. GiniSimpson(nr) # Gini-Simpson variance. GiniSimpsonVar(nr) # MVUE Gini-Simpson index. GiniSimpsonMVUE(nr)
# A vector of haplotype counts. nr <- c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Gini-Simpson index. GiniSimpson(nr) # Gini-Simpson variance. GiniSimpsonVar(nr) # MVUE Gini-Simpson index. GiniSimpsonMVUE(nr)
HCq computes the Havrda-Charvat estimator, and HCqVar computes the Havrda-Charvat asymptotic variance for a given exponent. By using HCqProfile, a Havrda-Charvat estimator is calculated for a predefined vector of exponents to obtain the full profile in the range, 0 to Inf.
HCq(w, q) HCqVar(w, q) HCqProfile(w, q = NULL)
HCq(w, q) HCqVar(w, q) HCqProfile(w, q = NULL)
w |
Vector of observed counts or frequencies. |
q |
Exponent. By default, a vector of values 1, 2, 3, 4 and Inf. |
In HCq
only the first element in q is considered. HCqProfile
is
vectorized and considers all elements in q. When q is NULL: in this case, a
default vector is taken to obtain the full profile in the range 0 to Inf.
A value that corresponds to the Havrda-Charvat estimator when HCq
or
HCqVar
is used. A data frame with the Havrda-Charvat estimator for each
exponent when HCqProfile
is used.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
Pavoine, S. (2005). M?thodes statistiques pour la mesure de la biodiversit?. UMR CNRS 5558 Biometrie et Biologie Evolutive.
# A vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Havrda-Charvat estimator for q=4. HCq(nr,4) # Havrda-Charvat estimator variance for q=4. HCqVar(nr,4) # Prolife of Havrda-Charvat estimator for 0:4 and Inf. HCqProfile(nr,c(0:4,Inf)) # Full profile. HCqProfile(nr)
# A vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Havrda-Charvat estimator for q=4. HCq(nr,4) # Havrda-Charvat estimator variance for q=4. HCqVar(nr,4) # Prolife of Havrda-Charvat estimator for 0:4 and Inf. HCqProfile(nr,c(0:4,Inf)) # Full profile. HCqProfile(nr)
Functions to compute Hill numbers. Hill
computes the Hill number of a
single q value. HillProfile
computes Hill numbers for all elements
in vector q
.
Hill(w, q) HillProfile(w, q = NULL)
Hill(w, q) HillProfile(w, q = NULL)
w |
Vector of observed counts or frequencies. |
q |
Exponent. |
In Hill
, only the first element in q is considered. HillProfile
is vectorized and considers all elements in q
.When q
is NULL:
in this case, a default vector is taken to obtain the full profile in the range,
0 to Inf.
A value or vector of values corresponding to the Hill number estimators of passed exponents.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Hill numbers of order 2. Hill(nr,2) # Set of most common values. HillProfile(nr,q=c(0:4,Inf)) # Full Hill numbers profile. HillProfile(nr)
# Vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Hill numbers of order 2. Hill(nr,2) # Set of most common values. HillProfile(nr,q=c(0:4,Inf)) # Full Hill numbers profile. HillProfile(nr)
Computes the intersection of forward and reverse strand haplotypes after a previous abundance filter that removes strand haplotypes below a given frequency threshold or unique to a single strand.
IntersectStrandHpls(nrFW, hseqsFW, nrRV, hseqsRV, thr = 0.001)
IntersectStrandHpls(nrFW, hseqsFW, nrRV, hseqsRV, thr = 0.001)
nrFW |
Numeric vector with forward strand haplotype counts. |
hseqsFW |
DNAStringSet object with the forward strand haplotypes. |
nrRV |
Numeric vector with forward reverse strand haplotypes. |
hseqsRV |
DNAStringSet object with the reverse strand haplotypes. |
thr |
Threshold to filter haplotypes at minimum abundance. |
List object with this elements:
hseqs |
DNAStringSet object with the forward and reverse strand intersected. |
nr |
Numeric vector with the abundance of each haplotype. |
pFW |
Vector of abundances of aligned forward strand. |
pRV |
Vector of abundances of aligned reverse strand. |
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
# Load objects. filepath_FW<-system.file("extdata","ToyData_FWReads.fna", package="QSutils") FW<- ReadAmplSeqs(filepath_FW,type="DNA") filepath_RV<-system.file("extdata","ToyData_RVReads.fna", package="QSutils") RV<- ReadAmplSeqs(filepath_RV,type="DNA") # Intersect the two objects, with a default threshold. IntersectStrandHpls(FW$nr,FW$hseqs,RV$nr,RV$hseqs)
# Load objects. filepath_FW<-system.file("extdata","ToyData_FWReads.fna", package="QSutils") FW<- ReadAmplSeqs(filepath_FW,type="DNA") filepath_RV<-system.file("extdata","ToyData_RVReads.fna", package="QSutils") RV<- ReadAmplSeqs(filepath_RV,type="DNA") # Intersect the two objects, with a default threshold. IntersectStrandHpls(FW$nr,FW$hseqs,RV$nr,RV$hseqs)
MutationFreq
computes the mutation frequency given a vector of counts,
and the genetic distances of each haplotype to the dominant haplotype.
MutationFreqVar
returns the variance of the mutation frequency.
MutationFreq(dst=NULL,nm=NULL,nr=NULL,len=1) MutationFreqVar(nm,nr=NULL,len=1)
MutationFreq(dst=NULL,nm=NULL,nr=NULL,len=1) MutationFreqVar(nm,nr=NULL,len=1)
dst |
A "dist" object or a symmetric matrix with pairwise distances. |
nm |
Vector of distances or differences with respect to the dominant haplotype including itself (eg, nm[1] is 0 if w[1]==max(w)). |
nr |
An optional numeric vector with the haplotype counts. |
len |
The alignment width when nm is the number of differences, otherwise 1. Defaults to 1. |
A value corresponding to the mutation frequency for MutationFreq
or
its variance for MutationFreqVar
. When nr
is NULL, the same
weight is given to each haplotype and the computed value corresponds to the
mutation frequency by entity.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
DNA.dist
, GetQSData
, ReadAmplSeqs
# Load alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- GetQSData(filepath,type="DNA") # Mutation frequency. dst <- DNA.dist(lst$seqs,model="raw") MutationFreq(dst=dst,len=width(lst$seqs)[1]) # Mutation frequency with abundances. MutationFreq(nm=lst$nm,nr=lst$nr,len=width(lst$seqs)[1]) # Variance of the mutation frequency. MutationFreqVar(nm=lst$nm,nr=lst$nr,len=width(lst$seqs)[1])
# Load alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- GetQSData(filepath,type="DNA") # Mutation frequency. dst <- DNA.dist(lst$seqs,model="raw") MutationFreq(dst=dst,len=width(lst$seqs)[1]) # Mutation frequency with abundances. MutationFreq(nm=lst$nm,nr=lst$nr,len=width(lst$seqs)[1]) # Variance of the mutation frequency. MutationFreqVar(nm=lst$nm,nr=lst$nr,len=width(lst$seqs)[1])
Computes the table of mutation frequencies by position with respect to the alignment consensus.
MutsTbl(hseqs,nr=NULL)
MutsTbl(hseqs,nr=NULL)
hseqs |
DNAStringSet or AAStringSet with the aligned haplotype sequences. |
nr |
An optional numeric vector with the haplotype counts. When |
Matrix of mutation counts by position. A (4 x n) or (20 x n) matrix, where n is the alignment length.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Load the haplotypes alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Table of mutations in the alignment, regardless of haplotype abundance. MutsTbl(lst$hseqs) # Table of mutations taking into account abundance. MutsTbl(lst$hseqs,lst$nr)
# Load the haplotypes alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Table of mutations in the alignment, regardless of haplotype abundance. MutsTbl(lst$hseqs) # Table of mutations taking into account abundance. MutsTbl(lst$hseqs,lst$nr)
Computes the mean pairwise genetic distance between sequences in the alignment.
NucleotideDiversity(dst,w=NULL)
NucleotideDiversity(dst,w=NULL)
dst |
A "dist" object or a symmetrical matrix with haplotype pairwise distances (ie, the output of DNA.dist). |
w |
An optional numeric vector with the haplotype counts. When |
A value that corresponds to the nucleotide diversity, either by entity or
abundance, depending on argument w
.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Compute the DNA distance matrix. dst <- DNA.dist(lst$hseqs,model="K80") NucleotideDiversity(dst, lst$nr) NucleotideDiversity(dst)
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Compute the DNA distance matrix. dst <- DNA.dist(lst$hseqs,model="K80") NucleotideDiversity(dst, lst$nr) NucleotideDiversity(dst)
Computes the fraction of substitutions at each polymorphic site. The wild-type
base is taken as the most abundant at each site, taking into account the
weights, w
.
PolyDist(seqs,w=NULL)
PolyDist(seqs,w=NULL)
seqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
w |
An optional numeric vector with the haplotype counts. When |
Vector of numbers corresponding to the fraction of substitutions at polymorphic
sites. Note that the wild type also depends on w
.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") PolyDist(lst$hseqs) PolyDist(lst$hseqs,lst$nr)
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") PolyDist(lst$hseqs) PolyDist(lst$hseqs,lst$nr)
Set of functions to estimate Rao’s functional entropy. Rao
calculates
the Rao entropy, RaoVar
the variance of the Rao estimator, RaoPow
the Rao entropy of order q, and RaoPowProfile
the functional Rao
entropy profile for the given set of exponents.
Rao(dst, w=NULL) RaoVar(dst,w=NULL) RaoPow(dst,q,w=NULL) RaoPowProfile(dst,w=NULL,q=NULL)
Rao(dst, w=NULL) RaoVar(dst,w=NULL) RaoPow(dst,q,w=NULL) RaoPowProfile(dst,w=NULL,q=NULL)
dst |
A "dist" object, output of the DNA.dist function. |
w |
An optional numeric vector with the haplotype counts. When |
q |
Exponent. A single value for |
A single value for Rao
, RaoVar
and RaoPow
. A vector of
values for RaoPowProfile
corresponding to each exponent in vector q.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
Pavoine, S. (2005). Méthodes statistiques pour la mesure de la biodiversité. UMR CNRS 5558 «Biométrie et Biologie Evolutive».
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # DNA pairwise distances. dst <- DNA.dist(lst$hseqs,model="N") Rao(dst,lst$nr) RaoVar(dst,lst$nr) RaoPow(dst,2,lst$nr) RaoPowProfile(dst,lst$nr,c(0:4,Inf)) RaoPowProfile(dst,lst$nr)
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # DNA pairwise distances. dst <- DNA.dist(lst$hseqs,model="N") Rao(dst,lst$nr) RaoVar(dst,lst$nr) RaoPow(dst,2,lst$nr) RaoPowProfile(dst,lst$nr,c(0:4,Inf)) RaoPowProfile(dst,lst$nr)
Loads an alignment of haplotypes and their frequencies from a fasta file.
ReadAmplSeqs(flnm,type="DNA")
ReadAmplSeqs(flnm,type="DNA")
flnm |
File name of a fasta file with haplotype sequences and their frequencies. The header of each haplotype in the fasta file is composed of an ID followed by a vertical bar "|" followed by the read count, and eventually followed by another vertical bar and additional information (eg, Hpl.2.0001|15874|25.2). |
type |
Character string specifying the types of sequences in the fasta file. This must be either "DNA" or "AA". It is "DNA" by default. |
Returns a list with two elements:
nr |
Vector of the haplotype counts. |
hseqs |
DNAStringSet or AAStringSet with the haplotype DNA sequences or amino acid sequences. |
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") lst
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") lst
Functions to compute the Rényi entropy given a vector of counts
RenyiProfile
computes the Rényi number for a set of exponents.
Renyi(w, q) RenyiProfile(w, q = NULL)
Renyi(w, q) RenyiProfile(w, q = NULL)
w |
Vector of observed counts or frequencies. |
q |
Exponent. A single value for |
A single value for Renyi
. A data frame with exponents and entropies
for RenyiProfile
.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
Pavoine, S. (2005). Méthodes statistiques pour la mesure de la biodiversité. UMR CNRS 5558 «Biométrie et Biologie Evolutive».
# A vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) Renyi(nr,2) RenyiProfile(nr,c(0:4,Inf)) RenyiProfile(nr)
# A vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) Renyi(nr,2) RenyiProfile(nr,c(0:4,Inf)) RenyiProfile(nr)
Reports the variants of a DNAStringSet or AAStringSet of haplotypes given a reference sequence.
ReportVariants(hseqs,ref.seq,nr=NULL,start=1)
ReportVariants(hseqs,ref.seq,nr=NULL,start=1)
hseqs |
DNAStringSet or AAstringSet object of the aligned haplotypes. |
ref.seq |
Character vector with the reference sequence of the alignment. |
nr |
Numeric vector with the abundances of each haplotype in hseqs. When |
start |
Position of the first nucleotide in the alignment |
A dataframe with 4 columns: the nucleotide in the reference sequence, the position, the variant nucleotide, and its abundance.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
# Load objects. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Report the variants in these haplotypes, # taking as a reference the most abundant haplotype. ReportVariants(lst$hseqs[-1], ref.seq= as.character(lst$hseqs[1]), lst$nr[-1], start = 1)
# Load objects. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Report the variants in these haplotypes, # taking as a reference the most abundant haplotype. ReportVariants(lst$hseqs[-1], ref.seq= as.character(lst$hseqs[1]), lst$nr[-1], start = 1)
Computes the number of segregating (polymorphic) sites in a given alignment. That is, the number of sites with more than a single nucleotide or amino acid in the alignment.
SegSites(seqs)
SegSites(seqs)
seqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
A value corresponding to the number of polymorphic sites.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") SegSites(lst$hseqs)
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") SegSites(lst$hseqs)
Shannon
computes the Shannon entropy.
NormShannon
returns the normalized Shannon entropy.
ShannonVar
computes the Shannon entropy asymptotic variance.
NormShannonVar
computes the normalized Shannon entropy asymptotic
variance.
Shannon(w) ShannonVar(w) NormShannon(w) NormShannonVar(w)
Shannon(w) ShannonVar(w) NormShannon(w) NormShannonVar(w)
w |
Vector of observed counts or frequencies. |
A single value with the result of the computations.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Create a vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Shannon entropy. Shannon(nr) # Shannon entropy variance. ShannonVar(nr) # Normalized Shannon entropy. NormShannon(nr) # Normalized Shannon entropy variance. NormShannonVar(nr)
# Create a vector of observed counts. nr<-c(464, 62, 39, 27, 37, 16, 33, 54, 248, 20) # Shannon entropy. Shannon(nr) # Shannon entropy variance. ShannonVar(nr) # Normalized Shannon entropy. NormShannon(nr) # Normalized Shannon entropy variance. NormShannonVar(nr)
Sorts and renames haplotypes by the number of mutations with respect to the dominant haplotype, and by abundance.
SortByMutations(bseqs, nr)
SortByMutations(bseqs, nr)
bseqs |
DNAStringSet or AAStringSet object with the haplotype alignment. |
nr |
Vector with the haplotype counts. |
The haplotypes are pairwise-aligned to the dominant haplotype and then sorted:
first, by decreasing order of the number of differences with respect to the
dominant haplotype, and second, by decreasing order of abundance. As a result,
haplotypes are renamed according to the pattern Hpl.n.xxxx
, where
n
represents the number of differences, and xxxx
the abundance
order within the mutation number.
Returns a list with three elements.
bseqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
nr |
Vector of the haplotype counts. |
nm |
Vector of the number of differences of each haplotype with respect to the dominant haplotype. |
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") SortByMutations(lst$hseq,lst$nr)
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") SortByMutations(lst$hseq,lst$nr)
Computes the nucleotide or amino acid frequencies at all polymorphic sites in the alignment.
SummaryMuts(seqs, w = NULL, off = 0)
SummaryMuts(seqs, w = NULL, off = 0)
seqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
w |
An optional numeric vector with the haplotype counts. When |
off |
Offset of first position in the alignment. |
Data frame with the polymorphic positions and nucleotide or amino acid frequencies.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Esteban JI, Cubero M, Garcia-Cehic D, Perales C, Casillas R, Alvarez-Tejado M, Rodríguez-Frías F, Guardia J, Domingo E, Quer J. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS One. 2013 Dec 31;8(12):e83361. doi: 10.1371/journal.pone.0083361. eCollection 2013. PubMed PMID: 24391758; PubMed Central PMCID: PMC3877031.
Ramírez C, Gregori J, Buti M, Tabernero D, Camós S, Casillas R, Quer J, Esteban R, Homs M, Rodriguez-Frías F. A comparative study of ultra-deep pyrosequencing and cloning to quantitatively analyze the viral quasispecies using hepatitis B virus infection as a model. Antiviral Res. 2013 May;98(2):273-83. doi: 10.1016/j.antiviral.2013.03.007. Epub 2013 Mar 20. PubMed PMID: 23523552.
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Distribution of nucleotides at polymorphic sites. SummaryMuts(lst$hseqs,lst$nr,off=0)
# Load haplotype alignment with abundances. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") # Distribution of nucleotides at polymorphic sites. SummaryMuts(lst$hseqs,lst$nr,off=0)
TotalMutations
computes the number of mutations in the alignment.
TotalMutations(hseqs,w)
TotalMutations(hseqs,w)
hseqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
w |
An optional numeric vector with the haplotype counts used to compute the
total number of mutations in the population, that is, taking into account
haplotype abundances. When |
A value corresponding to the number of mutations. Note that the wild-type is
decided taking w
into account.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") TotalMutations(lst$hseqs) TotalMutations(lst$hseqs,lst$nr)
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") TotalMutations(lst$hseqs) TotalMutations(lst$hseqs,lst$nr)
Fasta file of sequenced data with some missing information. This is toy data
to illustrate some functions of the package QSutils
package.
Fasta file format. Each sequence starts with the symbol ">" followed by the sequence ID. Subsequent lines correspond to the nucleotide sequences or peptide sequences.
Collapse
, CorrectGapsAndNs
and
Recollapse
filepath<-system.file("extdata","Toy.GapsAndNs.fna", package="QSutils") reads <- readDNAStringSet(filepath) lstCollapsed <- Collapse(reads) DottedAlignment(lstCollapsed$hseqs) lstCorrected<-CorrectGapsAndNs(lstCollapsed$hseqs[2:length(lstCollapsed$hseqs)], lstCollapsed$hseqs[[1]]) lstCorrected<- c(lstCollapsed$hseqs[1],lstCorrected) lstCorrected lstRecollapsed<-Recollapse(lstCorrected,lstCollapsed$nr) lstRecollapsed
filepath<-system.file("extdata","Toy.GapsAndNs.fna", package="QSutils") reads <- readDNAStringSet(filepath) lstCollapsed <- Collapse(reads) DottedAlignment(lstCollapsed$hseqs) lstCorrected<-CorrectGapsAndNs(lstCollapsed$hseqs[2:length(lstCollapsed$hseqs)], lstCollapsed$hseqs[[1]]) lstCorrected<- c(lstCollapsed$hseqs[1],lstCorrected) lstCorrected lstRecollapsed<-Recollapse(lstCorrected,lstCollapsed$nr) lstRecollapsed
Fasta file that contains the sequence of 10 haplotypes used as examples in
the QSutils
package.
Fasta file format. Each sequence starts with the symbol ">" followed by the sequence ID. Subsequent lines correspond to the nucleotide sequences or peptide sequences.
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") lst
filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") lst
Fasta file with forward strand reads. Toy data used to illustrate the
intersections of forward and reverse haplotypes with the function
IntersectStrandHpls
.
Fasta file format. Each sequence starts with the symbol ">" followed by the sequence ID. Subsequent lines correspond to the nucleotide sequences or peptide sequences.
ToyData_RVReads.fna
, IntersectStrandHpls
filepath_FW<-system.file("extdata","ToyData_FWReads.fna", package="QSutils") lstFW <- ReadAmplSeqs(filepath_FW,type="DNA") filepath_RV<-system.file("extdata","ToyData_RVReads.fna", package="QSutils") lstRV <- ReadAmplSeqs(filepath_RV,type="DNA") lstI <- IntersectStrandHpls(lstFW$nr,lstFW$hseqs,lstRV$nr,lstRV$hseqs) lstI
filepath_FW<-system.file("extdata","ToyData_FWReads.fna", package="QSutils") lstFW <- ReadAmplSeqs(filepath_FW,type="DNA") filepath_RV<-system.file("extdata","ToyData_RVReads.fna", package="QSutils") lstRV <- ReadAmplSeqs(filepath_RV,type="DNA") lstI <- IntersectStrandHpls(lstFW$nr,lstFW$hseqs,lstRV$nr,lstRV$hseqs) lstI
Fasta file with reverse strand reads. Toy data used to illustrate the
intersections of forward and reverse haplotypes with the function
IntersectStrandHpls
.
Fasta file format. Each sequence starts with the symbol ">" followed by the sequence ID. Subsequent lines correspond to the nucleotide sequences or peptide sequences.
ToyData_FWReads.fna
, IntersectStrandHpls
filepath_FW<-system.file("extdata","ToyData_FWReads.fna", package="QSutils") lstFW <- ReadAmplSeqs(filepath_FW,type="DNA") filepath_RV<-system.file("extdata","ToyData_RVReads.fna", package="QSutils") lstRV <- ReadAmplSeqs(filepath_RV,type="DNA") lstI <- IntersectStrandHpls(lstFW$nr,lstFW$hseqs,lstRV$nr,lstRV$hseqs) lstI
filepath_FW<-system.file("extdata","ToyData_FWReads.fna", package="QSutils") lstFW <- ReadAmplSeqs(filepath_FW,type="DNA") filepath_RV<-system.file("extdata","ToyData_RVReads.fna", package="QSutils") lstRV <- ReadAmplSeqs(filepath_RV,type="DNA") lstI <- IntersectStrandHpls(lstFW$nr,lstFW$hseqs,lstRV$nr,lstRV$hseqs) lstI
UniqueMutations
computes the number of unique mutations in the alignment.
UniqueMutations(hseqs)
UniqueMutations(hseqs)
hseqs |
DNAStringSet or AAStringSet with the haplotype sequences. |
A value corresponding to the number of mutations.
Mercedes Guerrero-Murillo and Josep Gregori
Gregori J, Perales C, Rodriguez-Frias F, Esteban JI, Quer J, Domingo E. Viral quasispecies complexity measures. Virology. 2016 Jun;493:227-37. doi: 10.1016/j.virol.2016.03.017. Epub 2016 Apr 6. Review. PubMed PMID: 27060566.
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics. 2014 Apr 15;30(8):1104-1111. Epub 2014 Jan 2. PubMed PMID: 24389655.
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") UniqueMutations(lst$hseqs)
# Create the object. filepath<-system.file("extdata","ToyData_10_50_1000.fna", package="QSutils") lst <- ReadAmplSeqs(filepath,type="DNA") UniqueMutations(lst$hseqs)
Fasta file with hepatitis B virus sequences of unknown genotype. This is used
to illustrate the genotyping of HBV sequences with the QSutils
package.
Fasta file format. Each sequence starts with the symbol ">" followed by the sequence ID. Subsequent lines correspond to the nucleotide sequences or peptide sequences.
filepath<-system.file("extdata","Unknown-Genotype.fna", package="QSutils") lst2Geno <- ReadAmplSeqs(filepath,type="DNA") hseq <- lst2Geno$hseq[1] hseq
filepath<-system.file("extdata","Unknown-Genotype.fna", package="QSutils") lst2Geno <- ReadAmplSeqs(filepath,type="DNA") hseq <- lst2Geno$hseq[1] hseq