Title: | MSA2dist calculates pairwise distances between all sequences of a DNAStringSet or a AAStringSet using a custom score matrix and conducts codon based analysis |
---|---|
Description: | MSA2dist calculates pairwise distances between all sequences of a DNAStringSet or a AAStringSet using a custom score matrix and conducts codon based analysis. It uses scoring matrices to be used in these pairwise distance calcualtions which can be adapted to any scoring for DNA or AA characters. E.g. by using literal distances MSA2dist calculates pairwise IUPAC distances. |
Authors: | Kristian K Ullrich [aut, cre] |
Maintainer: | Kristian K Ullrich <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.11.1 |
Built: | 2025-01-17 04:09:19 UTC |
Source: | https://github.com/bioc/MSA2dist |
This function return the selfscore from an AAStringSet
.
aa2selfscore(aa, scorematrix = "BLOSUM62")
aa2selfscore(aa, scorematrix = "BLOSUM62")
aa |
|
scorematrix |
score matrix to use [default: BLOSUM62] |
data.frame
Kristian K Ullrich
XStringSet-class
,
substitution_matrices
data(woodmouse, package="ape") #cds2aa(dnabin2dnastring(woodmouse), shorten=TRUE, #genetic.code=Biostrings::getGeneticCode("2")) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE, genetic.code=Biostrings::getGeneticCode("2")) |> aa2selfscore()
data(woodmouse, package="ape") #cds2aa(dnabin2dnastring(woodmouse), shorten=TRUE, #genetic.code=Biostrings::getGeneticCode("2")) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE, genetic.code=Biostrings::getGeneticCode("2")) |> aa2selfscore()
This function converts an ape
AAbin
into
AAStringSet
.
aabin2aastring(aabin)
aabin2aastring(aabin)
aabin |
|
An object of class AAStringSet
Kristian K Ullrich
as.alignment
as.DNAbin.alignment
AAStringSet
data(woodmouse, package="ape") ## convert into AAStringSet #aabin2aastring(ape::trans(woodmouse, 2)) ape::trans(woodmouse, 2) |> aabin2aastring()
data(woodmouse, package="ape") ## convert into AAStringSet #aabin2aastring(ape::trans(woodmouse, 2)) ape::trans(woodmouse, 2) |> aabin2aastring()
getAAMatrix()
from the alakazam
package.
data(AAMatrix)
data(AAMatrix)
an object of class matrix
score matrix
Gupta N, Vander Heiden J, Uduman M, Gadala-Maria D, Yaari G, Kleinstein S (2015) Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics. 31(20), 3356-3358.
data("AAMatrix", package="MSA2dist")
data("AAMatrix", package="MSA2dist")
This function converts an AAStringSet
into an ape
AAbin
.
aastring2aabin(aa)
aastring2aabin(aa)
aa |
|
An object of class AAbin
Kristian K Ullrich
as.alignment
as.DNAbin.alignment
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into AAbin #aastring2aabin(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> aastring2aabin()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into AAbin #aastring2aabin(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> aastring2aabin()
This function converts a AAStringSet
into an
seqinr
alignment
.
aastring2aln(aa)
aastring2aln(aa)
aa |
|
An object of class alignment
which is a list with the
following components:nb
the number of aligned sequencesnam
a vector of strings containing the names of the aligned
sequencesseq
a vector of strings containing the aligned sequencescom
a vector of strings containing the commentaries for each sequence
or NA
if there are no comments
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #aastring2aln(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> aastring2aln()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #aastring2aln(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> aastring2aln()
This function calculates pairwise distances for all combinations
of a AAStringSet
.
aastring2dist( aa, threads = 1, symmetric = TRUE, score = NULL, mask = NULL, region = NULL )
aastring2dist( aa, threads = 1, symmetric = TRUE, score = NULL, mask = NULL, region = NULL )
aa |
|
threads |
number of parallel threads [default: 1] |
symmetric |
symmetric score matrix [default: TRUE] |
score |
|
mask |
|
region |
|
A data.frame
of pairwise distance values
distSTRING
, sites used sitesUsed
and region used
regionUsed
Kristian K Ullrich
## load example sequence data data("hiv", package="MSA2dist") #aastring2dist(cds2aa(hiv), score=granthamMatrix()) hiv |> cds2aa() |> aastring2dist(score=granthamMatrix()) ## create mask mask1 <- IRanges::IRanges(start=c(11,41,71), end=c(20,50,80)) ## use mask hiv |> cds2aa() |> aastring2dist(score=granthamMatrix(), mask=mask1) ## use region region1 <- IRanges::IRanges(start=c(1,75), end=c(45,85)) hiv |> cds2aa() |> aastring2dist(score=granthamMatrix(), region=region1) ## use mask and region hiv |> cds2aa() |> aastring2dist(score=granthamMatrix(), mask=mask1, region=region1) ## use asymmetric score matrix myscore <- granthamMatrix() myscore[5, 6] <- 0 h <- hiv |> cds2aa() |> aastring2dist(score=myscore, symmetric=FALSE) h$distSTRING[1:2, 1:2]
## load example sequence data data("hiv", package="MSA2dist") #aastring2dist(cds2aa(hiv), score=granthamMatrix()) hiv |> cds2aa() |> aastring2dist(score=granthamMatrix()) ## create mask mask1 <- IRanges::IRanges(start=c(11,41,71), end=c(20,50,80)) ## use mask hiv |> cds2aa() |> aastring2dist(score=granthamMatrix(), mask=mask1) ## use region region1 <- IRanges::IRanges(start=c(1,75), end=c(45,85)) hiv |> cds2aa() |> aastring2dist(score=granthamMatrix(), region=region1) ## use mask and region hiv |> cds2aa() |> aastring2dist(score=granthamMatrix(), mask=mask1, region=region1) ## use asymmetric score matrix myscore <- granthamMatrix() myscore[5, 6] <- 0 h <- hiv |> cds2aa() |> aastring2dist(score=myscore, symmetric=FALSE) h$distSTRING[1:2, 1:2]
This function adds mask information as an IRanges
object,
START
and END
information, to a
DNAStringSet
or an AAStringSet
and puts them into the
metadata
information.
This information can be used to restrict the distance calculation to
specific regions of the DNAStringSet
or the AAStringSet
.
addmask2string(seq, mask = NULL, append = TRUE)
addmask2string(seq, mask = NULL, append = TRUE)
seq |
|
mask |
|
append |
indicate if mask should be appended or overwritten [default: TRUE] |
An object of class DNAStringSet
or AAStringSet
Kristian K Ullrich
addregion2string
,
addpop2string
,
addpos2string
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create mask mask1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add mask iupac.aa <- iupac.aa |> addmask2string(mask=mask1) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask() ## append mask mask2 <- IRanges::IRanges(start=c(21), end=c(30)) iupac.aa <- iupac.aa |> addmask2string(mask=mask2) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask() ## overwrite mask iupac.aa <- iupac.aa |> addmask2string(mask=mask2, append=FALSE) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask() ## reduce by mask #iupac.aa.region <- iupac.aa |> string2region(mask= # (iupac.aa |> slot("metadata"))$mask) iupac.aa.region <- iupac.aa |> string2region(mask= getmask(iupac.aa)) #iupac.aa.region |> slot("metadata") iupac.aa.region |> getmask()
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create mask mask1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add mask iupac.aa <- iupac.aa |> addmask2string(mask=mask1) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask() ## append mask mask2 <- IRanges::IRanges(start=c(21), end=c(30)) iupac.aa <- iupac.aa |> addmask2string(mask=mask2) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask() ## overwrite mask iupac.aa <- iupac.aa |> addmask2string(mask=mask2, append=FALSE) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask() ## reduce by mask #iupac.aa.region <- iupac.aa |> string2region(mask= # (iupac.aa |> slot("metadata"))$mask) iupac.aa.region <- iupac.aa |> string2region(mask= getmask(iupac.aa)) #iupac.aa.region |> slot("metadata") iupac.aa.region |> getmask()
This function adds population information to a
DNAStringSet
or an AAStringSet
and puts them into the
metadata
information.
__Note__: All unassigned sequences will be put into pop "unassigned"!
Do not use "unassigned" as a population name!
__Note__: Names in a population in the poplist must match sequence names!
__Note__: Duplicated assignments are allowed!
addpop2string(seq, poplist)
addpop2string(seq, poplist)
seq |
|
poplist |
named |
An object of class DNAStringSet
or AAStringSet
Kristian K Ullrich
addmask2string
,
addregion2string
,
addpos2string
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create poplist poplist <- list(FRA = grep("Mmd.FRA", names(iupac)), GER = grep("Mmd.GER", names(iupac)), IRA = grep("Mmd.IRA", names(iupac)), AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) #(iupac.aa |> slot("metadata"))$pop.integer iupac.aa |> popinteger() #(iupac.aa |> slot("metadata"))$pop.names iupac.aa |> popnames() ## mxixing index and names poplist <- list(FRA = names(iupac)[grep("Mmd.FRA", names(iupac))], GER = grep("Mmd.GER", names(iupac)), IRA = names(iupac)[grep("Mmd.IRA", names(iupac))], AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) iupac.aa |> popinteger() iupac.aa |> popnames() ## leaving out some sequences which will be assigned as "unassigned" poplist <- list(FRA = names(iupac)[grep("Mmd.FRA", names(iupac))], GER = grep("Mmd.GER", names(iupac)), IRA = names(iupac)[grep("Mmd.IRA", names(iupac))]) iupac.aa <- iupac.aa |> addpop2string(poplist) iupac.aa |> popinteger() iupac.aa |> popnames()
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create poplist poplist <- list(FRA = grep("Mmd.FRA", names(iupac)), GER = grep("Mmd.GER", names(iupac)), IRA = grep("Mmd.IRA", names(iupac)), AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) #(iupac.aa |> slot("metadata"))$pop.integer iupac.aa |> popinteger() #(iupac.aa |> slot("metadata"))$pop.names iupac.aa |> popnames() ## mxixing index and names poplist <- list(FRA = names(iupac)[grep("Mmd.FRA", names(iupac))], GER = grep("Mmd.GER", names(iupac)), IRA = names(iupac)[grep("Mmd.IRA", names(iupac))], AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) iupac.aa |> popinteger() iupac.aa |> popnames() ## leaving out some sequences which will be assigned as "unassigned" poplist <- list(FRA = names(iupac)[grep("Mmd.FRA", names(iupac))], GER = grep("Mmd.GER", names(iupac)), IRA = names(iupac)[grep("Mmd.IRA", names(iupac))]) iupac.aa <- iupac.aa |> addpop2string(poplist) iupac.aa |> popinteger() iupac.aa |> popnames()
This function adds GenomicRanges
information,
CHROM
, START
and END
to a
DNAStringSet
or an AAStringSet
and puts them into the
metadata
information.
This information can be used to find overlaps with a chromosome wide mask.
addpos2string(seq, chrom = NULL, start = NULL, end = NULL)
addpos2string(seq, chrom = NULL, start = NULL, end = NULL)
seq |
|
chrom |
chromosome name [mandatory] |
start |
start position [mandatory] |
end |
end position [mandatory] |
An object of class DNAStringSet
or AAStringSet
Kristian K Ullrich
addmask2string
,
addregion2string
,
addpop2string
## load example sequence data data(iupac, package="MSA2dist") ## add position iupac <- iupac |> addpos2string(chrom="chr1", start=1, end=1000) #(iupac |> slot("metadata"))$GRanges iupac |> getpos()
## load example sequence data data(iupac, package="MSA2dist") ## add position iupac <- iupac |> addpos2string(chrom="chr1", start=1, end=1000) #(iupac |> slot("metadata"))$GRanges iupac |> getpos()
This function adds region information as an IRanges
object, START
and END
information, to a
DNAStringSet
or an AAStringSet
and puts them into the
metadata
information.
This information can be used to restrict the distance calculation to
specific regions of the DNAStringSet
or the AAStringSet
.
addregion2string(seq, region = NULL, append = TRUE)
addregion2string(seq, region = NULL, append = TRUE)
seq |
|
region |
|
append |
indicate if region should be appended or overwritten [default: TRUE] |
An object of class DNAStringSet
or AAStringSet
Kristian K Ullrich
addmask2string
,
addpop2string
,
addpos2string
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create region region1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add region iupac.aa <- iupac.aa |> addregion2string(region=region1) #(iupac.aa |> slot("metadata"))$region iupac.aa |> region() ## append region region2 <- IRanges::IRanges(start=c(21), end=c(30)) iupac.aa <- iupac.aa |> addregion2string(region=region2) #(iupac.aa |> slot("metadata"))$region iupac.aa |> region() ## overwrite region iupac.aa <- iupac.aa |> addregion2string(region=region2, append=FALSE) #(iupac.aa |> slot("metadata"))$region iupac.aa |> region() ## reduce by region #iupac.aa.region <- iupac.aa |> string2region(region= # (iupac.aa |> slot("metadata"))$region) iupac.aa.region <- iupac.aa |> string2region(region= region(iupac.aa)) #iupac.aa.region |> slot("metadata") iupac.aa.region |> region()
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create region region1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add region iupac.aa <- iupac.aa |> addregion2string(region=region1) #(iupac.aa |> slot("metadata"))$region iupac.aa |> region() ## append region region2 <- IRanges::IRanges(start=c(21), end=c(30)) iupac.aa <- iupac.aa |> addregion2string(region=region2) #(iupac.aa |> slot("metadata"))$region iupac.aa |> region() ## overwrite region iupac.aa <- iupac.aa |> addregion2string(region=region2, append=FALSE) #(iupac.aa |> slot("metadata"))$region iupac.aa |> region() ## reduce by region #iupac.aa.region <- iupac.aa |> string2region(region= # (iupac.aa |> slot("metadata"))$region) iupac.aa.region <- iupac.aa |> string2region(region= region(iupac.aa)) #iupac.aa.region |> slot("metadata") iupac.aa.region |> region()
This function converts a seqinr
alignment
into
an AAStringSet
.
aln2aastring(aln)
aln2aastring(aln)
aln |
|
An object of class AAStringSet
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #aastring2aln(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> aastring2aln() |> aln2aastring()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #aastring2aln(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> aastring2aln() |> aln2aastring()
This function converts a seqinr
alignment
into
an DNAStringSet
.
aln2dnastring(aln)
aln2dnastring(aln)
aln |
|
An object of class DNAStringSet
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into alignment #dnastring2aln(cds1.cds2.aln) cds1.cds2.aln |> dnastring2aln() ## convert back into DNAStringSet #aln2dnastring(dnastring2aln(cds1.cds2.aln)) cds1.cds2.aln |> dnastring2aln() |> aln2dnastring()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into alignment #dnastring2aln(cds1.cds2.aln) cds1.cds2.aln |> dnastring2aln() ## convert back into DNAStringSet #aln2dnastring(dnastring2aln(cds1.cds2.aln)) cds1.cds2.aln |> dnastring2aln() |> aln2dnastring()
This function translates a DNAStringSet
into an
AAStringSet
.
cds2aa( cds, shorten = FALSE, frame = 1, framelist = NULL, genetic.code = NULL, return.cds = FALSE )
cds2aa( cds, shorten = FALSE, frame = 1, framelist = NULL, genetic.code = NULL, return.cds = FALSE )
cds |
|
shorten |
shorten all sequences to multiple of three [default: FALSE] |
frame |
indicates the first base of a the first codon [default: 1] |
framelist |
supply vector of frames for each entry [default: NULL] |
genetic.code |
The genetic code to use for the translation of codons into Amino Acid letters [default: NULL] |
return.cds |
return shorten cds instead of aa [default: FALSE] |
AAStringSet
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #cds2aa(cds1.cds2.aln) cds1.cds2.aln |> cds2aa() ## alternative genetic code data(woodmouse, package="ape") #cds2aa(dnabin2dnastring(woodmouse), shorten=TRUE) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE) #cds2aa(dnabin2dnastring(woodmouse), shorten=TRUE, #genetic.code=Biostrings::getGeneticCode("2")) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE, genetic.code=Biostrings::getGeneticCode("2")) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE, return.cds=TRUE) |> cds2aa(genetic.code=Biostrings::getGeneticCode("2"))
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #cds2aa(cds1.cds2.aln) cds1.cds2.aln |> cds2aa() ## alternative genetic code data(woodmouse, package="ape") #cds2aa(dnabin2dnastring(woodmouse), shorten=TRUE) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE) #cds2aa(dnabin2dnastring(woodmouse), shorten=TRUE, #genetic.code=Biostrings::getGeneticCode("2")) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE, genetic.code=Biostrings::getGeneticCode("2")) woodmouse |> dnabin2dnastring() |> cds2aa(shorten=TRUE, return.cds=TRUE) |> cds2aa(genetic.code=Biostrings::getGeneticCode("2"))
This function takes two single sequence DNAString
's or
two single sequence DNAStringSet
's, converts them into aa, calculates
a global alignment and converts this alignment back into a codon alignment.
cds2codonaln( cds1, cds2, type = "global", substitutionMatrix = "BLOSUM62", gapOpening = 10, gapExtension = 0.5, remove.gaps = FALSE, ... )
cds2codonaln( cds1, cds2, type = "global", substitutionMatrix = "BLOSUM62", gapOpening = 10, gapExtension = 0.5, remove.gaps = FALSE, ... )
cds1 |
single sequence |
cds2 |
single sequence |
type |
type of alignment (see
|
substitutionMatrix |
substitution matrix representing the fixed
substitution scores for an alignment (see
|
gapOpening |
the cost for opening a gap in the alignment (see
|
gapExtension |
the incremental cost incurred along the length of the
gap in the alignment (see |
remove.gaps |
specify if gaps in the codon alignment should be removed [default: FALSE] |
... |
other cds2aa parameters |
codon alignment as DNAStringSet
Kristian K Ullrich
Pagès, H et al. (2014) Biostrings: Efficient manipulation of biological strings. R package version, 2(0).
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") cds2codonaln(cds1, cds2)
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") cds2codonaln(cds1, cds2)
This function takes two sequences as DNAStringSet
,
and their corresponding AAStringSet
, calculates
a global alignment and converts this alignment back into a codon alignment.
cdsstring2codonaln( cds, aa, type = "global", substitutionMatrix = "BLOSUM62", gapOpening = 10, gapExtension = 0.5, remove.gaps = FALSE )
cdsstring2codonaln( cds, aa, type = "global", substitutionMatrix = "BLOSUM62", gapOpening = 10, gapExtension = 0.5, remove.gaps = FALSE )
cds |
two sequences |
aa |
two sequences |
type |
type of alignment (see
|
substitutionMatrix |
substitution matrix representing the fixed
substitution scores for an alignment (see
|
gapOpening |
the cost for opening a gap in the alignment (see
|
gapExtension |
the incremental cost incurred along the length of the
gap in the alignment (see |
remove.gaps |
specify if gaps in the codon alignment should be removed [default: FALSE] |
codon alignment as DNAStringSet
Kristian K Ullrich
Pagès, H et al. (2014) Biostrings: Efficient manipulation of biological strings. R package version, 2(0).
## define two cds sequences cds <- Biostrings::DNAStringSet(c("ATGCAACATTGC", "ATGCATTGC")) names(cds) <- c("cds1", "cds2") ## get protein alignment aa <- MSA2dist::cds2aa(cds) cdsstring2codonaln(cds, aa)
## define two cds sequences cds <- Biostrings::DNAStringSet(c("ATGCAACATTGC", "ATGCATTGC")) names(cds) <- c("cds1", "cds2") ## get protein alignment aa <- MSA2dist::cds2aa(cds) cdsstring2codonaln(cds, aa)
This function converts a codon
into a number
,
but accept N and -.
codon2numberAMBIG(codon)
codon2numberAMBIG(codon)
codon |
[mandatory] |
An object of class numeric
Kristian K Ullrich
#unlist(lapply(names(Biostrings::GENETIC_CODE), codon2numberAMBIG)) names(Biostrings::GENETIC_CODE) |> codon2numberAMBIG()
#unlist(lapply(names(Biostrings::GENETIC_CODE), codon2numberAMBIG)) names(Biostrings::GENETIC_CODE) |> codon2numberAMBIG()
This function converts a codon
into a number
.
codon2numberTCAG(codon)
codon2numberTCAG(codon)
codon |
[mandatory] |
An object of class numeric
Kristian K Ullrich
#unlist(lapply(names(Biostrings::GENETIC_CODE), codon2numberTCAG)) names(Biostrings::GENETIC_CODE) |> codon2numberTCAG()
#unlist(lapply(names(Biostrings::GENETIC_CODE), codon2numberTCAG)) names(Biostrings::GENETIC_CODE) |> codon2numberTCAG()
This function calculates pn/ps according to Nei and Gojobori (1986).
codonmat2pnps(codonmat)
codonmat2pnps(codonmat)
codonmat |
|
An object of class pnps
which is a list with the following
components:seq1
sequence1 nameseq2
sequence2 nameCodons
sequence2 nameCompared
sequence2 nameAmbigiuous
sequence2 nameIndels
sequence2 nameNs
sequence2 nameSd
sequence2 nameSn
sequence2 nameS
sequence2 nameN
sequence2 nameps
sequence2 namepn
sequence2 namepnps
sequence2 nameds
sequence2 namedn
sequence2 namednds
sequence2 name
Kristian K Ullrich
Nei and Gojobori. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3(5), 418-426.
Ganeshan et al. (1997) Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virology. 71(1), 663-677.
Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
## load example sequence data data("hiv", package="MSA2dist") #codonmat2pnps(dnastring2codonmat(hiv)[,c(1, 2)]) (hiv |> dnastring2codonmat())[,c(1, 2)] |> codonmat2pnps()
## load example sequence data data("hiv", package="MSA2dist") #codonmat2pnps(dnastring2codonmat(hiv)[,c(1, 2)]) (hiv |> dnastring2codonmat())[,c(1, 2)] |> codonmat2pnps()
This function calculates average behavior of each codon for all pairwise comparisons for indels, syn, and nonsyn mutations according to Nei and Gojobori (1986).
codonmat2xy(codonmat, threads = 1)
codonmat2xy(codonmat, threads = 1)
codonmat |
|
threads |
number of parallel threads [default: 1] |
A data.frame
object with the following components:Codon
Codon indexn
number of comparisonSynSum
Sum of synNonSynSum
Sum of nonsynIndelSum
Sum of indelsSynMean
average syn per codonNonSynMean
average nonsyn per codonIndelMean
average indels per codonCumSumSynMean
cumulative average syn per codonCumSumNonSynMean
cumulative average nonsyn per codonCumSumIndelMean
cumulative indels per codon
Kristian K Ullrich
Nei and Gojobori. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3(5), 418-426.
Ganeshan et al. (1997) Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virology. 71(1), 663-677.
Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
dnastring2codonmat
codonmat2pnps
dnastring2kaks
kaks
## load example sequence data data("hiv", package="MSA2dist") #codonmat2xy(dnastring2codonmat(hiv)) hiv |> dnastring2codonmat() |> codonmat2xy() #codonmat2xy(dnastring2codonmat(hiv), threads=2) hiv |> dnastring2codonmat() |> codonmat2xy(threads=2)
## load example sequence data data("hiv", package="MSA2dist") #codonmat2xy(dnastring2codonmat(hiv)) hiv |> dnastring2codonmat() |> codonmat2xy() #codonmat2xy(dnastring2codonmat(hiv), threads=2) hiv |> dnastring2codonmat() |> codonmat2xy(threads=2)
This function compares two codons and returns the number of syn and non-syn sites according to Nei and Gojobori (1986).
compareCodons(codA, codB)
compareCodons(codA, codB)
codA |
|
codB |
|
vector of syn and non-syn sites
Kristian K Ullrich
Nei and Gojobori. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3(5), 418-426.
Ganeshan et al. (1997) Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virology. 71(1), 663-677.
Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
compareCodons("AAA","TTA") compareCodons("AAA","TAT") compareCodons("AAA","ATT") compareCodons("AAA","TTT") ## load example sequence data data("hiv", package="MSA2dist") compareCodons(dnastring2codonmat(hiv)[1,1], dnastring2codonmat(hiv)[1,2])
compareCodons("AAA","TTA") compareCodons("AAA","TAT") compareCodons("AAA","ATT") compareCodons("AAA","TTT") ## load example sequence data data("hiv", package="MSA2dist") compareCodons(dnastring2codonmat(hiv)[1,1], dnastring2codonmat(hiv)[1,2])
This function converts an ape
DNAbin
into a
DNAStringSet
.
dnabin2dnastring(dnabin)
dnabin2dnastring(dnabin)
dnabin |
|
An object of class DNAStringSet
Kristian K Ullrich
as.alignment
as.DNAbin.alignment
DNAStringSet
data(woodmouse, package="ape") ## convert into DNAStringSet #dnabin2dnastring(woodmouse) woodmouse |> dnabin2dnastring()
data(woodmouse, package="ape") ## convert into DNAStringSet #dnabin2dnastring(woodmouse) woodmouse |> dnabin2dnastring()
This function converts a DNAStringSet
into an
seqinr
alignment
.
dnastring2aln(dna)
dnastring2aln(dna)
dna |
|
An object of class alignment
which is a list with the
following components:nb
the number of aligned sequencesnam
a vector of strings containing the names of the aligned
sequencesseq
a vector of strings containing the aligned sequencescom
a vector of strings containing the commentaries for each sequence
or NA
if there are no comments
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into alignment #dnastring2aln(cds1.cds2.aln) cds1.cds2.aln |> dnastring2aln()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into alignment #dnastring2aln(cds1.cds2.aln) cds1.cds2.aln |> dnastring2aln()
This function converts a DNAStringSet
into a
codon matrix
.
dnastring2codonmat(cds, shorten = FALSE, frame = 1, framelist = NULL)
dnastring2codonmat(cds, shorten = FALSE, frame = 1, framelist = NULL)
cds |
|
shorten |
shorten all sequences to multiple of three [default: FALSE] |
frame |
indicates the first base of a the first codon [default: 1] |
framelist |
supply vector of frames for each entry [default: NULL] |
An object of class alignment
which is a list with the
following components:nb
the number of aligned sequencesnam
a vector of strings containing the names of the aligned
sequencesseq
a vector of strings containing the aligned sequencescom
a vector of strings containing the commentaries for each sequence
or NA
if there are no comments
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into alignment #dnastring2codonmat(cds1.cds2.aln) cds1.cds2.aln |> dnastring2codonmat() ## use frame 2 and shorten to circumvent multiple of three error cds1 <- Biostrings::DNAString("-ATGCAACATTGC-") cds2 <- Biostrings::DNAString("-ATG---CATTGC-") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) cds1.cds2.aln |> dnastring2codonmat(frame=2, shorten=TRUE)
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into alignment #dnastring2codonmat(cds1.cds2.aln) cds1.cds2.aln |> dnastring2codonmat() ## use frame 2 and shorten to circumvent multiple of three error cds1 <- Biostrings::DNAString("-ATGCAACATTGC-") cds2 <- Biostrings::DNAString("-ATG---CATTGC-") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) cds1.cds2.aln |> dnastring2codonmat(frame=2, shorten=TRUE)
This function calculates pairwise distances for all
combinations of a DNAStringSet
.
dnastring2dist( dna, model = "IUPAC", threads = 1, symmetric = TRUE, score = NULL, mask = NULL, region = NULL, ... )
dnastring2dist( dna, model = "IUPAC", threads = 1, symmetric = TRUE, score = NULL, mask = NULL, region = NULL, ... )
dna |
|
model |
specify model either "IUPAC" or any model from
|
threads |
number of parallel threads [default: 1] |
symmetric |
symmetric score matrix [default: TRUE] |
score |
|
mask |
|
region |
|
... |
other |
A data.frame of pairwise distance values distSTRING
and
sites used sitesUsed
Kristian K Ullrich
## load example sequence data data("hiv", package="MSA2dist") #dnastring2dist(hiv, model="IUPAC") hiv |> dnastring2dist(model="IUPAC") #dnastring2dist(hiv, model="K80") hiv |> dnastring2dist(model="K80") data("woodmouse", package="ape") #dnastring2dist(dnabin2dnastring(woodmouse), score=iupacMatrix()) woodmouse |> dnabin2dnastring() |> dnastring2dist() #dnastring2dist(hiv, model = "IUPAC", threads = 2) hiv |> dnastring2dist(model = "IUPAC", threads = 2) ## create mask mask1 <- IRanges::IRanges(start=c(1,61,121), end=c(30,90,150)) ## use mask hiv |> dnastring2dist(model="IUPAC", mask=mask1) ## use region region1 <- IRanges::IRanges(start=c(1,139), end=c(75,225)) hiv |> dnastring2dist(model="IUPAC", region=region1) ## use mask and region hiv |> dnastring2dist(model="IUPAC", mask=mask1, region=region1) ## use asymmetric score matrix myscore <- iupacMatrix() myscore[1, 4] <- 0.5 (hiv |> dnastring2dist(score=myscore, symmetric=FALSE))$distSTRING[1:2, 1:2]
## load example sequence data data("hiv", package="MSA2dist") #dnastring2dist(hiv, model="IUPAC") hiv |> dnastring2dist(model="IUPAC") #dnastring2dist(hiv, model="K80") hiv |> dnastring2dist(model="K80") data("woodmouse", package="ape") #dnastring2dist(dnabin2dnastring(woodmouse), score=iupacMatrix()) woodmouse |> dnabin2dnastring() |> dnastring2dist() #dnastring2dist(hiv, model = "IUPAC", threads = 2) hiv |> dnastring2dist(model = "IUPAC", threads = 2) ## create mask mask1 <- IRanges::IRanges(start=c(1,61,121), end=c(30,90,150)) ## use mask hiv |> dnastring2dist(model="IUPAC", mask=mask1) ## use region region1 <- IRanges::IRanges(start=c(1,139), end=c(75,225)) hiv |> dnastring2dist(model="IUPAC", region=region1) ## use mask and region hiv |> dnastring2dist(model="IUPAC", mask=mask1, region=region1) ## use asymmetric score matrix myscore <- iupacMatrix() myscore[1, 4] <- 0.5 (hiv |> dnastring2dist(score=myscore, symmetric=FALSE))$distSTRING[1:2, 1:2]
This function converts a DNAStringSet
into an
ape
DNAbin
.
dnastring2dnabin(dna)
dnastring2dnabin(dna)
dna |
|
An object of class DNAbin
Kristian K Ullrich
as.alignment
as.DNAbin.alignment
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into DNAbin #dnastring2dnabin(cds1.cds2.aln) cds1.cds2.aln |> dnastring2dnabin()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) ## convert into DNAbin #dnastring2dnabin(cds1.cds2.aln) cds1.cds2.aln |> dnastring2dnabin()
This function calculates Ka/Ks (pN/pS)
for all combinations of a DNAStringSet
.
If the sequences in the DNAStringSet
are not a multiple-sequence
alignment, pairwise codon alignments can be calculated on the fly.
Models used and implemented according to
Li (1993) (using seqinr
) or
Nei and Gojobori (1986) (own implementation) or models from
KaKs_Calculator2
ported to MSA2dist
with Rcpp
.
dnastring2kaks( cds, model = "Li", threads = 1, isMSA = TRUE, sgc = "1", verbose = FALSE, ... )
dnastring2kaks( cds, model = "Li", threads = 1, isMSA = TRUE, sgc = "1", verbose = FALSE, ... )
cds |
|
model |
specify codon model either "Li" or "NG86" or
one of |
threads |
number of parallel threads [default: 1] |
isMSA |
cds |
sgc |
standard genetic code (for KaKs Calculator models) [default: 1] |
verbose |
verbosity (for KaKs Calculator models) [default: FALSE] |
... |
other codon alignment parameters |
A data.frame
of KaKs
values
Kristian K Ullrich
"MS/MA/GNG/GLWL/GLPB/GMLWL/GMLPB/GYN:" Wang et al. (2010) KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.Genomics, proteomics & bioinformatics. 8(1), 77-80.
"Li/LWL:" Li et al. (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol., 2(2), 150-174.
"Li/LPB:" Li (1993). Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of molecular evolution, 36(1), pp.96-99.
"NG86/NG:" Nei and Gojobori. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3(5), 418-426.
"LPB:" Pamilo and Bianchi. (1993) Evolution of the Zfx and Zfy genes: Rates and interdependence between genes. Mol. Biol. Evol., 10, 271-281.
"MLWL/MLPB:" Tzeng et al. (2004). Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 21(12), 2290-2298.
"GY:" Goldman and Yang (1994). A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol., 11(5) 725-736.
"YN:" Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
"MYN:" Zhang et al. (2006). Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC evolutionary biology, 6(1), 1-10.
"data(hiv):" Ganeshan et al. (1997) Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virology. 71(1), 663-677.
Wang et al. (2009). gamma-MYN: a new algorithm for estimating Ka and Ks with consideration of variable substitution rates. Biology Direct, 4(1), 1-18.
## load example sequence data data("hiv", package="MSA2dist") #dnastring2kaks(hiv, model="Li") hiv |> dnastring2kaks(model="Li") #dnastring2kaks(hiv, model="NG86") hiv |> dnastring2kaks(model="NG86") #dnastring2kaks(hiv, model="NG86", threads=2) hiv |> dnastring2kaks(model="NG86", threads=2) ## define three unaligned cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") cds3 <- Biostrings::DNAString("ATGCAATGC") cds_sequences <- Biostrings::DNAStringSet(list(cds1, cds2, cds3)) names(cds_sequences) <- c("cds1", "cds2", "cds3") ## set isMSA to FALSE to automatically create pairwise codon alignments #dnastring2kaks(cds_sequences, model="Li", isMSA=FALSE) cds_sequences |> dnastring2kaks(model="Li", isMSA=FALSE)
## load example sequence data data("hiv", package="MSA2dist") #dnastring2kaks(hiv, model="Li") hiv |> dnastring2kaks(model="Li") #dnastring2kaks(hiv, model="NG86") hiv |> dnastring2kaks(model="NG86") #dnastring2kaks(hiv, model="NG86", threads=2) hiv |> dnastring2kaks(model="NG86", threads=2) ## define three unaligned cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") cds3 <- Biostrings::DNAString("ATGCAATGC") cds_sequences <- Biostrings::DNAStringSet(list(cds1, cds2, cds3)) names(cds_sequences) <- c("cds1", "cds2", "cds3") ## set isMSA to FALSE to automatically create pairwise codon alignments #dnastring2kaks(cds_sequences, model="Li", isMSA=FALSE) cds_sequences |> dnastring2kaks(model="Li", isMSA=FALSE)
GENETIC_CODE
from Biostrings
extended by codon
number and number of syn sites.
codon2number(codon)
codon2number(codon)
codon |
|
An object of class numeric
Kristian K Ullrich
GENETIC_CODE_TCAG
GENETIC_CODE_TCAG
This function shows the mask slot from a
DNAStringSet
or an AAStringSet
metadata
information.
getmask(seq)
getmask(seq)
seq |
|
IRanges
information from metadata
Kristian K Ullrich
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create mask mask1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add mask iupac.aa <- iupac.aa |> addmask2string(mask=mask1) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask()
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create mask mask1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add mask iupac.aa <- iupac.aa |> addmask2string(mask=mask1) #(iupac.aa |> slot("metadata"))$mask iupac.aa |> getmask()
This function shows the position slot from a
DNAStringSet
or an AAStringSet
metadata
information.
getpos(seq)
getpos(seq)
seq |
|
GenomicRanges
information from metadata
Kristian K Ullrich
## load example sequence data data(iupac, package="MSA2dist") ## add position iupac <- iupac |> addpos2string(chrom="chr1", start=1, end=1000) #(iupac |> slot("metadata"))$GRanges iupac |> getpos()
## load example sequence data data(iupac, package="MSA2dist") ## add position iupac <- iupac |> addpos2string(chrom="chr1", start=1, end=1000) #(iupac |> slot("metadata"))$GRanges iupac |> getpos()
This function returns a DNAStringSet
reduced by all
sites containing any gaps ("-", "+", ".") or missing ("N") sites.
globalDeletion(dna)
globalDeletion(dna)
dna |
|
DNAStringSet
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) globalDeletion(cds1.cds2.aln)
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) globalDeletion(cds1.cds2.aln)
This function returns an AAStringSet
reduced by all
sites containing any gaps ("-", "+", ".") or missing ("X") sites.
globalDeletionAA(aa)
globalDeletionAA(aa)
aa |
|
AAStringSet
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #globalDeletionAA(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> globalDeletionAA()
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) #globalDeletionAA(cds2aa(cds1.cds2.aln)) cds1.cds2.aln |> cds2aa() |> globalDeletionAA()
This function creates a granthamMatrix
object to be used
with the rcpp_distSTRING
function. By default,the grantham matrix is
defined as from Grantham 1974
.
(see https://link.springer.com/article/10.1007/s00335-017-9704-9)
granthamMatrix()
granthamMatrix()
matrix
Kristian K Ullrich
Grantham R. (1974). Amino Acid Difference Formula to Help Explain Protein Evolution. Science,185(4154),862-864.
granthamMatrix()
granthamMatrix()
Example cds sequences from HIV-1 sample 136 patient 1 from
Sweden envelope glycoprotein (env) gene, V3 region as DNAStringSet
.
data(hiv)
data(hiv)
an object of class DNAStringSet
see XStringSet-class
Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
data("hiv", package="MSA2dist")
data("hiv", package="MSA2dist")
This function calculates Ka/Ks (pN/pS)
for all combinations given in an indices list
of a
DNAStringSet
.
If the sequences in the DNAStringSet
are not a multiple-sequence
alignment, pairwise codon alignments can be calculated on the fly.
Models used and implemented according to
Li (1993) (using seqinr
) or
Nei and Gojobori (1986) (own implementation) or models from
KaKs_Calculator2
ported to MSA2dist
with Rcpp
.
indices2kaks( cds, indices, model = "Li", threads = 1, isMSA = TRUE, sgc = "1", verbose = FALSE, ... )
indices2kaks( cds, indices, model = "Li", threads = 1, isMSA = TRUE, sgc = "1", verbose = FALSE, ... )
cds |
|
indices |
|
model |
specify codon model either "Li" or "NG86" or
one of |
threads |
number of parallel threads [default: 1] |
isMSA |
cds |
sgc |
standard genetic code (for KaKs Calculator models) [default: 1] |
verbose |
verbosity (for KaKs Calculator models) [default: FALSE] |
... |
other codon alignment parameters |
A data.frame
of KaKs
values
Kristian K Ullrich
"MS/MA/GNG/GLWL/GLPB/GMLWL/GMLPB/GYN:" Wang et al. (2010) KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.Genomics, proteomics & bioinformatics. 8(1), 77-80.
"Li/LWL:" Li et al. (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol., 2(2), 150-174.
"Li/LPB:" Li (1993). Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of molecular evolution, 36(1), pp.96-99.
"NG86/NG:" Nei and Gojobori. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3(5), 418-426.
"LPB:" Pamilo and Bianchi. (1993) Evolution of the Zfx and Zfy genes: Rates and interdependence between genes. Mol. Biol. Evol., 10, 271-281.
"MLWL/MLPB:" Tzeng et al. (2004). Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 21(12), 2290-2298.
"GY:" Goldman and Yang (1994). A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol., 11(5) 725-736.
"YN:" Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
"MYN:" Zhang et al. (2006). Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC evolutionary biology, 6(1), 1-10.
"data(hiv):" Ganeshan et al. (1997) Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virology. 71(1), 663-677.
Wang et al. (2009). gamma-MYN: a new algorithm for estimating Ka and Ks with consideration of variable substitution rates. Biology Direct, 4(1), 1-18.
## load example sequence data data("hiv", package="MSA2dist") ## create indices idx <- list(c(2, 3), c(5,7,9)) #indices2kaks(hiv, idx, model="Li") hiv |> indices2kaks(idx, model="Li") #indices2kaks(hiv, idx, model="NG86") hiv |> indices2kaks(idx, model="NG86") #indices2kaks(hiv, idx, model="NG86", threads=2) hiv |> indices2kaks(idx, model="NG86", threads=2) ## define three unaligned cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") cds3 <- Biostrings::DNAString("ATGCAATGC") cds_sequences <- Biostrings::DNAStringSet(list(cds1, cds2, cds3)) names(cds_sequences) <- c("cds1", "cds2", "cds3") ## create indices idx <- list(c(1, 2), c(1,3)) ## set isMSA to FALSE to automatically create pairwise codon alignments #indices2kaks(cds_sequences, idx, model="Li", isMSA=FALSE) cds_sequences |> indices2kaks(idx, model="Li", isMSA=FALSE)
## load example sequence data data("hiv", package="MSA2dist") ## create indices idx <- list(c(2, 3), c(5,7,9)) #indices2kaks(hiv, idx, model="Li") hiv |> indices2kaks(idx, model="Li") #indices2kaks(hiv, idx, model="NG86") hiv |> indices2kaks(idx, model="NG86") #indices2kaks(hiv, idx, model="NG86", threads=2) hiv |> indices2kaks(idx, model="NG86", threads=2) ## define three unaligned cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") cds3 <- Biostrings::DNAString("ATGCAATGC") cds_sequences <- Biostrings::DNAStringSet(list(cds1, cds2, cds3)) names(cds_sequences) <- c("cds1", "cds2", "cds3") ## create indices idx <- list(c(1, 2), c(1,3)) ## set isMSA to FALSE to automatically create pairwise codon alignments #indices2kaks(cds_sequences, idx, model="Li", isMSA=FALSE) cds_sequences |> indices2kaks(idx, model="Li", isMSA=FALSE)
Example IUPAC sequences created with angsd
from different house mouse (Mus musculus) sub-populations
from Harr et al. (2016) DNAStringSet
.
data(iupac)
data(iupac)
an object of class DNAStringSet
see XStringSet-class
Harr et al. (2016) Genomic resources for wild populations of the house mouse, Mus musculus and its close relative Mus spretus. Scientific data. 3(1), 1-14.
data("iupac", package="MSA2dist")
data("iupac", package="MSA2dist")
This function creates a iupacMatrix
object to be used
with the rcpp_distSTRING
function. By default, the iupac matrix
is defined as literal distance obtained from Chang et al. 2017
.
(see https://link.springer.com/article/10.1007/s00335-017-9704-9)
iupacMatrix()
iupacMatrix()
score matrix
Kristian K Ullrich
Chang,P. L.,Kopania,E.,Keeble,S.,Sarver,B. A.,Larson, E.,Orth,A.,... & Dean,M. D. (2017). Whole exome sequencing of wild-derived inbred strains of mice improves power to link phenotype and genotype. Mammalian genome,28(9-10),416-425.
iupacMatrix()
iupacMatrix()
This function is a fork from an internal function from
Biostrings
makePostalignedSeqs(x)
makePostalignedSeqs(x)
x |
x |
get internal function makePostalignedSeqs
Kristian K Ullrich
pairwiseAlignment
,
cds2codonaln
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") makePostalignedSeqs(pwalign::pairwiseAlignment( cds2aa(Biostrings::DNAStringSet(cds1)), cds2aa(Biostrings::DNAStringSet(cds2))))
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATGCATTGC") makePostalignedSeqs(pwalign::pairwiseAlignment( cds2aa(Biostrings::DNAStringSet(cds1)), cds2aa(Biostrings::DNAStringSet(cds2))))
This function takes an AAStringSet
alignment and
its corresponding coding sequences DNAStringSet
and converts
the protein alignment into a codon alignment.
pal2nal(pal, nal, remove.gaps = FALSE)
pal2nal(pal, nal, remove.gaps = FALSE)
pal |
|
nal |
|
remove.gaps |
specify if gaps in the codon alignment should be removed [default: FALSE] |
codon alignment as DNAStringSet
Kristian K Ullrich
Pagès, H et al. (2014) Biostrings: Efficient manipulation of biological strings. R package version, 2(0).
## define two cds sequences cds <- Biostrings::DNAStringSet(c("ATGCAACATTGC", "ATGCATTGC")) names(cds) <- c("cds1", "cds2") ## get protein alignment aa <- MSA2dist::cds2aa(cds) msa <- makePostalignedSeqs(pwalign::pairwiseAlignment(aa[1], aa[2]))[[1L]] names(msa) <- names(aa) ## get codon alignment nal <- MSA2dist::pal2nal(pal=msa, nal=cds) nal
## define two cds sequences cds <- Biostrings::DNAStringSet(c("ATGCAACATTGC", "ATGCATTGC")) names(cds) <- c("cds1", "cds2") ## get protein alignment aa <- MSA2dist::cds2aa(cds) msa <- makePostalignedSeqs(pwalign::pairwiseAlignment(aa[1], aa[2]))[[1L]] names(msa) <- names(aa) ## get codon alignment nal <- MSA2dist::pal2nal(pal=msa, nal=cds) nal
This function shows the population integer slot from a
DNAStringSet
or an AAStringSet
metadata
information.
popinteger(seq)
popinteger(seq)
seq |
|
population integer from metadata
Kristian K Ullrich
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create poplist poplist <- list(FRA = grep("Mmd.FRA", names(iupac)), GER = grep("Mmd.GER", names(iupac)), IRA = grep("Mmd.IRA", names(iupac)), AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) popinteger(iupac.aa)
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create poplist poplist <- list(FRA = grep("Mmd.FRA", names(iupac)), GER = grep("Mmd.GER", names(iupac)), IRA = grep("Mmd.IRA", names(iupac)), AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) popinteger(iupac.aa)
This function shows the population names slot from a
DNAStringSet
or an AAStringSet
metadata
information.
popnames(seq)
popnames(seq)
seq |
|
population names from metadata
Kristian K Ullrich
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create poplist poplist <- list(FRA = grep("Mmd.FRA", names(iupac)), GER = grep("Mmd.GER", names(iupac)), IRA = grep("Mmd.IRA", names(iupac)), AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) popnames(iupac.aa)
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create poplist poplist <- list(FRA = grep("Mmd.FRA", names(iupac)), GER = grep("Mmd.GER", names(iupac)), IRA = grep("Mmd.IRA", names(iupac)), AFG = grep("Mmm.AFG", names(iupac))) iupac.aa <- iupac.aa |> addpop2string(poplist) popnames(iupac.aa)
calculates pairwise distances using a score matrix
rcpp_distSTRING(dnavector, scoreMatrix, ncores = 1L, symmetric = 1L)
rcpp_distSTRING(dnavector, scoreMatrix, ncores = 1L, symmetric = 1L)
dnavector |
StringVector [mandatory] |
scoreMatrix |
NumericMatrix [mandatory] |
ncores |
number of cores [default: 1] |
symmetric |
symmetric score matrix [default: 1] |
list
Kristian K Ullrich
## load example sequence data data("hiv", package="MSA2dist") rcpp_distSTRING(dnavector=as.character(hiv), scoreMatrix=iupacMatrix())
## load example sequence data data("hiv", package="MSA2dist") rcpp_distSTRING(dnavector=as.character(hiv), scoreMatrix=iupacMatrix())
calculates KaKs as implememted in
KaKs Calculator 2.0 MSA2dist
with Rcpp
.
rcpp_KaKs(cdsstr, sgc = "1", method = "YN", verbose = FALSE)
rcpp_KaKs(cdsstr, sgc = "1", method = "YN", verbose = FALSE)
cdsstr |
StringVector [mandatory] |
sgc |
standard genetic code to use [default: 1] |
method |
KaKs Calculator 2.0 codon model [default: YN] |
verbose |
specify if verbose output [default: FALSE] |
list
Kristian K Ullrich
Wang et al. (2010) KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.Genomics, proteomics & bioinformatics. 8(1), 77-80.
## load example sequence data data("hiv", package="MSA2dist") rcpp_KaKs(cdsstr=as.character(hiv[1:3]))
## load example sequence data data("hiv", package="MSA2dist") rcpp_KaKs(cdsstr=as.character(hiv[1:3]))
returns number of AA sites used
rcpp_pairwiseDeletionAA(aavector, ncores = 1L, symmetric = 1L)
rcpp_pairwiseDeletionAA(aavector, ncores = 1L, symmetric = 1L)
aavector |
StringVector [mandatory] |
ncores |
number of cores [default: 1] |
symmetric |
symmetric score matrix [default: 1] |
list
Kristian K Ullrich
## load example sequence data data("hiv", package="MSA2dist") h <- hiv |> cds2aa() |> as.character() rcpp_pairwiseDeletionAA(aavector=h, ncores=1)
## load example sequence data data("hiv", package="MSA2dist") h <- hiv |> cds2aa() |> as.character() rcpp_pairwiseDeletionAA(aavector=h, ncores=1)
returns number of DNA sites used
rcpp_pairwiseDeletionDNA(dnavector, ncores = 1L, symmetric = 1L)
rcpp_pairwiseDeletionDNA(dnavector, ncores = 1L, symmetric = 1L)
dnavector |
StringVector [mandatory] |
ncores |
number of cores [default: 1] |
symmetric |
symmetric score matrix [default: 1] |
list
Kristian K Ullrich
## load example sequence data data("woodmouse", package="ape") w <- woodmouse |> dnabin2dnastring() |> as.character() rcpp_pairwiseDeletionDNA(dnavector=w, ncores=1)
## load example sequence data data("woodmouse", package="ape") w <- woodmouse |> dnabin2dnastring() |> as.character() rcpp_pairwiseDeletionDNA(dnavector=w, ncores=1)
This function shows the region slot from a
DNAStringSet
or an AAStringSet
metadata
information.
region(seq)
region(seq)
seq |
|
region IRanges
object from metadata
Kristian K Ullrich
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create region region1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add region iupac.aa <- iupac.aa |> addregion2string(region=region1) iupac.aa |> region()
## load example sequence data data(iupac, package="MSA2dist") iupac.aa <- iupac |> cds2aa(shorten = TRUE) ## create region region1 <- IRanges::IRanges(start=c(1,41), end=c(20,50)) ## add region iupac.aa <- iupac.aa |> addregion2string(region=region1) iupac.aa |> region()
This function shows the region used slot from a
DNAStringSet
or an AAStringSet
metadata
information.
regionused(seq)
regionused(seq)
seq |
|
population names from metadata
Kristian K Ullrich
## load example sequence data data("hiv", package="MSA2dist") ## create mask mask1 <- IRanges::IRanges(start=c(11,41,71), end=c(20,50,80)) ## use mask hiv.region <- hiv |> cds2aa() |> string2region(mask=mask1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused()
## load example sequence data data("hiv", package="MSA2dist") ## create mask mask1 <- IRanges::IRanges(start=c(11,41,71), end=c(20,50,80)) ## use mask hiv.region <- hiv |> cds2aa() |> string2region(mask=mask1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused()
This function subsets a DNAStringSet
or an
AAStringSet
by a mask
and region
given one or both
options as IRanges
.
string2region(seq, mask = NULL, region = NULL, add = TRUE)
string2region(seq, mask = NULL, region = NULL, add = TRUE)
seq |
|
mask |
|
region |
|
add |
indicate if mask and region should be added to |
A list
object with the following components:DNAStringSet
or AAStringSet
regionUsed
Kristian K Ullrich
## load example sequence data data("hiv", package="MSA2dist") ## create mask mask1 <- IRanges::IRanges(start=c(11,41,71), end=c(20,50,80)) ## use mask hiv.region <- hiv |> cds2aa() |> string2region(mask=mask1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused() ## use region region1 <- IRanges::IRanges(start=c(1,75), end=c(45,85)) hiv.region <- hiv |> cds2aa() |> string2region(region=region1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused() ## use mask and region hiv.region <- hiv |> cds2aa() |> string2region(mask=mask1, region=region1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused()
## load example sequence data data("hiv", package="MSA2dist") ## create mask mask1 <- IRanges::IRanges(start=c(11,41,71), end=c(20,50,80)) ## use mask hiv.region <- hiv |> cds2aa() |> string2region(mask=mask1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused() ## use region region1 <- IRanges::IRanges(start=c(1,75), end=c(45,85)) hiv.region <- hiv |> cds2aa() |> string2region(region=region1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused() ## use mask and region hiv.region <- hiv |> cds2aa() |> string2region(mask=mask1, region=region1) #(hiv.region |> slot("metadata"))$regionUsed hiv.region |> regionused()
This function gets a subsequence from a DNAString
,
RNAString
, AAString
, BString
, DNAStringSet
,
RNAStringSet
, AAStringSet
, BStringSet
object from the Biostrings
package.
subString(x, s, e)
subString(x, s, e)
x |
|
s |
start vector [mandatory] |
e |
end vector [mandatory] |
subsequence of an Biostrings
object
Kristian K Ullrich
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) subString(cds1.cds2.aln, c(1,7), c(3,12))
## define two cds sequences cds1 <- Biostrings::DNAString("ATGCAACATTGC") cds2 <- Biostrings::DNAString("ATG---CATTGC") cds1.cds2.aln <- c(Biostrings::DNAStringSet(cds1), Biostrings::DNAStringSet(cds2)) subString(cds1.cds2.aln, c(1,7), c(3,12))
This function returns upper tri index for usage with
pivot_long
reduction.
uptriidx(n, diag = FALSE)
uptriidx(n, diag = FALSE)
n |
dimension of initial matrix [mandatory] |
diag |
indicate if diag should be retained [default: FALSE] |
list of positions
Kristian K Ullrich
uptriidx(10)
uptriidx(10)