Title: | SynMut: Designing Synonymously Mutated Sequences with Different Genomic Signatures |
---|---|
Description: | There are increasing demands on designing virus mutants with specific dinucleotide or codon composition. This tool can take both dinucleotide preference and/or codon usage bias into account while designing mutants. It is a powerful tool for in silico designs of DNA sequence mutants. |
Authors: | Haogao Gu [aut, cre], Leo L.M. Poon [led] |
Maintainer: | Haogao Gu <[email protected]> |
License: | GPL-2 |
Version: | 1.23.0 |
Built: | 2024-12-30 04:51:29 UTC |
Source: | https://github.com/bioc/SynMut |
We use a least squares approach to estimate the codon usage difference between DNA sequences.
codon_dist(seq, ref) ## S4 method for signature 'ANY' codon_dist(seq, ref)
codon_dist(seq, ref) ## S4 method for signature 'ANY' codon_dist(seq, ref)
seq |
the input DNA sequnece of |
ref |
the reference DNA sequnece of |
idea inspired by "Daniel Macedo de Melo Jorge, Ryan E. Mills, Adam S. Lauring, CodonShuffle: a tool for generating and analyzing synonymously mutated sequences, Virus Evolution, Volume 1, Issue 1, March 2015, vev012, https://doi.org/10.1093/ve/vev012"
vector
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(rgd.seq) mut.seq <- codon_random(rgd.seq) codon_dist(mut.seq, rgd.seq) mut.seq2 <- codon_random(rgd.seq, keep = TRUE) codon_dist(mut.seq2, rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(rgd.seq) mut.seq <- codon_random(rgd.seq) codon_dist(mut.seq, rgd.seq) mut.seq2 <- codon_random(rgd.seq, keep = TRUE) codon_dist(mut.seq2, rgd.seq)
Mutating the current DNA sequences in the regioned_dna object to mimic a target codon usage pattern.
codon_mimic(object, alt, numcode = 1, ...) ## S4 method for signature 'regioned_dna,vector' codon_mimic(object, alt, numcode) ## S4 method for signature 'regioned_dna,DNAStringSet' codon_mimic(object, alt, numcode) ## S4 method for signature 'DNAStringSet,DNAStringSet' codon_mimic(object, alt, numcode)
codon_mimic(object, alt, numcode = 1, ...) ## S4 method for signature 'regioned_dna,vector' codon_mimic(object, alt, numcode) ## S4 method for signature 'regioned_dna,DNAStringSet' codon_mimic(object, alt, numcode) ## S4 method for signature 'DNAStringSet,DNAStringSet' codon_mimic(object, alt, numcode)
object |
regioned_dna object |
alt |
target codon usage vector or DNAStringSet object representing target codon usage |
numcode |
The ncbi genetic code number for translation. Default value: |
... |
... |
The ideas for codon_mimic
is similar to
codon_to
: first extract the mutable regions and then do the
mutation. However the codons in the fixed (not mutable) regions will also
alter the final codon usage, thus we have to adjust for the fixed codons
when introducing sysnonymous codons. The ideal deisgn for
codon_mimic
is not unique as the swap between positions of the
synonymous codons will not change the codon usage bias.
Details pleas refer to: https://koohoko.github.io/SynMut/algorithm.html
regioned_dna
input_seq
, codon_to
,
codon_random
, dinu_to
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) target <- get_cu(rgd.seq)[2,] new <- codon_mimic(rgd.seq, alt = target) get_cu(new) - get_cu(rgd.seq) target <- Biostrings::DNAStringSet("TTGAAAA-CTC-N--AAG") new <- codon_mimic(rgd.seq, alt = target) get_cu(new) - get_cu(rgd.seq) get_freq(new) - get_freq(rgd.seq) get_rscu(new) - get_rscu(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) target <- get_cu(rgd.seq)[2,] new <- codon_mimic(rgd.seq, alt = target) get_cu(new) - get_cu(rgd.seq) target <- Biostrings::DNAStringSet("TTGAAAA-CTC-N--AAG") new <- codon_mimic(rgd.seq, alt = target) get_cu(new) - get_cu(rgd.seq) get_freq(new) - get_freq(rgd.seq) get_rscu(new) - get_rscu(rgd.seq)
Generating random synonymous mutations (in user-defined region), with optionally keeping/not keeping the original codon usage bias.
codon_random(object, n = 1, keep = FALSE, numcode = 1, ...) ## S4 method for signature 'regioned_dna' codon_random(object, n, keep, numcode) ## S4 method for signature 'DNAStringSet' codon_random(object, n, keep, numcode)
codon_random(object, n = 1, keep = FALSE, numcode = 1, ...) ## S4 method for signature 'regioned_dna' codon_random(object, n, keep, numcode) ## S4 method for signature 'DNAStringSet' codon_random(object, n, keep, numcode)
object |
A regioned_dna object. |
n |
Optional n parameter specifying what proportion of the codons to be
mutated. Default value: |
keep |
Logical parameter controling whether keeping the codon usage bias |
numcode |
The ncbi genetic code number for translation. Default value: |
... |
... |
This method randomly sample synonymous codons for n
propotion
of every mutable codons in the sequences. This process will be likely to
alter the codon usage bias of the original sequences. However the
keep = TRUE
argument help to preserve the codon usage bias. It is
done via the synsequence
function in seqinr
package. The
synsequence
function essentially swaps the position of the
synonymous codons without introducing new codons into the original
sequences.
A regioned_dna object containing the mutants; Or a DNAStringSet object if the input is a DNAStringSet object.
input_seq
, dinu_to
,
codon_to
, codon_mimic
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) set.seed(2019) get_cu(codon_random(rgd.seq, n = 0.5)) get_cu(codon_random(rgd.seq))
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) set.seed(2019) get_cu(codon_random(rgd.seq, n = 0.5)) get_cu(codon_random(rgd.seq))
Input string of a codon to either the "max.codon = " or "min.codon = " parameter to maximize or minimize the usage of certain codon in the sequence.
codon_to(object, max.codon = NA, min.codon = NA, ...) ## S4 method for signature 'regioned_dna' codon_to(object, max.codon, min.codon)
codon_to(object, max.codon = NA, min.codon = NA, ...) ## S4 method for signature 'regioned_dna' codon_to(object, max.codon, min.codon)
object |
A regioned_dna object. |
max.codon |
A string of a codon. |
min.codon |
A string of a codon. |
... |
... |
The ideas for this function is simple. We first extract the mutable regions for every sequences, then mutated the synonymous codons of the input to the desired. There will be only one ideal design for the maximization problem, however there may be numerous comparable designs having the same minimal usage of certain codon, as we randomly sample synonymous codon for substitution when solving the minimization problem.
A regioned_dna object.
input_seq
, dinu_to
,
codon_random
, codon_mimic
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(codon_to(rgd.seq, max.codon = "AAC")) - get_cu(rgd.seq) get_cu(codon_to(rgd.seq, min.codon = "AAC")) - get_cu(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(codon_to(rgd.seq, max.codon = "AAC")) - get_cu(rgd.seq) get_cu(codon_to(rgd.seq, min.codon = "AAC")) - get_cu(rgd.seq)
We use a least squares approach to estimate the dinucleotide usage difference between DNA sequences
dinu_dist(seq, ref) ## S4 method for signature 'ANY' dinu_dist(seq, ref)
dinu_dist(seq, ref) ## S4 method for signature 'ANY' dinu_dist(seq, ref)
seq |
the input DNA sequnece of |
ref |
the reference DNA sequnece of |
similar method that applied in "Daniel Macedo de Melo Jorge, Ryan E. Mills, Adam S. Lauring, CodonShuffle: a tool for generating and analyzing synonymously mutated sequences, Virus Evolution, Volume 1, Issue 1, March 2015, vev012, https://doi.org/10.1093/ve/vev012"
vector
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(rgd.seq) mut.seq <- codon_random(rgd.seq) dinu_dist(mut.seq, rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(rgd.seq) mut.seq <- codon_random(rgd.seq) dinu_dist(mut.seq, rgd.seq)
Input string of a dinucleotide to either the "max.dinu = " or "min.codon = " parameter to maximize or minimize the usage of certain codon in the sequence. Using a greedy algorithm with priority given to dinucleotide12 or dinucleotide23.
dinu_to(object, max.dinu = NA, min.dinu = NA, keep = FALSE, numcode = 1, ...) ## S4 method for signature 'regioned_dna' dinu_to(object, max.dinu, min.dinu, keep, numcode)
dinu_to(object, max.dinu = NA, min.dinu = NA, keep = FALSE, numcode = 1, ...) ## S4 method for signature 'regioned_dna' dinu_to(object, max.dinu, min.dinu, keep, numcode)
object |
A regioned_dna object. |
max.dinu |
A string of a dinucleotide. |
min.dinu |
A string of a dinucleotide. |
keep |
A logical varibale stating if the codon usage of the original sequences should be keep. Default: False. |
numcode |
The ncbi genetic code number for translation. Default value: |
... |
... |
The detail strategy for this function please refer to: https://koohoko.github.io/SynMut/algorithm.html
regioned_dna
input_seq
, codon_to
,
codon_random
, codon_mimic
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_du(dinu_to(rgd.seq, max.dinu = "cg")) - get_du(rgd.seq) get_du(dinu_to(rgd.seq, min.dinu = "AA")) - get_du(rgd.seq) get_du(dinu_to(rgd.seq, max.dinu = "cg", keep = TRUE)) - get_du(rgd.seq) get_cu(dinu_to(rgd.seq, max.dinu = "CG", keep = TRUE)) - get_cu(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_du(dinu_to(rgd.seq, max.dinu = "cg")) - get_du(rgd.seq) get_du(dinu_to(rgd.seq, min.dinu = "AA")) - get_du(rgd.seq) get_du(dinu_to(rgd.seq, max.dinu = "cg", keep = TRUE)) - get_du(rgd.seq) get_cu(dinu_to(rgd.seq, max.dinu = "CG", keep = TRUE)) - get_cu(rgd.seq)
Access the codon usage matrix
get_cu(object, ...) ## S4 method for signature 'regioned_dna' get_cu(object) ## S4 method for signature 'DNAStringSet' get_cu(object)
get_cu(object, ...) ## S4 method for signature 'regioned_dna' get_cu(object) ## S4 method for signature 'DNAStringSet' get_cu(object)
object |
regioned_dna / DNAStringSet |
... |
... |
matrix
input_seq
, get_region
,
get_nu
, get_du
, get_freq
,
get_rscu
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_cu(rgd.seq)
Access the DNA sequence data in DNAStringSet.
get_dna(object, ...) ## S4 method for signature 'regioned_dna' get_dna(object)
get_dna(object, ...) ## S4 method for signature 'regioned_dna' get_dna(object)
object |
A regioned_dna object. |
... |
... |
DNAStringSet
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_dna(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_dna(rgd.seq)
Access the dinucleotide usage matrix
get_du(object, ...) ## S4 method for signature 'regioned_dna' get_du(object) ## S4 method for signature 'DNAStringSet' get_du(object)
get_du(object, ...) ## S4 method for signature 'regioned_dna' get_du(object) ## S4 method for signature 'DNAStringSet' get_du(object)
object |
regioned_dna / DNAStringSet |
... |
... |
matrix
input_seq
, get_region
,
get_nu
, get_cu
, get_freq
,
get_rscu
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_du(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_du(rgd.seq)
Access the synonymous codon usage frequency
get_freq(object, numcode = 1, ...) ## S4 method for signature 'regioned_dna' get_freq(object, numcode) ## S4 method for signature 'DNAStringSet' get_freq(object, numcode) ## S4 method for signature 'matrix' get_freq(object, numcode) ## S4 method for signature 'vector' get_freq(object, numcode)
get_freq(object, numcode = 1, ...) ## S4 method for signature 'regioned_dna' get_freq(object, numcode) ## S4 method for signature 'DNAStringSet' get_freq(object, numcode) ## S4 method for signature 'matrix' get_freq(object, numcode) ## S4 method for signature 'vector' get_freq(object, numcode)
object |
regioned_dna / DNAStringSet / codon usage matrix (vector) |
numcode |
The ncbi genetic code number for translation. Default value: |
... |
... |
matrix
input_seq
, get_region
,
get_cu
, get_du
, get_rscu
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_freq(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_freq(rgd.seq)
Access the nucleotide usage matrix
get_nu(object, ...) ## S4 method for signature 'regioned_dna' get_nu(object) ## S4 method for signature 'DNAStringSet' get_nu(object)
get_nu(object, ...) ## S4 method for signature 'regioned_dna' get_nu(object) ## S4 method for signature 'DNAStringSet' get_nu(object)
object |
regioned_dna / DNAStringSet |
... |
... |
matrix
input_seq
, get_region
,
get_cu
, get_du
, get_rscu
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_nu(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_nu(rgd.seq)
Access the variable regions
get_region(object, ...) ## S4 method for signature 'regioned_dna' get_region(object)
get_region(object, ...) ## S4 method for signature 'regioned_dna' get_region(object)
object |
regioned_dna |
... |
... |
list
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_region(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_region(rgd.seq)
Access the Relative Synonymous Codon Usage rscu
get_rscu(object, numcode = 1, ...) ## S4 method for signature 'regioned_dna' get_rscu(object, numcode) ## S4 method for signature 'DNAStringSet' get_rscu(object, numcode)
get_rscu(object, numcode = 1, ...) ## S4 method for signature 'regioned_dna' get_rscu(object, numcode) ## S4 method for signature 'DNAStringSet' get_rscu(object, numcode)
object |
regioned_dna / DNAStringSet / codon usage matrix (vector) |
numcode |
The ncbi genetic code number for translation. Default value: |
... |
... |
matrix
input_seq
, get_region
,
get_cu
, get_du
, get_freq
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_rscu(rgd.seq)
filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) get_rscu(rgd.seq)
Constructing regioned_dna
from DNAStringSet. Optionally input a
region
data.frame to define restricted amino-acid region for mutation.
input_seq(object, region = NA, ...) ## S4 method for signature 'character' input_seq(object, region) ## S4 method for signature 'DNAStringSet' input_seq(object, region) ## S4 method for signature 'DNAString' input_seq(object, region)
input_seq(object, region = NA, ...) ## S4 method for signature 'character' input_seq(object, region) ## S4 method for signature 'DNAStringSet' input_seq(object, region) ## S4 method for signature 'DNAString' input_seq(object, region)
object |
Filepath or DNAstringSet. The input sequences is suggested to be in open reading frame(ORF). |
region |
|
... |
... |
A regioned_dna-class object
get_cu
, get_du
,
get_region
, get_dna
# Creating a input_seq class directly from system file filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) # Optionally input with region dataframe filepath.fasta <- system.file("extdata", "example.fasta", package = "SynMut") fp.csv <- system.file("extdata", "target_regions.csv", package = "SynMut") region <- read.csv(fp.csv) rgd.seq <- input_seq(filepath.fasta, region) # Creating from exsisting DNAStringSet object seq <- Biostrings::DNAStringSet("ATCGATCGA") rgd.seq <- input_seq(seq)
# Creating a input_seq class directly from system file filepath <- system.file("extdata", "example.fasta", package = "SynMut") rgd.seq <- input_seq(filepath) # Optionally input with region dataframe filepath.fasta <- system.file("extdata", "example.fasta", package = "SynMut") fp.csv <- system.file("extdata", "target_regions.csv", package = "SynMut") region <- read.csv(fp.csv) rgd.seq <- input_seq(filepath.fasta, region) # Creating from exsisting DNAStringSet object seq <- Biostrings::DNAStringSet("ATCGATCGA") rgd.seq <- input_seq(seq)
Recording codon DNA sequences and region.
dnaseq
a DNAStingSet object recording the sequence(s)
region
a list specifying paticular regions in the sequences allowed to be mutated
Haogao Gu
n
random DNA sequnces of length m
Generate n
random DNA sequnces of length m
, optional exclude
stop codons.
seq_random(n = 1, m, no.stop.codon = FALSE, ...) ## S4 method for signature 'numeric,numeric' seq_random(n, m, no.stop.codon)
seq_random(n = 1, m, no.stop.codon = FALSE, ...) ## S4 method for signature 'numeric,numeric' seq_random(n, m, no.stop.codon)
n |
the number of the output sequence(s). |
m |
the length of the ouput sequence(s). Eighter a fixed number or a vector of different numbers. |
no.stop.codon |
Default FALSE. If TRUE, the stop codons in the frame 1 would be substituted to another random codon. |
... |
... |
a DNAStringSet object
seq_random(n = 1, m = 99) seq_random(n = 10, m = 30) seq_random(n = 10, m = 1:10) seq.nsc <- seq_random(n = 10, m = 100, no.stop.codon = TRUE) get_cu(seq.nsc)
seq_random(n = 1, m = 99) seq_random(n = 10, m = 30) seq_random(n = 10, m = 1:10) seq.nsc <- seq_random(n = 10, m = 100, no.stop.codon = TRUE) get_cu(seq.nsc)