Title: | Compute the automorphisms between DNA's Abelian group representations |
---|---|
Description: | This is a R package to compute the automorphisms between pairwise aligned DNA sequences represented as elements from a Genomic Abelian group. In a general scenario, from genomic regions till the whole genomes from a given population (from any species or close related species) can be algebraically represented as a direct sum of cyclic groups or more specifically Abelian p-groups. Basically, we propose the representation of multiple sequence alignments of length N bp as element of a finite Abelian group created by the direct sum of homocyclic Abelian group of prime-power order. |
Authors: | Robersy Sanchez [aut, cre] |
Maintainer: | Robersy Sanchez <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.9.1 |
Built: | 2024-12-30 03:13:08 UTC |
Source: | https://github.com/bioc/GenomAutomorphism |
The aminoacid similarity matrices from Amino Acid Index Database https://www.genome.jp/aaindex/ are provided here. AAindex (ver.9.2) is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids.
The similarity of amino acids can be represented numerically, expressed in terms of observed mutation rate or physicochemical properties. A similarity matrix, also called a mutation matrix, is a set of 210 numerical values, 20 diagonal and 20x19/2 off-diagonal elements, used for sequence alignments and similarity searches.
Function aa_phychem_index is wrapper function to call two other functions: aa_mutmat and aa_index
aa_phychem_index(acc = NA, aaindex = NA, acc_list = FALSE, info = FALSE) aa_mutmat(acc = NA, aaindex = c("aaindex2", "aaindex3"), acc_list = FALSE) aa_index(acc = NA, acc_list = FALSE, info = FALSE)
aa_phychem_index(acc = NA, aaindex = NA, acc_list = FALSE, info = FALSE) aa_mutmat(acc = NA, aaindex = c("aaindex2", "aaindex3"), acc_list = FALSE) aa_index(acc = NA, acc_list = FALSE, info = FALSE)
acc |
Accession id for a specified mutation or contact potential matrix. |
aaindex |
Database where the requested accession id is locate. The possible values are: "aaindex2" or "aaindex3". |
acc_list |
Logical. If TRUE, then the list of available matrices ids and index names is returned. |
info |
Logical. if TRUE, then whole information for the physicochemical index will be returned. |
Depending on the user specifications, a mutation or contact potential matrix, a list of available matrices (indices) ids or index names can be returned. More specifically:
Returns an aminoacid mutation matrix or a statistical protein contact potentials matrix.
Returns the specified aminoacid physicochemical indices.
Robersy Sanchez https://genomaths.com
aaindex1, aaindex2, aaindex3, and get_mutscore.
## Load the mutation matrices from database from the packages data("aaindex1","aaindex2", package = "GenomAutomorphism" ) ## Get the available mutation matrices mat <- aa_mutmat(aaindex = "aaindex2", acc_list = TRUE) mat[seq(10)] ## Return the 'Base-substitution-protein-stability matrix ## (Miyazawa-Jernigan, 1993)' aa_mutmat(acc = "MIYS930101", aaindex = "aaindex2") ## Return the 'BLOSUM80 substitution matrix (Henikoff-Henikoff, 1992)' aa_mutmat(acc = "HENS920103", aaindex = "aaindex2") ## Using wrapping function aa_phychem_index(acc = "EISD840101", aaindex = "aaindex1") ## Just the info. The information provided after the reference ## corresponds to the correlaiton of 'EISD840101' with other indices. aa_phychem_index(acc = "EISD840101", aaindex = "aaindex1", info = TRUE)
## Load the mutation matrices from database from the packages data("aaindex1","aaindex2", package = "GenomAutomorphism" ) ## Get the available mutation matrices mat <- aa_mutmat(aaindex = "aaindex2", acc_list = TRUE) mat[seq(10)] ## Return the 'Base-substitution-protein-stability matrix ## (Miyazawa-Jernigan, 1993)' aa_mutmat(acc = "MIYS930101", aaindex = "aaindex2") ## Return the 'BLOSUM80 substitution matrix (Henikoff-Henikoff, 1992)' aa_mutmat(acc = "HENS920103", aaindex = "aaindex2") ## Using wrapping function aa_phychem_index(acc = "EISD840101", aaindex = "aaindex1") ## Just the info. The information provided after the reference ## corresponds to the correlaiton of 'EISD840101' with other indices. aa_phychem_index(acc = "EISD840101", aaindex = "aaindex1", info = TRUE)
The aminoacid indexes from Amino Acid Index Database https://www.genome.jp/aaindex/ are provided here. AAindex (ver.9.2) is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids.
data("aaindex1", package = "GenomAutomorphism")
data("aaindex1", package = "GenomAutomorphism")
A list carrying the the description 566 Amino Acid Indices in AAindex ver.9.2 and the text file with the matrices imported from https://www.genome.jp/aaindex/.
Robersy Sanchez https://genomaths.com
## Load the mutation matrices from database from the packages data("aaindex1", package = "GenomAutomorphism", envir = environment()) ## Get the available aminoacid indices. mat <- aa_phychem_index(aaindex = "aaindex1", acc_list = TRUE) mat[1:10]
## Load the mutation matrices from database from the packages data("aaindex1", package = "GenomAutomorphism", envir = environment()) ## Get the available aminoacid indices. mat <- aa_phychem_index(aaindex = "aaindex1", acc_list = TRUE) mat[1:10]
The aminoacid similarity matrices from Amino Acid Index Database https://www.genome.jp/aaindex/ are provided here. AAindex (ver.9.2) is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids.
data("aaindex2", package = "GenomAutomorphism")
data("aaindex2", package = "GenomAutomorphism")
A list carrying the description of 94 Amino Acid Matrices in AAindex ver.9.2 and the text file of matrices imported from https://www.genome.jp/aaindex/.
The similarity of amino acids can be represented numerically, expressed in terms of observed mutation rate or physicochemical properties. A similarity matrix, also called a mutation matrix, is a set of 210 numerical values, 20 diagonal and 20x19/2 off-diagonal elements, used for sequence alignments and similarity searches.
Robersy Sanchez https://genomaths.com
aaindex2
and aa_mutmat
, and
get_mutscore
.
## Load the mutation matrices from database from the packages data("aaindex2", package = "GenomAutomorphism") ## Get the available matrices mat <- aa_mutmat(aaindex = "aaindex2", acc_list = TRUE) mat[1:10]
## Load the mutation matrices from database from the packages data("aaindex2", package = "GenomAutomorphism") ## Get the available matrices mat <- aa_mutmat(aaindex = "aaindex2", acc_list = TRUE) mat[1:10]
A statistical potential (also knowledge-based potential, empirical potential, or residue contact potential) is an energy function derived from an analysis of known structures in the Protein Data Bank.
data("aaindex3", package = "GenomAutomorphism")
data("aaindex3", package = "GenomAutomorphism")
A list carrying the the description 47 Amino Acid Matrices in AAindex ver.9.2 and the text file of matrices imported from https://www.genome.jp/aaindex/.
A list of 47 amino acid matrices from Amino Acid Index Database https://www.genome.jp/aaindex/ are provided here. AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids.
The contact potential matrix of amino acids is a set of 210 numerical values, 20 diagonal and 20x19/2 off-diagonal elements, used for sequence alignments and similarity searches.
Robersy Sanchez https://genomaths.com
aaindex1, aaindex2, and get_mutscore.
## Load the mutation matrices from database from the packages data("aaindex3", package = "GenomAutomorphism") ## Get the available mutation matrices mat <- aa_mutmat(aaindex = "aaindex3", acc_list = TRUE) mat[1:10]
## Load the mutation matrices from database from the packages data("aaindex3", package = "GenomAutomorphism") ## Get the available mutation matrices mat <- aa_mutmat(aaindex = "aaindex3", acc_list = TRUE) mat[1:10]
DNAStringSet
class objectThis is a DNAStringSet
carrying a small pairwise
DNA sequence alignment to be used in the examples provided for the
package functions.
data("aln", package = "GenomAutomorphism")
data("aln", package = "GenomAutomorphism")
DNAStringSet
class object.
data("aln", package = "GenomAutomorphism") aln
data("aln", package = "GenomAutomorphism") aln
This function computes the distance between aminoacids in terms of a statistic of the corresponding codons. The possible statistics are: 'mean', 'median', or some user defined function.
aminoacid_dist(aa1, aa2, ...) ## S4 method for signature 'character,character' aminoacid_dist( aa1, aa2, weight = NULL, stat = c("mean", "median", "user_def"), genetic_code = "1", group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'DNAStringSet,ANY' aminoacid_dist( aa1, weight = NULL, stat = c("mean", "median", "user_def"), group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'AAStringSet,ANY' aminoacid_dist( aa1, weight = NULL, stat = c("mean", "median", "user_def"), group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'CodonGroup_OR_Automorphisms,ANY' aminoacid_dist( aa1, weight = NULL, stat = c("mean", "median", "user_def"), group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE )
aminoacid_dist(aa1, aa2, ...) ## S4 method for signature 'character,character' aminoacid_dist( aa1, aa2, weight = NULL, stat = c("mean", "median", "user_def"), genetic_code = "1", group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'DNAStringSet,ANY' aminoacid_dist( aa1, weight = NULL, stat = c("mean", "median", "user_def"), group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'AAStringSet,ANY' aminoacid_dist( aa1, weight = NULL, stat = c("mean", "median", "user_def"), group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'CodonGroup_OR_Automorphisms,ANY' aminoacid_dist( aa1, weight = NULL, stat = c("mean", "median", "user_def"), group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE )
aa1 , aa2
|
A character string of codon sequences, i.e., sequences of
DNA base-triplets. If only 'x' argument is given, then it must be a
|
... |
Not in use yet. |
weight |
A numerical vector of weights to compute weighted Manhattan
distance between codons. If |
stat |
The name of some statistical function summarizing data like
'mean', 'median', or some user defined function ('user_def'). If
|
genetic_code |
A single string that uniquely identifies the genetic
code to extract. Should be one of the values in the id or name2 columns of
|
group |
A character string denoting the group representation for the given codon sequence as shown in reference (2-3). |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3). |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
Only aminoacids sequences given in the following alphabet are accepted: "A","R","N","D","C","Q","E","G","H","I","L","K", "M","F","P", "S","T","W","Y","V", "", "-", and "X"; where symbols "" and "-" denote the presence a stop codon and of a gap, respectively, and letter "X" missing information, which are then taken as a gap.
The distance between any aminoacid and any of the non-aminoacid symbols is the ceiling of the greater distance found in the corresponding aminoacid distance matrix.
A numerical vector with the pairwise distances between codons in sequences 'x' and 'y'.
Sanchez R. Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric. Curr. Top. Med. Chem. 2014;14: 407–417. https://doi.org/10.2174/1568026613666131204110022.
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF.
## Write down to aminoacid sequences x <- "A*LTHMC" y <- "AAMTDM-" aminoacid_dist(aa1 = x, aa2 = y) ## Let's create an AAStringSet-class object aa <- AAStringSet(c(x, y)) aminoacid_dist(aa1 = aa) ## Let's select cube "GCAT" and group "Z5" aminoacid_dist(aa1 = aa, group = "Z5", cube = "TCGA")
## Write down to aminoacid sequences x <- "A*LTHMC" y <- "AAMTDM-" aminoacid_dist(aa1 = x, aa2 = y) ## Let's create an AAStringSet-class object aa <- AAStringSet(c(x, y)) aminoacid_dist(aa1 = aa) ## Let's select cube "GCAT" and group "Z5" aminoacid_dist(aa1 = aa, group = "Z5", cube = "TCGA")
Several methods are available to be applied on
Automorphism-class
and AutomorphismList-class
objects.
as.AutomorphismList(x, grs = GRanges(), ...) ## S4 method for signature 'GRangesList,GRanges_OR_NULL' as.AutomorphismList(x, grs = GRanges(), ...) ## S4 method for signature 'list,GRanges_OR_NULL' as.AutomorphismList(x, grs = GRanges(), ...)
as.AutomorphismList(x, grs = GRanges(), ...) ## S4 method for signature 'GRangesList,GRanges_OR_NULL' as.AutomorphismList(x, grs = GRanges(), ...) ## S4 method for signature 'list,GRanges_OR_NULL' as.AutomorphismList(x, grs = GRanges(), ...)
x |
A |
grs |
A |
... |
Not in use yet. |
The returned an AutomorphismList-class object.
automorphism_bycoef
, automorphisms
## Load a dataset data("brca1_autm", package = "GenomAutomorphism") ## Let's transforming into a list of Automorphisms-class objects x1 <- as.list(brca1_autm[seq(2)]) ## Now, object 'x1' is transformed into a AutomorphismList-class object as.AutomorphismList(x1) ## Alternatively, let's transform the list 'x1' into a GRangesList-class ## object. x1 <- GRangesList(x1) ## Next, object 'x1' is transformed into a AutomorphismList-class object as.AutomorphismList(x1)
## Load a dataset data("brca1_autm", package = "GenomAutomorphism") ## Let's transforming into a list of Automorphisms-class objects x1 <- as.list(brca1_autm[seq(2)]) ## Now, object 'x1' is transformed into a AutomorphismList-class object as.AutomorphismList(x1) ## Alternatively, let's transform the list 'x1' into a GRangesList-class ## object. x1 <- GRangesList(x1) ## Next, object 'x1' is transformed into a AutomorphismList-class object as.AutomorphismList(x1)
Given two codon sequences represented in the Z5^3 Abelian group, this function computes the automorphisms describing codon mutational events.
aut3D( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), field = "GF5", start = NA, end = NA, chr = 1L, strand = "+", genetic_code = getGeneticCode("1"), num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
aut3D( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), field = "GF5", start = NA, end = NA, chr = 1L, strand = "+", genetic_code = getGeneticCode("1"), num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
seq |
An object from a |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube , cube_alt
|
A character string denoting pairs of the 24 Genetic-code cubes, as given in references (2-3). That is, the base pairs from the given cubes must be complementary each other. Such a cube pair are call dual cubes and, as shown in reference (3), each pair integrates group. |
field |
A character string denoting the Galois field where the 3D automorphisms are estimated. This can be 'GF(4)' or 'GF(5)', but only 'GF(5)' is implemented so far. |
start , end , chr , strand
|
Optional parameters required to build a
|
genetic_code |
The named character vector returned by
|
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
Automorphisms in Z5^3' are described as functions
, where A is diagonal matrix, as noticed in
reference (4).
An object Automorphism-class
with four columns on its
metacolumn named: seq1, seq2, autm, and cube.
Robersy Sanchez (https://genomaths.com).
Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005 Oct;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800. ( PDF).
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. https://doi.org/10.1101/2021.06.01.446543.
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF.
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z5^3 autms <- aut3D(seq = aln) autms
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z5^3 autms <- aut3D(seq = aln) autms
This is a AutomorphismList object carrying a list of pairwise automorphisms between the DNA sequences from the MSA of primate somatic cytochrome C grouped by automorphism's coefficients. The grouping derives from the dataset brca1_autm after applying function automorphism_bycoef.
data("autby_coef", package = "GenomAutomorphism")
data("autby_coef", package = "GenomAutomorphism")
AutomorphismByCoefList class object.
## Load the data set data("autby_coef", package = "GenomAutomorphism") autby_coef ## Mutation type found in the data unique(autby_coef$human_1.human_2$mut_type)
## Load the data set data("autby_coef", package = "GenomAutomorphism") autby_coef ## Mutation type found in the data unique(autby_coef$human_1.human_2$mut_type)
This is a AutomorphismList object carrying a list of pairwise automorphisms between the SARS coronavirus GZ02 (GenBank: AY390556.1: 265-13398_13398-21485) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45 (GenBank: MG772933.1:265-1345513455-21542), nonstructural_polyprotein. The pairwise DNA sequence alignment is available in the dataset named covid_aln and the automorphisms were estimated with function autZ64.
data("autm", package = "GenomAutomorphism")
data("autm", package = "GenomAutomorphism")
AutomorphismList class object.
The alignment of these DNA sequences is available at: https://github.com/genomaths/seqalignments/raw/master/COVID-19 in the fasta file 'AY390556.1_265-13398_13398-21485_RNA-POL_SARS_COVI_GZ02.fas'
data("autm", package = "GenomAutomorphism") autm
data("autm", package = "GenomAutomorphism") autm
This is a AutomorphismList object carrying a list of pairwise
automorphisms between the SARS coronavirus GZ02 (GenBank: AY390556.1:
265-13398_13398-21485) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45
(GenBank: MG772933.1:265-1345513455-21542), nonstructural_polyprotein. The
pairwise DNA sequence alignment is available in the dataset named
covid_aln
and the automorphisms were estimated with function
aut3D
.
data("autm_3d", package = "GenomAutomorphism")
data("autm_3d", package = "GenomAutomorphism")
AutomorphismList class object.
data("autm_3d", package = "GenomAutomorphism") autm_3d
data("autm_3d", package = "GenomAutomorphism") autm_3d
This is a AutomorphismList object carrying a list of pairwise automorphisms between the SARS coronavirus GZ02 (GenBank: AY390556.1: 265-13398_13398-21485) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45 (GenBank: MG772933.1:265-1345513455-21542), nonstructural_polyprotein. The pairwise DNA sequence alignment is available in the dataset named covid_aln and the automorphisms were estimated with function autZ125.
data("autm_z125", package = "GenomAutomorphism")
data("autm_z125", package = "GenomAutomorphism")
AutomorphismList class object.
data("autm_z125", package = "GenomAutomorphism") autm_z125
data("autm_z125", package = "GenomAutomorphism") autm_z125
Automorphisms with the same automorphism's coefficients are grouped.
automorphism_bycoef(x, ...) ## S4 method for signature 'Automorphism' automorphism_bycoef(x, mut.type = TRUE) ## S4 method for signature 'AutomorphismList' automorphism_bycoef( x, min.len = 1L, mut.type = TRUE, num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
automorphism_bycoef(x, ...) ## S4 method for signature 'Automorphism' automorphism_bycoef(x, mut.type = TRUE) ## S4 method for signature 'AutomorphismList' automorphism_bycoef( x, min.len = 1L, mut.type = TRUE, num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
x |
An automorphism-class object returned by function
|
... |
Not in use. |
mut.type |
Logical. Whether to include the mutation type as given by
function |
min.len |
Minimum length of a range to be reported. |
num.cores , tasks
|
Integers. Argument num.cores denotes the
number of cores to use, i.e. at most how many child processes will be run
simultaneously (see |
verbose |
logic(1). If TRUE, enable progress bar. |
An AutomorphismByCoef
class object. A coefficient
with 0 value is assigned to mutational events that are not automorphisms,
e.g., indel mutations.
## Load dataset data("autm", package = "GenomAutomorphism") automorphism_bycoef(x = autm[1:2])
## Load dataset data("autm", package = "GenomAutomorphism") automorphism_bycoef(x = autm[1:2])
This function applies a Dirichlet-Multinomial Modelling (in Bayesian framework) to compute the posterior probability of each type of mutational event. DNA bases are classified based on the physicochemical criteria used to ordering the set of codons: number of hydrogen bonds (strong-weak, S-W), chemical type (purine-pyrimidine, Y-R), and chemical groups (amino versus keto, M-K) (see reference 4). Preserved codon positions are labeled with letter “H”.
As a result, mutational events are grouped by type of mutation covering all the possible combinations of symbols: "Y", "R", "M", "W", "K", "S", and "H", for example: "YSH", "RSH", "MSH", "WSH", "KSH", "SSH", "HSH", "YHH", "RHH", and so on. Insertion/deletion mutations are not considered.
Given the observed data , where
is the
frequency for the mutation type
, we want to estimate the parameters
of the Dirichlet distribution,
,
that maximize the marginal likelihood.
The marginal likelihood of the data under the Dirichlet-multinomial model is given by
where .
To perform MLE, we maximize the log of this likelihood:
. That is, we aim to maximize this
log-likelihood with respect to
. This is done numerically
because there's no closed-form solution. Here, we use:
with initial guess set as .
Let be the vector of probabilities for each mutation type. It's
the parameter we're estimating, which represents the probabilities of
observing each mutation type in the multinomial distribution. The conjugate
distribution in this context refers to the Dirichlet distribution, which is
the prior distribution for
. When we have observed data, the
posterior distribution of
given this data is also a Dirichlet
distribution due to the conjugate property.
The prior distribution, before observing any data, our belief about the
distribution of is given by:
where the are known as initial concentration parameters,
which represents our initial belief or assumption about the distribution
of the mutation types. These parameters control how the probability mass is
distributed among the mutation types. We might set these initial parameters
based on some prior knowledge or simply to provide a non-informative prior.
Once we have the MLE for , the posterior distribution of
given the data updates these parameters to incorporate the
observed data:
where are the MLE estimates, i.e.,
are our updated belief about the frequencies of
mutation types in the population sample.
The expected values of , given the data under the posterior
Dirichlet distribution is:
These expected values directly corresponds to the "posterior probability" of
observing mutation type .
This approach provides a rigorous estimation of the Dirichlet parameters under the Dirichlet-multinomial model using MLE.
automorphism_prob(x, ...) ## S4 method for signature 'AutomorphismByCoef' automorphism_prob( x, initial_alpha = NULL, method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN", "Brent"), maxit = 500, abstol = 10^-8, ... ) ## S4 method for signature 'AutomorphismByCoefList' automorphism_prob( x, initial_alpha = NULL, method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN", "Brent"), maxit = 500, abstol = 10^-8 )
automorphism_prob(x, ...) ## S4 method for signature 'AutomorphismByCoef' automorphism_prob( x, initial_alpha = NULL, method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN", "Brent"), maxit = 500, abstol = 10^-8, ... ) ## S4 method for signature 'AutomorphismByCoefList' automorphism_prob( x, initial_alpha = NULL, method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN", "Brent"), maxit = 500, abstol = 10^-8 )
x |
An AutomorphismByCoefList-class object returned by function automorphism_bycoef. |
... |
Not in use yet. |
initial_alpha |
A vector of initial guess values for pseudo counts
|
method , maxit , abstol
|
Parameter values to pass into
|
A data frame with the posterior probabilities.
## Load the data set data("autby_coef", package = "GenomAutomorphism") post_prob <- automorphism_prob(autby_coef[1:10]) head(post_prob,10)
## Load the data set data("autby_coef", package = "GenomAutomorphism") post_prob <- automorphism_prob(autby_coef[1:10]) head(post_prob,10)
Automorphisms estimated on a pairwise or a MSA alignment
can be grouped by ranges which inherits from
GRanges-class
or a
GRanges-class
.
automorphismByRanges(x, ...) ## S4 method for signature 'Automorphism' automorphismByRanges(x) ## S4 method for signature 'AutomorphismList' automorphismByRanges( x, min.len = 0L, num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
automorphismByRanges(x, ...) ## S4 method for signature 'Automorphism' automorphismByRanges(x) ## S4 method for signature 'AutomorphismList' automorphismByRanges( x, min.len = 0L, num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
x |
An AutomorphismList-class object returned by function
|
... |
Not in use. |
min.len |
Minimum length of a range to be reported. |
num.cores , tasks
|
Integers. Argument num.cores denotes the
number of cores to use, i.e. at most how many child processes will be run
simultaneously (see |
verbose |
logic(1). If TRUE, enable progress bar. |
A GRanges-class
or a
GRangesList-class
. Each
GRanges-class
object with a column
named cube, which carries the type of cube automorphims.
## Load dataset data("autm", package = "GenomAutomorphism") automorphismByRanges(x = autm[c(1, 4)])
## Load dataset data("autm", package = "GenomAutomorphism") automorphismByRanges(x = autm[c(1, 4)])
Given two codon sequences represented in a given Abelian group, this function computes the automorphisms describing codon mutational events. Basically, this function is a wrapping to call the corresponding function for a specified Abelian group.
automorphisms(seqs = NULL, filepath = NULL, group = "Z4", ...) ## S4 method for signature 'DNAStringSet_OR_NULL' automorphisms( seqs = NULL, filepath = NULL, group = c("Z5", "Z64", "Z125", "Z5^3"), cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), nms = NULL, start = NA, end = NA, chr = 1L, strand = "+", num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
automorphisms(seqs = NULL, filepath = NULL, group = "Z4", ...) ## S4 method for signature 'DNAStringSet_OR_NULL' automorphisms( seqs = NULL, filepath = NULL, group = c("Z5", "Z64", "Z125", "Z5^3"), cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), nms = NULL, start = NA, end = NA, chr = 1L, strand = "+", num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
seqs |
An object from a |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
group |
A character string denoting the group representation for the given base or codon as shown in reference (1). |
... |
Not in use. |
cube , cube_alt
|
A character string denoting pairs of the 24
Genetic-code cubes, as given in references (2-3). That is, the base pairs
from the given cubes must be complementary each other. Such a cube pair are
call |
nms |
Optional. Only used if the DNA sequence alignment provided carries more than two sequences. A character string giving short names for the alignments to be compared. If not given then the automorphisms between pairwise alignment are named as: 'aln_1', 'aln_2', and so on. |
start , end , chr , strand
|
Optional parameters required to build a
|
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
Herein, automorphisms are algebraic descriptions of mutational
event observed in codon sequences represented on different Abelian groups.
In particular, as described in references (3-4), for each representation of
the codon set on a defined Abelian group there are 24 possible isomorphic
Abelian groups. These Abelian groups can be labeled based on the DNA
base-order used to generate them. The set of 24 Abelian groups can be
described as a group isomorphic to the symmetric group of degree four
(, see reference (4)). Function
automorphismByRanges
permits the classification of the pairwise alignment of protein-coding
sub-regions based on the mutational events observed on it and on the
genetic-code cubes that describe them.
Automorphisms in Z5, Z64 and Z125 are described as functions
and
, where k and x are
elements from the set of integers modulo 64 or modulo 125, respectively. If
an automorphisms cannot be found on any of the cubes provided in the
argument
, then function
automorphisms
will search
for automorphisms in the cubes provided in the argument .
Automorphisms in Z5^3' are described as functions ,
where A is diagonal matrix.
Arguments cube and cube_alt must be pairs of' dual cubes (see section 2.4 from reference 4).
This function returns a Automorphism-class
object
with four columns on its metacolumn named: seq1, seq2,
autm, and cube.
automorphismByRanges
:This function returns a GRanges-class
object.
Consecutive mutational events (on the codon sequence) described by
automorphisms on a same cube are grouped in a range.
automorphism_bycoef
This function returns a GRanges-class
object.
Consecutive mutational events (on the codon sequence) described by
the same automorphisms coefficients are grouped in a range.
getAutomorphisms
This function returns an AutomorphismList-class object as a list of
Automorphism-class objects, which inherits from
GRanges-class
objects.
conserved_regions
Returns a AutomorphismByCoef
class object containing the
requested regions.
Robersy Sanchez (https://genomaths.com).
Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005 Oct;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800. ( PDF).
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 110-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on "Z5^3" autms <- automorphisms(seqs = aln, group = "Z5^3", verbose = FALSE) autms ## Automorphism on "Z64" autms <- automorphisms(seqs = aln, group = "Z64", verbose = FALSE) autms ## Automorphism on "Z64" from position 1 to 33 autms <- automorphisms( seqs = aln, group = "Z64", start = 1, end = 33, verbose = FALSE ) autms
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on "Z5^3" autms <- automorphisms(seqs = aln, group = "Z5^3", verbose = FALSE) autms ## Automorphism on "Z64" autms <- automorphisms(seqs = aln, group = "Z64", verbose = FALSE) autms ## Automorphism on "Z64" from position 1 to 33 autms <- automorphisms( seqs = aln, group = "Z64", start = 1, end = 33, verbose = FALSE ) autms
Given two codon sequences represented in the Z125 Abelian group, this function computes the automorphisms describing codon mutational events.
autZ125( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), start = NA, end = NA, chr = 1L, strand = "+", genetic_code = getGeneticCode("1"), num.cores = multicoreWorkers() - 1, tasks = 0L, verbose = TRUE )
autZ125( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), start = NA, end = NA, chr = 1L, strand = "+", genetic_code = getGeneticCode("1"), num.cores = multicoreWorkers() - 1, tasks = 0L, verbose = TRUE )
seq |
An object from a |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube , cube_alt
|
A character string denoting pairs of the 24 Genetic-code cubes, as given in references (2-3). That is, the base pairs from the given cubes must be complementary each other. Such a cube pair are call dual cubes and, as shown in reference (3), each pair integrates group. |
start , end , chr , strand
|
Optional parameters required to build a
|
genetic_code |
The named character vector returned by
|
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
Automorphisms in Z125 are described as functions
, where k and x are elements from the set of
integers modulo 64. As noticed in reference (1)
An object Automorphism-class
with four columns
on its metacolumn named: seq1, seq2, autm, and
cube.
Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005 Oct;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800. ( PDF).
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 110-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z125 autms <- autZ125(seq = aln) autms
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z125 autms <- autZ125(seq = aln) autms
Given two codon sequences represented in the Z5 Abelian group, this function computes the automorphisms describing codon mutational events.
autZ5( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), start = NA, end = NA, chr = 1L, strand = "+", num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
autZ5( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), start = NA, end = NA, chr = 1L, strand = "+", num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
seq |
An object from a |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube , cube_alt
|
A character string denoting pairs of the 24 Genetic-code cubes, as given in references (2-3). That is, the base pairs from the given cubes must be complementary each other. Such a cube pair are call dual cubes and, as shown in reference (3), each pair integrates group. |
start , end , chr , strand
|
Optional parameters required to build a
|
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
Automorphisms in Z5 are described as functions
, where k and x are elements from the set of
integers modulo 64. As noticed in reference (1). The pairwise alignment
provided in argument seq or the 'fasta' file
filepath must correspond to DNA base sequences.
An object Automorphism-class
with four columns on its
metacolumn named: seq1, seq2, autm, and cube.
Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005 Oct;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800. ( PDF).
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 110-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z5 autms <- autZ5(seq = aln, verbose = FALSE) autms
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z5 autms <- autZ5(seq = aln, verbose = FALSE) autms
Given two codon sequences represented in the Z64 Abelian group, this function computes the automorphisms describing codon mutational events.
autZ64( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), start = NA, end = NA, chr = 1L, strand = "+", genetic_code = getGeneticCode("1"), num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
autZ64( seq = NULL, filepath = NULL, cube = c("ACGT", "TGCA"), cube_alt = c("CATG", "GTAC"), start = NA, end = NA, chr = 1L, strand = "+", genetic_code = getGeneticCode("1"), num.cores = multicoreWorkers(), tasks = 0L, verbose = TRUE )
seq |
An object from a |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube , cube_alt
|
A character string denoting pairs of the 24 Genetic-code cubes, as given in references (2-3). That is, the base pairs from the given cubes must be complementary each other. Such a cube pair are call dual cubes and, as shown in reference (3), each pair integrates group. |
start , end , chr , strand
|
Optional parameters required to build a
|
genetic_code |
The named character vector returned by
|
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
Automorphisms in Z64 are described as functions
mod 64, where
and
are elements
from the set of integers modulo 64.
An object Automorphism-class
with four columns on its
metacolumn named: seq1, seq2, autm, and cube.
Robersy Sanchez (https://genomaths.com).
Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005 Oct;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800. ( PDF).
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 110-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z64 autms <- autZ64(seq = aln, verbose = FALSE) autms
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Automorphism on Z64 autms <- autZ64(seq = aln, verbose = FALSE) autms
Given a string denoting a codon or base from the DNA (or RNA) alphabet, function base_coord return the base coordinates in the specify genetic-code Abelian group, as given in reference (1).
GRanges
of bases.Function seq2granges transform an object from
DNAStringSet
,
DNAMultipleAlignment-class
or a character into
an object from BaseSeq.
Function base_seq2string_set transforms an object from
BaseSeq into an object from DNAStringSet-class
.
base_coord(base = NULL, filepath = NULL, cube = "ACGT", group = "Z4", ...) ## S4 method for signature 'DNAStringSet_OR_NULL' base_coord( base = NULL, filepath = NULL, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), start = NA, end = NA, chr = 1L, strand = "+" ) seq2granges(base = NULL, filepath = NULL, ...) ## S4 method for signature 'DNAStringSet_OR_NULL' seq2granges( base = NULL, filepath = NULL, start = NA, end = NA, chr = 1L, strand = "+", seq_alias = NULL, ... ) base_seq2string_set(x, ...) ## S4 method for signature 'BaseSeq' base_seq2string_set(x) base_matrix(base, ...) ## S4 method for signature 'DNAStringSet_OR_NULL' base_matrix( base, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), seq_alias = NULL )
base_coord(base = NULL, filepath = NULL, cube = "ACGT", group = "Z4", ...) ## S4 method for signature 'DNAStringSet_OR_NULL' base_coord( base = NULL, filepath = NULL, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), start = NA, end = NA, chr = 1L, strand = "+" ) seq2granges(base = NULL, filepath = NULL, ...) ## S4 method for signature 'DNAStringSet_OR_NULL' seq2granges( base = NULL, filepath = NULL, start = NA, end = NA, chr = 1L, strand = "+", seq_alias = NULL, ... ) base_seq2string_set(x, ...) ## S4 method for signature 'BaseSeq' base_seq2string_set(x) base_matrix(base, ...) ## S4 method for signature 'DNAStringSet_OR_NULL' base_matrix( base, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), seq_alias = NULL )
base |
An object from a |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2 2 3). |
group |
A character string denoting the group representation for the given base or codon as shown in reference (1). |
... |
Not in use yet. |
start , end , chr , strand
|
Optional parameters required to build a
|
seq_alias |
DNA sequence alias/ID and description. |
x |
A 'BaseSeq' class object. |
Function base_coord is defined only for pairwise aligned sequences. Symbols "-" and "N" usually found in DNA sequence alignments to denote gaps and missing/unknown bases are represented by the number: '-1' on Z4 and '0' on Z5. In Z64 the symbol 'NA' will be returned for codons including symbols "-" and "N".
For the sake of brevity the metacolumns from the object returned by function 'seq2granges' are named as 'S1', 'S2', 'S3', and so on. The original DNA sequence alias are stored in the slot named 'seq_alias'. (see examples).
Depending on the function called, different object will be returned:
This function returns a BaseGroup
object
carrying the DNA sequence(s) and their respective coordinates in the
requested Abelian group of base representation (one-dimension, "Z4" or
"Z5"). Observe that to get coordinates in the set of of integer numbers
("Z") is also possible but they are not defined to integrate a Abelian
group. These are just used for the further insertion the codon set in the
3D space (R^3).
This function returns a BaseGroup object carrying the DNA sequence(s), one
base per ranges. A BaseGroup class object inherits from
GRanges-class
.
This function returns a DNAStringSet-class
.
A BaseGroup-class object.
Robersy Sanchez https://genomaths.com
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Symmetric Group of the Genetic-Code Cubes.
codon_coord
and base2int
.
Symmetric Group of the Genetic-Code Cubes.
base_coord and codon_coord.
## Example 1. Let's get the base coordinates for codons "ACG" ## and "TGC": x0 <- c("ACG", "TGC") x1 <- DNAStringSet(x0) x1 ## Get the base coordinates on cube = "ACGT" on the Abelian group = "Z4" base_coord(x1, cube = "ACGT", group = "Z4") ## Example 2. Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## DNA base representation in the Abelian group Z4 bs_cor <- base_coord( base = aln, cube = "ACGT" ) bs_cor ## Example 3. DNA base representation in the Abelian group Z5 bs_cor <- base_coord( base = aln, cube = "ACGT", group = "Z5" ) bs_cor ## Example 4. Load a multiple sequence alignment (MSA) of primate BRCA1 DNA ## repair genes data("brca1_aln2", package = "GenomAutomorphism") brca1_aln2 ## Get BaseSeq-class object gr <- seq2granges(brca1_aln2) gr ## Transform the BaseSeq-class object into a DNAStringSet-class object str_set <- base_seq2string_set(gr) str_set ## Recovering the original MSA DNAMultipleAlignment(as.character(str_set)) ## Example 5. base_matrix(base = aln, cube = "CGTA", group = "Z5") ## Example 5.
## Example 1. Let's get the base coordinates for codons "ACG" ## and "TGC": x0 <- c("ACG", "TGC") x1 <- DNAStringSet(x0) x1 ## Get the base coordinates on cube = "ACGT" on the Abelian group = "Z4" base_coord(x1, cube = "ACGT", group = "Z4") ## Example 2. Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## DNA base representation in the Abelian group Z4 bs_cor <- base_coord( base = aln, cube = "ACGT" ) bs_cor ## Example 3. DNA base representation in the Abelian group Z5 bs_cor <- base_coord( base = aln, cube = "ACGT", group = "Z5" ) bs_cor ## Example 4. Load a multiple sequence alignment (MSA) of primate BRCA1 DNA ## repair genes data("brca1_aln2", package = "GenomAutomorphism") brca1_aln2 ## Get BaseSeq-class object gr <- seq2granges(brca1_aln2) gr ## Transform the BaseSeq-class object into a DNAStringSet-class object str_set <- base_seq2string_set(gr) str_set ## Recovering the original MSA DNAMultipleAlignment(as.character(str_set)) ## Example 5. base_matrix(base = aln, cube = "CGTA", group = "Z5") ## Example 5.
This function split a DNA sequence into a codon sequence.
base2codon(x, ...) ## S4 method for signature 'character' base2codon(x) ## S4 method for signature 'DNAStringSet' base2codon(x) ## S4 method for signature 'DNAMultipleAlignment' base2codon(x)
base2codon(x, ...) ## S4 method for signature 'character' base2codon(x) ## S4 method for signature 'DNAStringSet' base2codon(x) ## S4 method for signature 'DNAMultipleAlignment' base2codon(x)
x |
A character string, |
... |
Not in use. |
It is expected that the provided DNA sequence is multiple of 3, otherwise gaps are added to the end of the sequence.
If the argument of 'x' is character string, then a character vector
of codons will returned. If the argument of 'x' is
DNAStringSet-class
or
DNAMultipleAlignment-class
object, then a matrix
of codons is returned.
Robersy Sanchez https://genomaths.com. 01/15/2022
## Gaps are added at the sequence end. seq <- c("ACCT") base2codon(x = seq) ## This DNA sequence is multiple of 3 seq <- c("ACCTCA") base2codon(x = seq) ## Load a DNAStringSet. A matrix of codons is returned data("aln", package = "GenomAutomorphism") base2codon(x = aln)
## Gaps are added at the sequence end. seq <- c("ACCT") base2codon(x = seq) ## This DNA sequence is multiple of 3 seq <- c("ACCTCA") base2codon(x = seq) ## Load a DNAStringSet. A matrix of codons is returned data("aln", package = "GenomAutomorphism") base2codon(x = aln)
A simple function to represent DNA bases as elements from the Abelian group of integers modulo 4 (Z4), 5 (Z5), or 2 (Z2).
base2int(base, ...) ## S4 method for signature 'character' base2int( base, group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL) ) ## S4 method for signature 'data.frame' base2int( base, group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL) )
base2int(base, ...) ## S4 method for signature 'character' base2int( base, group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL) ) ## S4 method for signature 'data.frame' base2int( base, group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3", "Z2"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL) )
base |
A character vector, string , or a dataframe of letters from the DNA/RNA alphabet. |
... |
Not in use. |
group |
A character string denoting the group representation for the given base or codon as shown in reference (2-3). |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3). |
phychem |
Optional. Eventually, it could be useful to represent
DNA bases by numerical values of measured physicochemical properties. If
provided, then this argument must be a named numerical list. For example,
the
where symbol 'N' provide the value for any letter out of DNA base alphabet. In this example, we could write NA or 0 (see example section). |
For Z2 (binary representation of DNA bases), the cube bases are represented in their order by: '00', '01', '10', and '11' (examples section).
A numerical vector.
Robersy Sanchez https://genomaths.com
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
base_coord, codon_coord, and dna_phychem.
## A triplet with a letter not from DNA/RNA alphabet ## 'NA' is introduced by coercion! base2int("UDG") ## The base replacement in cube "ACGT and group "Z4" base2int("ACGT") ## The base replacement in cube "ACGT and group "Z5" base2int("ACGT", group = "Z5") ## A vector of DNA base triplets base2int(c("UTG", "GTA")) ## A vector of DNA base triplets with different number of triplets. ## Codon 'GTA' is recycled! base2int(base = c("UTGGTA", "CGA"), group = "Z5") ## Data frames base2int(data.frame(x1 = c("UTG", "GTA"), x2 = c("UTG", "GTA"))) ## Cube bases are represented n their order by: '00', '01', '10', and '11', ## For example for cube = "ACGT" we have mapping: A -> '00', C -> '01', ## G -> '11', and C -> '10'. base2int("ACGT", group = "Z2", cube = "ACGT")
## A triplet with a letter not from DNA/RNA alphabet ## 'NA' is introduced by coercion! base2int("UDG") ## The base replacement in cube "ACGT and group "Z4" base2int("ACGT") ## The base replacement in cube "ACGT and group "Z5" base2int("ACGT", group = "Z5") ## A vector of DNA base triplets base2int(c("UTG", "GTA")) ## A vector of DNA base triplets with different number of triplets. ## Codon 'GTA' is recycled! base2int(base = c("UTGGTA", "CGA"), group = "Z5") ## Data frames base2int(data.frame(x1 = c("UTG", "GTA"), x2 = c("UTG", "GTA"))) ## Cube bases are represented n their order by: '00', '01', '10', and '11', ## For example for cube = "ACGT" we have mapping: A -> '00', C -> '01', ## G -> '11', and C -> '10'. base2int("ACGT", group = "Z2", cube = "ACGT")
This is a DNAMultipleAlignment
carrying a MSA of
BRCA1 DNA repair genes to be used in the
examples provided for the package functions. The original file can be
downloaded from GitHub at: https://bit.ly/3DimROD
data("brca1_aln", package = "GenomAutomorphism")
data("brca1_aln", package = "GenomAutomorphism")
DNAMultipleAlignment
class object.
brca1_aln2, brca1_autm, and covid_aln.
data("brca1_aln", package = "GenomAutomorphism") brca1_aln
data("brca1_aln", package = "GenomAutomorphism") brca1_aln
This is a DNAMultipleAlignment
carrying a MSA of
BRCA1 DNA repair genes to be used in the
examples provided for the package functions. The original file can be
downloaded from GitHub at: https://bit.ly/3DimROD. This data set has
41 DNA sequences and it contains the previous 20 primate variants found in
'brca1_aln' data set plus 21 single mutation variants (SMV) from the human
sequence NM_007298 transcript variant 4. The location of each SMV is given
in the heading from each sequence.
data("brca1_aln2", package = "GenomAutomorphism")
data("brca1_aln2", package = "GenomAutomorphism")
DNAMultipleAlignment
class object.
Robersy Sanchez https://genomaths.com
brca1_aln, brca1_autm2, cyc_aln, and covid_autm.
data("brca1_aln2", package = "GenomAutomorphism") brca1_aln2
data("brca1_aln2", package = "GenomAutomorphism") brca1_aln2
This is a AutomorphismList object carrying a list of pairwise automorphisms between the DNA sequences from the MSA of primate BRCA1 DNA repair gene. The automorphisms were estimated from the brca1_aln MSA with function autZ64.
data("brca1_autm", package = "GenomAutomorphism")
data("brca1_autm", package = "GenomAutomorphism")
AutomorphismList class object.
Robersy Sanchez https://genomaths.com
brca1_autm2, brca1_aln, brca1_aln2, and covid_autm.
data("brca1_autm", package = "GenomAutomorphism") brca1_autm
data("brca1_autm", package = "GenomAutomorphism") brca1_autm
This is a AutomorphismList object carrying a list of pairwise automorphisms between the DNA sequences from the MSA of primate BRCA1 DNA repair gene. The data set brca1_aln2 has 41 DNA sequences and it contains the previous 20 primate variants found in 'braca1_aln' data set plus 21 single mutation variants (SMV) from the human sequence NM_007298 transcript variant 4. The location of each SMV is given in the heading from each sequence. The automorphisms were estimated from the brca1_aln MSA with function autZ64.
data("brca1_autm2", package = "GenomAutomorphism")
data("brca1_autm2", package = "GenomAutomorphism")
AutomorphismList class object.
This is a list of 24 codon distance matrices created with function codon_dist_matrix in the set of 24 genetic-code cubes on Z4 (using the default weights and assuming the standard genetic code (SGC). The data set is created to speed up the computation when working with DNA sequences from superior organisms. Since distance matrices are symmetric, it is enough to provide the lower matrix. Each matrix is given as named/labeled vector (see the example).
data("cdm_z64", package = "GenomAutomorphism")
data("cdm_z64", package = "GenomAutomorphism")
A list object.
## Load the data set data("cdm_z64", package = "GenomAutomorphism") cdm_z64 ## The lower matrix (given as vector) for cube "TCGA" (picking out the 20 ## first values). Observe that this vector is labeled. Each numerical value ## corresponds to the distance between the codons specified by the ## name/label on it. For example, the distance between codons TTT and TCT ## is: 0.0625. head(cdm_z64[[ "TCGA" ]], 20)
## Load the data set data("cdm_z64", package = "GenomAutomorphism") cdm_z64 ## The lower matrix (given as vector) for cube "TCGA" (picking out the 20 ## first values). Observe that this vector is labeled. Each numerical value ## corresponds to the distance between the codons specified by the ## name/label on it. For example, the distance between codons TTT and TCT ## is: 0.0625. head(cdm_z64[[ "TCGA" ]], 20)
Given a string denoting a codon or base from the DNA (or RNA) alphabet and a genetic-code Abelian group as given in reference (1).
codon_coord(codon = NULL, ...) ## S4 method for signature 'BaseGroup' codon_coord(codon, group = NULL) ## S4 method for signature 'DNAStringSet_OR_NULL' codon_coord( codon = NULL, filepath = NULL, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3"), start = NA, end = NA, chr = 1L, strand = "+" ) ## S4 method for signature 'matrix_OR_data_frame' codon_coord( codon, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z64", "Z125", "Z4^3", "Z5^3") )
codon_coord(codon = NULL, ...) ## S4 method for signature 'BaseGroup' codon_coord(codon, group = NULL) ## S4 method for signature 'DNAStringSet_OR_NULL' codon_coord( codon = NULL, filepath = NULL, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3"), start = NA, end = NA, chr = 1L, strand = "+" ) ## S4 method for signature 'matrix_OR_data_frame' codon_coord( codon, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z64", "Z125", "Z4^3", "Z5^3") )
codon |
An object from |
... |
Not in use. |
group |
A character string denoting the group representation for the given base or codon as shown in reference (2-3). |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3). |
start , end , chr , strand
|
Optional parameters required to build a
|
Symbols "-" and "N" usually found in DNA sequence alignments to denote gaps and missing/unknown bases are represented by the number: '-1' on Z4 and '0' on Z5. In Z64 the symbol 'NA' will be returned for codons including symbols "-" and "N".
This function returns a GRanges-class
object
carrying the codon sequence(s) and their respective coordinates in the
requested Abelian group or simply, when group = 'Z5^3'
3D-coordinates, which are derive from Z5 as indicated in reference (3).
Notice that the coordinates can be 3D or just one-dimension ("Z64" or
"Z125"). Hence, the pairwise alignment provided in argument
codon must correspond to codon sequences.
A CodonGroup-class
object.
Robersy Sanchez https://genomaths.com
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Symmetric Group of the Genetic-Code Cubes.
codon_matrix, base_coord and base2int.
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## DNA base representation in the Abelian group Z5 bs_cor <- codon_coord( codon = aln, cube = "ACGT", group = "Z5" ) bs_cor ## 3-D coordinates ## DNA base representation in the Abelian group Z64 bs_cor <- codon_coord( codon = aln, cube = "ACGT", group = "Z64" ) bs_cor ## Giving a matrix of codons codon_coord(base2codon(x = aln))
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## DNA base representation in the Abelian group Z5 bs_cor <- codon_coord( codon = aln, cube = "ACGT", group = "Z5" ) bs_cor ## 3-D coordinates ## DNA base representation in the Abelian group Z64 bs_cor <- codon_coord( codon = aln, cube = "ACGT", group = "Z64" ) bs_cor ## Giving a matrix of codons codon_coord(base2codon(x = aln))
This function computes the weighted Manhattan distance between
codons from two sequences as given in reference (1). That is, given two
codons and
with coordinates on the set of integers modulo 5
("Z5"):
and
(see (1)),
the Weighted Manhattan distance between this two codons is defined as:
If the codon coordinates are given on "Z4", then the Weighted Manhattan distance is define as:
Herein, we move to the generalized version given in reference (3), for which:
where we use the vector of .
codon_dist(x, y, ...) ## S4 method for signature 'DNAStringSet' codon_dist( x, weight = NULL, group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'character' codon_dist( x, y, weight = NULL, group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'CodonGroup_OR_Automorphisms' codon_dist( x, weight = NULL, group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE )
codon_dist(x, y, ...) ## S4 method for signature 'DNAStringSet' codon_dist( x, weight = NULL, group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'character' codon_dist( x, y, weight = NULL, group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE ) ## S4 method for signature 'CodonGroup_OR_Automorphisms' codon_dist( x, weight = NULL, group = c("Z4", "Z5"), cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), num.cores = 1L, tasks = 0L, verbose = FALSE )
x , y
|
A character string of codon sequences, i.e., sequences of DNA
base-triplets. If only 'x' argument is given, then it must be a
|
... |
Not in use yet. |
weight |
A numerical vector of weights to compute weighted Manhattan
distance between codons. If |
group |
A character string denoting the group representation for the given codon sequence as shown in reference (2-3). |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3). |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the progress bar. |
A numerical vector with the pairwise distances between codons in sequences 'x' and 'y'.
Sanchez R. Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric. Curr Top Med Chem. 2014;14: 407–417. https://doi.org/10.2174/1568026613666131204110022.
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560. PDF.
codon_dist_matrix
, automorphisms
,
codon_coord
, and aminoacid_dist
.
## Let's write two small DNA sequences x = "ACGCGTGTACCGTGACTG" y = "TGCGCCCGTGACGCGTGA" codon_dist(x, y, group = "Z5") ## Alternatively, data can be vectors of codons, i.e., vectors of DNA ## base-triplets (including gaps simbol "-"). x = c("ACG","CGT","GTA","CCG","TGA","CTG","ACG") y = c("TGC","GCC","CGT","GAC","---","TGA","A-G") ## Gaps are not defined on "Z4" codon_dist(x, y, group = "Z4") ## Gaps are considered on "Z5" codon_dist(x, y, group = "Z5") ## Load an Automorphism-class object data("autm", package = "GenomAutomorphism") codon_dist(x = head(autm,20), group = "Z4") ## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln codon_dist(x = aln, group = "Z5")
## Let's write two small DNA sequences x = "ACGCGTGTACCGTGACTG" y = "TGCGCCCGTGACGCGTGA" codon_dist(x, y, group = "Z5") ## Alternatively, data can be vectors of codons, i.e., vectors of DNA ## base-triplets (including gaps simbol "-"). x = c("ACG","CGT","GTA","CCG","TGA","CTG","ACG") y = c("TGC","GCC","CGT","GAC","---","TGA","A-G") ## Gaps are not defined on "Z4" codon_dist(x, y, group = "Z4") ## Gaps are considered on "Z5" codon_dist(x, y, group = "Z5") ## Load an Automorphism-class object data("autm", package = "GenomAutomorphism") codon_dist(x = head(autm,20), group = "Z4") ## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln codon_dist(x = aln, group = "Z5")
This function computes the codon distance matrix based on the
weighted Manhattan distance between codons estimated with function
codon_dist
.
codon_dist_matrix( genetic_code = "1", group = c("Z4", "Z5"), weight = NULL, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), output = c("list", "vector", "dist"), num.cores = 1L )
codon_dist_matrix( genetic_code = "1", group = c("Z4", "Z5"), weight = NULL, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), output = c("list", "vector", "dist"), num.cores = 1L )
genetic_code |
A single string that uniquely identifies the genetic
code to extract. Should be one of the values in the id or name2 columns of
|
group |
A character string denoting the group representation for the given codon sequence as shown in reference (2-3). |
weight |
A numerical vector of weights to compute weighted Manhattan
distance between codons. If |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3). |
output |
Format of the returned lower triangular matrix: as a list of 63 elements (labeled) or as a labeled vector using codons as labels. |
num.cores |
An integer to setup the number of parallel workers via
|
By construction, a distance matrix is a symmetric matrix. Hence, the knowledge of lower triangular matrix is enough for its application to any downstream analysis.
A lower triangular matrix excluding the diagonal.
## The distance matrix for codons for the Invertebrate Mitochondrial, ## cube "TGCA" with base-triplet represented on the group "Z5". Each ## coordinate from each returned numerical vector corresponds to the ## distance between codons given in the coordinate name. x <- codon_dist_matrix(genetic_code = "5", cube = "TGCA", group = "Z5", output = "vector") x[seq(61, 63)]
## The distance matrix for codons for the Invertebrate Mitochondrial, ## cube "TGCA" with base-triplet represented on the group "Z5". Each ## coordinate from each returned numerical vector corresponds to the ## distance between codons given in the coordinate name. x <- codon_dist_matrix(genetic_code = "5", cube = "TGCA", group = "Z5", output = "vector") x[seq(61, 63)]
This function build the coordinate matrix for each sequence from an aligned set of DNA codon sequences.
codon_matrix(base, ...) ## S4 method for signature 'BaseSeqMatrix' codon_matrix(base, num.cores = 1L, tasks = 0L, verbose = TRUE, ...) ## S4 method for signature 'DNAStringSet' codon_matrix( base, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), num.cores = 1L, tasks = 0L, verbose = TRUE ) ## S4 method for signature 'DNAMultipleAlignment' codon_matrix( base, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), num.cores = 1L, tasks = 0L, verbose = TRUE )
codon_matrix(base, ...) ## S4 method for signature 'BaseSeqMatrix' codon_matrix(base, num.cores = 1L, tasks = 0L, verbose = TRUE, ...) ## S4 method for signature 'DNAStringSet' codon_matrix( base, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), num.cores = 1L, tasks = 0L, verbose = TRUE ) ## S4 method for signature 'DNAMultipleAlignment' codon_matrix( base, cube = c("ACGT", "AGCT", "TCGA", "TGCA", "CATG", "GTAC", "CTAG", "GATC", "ACTG", "ATCG", "GTCA", "GCTA", "CAGT", "TAGC", "TGAC", "CGAT", "AGTC", "ATGC", "CGTA", "CTGA", "GACT", "GCAT", "TACG", "TCAG"), group = c("Z4", "Z5"), num.cores = 1L, tasks = 0L, verbose = TRUE )
base |
A |
... |
Not in use yet. |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the function log to stdout |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (3-4). |
group |
A character string denoting the group representation for the given base or codon as shown in reference (3-4). |
The purpose of this function is making the codon coordinates from multiple sequence alignments (MSA) available for further downstream statistical analyses, like those reported in references (1) and (2).
A ListCodonMatrix class object with the codon coordinate on its metacolumns.
Robersy Sanchez https://genomaths.com
Lorenzo-Ginori, Juan V., Aníbal Rodríguez-Fuentes, Ricardo Grau Ábalo, and Robersy Sánchez Rodríguez. "Digital signal processing in the analysis of genomic sequences." Current Bioinformatics 4, no. 1 (2009): 28-40.
Sanchez, Robersy. "Evolutionary analysis of DNA-protein-coding regions based on a genetic code cube metric." Current Topics in Medicinal Chemistry 14, no. 3 (2014): 407-417.
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
codon_coord, base_coord and base2int.
## Load the MSA of Primate BRCA1 DNA repair genes data("brca1_aln") ## Get the DNAStringSet for the first 33 codons and apply 'codon_matrix' brca1 <- unmasked(brca1_aln) brca1 <- subseq(brca1, start = 1, end = 33) codon_matrix(brca1) ## Get back the alignment object and apply 'codon_matrix' gives us the ## same result. brca1 <- DNAMultipleAlignment(as.character(brca1)) codon_matrix(brca1)
## Load the MSA of Primate BRCA1 DNA repair genes data("brca1_aln") ## Get the DNAStringSet for the first 33 codons and apply 'codon_matrix' brca1 <- unmasked(brca1_aln) brca1 <- subseq(brca1, start = 1, end = 33) codon_matrix(brca1) ## Get back the alignment object and apply 'codon_matrix' gives us the ## same result. brca1 <- DNAMultipleAlignment(as.character(brca1)) codon_matrix(brca1)
Returns the Conserved or the Non-conserved Regions from a MSA.
conserved_regions(x, ...) ## S4 method for signature 'Automorphism' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique") ) ## S4 method for signature 'AutomorphismList' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique"), num.cores = multicoreWorkers(), tasks = 0L, verbose = FALSE ) ## S4 method for signature 'AutomorphismByCoef' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique") ) ## S4 method for signature 'AutomorphismByCoefList' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique") )
conserved_regions(x, ...) ## S4 method for signature 'Automorphism' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique") ) ## S4 method for signature 'AutomorphismList' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique"), num.cores = multicoreWorkers(), tasks = 0L, verbose = FALSE ) ## S4 method for signature 'AutomorphismByCoef' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique") ) ## S4 method for signature 'AutomorphismByCoefList' conserved_regions( x, conserved = TRUE, output = c("all_pairs", "unique_pairs", "unique") )
x |
A |
... |
Not in use. |
conserved |
Logical, Whether to return the conserved or the non-conserved regions. |
output |
A character string. Type of output. |
num.cores , tasks
|
Integers. Argument num.cores denotes the
number of cores to use, i.e. at most how many child processes will be run
simultaneously (see |
verbose |
logic(1). If TRUE, enable progress bar. |
A AutomorphismByCoef
class object containing the
requested regions.
## Load dataset data("autm", package = "GenomAutomorphism") conserved_regions(autm[1:3]) ## Load automorphism found COVID datatset data("covid_autm", package = "GenomAutomorphism") ## Conserved regions in the first 100 codons conserv <- conserved_regions(covid_autm[1:100], output = "unique") conserv
## Load dataset data("autm", package = "GenomAutomorphism") conserved_regions(autm[1:3]) ## Load automorphism found COVID datatset data("covid_autm", package = "GenomAutomorphism") ## Conserved regions in the first 100 codons conserv <- conserved_regions(covid_autm[1:100], output = "unique") conserv
This is a DNAMultipleAlignment
carrying the
pairwise sequence alignment of SARS coronavirus GZ02 (GenBank: AY390556.1:
265-13398_13398-21485) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45
(GenBank: MG772933.1:265-1345513455-21542), complete genomes. The alignment
is available at GitHub:
https://github.com/genomaths/seqalignments/tree/master/COVID-19
data("covid_aln", package = "GenomAutomorphism")
data("covid_aln", package = "GenomAutomorphism")
DNAMultipleAlignment
class object.
Robersy Sanchez https://genomaths.com
brca1_aln, brca1_autm2, cyc_aln and covid_aln.
data("covid_aln", package = "GenomAutomorphism") covid_aln
data("covid_aln", package = "GenomAutomorphism") covid_aln
This is a AutomorphismList object carrying a list of pairwise automorphisms between the SARS coronavirus GZ02 (GenBank: AY390556.1: 265-13398_13398-21485) and Bat SARS-like coronavirus isolate bat-SL-CoVZC45 (GenBank: KY417151.1: protein-coding regions). The pairwise DNA sequence alignment is available in the dataset named covid_aln and the automorphisms were estimated with function autZ64.
data("covid_autm", package = "GenomAutomorphism")
data("covid_autm", package = "GenomAutomorphism")
AutomorphismList
class object.
Robersy Sanchez https://genomaths.com
brca1_autm, brca1_autm2, cyc_autm, and covid_aln.
data("covid_autm", package = "GenomAutomorphism", envir = environment()) covid_autm
data("covid_autm", package = "GenomAutomorphism", envir = environment()) covid_autm
This is a DNAMultipleAlignment
carrying a MSA of
Primate Somatic Cytochrome C to be used in the
examples provided for the package functions. The original file can be
downloaded from GitHub at: https://bit.ly/3kdEAzs
data("cyc_aln", package = "GenomAutomorphism")
data("cyc_aln", package = "GenomAutomorphism")
DNAMultipleAlignment class object.
Robersy Sanchez https://genomaths.com
brca1_aln, brca1_aln2, covid_aln, and covid_aln.
data("cyc_aln", package = "GenomAutomorphism") cyc_aln
data("cyc_aln", package = "GenomAutomorphism") cyc_aln
This is a AutomorphismList
object carrying a list of pairwise
automorphisms between the DNA sequences from the MSA of
Primate Somatic Cytochrome C to be used in the
examples provided for the package functions. The automorphisms were
estimated from the cyc_aln
MSA with function
autZ64
.
data("cyc_autm", package = "GenomAutomorphism")
data("cyc_autm", package = "GenomAutomorphism")
AutomorphismList
class object.
Robersy Sanchez https://genomaths.com
brca1_autm, brca1_autm2, covid_autm, and covid_aln.
data("cyc_autm", package = "GenomAutomorphism") cyc_autm
data("cyc_autm", package = "GenomAutomorphism") cyc_autm
This data set carries some relevant physicochemical properties of the DNA bases. Available properties are:
It is an indicatio of the thermodynamic gradient between a molecule and the anionic form of that molecule upon removal of a proton from it (Wikipedia). The proton affinity values, given in kJ/mol, were taken from reference (1), also available in Wolfram Alpha at https://www.wolframalpha.com/ and in the cell phone App 'Wolfram Alpha'.. Reference (2) provides several measurements accomplished by several computational and experimental approaches.
1-octanol/water partition coefficients, logP. In the physical sciences, a partition coefficient (P) or distribution coefficient (D) is the ratio of concentrations of a compound in a mixture of two immiscible solvents at equilibrium (3). The partition coefficient measures how hydrophilic ("water-loving") or hydrophobic ("water-fearing") a chemical substance is. Partition coefficients are useful in estimating the distribution of drugs within the body. Hydrophobic drugs with high octanol-water partition coefficients are mainly distributed to hydrophobic areas such as lipid bilayers of cells. Conversely, hydrophilic drugs (low octanol/water partition coefficients) are found primarily in aqueous regions such as blood serum. The partition coefficient values included here were taken from reference (1), also available in Wolfram Alpha at https://www.wolframalpha.com/ and in the cell phone App 'Wolfram Alpha'.
Dipole-dipole, dipole-induced-dipole and London force interactions among the bases in the helix are large, and make the free energy of the helix depend on the base composition and sequence. The dipole moment values were taken from reference (4). The dipole moment of DNA bases refers to the measure of polarity in the chemical bonds between atoms within the nucleobases. Dipole moments arise due to differences in electronegativity between the bonded atoms. In DNA bases, these dipole moments can influence the orientation of the bases when interacting with other molecules or surfaces, such as graphene/h-BN interfaces. The concept of dipole moments has been applied to analyze the electric moments of RNA-binding proteins, which can help identify DNA-binding proteins and provide insights into their mechanisms and prediction.
The term “tautomerism” is usually defined as structural isomerism with a low barrier to interconversion between the isomers, for example, the enol/imino forms for cytosine and guanine. Tautomerization is a process where the chemical structure of a molecule, such as DNA bases, undergoes a rearrangement of its atoms. This rearrangement results in the formation of different isomers, called tautomers, which can exist in solution or in a cell. The DNA bases can undergo tautomeric shifts, which can potentially contribute to mutagenic mispairings during DNA replication. The energy required for tautomerization of DNA bases is known as tautomerization energy. These values were taken from reference (2) and the value for each base corresponds to the average of the values estimated from different measurement approaches.
data("dna_phyche", package = "GenomAutomorphism")
data("dna_phyche", package = "GenomAutomorphism")
A data frame object.
Wolfram Research (2007), ChemicalData, Wolfram Language function, https://reference.wolfram.com/language/ref/ChemicalData.html (updated 2016).
Moser A, Range K, York DM. Accurate proton affinity and gas-phase basicity values for molecules important in biocatalysis. J Phys Chem B. 2010;114: 13911–13921. doi:10.1021/jp107450n.
Leo A, Hansch C, Elkins D. Partition coefficients and their uses. Chem Rev. 1971;71: 525–616. doi:10.1021/cr60274a001.
Vovusha H, Amorim RG, Scheicher RH, Sanyal B. Controlling the orientation of nucleobases by dipole moment interaction with graphene/h-BN interfaces. RSC Adv. Royal Society of Chemistry; 2018;8: 6527–6531. doi:10.1039/c7ra11664k.
data("dna_phyche", package = "GenomAutomorphism") dna_phyche ## Select DNA base tautomerization energy te <- as.list(dna_phyche$tautomerization_energy) names(te) <- rownames(dna_phyche) ## Let's create DNAStringSet-class object base <- DNAStringSet(x = c( seq1 ='ACGTGATCAAGT', seq2 = 'GTGTGATCCAGT')) dna_phychem(seqs = base, phychem = te, index_name = "Tautomerization-Energy")
data("dna_phyche", package = "GenomAutomorphism") dna_phyche ## Select DNA base tautomerization energy te <- as.list(dna_phyche$tautomerization_energy) names(te) <- rownames(dna_phyche) ## Let's create DNAStringSet-class object base <- DNAStringSet(x = c( seq1 ='ACGTGATCAAGT', seq2 = 'GTGTGATCCAGT')) dna_phychem(seqs = base, phychem = te, index_name = "Tautomerization-Energy")
This function applies the numerical indices representing various physicochemical and biochemical properties of DNA bases. As results, DNA sequences are represented as numerical vectors which can be subject of further downstream statistical analysis and digital signal processing.
dna_phychem(seqs, ...) ## S4 method for signature 'character' dna_phychem( seqs, phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL) ) ## S4 method for signature 'DNAStringSet_OR_DNAMultipleAlignment' dna_phychem( seqs, phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL), index_name = NULL, ... )
dna_phychem(seqs, ...) ## S4 method for signature 'character' dna_phychem( seqs, phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL) ) ## S4 method for signature 'DNAStringSet_OR_DNAMultipleAlignment' dna_phychem( seqs, phychem = list(A = NULL, T = NULL, C = NULL, G = NULL, N = NULL), index_name = NULL, ... )
seqs |
A character string, a |
... |
Not in use. |
phychem |
A list of DNA bases physicochemical properties, e.g., like those provided in dna_phyche. |
index_name |
Optional. Name of breve description of the base physicochemical property applied to represent the DNA sequence. |
A MatrixSeq-class object.
Robersy Sanchez https://genomaths.com
## Let's create DNAStringSet-class object base <- DNAStringSet(x = c( seq1 ='ACGTGATCAAGT', seq2 = 'GTGTGATCCAGT', seq3 = 'TCCTGATCAGGT')) dna_phychem(seqs = base, phychem = list('A' = 0.87, 'C' = 0.88, 'T' = 0.82, 'G' = 0.89, 'N' = NA), index_name = "Proton-Affinity")
## Let's create DNAStringSet-class object base <- DNAStringSet(x = c( seq1 ='ACGTGATCAAGT', seq2 = 'GTGTGATCCAGT', seq3 = 'TCCTGATCAGGT')) dna_phychem(seqs = base, phychem = list('A' = 0.87, 'C' = 0.88, 'T' = 0.82, 'G' = 0.89, 'N' = NA), index_name = "Proton-Affinity")
Given a string denoting a codon or base from the DNA (or RNA)
alphabet and a genetic-code Abelian group as given in reference (1), this
function returns an object from CodonGroup-class
carrying the
DNA base/codon sequence and coordinates represented on the given Abelian
group.
get_coord(x, ...) ## S4 method for signature 'BaseGroup_OR_CodonGroup' get_coord(x, output = c("all", "matrix.list")) ## S4 method for signature 'DNAStringSet_OR_NULL' get_coord( x, output = c("all", "matrix.list"), base_seq = TRUE, filepath = NULL, cube = "ACGT", group = "Z4", start = NA, end = NA, chr = 1L, strand = "+" )
get_coord(x, ...) ## S4 method for signature 'BaseGroup_OR_CodonGroup' get_coord(x, output = c("all", "matrix.list")) ## S4 method for signature 'DNAStringSet_OR_NULL' get_coord( x, output = c("all", "matrix.list"), base_seq = TRUE, filepath = NULL, cube = "ACGT", group = "Z4", start = NA, end = NA, chr = 1L, strand = "+" )
x |
An object from a |
... |
Not in use. |
output |
See 'Value' section. |
base_seq |
Logical. Whether to return the base or codon coordinates on the selected Abelian group. If codon coordinates are requested, then the number of the DNA bases in the given sequences must be multiple of 3. |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2 2 3). |
group |
A character string denoting the group representation for the given base or codon as shown in reference (1). |
start , end , chr , strand
|
Optional parameters required to build a
|
Symbols '-' and 'N' usually found in DNA sequence alignments to denote gaps and missing/unknown bases are represented by the number: '-1' on Z4 and '0' in Z5. In Z64 the symbol 'NA' will be returned for codons including symbols '-' and 'N'.
Although the CodonGroup-class
object returned by
functions codon_coord
and base_coord
are useful
to store genomic information, the base and codon coordinates are not given
on them as numeric magnitudes. Function get_coord
provides
the way to get the coordinates in a numeric object in object from and still
to preserve the base/codon sequence information.
An object from CodonGroup-class
class is returned
when output = 'all'. This has two slots, the first one carrying a
list of matrices and the second one carrying the codon/base sequence
information. That is, if x is an object from
CodonGroup-class
class, then a list of matrices of codon
coordinate can be retrieved as x@CoordList and the information on the
codon sequence as x@SeqRanges.
if output = 'matrix.list', then an object from
MatrixList
class is returned.
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## DNA base representation in the Abelian group Z5 coord <- get_coord( x = aln, cube = "ACGT", group = "Z5" ) coord ## A list of vectors ## Extract the coordinate list coordList(coord) ## Extract the sequence list seqRanges(coord) ## DNA codon representation in the Abelian group Z64 coord <- get_coord( x = aln, base_seq = FALSE, cube = "ACGT", group = "Z64" ) coord ## Extract the coordinate list coordList(coord) ## Extract the sequence list seqRanges(coord)
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## DNA base representation in the Abelian group Z5 coord <- get_coord( x = aln, cube = "ACGT", group = "Z5" ) coord ## A list of vectors ## Extract the coordinate list coordList(coord) ## Extract the sequence list seqRanges(coord) ## DNA codon representation in the Abelian group Z64 coord <- get_coord( x = aln, base_seq = FALSE, cube = "ACGT", group = "Z64" ) coord ## Extract the coordinate list coordList(coord) ## Extract the sequence list seqRanges(coord)
This function is applied to get the mutation or contact potential scores representing the similarity/distance between amino acids corresponding to substitution mutations. The scores are retrieved from a mutation matrix or a statistical protein contact potentials matrix from AAindex (ver.9.2).
Alternatively, the mutation scores can be estimated based on an user mutation matrix, for example, see aminoacid_dist and codon_dist_matrix.
get_mutscore(aa1, aa2, ...) ## S4 method for signature 'character,character' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), num.cores = 1L, tasks = 0L, verbose = FALSE, ... ) ## S4 method for signature 'BaseSeq,missing' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), stat = mean, numcores = 1L, num.cores = 1L, tasks = 0L, output = c("dist", "matrix", "vector"), na.rm = TRUE, verbose = TRUE, ... ) ## S4 method for signature 'DNAStringSet,missing' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), stat = mean, num.cores = 1L, tasks = 0L, verbose = TRUE, output = c("dist", "matrix", "vector"), na.rm = TRUE, ... ) ## S4 method for signature 'DNAMultipleAlignment,missing' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), stat = mean, num.cores = 1L, tasks = 0L, verbose = TRUE, output = c("dist", "matrix", "vector"), na.rm = TRUE, ... )
get_mutscore(aa1, aa2, ...) ## S4 method for signature 'character,character' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), num.cores = 1L, tasks = 0L, verbose = FALSE, ... ) ## S4 method for signature 'BaseSeq,missing' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), stat = mean, numcores = 1L, num.cores = 1L, tasks = 0L, output = c("dist", "matrix", "vector"), na.rm = TRUE, verbose = TRUE, ... ) ## S4 method for signature 'DNAStringSet,missing' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), stat = mean, num.cores = 1L, tasks = 0L, verbose = TRUE, output = c("dist", "matrix", "vector"), na.rm = TRUE, ... ) ## S4 method for signature 'DNAMultipleAlignment,missing' get_mutscore( aa1, aa2, acc = NULL, aaindex = NULL, mutmat = NULL, alphabet = c("AA", "DNA"), stat = mean, num.cores = 1L, tasks = 0L, verbose = TRUE, output = c("dist", "matrix", "vector"), na.rm = TRUE, ... )
aa1 , aa2
|
A simple character representing an amino acids or a
character string of letter from the amino acid alphabet or base-triplets
from the DNA/RNA alphabet. If aa1 is an object from any
of the classes: BaseSeq, |
... |
Not in use. |
acc |
Accession id for a specified mutation or contact potential matrix. |
aaindex |
Database where the requested accession id is locate. The possible values are: "aaindex2" or "aaindex3". |
mutmat |
A mutation or any score matrix provided by the user. |
alphabet |
Whether the alphabet is from the 20 amino acid (AA) or four (DNA)/RNA base alphabet. This would prevent mistakes, i.e., the strings "ACG" would be a base-triplet on the DNA alphabet or simply the amino acid sequence of alanine, cysteine, and glutamic acid. |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
Optional. Only if num.cores > 1. If TRUE, prints the function log to stdout. |
stat |
Statistic that will be used to summarize the scores of the
DNA sequences provided. Only if aa1 is an object from any of
the classes: BaseSeq, |
numcores |
An integer to setup the number of parallel workers via
|
output |
Optional. Class of the returned object. Only if
aa1 is an object from any of the classes: BaseSeq,
|
na.rm |
a logical evaluating to TRUE or FALSE indicating whether NA values should be stripped before the computation proceeds. |
If a score matrix is provided by the user, then it must be a symmetric matrix 20x20.
A single numeric score or a numerical vector, or if
aa1 is an object from any of the classes: BaseSeq,
DNAStringSet
, or
DNAMultipleAlignment
, then depending on the
user selection the returned object will be:
A lower diagonal numerical vector of the sequence pairwise scores.
A dist
-class object.
A whole score matrix.
Robersy Sanchez https://genomaths.com
aa_mutmat, aaindex2 and aaindex3.
## A single amino acids substitution mutation get_mutscore("A", "C", acc = "MIYS930101", aaindex = "aaindex2") ## A tri-peptide mutation get_mutscore(aa1 = "ACG", aa2 = "ATG", acc = "MIYS930101", aaindex = "aaindex2", alphabet = "AA") ## A single base-triple mutation, i.e., a single amino acid substitution ## mutation get_mutscore(aa1 = "ACG", aa2 = "CTA", acc = "MIYS930101", aaindex = "aaindex2", alphabet = "DNA") ## Peptides can be also written as: get_mutscore(aa1 = c("A","C","G"), aa2 = c("C","T","A"), acc = "MIYS930101", aaindex = "aaindex2", alphabet = "AA")
## A single amino acids substitution mutation get_mutscore("A", "C", acc = "MIYS930101", aaindex = "aaindex2") ## A tri-peptide mutation get_mutscore(aa1 = "ACG", aa2 = "ATG", acc = "MIYS930101", aaindex = "aaindex2", alphabet = "AA") ## A single base-triple mutation, i.e., a single amino acid substitution ## mutation get_mutscore(aa1 = "ACG", aa2 = "CTA", acc = "MIYS930101", aaindex = "aaindex2", alphabet = "DNA") ## Peptides can be also written as: get_mutscore(aa1 = c("A","C","G"), aa2 = c("C","T","A"), acc = "MIYS930101", aaindex = "aaindex2", alphabet = "AA")
For the sake of saving memory, each
Automorphism-class
objects is stored in an AutomorphismList-class
, which does
not inherits from a GRanges-class
.
getAutomorphisms(x, ...) ## S4 method for signature 'AutomorphismList' getAutomorphisms(x) ## S4 method for signature 'list' getAutomorphisms(x) ## S4 method for signature 'DataFrame_OR_data.frame' getAutomorphisms(x)
getAutomorphisms(x, ...) ## S4 method for signature 'AutomorphismList' getAutomorphisms(x) ## S4 method for signature 'list' getAutomorphisms(x) ## S4 method for signature 'DataFrame_OR_data.frame' getAutomorphisms(x)
x |
|
... |
Not in use. |
This function just transform each Automorphism-class
object into an object from the same class but now inheriting from a
GRanges-class
.
This function returns an AutomorphismList-class
object as a list of Automorphism-class
objects, which inherits
from GRanges-class
objects.
## Load a dataset data("autm", package = "GenomAutomorphism") aut <- mcols(autm) aut ## This a DataFrame object ## The natural ranges for the sequence (from 1 to length(aut)) are added getAutomorphisms(aut) ## A list of automorphisms aut <- list(aut, aut) getAutomorphisms(aut) ## Automorphism-class inherits from 'GRanges-class' aut <- as(autm, "GRanges") as(aut, "Automorphism")
## Load a dataset data("autm", package = "GenomAutomorphism") aut <- mcols(autm) aut ## This a DataFrame object ## The natural ranges for the sequence (from 1 to length(aut)) are added getAutomorphisms(aut) ## A list of automorphisms aut <- list(aut, aut) getAutomorphisms(aut) ## Automorphism-class inherits from 'GRanges-class' aut <- as(autm, "GRanges") as(aut, "Automorphism")
Extract the Coordinate Representation from DNA Sequences on Specified Abelian Group.
matrices(x, ...) ## S4 method for signature 'MatrixList' matrices(x) ## S4 method for signature 'CodonSeq' matrices(x) ## S4 method for signature 'DNAStringSet_OR_NULL' matrices( x, base_seq = TRUE, filepath = NULL, cube = "ACGT", group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3"), start = NA, end = NA, chr = 1L, strand = "+" )
matrices(x, ...) ## S4 method for signature 'MatrixList' matrices(x) ## S4 method for signature 'CodonSeq' matrices(x) ## S4 method for signature 'DNAStringSet_OR_NULL' matrices( x, base_seq = TRUE, filepath = NULL, cube = "ACGT", group = c("Z4", "Z5", "Z64", "Z125", "Z4^3", "Z5^3"), start = NA, end = NA, chr = 1L, strand = "+" )
x |
An object from a |
... |
Not in use. |
base_seq |
Logical. Whether to return the base or codon coordinates on the selected Abelian group. If codon coordinates are requested, then the number of the DNA bases in the given sequences must be multiple of 3. |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
cube |
A character string denoting one of the 24 Genetic-code cubes, as given in references (2-3). |
group |
A character string denoting the group representation for the given base or codon as shown in reference (1). |
start , end , chr , strand
|
Optional parameters required to build a
|
These are alternative ways to get the list of matrices of base/codon
coordinate and the information on the codon sequence from
CodonSeq
and MatrixList
class objects. These
functions can either take the output from functions base_coord
and matrices
or to operate directly on a
DNAStringSet
or to retrieve the a DNA sequence
alignment from a file.
base_seq parameter will determine whether to return the
matrices of coordinate for a DNA or codon sequence. While in function
seqranges
, granges parameter will determine
whether to return a GRanges-class
object or a
DataFrame
.
The a list of vectors (group = c("Z4", "Z5", "Z64", "Z125") or a list of matrices (group = ("Z4^3", "Z5^3")) carrying the coordinate representation on the specified Abelian group.
Robersy Sanchez https://genomaths.com
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi: 10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
Symmetric Group of the Genetic-Code Cubes.
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Coordinate representation of the aligned sequences on "Z4". ## A list of vectors matrices( x = aln, base_seq = TRUE, filepath = NULL, cube = "ACGT", group = "Z4", ) ## Coordinate representation of the aligned sequences on "Z4". ## A list of matrices matrices( x = aln, base_seq = FALSE, filepath = NULL, cube = "ACGT", group = "Z5^3", )
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## Coordinate representation of the aligned sequences on "Z4". ## A list of vectors matrices( x = aln, base_seq = TRUE, filepath = NULL, cube = "ACGT", group = "Z4", ) ## Coordinate representation of the aligned sequences on "Z4". ## A list of matrices matrices( x = aln, base_seq = FALSE, filepath = NULL, cube = "ACGT", group = "Z5^3", )
Integer remainder of the division of the integer n by m: n mod m.
mod(n, m, ...) ## S4 method for signature 'matrix,numeric' mod(n, m)
mod(n, m, ...) ## S4 method for signature 'matrix,numeric' mod(n, m)
n |
A numeric vector (preferably of integers), a matrix where each element can be reduced to integers. |
m |
An integer vector (positive, zero, or negative). |
... |
Not in use. |
An element of x, an Automorphism-class
object.
Robersy Sanchez (https://genomaths.com).
## Example 1 ## Build a matrix 'n' and set a vector of integers 'm' n <- diag(x=1, nrow = 4, ncol = 4) * c(43,125,2,112) m <- c(64,4,4,64) ## Operation n mod m mod(n = n, m = m) ## Or simply: n %% m ## Example 2 m <- matrix(c(8,2,3, 11,12,13), nrow = 2) m m %% 4
## Example 1 ## Build a matrix 'n' and set a vector of integers 'm' n <- diag(x=1, nrow = 4, ncol = 4) * c(43,125,2,112) m <- c(64,4,4,64) ## Operation n mod m mod(n = n, m = m) ## Or simply: n %% m ## Example 2 m <- matrix(c(8,2,3, 11,12,13), nrow = 2) m m %% 4
If , and
are integer vectors, this function
try to find, at each coordinate, the solution of the MLE
mod
. If the MLE
has not
solutions (see
modlin
), the value reported for the
coordinate will be 0 and the corresponding translation.
modlineq(a, b, n, no.sol = 0L)
modlineq(a, b, n, no.sol = 0L)
a |
An integer or a vector of integers. |
b |
An integer or a vector of integers. |
n |
An integer or a vector of integers. |
no.sol |
Values to return when the equation is not solvable or yield the value 0. Default is 0. |
For , and
integer scalars, it is just a
wrapper function to call
modlin
.
If the solution is exact, then a numerical vector will be returned, otherwise, if there is not exact solution for some coordinate, the a list carrying the element on the diagonal matrix and a translation vector will be returned.
## Set the vector x, y, and m. x <- c(9,32,24,56,60,27,28,5) y <- c(8,1,0,56,60,0,28,2) modulo <- c(64,125,64,64,64,64,64,64) ## Try to solve the modular equation a x = b mod n m <- modlineq(a = x, b = y, n = modulo) m ## Or in matrix form diag(m) ## The reverse mapping is an affine transformation mt <- modlineq(a = y, b = x, n = modulo, no.sol = 1L) mt ## That is, vector 'x' is revovered with the transformaiton (y %*% diag(mt$diag) + mt$translation) %% modulo # Or cat("\n---- \n") (y %*% diag(mt$diag) + mt$translation) %% modulo == x
## Set the vector x, y, and m. x <- c(9,32,24,56,60,27,28,5) y <- c(8,1,0,56,60,0,28,2) modulo <- c(64,125,64,64,64,64,64,64) ## Try to solve the modular equation a x = b mod n m <- modlineq(a = x, b = y, n = modulo) m ## Or in matrix form diag(m) ## The reverse mapping is an affine transformation mt <- modlineq(a = y, b = x, n = modulo, no.sol = 1L) mt ## That is, vector 'x' is revovered with the transformaiton (y %*% diag(mt$diag) + mt$translation) %% modulo # Or cat("\n---- \n") (y %*% diag(mt$diag) + mt$translation) %% modulo == x
Each DNA/RNA base can be classified into three main classes according to
three criteria (1): number of hydrogen bonds (strong-weak), chemical type
(purine-pyrimidine), and chemical groups (amino versus keto). Each criterion
produces a partition of the set of bases: 1. According to the number of
hydrogen bonds (on DNA/RNA double helix): strong (three
hydrogen bonds) and weak
(two hydrogen bonds); 2. According to
the chemical type: purines
and pyrimidines
. 3.
According to the presence of amino or keto groups on the base rings: amino
and keto
. So, each mutational event can be
classified as according to the type of involved in it (2).
mut_type(x, y)
mut_type(x, y)
x , y
|
Character strings denoting DNA bases |
A character string of same length of 'x' and 'y'.
A. Cornish-Bowden, Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984, Nucleic Acids Res. 13 (1985) 3021-3030.
MA.A. Jimenez-Montano, C.R. de la Mora-Basanez, T. Poschel, The hypercube structure of the genetic code explains conservative and non-conservative aminoacid substitutions in vivo and in vitro, Biosystems. 39 (1996) 117-125.
## Mutation type 'R' mut_type("A", "G") ## Mutation type 'M' mut_type("A", "C") ## Mutation type 'W' mut_type("A", "T") ## Mutation type 'S' mut_type("G", "C")
## Mutation type 'R' mut_type("A", "G") ## Mutation type 'M' mut_type("A", "C") ## Mutation type 'W' mut_type("A", "T") ## Mutation type 'S' mut_type("G", "C")
This function applies numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids to DNA protein-coding or to aminoacid sequences. As results, DNA protein-coding or the aminoacid sequences are represented as numerical vectors which can be subject of further downstream statistical analysis and digital signal processing.
peptide_phychem_index(aa, ...) ## S4 method for signature 'character' peptide_phychem_index( aa, acc = NULL, aaindex = NA, userindex = NULL, alphabet = c("AA", "DNA"), genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error", ... ) ## S4 method for signature 'DNAStringSet_OR_DNAMultipleAlignment' peptide_phychem_index( aa, acc = NULL, aaindex = NA, userindex = NULL, alphabet = c("AA", "DNA"), genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error", num.cores = 1L, tasks = 0L, verbose = FALSE, ... )
peptide_phychem_index(aa, ...) ## S4 method for signature 'character' peptide_phychem_index( aa, acc = NULL, aaindex = NA, userindex = NULL, alphabet = c("AA", "DNA"), genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error", ... ) ## S4 method for signature 'DNAStringSet_OR_DNAMultipleAlignment' peptide_phychem_index( aa, acc = NULL, aaindex = NA, userindex = NULL, alphabet = c("AA", "DNA"), genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error", num.cores = 1L, tasks = 0L, verbose = FALSE, ... )
aa |
A character string, a |
... |
Not in use. |
acc |
Accession id for a specified mutation or contact potential matrix. |
aaindex |
Database where the requested accession id is locate and from where the aminoacid indices can be obtained. The possible values are: "aaindex2" or "aaindex3". |
userindex |
User provided aminoacid indices. This can be a numerical vector or a matrix (20 x 20). If a numerical matrix is provided, then the aminoacid indices are computes as column averages. |
alphabet |
Whether the alphabet is from the 20 aminoacid (AA) or four (DNA)/RNA base alphabet. This would prevent mistakes, i.e., the strings "ACG" would be a base-triplet on the DNA alphabet or simply the amino acid sequence of alanine, cysteine, and glutamic acid. |
genetic.code , no.init.codon , if.fuzzy.codon
|
The same as given in function translation. |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the function log to stdout. |
If a DNA sequence is given, then it is assumed that it is a DNA base-triplet sequence, i.e., the base sequence must be multiple of 3.
Errors can be originated if the given sequences carry letter which are not from the DNA or aminoacid alphabet.
Depending on the user specifications, a mutation or contact potential matrix, a list of available matrices (indices) ids or index names can be returned. More specifically:
Returns an aminoacid mutation matrix or a statistical protein contact potentials matrix.
Returns the specified aminoacid physicochemical indices.
Robersy Sanchez https://genomaths.com
## Let's create DNAStringSet-class object base <- DNAStringSet(x = c( seq1 ='ACGTCATAAAGT', seq2 = 'GTGTAATACAGT', seq3 = 'TCCTCATAAGGT')) ## The stop condon 'TAA' yields NA aa <- peptide_phychem_index(base, acc = "EISD840101") aa ## Description of the physicochemical index slot(aa, 'phychem') ## Get the aminoacid sequences. The stop codon 'TAA' is replaced by '*'. slot(aa, 'seqs') aa <- peptide_phychem_index(base, acc = "MIYS850103", aaindex = "aaindex3") aa ## Description of the physicochemical index slot(aa, 'phychem')
## Let's create DNAStringSet-class object base <- DNAStringSet(x = c( seq1 ='ACGTCATAAAGT', seq2 = 'GTGTAATACAGT', seq3 = 'TCCTCATAAGGT')) ## The stop condon 'TAA' yields NA aa <- peptide_phychem_index(base, acc = "EISD840101") aa ## Description of the physicochemical index slot(aa, 'phychem') ## Get the aminoacid sequences. The stop codon 'TAA' is replaced by '*'. slot(aa, 'seqs') aa <- peptide_phychem_index(base, acc = "MIYS850103", aaindex = "aaindex3") aa ## Description of the physicochemical index slot(aa, 'phychem')
Extract the gene ranges and coordinates from a pairwise alignment of codon/base sequences represented on a given Abelian group.
seqranges(x, ...) ## S4 method for signature 'CodonSeq' seqranges(x, granges = TRUE) ## S4 method for signature 'DNAStringSet_OR_NULL' seqranges( x, granges = TRUE, base_seq = TRUE, filepath = NULL, start = NA, end = NA, chr = 1L, strand = "+" )
seqranges(x, ...) ## S4 method for signature 'CodonSeq' seqranges(x, granges = TRUE) ## S4 method for signature 'DNAStringSet_OR_NULL' seqranges( x, granges = TRUE, base_seq = TRUE, filepath = NULL, start = NA, end = NA, chr = 1L, strand = "+" )
x |
An object from a |
... |
Not in use. |
granges |
Logical. Whether to return a
|
base_seq |
Logical. Whether to return the base or codon coordinates on the selected Abelian group. If codon coordinates are requested, then the number of the DNA bases in the given sequences must be multiple of 3. |
filepath |
A character vector containing the path to a file in fasta format to be read. This argument must be given if codon & base arguments are not provided. |
start , end , chr , strand
|
Optional parameters required to build a
|
This function provide an alternative way to get the codon
coordinate and the information on the codon sequence from a
CodonSeq
class objects. The function can either take the
output from functions codon_coord
or to operate directly on a
DNAStringSet
or to retrieve the a DNA sequence
alignment from a file.
Robersy Sanchez https://genomaths.com
Robersy Sanchez, Jesus Barreto (2021) Genomic Abelian Finite Groups. doi:10.1101/2021.06.01.446543
M. V Jose, E.R. Morgado, R. Sanchez, T. Govezensky, The 24 possible algebraic representations of the standard genetic code in six or in three dimensions, Adv. Stud. Biol. 4 (2012) 119-152.PDF.
R. Sanchez. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process MATCH Commun. Math. Comput. Chem. 79 (2018) 527-560.
matrices
, codon_coord
, and
base_coord
.
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## A GRanges object carrying the aligned DNA sequence. seqranges( x = aln, base_seq = TRUE, filepath = NULL, ) ## A GRanges object carrying the aligned codon sequence. seqranges( x = aln, base_seq = FALSE, filepath = NULL, )
## Load a pairwise alignment data("aln", package = "GenomAutomorphism") aln ## A GRanges object carrying the aligned DNA sequence. seqranges( x = aln, base_seq = TRUE, filepath = NULL, ) ## A GRanges object carrying the aligned codon sequence. seqranges( x = aln, base_seq = FALSE, filepath = NULL, )
This function apply a function over a list-like object
preserving its attributes and simplify (if requested) the list as
sapply
function does. slapply returns a
list of the same length as 'x', each element of which is the result of
applying FUN to the corresponding element of 'x'.
slapply( x, FUN, keep.attr = FALSE, class = NULL, simplify = TRUE, USE.NAMES = TRUE, ... )
slapply( x, FUN, keep.attr = FALSE, class = NULL, simplify = TRUE, USE.NAMES = TRUE, ... )
x |
A list-like or vector-like object. |
FUN , ...
|
The same as described in |
keep.attr |
Logic. If TRUE, then the original attributes from 'x' are preserved in the returned list. Default is FALSE. |
class |
Name of the class to which the returned list belongs to. Default is NULL. |
simplify , USE.NAMES
|
The same as described in
|
Same as in ?base::lapply
if keep.attr = FALSE.
Otherwise same values preserving original attributes from 'x'.
Robersy Sanchez (https://genomaths.com).
## Create a list x <- list(a = seq(10), beta = exp(seq(-3, 3)), logic = c(TRUE, FALSE, FALSE, TRUE)) class(x) <- "nice" ## To compute the list mean for each list element using 'base::lapply' class(slapply(x, mean, simplify = FALSE)) ## Simply 'base::lapply' preserving attributes slapply(x, mean, keep.attr = TRUE, simplify = FALSE) ## To preserve attributes and simplify slapply(x, mean, keep.attr = TRUE, simplify = TRUE)
## Create a list x <- list(a = seq(10), beta = exp(seq(-3, 3)), logic = c(TRUE, FALSE, FALSE, TRUE)) class(x) <- "nice" ## To compute the list mean for each list element using 'base::lapply' class(slapply(x, mean, simplify = FALSE)) ## Simply 'base::lapply' preserving attributes slapply(x, mean, keep.attr = TRUE, simplify = FALSE) ## To preserve attributes and simplify slapply(x, mean, keep.attr = TRUE, simplify = TRUE)
GRanges-class
objectsSorts a GRanges-class
objects
by seqname (chromosome), start, and position.
sortByChromAndStart(x) sortByChromAndEnd(x)
sortByChromAndStart(x) sortByChromAndEnd(x)
x |
GRanges object |
Objects that inherits from a
GRanges-class
can be sorted as well.
GRanges-class
object or from the
original object class.
GR <- as(c("chr2:1-1", "chr1:1-1"), "GRanges") GR <- sortByChromAndStart(GR)
GR <- as(c("chr2:1-1", "chr1:1-1"), "GRanges") GR <- sortByChromAndStart(GR)
A simple function to transform a string into character vector.
str2chr(x, split = "", ...) ## S4 method for signature 'character' str2chr(x, split = "", ...) ## S4 method for signature 'list' str2chr(x, split = "", num.cores = 1L, tasks = 0L, verbose = FALSE, ...)
str2chr(x, split = "", ...) ## S4 method for signature 'character' str2chr(x, split = "", ...) ## S4 method for signature 'list' str2chr(x, split = "", num.cores = 1L, tasks = 0L, verbose = FALSE, ...)
x |
A character string or a list/vector of character strings. |
split |
The same as in |
... |
Further parameters for |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the function log to stdout. |
A character string
Robersy Sanchez https://genomaths.com
## A character string str2chr("ATCAGCGGGATCTT") ## A list of character strings str2chr(list(str1 = "ATCAGCGGGATCTT", str2 = "CTTCTTCGTCAGGC"))
## A character string str2chr("ATCAGCGGGATCTT") ## A list of character strings str2chr(list(str1 = "ATCAGCGGGATCTT", str2 = "CTTCTTCGTCAGGC"))
A simple function to transform a string of digits into a numeric vector.
str2dig(x, split = "", ...) ## S4 method for signature 'character' str2dig(x, split = "", ...) ## S4 method for signature 'list' str2dig(x, split = "", num.cores = 1L, tasks = 0L, verbose = FALSE, ...)
str2dig(x, split = "", ...) ## S4 method for signature 'character' str2dig(x, split = "", ...) ## S4 method for signature 'list' str2dig(x, split = "", num.cores = 1L, tasks = 0L, verbose = FALSE, ...)
x |
A character string or a list/ of character strings of numeric/digit symbols. |
split |
The same as in |
... |
Further parameters for |
num.cores , tasks
|
Parameters for parallel computation using package
|
verbose |
If TRUE, prints the function log to stdout. |
A integer vector or a list of integer vectors.
Robersy Sanchez https://genomaths.com
## A integer vector str2dig("12231456247") ## A list of integer vectors str2dig(list(num1 = "12231456247", num2 = "521436897"))
## A integer vector str2dig("12231456247") ## A list of integer vectors str2dig(list(num1 = "12231456247", num2 = "521436897"))
This function extends translate
function to include letters that are frequently found in the DNA sequence
databases to indicate missing information and are not part of the the
DNA/RNA alphabet. Also, it is able to process sequences as just simple
'character' objects.
translation(x, ...) ## S4 method for signature 'character' translation( x, genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error" ) ## S4 method for signature 'BioString' translation( x, genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error" )
translation(x, ...) ## S4 method for signature 'character' translation( x, genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error" ) ## S4 method for signature 'BioString' translation( x, genetic.code = getGeneticCode("1"), no.init.codon = FALSE, if.fuzzy.codon = "error" )
x |
A character string or the same arguments given to function
|
... |
Not in use yet. |
genetic.code |
The same as in |
no.init.codon , if.fuzzy.codon
|
Used only if 'x' is not a 'character'
object. The same as in |
If argument 'x' belong to any of the classes admitted by function
translate
, then this function is called to make
the translation.
The translated amino acid sequence.
Robersy Sanchez https://genomaths.com
## Load a small DNA sequence alingment data("aln", package = "GenomAutomorphism") translation(aln) ## Load a pairwise DNA sequence alingment of COVID-19 genomes data("covid_aln", package = "GenomAutomorphism") translation(covid_aln)
## Load a small DNA sequence alingment data("aln", package = "GenomAutomorphism") translation(aln) ## Load a pairwise DNA sequence alingment of COVID-19 genomes data("covid_aln", package = "GenomAutomorphism") translation(covid_aln)