Title: | Outlier detection in multiple sequence alignments |
---|---|
Description: | Performs outlier detection of sequences in a multiple sequence alignment using bootstrap of predefined distance metrics. Outlier sequences can make downstream analyses unreliable or make the alignments less accurate while they are being constructed. This package implements the OD-seq algorithm proposed by Jehl et al (doi 10.1186/s12859-015-0702-1) for aligned sequences and a variant using string kernels for unaligned sequences. |
Authors: | José Jiménez |
Maintainer: | José Jiménez <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.35.0 |
Built: | 2024-10-30 09:05:28 UTC |
Source: | https://github.com/bioc/odseq |
The DESCRIPTION file:
Package: | odseq |
Type: | Package |
Title: | Outlier detection in multiple sequence alignments |
Version: | 1.35.0 |
Date: | 2015-12-20 |
Author: | José Jiménez |
Maintainer: | José Jiménez <[email protected]> |
Description: | Performs outlier detection of sequences in a multiple sequence alignment using bootstrap of predefined distance metrics. Outlier sequences can make downstream analyses unreliable or make the alignments less accurate while they are being constructed. This package implements the OD-seq algorithm proposed by Jehl et al (doi 10.1186/s12859-015-0702-1) for aligned sequences and a variant using string kernels for unaligned sequences. |
License: | MIT + file LICENSE |
LazyData: | True |
Encoding: | UTF-8 |
biocViews: | Alignment, MultipleSequenceAlignment |
VignetteBuilder: | knitr |
Suggests: | knitr(>= 1.11) |
Depends: | R (>= 3.2.3) |
Imports: | msa (>= 1.2.1), kebabs (>= 1.4.1), mclust (>= 5.1) |
NeedsCompilation: | no |
Repository: | https://bioc.r-universe.dev |
RemoteUrl: | https://github.com/bioc/odseq |
RemoteRef: | HEAD |
RemoteSha: | c7de581ae4b1dbec826ef07ea5a07f52b266c9a7 |
Index of help topics:
odmix | Gaussian mixture modelling of distances in a multiple sequence alignment. |
odseq | Outlier detection in a multiple sequence alignment. |
odseq-package | Outlier detection in multiple sequence alignments. |
odseq_unaligned | Outlier detection provided a distance/similarity matrix of sequences. |
seqs | PFAM plus random data. |
José Jiménez
Maintainer: José Jiménez <[email protected]>
[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.
library(msa)
library(odseq)
data(seqs)        # example AAStringSet shipped with odseq
al <- msa(seqs)   # build the multiple sequence alignment
odseq(al, distance_metric = "affine", B = 1000, threshold = 0.025)
This function clusters biological sequences by fitting a Gaussian mixture model to the distances defined by the odseq algorithm.
odmix(msa_object, distance_metric, groups)
msa_object |
An object of formal class |
distance_metric |
A string indicating the type of distance metric to be computed. Either |
groups |
Number of groups to fit in the mixture model. If a numeric vector of size |
A list containing the following items:
prob |
A numeric matrix of size n x groups where the probability of belonging to a group is provided for each sequence. |
class |
The class assigned according to |
BIC |
BIC values for the models proposed in |
José Jiménez <[email protected]>
library(msa)
library(odseq)
data(seqs)        # example AAStringSet shipped with odseq
al <- msa(seqs)   # build the multiple sequence alignment
odmix(al, distance_metric = "affine", groups = 2)
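To make the idea of fitting a Gaussian mixture to per-sequence scores concrete, here is a minimal base R sketch. It is a simplified stand-in, not what odmix runs: the package lists mclust among its imports, whereas the hand-written EM loop, the simulated scores, and the names `score`, `weight`, `mu` and `sigma` below are all assumptions made for illustration.

```r
set.seed(42)
# Hypothetical per-sequence scores: 40 typical sequences and 10 divergent ones
score  <- c(rnorm(40, mean = 1, sd = 0.2), rnorm(10, mean = 4, sd = 0.5))
groups <- 2
n      <- length(score)

# Initial parameters: means spread over the data range, shared spread, equal weights
mu     <- seq(min(score), max(score), length.out = groups)
sigma  <- rep(sd(score), groups)
weight <- rep(1 / groups, groups)

for (iter in seq_len(100)) {
  # E-step: posterior probability of each group for each score (n x groups)
  dens <- sapply(seq_len(groups), function(k) weight[k] * dnorm(score, mu[k], sigma[k]))
  prob <- dens / rowSums(dens)
  # M-step: re-estimate weights, means and standard deviations
  nk     <- colSums(prob)
  weight <- nk / n
  mu     <- colSums(prob * score) / nk
  sigma  <- sqrt(colSums(prob * outer(score, mu, "-")^2) / nk)
}

# Hard assignment: the most probable group for each sequence
class <- apply(prob, 1, which.max)
table(class)
```

Here `prob` plays the role of the n x groups probability matrix and `class` the hard assignment described in the return value above. The toy loop fits a single model and does not compute BIC.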
This function first computes a distance metric between every pair of sequences in the multiple alignment. It then bootstraps an average score of these distances to estimate the distribution of scores, which is used to identify outlier sequences given a threshold.
odseq(msa_object, distance_metric = "linear", B = 100, threshold = 0.025)
msa_object |
An object of formal class |
distance_metric |
A string indicating the type of distance metric to be computed. Either |
B |
Integer indicating the number of bootstrap replicates to be run. Higher values should make the detection more robust. |
threshold |
Numeric value giving the probability left in the right tail of the bootstrap score distribution when flagging outliers. This parameter may need tuning for each specific problem. |
Returns a logical vector, where TRUE indicates an outlier.
José Jiménez <[email protected]>
[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.
library(msa)
library(odseq)
data(seqs)        # example AAStringSet shipped with odseq
al <- msa(seqs)   # build the multiple sequence alignment
odseq(al, distance_metric = "affine", B = 1000, threshold = 0.025)
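The bootstrap step described for odseq can be illustrated with a small base R sketch. It is not the package's code: the toy distance matrix, the choice to bootstrap the mean score, and the names `d`, `score`, `boot_means` and `cutoff` are assumptions made for illustration, and the statistic odseq actually computes may differ.

```r
set.seed(1)
# Hypothetical pairwise distances: nine similar sequences and one distant sequence
n <- 10
d <- matrix(1, n, n)
d[n, ] <- 5
d[, n] <- 5
diag(d) <- 0

# Score of each sequence: its average distance to the other sequences
score <- rowSums(d) / (n - 1)

B <- 1000
threshold <- 0.025

# Bootstrap the mean score: resample the scores with replacement B times
boot_means <- replicate(B, mean(sample(score, size = n, replace = TRUE)))

# Cutoff that leaves `threshold` probability in the right tail
cutoff <- unname(quantile(boot_means, probs = 1 - threshold))

# TRUE marks a sequence whose score lies beyond the cutoff
outlier <- score > cutoff
which(outlier)
```

In this sketch a smaller `threshold` pushes the cutoff further into the right tail, so fewer sequences are flagged. A larger `B` only stabilises the estimate of the cutoff.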
Given a similarity or distance matrix (such as those computed with string kernels in kebabs), this function computes a score for each sequence and bootstraps it to estimate the distribution of the scores, which is used to distinguish outlier sequences.
odseq_unaligned(distance_matrix, B = 100, threshold = 0.025, type = "similarity")
distance_matrix |
A numeric matrix representing either similarity or distance among unaligned sequences. Package kebabs may be useful for this task. |
B |
Integer indicating the number of bootstrap replicates to be run. Higher values should make the detection more robust. |
threshold |
Numeric value giving the probability left in the right tail of the bootstrap score distribution when flagging outliers. This parameter may need tuning for each specific problem. |
type |
A string indicating the type of distance metric used. Either |
Returns a logical vector, where TRUE indicates an outlier.
José Jiménez <[email protected]>
[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.
library(kebabs)
library(odseq)
data(seqs)                         # unaligned AAStringSet shipped with odseq
sp <- spectrumKernel(k = 3)        # 3-mer spectrum string kernel
mat <- getKernelMatrix(sp, seqs)   # pairwise similarity matrix
odseq_unaligned(mat, B = 1000, threshold = 0.025, type = "similarity")
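For readers unfamiliar with string kernels, the following base R sketch shows roughly what a k-mer similarity matrix contains and why the `type` argument matters. It is an illustration under stated assumptions, not the kebabs implementation: the six toy peptides, the helper `kmer_counts`, and the cosine normalisation are all hypothetical choices.

```r
# Hypothetical toy sequences: five related peptides and one unrelated peptide
seqs_toy <- c(a = "MKVLAAGIVG", b = "MKVLAAGIVA", c = "MKILAAGIVG",
              d = "MKVLSAGIVG", e = "MKVLAAGLVG", f = "WWPQHHCDEY")

# Count the overlapping k-mers of one string (hypothetical helper)
kmer_counts <- function(s, k = 3) {
  starts <- seq_len(nchar(s) - k + 1)
  table(substring(s, starts, starts + k - 1))
}

counts <- lapply(seqs_toy, kmer_counts)
vocab  <- sort(unique(unlist(lapply(counts, names))))

# Feature matrix: one row per sequence, one column per k-mer
feat <- t(sapply(counts, function(tab) {
  v <- setNames(numeric(length(vocab)), vocab)
  v[names(tab)] <- as.numeric(tab)
  v
}))

# Cosine-normalised dot products as the similarity between sequences
feat <- feat / sqrt(rowSums(feat^2))
sim  <- feat %*% t(feat)

# Mean similarity of each sequence to the others, turned into a dissimilarity
n      <- nrow(sim)
score  <- (rowSums(sim) - diag(sim)) / (n - 1)
dissim <- 1 - score
names(which.max(dissim))
```

With a similarity matrix, an unusual sequence has a low average value, whereas with a distance matrix it has a high one. This is why the function needs to be told which kind of matrix it received.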
Sequences from a certain PFAM family plus 100 random sequences.
data("seqs")
An object of class AAStringSet.
data(seqs)