| Title: | Abundance and Compositional Based Binning of Metagenomes |
|---|---|
| Description: | Provides functions for abundance and compositional based binning of metagenomic samples, directly from FASTA or FASTQ files. The functions are implemented in Java and called via rJava; the implementation is parallel and operates directly on the input FASTA/FASTQ files for fast execution. Inputs may be file paths or Biostrings/ShortRead sequence objects; results are returned as a MetabinResult S4 object wrapping cluster assignments, algorithm parameters, and input metadata. |
| Authors: | Anestis Gkanogiannis [aut, cre] (ORCID: <https://orcid.org/0000-0002-6441-0688>) |
| Maintainer: | Anestis Gkanogiannis <[email protected]> |
| License: | GPL-3 |
| Version: | 2.1.0 |
| Built: | 2026-05-22 09:50:18 UTC |
| Source: | https://github.com/bioc/metabinR |
This function performs abundance based binning on metagenomic samples, directly from FASTA or FASTQ files, by long kmer analysis (k>8). See doi:10.1186/s12859-016-1186-3 for more details.
abundance_based_binning( ..., eMin = 1, eMax = 0, kMerSizeAB = 10, numOfClustersAB = 3, outputAB = "AB.cluster", keepQuality = FALSE, dryRun = FALSE, gzip = FALSE, numOfThreads = BiocParallel::bpworkers() )

| Argument | Description |
|---|---|
| ... | Input sequences: character paths to FASTA/FASTQ files (uncompressed or gzip compressed), or Biostrings/ShortRead sequence objects. |
| eMin | Exclude kmers whose count is less than or equal to this value. |
| eMax | Exclude kmers whose count is greater than or equal to this value. |
| kMerSizeAB | kmer length for Abundance based Binning. |
| numOfClustersAB | Number of clusters for Abundance based Binning. |
| outputAB | Location and file name prefix of the Abundance based Binning cluster output files. |
| keepQuality | Keep FASTQ qualities in the output files (produces .fastq output). |
| dryRun | Don't write any output files. |
| gzip | Gzip the output files. |
| numOfThreads | Number of threads to use. Defaults to BiocParallel::bpworkers(). |

A MetabinResult object. Its assignments slot
is a DataFrame with numOfClustersAB + 2
columns:
read_id : read identifier from fasta header
AB : read was assigned to this AB cluster index
AB.n : read to cluster AB.n distance
For backwards-compatible data.frame output use
as.data.frame(result).
Anestis Gkanogiannis, [email protected]
https://github.com/gkanogiannis/metabinR
res <- abundance_based_binning(
  system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"),
  dryRun = TRUE, kMerSizeAB = 8
)
res
head(as.data.frame(res))
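To make the eMin and eMax arguments concrete, the following base-R sketch counts kmers in a few toy reads and drops those whose count falls outside the allowed range. It is an illustration only, not the package's Java implementation: the helper names count_kmers and filter_kmers are hypothetical, the reads are invented, and treating eMax = 0 as "no upper limit" is an assumption inferred from the default value.

```r
# Toy reads; real input would come from FASTA/FASTQ files.
reads <- c("ACGTACGTAC", "ACGTTTTTAC", "GGGTACGTAC")

# Count every overlapping kmer of length k across all reads.
# (Hypothetical helper, not part of metabinR.)
count_kmers <- function(reads, k) {
  kmers <- unlist(lapply(reads, function(r) {
    starts <- seq_len(nchar(r) - k + 1)
    substring(r, starts, starts + k - 1)
  }))
  table(kmers)
}

# Keep kmers with count > eMin and, when eMax > 0, count < eMax.
# ASSUMPTION: eMax = 0 disables the upper bound.
filter_kmers <- function(counts, eMin = 1, eMax = 0) {
  n <- as.vector(counts)
  keep <- n > eMin
  if (eMax > 0) keep <- keep & n < eMax
  counts[keep]
}

counts <- count_kmers(reads, k = 4)
kept <- filter_kmers(counts, eMin = 1, eMax = 0)
kept
```

Raising eMin removes rare kmers, which are often sequencing errors, while a positive eMax would additionally remove very frequent ones.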
Coerce a MetabinResult to a data.frame
Preserves backwards compatibility with the data.frame return value of
metabinR <= 1.x.
## S4 method for signature 'MetabinResult' as.data.frame(x, row.names = NULL, optional = FALSE, ...)

| Argument | Description |
|---|---|
| x | A MetabinResult object. |
| row.names, optional, ... | Unused; retained for S3 signature compatibility. |

A base data.frame of the assignments.
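Once the result is a plain data.frame, ordinary base-R tools apply. The sketch below uses a hand-made data.frame that mimics the documented AB layout (read_id, AB, then one distance column per cluster). The values are invented for illustration, and the column names AB.1 and AB.2 assume that cluster indices start at 1.

```r
# Stand-in for as.data.frame(res) with numOfClustersAB = 2 (invented values).
df <- data.frame(
  read_id = c("read1", "read2", "read3", "read4"),
  AB      = c(1L, 2L, 1L, 2L),
  AB.1    = c(0.10, 0.80, 0.20, 0.70),
  AB.2    = c(0.90, 0.15, 0.75, 0.30),
  stringsAsFactors = FALSE
)

# Number of reads per cluster.
reads_per_cluster <- table(df$AB)

# Read identifiers grouped by assigned cluster.
ids_by_cluster <- split(df$read_id, df$AB)

# Distance of each read to the cluster it was assigned to,
# picked out with (row, column) matrix indexing.
dist_cols <- as.matrix(df[, c("AB.1", "AB.2")])
df$own_distance <- dist_cols[cbind(seq_len(nrow(df)), df$AB)]

reads_per_cluster
ids_by_cluster
df
```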
This function performs composition based binning on metagenomic samples, directly from FASTA or FASTQ files, by short kmer analysis (k<8). See doi:10.1186/s12859-016-1186-3 for more details.
composition_based_binning( ..., kMerSizeCB = 4, numOfClustersCB = 5, outputCB = "CB.cluster", keepQuality = FALSE, dryRun = FALSE, gzip = FALSE, numOfThreads = BiocParallel::bpworkers() )

| Argument | Description |
|---|---|
| ... | Input sequences: character paths to FASTA/FASTQ files (uncompressed or gzip compressed), or Biostrings/ShortRead sequence objects. |
| kMerSizeCB | kmer length for Composition based Binning. |
| numOfClustersCB | Number of clusters for Composition based Binning. |
| outputCB | Location and file name prefix of the Composition based Binning cluster output files. |
| keepQuality | Keep FASTQ qualities in the output files (produces .fastq output). |
| dryRun | Don't write any output files. |
| gzip | Gzip the output files. |
| numOfThreads | Number of threads to use. Defaults to BiocParallel::bpworkers(). |

A MetabinResult object. Its assignments slot
is a DataFrame with numOfClustersCB + 2
columns:
read_id : read identifier from fasta header
CB : read was assigned to this CB cluster index
CB.n : read to cluster CB.n distance
For backwards-compatible data.frame output use
as.data.frame(result).
Anestis Gkanogiannis, [email protected]
https://github.com/gkanogiannis/metabinR
res <- composition_based_binning(
  system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"),
  dryRun = TRUE, kMerSizeCB = 2
)
res
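The idea behind short kmer analysis can be sketched in base R: each read is summarised by its relative kmer frequencies, and reads with similar profiles are grouped together. This is a conceptual illustration under stated assumptions, not the package's algorithm: the reads are invented, kmer_profile is a hypothetical helper, and plain k-means from the stats package is swapped in as the clustering step because the method used internally is not described here.

```r
set.seed(1)  # kmeans uses random starting centers

# Toy reads: two AT-rich and two GC-rich sequences.
reads <- c(
  r1 = "ATATATTATAATATTA",
  r2 = "TATAATATATTAATAT",
  r3 = "GCGCGGCCGCGCCGGC",
  r4 = "CGGCGCGCCGCGGCGC"
)

k <- 2
# All possible kmers of length k over the DNA alphabet.
all_kmers <- apply(expand.grid(rep(list(c("A", "C", "G", "T")), k)),
                   1, paste, collapse = "")

# Relative kmer frequency vector for one read (hypothetical helper).
kmer_profile <- function(r, k, all_kmers) {
  starts <- seq_len(nchar(r) - k + 1)
  kmers <- substring(r, starts, starts + k - 1)
  counts <- table(factor(kmers, levels = all_kmers))
  as.vector(counts) / length(kmers)
}

# One row per read, one column per kmer.
profiles <- t(vapply(reads, kmer_profile, numeric(length(all_kmers)),
                     k = k, all_kmers = all_kmers))
colnames(profiles) <- all_kmers

# Stand-in clustering step: plain k-means on the profiles.
fit <- kmeans(profiles, centers = 2, nstart = 10)
fit$cluster
```

With these toy reads the AT-rich and GC-rich sequences end up in separate clusters; the cluster labels themselves are arbitrary.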
This function performs hierarchical binning on metagenomic samples,
directly from FASTA or FASTQ files.
First it analyzes sequences by long kmer analysis (k>8),
as in abundance_based_binning.
Then for each AB bin, it guesses the number of composition bins in it and
performs composition based binning by short kmer analysis (k<8),
as in composition_based_binning.
See doi:10.1186/s12859-016-1186-3 for more details.
hierarchical_binning( ..., eMin = 1, eMax = 0, kMerSizeAB = 10, kMerSizeCB = 4, genomeSize = 3e+06, numOfClustersAB = 3, outputC = "ABxCB.cluster", keepQuality = FALSE, dryRun = FALSE, gzip = FALSE, numOfThreads = BiocParallel::bpworkers() )

| Argument | Description |
|---|---|
| ... | Input sequences: character paths to FASTA/FASTQ files (uncompressed or gzip compressed), or Biostrings/ShortRead sequence objects. |
| eMin | Exclude kmers whose count is less than or equal to this value. |
| eMax | Exclude kmers whose count is greater than or equal to this value. |
| kMerSizeAB | kmer length for Abundance based Binning. |
| kMerSizeCB | kmer length for Composition based Binning. |
| genomeSize | Average genome size of the taxa in the metagenome data. |
| numOfClustersAB | Number of clusters for Abundance based Binning. |
| outputC | Location and file name prefix of the Hierarchical Binning (ABxCB) cluster output files. |
| keepQuality | Keep FASTQ qualities in the output files (produces .fastq output). |
| dryRun | Don't write any output files. |
| gzip | Gzip the output files. |
| numOfThreads | Number of threads to use. Defaults to BiocParallel::bpworkers(). |

A MetabinResult object. Its assignments slot
is a DataFrame:
read_id : read identifier from fasta header
ABxCB : read was assigned to this ABxCB cluster index
ABxCB.n : read to cluster ABxCB.n distance
For backwards-compatible data.frame output use
as.data.frame(result).
Anestis Gkanogiannis, [email protected]
https://github.com/gkanogiannis/metabinR
res <- hierarchical_binning(
  system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"),
  dryRun = TRUE, kMerSizeAB = 4, kMerSizeCB = 2
)
res
Returns the vector of JVM flags passed to .jpackage
on package load. Set options(metabinR.jvm.flags = c(...)) before
loading the package to override; set options(java.parameters = ...)
to prepend heap-size flags (e.g. "-Xmx4g") in the usual rJava way.
metabinR_jvm_options()
A character vector of JVM flags.
metabinR_jvm_options()
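The ordering matters: both options have to be set before the package, and with it the JVM, is loaded. A minimal sketch, assuming a fresh R session; "-Xmx4g" is the heap-size example from the text above, while "-XX:+UseSerialGC" is only an illustrative JVM flag chosen here, not a recommended or default value.

```r
# Heap size, set the usual rJava way.
options(java.parameters = "-Xmx4g")

# Override the flags the package passes on load.
# ASSUMPTION: the flag below is purely illustrative.
options(metabinR.jvm.flags = c("-XX:+UseSerialGC"))

# Confirm what will be picked up when the package is loaded.
getOption("java.parameters")
getOption("metabinR.jvm.flags")

# Only now load the package and inspect the effective flags:
# library(metabinR)
# metabinR_jvm_options()
```

Setting these options after the package has been loaded is unlikely to have any effect, since the JVM is already running at that point.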
Accessors for MetabinResult
assignments(x) nClusters(x) parameters(x) algorithm(x) ## S4 method for signature 'MetabinResult' assignments(x) ## S4 method for signature 'MetabinResult' parameters(x) ## S4 method for signature 'MetabinResult' algorithm(x) ## S4 method for signature 'MetabinResult' nClusters(x)

| Argument | Description |
|---|---|
| x | A MetabinResult object. |

assignments() returns a DataFrame
with the per-read cluster assignments and distances.
nClusters() returns an integer scalar: the number of clusters
inferred by the algorithm. parameters() returns the list of
arguments passed to the binning function. algorithm() returns the
algorithm tag ("AB", "CB", or "ABxCB").
res <- abundance_based_binning(
  system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"),
  dryRun = TRUE, kMerSizeAB = 4, numOfClustersAB = 2
)
assignments(res)
nClusters(res)
algorithm(res)
parameters(res)$kMerSizeAB
An S4 class returned by abundance_based_binning(), composition_based_binning() and hierarchical_binning().
## S4 method for signature 'MetabinResult' show(object)

| Argument | Description |
|---|---|
| object | A MetabinResult object. |

Objects of this class are returned by the binning functions.
Use assignments(), nClusters(), parameters(), algorithm(), or
as.data.frame() to access the results.
assignments : A DataFrame of cluster assignments. The first column is read_id; subsequent columns are algorithm-specific (see the corresponding binning function).
parameters : Named list of the parameters passed to the algorithm.
inputs : Character vector of input file paths that were processed.
algorithm : Character scalar: one of "AB", "CB", "ABxCB".
res <- abundance_based_binning(
  system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"),
  dryRun = TRUE, kMerSizeAB = 4, numOfClustersAB = 2
)
res
is(res, "MetabinResult")