Title: Abundance and Compositional Based Binning of Metagenomes
Description: Provides functions for performing abundance- and composition-based binning on metagenomic samples, directly from FASTA or FASTQ files. The functions are implemented in Java and called via rJava. The parallel implementation operates directly on the input FASTA/FASTQ files for fast execution.
Authors: Anestis Gkanogiannis [aut, cre]
Maintainer: Anestis Gkanogiannis <[email protected]>
License: GPL-3
Version: 1.9.0
Built: 2024-12-29 06:08:21 UTC
Source: https://github.com/bioc/metabinR
This function performs abundance-based binning on metagenomic samples, directly from FASTA or FASTQ files, by long k-mer analysis (k > 8). See doi:10.1186/s12859-016-1186-3 for more details.
abundance_based_binning( ..., eMin = 1, eMax = 0, kMerSizeAB = 10, numOfClustersAB = 3, outputAB = "AB.cluster", keepQuality = FALSE, dryRun = FALSE, gzip = FALSE, numOfThreads = 1 )
...: Input FASTA/FASTQ file locations (uncompressed or gzip-compressed).
eMin: Exclude k-mers with a count less than or equal to this value.
eMax: Exclude k-mers with a count greater than or equal to this value.
kMerSizeAB: k-mer length for abundance-based binning.
numOfClustersAB: Number of clusters for abundance-based binning.
outputAB: Location and filename prefix of the abundance-based binning cluster output files.
keepQuality: Keep FASTQ quality scores in the output files (produces .fastq files).
dryRun: Do not write any output files.
gzip: Gzip-compress the output files.
numOfThreads: Number of threads to use.
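metabinR counts k-mers in Java, so the following is only a base-R sketch of what counting long k-mers and applying the eMin/eMax exclusion rules means. count_kmers and filter_kmers are hypothetical helpers, not part of the package, and treating eMax = 0 (the default) as "no upper limit" is an assumption.

```r
# Hypothetical helper (not part of metabinR): count every overlapping k-mer
# of length k across a set of reads, returning a named table of counts.
count_kmers <- function(reads, k) {
  kmers <- unlist(lapply(reads, function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))
    # Substrings of length k starting at positions 1 .. n - k + 1
    substring(s, seq_len(n - k + 1), k:n)
  }))
  table(kmers)
}

# Hypothetical helper: apply the eMin/eMax exclusion rules.
# Assumption: eMax = 0 means "no upper limit".
filter_kmers <- function(counts, eMin = 1, eMax = 0) {
  keep <- counts > eMin                         # drop k-mers with count <= eMin
  if (eMax > 0) keep <- keep & counts < eMax    # drop k-mers with count >= eMax
  counts[keep]
}
```

For example, filter_kmers(count_kmers(c("ACGTACGT", "ACGTTT"), 4)) keeps only ACGT, the single 4-mer seen more than once.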
A data.frame of the binning assignments, with numOfClustersAB + 2 columns:
read_id: read identifier from the FASTA header
AB: index of the AB cluster the read was assigned to
AB.n: distance of the read to AB cluster n (one column per cluster)
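A minimal base-R sketch for summarising a result with this layout: reads per cluster and the mean distance of reads to the cluster they were assigned to. summarise_bins is a hypothetical helper, not part of metabinR, and it assumes that the index n in the assignment column refers to the distance column named AB.n.

```r
# Hypothetical helper (not part of metabinR): summarise a binning data.frame
# laid out as read_id, an assignment column, and one distance column per
# cluster named "<prefix>.<n>".
summarise_bins <- function(res, prefix = "AB") {
  dist_cols <- grep(paste0("^", prefix, "\\.[0-9]+$"), names(res), value = TRUE)
  d <- as.matrix(res[dist_cols])
  assigned <- res[[prefix]]
  # Assumption: cluster index n in the assignment column refers to "<prefix>.n"
  j <- match(paste0(prefix, ".", assigned), dist_cols)
  own_dist <- d[cbind(seq_len(nrow(d)), j)]  # distance to the assigned cluster
  sizes <- table(assigned)
  mean_dist <- tapply(own_dist, assigned, mean)
  data.frame(cluster = names(sizes),
             reads = as.integer(sizes),
             mean_distance = as.numeric(mean_dist[names(sizes)]),
             row.names = NULL)
}
```

The same helper should apply to the CB and ABxCB outputs by passing prefix = "CB" or prefix = "ABxCB", under the same column-naming assumption.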
Anestis Gkanogiannis, [email protected]
https://github.com/gkanogiannis/metabinR
abundance_based_binning(system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"), dryRun = TRUE, kMerSizeAB = 8)
This function performs composition-based binning on metagenomic samples, directly from FASTA or FASTQ files, by short k-mer analysis (k < 8). See doi:10.1186/s12859-016-1186-3 for more details.
composition_based_binning( ..., kMerSizeCB = 4, numOfClustersCB = 5, outputCB = "CB.cluster", keepQuality = FALSE, dryRun = FALSE, gzip = FALSE, numOfThreads = 1 )
...: Input FASTA/FASTQ file locations (uncompressed or gzip-compressed).
kMerSizeCB: k-mer length for composition-based binning.
numOfClustersCB: Number of clusters for composition-based binning.
outputCB: Location and filename prefix of the composition-based binning cluster output files.
keepQuality: Keep FASTQ quality scores in the output files (produces .fastq files).
dryRun: Do not write any output files.
gzip: Gzip-compress the output files.
numOfThreads: Number of threads to use.
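To illustrate what "short k-mer analysis" describes, here is a base-R sketch that turns one read into a normalised k-mer frequency profile over all 4^k possible k-mers. kmer_profile is a hypothetical helper; metabinR's actual composition features and normalisation may differ.

```r
# Hypothetical helper (not part of metabinR): normalised k-mer frequency
# profile of one DNA sequence over all 4^k k-mers in a fixed order.
kmer_profile <- function(seq, k = 4) {
  bases <- c("A", "C", "G", "T")
  # Enumerate all 4^k k-mers so profiles from different reads line up
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  all_kmers <- do.call(paste0, grid)
  seq <- toupper(seq)
  n <- nchar(seq)
  if (n < k) return(setNames(numeric(length(all_kmers)), all_kmers))
  kmers <- substring(seq, seq_len(n - k + 1), k:n)
  # k-mers containing N or other symbols are not levels, become NA, and are dropped
  counts <- table(factor(kmers, levels = all_kmers))
  prof <- as.numeric(counts) / max(sum(counts), 1)
  setNames(prof, all_kmers)
}
```

With the default kMerSizeCB = 4 this gives 256 features per read, which is small enough to compare reads by composition regardless of coverage.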
A data.frame of the binning assignments, with numOfClustersCB + 2 columns:
read_id: read identifier from the FASTA header
CB: index of the CB cluster the read was assigned to
CB.n: distance of the read to CB cluster n (one column per cluster)
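The sketch below shows how a table with this layout could arise from per-read composition profiles. It swaps in base R's stats::kmeans and Euclidean distance as stand-ins; metabinR's own clustering method and distance metric are not described here and may differ. bin_profiles is a hypothetical helper that takes any numeric matrix with one row per read.

```r
# Sketch only: k-means stands in for metabinR's composition clustering,
# and Euclidean distance is an assumed metric.
bin_profiles <- function(profiles, n_clusters, prefix = "CB", seed = 1) {
  ids <- rownames(profiles)
  if (is.null(ids)) ids <- paste0("read", seq_len(nrow(profiles)))
  set.seed(seed)  # k-means starting centres are random
  km <- stats::kmeans(profiles, centers = n_clusters, nstart = 5)
  # Distance of every read (row) to every cluster centre (column)
  d <- vapply(seq_len(n_clusters), function(j)
    sqrt(rowSums(sweep(profiles, 2, km$centers[j, ])^2)),
    numeric(nrow(profiles)))
  d <- matrix(d, nrow = nrow(profiles))  # keep matrix shape even for one read
  # Assign each read to its nearest centre
  out <- data.frame(read_id = ids,
                    cluster = max.col(-d, ties.method = "first"))
  names(out)[2] <- prefix
  dist_df <- as.data.frame(d)
  names(dist_df) <- paste0(prefix, ".", seq_len(n_clusters))
  cbind(out, dist_df)
}
```

The result has n_clusters + 2 columns, matching the shape documented above.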
Anestis Gkanogiannis, [email protected]
https://github.com/gkanogiannis/metabinR
composition_based_binning(system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"), dryRun = TRUE, kMerSizeCB = 2)
This function performs hierarchical binning on metagenomic samples, directly from FASTA or FASTQ files. It first analyzes sequences by long k-mer analysis (k > 8), as in abundance_based_binning. Then, for each AB bin, it guesses the number of composition bins the bin contains and performs composition-based binning by short k-mer analysis (k < 8), as in composition_based_binning. See doi:10.1186/s12859-016-1186-3 for more details.
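The two-stage control flow can be sketched in base R as follows. This is a hypothetical illustration, not metabinR's implementation: k-means replaces both of the package's clustering steps, the description above does not say how the per-bin guess is made, so guess_k is a caller-supplied stand-in, and the "<AB>x<CB>" label format is an assumed naming.

```r
# Hypothetical two-stage sketch: stage 1 splits reads by an abundance
# feature, stage 2 re-clusters each stage-1 bin by composition.
hierarchical_sketch <- function(abundance, composition, n_ab, guess_k, seed = 1) {
  set.seed(seed)
  # Stage 1: abundance-based split (k-means stands in for the AB step)
  ab <- stats::kmeans(abundance, centers = n_ab, nstart = 5)$cluster
  cb <- integer(length(ab))
  for (b in sort(unique(ab))) {
    idx <- which(ab == b)
    comp_b <- composition[idx, , drop = FALSE]
    # Stage 2: number of composition bins for this AB bin (hypothetical guess),
    # capped by the number of distinct profiles so k-means can run
    k <- max(1L, min(as.integer(guess_k(idx)), nrow(unique(comp_b))))
    cb[idx] <- if (k == 1L) 1L else
      stats::kmeans(comp_b, centers = k, nstart = 5)$cluster
  }
  # Combined label "<AB bin>x<CB bin within it>" (hypothetical naming)
  paste0(ab, "x", cb)
}
```

Because each AB bin gets its own number of composition bins, the total number of ABxCB bins is not fixed in advance.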
hierarchical_binning( ..., eMin = 1, eMax = 0, kMerSizeAB = 10, kMerSizeCB = 4, genomeSize = 3e+06, numOfClustersAB = 3, outputC = "ABxCB.cluster", keepQuality = FALSE, dryRun = FALSE, gzip = FALSE, numOfThreads = 1 )
...: Input FASTA/FASTQ file locations (uncompressed or gzip-compressed).
eMin: Exclude k-mers with a count less than or equal to this value.
eMax: Exclude k-mers with a count greater than or equal to this value.
kMerSizeAB: k-mer length for abundance-based binning.
kMerSizeCB: k-mer length for composition-based binning.
genomeSize: Average genome size of the taxa in the metagenome.
numOfClustersAB: Number of clusters for abundance-based binning.
outputC: Location and filename prefix of the hierarchical binning (ABxCB) cluster output files.
keepQuality: Keep FASTQ quality scores in the output files (produces .fastq files).
dryRun: Do not write any output files.
gzip: Gzip-compress the output files.
numOfThreads: Number of threads to use.
A data.frame of the binning assignments, with numOfClustersAB + 2 columns:
read_id: read identifier from the FASTA header
ABxCB: index of the ABxCB cluster the read was assigned to
ABxCB.n: distance of the read to ABxCB cluster n
Anestis Gkanogiannis, [email protected]
https://github.com/gkanogiannis/metabinR
hierarchical_binning(system.file("extdata", "reads.metagenome.fasta.gz", package = "metabinR"), dryRun = TRUE, kMerSizeAB = 4, kMerSizeCB = 2)