NEWS
biomformat 1.39.17
BUG FIXES
- Added explicit S4 dispatch method for biom_data() with signature
c("biom", "character", "missing"). Previously, calling biom_data()
with only a character rows= argument (e.g. biom_data(x, rows="GG_OTU_3"))
matched the catch-all ("biom", "character", "ANY") method, which then
forwarded the still-missing columns argument to the next dispatch layer,
causing the error: "argument 'columns' is missing, with no default".
The new method defaults columns to 1:ncol(x) and dispatches cleanly.
Fixes the vignette build ERROR on Bioconductor (chunk named_subsetting,
biomformat.Rmd lines 402-413).
R CMD check: 0 ERRORs, 0 WARNINGs, 4 pre-existing NOTEs.
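For readers unfamiliar with dispatching on "missing", here is a minimal sketch of the pattern on a toy class. The names toybiom and toy_data are hypothetical stand-ins, not the package's own class or generic, and the default used here is all column names rather than 1:ncol(x).

```r
library(methods)

# Hypothetical stand-ins for illustration only.
setClass("toybiom", slots = c(m = "matrix"))
setGeneric("toy_data", function(x, rows, columns) standardGeneric("toy_data"))

# Both indices supplied as character vectors.
setMethod("toy_data", c("toybiom", "character", "character"),
  function(x, rows, columns) x@m[rows, columns, drop = FALSE])

# Only rows supplied: 'columns' is missing, so fill in all columns
# explicitly instead of forwarding a missing argument.
setMethod("toy_data", c("toybiom", "character", "missing"),
  function(x, rows, columns) toy_data(x, rows, colnames(x@m)))

x <- new("toybiom", m = matrix(1:6, nrow = 2,
  dimnames = list(c("GG_OTU_1", "GG_OTU_3"), c("S1", "S2", "S3"))))
res <- toy_data(x, rows = "GG_OTU_3")
```

The key point is that the "missing" method never passes the absent argument any further; it replaces it with a concrete value before re-dispatching.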
biomformat 1.39.16
BUG FIXES
- Updated Joseph N. Paulson's email address in Authors@R to
[email protected].
R CMD check: 0 ERRORs, 0 NOTEs, 2 pre-existing vignette WARNINGs.
biomformat 1.39.15
BUG FIXES
- Replaced deprecated Author:/Maintainer: DESCRIPTION fields with the
modern Authors@R: format using person() with role=c("aut","cre") for
the maintainer. This was flagged as an ERROR by BiocCheck >= 1.46 and
was the root cause of the "invalid email" complaint from Bioconductor —
the field itself was structurally invalid, not just the address string.
Updated maintainer email to [email protected] (reachable address).
Added .BiocCheck to .Rbuildignore.
R CMD check: 0 ERRORs, 0 NOTEs, 2 pre-existing vignette WARNINGs.
BiocCheck: 1 ERROR (Watched Tags on support site — manual web action),
3 WARNINGs (version parity, no BioC deps, missing \value),
14 NOTEs (style/cosmetic).
biomformat 1.39.14
DOCUMENTATION / VIGNETTE
- Added two new vignette sections to vignettes/biomformat.Rmd:
* "Constructing a BIOM from R data": end-to-end make_biom() workflow
showing how to build a biom object from a count matrix and a
data.frame taxonomy table with list-valued columns (the dada2 pattern),
write it with write_biom(), and read it back. Includes a callout
directing large-dataset users to write_hdf5_biom() (addressing Issue
#8), with a guarded HDF5 code chunk. Directly addresses Issues #4,
#6, #9 for users who land on the vignette.
* "Subsetting biom_data() by name": demonstrates the character-vector
row/column subsetting interface of biom_data() that was previously
undocumented in the vignette.
R CMD check: 0 ERRORs, 0 NOTEs, 2 pre-existing vignette WARNINGs.
biomformat 1.39.13
DOCUMENTATION
- write_biom(): added @details section documenting the 2^31-1 byte
size limitation that arises because jsonlite serialises the entire
BIOM object to a single R character string. Users with large datasets
(thousands of samples or features) are now explicitly directed to
write_hdf5_biom(), which has no such size constraint.
Closes GitHub Issue #8.
R CMD check: 0 ERRORs, 0 NOTEs, 2 pre-existing vignette WARNINGs.
biomformat 1.39.12
NEW FEATURES / TESTS
- Issue #7 regression test: added inst/extdata/zero_col_hdf5.biom, a
minimal 3x3 HDF5 BIOM fixture where the middle sample ("ZeroSamp") has
all-zero counts. The fixture is generated by the companion script
inst/extdata/create_zero_col_hdf5.R using rhdf5 directly.
Added two tests in test-hdf5-write.R:
* "zero-column HDF5 fixture: biom_data() returns correct 3x3 matrix" —
verifies that ZeroSamp is all zeros and the other two columns have
the expected non-zero values. Validates that generate_matrix()
(rewritten in v1.39.8 to use sparseMatrix directly from CCS triplets)
correctly handles the case where indptr[j] == indptr[j+1].
* "zero-column HDF5 fixture: make_biom() + write_hdf5_biom() round-trip"
— verifies that write_hdf5_biom() -> read_biom() preserves the
all-zero column.
Both tests are guarded with skip_if_not_installed("rhdf5").
R CMD check: 0 ERRORs, 0 NOTEs, 2 pre-existing vignette WARNINGs.
biomformat 1.39.11
BUG FIXES
- make_biom(): fix Issue #4 — NULL id was serialised as {} (empty JSON object)
by jsonlite. make_biom() now substitutes "No Table ID" when id is NULL,
matching the BIOM spec and ensuring write_biom() -> read_biom() is a
lossless round-trip. validObject() now succeeds on the round-tripped object.
- make_biom(): fix Issue #6 — when observation_metadata (or sample_metadata)
is a data.frame with list columns (e.g. a "taxonomy" column holding
character vectors of rank assignments, as produced by dada2), the metadata
was serialised as a bare JSON array ([[...]]) instead of a named JSON object
({"taxonomy":[...]}). The BIOM spec and downstream tools (phyloseq
import_biom(), the Python biom library) require the named-object form.
Root cause: as.matrix() on a list-column data.frame produces a list-matrix;
as.list(row) then collapses field names. Fix: detect list-column data.frames
and build per-row metadata directly as named lists, bypassing as.matrix().
Closes GitHub Issue #6. Also resolves user-support Issue #9 (dada2 -> biom
-> write_biom workflow now works correctly).
R CMD check: 0 ERRORs, 0 NOTEs, 2 pre-existing vignette WARNINGs.
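The Issue #6 fix strategy (building per-row metadata as named lists without going through as.matrix()) can be sketched in base R as follows. The data and the variable names md and row_meta are illustrative assumptions; this is not the code inside make_biom().

```r
# Hypothetical observation metadata with a list-valued "taxonomy" column.
md <- data.frame(confidence = c(0.9, 0.8), row.names = c("OTU1", "OTU2"))
md$taxonomy <- list(
  c("k__Bacteria", "p__Firmicutes"),
  c("k__Bacteria", "p__Proteobacteria")
)

# One named list per row, built column by column so field names are kept
# and each list cell keeps its full character vector.
row_meta <- lapply(seq_len(nrow(md)), function(i) {
  lapply(md, function(col) col[[i]])
})
names(row_meta) <- rownames(md)
```

Each element of row_meta is a named list, which is the shape a JSON serialiser needs in order to emit an object with a "taxonomy" key rather than a bare array.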
biomformat 1.39.10
NEW FEATURES
- Updated vignette with four new sections:
* HDF5 (BIOM v2) read and write via write_hdf5_biom() / read_biom(),
including JSON-to-HDF5 conversion.
* Tidy long-format output via as.data.frame() and as_tibble.biom(),
with purrr-style summarisation examples (Shannon diversity,
per-sample total counts) and base-R fallbacks.
* SummarizedExperiment interoperability via
biom_to_SummarizedExperiment() and as(x, "SummarizedExperiment"),
showing assay(), colData(), and rowData() access.
* Session info section.
- All new vignette sections are guarded with requireNamespace() so
the vignette builds cleanly without optional dependencies
(rhdf5, tibble, purrr/dplyr, SummarizedExperiment/S4Vectors).
biomformat 1.39.9
NEW FEATURES
- write_hdf5_biom(x, biom_file): new exported function that serialises a
biom object to the BIOM v2 HDF5 format. Writes both the sample-major
and observation-major compressed-sparse representations required by the
spec, plus all sample and observation metadata. Requires rhdf5
(Bioconductor); a clear error is raised if it is absent.
read_biom() -> write_hdf5_biom() -> read_biom() is a lossless round-trip
for both count data and metadata.
BUG FIXES
- Moved rhdf5 from Imports to Suggests. The package loads and all JSON
BIOM functionality works without rhdf5 installed; HDF5 read/write simply
stops with an informative message when rhdf5 is absent.
biomformat 1.39.8
BUG FIXES / PERFORMANCE
- generate_matrix(): rewrote HDF5/BIOM-v2 matrix reconstruction to build
a sparse Matrix directly from the CCS (indptr/indices/data) triplets
stored in the HDF5 file, instead of first constructing a dense
base::matrix via sapply() and then converting. For large datasets this
avoids an O(n_obs * n_samples) allocation. The return value (list of
named vectors, one per observation) is unchanged so all downstream code
is unaffected. Also handles the edge case of an all-zero matrix
(length(data) == 0) explicitly.
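As a rough illustration of building a sparse matrix straight from compressed-column arrays, the sketch below uses Matrix::sparseMatrix() with made-up indptr/indices/data values. The middle column is deliberately empty (two consecutive indptr entries are equal). This is an assumption-laden example, not the body of generate_matrix().

```r
library(Matrix)

# 3 observations x 3 samples in compressed-column form.
indptr  <- c(0L, 2L, 2L, 3L)  # column 2 has no entries: indptr[2] == indptr[3]
indices <- c(0L, 2L, 1L)      # zero-based row indices
data    <- c(5, 1, 7)

m <- sparseMatrix(
  i = indices, p = indptr, x = data,
  dims = c(3L, 3L), index1 = FALSE,
  dimnames = list(c("O1", "O2", "O3"), c("S1", "ZeroSamp", "S3"))
)

# Densify only for inspection; real data would stay sparse.
dense <- as.matrix(m)
```

No dense intermediate is allocated until as.matrix() is called, which is what avoids the O(n_obs * n_samples) cost on large tables.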
biomformat 1.39.7
NEW FEATURES
- as.data.frame.biom(): new S3 method that converts a biom object to a
long-format (tidy) data.frame with one row per (feature, sample) pair.
Columns: feature_id, sample_id, count, plus any sample and observation
metadata columns appended via left-join. Pure base R, no tidyverse
dependency.
- as_tibble.biom(): thin wrapper around as.data.frame.biom() that returns
a tibble. Requires the 'tibble' package (Suggests only). Call via
tibble::as_tibble(x) or as_tibble.biom(x) directly.
DEPENDENCY CHANGES
- Added tibble to Suggests (optional; only needed for as_tibble.biom()).
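The long-format layout described above can be sketched in base R like this. The counts matrix and sample_md table are invented for illustration, and the reshaping shown is one plausible way to obtain such a table, not necessarily how as.data.frame.biom() is implemented.

```r
# Hypothetical 2 x 2 count matrix (features in rows, samples in columns).
counts <- matrix(c(5, 0, 2, 3), nrow = 2,
  dimnames = list(c("OTU1", "OTU2"), c("S1", "S2")))

# One row per (feature, sample) pair; as.vector() unrolls column by column.
long <- data.frame(
  feature_id = rep(rownames(counts), times = ncol(counts)),
  sample_id  = rep(colnames(counts), each = nrow(counts)),
  count      = as.vector(counts),
  stringsAsFactors = FALSE
)

# Append sample metadata with a left join.
sample_md <- data.frame(sample_id = c("S1", "S2"), site = c("gut", "skin"),
  stringsAsFactors = FALSE)
long <- merge(long, sample_md, by = "sample_id", all.x = TRUE)
```

Note that merge() may reorder rows, so downstream code should select by feature_id and sample_id rather than by position.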
biomformat 1.39.6
NEW FEATURES
- biom_to_SummarizedExperiment(): new exported function that converts a
biom object into a SummarizedExperiment, placing the count/value matrix
in assay("counts"), sample metadata in colData(), and feature metadata
in rowData(). Both colData and rowData are S4Vectors::DataFrame objects.
When a biom object carries no metadata (the accessor returns NULL), an
empty DataFrame with correct row/col names is used, ensuring the SE is
always valid. No hard dependency is introduced: SummarizedExperiment and
S4Vectors are listed in Suggests only.
- as(x, "SummarizedExperiment"): S4 coercion method registered at load
time when SummarizedExperiment is available, delegating to
biom_to_SummarizedExperiment().
DEPENDENCY CHANGES
- Added SummarizedExperiment and S4Vectors to Suggests.
TESTS
- New tests/testthat/test-SE.R with 6 tests covering: return class,
assay content, colData content, rowData content, S4 coercion, NULL
metadata, and SE dimension/dimname correctness.
All tests are skipped gracefully when SummarizedExperiment is not
installed.
biomformat 1.39.5
DEPENDENCY CHANGES
- Removed plyr (>= 1.8) from Imports entirely. plyr is unmaintained and
every use in this codebase now has a direct base-R equivalent, so
removing it also reduces the install footprint.
- Bumped R dependency from >= 3.2 to >= 4.1, ensuring modern base-R idioms
(including the native pipe |>) are available.
- Replaced import(Matrix) (whole-namespace) with selective
importFrom(Matrix, Matrix, sparseMatrix, drop0) in NAMESPACE.
Replaced import(methods) with selective importFrom(methods, ...).
Follows CRAN/Bioconductor best practices; prevents namespace pollution.
USER-VISIBLE CHANGES
- biom_data(), sample_metadata(), observation_metadata(): the parallel=
argument is now a no-op with a deprecation warning when passed as TRUE.
The plyr-backed parallel execution it previously enabled no longer exists.
Existing code that passes parallel=FALSE (the default) is unaffected.
INTERNAL CHANGES
- make_biom(): replaced plyr::alply() with lapply(seq_len(nrow(...)))
for building per-row named metadata lists.
- biom_data() dense path: replaced plyr::laply() with
do.call(rbind, lapply(...)).
- biom_data() sparse numeric path: replaced plyr::ldply(x$data) with
do.call(rbind, lapply(x$data, as.data.frame)).
- biom_data() sparse unicode path: replaced plyr::ldply(x$data, function...)
with do.call(rbind, lapply(x$data, function(e) ...)).
- extract_metadata(): replaced plyr::llply()/plyr::ldply() with
lapply() / do.call(rbind, lapply(...)).
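For reference, the do.call(rbind, lapply(...)) idiom that replaces plyr::ldply() looks roughly like this. The sparse_entries list is a made-up stand-in for the (row, column, value) entries of a sparse BIOM data field.

```r
# Hypothetical sparse entries: c(row, column, value), zero-based indices.
sparse_entries <- list(c(0, 2, 5), c(1, 0, 3), c(2, 1, 7))

# Turn each entry into a one-row data.frame, then stack them.
triplets <- do.call(rbind, lapply(sparse_entries, function(e) {
  data.frame(row = e[1], col = e[2], value = e[3])
}))
```

The result is an ordinary data.frame with one row per list element, the same shape ldply() would have produced.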
biomformat 1.39.4
BUG FIXES
- biom_data(): fixed data-corruption bug on both dense and sparse BIOM paths
where subsetting to a single row or single column silently collapsed the
result into a dimensionless named vector, discarding dim(), rownames(), and
colnames(). This caused downstream tools (notably phyloseq::import_biom())
to fail silently or produce incorrect OTU tables.
Fix: on the sparse path, added drop = FALSE to the matrix subsetting call
(m[rows, columns, drop = FALSE]). On the dense path, the laply() result is
now immediately reshaped with matrix(m, nrow, ncol) before Matrix() coercion,
ensuring a 2-D object is always returned regardless of dimension lengths.
Closes GitHub PR #12 (https://github.com/joey711/biomformat/pull/12):
"Fix biom_data() when dealing with 1-taxon and 1-sample BIOM data"
Supersedes GitHub PR #11 (https://github.com/joey711/biomformat/pull/11):
"Fix unidentical biom output by make_biom()"
- biom_data(): simplified the post-subsetting naming block. Both paths now
always produce a 2-D object, so rownames() and colnames() are applied
unconditionally on all code paths (the previous is.null(dim(m)) branch
is no longer needed).
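The collapse that drop = FALSE prevents is easy to reproduce with a plain base-R matrix. The example below uses invented data and is only meant to show the difference in behaviour.

```r
m <- matrix(1:6, nrow = 2,
  dimnames = list(c("OTU1", "OTU2"), c("S1", "S2", "S3")))

# Default subsetting drops the single-row result to a named vector.
collapsed <- m["OTU1", ]

# drop = FALSE keeps a 1 x 3 matrix with both sets of dimnames.
kept <- m["OTU1", , drop = FALSE]
```

Here collapsed has no dim attribute, while kept still reports dim() of 1 by 3 and keeps its rownames and colnames.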
TESTS
- Added regression tests in tests/testthat/test-IO.R covering all new
behaviour introduced across v1.39.2-v1.39.4:
* sparse single-row subset retains dim() and dimnames (PR #12)
* sparse single-col subset retains dim() and dimnames (PR #12)
* sparse single-cell (1x1) subset retains dim()
* full sparse matrix unaffected by drop = FALSE fix
* read_biom() routes HDF5 fixture cleanly without jsonlite warning
(Issue #14, PR #16)
* read_biom() correctly classifies all JSON and HDF5 extdata fixtures
biomformat 1.39.3
BUG FIXES
- read_biom(): replaced the fragile JSON-first / HDF5-fallback try() chain
with a deterministic magic-bytes router. The function now reads the first
4 bytes of the file; if the HDF5 signature (\x89 H D F) is detected it
routes exclusively to read_hdf5_biom() and never invokes jsonlite. JSON
files route exclusively to jsonlite. This eliminates the confusing
"lexical error: invalid char in json text ... 89HDF" warning that users
encountered when HDF5 files were accidentally passed through the JSON
parser first.
Closes GitHub Issue #14 (https://github.com/joey711/biomformat/issues/14):
"Unable to read HDF5 biom file"
Supersedes GitHub PR #16 (https://github.com/joey711/biomformat/pull/16):
"Improve handling of HDF5 BIOM files"
Partially addresses GitHub Issue #5 (https://github.com/joey711/biomformat/issues/5)
and Issue #3 (https://github.com/joey711/biomformat/issues/3):
fatal C-level aborts when reading large or malformed HDF5 files are now
caught by the new tryCatch() wrapper in read_hdf5_biom() and re-emitted
as informative R-level warnings instead of crashing the session.
- read_hdf5_biom(): added requireNamespace("rhdf5", quietly = TRUE) guard.
If the rhdf5 package is not installed, or if the underlying HDF5 system
libraries are absent (common on stripped BBS nodes or end-user machines
without libhdf5), the function now emits a clear R-level warning
identifying the missing dependency and returns NULL invisibly, instead of
producing a fatal C-level abort.
- read_hdf5_biom(): wrapped the h5read() call in tryCatch() so that any
C-level or system-library error is caught and re-emitted as an informative
R warning, keeping the R session alive.
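A magic-bytes check of the kind described for read_biom() can be sketched in base R as below. The helper names is_hdf5_file and route are hypothetical, and the temporary files exist only for the demonstration; the package's actual router may differ in detail.

```r
# TRUE when the file starts with the first four HDF5 signature bytes.
is_hdf5_file <- function(path) {
  signature <- as.raw(c(0x89, 0x48, 0x44, 0x46))  # \x89 H D F
  first_bytes <- readBin(path, what = "raw", n = 4L)
  identical(first_bytes, signature)
}

# Pick a reader label from the file contents, never from the extension.
route <- function(path) if (is_hdf5_file(path)) "hdf5" else "json"

# Demonstration on two temporary files.
hdf5_like <- tempfile(fileext = ".biom")
writeBin(as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a)), hdf5_like)
json_like <- tempfile(fileext = ".biom")
writeLines('{"id": "No Table ID"}', json_like)
```

Because the decision is made from the leading bytes alone, an HDF5 file is never handed to the JSON parser, which is what removes the spurious lexical-error warning.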
DEPENDENCY CHANGES
- Bumped Matrix dependency from (>= 1.2) to (>= 1.7-0) in DESCRIPTION.
This pins the package against the post-SuiteSparse ABI break and prevents
runtime crashes caused by binary incompatibilities in upstream sparse-matrix
libraries on BBS nodes running R 4.4+.
biomformat 1.39.2
BUG FIXES
- Fixed fatal test ERROR under testthat >= 3.0.0: replaced all deprecated
expect_that(x, is_true()), expect_that(x, is_identical_to(y)), and
expect_that(x, is_a("cls")) calls in tests/testthat/test-IO.R with their
modern equivalents (expect_true(), expect_identical(), expect_is(),
expect_true(is(x, "cls"))). The removed helpers caused the entire test
suite to ERROR on current Bioconductor BBS nodes, which was the primary
trigger for Bioconductor's package deprecation warning.
Addresses GitHub Issue #17 (https://github.com/joey711/biomformat/issues/17):
"Bioconductor failure and risk of deprecation"
- Added missing importFrom(stats, setNames) and importFrom(utils,
packageVersion) directives to NAMESPACE, resolving "no visible global
function definition" NOTEs from R CMD check.
biomformat 0.3.13
USER-VISIBLE CHANGES
- Added make_biom(), which creates a biom object from standard R data table(s).
biomformat 0.3.12
USER-VISIBLE CHANGES
- No user-visible changes. All changes are for future compatibility.
BUG FIXES
- Unit test changes to work with upcoming R release and new testthat version.
- This solves Issue 4: https://github.com/joey711/biom/issues/4
biomformat 0.3.11
USER-VISIBLE CHANGES
- No user-visible changes. All changes are for future CRAN compatibility.
BUG FIXES
- Clarified license and project in the README.md
- Added TODO.html, README.html, and TODO.md to .Rbuildignore (requested by CRAN)
- Moved 'biom-demo.Rmd' to 'vignettes/'
- Updated 'inst/NEWS' (this) file to official format
- Removed pre-built vignette HTML so that it is re-built during package build.
This refreshes details such as the build date and ensures that the vignette
shows results from code that actually ran against the user's copy of the
package.
biomformat 0.3.10
USER-VISIBLE CHANGES
- These changes should not affect any package behavior.
- Some of the top-level documentation has been changed to reflect the new
development location on GitHub.
BUG FIXES
- Minor fixes for CRAN compatibility
- This addresses Issue 1: https://github.com/joey711/biom/issues/1
biomformat 0.3.9
SIGNIFICANT USER-VISIBLE CHANGES
- Speed improvement for sparse matrices.
NEW FEATURES
- The 'biom_data' parsing function now uses a vectorized (matrix-indexed)
assignment while parsing sparse matrices.
- Unofficial benchmarks suggest a speedup of a few hundred-fold.
biomformat 0.3.8
SIGNIFICANT USER-VISIBLE CHANGES
- First version released on CRAN.