biocthis developer notes

Note that biocthis is not a Bioconductor-core package and as such it is not an official Bioconductor package. It was made by and for Leonardo Collado-Torres so he could more easily create and maintain Bioconductor packages as listed at lcolladotor.github.io/pkgs/. Hopefully biocthis will be helpful for you too.

Basics

For the basics, please check the Introduction to biocthis vignette.

Backstory

In 2019, I was able to take the “Building Tidy Tools” workshop taught by Charlotte and Hadley Wickham during rstudio::conf(2019) thanks to a diversity scholarship. During this workshop, I learned about usethis (Wickham, Bryan, Barrett et al., 2024), devtools (Wickham, Hester, Chang et al., 2022), testthat (Wickham, 2011), among other R packages, and how to use RStudio Desktop to create R packages more efficiently. I got to revise this material and practice it more for the CDSB Workshop 2019: How to Build and Create Tidy Tools where we re-used the materials (with their permission) and translated them to Spanish. Over the years I have made several Bioconductor R packages that I maintain. Yet I learned a lot thanks to Charlotte and Hadley and have been relying more and more on usethis and related packages.

Earlier this year (2020) one of my Bioconductor packages (regionReport) was presenting some errors on some operating systems but not on others. I first spent quite a bit of time setting up the corresponding R installation on my non-work Windows computer. I still struggled to reproduce the error, so I finally learned how to use the Bioconductor docker images. That is, you can run the following code to get an environment with all the system dependencies for Bioconductor packages already installed. In this environment you can then install your package dependencies and get very close to the Linux machine used for testing Bioconductor packages.

docker run \
    -e PASSWORD=bioc \
    -p 8787:8787 \
    bioconductor/bioconductor_docker:devel

Using this docker image, I was finally able to reproduce the error, which involved other Bioconductor packages. However, there was a second hard-to-reproduce error. Using GitHub Actions, which I’ll talk about more soon, I was then able to find the root cause of this second issue and resolve it.

biocthis (Collado-Torres, 2024) was born from my interest to keep using usethis and related tools, but in a Bioconductor-friendly way. That is, this is a package that will help me (and maybe others too). This package was born from these 5 issues:

Styling code

BiocCheck is run on all new Bioconductor package submissions and by default it checks whether the new package adheres to the Bioconductor coding style guide. For a long time, it has suggested formatR as a solution for automatically styling code in an R package. While formatR mostly works and I’ve used it before, I recently discovered styler which can be used for styling code to fit the tidyverse coding style guide. On my own packages, I have found styler to be superior to formatR because it:

  • breaks less code,
  • can format roxygen2 example code,
  • and can re-format R Markdown files like vignettes.
  • Plus it seems to me to be under active maintenance, which is always a good thing.

Several of the issues I made are related to using styler to automatically re-format your code to match the Bioconductor coding style guide more closely. That is how bioc_style() was born, and it was the suggested approach as discussed at Bioc-friendly style feature suggestion r-lib/styler#636. The maintainer of styler, Lorenz Walthert, has a great reply on that issue linking to a more detailed discussion on how to expand styler if the job requires doing so.

Currently, bioc_style() does not fully replicate the Bioconductor coding style, but it gets close enough. As Martin Morgan said at Recommend styler over formatR suggestion Bioconductor/BiocCheck#57, a solution that gets 90% of the way is good enough. bioc_style() is a very short function, mostly because the Bioconductor and Tidyverse coding style guides are overall very similar. This function won’t solve all the formatting issues detected by BiocCheck, but if you really want to, you can disable the formatting checks with:

## Use the following for the latest options
BiocCheck::usage()
## Disable formatting checks
BiocCheck::BiocCheck(`--no-check-formatting` = TRUE)
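
As a minimal sketch of how bioc_style() plugs into styler, the snippet below styles a small, deliberately messy function. It assumes that both styler and biocthis are installed; messy_code is a made-up example rather than code from any package, and the exact output may differ across styler versions.

```r
## Made-up example code with inconsistent spacing and assignment
messy_code <- c(
    "f=function(x){",
    "if(x>1){x+1}",
    "}"
)

## Only run the styling step when both packages are available
if (requireNamespace("styler", quietly = TRUE) &&
    requireNamespace("biocthis", quietly = TRUE)) {
    ## bioc_style() returns a set of styler transformers
    styled_code <- styler::style_text(
        messy_code,
        transformers = biocthis::bioc_style()
    )
    print(styled_code)
}
```

To re-style a whole package instead of a snippet, the same transformers can be passed to styler::style_pkg(transformers = biocthis::bioc_style()). That call rewrites files in place, so it is safer to commit your work first.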

GitHub Actions

Motivation

I have been using Travis CI for several years now to help me run R CMD check every time I make a commit and push it to GitHub. Travis CI has mostly worked well for me, though I frequently had to maneuver around the 50 minute limit. I also recently ran into a problem where Hadley Wickham replied “We now recommend using the github actions workflow instead; which avoids all this configuration pain”. I also ran into a problem that didn’t always happen in Travis CI but that was potentially related to the computational resources provided (memory). I heard the term GitHub Actions at rstudio::conf(2020) but I ended up missing Jim Hester’s talk, which you can watch online: I highly recommend it and wish I had started my adventure into GitHub Actions with it. Briefly, GitHub Actions allows you to run checks on Windows, macOS or Linux for up to 6 hours on machines with 7 GB of RAM. That’s two more operating systems than what I was using with Travis CI, a much longer time limit, and a decent chunk of memory.

These 3 operating systems matter to me because Bioconductor runs nightly checks on those 3 platforms. It’s a great way to know if your Bioconductor R package will work for most users. However, you only get one report per day. If, like me, you are not the most organized person and have to fix your code before a release, then you don’t have as many days to check your R package(s) and need more frequent feedback. So I’ve been looking for a way to run checks on all three platforms on demand. Bioconductor has a Single Package Builder which does this, but it is restricted to new package submissions.

I know that there’s AppVeyor for running checks on Windows, but I never used it. Travis CI does support macOS and Linux. In the past, I have used rhub and I was able to run tests on a package using a combination of Travis CI and rhub as detailed at r-hub/rhub/issues#52. rhub maintainers have also taken steps to support Bioconductor’s release cycle as described at r-hub/rhub/issues#38. Regardless of the platform, it would ultimately be nice to have a single configuration file that you (the package developer) don’t need to update for every Bioconductor release cycle.

Developing a Bioconductor-friendly GHA workflow

I saw on Twitter the announcement about GitHub Actions in usethis and that is when I started to look more into usethis and actions by Jim Hester, particularly r-lib/actions/examples. As usual, I tried to just get it to work and then had to look more closely at the documentation and the code. Naively, I thought that I could make r-lib/actions/examples/check-standard.yaml Bioconductor-friendly, which Jim Hester immediately recognized as a complicated task. As you can see at Bioconductor-friendly R CMD check action feature suggestion r-lib/actions#84 this took a while. When working on this, I also looked at several other resources and real world examples:

Most of the development of the Bioconductor-friendly GitHub Actions workflow provided by biocthis was done with leekgroup/derfinderPlot/.github/workflows/check-bioc.yml and LieberInstitute/recount3/.github/workflows/check-bioc.yml as detailed at: Bioconductor-friendly R CMD check action feature suggestion r-lib/actions#84. It was then further improved by a pull request with tests carried out at lcolladotor/testmatrix.

This work eventually led to use_bioc_github_action() as it is today. The features of this GHA workflow are described in the Introduction to biocthis vignette. Going back to the story: while developing this GHA workflow, I ran into several issues, and I wouldn’t be surprised if we run into more of them later on.

  • R has more tags than just release and devel, so just before R 4.0.0 was released it was called alpha, while release pointed to 3.6.3 and devel to 4.1.0. At r-lib/actions/pull/68 the decision was made that this was a transient issue. In the meantime I wanted to get the GHA to work, which led me to many issues about installing package dependencies on R 4.1.0 to test Bioconductor 3.11, which is NOT the thing you should do! Bioconductor 3.11 is meant to run on R 4.0.x, not 4.1.x. Hervé Pagès helped me with some of these tricks, particularly with Windows. Also on Windows, I learned more about r-lib/actions from the update to support Rtools 4.0 on Windows by Jeroen Ooms, which Constantin AE and I were discussing on the RStudio Community website. On macOS I ran into compiling XML from source and its system dependencies. I also ran into xml2/issues/296, which is now officially resolved thanks to xml2/issues/302, though looking at r-lib/usethis/.github/workflows/R-CMD-check.yaml and r-lib/usethis/commits/.github/workflows/R-CMD-check.yaml was very helpful.
  • I had some issues installing BiocCheck and running R CMD BiocCheck on both Windows and the Bioconductor docker image (different issues) that I can avoid using code like this: Rscript -e "BiocCheck::BiocCheck()".
  • Apparently, we found a bug in the internal R code. It’s my second issue ever that traces back to the R internals! This is something we discussed quite a bit at the Bioc-devel mailing list with Martin Morgan, Charlotte Soneson and others. Check this message and the April thread history. I thought that it was linked to r-lib/remotes/issues/296 at some point.
  • I learned about caching on GitHub Actions at r-lib/actions/issues/86. In general, I’ve been trying to update the GHA workflow here to reflect changes at actions.
  • I ran into some issues configuring git to then run pkgdown, which involved the GITHUB_TOKEN environment variable on the Bioconductor docker step and using git config --local a couple of times. You will also need to run pkgdown::deploy_to_branch() once locally to set up the gh-pages branch properly for pkgdown to work from GitHub Actions.
  • I ran into a small issue r-lib/covr/issues/427 with covr in case your package doesn’t have tests. This and the caching discussion, along with r-lib/actions/examples/pr-commands.yaml motivated me to use environment variables and conditionals to have a single workflow with some options at the top that you can set to run: covr, extract info from testthat, run pkgdown, or ignore the cache by including the keyword /nocache on your commit message.
  • I had to learn how to compile git from source by modifying these instructions in order to have a git version equal to or newer than 2.18 on Ubuntu 18.04, so that I could then use actions/checkout@v2 and avoid issues with running pkgdown. I later learned how to use a PPA for Ubuntu for installing the latest git version. If you use actions/checkout@v1 you can end up at r-lib/actions/issues/50. If you use the default git 2.17.1 on Ubuntu 18.04 then you run into actions/checkout/issues/238 and other related issues. I could have avoided this by running pkgdown on macOS instead of the Bioconductor docker image that is based on rockerdev/rstudio:R.4.0.0_ubuntu18.04. Nowadays, the Bioconductor docker devel images are based on Ubuntu 20.04, known as focal. The RStudio Package Manager (RSPM) greatly improves the speed at which R packages are installed on Linux and thus on the Bioconductor docker images. Since August 2022, ubuntu-latest changed to Ubuntu 22.04, also known as jammy.
  • Nitesh Turaga and Carl Boettiger rocker-org/rocker-versioned/issues/208 updated the rocker docker image for RStudio to R 4.0 and the corresponding Bioconductor docker image. To do so, they resolved several issues themselves. From my side, I only had to wait =)
  • I also had to learn how to mount directories with docker to enable caching the R package files when using the Bioconductor docker images, and other GitHub Actions syntax for which I relied quite heavily on their manual and Google searches; most of them led me to the GitHub Actions community website.
  • In late September 2020, Marcel Ramos sent a pull request that greatly changed the GHA workflow to (1) use a template and (2) avoid the code redundancy across macOS, Windows and Linux (running a Bioconductor docker), which makes the resulting GHA workflow easier to understand and customize.
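
The R and Bioconductor version mismatch described in the first item above can be caught early with a small check. The following is only a sketch: the lookup table is hard-coded for three Bioconductor releases, and r_matches_bioc() is a hypothetical helper that is not part of biocthis or of the GHA workflow.

```r
## Illustrative lookup table: Bioconductor release -> R major.minor version.
## Hard-coded for this sketch; extend it for the releases you care about.
bioc_r_pairs <- c("3.10" = "3.6", "3.11" = "4.0", "3.12" = "4.0")

## Hypothetical helper: does this R version match the Bioconductor release?
r_matches_bioc <- function(bioc_version, r_version = getRversion()) {
    stopifnot(bioc_version %in% names(bioc_r_pairs))
    ## Keep only the major.minor part, for example "4.0.2" becomes "4.0"
    parts <- strsplit(as.character(r_version), ".", fixed = TRUE)[[1]]
    r_major_minor <- paste(parts[1:2], collapse = ".")
    r_major_minor == bioc_r_pairs[[bioc_version]]
}

r_matches_bioc("3.11", "4.0.2") # TRUE: Bioconductor 3.11 runs on R 4.0.x
r_matches_bioc("3.11", "4.1.0") # FALSE: the mismatch described above
```

Running a check like this at the start of a job makes the failure explicit, instead of surfacing later as confusing dependency installation errors.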

Potential future additions

  • It might be useful to use exactly the same .Renviron as the one used in the Bioconductor machines by downloading files like 3.11/Renviron.bioc and locating them correctly.
  • On a similar route, it might be useful to utilize rcli, potentially avoiding the need for dealing directly with the .Renviron files.

Wrapping up

The resulting Bioconductor-friendly GitHub Actions workflow that you can add to your package with biocthis::use_bioc_github_action() has many comments which you might find helpful for understanding why some steps are done the way they are. I have tried to simplify the workflow when possible, but it depends on the latest version of many tools and thus will expose you to issues you might not have dealt with before, particularly compilation issues of R packages with R-devel (six months of the year with the current Bioconductor release cycle). If you need help, start by going through the steps listed at r-lib/actions#where-to-find-help. biocthis exclusive issues are always welcome, though please include the information that will enable others to help you faster. Thank you!

usethis-like functions

biocthis also provides other usethis-like functions. To make these functions, I looked at the code inside usethis and learned how to make templates, how the data is passed to the templates and some other steps. Some of the functions are really identical to the ones from usethis but point to a custom template provided by biocthis. These functions have simplified for me the task of having uniform README.Rmd/md and vignette files, for instance, as well as having GitHub issue & support templates that include some Bioconductor-specific information and some of my own personal preferences for asking for help. I also included template R scripts through use_bioc_pkg_templates(), an idea I first learned about at rstudio::conf(2020) from the golem package. Those scripts are useful to keep track of code that you had to run to make the R package or to update it later. These scripts can greatly jump-start your R/Bioconductor package creation process. So maybe you’ll see more packages by me and others soon =) In particular, I really hope that we can get more CDSB members to submit R/Bioconductor packages to the world as explained in this story, which is something I care about quite a bit.
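
To give a rough idea of how these functions fit together, here is a sketch that bundles several of them in one place. The wrapper setup_bioc_pkg_files() and the vignette name "mypkg" are hypothetical and only for illustration; the biocthis functions it calls are the exported ones, and each of them writes files into the active package project.

```r
## Hypothetical convenience wrapper (not part of biocthis). It only bundles
## calls to functions exported by biocthis. Call it from the root of an
## existing package, since every step writes files into that project.
setup_bioc_pkg_files <- function(vignette_name) {
    ## Template dev/ scripts that keep track of the package setup steps
    biocthis::use_bioc_pkg_templates()
    ## README.Rmd with Bioconductor-specific installation instructions
    biocthis::use_bioc_readme_rmd()
    ## Vignette template
    biocthis::use_bioc_vignette(vignette_name)
    ## GitHub issue and support templates
    biocthis::use_bioc_issue_template()
    biocthis::use_bioc_support()
    ## Bioconductor-friendly GitHub Actions workflow
    biocthis::use_bioc_github_action()
    invisible(TRUE)
}

## Not run here because it modifies the current project:
## setup_bioc_pkg_files("mypkg")
```

In practice you would likely run these one at a time and review the generated files after each step, which is what the template scripts from use_bioc_pkg_templates() are meant to help with.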

Acknowledgments

I just want to thank everyone for helping me understand different pieces of code, for producing the tools I used, for interacting with me across many GitHub issues, as well as answering questions on multiple mailing lists. The names below are in the order they appear in this vignette:

as well as several organizations and members:

Thank you very much! 🙌🏽😊

Reproducibility

The biocthis package (Collado-Torres, 2024) was made possible thanks to:

  • R (R Core Team, 2024)
  • BiocStyle (Oleś, 2024)
  • covr (Hester, 2023)
  • devtools (Wickham, Hester, Chang et al., 2022)
  • fs (Hester, Wickham, and Csárdi, 2024)
  • glue (Hester and Bryan, 2024)
  • knitr (Xie, 2024)
  • pkgdown (Wickham, Hesselberth, Salmon et al., 2024)
  • rlang (Henry and Wickham, 2024)
  • RefManageR (McLean, 2017)
  • rmarkdown (Allaire, Xie, Dervieux et al., 2024)
  • sessioninfo (Wickham, Chang, Flight et al., 2021)
  • styler (Müller and Walthert, 2024)
  • testthat (Wickham, 2011)
  • usethis (Wickham, Bryan, Barrett et al., 2024)

This package was developed using biocthis.

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("biocthis_dev_notes.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("biocthis_dev_notes.Rmd", tangle = TRUE)

Date the vignette was generated.

#> [1] "2024-12-11 03:10:19 UTC"

Wallclock time spent generating the vignette.

#> Time difference of 0.227 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       Ubuntu 24.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2024-12-11
#>  pandoc   3.2.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  askpass       1.2.1    2024-10-04 [2] RSPM (R 4.4.0)
#>  backports     1.5.0    2024-05-23 [2] RSPM (R 4.4.0)
#>  bibtex        0.5.1    2023-01-26 [2] RSPM (R 4.4.0)
#>  BiocManager   1.30.25  2024-08-28 [2] RSPM (R 4.4.0)
#>  BiocStyle   * 2.35.0   2024-11-19 [2] https://bioc.r-universe.dev (R 4.4.2)
#>  biocthis    * 1.17.0   2024-12-11 [1] https://bioc.r-universe.dev (R 4.4.2)
#>  brio          1.1.5    2024-04-24 [2] RSPM (R 4.4.0)
#>  bslib         0.8.0    2024-07-29 [2] RSPM (R 4.4.0)
#>  buildtools    1.0.0    2024-12-09 [3] local (/pkg)
#>  cachem        1.1.0    2024-05-16 [2] RSPM (R 4.4.0)
#>  cli           3.6.3    2024-06-21 [2] RSPM (R 4.4.0)
#>  credentials   2.0.2    2024-10-04 [2] RSPM (R 4.4.0)
#>  desc          1.4.3    2023-12-10 [2] RSPM (R 4.4.0)
#>  digest        0.6.37   2024-08-19 [2] RSPM (R 4.4.0)
#>  evaluate      1.0.1    2024-10-10 [2] RSPM (R 4.4.0)
#>  fansi         1.0.6    2023-12-08 [2] RSPM (R 4.4.0)
#>  fastmap       1.2.0    2024-05-15 [2] RSPM (R 4.4.0)
#>  fs            1.6.5    2024-10-30 [2] RSPM (R 4.4.0)
#>  generics      0.1.3    2022-07-05 [2] RSPM (R 4.4.0)
#>  gert          2.1.4    2024-10-14 [2] RSPM (R 4.4.0)
#>  glue          1.8.0    2024-09-30 [2] RSPM (R 4.4.0)
#>  htmltools     0.5.8.1  2024-04-04 [2] RSPM (R 4.4.0)
#>  httr          1.4.7    2023-08-15 [2] RSPM (R 4.4.0)
#>  jquerylib     0.1.4    2021-04-26 [2] RSPM (R 4.4.0)
#>  jsonlite      1.8.9    2024-09-20 [2] RSPM (R 4.4.0)
#>  knitr         1.49     2024-11-08 [2] RSPM (R 4.4.0)
#>  lifecycle     1.0.4    2023-11-07 [2] RSPM (R 4.4.0)
#>  lubridate     1.9.4    2024-12-08 [2] RSPM (R 4.4.0)
#>  magrittr      2.0.3    2022-03-30 [2] RSPM (R 4.4.0)
#>  maketools     1.3.1    2024-10-04 [3] RSPM (R 4.4.0)
#>  openssl       2.2.2    2024-09-20 [2] RSPM (R 4.4.0)
#>  pillar        1.9.0    2023-03-22 [2] RSPM (R 4.4.0)
#>  pkgconfig     2.0.3    2019-09-22 [2] RSPM (R 4.4.0)
#>  plyr          1.8.9    2023-10-02 [2] RSPM (R 4.4.0)
#>  purrr         1.0.2    2023-08-10 [2] RSPM (R 4.4.0)
#>  R.cache       0.16.0   2022-07-21 [2] RSPM (R 4.4.0)
#>  R.methodsS3   1.8.2    2022-06-13 [2] RSPM (R 4.4.0)
#>  R.oo          1.27.0   2024-11-01 [2] RSPM (R 4.4.0)
#>  R.utils       2.12.3   2023-11-18 [2] RSPM (R 4.4.0)
#>  R6            2.5.1    2021-08-19 [2] RSPM (R 4.4.0)
#>  Rcpp          1.0.13-1 2024-11-02 [2] RSPM (R 4.4.0)
#>  RefManageR  * 1.4.0    2022-09-30 [2] RSPM (R 4.4.0)
#>  rlang         1.1.4    2024-06-04 [2] RSPM (R 4.4.0)
#>  rmarkdown     2.29     2024-11-04 [2] RSPM (R 4.4.0)
#>  roxygen2      7.3.2    2024-06-28 [2] RSPM (R 4.4.0)
#>  rprojroot     2.0.4    2023-11-05 [2] RSPM (R 4.4.0)
#>  rstudioapi    0.17.1   2024-10-22 [2] RSPM (R 4.4.0)
#>  sass          0.4.9    2024-03-15 [2] RSPM (R 4.4.0)
#>  sessioninfo * 1.2.2    2021-12-06 [2] RSPM (R 4.4.0)
#>  stringi       1.8.4    2024-05-06 [2] RSPM (R 4.4.0)
#>  stringr       1.5.1    2023-11-14 [2] RSPM (R 4.4.0)
#>  styler      * 1.10.3   2024-04-07 [2] RSPM (R 4.4.0)
#>  sys           3.4.3    2024-10-04 [2] RSPM (R 4.4.0)
#>  testthat      3.2.2    2024-12-10 [2] CRAN (R 4.4.2)
#>  tibble        3.2.1    2023-03-20 [2] RSPM (R 4.4.0)
#>  timechange    0.3.0    2024-01-18 [2] RSPM (R 4.4.0)
#>  usethis     * 3.1.0    2024-11-26 [2] RSPM (R 4.4.0)
#>  utf8          1.2.4    2023-10-22 [2] RSPM (R 4.4.0)
#>  vctrs         0.6.5    2023-12-01 [2] RSPM (R 4.4.0)
#>  whisker       0.4.1    2022-12-05 [2] RSPM (R 4.4.0)
#>  withr         3.0.2    2024-10-28 [2] RSPM (R 4.4.0)
#>  xfun          0.49     2024-10-31 [2] RSPM (R 4.4.0)
#>  xml2          1.3.6    2023-12-04 [2] RSPM (R 4.4.0)
#>  yaml          2.3.10   2024-07-26 [2] RSPM (R 4.4.0)
#> 
#>  [1] /tmp/RtmpbYbBm6/Rinst13fb7de10620
#>  [2] /github/workspace/pkglib
#>  [3] /usr/local/lib/R/site-library
#>  [4] /usr/lib/R/site-library
#>  [5] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux et al., 2024) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.29. 2024. URL: https://github.com/rstudio/rmarkdown.

[2] L. Collado-Torres. Automate package and project setup for Bioconductor packages. https://github.com/lcolladotor/biocthis - R package version 1.17.0. 2024. DOI: 10.18129/B9.bioc.biocthis. URL: http://www.bioconductor.org/packages/biocthis.

[3] L. Henry and H. Wickham. rlang: Functions for Base Types and Core R and ‘Tidyverse’ Features. R package version 1.1.4, https://github.com/r-lib/rlang. 2024. URL: https://rlang.r-lib.org.

[4] J. Hester. covr: Test Coverage for Packages. R package version 3.6.4, https://github.com/r-lib/covr. 2023. URL: https://covr.r-lib.org.

[5] J. Hester and J. Bryan. glue: Interpreted String Literals. R package version 1.8.0, https://github.com/tidyverse/glue. 2024. URL: https://glue.tidyverse.org/.

[6] J. Hester, H. Wickham, and G. Csárdi. fs: Cross-Platform File System Operations Based on ‘libuv’. R package version 1.6.5, https://github.com/r-lib/fs. 2024. URL: https://fs.r-lib.org.

[7] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[8] K. Müller and L. Walthert. styler: Non-Invasive Pretty Printing of R Code. R package version 1.10.3, https://styler.r-lib.org. 2024. URL: https://github.com/r-lib/styler.

[9] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.35.0. 2024. URL: https://github.com/Bioconductor/BiocStyle.

[10] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. URL: https://www.R-project.org/.

[11] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[12] H. Wickham, J. Bryan, M. Barrett, et al. usethis: Automate Package and Project Setup. R package version 3.1.0, https://github.com/r-lib/usethis. 2024. URL: https://usethis.r-lib.org.

[13] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.2, https://r-lib.github.io/sessioninfo/. 2021. URL: https://github.com/r-lib/sessioninfo#readme.

[14] H. Wickham, J. Hesselberth, M. Salmon, et al. pkgdown: Make Static HTML Documentation for a Package. R package version 2.1.1, https://github.com/r-lib/pkgdown. 2024. URL: https://pkgdown.r-lib.org/.

[15] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.5, https://github.com/r-lib/devtools. 2022. URL: https://devtools.r-lib.org/.

[16] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.49. 2024. URL: https://yihui.org/knitr/.