HubPub: Help with publication of Hub packages

Introduction

HubPub helps users work with the Bioconductor Hub structures. The package can create the skeleton of a Hub-style package, which the user then populates with the necessary information. It also provides functions to add resources to the Hub package metadata files and to publish data to the Bioconductor S3 bucket.

Installation

Install the most recent version from Bioconductor:

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HubPub")

Then load HubPub:

library(HubPub)

HubPub

Creating a Hub styled package

The create_pkg() function creates the skeleton of a package that follows the guidelines for a Bioconductor Hub type package. For more information about the requirements and contents of a Hub-style package, see the “Creating A Hub Package” vignette from this package.

create_pkg() requires the path where the package is to be created and the type of package to create (“AnnotationHub” or “ExperimentHub”). The use_git argument indicates whether the package should be set up with git (default is TRUE).

NOTE: This function is intended for developers who have not yet created the package. If the package already exists, this function will not help. Other functions in this package deal with resources and may still be useful; they are described later in the vignette.

fl <- tempdir()
create_pkg(file.path(fl, "examplePkg"), "ExperimentHub")
#> ✔ Creating '/tmp/RtmpR58m6C/examplePkg/'.
#> ✔ Setting active project to "/tmp/RtmpR58m6C/examplePkg".
#> ✔ Creating 'R/'.
#> ✔ Writing 'DESCRIPTION'.
#> Package: examplePkg
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.99.0
#> Date: 2024-10-30
#> Authors@R (parsed):
#>     * First Last <[email protected]> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: Artistic-2.0
#> BugReports: https://support.bioconductor.org/t/examplePkg
#> Imports:
#>     ExperimentHub
#> Suggests:
#>     ExperimentHubData
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.0.0
#> biocViews: ExperimentHub
#> ✔ Writing 'NAMESPACE'.
#> ✔ Setting active project to "<no active project>".
#> ✔ Setting active project to "/tmp/RtmpR58m6C/examplePkg".
#> ✔ Initialising Git repo.
#> ✔ Adding ".Rproj.user", ".Rhistory", ".Rdata", ".httr-oauth", ".DS_Store", and
#>   ".quarto" to '.gitignore'.
#> ✔ Writing 'R/examplePkg-package.R'.
#> ✔ Writing 'NEWS.md'.
#> ✔ Creating 'man/'.
#> ✔ Creating 'inst/scripts/'.
#> ✔ Writing 'inst/scripts/make-data.R'.
#> ✔ Writing 'inst/scripts/make-metadata.R'.
#> ✔ Writing 'R/zzz.R'.
#> ✔ Creating 'inst/extdata/'.
#> ✔ Adding testthat to 'Suggests' field in DESCRIPTION.
#> ✔ Adding "3" to 'Config/testthat/edition'.
#> ✔ Creating 'tests/testthat/'.
#> ✔ Writing 'tests/testthat.R'.
#> ☐ Call `usethis::use_test()` to initialize a basic test file and open it for
#>   editing.
#> ✔ Writing 'tests/testthat/test_metadata.R'.
#> [1] "/tmp/RtmpR58m6C/examplePkg"
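The call above relies on the default use_git = TRUE. As a hedged variant, the sketch below creates an “AnnotationHub” skeleton without initialising git. The directory and package names are hypothetical, and the call is guarded so that nothing happens when HubPub is not installed.

```r
# A fresh, uniquely named parent directory keeps this skeleton separate
# from anything else in the session's temporary directory.
parent <- tempfile("hubpub-demo-")
dir.create(parent)
pkg_path <- file.path(parent, "exampleAnnoPkg")  # hypothetical package name

# Guarded so the sketch is a no-op when HubPub is not installed.
if (requireNamespace("HubPub", quietly = TRUE)) {
    # Same function as above, but for an AnnotationHub package and
    # without setting up a git repository.
    HubPub::create_pkg(pkg_path, "AnnotationHub", use_git = FALSE)
}
```

Skipping git at creation time can be convenient when the skeleton will be copied into an existing repository.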

Once the package is created, the developer can go through it and make any changes needed. For example, the DESCRIPTION file contains only the basic requirements, so the developer should go back and fill in the ‘Title:’ and ‘Description:’ fields.
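The DESCRIPTION fields can be edited by hand, but they can also be filled in programmatically. The following is a minimal sketch using base R's read.dcf() and write.dcf(). It works on a stand-in DESCRIPTION file written to a temporary directory so that it runs on its own, and the new title and description are placeholder values, not recommendations.

```r
# Stand-in DESCRIPTION so the sketch runs on its own; in practice, point
# desc_path at the DESCRIPTION file of the package made by create_pkg().
pkg_dir <- tempfile("descDemo-")
dir.create(pkg_dir)
desc_path <- file.path(pkg_dir, "DESCRIPTION")
writeLines(c(
    "Package: examplePkg",
    "Title: What the Package Does (One Line, Title Case)",
    "Version: 0.99.0",
    "Description: What the package does (one paragraph).",
    "License: Artistic-2.0"
), desc_path)

# read.dcf() returns a one-row character matrix, one column per field.
desc <- read.dcf(desc_path)

# Hypothetical values; replace with text describing the real package.
desc[1, "Title"] <- "Example ExperimentHub Data Package"
desc[1, "Description"] <- "Provides example resources through ExperimentHub."

# Write the fields back and re-read them to confirm the change.
write.dcf(desc, file = desc_path)
updated <- read.dcf(desc_path)
updated[1, c("Title", "Description")]
```

For a real DESCRIPTION that contains multi-line fields such as Authors@R, read.dcf() and write.dcf() may reflow whitespace; both accept a keep.white argument that can help preserve the original layout of the named fields.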

Adding a resource to the metadata file

Another useful function in HubPub is add_resource(). It is useful both for developers who are creating a new Hub-related package and for developers who want to add a new resource to an existing Hub package. Its purpose is to add a hub resource to the package's metadata.csv file. The function requires the name of the package (or the path to the newly created package) and a named list with the data to be added for the resource. For the elements and content of this list, see ?hub_metadata. There is also information in the “Creating A Hub Package” vignette from this package.

metadata <- hub_metadata(
    Title = "ENCODE",
    Description = "a test entry",
    BiocVersion = "4.1",
    Genome = NA_character_,
    SourceType = "JSON",
    SourceUrl = "http://www.encodeproject.org",
    SourceVersion = "x.y.z",
    Species = NA_character_,
    TaxonomyId = as.integer(9606),
    Coordinate_1_based = NA,
    DataProvider = "ENCODE Project",
    Maintainer = "tst person <[email protected]>",
    RDataClass = "Rda",
    DispatchClass = "Rda",
    Location_Prefix = "s3://experimenthub/",
    RDataPath = "ENCODExplorerData/encode_df_lite.rda",
    Tags = "ENCODE:Homo sapiens"
)

add_resource(file.path(fl, "examplePkg"), metadata)
#> Warning: replacing previous import 'utils::findMatches' by
#> 'S4Vectors::findMatches' when loading 'ExperimentHubData'
#> [1] Creating log directory /github/home/.AnnotationHubData
#> [1] "/tmp/RtmpR58m6C/examplePkg/inst/extdata/metadata.csv"

To see what the metadata file looks like, read in the CSV file as follows.

resource <- file.path(fl, "examplePkg", "inst", "extdata", "metadata.csv")
tst <- read.csv(resource)
tst
#>    Title  Description BiocVersion Genome SourceType
#> 1 ENCODE a test entry         4.1     NA       JSON
#>                      SourceUrl SourceVersion Species TaxonomyId
#> 1 http://www.encodeproject.org         x.y.z      NA       9606
#>   Coordinate_1_based   DataProvider                 Maintainer RDataClass
#> 1                 NA ENCODE Project tst person <[email protected]>        Rda
#>   DispatchClass     Location_Prefix                            RDataPath
#> 1           Rda s3://experimenthub/ ENCODExplorerData/encode_df_lite.rda
#>                  Tags
#> 1 ENCODE:Homo sapiens
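After reading the file back, a quick sanity check is to confirm that every field passed to hub_metadata() appears as a column. The sketch below does this with base R only. The helper name missing_fields() is hypothetical, and the stand-in CSV is written here only so the sketch runs on its own. It checks column names and nothing else, so it is not a substitute for the validation done by the Hub infrastructure.

```r
# Field names taken from the hub_metadata() call shown earlier.
expected <- c(
    "Title", "Description", "BiocVersion", "Genome", "SourceType",
    "SourceUrl", "SourceVersion", "Species", "TaxonomyId",
    "Coordinate_1_based", "DataProvider", "Maintainer", "RDataClass",
    "DispatchClass", "Location_Prefix", "RDataPath", "Tags"
)

# Hypothetical helper: which expected columns are absent from a data frame?
missing_fields <- function(df) setdiff(expected, names(df))

# Stand-in metadata file with one mostly empty row; in practice, use the
# package's inst/extdata/metadata.csv instead.
demo <- as.data.frame(
    setNames(as.list(rep(NA_character_, length(expected))), expected)
)
demo$Title <- "ENCODE"
demo$RDataPath <- "ENCODExplorerData/encode_df_lite.rda"
demo_csv <- tempfile(fileext = ".csv")
write.csv(demo, demo_csv, row.names = FALSE)

tst <- read.csv(demo_csv)
missing_fields(tst)                             # nothing missing
missing_fields(tst[, names(tst) != "Tags"])     # reports the dropped column
```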

Publishing the resource to AWS S3

The final function in HubPub, publish_resource(), helps the developer publish data resources to a Bioconductor AWS S3 bucket. It uses functions from the aws.s3 package to place files or directories on S3. The developer should have already contacted the Bioconductor hubs maintainers to get the credentials needed to access the bucket. Once the credentials are received, the developer should declare them as system environment variables before running this function. The function requires a path to the file or directory to be added to the bucket and the name the object should have on the bucket. When adding a directory, make sure it contains only files and no nested directories.
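Before publishing, it can help to check which credentials are still missing from the environment. The sketch below is one way to do that in base R. The variable names are an assumption: they are the standard AWS names, and the exact set that publish_resource() expects should be confirmed against ?publish_resource and the instructions from the hub maintainers. The helper missing_env_vars() is hypothetical.

```r
# Assumed variable names (standard AWS ones); confirm the exact set
# against ?publish_resource and the hub maintainers' instructions.
required_vars <- c(
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION"
)

# Hypothetical helper: names of required variables that are unset or empty.
missing_env_vars <- function(vars = required_vars) {
    vars[!nzchar(Sys.getenv(vars))]
}

not_set <- missing_env_vars()
if (length(not_set) > 0) {
    message("Set these before publishing: ", paste(not_set, collapse = ", "))
}
```

The variables can be set for the current session with Sys.setenv() or stored in a user-level .Renviron file, which avoids hard-coding credentials in scripts.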

The code chunk below demonstrates the function using a dummy dataset. It will only work if the necessary system environment variables have been set with the hub credentials.

## For publishing directories with multiple files
fl <- tempdir()
utils::write.csv(mtcars, file = file.path(fl, "mtcars1.csv"))
utils::write.csv(mtcars, file = file.path(fl, "mtcars2.csv"))
publish_resource(fl, "test_dir")
#> Warning in publish_resource(fl, "test_dir"): Not all system environment
#> variables are set, do so and rerun function.
#> copy '/tmp/RtmpR58m6C/examplePkg' to 's3://annotation-contributor/test_dir/examplePkg'
#> copy '/tmp/RtmpR58m6C/mtcars1.csv' to 's3://annotation-contributor/test_dir/mtcars1.csv'
#> copy '/tmp/RtmpR58m6C/mtcars2.csv' to 's3://annotation-contributor/test_dir/mtcars2.csv'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbioc%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbioc%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbooks%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbooks%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fworkflows%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fworkflows%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds'
#> $`/tmp/RtmpR58m6C/examplePkg`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/mtcars1.csv`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/mtcars2.csv`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbioc%2Fsrc%2Fcontrib.rds`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbooks%2Fsrc%2Fcontrib.rds`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fworkflows%2Fsrc%2Fcontrib.rds`
#> NULL
#> 
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds`
#> NULL

## For publishing a single file
utils::write.csv(mtcars, file = file.path(fl, "mtcars3.csv"))
publish_resource(file.path(fl, "mtcars3.csv"), "test_dir")
#> Warning in publish_resource(file.path(fl, "mtcars3.csv"), "test_dir"): Not all
#> system environment variables are set, do so and rerun function.
#> copy '/tmp/RtmpR58m6C/mtcars3.csv' to 's3://annotation-contributor/test_dir/mtcars3.csv'
#> $`/tmp/RtmpR58m6C/mtcars3.csv`
#> NULL
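In the directory example above, fl is tempdir() itself, so the listing also picks up examplePkg and the cached repos_*.rds files that happen to live there. A minimal sketch of one way to avoid this, using only base R, is to stage the files in a dedicated subdirectory and confirm that it contains no nested directories before publishing. The helper name has_nested_dirs() and the staging directory are hypothetical, and the publish_resource() call is left commented out because it needs credentials.

```r
# Stage only the files meant for upload in their own directory.
stage <- tempfile("hub-upload-")
dir.create(stage)
utils::write.csv(mtcars, file = file.path(stage, "mtcars1.csv"))
utils::write.csv(mtcars, file = file.path(stage, "mtcars2.csv"))

# Hypothetical helper: TRUE if the directory has any subdirectories.
has_nested_dirs <- function(path) {
    # With recursive = FALSE, list.dirs() returns only immediate
    # subdirectories and does not include `path` itself.
    length(list.dirs(path, recursive = FALSE)) > 0
}

if (has_nested_dirs(stage)) {
    stop("Remove or flatten subdirectories before publishing: ", stage)
}

# Inspect exactly what would be uploaded.
list.files(stage)

# publish_resource(stage, "test_dir")
```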

Session Information

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] futile.logger_1.4.3 HubPub_1.15.0       BiocStyle_2.35.0   
#> 
#> loaded via a namespace (and not attached):
#>   [1] sys_3.4.3                   rstudioapi_0.17.1          
#>   [3] jsonlite_1.8.9              magrittr_2.0.3             
#>   [5] GenomicFeatures_1.57.1      rmarkdown_2.28             
#>   [7] fs_1.6.4                    BiocIO_1.17.0              
#>   [9] zlibbioc_1.51.2             vctrs_0.6.5                
#>  [11] memoise_2.0.1               Rsamtools_2.21.2           
#>  [13] RCurl_1.98-1.16             askpass_1.2.1              
#>  [15] base64enc_0.1-3             BiocBaseUtils_1.9.0        
#>  [17] htmltools_0.5.8.1           S4Arrays_1.5.11            
#>  [19] usethis_3.0.0               progress_1.2.3             
#>  [21] lambda.r_1.2.4              AnnotationHub_3.15.0       
#>  [23] curl_5.2.3                  SparseArray_1.5.45         
#>  [25] sass_0.4.9                  bslib_0.8.0                
#>  [27] desc_1.4.3                  testthat_3.2.1.1           
#>  [29] httr2_1.0.5                 futile.options_1.0.1       
#>  [31] cachem_1.1.0                available_1.1.0            
#>  [33] buildtools_1.0.0            GenomicAlignments_1.41.0   
#>  [35] whisker_0.4.1               lifecycle_1.0.4            
#>  [37] pkgconfig_2.0.3             Matrix_1.7-1               
#>  [39] R6_2.5.1                    fastmap_1.2.0              
#>  [41] BiocCheck_1.43.0            GenomeInfoDbData_1.2.13    
#>  [43] MatrixGenerics_1.17.1       digest_0.6.37              
#>  [45] AnnotationDbi_1.69.0        S4Vectors_0.43.2           
#>  [47] OrganismDbi_1.47.0          rprojroot_2.0.4            
#>  [49] ExperimentHub_2.13.1        aws.signature_0.6.0        
#>  [51] GenomicRanges_1.57.2        RSQLite_2.3.7              
#>  [53] filelock_1.0.3              fansi_1.0.6                
#>  [55] httr_1.4.7                  abind_1.4-8                
#>  [57] compiler_4.4.1              bit64_4.5.2                
#>  [59] withr_3.0.2                 biocViews_1.75.0           
#>  [61] BiocParallel_1.41.0         DBI_1.2.3                  
#>  [63] R.utils_2.12.3              biomaRt_2.63.0             
#>  [65] openssl_2.2.2               rappdirs_0.3.3             
#>  [67] DelayedArray_0.31.14        rjson_0.2.23               
#>  [69] tools_4.4.1                 R.oo_1.26.0                
#>  [71] glue_1.8.0                  restfulr_0.0.15            
#>  [73] R.cache_0.16.0              grid_4.4.1                 
#>  [75] stringdist_0.9.12           generics_0.1.3             
#>  [77] R.methodsS3_1.8.2           hms_1.1.3                  
#>  [79] xml2_1.3.6                  utf8_1.2.4                 
#>  [81] XVector_0.45.0              BiocGenerics_0.53.0        
#>  [83] BiocVersion_3.21.1          pillar_1.9.0               
#>  [85] stringr_1.5.1               dplyr_1.1.4                
#>  [87] BiocFileCache_2.15.0        lattice_0.22-6             
#>  [89] AnnotationHubData_1.37.0    rtracklayer_1.65.0         
#>  [91] bit_4.5.0                   tidyselect_1.2.1           
#>  [93] RBGL_1.81.0                 maketools_1.3.1            
#>  [95] Biostrings_2.75.0           knitr_1.48                 
#>  [97] biocthis_1.17.0             IRanges_2.39.2             
#>  [99] SummarizedExperiment_1.35.5 stats4_4.4.1               
#> [101] xfun_0.48                   Biobase_2.67.0             
#> [103] credentials_2.0.2           brio_1.1.5                 
#> [105] matrixStats_1.4.1           stringi_1.8.4              
#> [107] UCSC.utils_1.1.0            yaml_2.3.10                
#> [109] evaluate_1.0.1              codetools_0.2-20           
#> [111] tibble_3.2.1                BiocManager_1.30.25        
#> [113] graph_1.83.0                cli_3.6.3                  
#> [115] jquerylib_0.1.4             styler_1.10.3              
#> [117] GenomeInfoDb_1.41.2         gert_2.1.4                 
#> [119] dbplyr_2.5.0                png_0.1-8                  
#> [121] XML_3.99-0.17               RUnit_0.4.33               
#> [123] parallel_4.4.1              blob_1.2.4                 
#> [125] prettyunits_1.2.0           aws.s3_0.3.21              
#> [127] AnnotationForge_1.49.0      bitops_1.0-9               
#> [129] txdbmaker_1.1.2             ExperimentHubData_1.31.0   
#> [131] purrr_1.0.2                 crayon_1.5.3               
#> [133] rlang_1.1.4                 formatR_1.14               
#> [135] KEGGREST_1.45.1