HubPub provides functions that help developers work with the Bioconductor Hub structures. The package can create the skeleton of a Hub-style package, which the developer then populates with the necessary information. It also provides functions to add resources to the Hub package's metadata files and to publish data to the Bioconductor S3 bucket.
Install the most recent version from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HubPub")
Then load HubPub:
library(HubPub)
The create_pkg() function creates the skeleton of a package that follows the guidelines for a Bioconductor Hub-type package. For more information about the requirements and content of a Hub-style package, see the “Creating A Hub Package” vignette in this package.
create_pkg() requires a path to the location where the package should be created and the type of package to create (“AnnotationHub” or “ExperimentHub”). The use_git argument controls whether the package is set up as a git repository (default is TRUE).
NOTE: This function is intended for developers who have not yet created their package. If the package already exists, this function offers no benefit; however, a couple of other functions in this package that work with resources may still be helpful, as described later in this vignette.
fl <- tempdir()
create_pkg(file.path(fl, "examplePkg"), "ExperimentHub")
#> ✔ Creating '/tmp/RtmpR58m6C/examplePkg/'.
#> ✔ Setting active project to "/tmp/RtmpR58m6C/examplePkg".
#> ✔ Creating 'R/'.
#> ✔ Writing 'DESCRIPTION'.
#> Package: examplePkg
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.99.0
#> Date: 2024-10-30
#> Authors@R (parsed):
#> * First Last <[email protected]> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: Artistic-2.0
#> BugReports: https://support.bioconductor.org/t/examplePkg
#> Imports:
#> ExperimentHub
#> Suggests:
#> ExperimentHubData
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.0.0
#> biocViews: ExperimentHub
#> ✔ Writing 'NAMESPACE'.
#> ✔ Setting active project to "<no active project>".
#> ✔ Setting active project to "/tmp/RtmpR58m6C/examplePkg".
#> ✔ Initialising Git repo.
#> ✔ Adding ".Rproj.user", ".Rhistory", ".Rdata", ".httr-oauth", ".DS_Store", and
#> ".quarto" to '.gitignore'.
#> ✔ Writing 'R/examplePkg-package.R'.
#> ✔ Writing 'NEWS.md'.
#> ✔ Creating 'man/'.
#> ✔ Creating 'inst/scripts/'.
#> ✔ Writing 'inst/scripts/make-data.R'.
#> ✔ Writing 'inst/scripts/make-metadata.R'.
#> ✔ Writing 'R/zzz.R'.
#> ✔ Creating 'inst/extdata/'.
#> ✔ Adding testthat to 'Suggests' field in DESCRIPTION.
#> ✔ Adding "3" to 'Config/testthat/edition'.
#> ✔ Creating 'tests/testthat/'.
#> ✔ Writing 'tests/testthat.R'.
#> ☐ Call `usethis::use_test()` to initialize a basic test file and open it for
#> editing.
#> ✔ Writing 'tests/testthat/test_metadata.R'.
#> [1] "/tmp/RtmpR58m6C/examplePkg"
Once the package is created, the developer can make any necessary changes. For example, the DESCRIPTION file contains only very basic information, so the developer should go back and fill in the ‘Title:’ and ‘Description:’ fields.
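Because DESCRIPTION is a DCF file, one way to fill in those fields from R is base R's read.dcf() and write.dcf(). The sketch below is an illustration, not part of HubPub: set_desc_fields() is a hypothetical helper, the Title and Description strings are placeholders, and the demo writes a throwaway file so it runs on its own.

```r
# Hypothetical helper (not part of HubPub): overwrite existing fields in a
# DESCRIPTION-style (DCF) file and write the result back to the same path.
set_desc_fields <- function(desc_path, fields) {
  d <- read.dcf(desc_path)  # one-row character matrix, one column per field
  missing <- setdiff(names(fields), colnames(d))
  if (length(missing) > 0) {
    stop("fields not found in DESCRIPTION: ", paste(missing, collapse = ", "))
  }
  for (nm in names(fields)) {
    d[1, nm] <- fields[[nm]]
  }
  write.dcf(d, desc_path)
  invisible(desc_path)
}

# Self-contained demo on a throwaway file standing in for examplePkg/DESCRIPTION
desc_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "Package: examplePkg",
  "Title: What the Package Does (One Line, Title Case)",
  "Version: 0.99.0",
  "Description: What the package does (one paragraph)."
), desc_path)

set_desc_fields(desc_path, list(
  Title = "Example ExperimentHub Resources",               # placeholder text
  Description = "Provides example data through ExperimentHub."  # placeholder
))
read.dcf(desc_path, fields = c("Title", "Description"))
```

For the package created above, the path would be file.path(fl, "examplePkg", "DESCRIPTION"). Note that write.dcf() may re-wrap long fields and normalise whitespace, so it is worth reviewing the file afterwards.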
Another useful function in HubPub is add_resource(). It is useful both for developers creating a new Hub package and for developers adding a new resource to an existing one. The function adds a Hub resource to the package's metadata.csv file. It requires the name of the package (or the path to the newly created package) and a named list with the data describing the resource. See ?hub_metadata for the elements and content of this list; the “Creating A Hub Package” vignette in this package also has more information.
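Conceptually, adding a resource amounts to appending one row to inst/extdata/metadata.csv, creating the file if it does not exist yet. The base-R sketch below illustrates only that idea; it is not HubPub's implementation, and append_metadata_row() and the two-column example rows are hypothetical (a real metadata.csv has the columns described in ?hub_metadata).

```r
# Hypothetical helper, not HubPub's implementation: append one resource row to
# a metadata CSV, creating the file (with a header row) if it does not exist yet.
append_metadata_row <- function(csv_path, row) {
  row <- as.data.frame(row, stringsAsFactors = FALSE)
  if (file.exists(csv_path)) {
    existing <- read.csv(csv_path, stringsAsFactors = FALSE)
    # rbind() matches data frame columns by name, so both need the same set
    if (!setequal(names(existing), names(row))) {
      stop("columns of the new row do not match the existing file")
    }
    row <- rbind(existing, row)
  }
  write.csv(row, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Demo with two illustrative columns and values; a real metadata.csv has more
csv_path <- tempfile(fileext = ".csv")
append_metadata_row(csv_path, list(Title = "ENCODE", Description = "a test entry"))
append_metadata_row(csv_path, list(Title = "ENCODE2", Description = "a second entry"))
read.csv(csv_path)
```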
metadata <- hub_metadata(
Title = "ENCODE",
Description = "a test entry",
BiocVersion = "4.1",
Genome = NA_character_,
SourceType = "JSON",
SourceUrl = "http://www.encodeproject.org",
SourceVersion = "x.y.z",
Species = NA_character_,
TaxonomyId = as.integer(9606),
Coordinate_1_based = NA,
DataProvider = "ENCODE Project",
Maintainer = "tst person <[email protected]>",
RDataClass = "Rda",
DispatchClass = "Rda",
Location_Prefix = "s3://experimenthub/",
RDataPath = "ENCODExplorerData/encode_df_lite.rda",
Tags = "ENCODE:Homo sapiens"
)
add_resource(file.path(fl, "examplePkg"), metadata)
#> Warning: replacing previous import 'utils::findMatches' by
#> 'S4Vectors::findMatches' when loading 'ExperimentHubData'
#> [1] Creating log directory /github/home/.AnnotationHubData
#> [1] "/tmp/RtmpR58m6C/examplePkg/inst/extdata/metadata.csv"
To see what the metadata file looks like, read the CSV file back in:
resource <- file.path(fl, "examplePkg", "inst", "extdata", "metadata.csv")
tst <- read.csv(resource)
tst
#> Title Description BiocVersion Genome SourceType
#> 1 ENCODE a test entry 4.1 NA JSON
#> SourceUrl SourceVersion Species TaxonomyId
#> 1 http://www.encodeproject.org x.y.z NA 9606
#> Coordinate_1_based DataProvider Maintainer RDataClass
#> 1 NA ENCODE Project tst person <[email protected]> Rda
#> DispatchClass Location_Prefix RDataPath
#> 1 Rda s3://experimenthub/ ENCODExplorerData/encode_df_lite.rda
#> Tags
#> 1 ENCODE:Homo sapiens
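Before publishing, it can help to confirm that metadata.csv contains every column used in the hub_metadata() call above. The sketch below hard-codes those column names, taken from that example, as an assumption; ?hub_metadata remains the authoritative list. missing_metadata_cols() is a hypothetical helper.

```r
# Column names copied from the hub_metadata() example above (assumed complete)
expected_cols <- c(
  "Title", "Description", "BiocVersion", "Genome", "SourceType",
  "SourceUrl", "SourceVersion", "Species", "TaxonomyId",
  "Coordinate_1_based", "DataProvider", "Maintainer", "RDataClass",
  "DispatchClass", "Location_Prefix", "RDataPath", "Tags"
)

# Hypothetical helper: return the expected columns absent from a metadata table
missing_metadata_cols <- function(meta, expected = expected_cols) {
  setdiff(expected, names(meta))
}

# Self-contained demo: a table with only two columns is missing the rest
missing_metadata_cols(data.frame(Title = "ENCODE", Description = "a test entry"))

# With the `tst` data frame read above: missing_metadata_cols(tst)
```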
The final function in HubPub, publish_resource(), helps the developer publish data resources to a Bioconductor AWS S3 bucket. It uses functions from the aws.s3 package to place files or directories on S3. The developer should already have contacted the Bioconductor Hub maintainers to obtain the credentials needed to access the bucket, and should declare those credentials as system environment variables before running this function. The function requires the path to the file or directory to be added to the bucket and the name the object should have on the bucket. When adding a directory, make sure it contains only files and no nested directories.
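Because nested directories are not supported, a quick pre-flight check with base R's list.dirs() can catch them before anything is uploaded. check_flat_dir() below is a hypothetical helper, not part of HubPub. Such a check is worth running even on tempdir(): its contents can include subdirectories such as the examplePkg package created earlier.

```r
# Hypothetical helper (not part of HubPub): stop if `path` contains any
# subdirectories; otherwise return the files that would be uploaded.
check_flat_dir <- function(path) {
  stopifnot(dir.exists(path))
  # recursive = FALSE lists only immediate subdirectories, not `path` itself
  subdirs <- list.dirs(path, full.names = TRUE, recursive = FALSE)
  if (length(subdirs) > 0) {
    stop("nested directories found: ", paste(basename(subdirs), collapse = ", "))
  }
  list.files(path, full.names = TRUE)
}

# Self-contained demo in a fresh directory, removed again afterwards
demo_dir <- tempfile("flat_")
dir.create(demo_dir)
utils::write.csv(mtcars, file = file.path(demo_dir, "mtcars1.csv"))
files <- check_flat_dir(demo_dir)
basename(files)
unlink(demo_dir, recursive = TRUE)  # clean up the demo directory
```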
The code chunk below demonstrates the function with a dummy dataset. It will only work if the environment variables holding the hub credentials have been declared.
## For publishing directories with multiple files
fl <- tempdir()
utils::write.csv(mtcars, file = file.path(fl, "mtcars1.csv"))
utils::write.csv(mtcars, file = file.path(fl, "mtcars2.csv"))
publish_resource(fl, "test_dir")
#> Warning in publish_resource(fl, "test_dir"): Not all system environment
#> variables are set, do so and rerun function.
#> copy '/tmp/RtmpR58m6C/examplePkg' to 's3://annotation-contributor/test_dir/examplePkg'
#> copy '/tmp/RtmpR58m6C/mtcars1.csv' to 's3://annotation-contributor/test_dir/mtcars1.csv'
#> copy '/tmp/RtmpR58m6C/mtcars2.csv' to 's3://annotation-contributor/test_dir/mtcars2.csv'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbioc%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbioc%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbooks%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbooks%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fworkflows%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fworkflows%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpR58m6C/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds'
#> $`/tmp/RtmpR58m6C/examplePkg`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/mtcars1.csv`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/mtcars2.csv`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbioc%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fbooks%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.20%2Fworkflows%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpR58m6C/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds`
#> NULL
## For publishing a single file
utils::write.csv(mtcars, file = file.path(fl, "mtcars3.csv"))
publish_resource(file.path(fl, "mtcars3.csv"), "test_dir")
#> Warning in publish_resource(file.path(fl, "mtcars3.csv"), "test_dir"): Not all
#> system environment variables are set, do so and rerun function.
#> copy '/tmp/RtmpR58m6C/mtcars3.csv' to 's3://annotation-contributor/test_dir/mtcars3.csv'
#> $`/tmp/RtmpR58m6C/mtcars3.csv`
#> NULL
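Both calls above warn that not all system environment variables are set. A check like the one below can be run before publish_resource(). The variable names are an assumption: they are the standard AWS credential variables read by aws.s3, and the hub maintainers can confirm which ones publish_resource() expects. hub_creds_set() is a hypothetical helper.

```r
# Assumed variable names: the standard AWS credential variables read by aws.s3
hub_env_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

# Hypothetical helper: TRUE only if every variable is set to a non-empty value;
# otherwise reports the unset ones in a message and returns FALSE.
hub_creds_set <- function(vars = hub_env_vars) {
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    message("Unset environment variables: ", paste(missing, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

hub_creds_set()

# Credentials can be declared for the current session with Sys.setenv(), e.g.
# Sys.setenv(AWS_ACCESS_KEY_ID = "<key id>", AWS_SECRET_ACCESS_KEY = "<secret>",
#            AWS_DEFAULT_REGION = "<region>")
# and the upload attempted only when all of them are present:
# if (hub_creds_set()) publish_resource(file.path(fl, "mtcars3.csv"), "test_dir")
```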
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] futile.logger_1.4.3 HubPub_1.15.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] sys_3.4.3 rstudioapi_0.17.1
#> [3] jsonlite_1.8.9 magrittr_2.0.3
#> [5] GenomicFeatures_1.57.1 rmarkdown_2.28
#> [7] fs_1.6.4 BiocIO_1.17.0
#> [9] zlibbioc_1.51.2 vctrs_0.6.5
#> [11] memoise_2.0.1 Rsamtools_2.21.2
#> [13] RCurl_1.98-1.16 askpass_1.2.1
#> [15] base64enc_0.1-3 BiocBaseUtils_1.9.0
#> [17] htmltools_0.5.8.1 S4Arrays_1.5.11
#> [19] usethis_3.0.0 progress_1.2.3
#> [21] lambda.r_1.2.4 AnnotationHub_3.15.0
#> [23] curl_5.2.3 SparseArray_1.5.45
#> [25] sass_0.4.9 bslib_0.8.0
#> [27] desc_1.4.3 testthat_3.2.1.1
#> [29] httr2_1.0.5 futile.options_1.0.1
#> [31] cachem_1.1.0 available_1.1.0
#> [33] buildtools_1.0.0 GenomicAlignments_1.41.0
#> [35] whisker_0.4.1 lifecycle_1.0.4
#> [37] pkgconfig_2.0.3 Matrix_1.7-1
#> [39] R6_2.5.1 fastmap_1.2.0
#> [41] BiocCheck_1.43.0 GenomeInfoDbData_1.2.13
#> [43] MatrixGenerics_1.17.1 digest_0.6.37
#> [45] AnnotationDbi_1.69.0 S4Vectors_0.43.2
#> [47] OrganismDbi_1.47.0 rprojroot_2.0.4
#> [49] ExperimentHub_2.13.1 aws.signature_0.6.0
#> [51] GenomicRanges_1.57.2 RSQLite_2.3.7
#> [53] filelock_1.0.3 fansi_1.0.6
#> [55] httr_1.4.7 abind_1.4-8
#> [57] compiler_4.4.1 bit64_4.5.2
#> [59] withr_3.0.2 biocViews_1.75.0
#> [61] BiocParallel_1.41.0 DBI_1.2.3
#> [63] R.utils_2.12.3 biomaRt_2.63.0
#> [65] openssl_2.2.2 rappdirs_0.3.3
#> [67] DelayedArray_0.31.14 rjson_0.2.23
#> [69] tools_4.4.1 R.oo_1.26.0
#> [71] glue_1.8.0 restfulr_0.0.15
#> [73] R.cache_0.16.0 grid_4.4.1
#> [75] stringdist_0.9.12 generics_0.1.3
#> [77] R.methodsS3_1.8.2 hms_1.1.3
#> [79] xml2_1.3.6 utf8_1.2.4
#> [81] XVector_0.45.0 BiocGenerics_0.53.0
#> [83] BiocVersion_3.21.1 pillar_1.9.0
#> [85] stringr_1.5.1 dplyr_1.1.4
#> [87] BiocFileCache_2.15.0 lattice_0.22-6
#> [89] AnnotationHubData_1.37.0 rtracklayer_1.65.0
#> [91] bit_4.5.0 tidyselect_1.2.1
#> [93] RBGL_1.81.0 maketools_1.3.1
#> [95] Biostrings_2.75.0 knitr_1.48
#> [97] biocthis_1.17.0 IRanges_2.39.2
#> [99] SummarizedExperiment_1.35.5 stats4_4.4.1
#> [101] xfun_0.48 Biobase_2.67.0
#> [103] credentials_2.0.2 brio_1.1.5
#> [105] matrixStats_1.4.1 stringi_1.8.4
#> [107] UCSC.utils_1.1.0 yaml_2.3.10
#> [109] evaluate_1.0.1 codetools_0.2-20
#> [111] tibble_3.2.1 BiocManager_1.30.25
#> [113] graph_1.83.0 cli_3.6.3
#> [115] jquerylib_0.1.4 styler_1.10.3
#> [117] GenomeInfoDb_1.41.2 gert_2.1.4
#> [119] dbplyr_2.5.0 png_0.1-8
#> [121] XML_3.99-0.17 RUnit_0.4.33
#> [123] parallel_4.4.1 blob_1.2.4
#> [125] prettyunits_1.2.0 aws.s3_0.3.21
#> [127] AnnotationForge_1.49.0 bitops_1.0-9
#> [129] txdbmaker_1.1.2 ExperimentHubData_1.31.0
#> [131] purrr_1.0.2 crayon_1.5.3
#> [133] rlang_1.1.4 formatR_1.14
#> [135] KEGGREST_1.45.1