Organizing files on a local machine can be cumbersome. This is especially true for local copies of remote resources, which may periodically need to be downloaded again to stay up to date. BiocFileCache is designed to help manage both local files and locally stored copies of remote resources. It provides a convenient location to organize files, and once files are added to the cache, the package provides functions to determine whether remote resources are out of date and require a new download.
BiocFileCache is a Bioconductor package and can be installed through BiocManager::install().
if (!"BiocManager" %in% rownames(installed.packages()))
    install.packages("BiocManager")
BiocManager::install("BiocFileCache", dependencies=TRUE)
After the package is installed, it can be loaded into the R workspace by
library(BiocFileCache)
The initial step in using BiocFileCache to manage files is to create a cache object specifying a location. We will create a temporary directory for use with the examples in this vignette. If a path is not specified upon creation, the default location is a BiocFileCache directory inside the typical user cache directory, which is defined by tools::R_user_dir("", which="cache").
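As a minimal sketch of the step described above, the following creates a cache in a temporary directory and prints its location. The variable names `path` and `bfc` are illustrative, and `ask = FALSE` is used here only to skip the interactive prompt about creating the directory.

```r
library(BiocFileCache)

# Create a cache in a throwaway temporary directory.
# ask = FALSE skips the interactive confirmation prompt.
path <- tempfile()
bfc <- BiocFileCache(path, ask = FALSE)

# Location of this cache on disk
bfccache(bfc)

# Default location that is used when no path is given
tools::R_user_dir("BiocFileCache", which = "cache")
```

Because the cache lives under a temporary path, it disappears when the R session's temporary directory is cleaned up, which is convenient for examples but not for real work.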
BiocFileCache uses the HEAD and GET functions from the CRAN package httr to access web resources. This can be problematic when operating behind a proxy. The easiest solution is to call httr::set_config with the proxy information.
proxy <- httr::use_proxy("http://my_user:my_password@myproxy:8080")
## or
proxy <- httr::use_proxy(Sys.getenv('http_proxy'))
httr::set_config(proxy)
A cache may need to be shared across multiple users on a system, which by default leads to permission errors. To allow access for multiple users, create a group that both the users and the cache belong to. The permissions of up to two files need to be altered, depending on what individuals should be able to do with the cache. A read-only cache requires manually changing the BiocFileCache.sqlite.LOCK file so that its group permissions are g+rw. To allow users to download files to the shared cache, both the BiocFileCache.sqlite.LOCK file and the BiocFileCache.sqlite file need group permissions of g+rw. Consult the documentation for your operating system on how to create a user group. To find the location of the cache in order to change the group and file permissions, run the following in R if you used the default location: tools::R_user_dir("BiocFileCache", which="cache") or, if you created a unique location, something like the following: bfc = BiocFileCache(cache="someUniquelocation"); bfccache(bfc).
For quick reference, on Linux use chown currentuser:newgroup to change the group and chmod to change the file permissions: chmod 660 or chmod g+rw should set the correct permissions.
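The permission change can also be made from within R on Unix-alike systems. The sketch below is only an illustration: it uses a temporary directory with two empty stand-in files named like the cache files, rather than a real cache, and it does not change group ownership, which still needs chown or an equivalent system tool.

```r
# Stand-in for a shared cache directory (an assumption for illustration);
# in practice, use the path reported by bfccache() or tools::R_user_dir().
cache_dir <- tempfile("shared_cache_")
dir.create(cache_dir)
cache_files <- file.path(
    cache_dir,
    c("BiocFileCache.sqlite", "BiocFileCache.sqlite.LOCK")
)
file.create(cache_files)

# 0660 = read/write for owner and group, no access for others.
# use_umask = FALSE applies the mode exactly as given.
Sys.chmod(cache_files, mode = "0660", use_umask = FALSE)

# Inspect the resulting permission bits
modes <- format(file.info(cache_files)$mode)
modes
```

For a read-only shared cache, the text above indicates that only the BiocFileCache.sqlite.LOCK file needs this change.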
Two issues have been commonly reported regarding the lock file.
The first is a permission error regarding group and public access. See the previous Group Cache Access section.
The second is an issue with filelock on particular systems. Particular partitions and nonstandard file systems may not support filelock. The solution is to create the cache on a different part of the system. The easiest way to define a new cache location is by using environment variables.
In R:
Sys.setenv(BFC_CACHE=<new cache location>)
Alternatively, you can set the environment variable globally to avoid having to set it in each R session. Consult the documentation for your operating system for instructions on setting environment variables globally.
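Both approaches can be sketched in R. The cache path below is a hypothetical location, and a temporary file stands in for a user-level ~/.Renviron file, which R reads at startup; `readRenviron()` is used here only to show that the setting is picked up.

```r
# Hypothetical new cache location on a file system that supports locking
new_cache <- file.path(tempdir(), "bfc_cache")
dir.create(new_cache, recursive = TRUE, showWarnings = FALSE)

# Option 1: set the variable for the current R session only
Sys.setenv(BFC_CACHE = new_cache)

# Option 2: store the setting in a Renviron-style file.
# A temporary file is used here instead of the real ~/.Renviron.
renviron <- tempfile("Renviron_")
writeLines(paste0("BFC_CACHE=", new_cache), renviron)

# Clear the session value, then load it back from the file
Sys.unsetenv("BFC_CACHE")
readRenviron(renviron)
Sys.getenv("BFC_CACHE")
```

Writing the same line to the real ~/.Renviron would make the setting available in every new R session for that user.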
Other common filelock implemented packages that have specific environment variables to control location are:
In BiocFileCache versions > 1.15.1, the default caching location has changed. The default cache is now controlled by the function tools::R_user_dir instead of rappdirs::user_cache_dir. Users who relied on the default BiocFileCache location and want to continue using the existing cache must move the cache and its files to the new default location. Otherwise, they must delete the old cache and re-download any previous files.
The following steps can be used to move the files to the new location:
Determine the old location by running the following in R
rappdirs::user_cache_dir(appname="BiocFileCache")
Determine the new location by running the following in R
tools::R_user_dir("BiocFileCache", which="cache")
Move the files to the new location. You can do this manually or with the following steps in R. Remember that if you have a lot of cached files, this may take a while, and you will need permissions on all the files in order to move them.
# make sure you have permissions on the cache/files
# use at own risk
moveFiles <- function(package) {
    olddir <- path.expand(rappdirs::user_cache_dir(appname = package))
    newdir <- tools::R_user_dir(package, which = "cache")
    # create the new location; no warning if it already exists
    dir.create(path = newdir, recursive = TRUE, showWarnings = FALSE)
    files <- list.files(olddir, full.names = TRUE)
    # move each file; the result is TRUE for every file moved successfully
    moveres <- vapply(files,
        FUN = function(fl) {
            filename <- basename(fl)
            newname <- file.path(newdir, filename)
            file.rename(fl, newname)
        },
        FUN.VALUE = logical(1))
    # remove the old directory only if every file was moved
    if (all(moveres)) unlink(olddir, recursive = TRUE)
}

package <- "BiocFileCache"
moveFiles(package)
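Before moving a large cache, it may help to check how many files are involved and whether they are writable. The helper below, `summariseCacheDir`, is a hypothetical function written for this illustration and is not part of BiocFileCache; it is demonstrated on a temporary directory rather than a real cache.

```r
# Hypothetical helper (not part of BiocFileCache): summarise a directory
# before attempting to move it.
summariseCacheDir <- function(dir) {
    files <- list.files(dir, full.names = TRUE, recursive = TRUE)
    list(
        n_files = length(files),
        total_bytes = sum(file.size(files)),
        # file.access() returns 0 when the requested access is allowed;
        # mode = 2 tests for write permission
        all_writable = all(file.access(files, mode = 2) == 0)
    )
}

# Demonstration on a temporary directory standing in for the old cache
demo_dir <- tempfile("old_cache_")
dir.create(demo_dir)
writeLines("example", file.path(demo_dir, "resource.txt"))

info <- summariseCacheDir(demo_dir)
info
```

For a real check, the directory returned by rappdirs::user_cache_dir(appname="BiocFileCache") would be passed instead of `demo_dir`.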
Users may always specify a unique caching location by providing the cache argument to the BiocFileCache constructor; however, users must then specify this location every time, as it will not be recognized by default in subsequent runs.
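A minimal sketch of this option follows. The path `my_cache` is a hypothetical location under the session's temporary directory; a real project would use a permanent path. The second constructor call stands in for a later session, where the same path has to be supplied again.

```r
library(BiocFileCache)

# Hypothetical explicit cache location (a permanent path in real use)
my_cache <- file.path(tempdir(), "my_project_cache")

# First use: create the cache at the explicit location
bfc <- BiocFileCache(cache = my_cache, ask = FALSE)

# Later use: the same path must be passed again, because it is not
# remembered as the default
bfc_again <- BiocFileCache(cache = my_cache, ask = FALSE)

identical(bfccache(bfc), bfccache(bfc_again))
```

Keeping the path in one place, such as a project-level configuration script, avoids accidentally falling back to the default location.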
Alternatively, the default caching location may be controlled by a user-wide or system-wide environment variable. Users may set the environment variable BFC_CACHE to the old location to continue using it as the default location.
Lastly, if a user does not care about the already existing default cache, the old location may be deleted to move forward with the new default location. This option should be used with caution. Once deleted, old cached resources will no longer be available and will have to be re-downloaded.
One can do this manually by navigating to the location indicated in the ERROR message as Problematic cache: and deleting the folder and all its content.
The following can be done to delete through R code:
CAUTION: This will remove the old cache and all downloaded resources.
library(BiocFileCache)
package = "BiocFileCache"
BFC_CACHE = rappdirs::user_cache_dir(appname=package)
Sys.setenv(BFC_CACHE = BFC_CACHE)
bfc = BiocFileCache(BFC_CACHE)
## CAUTION: This removes the cache and all downloaded resources
removebfc(bfc, ask=FALSE)
## create new empty cache in new default location
bfc = BiocFileCache(ask=FALSE)
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 BiocFileCache_2.15.0 dbplyr_2.5.0
## [4] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0.1 jsonlite_1.8.9 compiler_4.4.2
## [4] BiocManager_1.30.25 filelock_1.0.3 tidyselect_1.2.1
## [7] blob_1.2.4 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 R6_2.5.1 generics_0.1.3
## [13] curl_6.0.1 knitr_1.49 tibble_3.2.1
## [16] maketools_1.3.1 DBI_1.2.3 bslib_0.8.0
## [19] pillar_1.10.0 rlang_1.1.4 utf8_1.2.4
## [22] cachem_1.1.0 xfun_0.49 sass_0.4.9
## [25] sys_3.4.3 bit64_4.5.2 RSQLite_2.3.9
## [28] memoise_2.0.1 cli_3.6.3 withr_3.0.2
## [31] magrittr_2.0.3 digest_0.6.37 lifecycle_1.0.4
## [34] vctrs_0.6.5 evaluate_1.0.1 glue_1.8.0
## [37] buildtools_1.0.0 purrr_1.0.2 httr_1.4.7
## [40] rmarkdown_2.29 tools_4.4.2 pkgconfig_2.0.3
## [43] htmltools_0.5.8.1