Organizing files on a local machine can be cumbersome. This is especially true for local copies of remote resources, which may periodically need to be downloaded again to stay up to date. BiocFileCache is designed to help manage local and remote resource files stored locally. It provides a convenient location to organize files and, once files are added to the cache, functions to determine whether remote resources are out of date and require a new download.
BiocFileCache is a Bioconductor package and can be installed through
BiocManager::install().
if (!"BiocManager" %in% rownames(installed.packages()))
    install.packages("BiocManager")
BiocManager::install("BiocFileCache", dependencies=TRUE)
After the package is installed, it can be loaded into the R workspace by
library(BiocFileCache)
The initial step to utilizing BiocFileCache in managing files is to
create a cache object specifying a location. We will create a temporary
directory for use with examples in this vignette. If a path is not
specified upon creation, the default location is a BiocFileCache
directory in the typical user cache directory, as given by
tools::R_user_dir("BiocFileCache", which="cache").
If the path location exists and has been used to store files previously, the previous object will be loaded with any files saved to the cache. If the path location does not exist, the user will be prompted to create the new directory. If the session is not interactive (so the user cannot be prompted) or the user decides not to create the directory, a temporary directory will be used.
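A minimal sketch of creating the cache in a temporary directory, as described above. The variable name path is an illustrative choice; ask = FALSE is used so that the directory is created without a prompt.

```r
library(BiocFileCache)

# Use a throwaway directory so the example does not touch the default cache.
path <- tempfile()
# ask = FALSE creates the directory without prompting.
bfc <- BiocFileCache(path, ask = FALSE)

bfccache(bfc)   # the cache location
length(bfc)     # number of resources tracked so far
```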
Some utility functions to examine the cache are:
bfccache(bfc)
length(bfc)
show(bfc)
bfcinfo(bfc)
bfccache() will show the cache path.
NOTE: Because we are using temporary directories, your path location
will be different than shown.
length() on a BiocFileCache will show the number of files currently
being tracked by the BiocFileCache. For more detailed information on
what is stored in the BiocFileCache object, there is a show method which
will display the object class, cache path, and number of items
currently being tracked.
bfc
## class: BiocFileCache
## bfccache: /tmp/RtmpgSNcjm/file16a842cfaf4a
## bfccount: 0
## For more information see: bfcinfo() or bfcquery()
bfcinfo() will list a table of BiocFileCache resource files being
tracked in the cache. It returns a tibble that can be manipulated
further with dplyr.
bfcinfo(bfc)
## # A tibble: 0 × 10
## # ℹ 10 variables: rid <chr>, rname <chr>, create_time <dbl>, access_time <dbl>,
## # rpath <chr>, rtype <chr>, fpath <chr>, last_modified_time <dbl>,
## # etag <chr>, expires <dbl>
The table of resource files includes the following information:
rid: resource id. Autogenerated. This is a unique identifier
automatically generated when a resource is added to the cache.
rname: resource name. This is given by the user when a resource is
added to the cache. It does not have to be unique and can be updated at
any time. We recommend descriptive key words and identifiers.
create_time: The date and time a resource is added to the cache.
access_time: The date and time a resource is utilized within the cache.
The access time is updated when the resource is updated or downloaded.
rpath: resource path. This is the path to the local file.
rtype: resource type. One of “relative”, “local”, or “web”, indicating
how the resource is stored and whether it has a remote origin.
fpath: If rtype is “web”, this is the link to the remote resource. It
will be utilized to download the remote data.
last_modified_time: For a remote resource, the last_modified (if
available) information for the local copy of the data. This information
is checked against the remote resource to determine if the local copy
is stale and needs to be updated. If it is not available or your
resource is not a remote resource, the last modified time will be
marked as NA.
etag: For a remote resource, the etag (if available) information for
the local copy of the data. This information is checked against the
remote resource to determine if the local copy is stale and needs to be
updated. If it is not available or your resource is not a remote
resource, the etag will be marked as NA.
expires: For a remote resource, the expires (if available) information
for the local copy of the data. This information is checked against
Sys.time to determine if the local copy needs to be updated. If it is
not available or your resource is not a remote resource, the expires
will be marked as NA.
Now that we have created the cache object and location, let’s explore adding files that the cache will manage!
Now that a BiocFileCache object and cache location have been created,
files can be added to the cache for tracking. There are two functions
to add a resource to the cache:
bfcnew()
bfcadd()
The difference between the options: bfcnew() creates an entry for a
resource and returns a filepath to save to. As there are many types of
data that can be saved in many different ways, bfcnew() allows you to
save any R data object in the appropriate manner and still be able to
track the saved file. bfcadd() should be utilized when a file already
exists or a remote resource is being accessed.
bfcnew() takes the BiocFileCache object and a user-specified rname and
returns a path location to save data to. Optionally, you can add the
file extension if you know the type of file that will be saved:
savepath <- bfcnew(bfc, "NewResource", ext=".RData")
savepath
## BFC1
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8cb4689d_16a8cb4689d.RData"
## now we can use that path in any save function
m = matrix(1:12, nrow=3)
save(m, file=savepath)
## and that file will be tracked in the cache
bfcinfo(bfc)
## # A tibble: 1 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 BFC1 NewR… 2024-12-19… 2024-12-19… /tmp… rela… 16a8… NA <NA>
## # ℹ 1 more variable: expires <dbl>
bfcadd() is for existing files or remote resources. The user will still
specify an rname of their choosing but must also specify a path to a
local file or web resource as fpath. If no fpath is given, the default
is to assume the rname is also the path location. If the fpath is a
local file, there are a few options for the user determined by the
action argument. action will allow the user to either copy the existing
file into the cache directory (copy), move the existing file into the
cache directory (move), or leave the file wherever it is on the local
system yet still track it through the cache object (asis). copy and
move will rename the file to the generated cache file path. If the
fpath is a remote source, a download will be attempted; if it succeeds,
the file is saved in the cache location and tracked in the cache
object. The original source will be added to the cache information as
fpath. If the user does not want the remote resource to be downloaded
initially, the argument download=FALSE may be used to delay the
download but still add the resource to the cache. Relative path
locations may also be used, specified with rtype = "relative". This
will store a relative location for the file within the cache; only
actions copy and move are available for relative paths.
First let’s use local files:
fl1 <- tempfile(); file.create(fl1)
## [1] TRUE
add2 <- bfcadd(bfc, "Test_addCopy", fl1) # copy
# returns filepath being tracked in cache
add2
## BFC2
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a84f975d3e_file16a8bc18ffb"
# the name is the unique rid in the cache
rid2 <- names(add2)
fl2 <- tempfile(); file.create(fl2)
## [1] TRUE
add3 <- bfcadd(bfc, "Test2_addMove", fl2, action="move") # move
rid3 <- names(add3)
fl3 <- tempfile(); file.create(fl3)
## [1] TRUE
add4 <- bfcadd(bfc, "Test3_addAsis", fl3, rtype="local",
               action="asis") # reference
rid4 <- names(add4)
file.exists(fl1) # TRUE - copied from original location
## [1] TRUE
file.exists(fl2) # FALSE - moved from original location
## [1] FALSE
file.exists(fl3) # TRUE - left asis, original location tracked
## [1] TRUE
Now let’s add some examples with remote sources:
url <- "http://httpbin.org/get"
add5 <- bfcadd(bfc, "TestWeb", fpath=url)
rid5 <- names(add5)
url2 <- "https://bioconductor.org/packages/stats/bioc/BiocFileCache/BiocFileCache_2024_stats.tab"
add6 <- bfcadd(bfc, "TestWeb", fpath=url2)
rid6 <- names(add6)
# add a remote resource but don't initially download
add7 <- bfcadd(bfc, "TestNoDweb", fpath=url2, download=FALSE)
rid7 <- names(add7)
# let's look at our BiocFileCache object now
bfc
## class: BiocFileCache
## bfccache: /tmp/RtmpgSNcjm/file16a842cfaf4a
## bfccount: 7
## For more information see: bfcinfo() or bfcquery()
bfcinfo(bfc)
## # A tibble: 7 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 BFC1 NewR… 2024-12-19… 2024-12-19… /tmp… rela… 16a8… <NA> <NA>
## 2 BFC2 Test… 2024-12-19… 2024-12-19… /tmp… rela… /tmp… <NA> <NA>
## 3 BFC3 Test… 2024-12-19… 2024-12-19… /tmp… rela… /tmp… <NA> <NA>
## 4 BFC4 Test… 2024-12-19… 2024-12-19… /tmp… local /tmp… <NA> <NA>
## 5 BFC5 Test… 2024-12-19… 2024-12-19… /tmp… web http… <NA> <NA>
## 6 BFC6 Test… 2024-12-19… 2024-12-19… /tmp… web http… 2024-12-10 <NA>
## 7 BFC7 Test… 2024-12-19… 2024-12-19… /tmp… web http… <NA> <NA>
## # ℹ 1 more variable: expires <dbl>
Now that we are tracking resources, let’s explore accessing their information!
Files will by default have a unique identifier added to the start of
the original file name (identifier_originalName) when added to the cache
to allow for multiple versions of the same file name. There is an option
to override this default behavior by using the fname argument of
bfcadd() or bfcnew(). fname takes one of two options: unique or exact.
The unique option is the default and adds a unique identifier to the
original file name. The exact option overrides this: no unique
identifier is added, and the file is stored under its exact original
file name.
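The two naming schemes can be compared with a short sketch. The cache demo_bfc, the file src, and the rnames are hypothetical and exist only for this illustration; a temporary cache is used so the example is independent of the cache built elsewhere in this vignette.

```r
library(BiocFileCache)

demo_bfc <- BiocFileCache(tempfile(), ask = FALSE)

# A small local file to add twice under different naming schemes.
src <- tempfile(fileext = ".txt")
writeLines("example", src)

# Default behavior: fname = "unique" prefixes an identifier.
p_unique <- bfcadd(demo_bfc, "demo_unique", src)
# fname = "exact" keeps the original file name.
p_exact <- bfcadd(demo_bfc, "demo_exact", src, fname = "exact")

basename(src)
basename(p_unique)  # identifier_originalName
basename(p_exact)   # original file name only
```

The exact option is convenient when downstream tools expect a particular file name, at the cost of not being able to keep several versions of a file with the same name in one cache.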
Before we get into exploring individual resources, we introduce a helper function.
Most of the functions provided require the unique rid[s] assigned to a
resource. Both bfcadd() and bfcnew() return the path as a named
character vector; the name of the character vector is the rid. However,
you may want to access a resource that you added some time ago.
bfcquery()
bfcquery() will take in a key word and search across the rname, rpath,
and fpath for any matching entries. The columns that are searched can
be controlled with the argument field.
bfcquery(bfc, "Web")
## # A tibble: 2 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 BFC5 Test… 2024-12-19… 2024-12-19… /tmp… web http… <NA> <NA>
## 2 BFC6 Test… 2024-12-19… 2024-12-19… /tmp… web http… 2024-12-10 <NA>
## # ℹ 1 more variable: expires <dbl>
bfcquery(bfc, "copy")
## # A tibble: 0 × 10
## # ℹ 10 variables: rid <chr>, rname <chr>, create_time <dbl>, access_time <dbl>,
## # rpath <chr>, rtype <chr>, fpath <chr>, last_modified_time <dbl>,
## # etag <chr>, expires <dbl>
q1 <- bfcquery(bfc, "BiocFileCache")
q1
## # A tibble: 2 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 BFC6 Test… 2024-12-19… 2024-12-19… /tmp… web http… 2024-12-10 <NA>
## 2 BFC7 Test… 2024-12-19… 2024-12-19… /tmp… web http… <NA> <NA>
## # ℹ 1 more variable: expires <dbl>
class(q1)
## [1] "tbl_bfc" "tbl_bfc" "tbl_df" "tbl" "data.frame"
As you can see above, bfcquery() returns an object of class tbl_bfc (a
tibble) that can be investigated further using methods for these
classes, such as the dplyr methods. The rid can be seen in the first
column of the table to be used in other functions. To get a quick count
of how many objects in the cache matched the query, use bfccount().
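As a small illustration of restricting the searched columns and counting matches, the sketch below builds a throwaway cache with two local files. The cache name demo_bfc and the rnames are hypothetical; bfccount() is applied to the query result as the text above describes.

```r
library(BiocFileCache)

demo_bfc <- BiocFileCache(tempfile(), ask = FALSE)
f1 <- tempfile(); file.create(f1)
f2 <- tempfile(); file.create(f2)
bfcadd(demo_bfc, "alpha_counts", f1)
bfcadd(demo_bfc, "beta_counts", f2)

# Search only the rname column for the key word.
hits <- bfcquery(demo_bfc, "alpha", field = "rname")
bfcrid(hits)      # rid(s) to pass to other functions
bfccount(hits)    # number of matching resources
```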
[
[ allows for subsetting of the BiocFileCache object. The output will be
a BiocFileCacheReadOnly object. Users will still be able to query,
remove (from the subset object only), and access resources of the
subset; however, the resources cannot be updated.
bfcsubWeb = bfc[paste0("BFC", 5:6)]
bfcsubWeb
## class: BiocFileCacheReadOnly
## bfccache: /tmp/RtmpgSNcjm/file16a842cfaf4a
## bfccount: 2
## For more information see: bfcinfo() or bfcquery()
bfcinfo(bfcsubWeb)
## # A tibble: 2 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 BFC5 Test… 2024-12-19… 2024-12-19… /tmp… web http… <NA> <NA>
## 2 BFC6 Test… 2024-12-19… 2024-12-19… /tmp… web http… 2024-12-10 <NA>
## # ℹ 1 more variable: expires <dbl>
There are three methods for retrieving the BiocFileCache resource path
location.
[[
bfcpath()
bfcrpath()
[[ will access the rpath saved in the BiocFileCache. Retrieving this
location will return the path to the local version of the resource,
allowing the user to then use this path in any load/read methods most
appropriate for the resource. bfcpath() and bfcrpath() both return a
named character vector also displaying the local file that can be used
for retrieval. bfcpath() requires rids while bfcrpath() can use rids or
rnames (but not both). bfcrpath() can be used to add a resource into
the cache when rnames are specified; if an element in rnames is not
found, it will try to add it to the cache with bfcadd().
bfc[["BFC2"]]
## BFC2
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a84f975d3e_file16a8bc18ffb"
bfcpath(bfc, "BFC2")
## BFC2
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a84f975d3e_file16a8bc18ffb"
bfcpath(bfc, "BFC5")
## BFC5
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8aaf100d_get"
bfcrpath(bfc, rids="BFC5")
## BFC5
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8aaf100d_get"
bfcrpath(bfc)
## BFC1
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8cb4689d_16a8cb4689d.RData"
## BFC2
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a84f975d3e_file16a8bc18ffb"
## BFC3
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a82e5d8378_file16a85dd39e90"
## BFC4
## "/tmp/RtmpgSNcjm/file16a889d2ccb"
## BFC5
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8aaf100d_get"
## BFC6
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8df4a2a9_BiocFileCache_2024_stats.tab"
## BFC7
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a8f20f864_BiocFileCache_2024_stats.tab"
bfcrpath(bfc, c("http://httpbin.org/get","Test3_addAsis"))
## adding rname 'http://httpbin.org/get'
## BFC4
## "/tmp/RtmpgSNcjm/file16a889d2ccb"
## BFC8
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a87f7dbf16_get"
Managing remote resources locally involves knowing when to update the local copy of the data.
bfcneedsupdate()
bfcneedsupdate() is a method that will check the local copy of the
data’s etag and last_modified time against the etag and last_modified
time of the remote resource, as well as an expires time. The cache
saves this information when the web resource is initially added. The
expires time is checked against the current Sys.time() to see if the
local resource has expired. If so, the resource is deemed to need an
update; if the expires time is unavailable or has not passed, the etag
and last_modified_time are checked. The etag information is used
definitively if it is available; if it is not available, the
last_modified time is checked. If the resource does not have a
last_modified tag either, the result is undetermined (NA). If the
resource has not been downloaded yet, the result is TRUE.
Note: This function does not automatically download the remote source
if it is out of date. Please see bfcdownload().
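The order of the checks can be sketched in plain R. The function needs_update_sketch and its list arguments are hypothetical and are not the package's internal code; in particular, treating a newer remote last_modified time as stale is an assumption made for this illustration.

```r
# Hypothetical helper illustrating the decision order only; this is not
# BiocFileCache's internal code.
needs_update_sketch <- function(local, remote, downloaded = TRUE,
                                now = Sys.time()) {
    # A resource that was never downloaded always needs an update.
    if (!downloaded) return(TRUE)
    # 1. An expires time that has passed means the copy is stale.
    if (!is.na(local$expires) && local$expires < now) return(TRUE)
    # 2. The etag decides whenever both sides provide one.
    if (!is.na(local$etag) && !is.na(remote$etag))
        return(!identical(local$etag, remote$etag))
    # 3. Otherwise fall back to the last-modified time (assumed rule:
    #    a newer remote time means the local copy is stale).
    if (!is.na(local$last_modified) && !is.na(remote$last_modified))
        return(remote$last_modified > local$last_modified)
    # 4. Nothing to compare: undetermined.
    NA
}

t0 <- as.POSIXct("2024-12-10", tz = "UTC")
local <- list(expires = as.POSIXct(NA), etag = NA_character_,
              last_modified = t0)

# Remote copy modified one day later, no etag available.
newer <- needs_update_sketch(
    local, list(etag = NA_character_, last_modified = t0 + 86400))
# Remote provides neither etag nor last_modified.
unknown <- needs_update_sketch(
    local, list(etag = NA_character_, last_modified = as.POSIXct(NA)))
```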
Just as you could access the rpath, the local resource path can be set
with
[[<-
The file must exist in order to be replaced in the BiocFileCache. If
the user wishes to rename the file, they must first make a copy of (or
touch) the file.
fileBeingReplaced <- bfc[[rid3]]
fileBeingReplaced
## BFC3
## "/tmp/RtmpgSNcjm/file16a842cfaf4a/16a82e5d8378_file16a85dd39e90"
# fl3 was created when we were adding resources
fl3
## [1] "/tmp/RtmpgSNcjm/file16a889d2ccb"
bfc[[rid3]]<-fl3
## Warning in `[[<-`(`*tmp*`, rid3, value = "/tmp/RtmpgSNcjm/file16a889d2ccb"):
## updating rpath, changing rtype to 'local'
bfc[[rid3]]
## BFC3
## "/tmp/RtmpgSNcjm/file16a889d2ccb"
The user may also wish to change the rname or fpath associated with a
resource in addition to the rpath. This can be done with
bfcupdate()
Again, if changing the rpath, the file must exist. If an fpath is being
updated, the data will be downloaded and the user will be prompted to
overwrite the current file specified in rpath. If the user does not
want to be prompted about overwriting files, ask=FALSE may be used.
bfcinfo(bfc, "BFC1")
## # A tibble: 1 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 BFC1 NewR… 2024-12-19… 2024-12-19… /tmp… rela… 16a8… NA <NA>
## # ℹ 1 more variable: expires <dbl>
bfcupdate(bfc, "BFC1", rname="FirstEntry")
bfcinfo(bfc, "BFC1")
## # A tibble: 1 × 10
## rid rname create_time access_time rpath rtype fpath last_modified_time etag
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 BFC1 Firs… 2024-12-19… 2024-12-19… /tmp… rela… 16a8… NA <NA>
## # ℹ 1 more variable: expires <dbl>
Now let’s update a web resource:
suppressPackageStartupMessages({
library(dplyr)
})
bfcinfo(bfc, "BFC6") %>% select(rid, rpath, fpath)
## # A tibble: 1 × 3
## rid rpath fpath
## <chr> <chr> <chr>
## 1 BFC6 /tmp/RtmpgSNcjm/file16a842cfaf4a/16a8df4a2a9_BiocFileCache_2024_s… http…
bfcupdate(bfc, "BFC6", fpath=url, rname="Duplicate", ask=FALSE)
bfcinfo(bfc, "BFC6") %>% select(rid, rpath, fpath)
## # A tibble: 1 × 3
## rid rpath fpath
## <chr> <chr> <chr>
## 1 BFC6 /tmp/RtmpgSNcjm/file16a842cfaf4a/16a8df4a2a9_BiocFileCache_2024_s… http…
Lastly, remote resources may require an update if the data is out of
date (see bfcneedsupdate()). The bfcdownload() function will attempt to
download from the original resource saved in the cache as fpath and
overwrite the out-of-date file at rpath.
bfcdownload()
The following confirms that resources need updating, and then performs the update
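A check-then-download step could look like the sketch below. The wrapper name update_if_stale is hypothetical; bfcneedsupdate() and bfcdownload() are the functions described above. Treating an undetermined (NA) result as a reason to download is a design choice of this sketch, not a package requirement.

```r
library(BiocFileCache)

# Hypothetical wrapper: re-download a web resource unless the cache can
# confirm that the local copy is current.
update_if_stale <- function(bfc, rid) {
    # bfcneedsupdate() gives TRUE, FALSE, or NA (undetermined); anything
    # other than a definite FALSE triggers a download here.
    stale <- !identical(unname(bfcneedsupdate(bfc, rid)), FALSE)
    if (stale)
        bfcdownload(bfc, rid, ask = FALSE)
    invisible(stale)
}

# Usage, for a web resource tracked under the rid "BFC5":
# update_if_stale(bfc, "BFC5")
```

The call is left commented out because it needs network access and an existing web resource in the cache.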
The following functions are provided for metadata:
bfcmeta()<-
bfcmeta()
bfcmetalist()
bfcmetaremove()
Additional metadata can be added as data.frames that become tables in
the SQL database. The data.frame must contain a column rid that matches
the rid column in the cache. Any metadata added will then be displayed
when accessing the cache. Metadata is added with bfcmeta()<-. A table
name must be provided as an argument. Users can add multiple metadata
tables as long as the names are unique. Tables may be appended or
overwritten using the additional arguments append=TRUE or
overwrite=TRUE.
names(bfcinfo(bfc))
## [1] "rid" "rname" "create_time"
## [4] "access_time" "rpath" "rtype"
## [7] "fpath" "last_modified_time" "etag"
## [10] "expires"
meta <- as.data.frame(list(rid=bfcrid(bfc)[1:3], idx=1:3))
bfcmeta(bfc, name="resourceData") <- meta
names(bfcinfo(bfc))
## [1] "rid" "rname" "create_time"
## [4] "access_time" "rpath" "rtype"
## [7] "fpath" "last_modified_time" "etag"
## [10] "expires" "idx"
The metadata tables that exist can be listed with bfcmetalist() and can
be retrieved with bfcmeta().
bfcmetalist(bfc)
## [1] "resourceData"
bfcmeta(bfc, name="resourceData")
## rid idx
## 1 BFC1 1
## 2 BFC2 2
## 3 BFC3 3
Lastly, metadata can be removed with bfcmetaremove().
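A brief sketch of removing a metadata table, run against a throwaway cache so it does not alter the cache used in the rest of this vignette. The names demo_bfc and demo_resource are hypothetical.

```r
library(BiocFileCache)

demo_bfc <- BiocFileCache(tempfile(), ask = FALSE)
f <- tempfile(); file.create(f)
bfcadd(demo_bfc, "demo_resource", f)

# Attach a metadata table keyed by rid, then remove it again.
meta <- data.frame(rid = bfcrid(demo_bfc), idx = 1L)
bfcmeta(demo_bfc, name = "resourceData") <- meta
bfcmetalist(demo_bfc)

bfcmetaremove(demo_bfc, name = "resourceData")
bfcmetalist(demo_bfc)        # "resourceData" is no longer listed
names(bfcinfo(demo_bfc))     # the idx column is gone as well
```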
Note:
While quick implementations of all the functions exist where, if you
don’t specify a BiocFileCache object, they operate on BiocFileCache(),
this option is not available for bfcmeta()<-. This function must
always be given a BiocFileCache object by first defining a variable
and then passing that variable into the function.
Example of ERROR:
bfcmeta(name="resourceData") <- meta
Error in bfcmeta(name = "resourceData") <- meta :
target of assignment expands to non-language object
Correct implementation:
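A minimal sketch of the working form: the cache is first bound to a variable, which is then used as the assignment target. The variable name my_bfc and the use of a temporary cache (rather than the default BiocFileCache()) are choices made for this illustration only.

```r
library(BiocFileCache)

# A temporary cache stands in for the default BiocFileCache() here.
my_bfc <- BiocFileCache(tempfile(), ask = FALSE)
f <- tempfile(); file.create(f)
bfcadd(my_bfc, "demo_resource", f)

meta <- data.frame(rid = bfcrid(my_bfc), idx = 1L)
# The assignment target is a variable, so the replacement form works.
bfcmeta(my_bfc, name = "resourceData") <- meta
bfcmetalist(my_bfc)
```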
All other functions have a default: if the BiocFileCache object is
missing, they operate on the default cache BiocFileCache().
Now that we have added resources, it is also possible to remove a resource.
bfcremove()
When you remove a resource from the cache, it will also delete the
local file, but only if it is stored in the cache directory as given by
bfccache(bfc). If it is a path to a file somewhere else on the user’s
system, it will only be removed from the BiocFileCache object and the
file will not be deleted.
# let's remind ourselves of our object
bfc
## class: BiocFileCache
## bfccache: /tmp/RtmpgSNcjm/file16a842cfaf4a
## bfccount: 8
## For more information see: bfcinfo() or bfcquery()
bfcremove(bfc, "BFC6")
bfcremove(bfc, "BFC1")
# let's look at our BiocFileCache object now
bfc
## class: BiocFileCache
## bfccache: /tmp/RtmpgSNcjm/file16a842cfaf4a
## bfccount: 6
## For more information see: bfcinfo() or bfcquery()
There is another helper function that may be of use:
bfcsync()
This function will check two things:
Entries whose rpath cannot be found (this would occur if bfcnew() is
used and the path was not used to save an object)
Files in the cache directory (bfccache(bfc)) that are not being tracked
by the BiocFileCache object
# create a new entry that hasn't been used
path <- bfcnew(bfc, "UseMe")
rmMe <- names(path)
# We also have a file not being tracked because we updated rpath
bfcsync(bfc)
## entries without corresponding files: 'BFC7' 'BFC9'
## files without cache entries
## /tmp/RtmpgSNcjm/file16a842cfaf4a/16a82e5d8378_file16a85dd39e90
## /tmp/RtmpgSNcjm/file16a842cfaf4a/add_or_return_rname.LOCK
##
## [1] FALSE
# you can suppress the messages and just have a TRUE/FALSE
bfcsync(bfc, FALSE)
## [1] FALSE
#
# Let's do some cleaning to have a synced object
#
bfcremove(bfc, rmMe)
unlink(fileBeingReplaced)
bfcsync(bfc)
## entries without corresponding files: 'BFC7'
## files without cache entries
## /tmp/RtmpgSNcjm/file16a842cfaf4a/add_or_return_rname.LOCK
##
## [1] FALSE
There is a helper function to export a BiocFileCache and associated files as a tar or zip archive as well as the appropriate import function.
exportbfc()
importbfc()
The exportbfc() function will take in a BiocFileCache object or
subsetted object and create a tar or zip archive that can then be
shared with collaborators on different computer systems. The user can
choose where the archive is created with outputFile; the current
working directory and the name BiocFileCacheExport.tar are used as
default. By default a tar archive is created, but the user can create a
zip archive instead using the argument outputMethod="zip". Any
additional arguments to utils::zip or utils::tar may also be used.
The following are some example calls:
# export entire biocfilecache
exportbfc(bfc)
# export the first 4 entries of biocfilecache
# as a compressed tar
exportbfc(bfc, rids=paste0("BFC", 1:4),
          outputFile="BiocFileCacheExport.tar.gz", compression="gzip")
# export the subsetted object of web resources as zip
sub1 <- bfc[bfcrid(bfcquery(bfc, "web", field='rtype'))]
exportbfc(sub1, outputFile = "BiocFileCacheExportWeb.zip",
          outputMethod="zip")
Once inflated on a user’s system, the archive will provide a fully
functional copy of the sent cache. The archive can be extracted manually
and the path used in the constructor BiocFileCache(), or for convenience
the function importbfc() may be used. The importbfc() function takes in
a path to the appropriate tar or zip file, the argument archiveMethod
indicating if untar or unzip should be used (the default is untar), a
path to where the archive should be extracted to as exdir, and any
additional arguments to the utils::untar and utils::unzip methods. The
function will extract the files and load the associated BiocFileCache
object into the R session.
The following are example calls to load the above example exported objects:
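Import calls that mirror the export examples might look like the sketch below. The archive file names are the ones used in the export examples; the helper import_if_present and the temporary extraction directory are hypothetical and exist only so that the code does nothing when an archive is not present in the working directory.

```r
library(BiocFileCache)

# Hypothetical helper: import an archive only if it exists, extracting
# into a fresh temporary directory; returns NULL otherwise.
import_if_present <- function(archive, archiveMethod = "untar") {
    if (!file.exists(archive))
        return(NULL)
    exdir <- tempfile()
    dir.create(exdir)
    importbfc(archive, archiveMethod = archiveMethod, exdir = exdir)
}

# tar archive (the default archiveMethod)
bfc_tar <- import_if_present("BiocFileCacheExport.tar")
# zip archive
bfc_zip <- import_if_present("BiocFileCacheExportWeb.zip",
                             archiveMethod = "unzip")
```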
The following helper function converts existing data to a BiocFileCache:
makeBiocFileCacheFromDataFrame
This function may take a while to run if there are a lot of resources; however, if the BiocFileCache is stored in a permanent location, it will only need to be run once.
makeBiocFileCacheFromDataFrame takes an existing data.frame and creates
a BiocFileCache object. The cache location can be specified by the
cache argument. The cache must not already exist and the user will be
prompted to create the location. If the user answers ‘N’, the cache
will be created in a temporary directory and this function will have to
be run again in a new R session. The original data.frame must contain
the required BiocFileCache columns rtype, rpath, and fpath as described
in the section 1.2 “Creating / Loading the Cache”. The optional columns
rname, last_modified_time, etag, and expires may also be specified in
the original data.frame; they are not required and will be populated
with defaults if missing. For resources with rtype="local", actionLocal
will control if the local copy of the file is copied or moved to the
cache location, or if it is left asis on the local system. A local copy
of the file must exist if the resource is identified as rtype="local".
For resources with rtype="web", actionWeb will control if the local
copy of the remote file is copied or moved to the cache location. It is
a requirement of BiocFileCache that all remote resources download their
local copy to the cache location. A local copy of the file does not
have to exist and can be downloaded into the cache at a later time. Any
additional columns of the original data.frame, besides the required or
optional BiocFileCache columns, are separated and added to the
BiocFileCache as a metadata table with the name given as metadataName.
See section 1.6 on “Adding Metadata”.
The following is an example data.frame with the minimal columns ‘rtype’,
‘rpath’, and ‘fpath’ and one additional column, ‘keywords’, that will
become metadata. The ‘rpath’ can be NA as these are remote resources
(rtype='web') that have not been downloaded yet.
tbl <- data.frame(rtype=c("web","web"),
                  rpath=c(NA_character_,NA_character_),
                  fpath=c("http://httpbin.org/get",
                          "https://en.wikipedia.org/wiki/Bioconductor"),
                  keywords = c("httpbin", "wiki"), stringsAsFactors=FALSE)
tbl
## rtype rpath fpath keywords
## 1 web <NA> http://httpbin.org/get httpbin
## 2 web <NA> https://en.wikipedia.org/wiki/Bioconductor wiki
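A data.frame of this shape could then be converted as in the sketch below. The cache location (a fresh temporary path) and the metadata table name resourceMetadata are illustrative choices; in an interactive session the function may prompt before creating the cache directory, as described above.

```r
library(BiocFileCache)

tbl <- data.frame(rtype = c("web", "web"),
                  rpath = c(NA_character_, NA_character_),
                  fpath = c("http://httpbin.org/get",
                            "https://en.wikipedia.org/wiki/Bioconductor"),
                  keywords = c("httpbin", "wiki"),
                  stringsAsFactors = FALSE)

# The cache directory must not exist yet; tempfile() gives a fresh path.
newbfc <- makeBiocFileCacheFromDataFrame(tbl,
                                         cache = tempfile(),
                                         actionWeb = "copy",
                                         actionLocal = "copy",
                                         metadataName = "resourceMetadata")
newbfc
# The extra 'keywords' column ends up in the named metadata table.
bfcmetalist(newbfc)
```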
Finally, there are two functions involved with cleaning or deleting the cache:
cleanbfc()
removebfc()
cleanbfc() will evaluate the resources in the BiocFileCache object and
determine which, if any, have not been created, redownloaded, or
updated in a specified number of days. If ask=TRUE, the user is asked,
for each entry above that threshold, whether it should be removed from
the cache object and the file deleted (only deleted if in the
bfccache(bfc) location). If ask=FALSE, it does not ask about each file
and automatically removes and deletes the file. The default number of
days is 120. If a resource has not needed any updates, this function
could give a false positive. It also does not take into account how
many times the resource was loaded by retrieving the path (i.e. via [[,
bfcpath, bfcrpath), so it may not be an accurate indication of how
often the resource is used. Please use this function with caution.
removebfc() will remove the BiocFileCache completely from the system.
Any files saved in the bfccache(bfc) directory will also be deleted.
Note: Use with caution!
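Both functions can be tried safely on a throwaway cache, as in the sketch below. The names demo_bfc, cache_dir, and demo_resource are hypothetical; ask = FALSE is used so that no prompts appear.

```r
library(BiocFileCache)

demo_bfc <- BiocFileCache(tempfile(), ask = FALSE)
cache_dir <- bfccache(demo_bfc)
f <- tempfile(); file.create(f)
bfcadd(demo_bfc, "demo_resource", f)

# The entry was just created, so it is within the 120-day threshold
# and is kept.
cleanbfc(demo_bfc, days = 120, ask = FALSE)
n_after_clean <- length(demo_bfc)

# Delete the whole cache directory, including the tracked copy.
removebfc(demo_bfc, ask = FALSE)
dir.exists(cache_dir)
```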
BiocFileCache uses the CRAN package httr functions HEAD and GET for
accessing web resources. This can be problematic if operating behind a
proxy. The easiest solution is to use httr::set_config with the proxy
information.
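A sketch of setting the proxy globally for httr. The proxy address and port below are placeholders, not real values; replace them with the settings for your site.

```r
library(httr)

# Placeholder proxy address and port; replace with your site's values.
proxy <- use_proxy("http://proxy.example.org", port = 8080)

# Apply the proxy to all subsequent httr requests in this session.
set_config(proxy)

# Undo the global setting when it is no longer needed.
reset_config()
```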
The situation may occur where a cache is to be shared across multiple
users on a system. This presents permissions errors. To allow access
for multiple users, create a group that the users belong to and that
the cache belongs to. Permissions of potentially two files need to be
altered depending on what you would like individuals to be able to
accomplish with the cache. A read-only cache will require manual
manipulation of the BiocFileCache.sqlite.LOCK file so that the group
permissions are g+rw. To allow users to download files to the shared
cache, both the BiocFileCache.sqlite.LOCK file and the
BiocFileCache.sqlite file will need group permissions of g+rw. Please
search online for how to create a user group for your system of
interest. To find the location of the cache to be able to change the
group and file permissions, you may run the following in R if you used
the default location:
tools::R_user_dir("BiocFileCache", which="cache")
or, if you created a unique location, something like the following:
bfc = BiocFileCache(cache="someUniquelocation"); bfccache(bfc)
For quick reference, in Linux you will use chown currentuser:newgroup
to change the group and chmod to change the file permissions: chmod 660
or chmod g+rw should accomplish the correct permissions.
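The permission change can also be made from within R, as in the sketch below, which uses base R's Sys.chmod() on a throwaway cache named demo_bfc (a hypothetical name). Changing the group ownership still has to be done outside R (for example with chown), and the mode bits only have this meaning on Unix-like systems.

```r
library(BiocFileCache)

demo_bfc <- BiocFileCache(tempfile(), ask = FALSE)

# The two files named above, inside the cache directory; keep only
# those that currently exist.
targets <- file.path(bfccache(demo_bfc),
                     c("BiocFileCache.sqlite", "BiocFileCache.sqlite.LOCK"))
targets <- targets[file.exists(targets)]

# 660 = read/write for owner and group, nothing for others.
# use_umask = FALSE applies the mode exactly as given.
Sys.chmod(targets, mode = "0660", use_umask = FALSE)
file.mode(targets)
```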
It is our hope that this package allows for easier management of local and remote resources.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 BiocFileCache_2.15.0 dbplyr_2.5.0
## [4] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0.1 jsonlite_1.8.9 compiler_4.4.2
## [4] BiocManager_1.30.25 filelock_1.0.3 tidyselect_1.2.1
## [7] blob_1.2.4 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 R6_2.5.1 generics_0.1.3
## [13] curl_6.0.1 knitr_1.49 tibble_3.2.1
## [16] maketools_1.3.1 DBI_1.2.3 bslib_0.8.0
## [19] pillar_1.10.0 rlang_1.1.4 utf8_1.2.4
## [22] cachem_1.1.0 xfun_0.49 sass_0.4.9
## [25] sys_3.4.3 bit64_4.5.2 RSQLite_2.3.9
## [28] memoise_2.0.1 cli_3.6.3 withr_3.0.2
## [31] magrittr_2.0.3 digest_0.6.37 lifecycle_1.0.4
## [34] vctrs_0.6.5 evaluate_1.0.1 glue_1.8.0
## [37] buildtools_1.0.0 purrr_1.0.2 httr_1.4.7
## [40] rmarkdown_2.29 tools_4.4.2 pkgconfig_2.0.3
## [43] htmltools_0.5.8.1