SharedObject is designed for sharing data across many R workers. It allows multiple workers to read and write the same R object located at the same memory location. This is useful in parallel computing when a large R object needs to be read by all R workers, because it can reduce both the memory consumption and the overhead of data transmission.
For the basic R types, the package supports raw, logical, integer, numeric, complex and character. Note that sharing a character vector is beneficial only when there are many repetitions among the elements of the vector. Due to the complicated structure of character vectors, you are not allowed to set an element of a shared character vector to a value that is not already present in the vector. Therefore, it is recommended to treat a shared character vector as read-only.
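The sketch below shares one vector of each supported basic type. It assumes the SharedObject package is installed, and the variable names are only illustrative. Every vector is longer than the default minLength, so each call should return a shared object.

```r
library(SharedObject)

## One vector of each supported basic type
shared_raw <- share(as.raw(1:10))
shared_lgl <- share(rep(c(TRUE, FALSE), 5))
shared_int <- share(1:10)
shared_num <- share(seq(0, 1, by = 0.1))
shared_cpx <- share(complex(real = 1:10, imaginary = 10:1))
## A character vector is only worth sharing when its elements repeat a lot
shared_chr <- share(rep(c("a", "b"), 50))

## Check that every result is a shared object
sapply(list(shared_raw, shared_lgl, shared_int,
            shared_num, shared_cpx, shared_chr), is.shared)
```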
For containers, the package supports list, pairlist and environment. Sharing a container is equivalent to sharing all elements in the container; the container itself is not shared. Therefore, adding or replacing an element of a shared container in one worker will not implicitly change the shared container in the other workers. Since a data frame is fundamentally a list, sharing a data frame follows the same principle.
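The following is a minimal sketch of the container rule applied to a data frame, assuming SharedObject is installed (the data frame and its column names are made up for illustration). The columns come back as shared vectors. Adding a column to a copy of the container leaves the original container's names untouched, because only the elements are shared.

```r
library(SharedObject)

df <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 0.8))
shared_df <- share(df)

## The columns are shared; the container holds shared elements
is.shared(shared_df)
is.shared(shared_df$id)

## Adding a column only changes this local copy of the container
local_df <- shared_df
local_df$flag <- local_df$value > 2
names(shared_df)
```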
More complicated data structures, such as S3 and S4 classes, are supported out of the box. Therefore, there is no need to customize the share function to support an S3/S4 class. However, if an S3/S4 class has a special design (e.g. on-disk data), note that share is an S4 generic, so developers are free to define their own share method.
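Here is a sketch of a custom share method for a class with a special design. The class CachedFile and its slots are hypothetical. The sketch also assumes that the share generic has the signature function(x, ...), so extra arguments such as mustWork are forwarded with `...`. The method shares only the in-memory cache and leaves the file path as an ordinary slot.

```r
library(methods)
library(SharedObject)

## Hypothetical S4 class: a file path plus an in-memory cache of its values
setClass("CachedFile", representation(path = "character", cache = "numeric"))

## Custom share() method: share the cache slot only, keep the path as-is
setMethod("share", "CachedFile", function(x, ...) {
    x@cache <- share(x@cache, ...)
    x
})

obj <- new("CachedFile", path = "values.bin", cache = runif(100))
shared_obj <- share(obj)
is.shared(shared_obj@cache)
```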
When an object is not sharable, no error is raised and the same object is returned. This should be rare, as most data types are supported. The argument mustWork = TRUE can be used if you want to make sure the return value is a shared object.
## the element `A` is sharable and `B` is not
x <- list(A = 1:3, B = as.symbol("x"))
## No error will be given,
## but the element `B` is not shared
shared_x <- share(x)
## Use the `mustWork` argument
## An error will be given for the non-sharable object `B`
tryCatch(
    shared_x <- share(x, mustWork = TRUE),
    error = function(e) message(conditionMessage(e))
)
#> The object of the class <name> cannot be shared.
#> To suppress this error and return the same object,
#> provide `mustWork = FALSE` as a function argument
#> or change its default value in the package settings
As mentioned before, the package provides the is.shared function to identify a shared object. By default, is.shared returns a single logical value indicating whether the object is a shared object or contains any shared objects. If the object is a container (e.g. a list), you can explore the details using the depth parameter.
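The sketch below uses a list that holds one shared element and one ordinary element, assuming SharedObject is installed. The default call returns a single TRUE because the list contains a shared object. The depth = 1 call is assumed to report on each element one level down; its exact output format is not shown here.

```r
library(SharedObject)

## A list with one shared element and one ordinary element
x <- list(A = share(1:10), B = 1:10)

## A single logical value: TRUE because the list contains a shared object
is.shared(x)

## Inspect the elements one level down
is.shared(x, depth = 1)
```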
Several options control the default behavior of a shared object. You can view them via
sharedObjectPkgOptions()
#> $mustWork
#> [1] FALSE
#>
#> $sharedAttributes
#> [1] TRUE
#>
#> $copyOnWrite
#> [1] TRUE
#>
#> $sharedSubset
#> [1] FALSE
#>
#> $sharedCopy
#> [1] FALSE
#>
#> $minLength
#> [1] 3
As we have seen previously, the option mustWork = FALSE suppresses the error when the function share encounters a non-sharable object and forces the function to return the same object. sharedSubset controls whether a subset of a shared object is still a shared object. minLength determines the minimum length of a shared object: an R object will not be shared if its length is less than the minimum length.
We will discuss the options copyOnWrite and sharedCopy in the advanced section, but most users can safely ignore them. The global setting can be modified via sharedObjectPkgOptions:
## change the default setting
sharedObjectPkgOptions(mustWork = TRUE)
## Check if the change is made
sharedObjectPkgOptions("mustWork")
#> [1] TRUE
## Restore the default
sharedObjectPkgOptions(mustWork = FALSE)
Note that the package options can be temporarily overridden by providing named parameters to the function share. For example, you can override the package option mustWork via share(x, mustWork = TRUE).
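The sketch below shows minLength in action and overrides it for a single call. It relies on the statement above that package options can be passed to share as named parameters; that minLength is accepted this way is an assumption. With the default minimum length of 3, a length-2 vector comes back unshared. Lowering minLength for one call shares it, and the global setting stays as it was.

```r
library(SharedObject)

short_x <- c(1, 2)

## With the default minLength (3), a length-2 vector is returned unshared
is.shared(share(short_x))

## Lower minLength for this call only (assumed to be accepted by share)
shared_short <- share(short_x, minLength = 1)
is.shared(shared_short)

## The package-wide setting is unchanged
sharedObjectPkgOptions("minLength")
```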
Since all workers use shared objects located at the same memory location, a change made to a shared object in one worker can affect the value of the object in the other workers. To prevent users from unintentionally changing the values of a shared object, a shared object duplicates itself when its value is changed. For example:
x1 <- share(1:4)
x2 <- x1
## x2 becomes a regular R object after the change
is.shared(x2)
#> [1] TRUE
x2[1] <- 10L
is.shared(x2)
#> [1] FALSE
## x1 is not changed
x1
#> [1] 1 2 3 4
x2
#> [1] 10 2 3 4
When we change the value of x2, R first duplicates the object x2 and then applies the change. Therefore, although x1 and x2 share the same data, the change in x2 does not affect the value of x1. This default behavior can be overridden by the parameter copyOnWrite.
x1 <- share(1:4, copyOnWrite = FALSE)
x2 <- x1
## x2 will not be duplicated when a change is made
is.shared(x2)
#> [1] TRUE
x2[1] <- 0L
is.shared(x2)
#> [1] TRUE
## x1 has been changed
x1
#> [1] 0 2 3 4
x2
#> [1] 0 2 3 4
If copy-on-write is off, a change in the vector x2 causes a change in x1. This feature could be useful for collecting results from workers. For example, you can pre-allocate an empty shared object with copyOnWrite = FALSE and let the workers write their results back to the shared object. This avoids sending the data from the workers to the main process. However, due to the way R works internally, it is possible to change the value of a shared object unexpectedly. For example:
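The sketch below shows the kind of surprise meant here. It assumes that unary minus is the unary function involved and that SharedObject is installed. Negating x looks like a read-only operation. However, R may duplicate x before negating it, and with copy-on-write off that duplicate can reuse the shared memory, so x itself may end up negated. Because this depends on R's internals, the printed values are not given here.

```r
library(SharedObject)

## Copy-on-write is turned off
x <- share(1:4, copyOnWrite = FALSE)
x

## Looks read-only, but the result may be written into x's shared memory
y <- -x
y

## Check whether x itself was changed
x
```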
The above example shows a surprising result when the copy-on-write feature is off: simply calling a unary function can change the values of a shared object. Therefore, users must use this feature with caution.
The copy-on-write feature of an object can be set via the setCopyOnWrite function or the copyOnWrite parameter of the share function.
## Create x1 with copy-on-write off
x1 <- share(1:4, copyOnWrite = FALSE)
x2 <- x1
## change the value of x2
x2[1] <- 0L
## Both x1 and x2 are affected
x1
#> [1] 0 2 3 4
x2
#> [1] 0 2 3 4
## Enable copy-on-write
## x2 is now independent of x1
setCopyOnWrite(x2, TRUE)
x2[2] <- 0L
## only x2 is affected
x1
#> [1] 0 2 3 4
x2
#> [1] 0 0 3 4
This flexibility provides a way to do safe operations during the computation and return the results without memory duplication.
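The following is a minimal sketch of the result-collection pattern described earlier: pre-allocate a shared object with copyOnWrite = FALSE and let the workers write into it. It assumes a local PSOCK cluster from the parallel package, with SharedObject installed on every worker. It also assumes the copy-on-write setting travels with the exported object. Each worker writes the square of its index into its own slot, so the main process should see the squares without receiving any data back.

```r
library(SharedObject)
library(parallel)

## Pre-allocate the result vector with copy-on-write turned off
res <- share(numeric(4), copyOnWrite = FALSE)

cl <- makeCluster(2)
## Workers need the package loaded to unserialize the shared object
invisible(clusterEvalQ(cl, library(SharedObject)))
clusterExport(cl, "res")

## Each worker writes its result directly into its slot of `res`
invisible(parLapply(cl, seq_along(res), function(i) {
    res[i] <- i^2
    NULL
}))
stopCluster(cl)

## Expected to hold the squares if the writes reached the shared memory
res
```

The values written are doubles, which matches the type of numeric(4), so no implicit type conversion is triggered.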
If a high-precision value is assigned to a low-precision shared object (e.g. assigning a numeric value to an integer shared object), an implicit type conversion is triggered to store the change correctly. The resulting object is a regular R object, not a shared object. Therefore, the change will not be broadcast even if the copy-on-write feature is off. Users should pay attention to the data type of a shared object.
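The sketch below illustrates the type-conversion pitfall with copy-on-write off, assuming SharedObject is installed. Assigning the double 10.5 into an integer vector makes R coerce x2 to a new double vector. As described above, x2 is then no longer shared and x1 keeps its original values.

```r
library(SharedObject)

## An integer shared vector with copy-on-write turned off
x1 <- share(1:4, copyOnWrite = FALSE)
x2 <- x1

## Assigning a double forces R to coerce x2 into a new double vector
x2[1] <- 10.5
typeof(x2)
is.shared(x2)

## x1 keeps its original integer values: the change was not broadcast
x1
```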
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] parallel stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] SharedObject_1.21.0 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0
#> [4] xfun_0.48 maketools_1.3.1 cachem_1.1.0
#> [7] knitr_1.48 BiocGenerics_0.53.0 htmltools_0.5.8.1
#> [10] rmarkdown_2.28 buildtools_1.0.0 lifecycle_1.0.4
#> [13] cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
#> [16] compiler_4.4.1 sys_3.4.3 tools_4.4.1
#> [19] evaluate_1.0.1 bslib_0.8.0 Rcpp_1.0.13
#> [22] yaml_2.3.10 BiocManager_1.30.25 jsonlite_1.8.9
#> [25] rlang_1.1.4