Title: Reusable and reproducible Data Management
Description: ReUseData is an _R/Bioconductor_ software tool that provides a systematic and versatile approach to standardized and reproducible data management. ReUseData facilitates the transformation of shell or other ad hoc data-preprocessing scripts into workflow-based data recipes. Evaluation of data recipes generates curated data files in their generic formats (e.g., VCF, bed). Both recipes and data are cached using database infrastructure for easy data management and reuse. Prebuilt data recipes are available through the ReUseData portal (https://rcwl.org/dataRecipes/) with full annotation and user instructions. Pregenerated data are available through the ReUseData cloud bucket and are directly downloadable with getCloudData().
Authors: Qian Liu [aut, cre]
Maintainer: Qian Liu <[email protected]>
License: GPL-3
Version: 1.7.0
Built: 2024-11-04 06:07:47 UTC
Source: https://github.com/bioc/ReUseData
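The typical round trip, from syncing prebuilt recipes to reusing the generated data, is sketched below. This is a minimal, hedged walk-through assembled from the examples in this manual; it assumes the prebuilt recipe landscape is reachable.

## Minimal end-to-end sketch (recipe name from the prebuilt landscape)
library(ReUseData)
library(Rcwl)
recipeUpdate()                               ## sync prebuilt data recipes
rcp <- recipeLoad("ensembl_liftover")        ## load one recipe
rcp$species <- "human"
rcp$from <- "GRCh37"
rcp$to <- "GRCh38"
outdir <- file.path(tempdir(), "SharedData")
res <- getData(rcp, outdir = outdir,
               notes = c("ensembl", "liftover", "human", "GRCh37", "GRCh38"))
dataUpdate(dir = outdir)                     ## cache the newly generated data
dataSearch(c("ensembl", "liftover"))         ## reuse by keyword later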
annData: Add annotation or meta information to existing data.
annData(
  path,
  notes,
  date = Sys.Date(),
  recursive = TRUE,
  md5 = FALSE,
  skip = "*.md|meta.yml",
  force = FALSE,
  ...
)
path: The data path to annotate.
notes: User assigned notes/keywords to annotate the data and to be used for keyword matching in dataSearch().
date: The date of the data.
recursive: Whether to annotate all data recursively.
md5: Whether to generate md5 values for all files.
skip: Pattern of files to skip in the path.
force: Whether to force regeneration of meta.yml.
...: Other options passed to the underlying annotation call.
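annData has no worked example in this manual, so here is a minimal, hypothetical sketch. It assumes the folder already contains data files, e.g., from an earlier getData() run.

## Hypothetical sketch: annotate an existing data folder so it becomes
## searchable by keyword after the next dataUpdate()
outdir <- file.path(tempdir(), "SharedData")
annData(outdir,
        notes = c("liftover", "GRCh38"),
        recursive = TRUE,
        md5 = TRUE)
dataUpdate(dir = outdir)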
dataHub: class, constructor, and methods.
dataHub(BFC)

## S4 method for signature 'dataHub'
show(object)

dataNames(object)

dataParams(object)

dataNotes(object)

dataPaths(object)

dataYml(object)

dataTags(object)

## S4 method for signature 'dataHub'
dataTags(object)

dataTags(object, append = TRUE) <- value

## S4 replacement method for signature 'dataHub'
dataTags(object, append = FALSE) <- value

## S4 method for signature 'dataHub,ANY,ANY,ANY'
x[i, j, drop]

## S4 replacement method for signature 'dataHub,ANY,ANY,ANY'
x[i, j] <- value

## S4 method for signature 'dataHub'
c(x, ...)

toList(
  x,
  listNames = NULL,
  format = c("list", "json", "yaml"),
  type = NULL,
  file = character()
)
BFC: A BiocFileCache object created for data and recipes.
object: A dataHub object.
append: Whether to append the new tag or replace all existing tags.
value: The replacement value, e.g., a character vector of new tags for dataTags<-.
x: A dataHub object.
i: The integer index of the dataHub object.
j: Inherited from the generic [ method; not used here.
drop: Inherited from the generic [ method; not used here.
...: More dataHub objects to be combined with c().
listNames: A vector of names for the output list.
format: Can be "list", "json" or "yaml". Supports partial matching. Default is "list".
type: The type of workflow input list, such as cwl.
file: The file name to save the data list in the required format. The file extension needs to be included, e.g., ".json" or ".yml".
dataHub: a dataHub object.
dataNames: the names of datasets in the dataHub object.
dataParams: the data recipe parameter values for datasets in the dataHub object.
dataNotes: the notes of datasets in the dataHub object.
dataPaths: the file paths of datasets in the dataHub object.
dataYml: the yaml file paths of datasets in the dataHub object.
dataTags: the tags of datasets in the dataHub object.
toList: a list of datasets in the specified format, and a file if the file argument is specified.
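The examples below exercise the accessors; tag assignment, subsetting, and combining are not shown there, so here is a small hedged sketch. It assumes dd holds at least two records, as returned by dataSearch(); tag updates should then be discoverable with the "#" tag syntax shown in the dataSearch examples.

## Hedged sketch: tag, subset and combine dataHub records
dd <- dataSearch(c("liftover", "GRCh38"))
d1 <- dd[1]                                 ## subsetting keeps a dataHub object
dataTags(d1) <- "#gatk"                     ## replace the tags
dataTags(d1, append = TRUE) <- "#hg38"      ## append another tag
dd2 <- c(d1, dd[2])                         ## combine records
dataSearch("#gatk")                         ## tags are searchable with a "#" prefix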
outdir <- file.path(tempdir(), "SharedData")
dataUpdate(outdir, cloud = TRUE)
dd <- dataSearch(c("liftover", "GRCh38"))

dataNames(dd)
dataParams(dd)
dataNotes(dd)
dataTags(dd)
dataYml(dd)

toList(dd)
toList(dd, format = "yaml")
toList(dd, format = "json", file = tempfile())
dataSearch: Search data in the local data caching system.
dataSearch(keywords = character(), cachePath = "ReUseData")
keywords: Character vector of keywords to be matched to the local datasets. It matches the "notes" specified when generating the data with getData().
cachePath: A character string for the data cache. Must match the one specified in dataUpdate().
a dataHub object containing the information about the local data cache, e.g., data name, data path, etc.
dataSearch()
dataSearch(c("gencode"))
dataSearch("#gatk")
dataUpdate: Update the local data records by reading the yaml files in the specified directory recursively.
dataUpdate(
  dir,
  cachePath = "ReUseData",
  outMeta = FALSE,
  keepTags = TRUE,
  cleanup = FALSE,
  cloud = FALSE,
  remote = FALSE,
  checkData = TRUE,
  duplicate = FALSE
)
dir: A character string for the directory where all data are saved. Data information will be collected recursively within this directory.
cachePath: A character string specifying the name for the data cache. Default is "ReUseData".
outMeta: Logical. If TRUE, a "meta_data.csv" file will be generated in dir, containing information about all available datasets.
keepTags: Whether to keep the previously assigned data tags. Default is TRUE.
cleanup: Whether to remove any invalid intermediate files. Default is FALSE. In cases where one data recipe (with the same parameter values) was evaluated multiple times, the same data file(s) will match multiple intermediate files (e.g., .yml).
cloud: Whether to return the pregenerated data from the ReUseData Google Cloud bucket. Default is FALSE.
remote: Whether to use the csv file (containing information about pregenerated data on Google Cloud) from GitHub, which is most up-to-date. Only works when cloud = TRUE.
checkData: Whether to check that the data (listed as "# output: " in the yml file) exists. If not, it is not included in the output csv file. This argument was added for internal testing purposes.
duplicate: Whether to remove duplicates. If TRUE, older versions of duplicates will be removed.
Users can directly retrieve information for all available datasets with meta_data(dir=), which generates a data frame in R with the same information as described above and can be saved out. dataUpdate performs an extra check for all datasets (the file path in the "output" column), removes invalid ones, e.g., empty or non-existing file paths, and creates a data cache for all valid datasets.
a dataHub object containing the information about the local data cache, e.g., data name, data path, etc.
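The example below covers the basic update; outMeta and cleanup are not exercised there, so here is a small hedged sketch, assuming outdir already holds getData() outputs.

## Hedged sketch: also write a meta_data.csv summary and drop stale .yml files
dataUpdate(dir = outdir, outMeta = TRUE, cleanup = TRUE)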
## Generate data
## Not run:
library(Rcwl)
outdir <- file.path(tempdir(), "SharedData")

echo_out <- recipeLoad("echo_out")
Rcwl::inputs(echo_out)
echo_out$input <- "Hello World!"
echo_out$outfile <- "outfile"
res <- getData(echo_out,
               outdir = outdir,
               notes = c("echo", "hello", "world", "txt"),
               showLog = TRUE)

ensembl_liftover <- recipeLoad("ensembl_liftover")
Rcwl::inputs(ensembl_liftover)
ensembl_liftover$species <- "human"
ensembl_liftover$from <- "GRCh37"
ensembl_liftover$to <- "GRCh38"
res <- getData(ensembl_liftover,
               outdir = outdir,
               notes = c("ensembl", "liftover", "human", "GRCh37", "GRCh38"),
               showLog = TRUE)

## Update data cache (with or without prebuilt data sets from ReUseData cloud bucket)
dataUpdate(dir = outdir)
dataUpdate(dir = outdir, cloud = TRUE)

## newly generated data are now cached and searchable
dataSearch(c("hello", "world"))
dataSearch(c("ensembl", "liftover"))  ## both locally generated data and google cloud data!
## End(Not run)
getCloudData: Download pregenerated, curated data sets from the ReUseData cloud bucket.
getCloudData(datahub, outdir = character())
datahub: The dataHub object, as returned from dataSearch(), with records of pregenerated data on the cloud.
outdir: The output directory for the data (and concomitant annotation files) to be downloaded. It is recommended to use a new folder under a shared folder for each newly downloaded dataset.
Data and concomitant annotation files will be downloaded to the user-specified folder, which is then locally searchable with dataSearch().
outdir <- file.path(tempdir(), "gcpData")
dh <- dataSearch(c("ensembl", "GRCh38"))
dh <- dh[grep("http", dataPaths(dh))]

## download data from google bucket
getCloudData(dh[1], outdir = outdir)

## update local data caching
dataUpdate(outdir)  ## no "cloud=TRUE" here, only showing local data cache

## now the data is available to use locally
dataSearch(c("ensembl", "GRCh38"))
getData: Evaluation of data recipes to generate curated datasets of interest.
getData(
  rcp,
  outdir,
  prefix = NULL,
  notes = c(),
  conda = FALSE,
  BPPARAM = NULL,
  ...
)
rcp: The data recipe in cwlProcess S4 class.
outdir: Character string specifying the directory to store the output files. It will be created automatically if it does not exist.
prefix: Character string specifying the file name of the annotation files (.yml, .cwl, .sh, .md5).
notes: User assigned notes/keywords to annotate the data and to be used for keyword matching in dataSearch().
conda: Whether to use conda to install required software when evaluating the data recipe as a CWL workflow. Default is FALSE.
BPPARAM: The options for parallel evaluation, e.g., a BiocParallel parameter object.
...: Arguments to be passed into Rcwl::runCWL().
The data files and 4 meta files: .cwl: the cwl script that was internally run to get the data; .yml: the input parameter values for the data recipe and user-specified data annotation notes, versions, etc.; .sh: the script for data processing; .md5: the checksum file to verify the integrity of the generated data files.
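A quick, hedged way to see those four annotation files next to the data after a getData() call; the filename pattern is an assumption based on the extensions listed above.

## Hedged sketch: list the annotation files written alongside the data
dir(outdir, pattern = "\\.(cwl|yml|sh|md5)$")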
## Not run:
library(Rcwl)
outdir <- file.path(tempdir(), "SharedData")

## Example 1
echo_out <- recipeLoad("echo_out")
Rcwl::inputs(echo_out)
echo_out$input <- "Hello World!"
echo_out$outfile <- "outfile"
res <- getData(echo_out,
               outdir = outdir,
               notes = c("echo", "hello", "world", "txt"),
               showLog = TRUE)

## Example 2
ensembl_liftover <- recipeLoad("ensembl_liftover")
Rcwl::inputs(ensembl_liftover)
ensembl_liftover$species <- "human"
ensembl_liftover$from <- "GRCh37"
ensembl_liftover$to <- "GRCh38"
res <- getData(ensembl_liftover,
               outdir = outdir,
               notes = c("ensembl", "liftover", "human", "GRCh37", "GRCh38"),
               showLog = TRUE)
dir(outdir)
## End(Not run)
meta_data: Generate the meta csv file for locally cached datasets.
meta_data(dir = "", cleanup = FALSE, checkData = TRUE)
dir: The path to the shared data folder.
cleanup: Whether to remove any invalid intermediate files. Default is FALSE. In cases where one data recipe (with the same parameter values) was evaluated multiple times, the same data file(s) will match multiple intermediate files (e.g., .yml).
checkData: Whether to check that the data (listed as "# output: " in the yml file) exists. If not, it is not included in the output csv file. This argument was added for internal testing purposes.
a data.frame with the yml file name, parameter values, data file paths, date, and user-specified notes added when generating the data with getData().
outdir <- file.path(tempdir(), "SharedData")
meta_data(outdir)
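Since the Details for dataUpdate note that this table can be saved out, here is a minimal hedged sketch; the file name mirrors the one dataUpdate(outMeta = TRUE) would write.

## Hedged sketch: save the collected metadata as csv
meta <- meta_data(outdir)
write.csv(meta, file.path(outdir, "meta_data.csv"), row.names = FALSE)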
recipeHub: class, constructor, and methods.
recipeHub(BFC)

## S4 method for signature 'recipeHub'
show(object)

## S4 method for signature 'recipeHub,ANY,ANY,ANY'
x[i]

recipeNames(object)
BFC: A BiocFileCache object created for recipes.
object: The recipeHub object.
x: The recipeHub object.
i: The integer index of the recipeHub object.
recipeHub: a recipeHub object.
[: a subsetted recipeHub object.
recipeNames: the recipe names for the recipeHub object.
rcps <- recipeSearch(c("gencode"))
## rcp1 <- rcps[1]
## recipeNames(rcp1)
recipeLoad: Load data recipe(s) into the R environment.
recipeLoad(
  rcp = c(),
  cachePath = "ReUseDataRecipe",
  env = .GlobalEnv,
  return = TRUE
)
rcp: The (vector of) character string of recipe name(s) or file path(s) (for locally saved recipes).
cachePath: A character string for the recipe cache. Must match the one specified in recipeUpdate().
env: The R environment to export to. Default is .GlobalEnv.
return: Whether to return the recipe to a user-assigned R object. Default is TRUE, where the user needs to assign a variable name to the recipe, e.g., rcp <- recipeLoad("ensembl_liftover"). If FALSE, the recipe is loaded into the specified environment under its original name.
A data recipe of cwlProcess S4 class, which is ready to be evaluated in R.
########################
## Load single recipe
########################

library(Rcwl)
recipeUpdate()
recipeSearch("liftover")
rcp <- recipeLoad("ensembl_liftover")
Rcwl::inputs(rcp)
rm(rcp)

gencode_annotation <- recipeLoad("gencode_annotation")
inputs(gencode_annotation)
rm(gencode_annotation)

#########################
## Load multiple recipes
#########################

rcphub <- recipeSearch("gencode")
recipeNames(rcphub)
recipeLoad(recipeNames(rcphub), return = FALSE)
inputs(gencode_transcripts)
recipeMake: Constructor function for data recipes.
recipeMake(
  shscript = character(),
  paramID = c(),
  paramType = c(),
  outputID = c(),
  outputType = c("File[]"),
  outputGlob = character(0),
  requireTools = character(0)
)
shscript: Character string. Can take either the file path to the user-provided shell script, or the script content directly, to be converted into a data recipe.
paramID: Character vector. The user-specified parameter IDs for the recipe.
paramType: Character vector specifying the type for each paramID.
outputID: The ID for each output.
outputType: The output type for each output.
outputGlob: The glob pattern of output files, e.g., "hg19.*".
requireTools: The command-line tools used for data processing/curation in the user-provided shell script. The value here must exactly match the tool name, e.g., "bwa", "samtools". A particular version of a tool can be specified in the format "tool=version", e.g., "samtools=1.3".
For parameter types, more details can be found at https://www.commonwl.org/v1.2/CommandLineTool.html#CWLType.

recipeMake is a convenience function for wrapping a shell script into a data recipe (in cwlProcess S4 class). Please use Rcwl::cwlProcess for more options and functionality, especially when the recipe gets complicated, e.g., when it needs a docker image for a command-line tool, or one parameter takes multiple types. Refer to this recipe as an example: https://github.com/rworkflow/ReUseDataRecipe/blob/master/reference_genome.R
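As a small illustration of the CWL parameter types referenced above, here is a hypothetical recipe mixing a "string" and an "int" parameter; it is not part of the prebuilt recipe set.

## Hypothetical sketch: a recipe with "string" and "int" parameters
script <- "
ref=$1
nthread=$2
echo \"indexing $ref with $nthread threads\" > log.txt
"
rcp <- recipeMake(shscript = script,
                  paramID = c("ref", "nthread"),
                  paramType = c("string", "int"),
                  outputID = "log",
                  outputGlob = "log.txt")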
a data recipe in cwlProcess S4 class with all details about the shell script for data processing/curation, inputs, outputs, required tools and corresponding docker files. It is readily taken by getData() to evaluate the shell scripts included and generate the data locally. Find more details with ?Rcwl::cwlProcess.
## Not run:
library(Rcwl)

##############
### example 1
##############

script <- "
input=$1
outfile=$2
echo \"Print the input: $input\" > $outfile.txt
"
rcp <- recipeMake(shscript = script,
                  paramID = c("input", "outfile"),
                  paramType = c("string", "string"),
                  outputID = "echoout",
                  outputGlob = "*.txt")
inputs(rcp)
outputs(rcp)
rcp$input <- "Hello World!"
rcp$outfile <- "outfile"
res <- getData(rcp,
               outdir = tempdir(),
               notes = c("echo", "hello", "world", "txt"),
               showLog = TRUE)
readLines(res$out)

##############
### example 2
##############

shfile <- system.file("extdata", "gencode_transcripts.sh", package = "ReUseData")
readLines(shfile)
rcp <- recipeMake(shscript = shfile,
                  paramID = c("species", "version"),
                  paramType = c("string", "string"),
                  outputID = "transcripts",
                  outputGlob = "*.transcripts.fa*",
                  requireTools = c("wget", "gzip", "samtools"))
Rcwl::inputs(rcp)
rcp$species <- "human"
rcp$version <- "42"
res <- getData(rcp,
               outdir = tempdir(),
               notes = c("gencode", "transcripts", "human", "42"),
               showLog = TRUE)
res$output
dir(tempdir())
## End(Not run)
recipeSearch: Search existing data recipes.
recipeSearch(keywords = character(), cachePath = "ReUseDataRecipe")
keywords: Character vector of keywords to be matched to the recipe names. If not specified, the function returns the full recipe list.
cachePath: A character string for the recipe cache. Must match the one specified in recipeUpdate().
A recipeHub object.
recipeSearch()
recipeSearch("gencode")
recipeSearch(c("STAR", "index"))
recipeUpdate: Sync and get the most up-to-date and newly added data recipes through the public "rworkflow/ReUseDataRecipe" GitHub repository or a user-specified private GitHub repository.
recipeUpdate(
  cachePath = "ReUseDataRecipe",
  force = FALSE,
  remote = FALSE,
  repos = "rworkflow/ReUseDataRecipe"
)
cachePath: A character string specifying the name for the recipe cache. Default is "ReUseDataRecipe".
force: Whether to remove the existing cache and regenerate the recipe cache. Default is FALSE. Only use if old recipes that were previously cached locally have been updated remotely (in the GitHub repos).
remote: Whether to download the data recipes directly from a GitHub repository. Default is FALSE.
repos: The GitHub repository containing the data recipes to be synced to the local cache. Only works when remote = TRUE. Default is "rworkflow/ReUseDataRecipe".
a recipeHub object.
## recipeUpdate()
## recipeUpdate(force = TRUE)
## recipeUpdate(force = TRUE, remote = TRUE)
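Since the description mentions user-specified private repositories, a hypothetical sketch follows; the repository slug is made up.

## Hypothetical: sync recipes from a private fork
## recipeUpdate(remote = TRUE, force = TRUE, repos = "myorg/MyDataRecipes")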