| Title: | R Interface to European Genome-Phenome Archive |
|---|---|
| Description: | The European Genome-phenome Archive (EGA) provides long-term storage and controlled sharing of personally identifiable genetic data. The Rega package offers a streamlined and extensible R interface to the EGA API, facilitating the programmatic upload of metadata. A GEO-like Excel submission template is provided as the default method of organizing submission metadata. |
| Authors: | Igor Cervenka [aut, cre] (ORCID: <https://orcid.org/0000-0002-9438-5161>), Athimed El Taher [aut] (ORCID: <https://orcid.org/0000-0003-2424-8476>), Robert Ivanek [aut] (ORCID: <https://orcid.org/0000-0002-8403-056X>) |
| Maintainer: | Igor Cervenka <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.1.0 |
| Built: | 2026-05-31 06:42:48 UTC |
| Source: | https://github.com/bioc/Rega |
Mark Required Fields with a Prefix
add_required_str(p, r, req_str = "* ")
p |
Character vector. All fields to be processed. |
r |
Character vector. Fields that are required. |
req_str |
Character. Prefix to mark required fields. Defaults to |
A character vector with required fields prefixed and ordered to appear before non-required fields.
# Mark required fields with a prefix
add_required_str(c("Name", "Id", "Age"), c("Id", "Name"))
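The behaviour described above can be approximated in base R. This is a minimal sketch, not the package's implementation: the helper name `mark_required` is hypothetical, and it assumes that required fields keep the order in which they appear in `p`.

```r
# Hypothetical helper; a sketch of the described behaviour, not Rega's code.
mark_required <- function(p, r, req_str = "* ") {
  is_req <- p %in% r
  # Prefix only the required fields (a no-op when none are required)
  p[is_req] <- paste0(req_str, p[is_req])
  # Required fields first, then the remaining fields unchanged
  c(p[is_req], p[!is_req])
}

marked <- mark_required(c("Name", "Id", "Age"), c("Id", "Name"))
print(marked)
```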
Format Aliases from a Table
aliases_formatter(tab, params)
tab |
Data frame. The input table where the first row contains column names. |
params |
List. Additional parameters for formatting. Takes a formatter params value from parser parameter yaml file. Currently unused. |
A named list where each name corresponds to a formatted column name, and values are non-NA elements of the respective column.
tab <- data.frame(Alias = c("name1", "name2", NA), Value = c(1, 2, 3))
aliases_formatter(tab, params = list())
This function dynamically creates an API function based on a given operation definition and API specification. The generated function handles URL construction, parameter validation, request execution, and response parsing.
api_function_factory(
  op,
  api,
  verbosity = 0,
  bearer_token = NULL,
  token_url = .EGA_TOKEN_URL
)
op |
List. The API operation definition, including method, path, parameters, and request body schema. |
api |
List. The API specification, including host and global security definitions. |
verbosity |
Integer, optional, values 0-3. Indicates with which
verbosity level should the requests |
bearer_token |
Character, optional. The API bearer token for
authentication, will be included in the headers of the request. Defaults to
|
token_url |
Character, optional. Token endpoint URL from which to obtain
the access token. If |
A dynamically generated function that performs the specified API
operation. The function accepts arguments corresponding to operation
parameters and executes the request using httr2.
api <- extract_api()
opdefs <- extract_operation_definitions(api)

# Generate an API function for a specific operation
f <- api_function_factory(
  opdefs[["get__files"]],
  api,
  bearer_token = "my_key"
)

# Call the generated function with parameters (requires credentials)
try(
  result <- f(status = "value1", prefix = "value2")
)
This function converts API-style names with underscores into human-readable labels by replacing underscores with spaces and applying title case.
api_name_to_label(x)
x |
Character vector. API field names to be converted. |
A character vector with API names converted to human-readable labels.
api_name_to_label(c("first_name", "last_name", "instrument_model"))
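The conversion described above (underscores to spaces, then title case) can be sketched in base R. The helper name `to_label` is hypothetical and the package's handling of acronyms or mixed-case input may differ; this only illustrates the general idea.

```r
# Hypothetical helper illustrating the described conversion; not Rega's code.
to_label <- function(x) {
  spaced <- gsub("_", " ", x, fixed = TRUE)
  # Upper-case the first letter of every word (perl = TRUE enables \\U)
  gsub("\\b([a-z])", "\\U\\1", spaced, perl = TRUE)
}

labels <- to_label(c("first_name", "last_name"))
print(labels)
```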
Format a Column Table
column_table_formatter(tab, params)
tab |
Data frame. The input table from the submission metadata file, where the first row contains column names. |
params |
List. Additional parameters for formatting. Takes a formatter params value from parser parameter yaml file. Currently unused. |
A cleaned data frame with column names set from the first row, empty rows removed, and whitespace trimmed from all values.
df <- data.frame(
  ...1 = c("* Alias", "Sample1", "Sample2"),
  ...2 = c("* Phenotype", "wt", "ko"),
  ...3 = c("Description", NA, NA)
)
column_table_formatter(df, list())
This function creates a named list of functions for interacting with an API, based on its specification and operation definitions.
create_client(api, ...)
api |
List. The API specification, including operation definitions, host, and global settings. |
... |
List. List of additional arguments passed to
|
A named list of functions, where each function corresponds to an API operation. The function names match the operation IDs from the specification.
client <- create_client(
  extract_api(),
  bearer_token = "my_key",
  verbosity = 1
)

# Call an operation using the client (requires credentials)
try(
  result <- client$get__files(status = "value1", prefix = "value2")
)
This function parses the extdata/ega_full_template_v3.xlsx using the
bundled parser parameter file in extdata/default_parser_params.yaml to
extract information for EGA submission into a format that can be easily
passed to EGA API endpoints.
default_parser(metadata_file, param_file = NULL)
metadata_file |
Character. Path to a default template xlsx file containing the submission metadata information. |
param_file |
Character. Path to a yaml file with parameters for parser.
If NULL, uses the |
List of data frames or lists. Submission information parsed from the xlsx file.
default_parser(
  system.file("extdata/submission_example.xlsx", package = "Rega")
)
Validates the internal consistency of submission metadata parsed using the default parser. Performs several checks on an EGA dataset for submission, ensuring that aliases for studies, experiments, samples, runs, analyses and datasets are properly linked, as they will be replaced with provisional or accession IDs during the submission process. Displays a success message if all validations passed or a summary message if validation failed. In addition, it returns a data frame with validation details.
default_validator(meta, aliases = NULL)
meta |
List of data frames. Correspond to tables of EGA submission. |
aliases |
List of lists. Aliases that should be present in the EGA tables.
If |
Data frame. Validator object that includes all performed validations and their statistics (number of passes, fails and NAs, or whether errors or warnings were encountered during validation)
minimal_metadata <- list(
  aliases = list(
    studies = "Study1",
    experiments = "Experiment1",
    datasets = "Dataset1",
    samples = "Sample1",
    runs = "Run1",
    analyses = "Analysis1"
  ),
  files = tibble::tibble(
    file = "raw.fastq.gz",
    ega_file = list("raw.fastq.gz.c4gh")
  ),
  submission = tibble::tibble(title = "Submission"),
  studies = tibble::tibble(
    study = "Study1",
    title = "Study Title",
    description = "Study Description",
    study_type = "Whole Genome Sequencing"
  ),
  samples = tibble::tibble(
    alias = "Sample1",
    phenotype = "wild-type",
    biological_sex = "female",
    subject_id = "ID1"
  ),
  experiments = tibble::tibble(
    study = "Study1",
    experiment = "Experiment1",
    design_description = "Experiment Design",
    library_selection = "RANDOM",
    instrument_model_id = 1L,
    library_layout = "SINGLE",
    library_strategy = "WGS",
    library_source = "GENOMIC"
  ),
  runs = tibble::tibble(
    run = "Run1",
    experiment = "Experiment1",
    run_file_type = "srf",
    alias = "Sample1",
    files = list("raw.fastq.gz.c4gh")
  ),
  datasets = tibble::tibble(
    dataset = "Dataset1",
    title = "Dataset Title",
    description = "Dataset Description",
    policy_accession_id = "EGAP00000000001",
    dataset_types = list("Whole genome sequencing"),
    runs = list("Run1")
  )
)
default_validator(minimal_metadata)
Deletes a submission identified by its ID using the client and logs the response if a logfile is specified.
delete_submission(id, client = NULL, logfile = NULL, ...)
id |
A string representing the submission identifier (provisional ID). |
client |
An API client object with a |
logfile |
A string specifying the path to a log file. If |
... |
Additional arguments for future extensions (currently unused). |
A list containing the response for the submission deletion.
mock_client <- list(
  delete__submissions__provisional_id = function(id) list(status = "deleted")
)
delete_submission("5678901", mock_client)
Deletes all data associated with a submission ID using the client and logs the responses if a logfile is specified.
delete_submission_contents(id, client = NULL, logfile = NULL, ...)
id |
A string representing the submission identifier. Can be either an accession or provisional ID. |
client |
List of functions. EGA API client created by |
logfile |
A string specifying the path to a log file. If |
... |
Additional arguments for future extensions (currently unused). |
A list of responses for the deletion of associated datasets, analyses, runs, experiments, samples, and studies.
mock_client <- list(
  "delete__submissions__provisional_id__datasets" = function(id) list(status = "deleted")
)
delete_submission_contents(5678901, client = mock_client)
A named list of functions used to coerce delimited strings into specific data types during metadata processing.
DELIM_CONVERTERS
A named list:
Coerces values to integer.
df <- list(pub_sheet = data.frame(
  pubmed_ids = c(" 123456, 789", NA, "1001 ")
))
process_delimited_column(
  df,
  "pubmed_ids",
  separator = ",",
  converters = DELIM_CONVERTERS
)
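The idea of coercing delimited strings can be illustrated without the package. The sketch below is an assumption about the general technique (split, trim, coerce) and not the implementation of `process_delimited_column`; the helper name `split_and_convert` is hypothetical.

```r
# Hypothetical helper: split each string on a separator, trim whitespace,
# and coerce the pieces with a converter function (here as.integer).
split_and_convert <- function(x, separator = ",", converter = as.integer) {
  pieces <- strsplit(x, separator, fixed = TRUE)
  lapply(pieces, function(v) converter(trimws(v)))
}

ids <- split_and_convert(c(" 123456, 789", NA, "1001 "))
str(ids)
```

Missing values pass through as `NA`, so rows without identifiers remain distinguishable from rows with them.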
ega_oauth implements the EGA OAuth resource owner password flow, as defined
by Section 4.3 of RFC 6749. It allows the user to supply their password once,
exchanging it for an access token that can be cached locally. Please avoid
entering the password directly when calling this function as it will be
captured by .Rhistory.
ega_oauth(
  req,
  username = .get_ega_username(),
  password = .get_ega_password(),
  token_url = .EGA_TOKEN_URL
)
req |
A httr2 request. |
username |
Character. EGA User name. Defaults to the value returned by
|
password |
Character. EGA user Password. Defaults to the value returned
by |
token_url |
Character. The URL for the EGA token endpoint. Defaults to
.EGA_TOKEN_URL =
|
Returns a modified HTTP request that will use OAuth.
req <- httr2::request("https://example.com/")

# Request OAuth with default credentials
try(oauth_req <- ega_oauth(req))

# Request OAuth with custom credentials
oauth_req <- ega_oauth(req, username = "user", password = "pass")
This function retrieves an API token from the European Genome-Phenome Archive (EGA) using user credentials.
ega_token(
  username = .get_ega_username(),
  password = .get_ega_password(),
  token_url = .EGA_TOKEN_URL
)
username |
Character. The username for EGA authentication. Defaults to
the value returned by |
password |
Character. The password for EGA authentication. Defaults to
the value returned by |
token_url |
Character. The URL for the EGA token endpoint. Defaults to
the standard EGA token URL if not provided. Defaults to .EGA_TOKEN_URL =
|
A list containing the token details if successful. Actual token value
can be retrieved by token$access_token
try(
  ega_token(username = "my_username", password = "my_password")
)
try(
  ega_token(token_url = "https://www.example.com")
)
This function parses an API specification file (JSON or YAML) and extracts relevant details.
extract_api(spec_file = NULL, host = NULL)
spec_file |
Character. Optional. Path to the API specification file in
JSON or YAML format. If NULL default |
host |
Character. Optional. The API host URL. If not supplied, it will
be inferred from the specification file's |
A list containing the parsed API specification, including the host
and basePath elements. If the specification file lacks required elements,
appropriate warnings or errors are raised.
# Extract API details from a default YAML specification file
api <- extract_api()

# Extract API details with a custom host
api <- extract_api(host = "https://api.example.com")
This function extracts operation definitions from an API specification, including HTTP methods, paths, parameters, request bodies, and responses.
extract_operation_definitions(api)
api |
List. Parsed API specification, generated from a JSON or YAML
file. Must include a |
A named list of operations, where each name corresponds to an operation ID. If an operation ID is not found in the specification, a unique one is created. Each operation contains:
method: HTTP method (e.g., GET, POST).
path: Endpoint path.
parameters: List of operation parameters.
requestBody: Details of the request body (if any).
responses: Possible responses for the operation.
security: Security requirements for the operation.
# Extract operation definitions from a parsed API specification
opdefs <- extract_operation_definitions(extract_api())
opdefs[["post__submissions"]]
Extracts the specific resource identifier (e.g., "users", "datasets") from
the path of an httr2 response object by parsing the segment immediately
following /api/.
extract_resource_name(resp)
resp |
An |
A character string containing the resource name.
resp <- httr2::response(
  method = "GET",
  url = "https://www.example.com/api/files"
)
extract_resource_name(resp)
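The parsing step described above (taking the segment immediately after /api/) can be sketched on a plain URL string. This is an illustration only: the helper name `resource_from_url` is hypothetical, it works on a character URL rather than an httr2 response, and the package may parse the path differently.

```r
# Hypothetical helper: capture the path segment that follows "/api/".
resource_from_url <- function(url) {
  sub("^.*/api/([^/?#]+).*$", "\\1", url)
}

res <- resource_from_url("https://www.example.com/api/files")
print(res)
```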
Query the remote client for requested file prefix, test whether a file is
found for every element of file_list and return the server response.
fetch_files(file_list, client = NULL)
file_list |
A character vector or list of file prefixes to check. |
client |
List of functions. EGA API client created by |
Data frame. Parsed response from client API for requested files.
mock_client <- list(
  get__files = function(prefix = NULL) {
    data.frame(provisional_id = 12345, ega_relative_path = prefix)
  }
)
fetch_files(c("file_a", "file_b"), mock_client)
Format File Table with EGA File Paths
file_formatter(tab, params)
tab |
Data frame. The input table containing file information. Columns
|
params |
List. Additional parameters for formatting. Takes a formatter
params value from parser parameter yaml file. Includes |
A formatted data frame with cleaned column names, and updated
ega_file paths based on file and relative path information.
params <- list(prefix = "", crypt_ext = "c4gh", prepend_slash = FALSE)

# Dummy data, first row will be moved to column names
tab <- data.frame(
  x1 = c("file", "value1", "value2"),
  x2 = c("ega_inbox_relative_path", NA, "proj1")
)
file_formatter(tab, params)
This function filters out elements from a character vector that match a specified regular expression pattern, removing ID fields.
filter_id_fields(x, pattern = NULL)
x |
Character vector. The input vector to filter. |
pattern |
Character. Optional. A regular expression pattern for matching
ID fields to exclude. Defaults to
|
A character vector with elements matching the pattern removed.
# Filter out default ID fields
fields <- c("accession_id", "policy_accession_id", "name", "provisional_id")
filter_id_fields(fields)

# Filter with a custom pattern
filter_id_fields(fields, pattern = "_id")
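Pattern-based exclusion of this kind is a one-liner in base R. The sketch below assumes nothing about the package's default pattern, which is not shown here; the helper name `drop_matching` and the pattern `"_id$"` are illustrative choices.

```r
# Hypothetical helper: keep only the elements that do NOT match the pattern.
drop_matching <- function(x, pattern) {
  x[!grepl(pattern, x)]
}

fields <- c("accession_id", "policy_accession_id", "name", "provisional_id")
kept <- drop_matching(fields, "_id$")
print(kept)
```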
Submits the finalisation request for a submission identified by either an accession or provisional ID. Validates the release date and sends optional dataset changelogs.
finalise_submission(
  id,
  release_date,
  dataset_changelogs = data.frame(),
  client = NULL,
  logfile = NULL,
  ...
)
id |
Character scalar. The submission accession or provisional ID. |
release_date |
Character scalar. Expected release date in YYYY-MM-DD format. |
dataset_changelogs |
Data frame. Optional changelog metadata for
associated datasets. If specified, the required columns are |
client |
List of functions. EGA API client created by |
logfile |
Character. Path of log file to log the |
... |
List. Additional arguments to the function. |
The API response object from the finalisation request.
# Requires credentials
try(
  finalise_submission("123456", "2025-12-31")
)
Use First Row as Column Names for a Data Frame
first_row_to_colnames(df, to_api = TRUE)
df |
Data frame. The input data frame whose first row will become column names. |
to_api |
Logical. Whether to convert labels to API-style names using
|
A data frame with updated column names and the first row removed.
df <- data.frame(id = c("A B", "C D_"), value = c("* E F", "GH"))
first_row_to_colnames(df)
first_row_to_colnames(df, to_api = FALSE)
If NA values are present in any of the columns to be nested, they will
be removed. If the column is not present it will be added with NA as a
single value.
fold_column(tab, column_prefix, new_name)
tab |
Data frame. The input table with columns to fold. |
column_prefix |
Character. The prefix of columns to nest into a single column represented as list. |
new_name |
Character. The name of the new folded column. |
A data frame with the specified columns nested into a single column.
tab <- data.frame(id = c(1, 2), name.1 = c("A1", NA), name.2 = c("B1", "B2"))
fold_column(tab, "name", "folded_column")
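To make the folding behaviour concrete, here is a base R sketch of nesting prefixed columns into a single list column while dropping NA values. It is an approximation under stated assumptions, not the package's code: the helper name `fold_sketch` is hypothetical, and columns are selected by a simple `startsWith()` match on the prefix.

```r
# Hypothetical helper: nest all columns starting with `column_prefix`
# into one list column named `new_name`, dropping NA values per row.
fold_sketch <- function(tab, column_prefix, new_name) {
  cols <- names(tab)[startsWith(names(tab), column_prefix)]
  folded <- lapply(seq_len(nrow(tab)), function(i) {
    # No matching columns: fall back to a single NA value
    if (length(cols) == 0) return(NA)
    vals <- unlist(lapply(cols, function(cn) tab[[cn]][i]))
    vals[!is.na(vals)]
  })
  out <- tab[, !names(tab) %in% cols, drop = FALSE]
  out[[new_name]] <- folded
  out
}

tab <- data.frame(id = c(1, 2), name.1 = c("A1", NA), name.2 = c("B1", "B2"))
folded <- fold_sketch(tab, "name", "folded_column")
str(folded)
```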
Formats and processes chromosome-related metadata from an input object by applying chromosome group lookups or splitting chromosome strings from the EGA enums.
format_chromosomes(metadata)
metadata |
List. A list of data frames representing metadata sheets,
containing |
A list of formatted chromosome data extracted or computed from the input metadata.
# Mock metadata data frame
metadata <- list(
  analyses = data.frame(
    chromosomes = I(list(
      NA,
      list("group1--1--chr1--name1", "group2--3--chr3--name3"),
      "group1--2--chr2--name2"
    )),
    chromosome_groups = c("group1", NA, "group3"),
    stringsAsFactors = FALSE
  ),
  select_input_data = list(
    chromosomes = c("group1--1--chr1--name1", "group1--2--chr2--name2")
  )
)
format_chromosomes(metadata)
Retrieve Chromosome Belonging to a Group
get_chr_group(group_id, chr_enum, sep = "--")
group_id |
Character. The group ID to filter by. |
chr_enum |
Character vector. Chromosome enumeration data, where each
element is a string containing fields separated by |
sep |
Character. Field separator in a string. Defaults to |
An integer vector of chromosome IDs corresponding to the specified group ID.
get_chr_group(
  "group1",
  c("group1--1--chr1--name1", "group2--2--chr2--name2")
)
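A lookup of this kind can be sketched by splitting each enumeration string on the separator. The field layout is an assumption inferred from the example strings (first field is the group, second field is the chromosome ID); the helper name `chr_ids_for_group` is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper; assumes field 1 = group ID and field 2 = chromosome ID.
chr_ids_for_group <- function(group_id, chr_enum, sep = "--") {
  fields <- strsplit(chr_enum, sep, fixed = TRUE)
  groups <- vapply(fields, `[`, character(1), 1)
  ids <- vapply(fields, `[`, character(1), 2)
  as.integer(ids[groups == group_id])
}

chr_ids <- chr_ids_for_group(
  "group1",
  c("group1--1--chr1--name1", "group2--2--chr2--name2")
)
print(chr_ids)
```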
Searches for entries across specified EGA metadata types that match a given title string. Returns a list of data frames for each type.
get_entry_by_title(title, type = NULL, client = NULL, logfile = NULL, ...)
title |
Character scalar. The title or substring to search for. |
type |
Character vector. One or more metadata types ("submissions", "studies", "samples", "experiments", "runs", "analyses" and "datasets"). If NULL, searches all valid types. |
client |
List of functions. EGA API client created by |
logfile |
Character. Path of log file to log the |
... |
List. Additional arguments to the function. |
A named list of data frames containing entries matching the title.
# Requires credentials
try(
  get_entry_by_title("My Study", type = "studies")
)
This function retrieves the values of a specified enum from an API by invoking the corresponding client function.
get_enum(client, enum_name, enum_prefix = "get__enums__")
client |
List. The API client, typically generated by |
enum_name |
Character. The name of the enum to retrieve. |
enum_prefix |
Character. Optional. The prefix used in the client for
enum functions. Defaults to |
The values of the specified enum.
# Create API client with a mock token URL
client <- create_client(extract_api(), token_url = "ABCD")

# Retrieve enum values from the API client (requires credentials to work)
try(
  platform_models <- get_enum(client, enum_name = "platform_models")
)
Retrieve a Formatter Function by Type of Submission Metadata Table
get_formatter(x, params)
x |
Character. The name of the submission metadata table/sheet. |
params |
List. A list containing a |
The formatter function corresponding to the specified table.
# Load formatter params
params <- yaml::read_yaml(system.file(
  "extdata/default_parser_params.yaml",
  package = "Rega"
))

# Dummy data, first row will be moved to column names
tab <- data.frame(
  x1 = c("file", "value1", "value2"),
  x2 = c("ega_inbox_relative_path", NA, "proj1")
)

ff <- get_formatter("files", params)
ff_params <- get_formatter_params("files", params)
ff(tab, ff_params)
Retrieve Formatter Parameters by Name
get_formatter_params(x, params)
x |
Character. The name of the formatter for which to retrieve parameters. |
params |
List. A list containing a |
A list of parameters for the specified formatter.
# Load formatter params
params <- yaml::read_yaml(system.file(
  "extdata/default_parser_params.yaml",
  package = "Rega"
))

# Dummy data, first row will be moved to column names
tab <- data.frame(
  x1 = c("file", "value1", "value2"),
  x2 = c("ega_inbox_relative_path", NA, "proj1")
)

ff <- get_formatter("files", params)
ff_params <- get_formatter_params("files", params)
ff(tab, ff_params)
Retrieve the Schema for an API Operation
get_operation_schema(op)
op |
List. The API operation definition containing a |
The schema for the operation's JSON request body, or NULL if
no schema is defined.
# Get operations from API
opdefs <- extract_operation_definitions(extract_api())

# Retrieve the schema for a specific operation
schema <- get_operation_schema(opdefs[["post__submissions"]])
This function retrieves existing data from an API or submits new data if it does not exist, with optional error handling and retrieval options.
If no data is present in the database, supplied data will be inserted.
If data is already present in the database and the number of records does not match, an error is raised.
If the number of records matches and retrieve is set to TRUE,
data is retrieved from the database and nothing is inserted. If
retrieve is set to FALSE, an error is raised.
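The branching rules above can be summarised as a small decision function. This is a sketch of the logic only, with no API calls: the helper name `decide_action`, its arguments and the error messages are hypothetical and do not come from the package.

```r
# Hypothetical helper: decide what to do given the record counts.
decide_action <- function(n_existing, n_new, retrieve = FALSE) {
  # Nothing in the database yet: insert the supplied data
  if (n_existing == 0) return("post")
  # Data exists but the record counts differ: refuse to continue
  if (n_existing != n_new) stop("Record counts differ between database and input.")
  # Counts match: retrieve only when explicitly allowed
  if (retrieve) return("get")
  stop("Data already exists and retrieve = FALSE.")
}

action_empty <- decide_action(n_existing = 0, n_new = 3)
action_match <- decide_action(n_existing = 3, n_new = 3, retrieve = TRUE)
print(c(action_empty, action_match))
```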
get_or_post(
  submission_id,
  data,
  client,
  endpoint,
  retrieve = FALSE,
  id_type = "provisional"
)
submission_id |
An integer representing the submission provisional ID. |
data |
A data frame to be submitted. |
client |
An API client object with |
endpoint |
A string specifying the EGA API endpoint. The endpoint will be a submission type endpoint identified with provisional ID. |
retrieve |
A logical flag indicating whether to retrieve data
if it already exists. Defaults to |
id_type |
A string specifying type of EGA id. One of 'provisional' or
'accession'. Defaults to |
A data frame containing the response from the API.
# Create mock client for API endpoint
mock_client <- list(
  get__submissions__provisional_id__endpoint = function(id) {
    message("Mock GET request")
    # Simulate an empty response (no existing data)
    return(NULL)
  },
  post__submissions__provisional_id__endpoint = function(id, body) {
    message("Mock POST request")
    message(body)
    # Simulate returning submitted data
  }
)

# Create mock data to test the function
test_data <- data.frame(id = 1:3, value = c("A", "B", "C"))

# Test the function with mock data and client
result <- get_or_post(
  submission_id = 12345,
  data = test_data,
  client = mock_client,
  endpoint = "endpoint",
  retrieve = FALSE
)
This function extracts property names from a schema, optionally filters out ID fields, and applies formatting such as marking required fields and prettifying the labels.
get_properties(schema, filter_ids = TRUE)
schema |
List. The schema containing |
filter_ids |
Logical. Whether to filter out ID fields from the
properties. Defaults to |
A character vector of formatted property names and indications of required fields.
schemas <- get_schemas(extract_api())

# Extract and format properties from a schema
get_properties(schemas[[5]])

# Extract properties without filtering ID fields
get_properties(schemas[[6]], filter_ids = FALSE)
This function extracts and returns schemas related to requests from the API specification.
get_schemas(api)
api |
List. The API specification, typically containing a |
A list of schemas whose names contain "Request", filtered from the
schemas element of the API specification.
# Extract request schemas from an API specification
request_schemas <- get_schemas(extract_api())
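The name-based filtering described above can be sketched in base R. This is only an illustration, not the package's implementation: `filter_request_schemas_sketch` and the schema names are hypothetical, and the function takes the list of schemas directly rather than a full API specification.

```r
# Hypothetical helper: keep only schemas whose names contain "Request"
filter_request_schemas_sketch <- function(schemas) {
  schemas[grepl("Request", names(schemas), fixed = TRUE)]
}

# Made-up schema names for illustration
schemas <- list(
  StudyRequest = list(type = "object"),
  StudyResponse = list(type = "object"),
  SampleRequest = list(type = "object")
)
names(filter_request_schemas_sketch(schemas))
```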
Count how many sentences are in a character string, based on the terminal
punctuation marks ., !, or ? following an alphanumeric character. Text that
does not end with a punctuation mark is still counted as one more sentence.
get_sentence_number(text)
text |
A character vector. |
An integer scalar giving the number of sentences in text.
get_sentence_number("First sentence. Second sentence? Third!")
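The counting rule described for this function can be sketched in base R. This is a minimal illustration of the stated rule for a single string, not the package's implementation; `count_sentences_sketch` is a hypothetical name.

```r
# Hypothetical helper illustrating the described rule for one string
count_sentences_sketch <- function(text) {
  stopifnot(is.character(text), length(text) == 1L)
  # Positions where an alphanumeric character is followed by ., ! or ?
  hits <- gregexpr("[[:alnum:]][.!?]", text)[[1]]
  n <- if (hits[1] == -1L) 0L else length(hits)
  # Trailing text without terminal punctuation counts as one more sentence
  trimmed <- trimws(text)
  if (nzchar(trimmed) && !grepl("[.!?]$", trimmed)) n <- n + 1L
  n
}

count_sentences_sketch("First sentence. Second sentence? Third!")
count_sentences_sketch("One. Two")
```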
Retrieves data associated with a submission ID using the client and logs the responses if a logfile is specified.
get_submission(id, client = NULL, logfile = NULL, ...)
id |
A string representing the submission identifier. Can be either an accession or provisional ID. |
client |
List of functions. EGA API client created by |
logfile |
A string specifying the path to a log file. If |
... |
Additional arguments for future extensions (currently unused). |
A list of responses including submission data and associated datasets, analyses, runs, experiments, samples, and studies.
mock_client <- list(
  "get__submissions__accession_id" = function(id) list(data = id),
  "get__submissions__accession_id__datasets" = function(id) list(datasets = id)
)
get_submission("EGAB12345678901", mock_client)
Compute the number of words in each element of a character vector using non-word separators.
get_word_number(text)
text |
A character vector with text. |
An integer vector giving the number of words per element of
text.
get_word_number(c("one two", "three four five"))
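Splitting on non-word separators can be illustrated with base R. The sketch below assumes that "non-word separators" means runs of characters matching the regular expression `\W`; `count_words_sketch` is a hypothetical name and not part of the package.

```r
# Hypothetical helper: count words per element of a character vector
count_words_sketch <- function(text) {
  # Split on runs of non-word characters, then ignore empty pieces
  pieces <- strsplit(text, "\\W+")
  vapply(pieces, function(p) sum(nzchar(p)), integer(1))
}

count_words_sketch(c("one two", "three four five"))
```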
Determines whether a specified sheet is present and contains at least one non-NA value in the provided metadata.
has_linked_sheets(metadata, colname)
metadata |
A list of data frame objects to check. |
colname |
A string specifying the name of the column to look for. |
A logical vector indicating whether each element of metadata
contains the specified column with at least one non-NA value.
metadata <- list(
  sheet1 = list(sheet_name = c(1, NA)),
  sheet2 = list(other_name = NA)
)
has_linked_sheets(metadata, "sheet_name")
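The check described above (column present and holding at least one non-NA value) can be expressed in a few lines of base R. This is a sketch under that reading of the description; `has_linked_sheets_sketch` is a hypothetical name.

```r
# Hypothetical helper: one logical per sheet
has_linked_sheets_sketch <- function(metadata, colname) {
  vapply(metadata, function(sheet) {
    # Column must exist and contain at least one non-NA value
    colname %in% names(sheet) && any(!is.na(sheet[[colname]]))
  }, logical(1))
}

sheets <- list(
  sheet1 = list(sheet_name = c(1, NA)),
  sheet2 = list(other_name = NA)
)
has_linked_sheets_sketch(sheets, "sheet_name")
```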
Verifies whether the input string matches the format of a valid accession identifier based on a specified schema.
is_accession(x, schema = NULL)
x |
A character vector to be tested for validity as accessions. |
schema |
A character string specifying the schema. Valid options include
"study", "studies", "sample", "samples", "experiment", "experiments",
"analysis", "analyses", "run", "runs", "policy", "DAC", "dataset",
"datasets", "submission" and |
A logical vector indicating which values are accession IDs.
is_accession("EGAB00000000001", "submission") # TRUE
is_accession("EGA12345678901", "sample") # FALSE
Determine if input values match the format of provisional IDs, either as whole-number numerics or as character strings of at least two digits without leading zeros.
is_provisional(x)
x |
A numeric or character vector of candidate provisional IDs. |
A logical vector indicating which values are provisional IDs.
is_provisional(c(10, 11, 3.5, 9))
Checks whether a given HTTP method is valid based on a predefined list of accepted methods (matches on lowercase).
is_valid_http_method(m)
m |
A string representing the HTTP method to validate. |
A logical value: TRUE if m is a valid HTTP method,
otherwise FALSE.
is_valid_http_method("GET") # TRUE
is_valid_http_method("get") # TRUE
is_valid_http_method("DELETE") # TRUE
is_valid_http_method("foo") # FALSE
is_valid_http_method(NULL) # FALSE
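A case-insensitive membership test of this kind might look as follows. The set of accepted methods here is an assumption (the package's predefined list is not shown in this manual), and `is_valid_http_method_sketch` is a hypothetical name.

```r
# Hypothetical helper; the 'accepted' vector is an assumed list of methods
is_valid_http_method_sketch <- function(
    m,
    accepted = c("get", "post", "put", "delete", "patch", "head", "options")) {
  # Non-character, NULL, NA, or multi-element input is treated as invalid
  is.character(m) && length(m) == 1L && !is.na(m) && tolower(m) %in% accepted
}

is_valid_http_method_sketch("GET")
is_valid_http_method_sketch("foo")
is_valid_http_method_sketch(NULL)
```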
Convert Prettified Labels to API Names
label_to_api_name(x, req_str = "* ")
x |
Character vector. Prettified labels to convert. |
req_str |
Character. Optional prefix to remove from labels. Defaults to
|
A character vector with labels converted to API-style names.
label_to_api_name(c("* First Name", "Last Name"))
label_to_api_name(c("# Instrument Model", "# Fragment SD"), req_str = "# ")
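One plausible conversion is sketched below in base R. It assumes that API-style names are lower-case with underscores instead of spaces, which this manual does not state explicitly; `label_to_api_name_sketch` is a hypothetical name and the package may apply additional rules.

```r
# Hypothetical helper; assumes snake_case API names and non-NA labels
label_to_api_name_sketch <- function(x, req_str = "* ") {
  # Strip the required-field prefix only where it starts the label
  has_prefix <- startsWith(x, req_str)
  x[has_prefix] <- substring(x[has_prefix], nchar(req_str) + 1L)
  # Lower-case and replace runs of whitespace with underscores
  gsub("\\s+", "_", tolower(trimws(x)))
}

label_to_api_name_sketch(c("* First Name", "Last Name"))
```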
For every data frame in metadata that has a column named after sheet_name,
the ID references in that column (which point to the first column of
sheet_name) are replaced with the remaining values of the referenced row,
nested as a list.
link_sheet(metadata, sheet_name)
metadata |
List. A list of data frames representing metadata sheets. |
sheet_name |
Character. The name of the sheet to link with other sheets. |
A list of updated metadata with the specified values replaced based
on referenced values present in sheet_name.
# Link data from a specific sheet to other sheets in metadata
metadata <- list(
  sheet1 = data.frame(id = c(1, 2), linked_sheet = c("A", "B")),
  linked_sheet = data.frame(id = c("A", "B"), value = c(10, 20))
)
updated_metadata <- link_sheet(metadata, "linked_sheet")
This function adds a new column to a data frame by mapping values from an existing column through a lookup table.
lut_add(df, to, from, lut)
df |
A data frame to which the new column will be added. |
to |
A string specifying the name of the new column. |
from |
A string specifying the name of the column to map values from. |
lut |
A named list or vector serving as the lookup table. |
The input data frame with the added column.
df <- data.frame(id = c("A", "B", "C"), stringsAsFactors = FALSE)
lut <- list(A = 1, B = 2, C = 3)
lut_add(df, "value", "id", lut)
Merges a target column with a source column in a linked sheet's data, processing it into a format suitable for JSON parsing. Includes API-specific adjustments for certain data.
merge_linked_sheet(target, source, dat, sheet)
target |
A vector containing the target values. |
source |
A string specifying the source column to merge on. |
dat |
A data frame representing the data to be linked. |
sheet |
A string specifying the name of the sheet, used for API-specific processing. |
A data frame containing the merged data, or an empty list if the
target is entirely NA.
target <- c(1, 2, 3)
source <- "id"
dat <- data.frame(id = c(1, 2, 3), value = c("A", "B", "C"))
merge_linked_sheet(target, source, dat, "collaborators")
Adds multiple columns to a data frame by applying multiple lookup tables, each defined by a set of arguments specifying the new column, the source column, and the lookup table.
multi_lut_add(df, ...)
df |
A data frame to which new columns will be added. |
... |
A series of lists, each containing three elements: the name of
the new column ( |
The input data frame with the added columns.
df <- data.frame(id = c("A", "B", "C"), stringsAsFactors = FALSE)
lut1 <- list(A = 1, B = 2, C = 3)
lut2 <- list(A = "x", B = "y", C = "z")
multi_lut_add(df, list("value1", "id", lut1), list("value2", "id", lut2))
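The idea of applying one lookup table per new column can be sketched in base R by folding over the argument lists. Both helper names are hypothetical, and the sketch assumes each lookup table maps every key to a single value; it is not the package's implementation.

```r
# Hypothetical helper: add one column by looking up df[[from]] in lut
lut_add_sketch <- function(df, to, from, lut) {
  keys <- as.character(df[[from]])
  # Keys missing from the lookup table become NA
  df[[to]] <- unname(unlist(lut)[keys])
  df
}

# Hypothetical helper: apply several (to, from, lut) triples in order
multi_lut_add_sketch <- function(df, ...) {
  Reduce(
    function(acc, spec) lut_add_sketch(acc, spec[[1]], spec[[2]], spec[[3]]),
    list(...),
    init = df
  )
}

df <- data.frame(id = c("A", "B", "C"), stringsAsFactors = FALSE)
multi_lut_add_sketch(
  df,
  list("value1", "id", list(A = 1, B = 2, C = 3)),
  list("value2", "id", list(A = "x", B = "y", C = "z"))
)
```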
Replaces NA values in a list with empty lists, preserving the original
structure of the list. Doesn't work on nested lists.
na_to_empty_list(l)
l |
A list containing elements that may include |
A list where any NA values have been replaced with empty
lists.
input_list <- list(1, NA, "text", NA)
na_to_empty_list(input_list)
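A non-recursive replacement of this kind can be written with `lapply()`. The sketch below assumes that only top-level, length-one NA elements are replaced; `na_to_empty_list_sketch` is a hypothetical name.

```r
# Hypothetical helper: replace top-level NA elements with empty lists
na_to_empty_list_sketch <- function(l) {
  lapply(l, function(el) {
    # Only a length-one atomic NA is replaced; nested lists are left untouched
    if (is.atomic(el) && length(el) == 1L && is.na(el)) list() else el
  })
}

str(na_to_empty_list_sketch(list(1, NA, "text", NA)))
```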
This function creates a new submission and associates all specified data with it. The following data must be present in the request data object: submission, studies, experiments, samples, runs, analyses, datasets. The files associated with the submission must be present in the EGA Inbox; they are fetched and matched according to their Inbox path. If the submission is interrupted or fails, all information entered into the EGA database is rolled back, apart from the submission itself. If the workflow successfully creates a submission but fails in a later step, the returned submission ID can be passed to the workflow to continue entering data into the existing submission. If logfile is specified, the responses from successfully executed steps are saved, even if an error occurs.
new_submission(dat, client = NULL, logfile = NULL, submission_id = NULL, retrieve = FALSE, ...)
dat |
List of data frames. Parsed submission metadata containing correctly formatted and linked information for submission |
client |
List of functions. EGA API client created by |
logfile |
Character. Path of log file to log the |
submission_id |
Integer. |
retrieve |
Logical. |
... |
List. Additional arguments to the function. |
List of data frames. Parsed response objects from httr2 requests.
minimal_metadata <- list(
  aliases = list(
    studies = "Study1",
    experiments = "Experiment1",
    datasets = "Dataset1",
    samples = "Sample1",
    runs = "Run1",
    analyses = "Analysis1"
  ),
  files = tibble::tibble(
    file = "raw.fastq.gz",
    ega_file = list("raw.fastq.gz.c4gh")
  ),
  submission = tibble::tibble(title = "Submission"),
  studies = tibble::tibble(
    study = "Study1",
    title = "Study Title",
    description = "Study Description",
    study_type = "Whole Genome Sequencing"
  ),
  samples = tibble::tibble(
    alias = "Sample1",
    phenotype = "wild-type",
    biological_sex = "female",
    subject_id = "ID1"
  ),
  experiments = tibble::tibble(
    study = "Study1",
    experiment = "Experiment1",
    design_description = "Experiment Design",
    library_selection = "RANDOM",
    instrument_model_id = 1L,
    library_layout = "SINGLE",
    library_strategy = "WGS",
    library_source = "GENOMIC"
  ),
  runs = tibble::tibble(
    run = "Run1",
    experiment = "Experiment1",
    run_file_type = "srf",
    alias = "Sample1",
    files = list("raw.fastq.gz.c4gh")
  ),
  datasets = tibble::tibble(
    dataset = "Dataset1",
    title = "Dataset Title",
    description = "Dataset Description",
    policy_accession_id = "EGAP00000000001",
    dataset_types = list("Whole genome sequencing"),
    runs = list("Run1")
  )
)
ega <- create_client(extract_api(), verbosity = 0)
# Requires credentials
try(
  new_submission(minimal_metadata, ega)
)
Parses the body of an httr2 response object from the EGA API,
handling JSON and plain text content, and formats it into a tibble for
further processing.
parse_ega_body(resp)
resp |
An HTTP response object from the EGA API. |
A tibble containing the parsed and formatted response data. If the response is plain text without a JSON-like structure, a one-column tibble is returned with the raw content.
# Example with JSON response
json_resp <- httr2::response(
  method = "GET",
  url = "https://www.example.com/api/files",
  status = 200,
  headers = list("content-type" = "application/json"),
  body = charToRaw('[{"id": 1, "name": "test"}]')
)
parse_ega_body(json_resp)
# Example with plain text response
text_resp <- httr2::response(
  method = "POST",
  url = "https://www.example.com/api/submissions",
  status = 200,
  headers = list("content-type" = "text/plain"),
  body = charToRaw("Sample response text")
)
parse_ega_body(text_resp)
This function parses an enum, represented as a data frame or character vector, into a formatted string for display or further use.
parse_enum(enum, sep = "--")
enum |
Data frame or character vector. The enum to parse. If a data frame, its rows are concatenated into strings. If a character vector, its elements are joined with newlines. |
sep |
Character. If enum has multiple fields, they will be pasted into a
single string using this separator. Defaults to |
A single string representing the parsed enum. Rows are joined by
newlines and multiple enum fields are joined by sep.
# Parse an enum as a data frame
df_enum <- data.frame(key = c("A", "B"), value = c("1", "2"))
parse_enum(df_enum)
# Parse an enum as a character vector
vec_enum <- c("A", "B", "C")
parse_enum(vec_enum)
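The joining rule stated above (fields joined by sep, rows joined by newlines) can be sketched in base R as follows. `parse_enum_sketch` is a hypothetical name, and the package may handle edge cases differently.

```r
# Hypothetical helper following the described joining rule
parse_enum_sketch <- function(enum, sep = "--") {
  if (is.data.frame(enum)) {
    # Paste the fields of each row together using sep
    enum <- do.call(paste, c(unname(as.list(enum)), sep = sep))
  }
  # Join rows (or vector elements) with newlines into one string
  paste(enum, collapse = "\n")
}

cat(parse_enum_sketch(data.frame(key = c("A", "B"), value = c("1", "2"))), "\n")
cat(parse_enum_sketch(c("A", "B", "C")), "\n")
```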
Extracts the JSON body from a response and ensures the output is structured as a list of objects. Named lists (single records) are wrapped in a parent list to maintain consistency for downstream unnesting.
parse_json_body(resp)
resp |
An |
A list of lists, where each inner list represents a record.
json_resp <- httr2::response(
  method = "GET",
  url = "https://www.example.com/api/files",
  status = 200,
  headers = list("content-type" = "application/json"),
  body = charToRaw('[{"id": 1, "name": "test"}]')
)
parse_json_body(json_resp)
Processes a text response by either parsing it as JSON (if structured with curly braces or square brackets) or returning it as a list. Null JSON elements are converted to empty lists to facilitate unnesting.
parse_text_body(resp)
resp |
An |
A list of parsed data or a tibble if the content is raw text.
text_resp <- httr2::response(
  method = "POST",
  url = "https://www.example.com/api/submissions",
  status = 200,
  headers = list("content-type" = "text/plain"),
  body = charToRaw("Sample response text")
)
parse_text_body(text_resp)
This function processes chromosome data by extracting unique chromosome IDs and labels or retrieving chromosome group information from a lookup where applicable.
process_chromosomes(chr_data, select_input_data)
chr_data |
A list containing chromosome-related information. Expected to
have items |
select_input_data |
A list containing look-up data for |
A data frame with chromosome id and label if chromosomes are
present. If only chromosome groups exist, returns the result of lookup
against the select_input_data with get_chr_group(). If neither are
present, returns an empty list.
select_input_data <- list(
  chromosomes = c("group1--1--chr1--name1", "group1--2--chr2--name2")
)
chr_data_1 <- list(
  chromosomes = list("group1--1--chr1--name1", "group2--3--chr3--name3"),
  chromosome_groups = NA_character_
)
process_chromosomes(chr_data_1, select_input_data)
chr_data_2 <- list(
  chromosomes = NA,
  chromosome_groups = "group1"
)
process_chromosomes(chr_data_2, select_input_data)
The specified column name is searched for across all the data frames. If the
column is pubmed_ids, values are converted to integers.
process_delimited_column(metadata, column_name, separator, converters = DELIM_CONVERTERS)
metadata |
List. A list of data frames representing metadata sheets. |
column_name |
Character. The name of the column to process. |
separator |
Character. The delimiter used to split column values. |
converters |
Named list of functions. Specifies how to coerce columns
with delimited strings into specific types or values. Defaults to
|
A list of updated metadata with the specified column split into lists based on the delimiter and trimmed.
metadata <- list(
  sheet1 = data.frame(pubmed_ids = c("123; 456", "130; 789; 102", NA))
)
process_delimited_column(metadata, "pubmed_ids", ";")
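The split, trim, and convert steps for a single column can be sketched in base R. `split_delimited_sketch` is a hypothetical name, the converter is passed directly instead of through a named list such as `DELIM_CONVERTERS`, and keeping NA entries as NA is an assumption.

```r
# Hypothetical helper: split each value, trim whitespace, then convert
split_delimited_sketch <- function(values, separator, convert = identity) {
  lapply(strsplit(values, separator, fixed = TRUE), function(parts) {
    # Assumption: missing cells stay NA instead of being converted
    if (length(parts) == 1L && is.na(parts)) return(NA)
    convert(trimws(parts))
  })
}

split_delimited_sketch(c("123; 456", "130; 789; 102", NA), ";", as.integer)
```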
Rolls back specified endpoints for a submission identified by its accession ID using the client and logs the responses if a logfile is specified.
rollback_submission(id, endpoints, client = NULL, logfile = NULL, ...)
id |
A string representing the submission identifier. Must be an accession ID. |
endpoints |
A character vector of endpoint names to roll back. |
client |
List of functions. EGA API client created by |
logfile |
A string specifying the path to a log file. If |
... |
Additional arguments for future extensions (currently unused). |
A list of responses from the rollback operations for each endpoint.
mock_client <- list(
  "put__submissions__accession_id__datasets__rollback" = function(id) list(status = "rolled back")
)
rollback_submission("EGAB00000000001", list("datasets"), mock_client)
Format a Row Table
row_table_formatter(tab, params)
tab |
Data frame. The input table from a submission metadata file |
params |
List. Additional parameters for formatting. Takes a formatter params value from parser parameter yaml file. |
A cleaned and formatted tibble with correctly organized rows and columns, whitespace trimmed, and folding applied to specified columns.
# Formatter parameters
params <- list(fold = "extra_attributes")
# Sample data frame
df <- data.frame(
  ...1 = c("* Study", "* Title", "Extra Attributes", "Extra Attributes"),
  ...2 = c("Study1", "Title1", "A", "B"),
  ...3 = c("* Study", "* Title", "Extra Attributes", NA),
  ...4 = c("Study2", "Title2", "C", NA)
)
row_table_formatter(df, params)
Validates uniqueness of sample aliases by comparing input against existing records in the EGA database. Throws an error if duplicates are found and retrieval is not enabled.
samples_in_db(samples, client = NULL, retrieve = FALSE)
samples |
Character vector of sample aliases to check. |
client |
An EGA API client object. If NULL, one is created. |
retrieve |
Logical scalar. If TRUE, exits without error even if samples are found in the database. |
Logical TRUE if validation passes.
my_client <- list(
  get__samples = function(prefix = NULL) {
    data.frame(alias = c("unique_sample1", "unique_sample_2"))
  }
)
samples_in_db(c("sample1", "sample2"), client = my_client, retrieve = FALSE)
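The uniqueness check itself can be illustrated without a client. In the sketch below the aliases already in the database are passed in as a plain character vector, which is an assumption made for illustration; `check_aliases_sketch` and the error text are hypothetical.

```r
# Hypothetical helper: compare new aliases against existing ones
check_aliases_sketch <- function(samples, existing, retrieve = FALSE) {
  duplicated_aliases <- intersect(samples, existing)
  # Duplicates are an error unless retrieval of existing records is enabled
  if (length(duplicated_aliases) > 0L && !retrieve) {
    stop(
      "Sample aliases already in the database: ",
      paste(duplicated_aliases, collapse = ", "),
      call. = FALSE
    )
  }
  TRUE
}

check_aliases_sketch(c("sample1", "sample2"), c("unique_sample1", "unique_sample_2"))
```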
This function saves a list of API responses to a specified log file in YAML format.
save_log(responses, logfile)
responses |
A list of responses to be saved. |
logfile |
A string specifying the path to the log file. If |
Invisibly returns NULL.
responses <- list(status = "success", data = list(a = 1, b = "text"))
save_log(responses, logfile = NULL)
Creates a closure function to display sequential progress messages for a specified number of steps.
step_msg(steps)
steps |
An integer specifying the total number of steps. |
A function that takes a message string as input and displays it along with the current step and total steps. The step count increments automatically with each call.
stepper <- step_msg(3)
stepper("Initializing") # "Step 1/3 - Initializing"
stepper("Processing") # "Step 2/3 - Processing"
stepper("Finalizing") # "Step 3/3 - Finalizing"
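A closure that keeps its own step counter can be sketched as below. The message format follows the comments in the example above; `step_msg_sketch` is a hypothetical name, and returning the text invisibly is an assumption added so the result can be inspected.

```r
# Hypothetical helper: returns a function with a private step counter
step_msg_sketch <- function(steps) {
  current <- 0L
  function(msg) {
    # Update the counter stored in the enclosing environment
    current <<- current + 1L
    text <- sprintf("Step %d/%d - %s", current, steps, msg)
    message(text)
    invisible(text)
  }
}

stepper <- step_msg_sketch(3)
stepper("Initializing")
stepper("Processing")
```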
This function iterates over rows of a data frame, submitting each row to a specified API endpoint function, and combines the responses into a single data structure.
submit_table(tab, id, endpoint_func)
tab |
A data frame containing the data to be submitted. |
id |
An EGA accession/provisional ID passed to the |
endpoint_func |
A function that handles the API request. It should
accept |
Data frame. A combined response object from the API.
tab <- data.frame(a = 1:2, b = c("x", "y"))
mock_endpoint <- function(id, body) list(id = id, body = body)
submit_table(tab, 12345, mock_endpoint)
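Row-wise submission followed by combining the responses can be sketched in base R. This illustration assumes the endpoint function returns a one-row data frame per call and passes each row as a plain one-row data frame (the package may convert it to JSON first); the helper names are hypothetical.

```r
# Hypothetical helper: call the endpoint once per row and stack the results
submit_rows_sketch <- function(tab, id, endpoint_func) {
  responses <- lapply(seq_len(nrow(tab)), function(i) {
    endpoint_func(id, tab[i, , drop = FALSE])
  })
  do.call(rbind, responses)
}

# Mock endpoint that echoes the submitted values back as one row
mock_endpoint_sketch <- function(id, body) {
  data.frame(id = id, a = body$a, b = body$b)
}

tab <- data.frame(a = 1:2, b = c("x", "y"))
submit_rows_sketch(tab, 12345, mock_endpoint_sketch)
```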
Wraps a logic function in a tryCatch block to handle errors during a specific submission step, optionally triggering a rollback function.
try_step(step_name, logic_fn, rollback_fn, responses, logfile)
step_name |
Character. The name of the current workflow step. |
logic_fn |
Function. The primary logic to execute for this step. |
rollback_fn |
Function. A function to clean up if an error occurs. |
responses |
List. Current collection of API responses for logging. |
logfile |
Character. Path to the log file. |
The result of logic_fn().
try_step(
  "test",
  function() 1 + 1,
  function() print("fail"),
  list(),
  "log.txt"
)
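The pattern of wrapping a step in `tryCatch()` and running a rollback on failure can be sketched as follows. Logging of responses is left out, and `try_step_sketch` and the error text are hypothetical, so this shows only the control flow.

```r
# Hypothetical helper: run a step, roll back and re-raise on error
try_step_sketch <- function(step_name, logic_fn, rollback_fn) {
  tryCatch(
    logic_fn(),
    error = function(e) {
      # Clean up first, then stop with a message naming the failed step
      rollback_fn()
      stop(
        sprintf("Step '%s' failed: %s", step_name, conditionMessage(e)),
        call. = FALSE
      )
    }
  )
}

try_step_sketch("test", function() 1 + 1, function() message("rolling back"))
```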
Converts a list into a single-row data frame with unboxed elements if all elements have a length of 1. Otherwise, an error is raised.
unbox_list(l)
l |
A list where all elements must have a length of 1. |
A data frame with unboxed elements, suitable for JSON conversion.
input_list <- list(a = 1, b = "text", c = TRUE)
unbox_list(input_list)
This function converts a single row of a data frame into an unboxed JSON object, effectively removing the array structure.
unbox_row(row)
row |
A single row of a data frame. |
A JSON object with unboxed values for the input row.
row <- data.frame(a = 1, b = "text", stringsAsFactors = FALSE)[1, ]
unbox_row(row)
Handles retrieval or deletion of data associated with a submission accession/provisional ID using a specified client and method.
use_submission(id, method, client = NULL)
id |
Character or numeric. Represents the submission identifier. Can be either an accession or provisional ID. |
method |
A string specifying the operation to perform. Valid options are "get" or "delete". |
client |
List of functions. EGA API client created by |
A named list containing responses for datasets, analyses, runs, experiments, samples, and studies.
mock_client <- list(
  "get__submissions__accession_id__datasets" = function(id) {
    list(data = id)
  },
  "delete__submissions__provisional_id__datasets" = function(id) list(status = "deleted")
)
use_submission("EGAB12345678901", "get", mock_client)
Validates a payload against a schema with special handling of oneOf
directives: if validation fails, the overall validation result is displayed
first, and the payload is then tested separately against each oneOf subschema.
validate_schema(payload, schema)
payload |
The payload to validate against the schema. JSON string or
single row of data frame converted to JSON representation with |
schema |
List. The JSON schema defining the validation rules. |
Logical value indicating whether the payload is valid. If invalid,
the result includes an errors attribute detailing the validation errors.
schema <- list(
  type = "object",
  properties = list(
    id = list(type = "integer"),
    title = list(type = "string")
  ),
  required = c("id")
)
payload_true <- data.frame(id = c(12345), title = c("abcd"))
payload_false <- data.frame(id = c("12345"), title = c(0.355))
validate_schema(jsonlite::unbox(payload_true), schema)
validate_schema(jsonlite::unbox(payload_false), schema)
Convert Validation Results to a Message
validation_to_msg(v)
v |
Logical. The validation result, which may include an |
A character string summarizing the validation results. If validation errors are present, they are included in the message; otherwise, a success message is returned.
validation_result <- FALSE
attr(validation_result, "errors") <- data.frame(
  field = c("name"),
  message = c("Missing")
)
msg <- validation_to_msg(validation_result)
message(msg)
Creates a custom error handler for managing errors during a workflow step. Logs responses, executes additional expressions, and stops execution with a detailed message and a stack trace.
workflow_error_handler(step, responses, logfile, ...)
step |
A string representing the current workflow step. |
responses |
A list of responses to be logged in case of an error. |
logfile |
A string specifying the path to the log file. If |
... |
Additional expressions to evaluate when an error occurs. |
A function to handle errors during the specified workflow step.
handler <- workflow_error_handler(
  step = "submission",
  responses = list(),
  logfile = NULL
)
tryCatch("Example code without error", error = handler)