Title: | GenoMetric Query Language for R/Bioconductor |
---|---|
Description: | This package brings the GenoMetric Query Language (GMQL) functionalities into the R environment. GMQL is a high-level, declarative language for managing heterogeneous genomic datasets for biomedical purposes, using simple queries to process genomic regions and their metadata and properties. GMQL adopts algorithms designed for big data and cloud-computing technologies (like Apache Hadoop and Spark), allowing GMQL to run on modern infrastructures and achieve scalability and high performance. It allows creating, manipulating and extracting genomic data from different data sources, both locally and remotely. Our RGMQL functions allow complex queries and processing, leveraging the R idiomatic paradigm. The RGMQL package also provides a rich set of ancillary classes that allow sophisticated input/output management and sorting, such as: ASC, DESC, BAG, MIN, MAX, SUM, AVG, MEDIAN, STD, Q1, Q2, Q3 (and many others). Note that many RGMQL functions are not directly executed in the R environment, but are deferred until actual execution is issued. |
Authors: | Simone Pallotta [aut, cre], Marco Masseroli [aut] |
Maintainer: | Simone Pallotta <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.27.1 |
Built: | 2024-12-21 03:30:22 UTC |
Source: | https://github.com/bioc/RGMQL |
Wrapper to GMQL MERGE operator
It builds a dataset consisting of a single sample having as many regions as the number of regions of all the input dataset samples and as many metadata as the union of the 'attribute-value' tuples of the input samples. If groupBy is specified, the samples are then partitioned into groups, each with a distinct value of the grouping metadata attributes. The operation is separately applied to each group, yielding one sample in the result for each group. Samples whose metadata are not present in the grouping metadata parameter are disregarded.
## S4 method for signature 'GMQLDataset'
aggregate(x, groupBy = conds())
x |
GMQLDataset class object |
groupBy |
|
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "exp" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
exp = read_gmql(test_path)

## This statement creates a dataset called merged which contains one
## sample for each antibody_target and cell value found within the metadata
## of the exp dataset sample; each created sample contains all regions
## from all 'exp' samples with a specific value for their
## antibody_target and cell metadata attributes.

merged = aggregate(exp, conds(c("antibody_target", "cell")))
This class constructor is used to create instances of AGGREGATES object, to be used in GMQL functions that require aggregate on value.
SUM(value)

COUNT()

COUNTSAMP()

MIN(value)

MAX(value)

AVG(value)

MEDIAN(value)

STD(value)

BAG(value)

BAGD(value)

Q1(value)

Q2(value)

Q3(value)
value |
string identifying name of metadata or region attribute |
SUM: It prepares input parameter to be passed to the library function sum, performing all the type conversions needed
COUNT: It prepares input parameter to be passed to the library function count, performing all the type conversions needed
COUNTSAMP: It prepares input parameter to be passed to the library function countsamp, performing all the type conversions needed. It is used only with group_by functions
MIN: It prepares input parameter to be passed to the library function minimum, performing all the type conversions needed
MAX: It prepares input parameter to be passed to the library function maximum, performing all the type conversions needed
AVG: It prepares input parameter to be passed to the library function mean, performing all the type conversions needed
MEDIAN: It prepares input parameter to be passed to the library function median, performing all the type conversions needed
STD: It prepares input parameter to be passed to the library function computing the standard deviation, performing all the type conversions needed
BAG: It prepares input parameter to be passed to the library function bag; this function creates comma-separated strings of attribute values, performing all the type conversions needed
BAGD: It prepares input parameter to be passed to the library function bagd; this function creates comma-separated strings of distinct attribute values, performing all the type conversions needed
Q1: It prepares input parameter to be passed to the library function computing the first quartile, performing all the type conversions needed

Q2: It prepares input parameter to be passed to the library function computing the second quartile, performing all the type conversions needed

Q3: It prepares input parameter to be passed to the library function computing the third quartile, performing all the type conversions needed
Aggregate object
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "exp" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
exp = read_gmql(test_path)

## This statement copies all samples of exp dataset into res dataset, and
## then calculates new metadata attribute sum_score for each of them:
## sum_score is the sum of score values of the sample regions.

res = extend(exp, sum_score = SUM("score"))

## This statement copies all samples of exp dataset into res dataset,
## and then calculates new metadata attribute min_pvalue for each of them:
## min_pvalue is the minimum pvalue of the sample regions.

res = extend(exp, min_pvalue = MIN("pvalue"))

## This statement copies all samples of exp dataset into res dataset,
## and then calculates new metadata attribute max_score for each of them:
## max_score is the maximum score of the sample regions.

res = extend(exp, max_score = MAX("score"))

## The following cover operation produces output regions where at least 2
## and at most 3 regions of exp dataset overlap, having as resulting region
## attribute the average signal of the overlapping regions;
## the result has one sample for each input cell value.

res = cover(exp, 2, 3, groupBy = conds("cell"), avg_signal = AVG("signal"))

## This statement copies all samples of 'exp' dataset into 'out' dataset,
## and then for each of them it adds another metadata attribute, allScore,
## which is the aggregation comma-separated list of all the values
## that the region attribute score takes in the sample.

out = extend(exp, allScore = BAG("score"))

## This statement counts the regions in each sample and stores their number
## as value of the new metadata RegionCount attribute of the sample.

out = extend(exp, RegionCount = COUNT())

## This statement copies all samples of exp dataset into res dataset,
## and then calculates new metadata attribute std_score for each of them:
## std_score is the standard deviation of the score values of the sample
## regions.

res = extend(exp, std_score = STD("score"))

## This statement copies all samples of exp dataset into res dataset,
## and then calculates new metadata attribute m_score for each of them:
## m_score is the median score of the sample regions.

res = extend(exp, m_score = MEDIAN("score"))
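By analogy with the BAG example above, the distinct-values variant BAGD can be sketched as follows (a sketch assuming the same 'exp' dataset read in the example above; 'distinctScore' is just an illustrative attribute name):

```r
## This statement adds to each sample a new metadata attribute,
## distinctScore (hypothetical name), which is the comma-separated list
## of the distinct values that the region attribute score takes
## in the sample.
out = extend(exp, distinctScore = BAGD("score"))
```

Whereas BAG may repeat a value once per region carrying it, BAGD reports each distinct value only once.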
Wrapper to GMQL ORDER operator
It is used to order either samples or sample regions or both, according to a set of metadata and/or region attributes. Order can be specified as ascending / descending for every attribute. The number of samples and their regions remain the same (unless fetching options are specified), as well as their attributes, but a new ordering metadata and/or region attribute is added. Sorted samples or regions have a new attribute "_order", added to their metadata, or "order" added to their regions, or to both of them as specified in input.
## S4 method for signature 'GMQLDataset'
arrange(
  .data,
  metadata_ordering = NULL,
  regions_ordering = NULL,
  fetch_opt = "",
  num_fetch = 0L,
  reg_fetch_opt = "",
  reg_num_fetch = 0L,
  .by_group = NULL
)
.data |
GMQLDataset class object |
metadata_ordering |
list of ordering functions containing the name of a metadata attribute. The functions available are: ASC, DESC |
regions_ordering |
list of ordering functions containing the name of a region attribute. The functions available are: ASC, DESC |
fetch_opt |
string indicating the option used to fetch the first k samples; it can assume the values:
if NULL, num_fetch is not considered |
num_fetch |
integer value identifying the number of samples to fetch; by default it is 0, which means that all samples are fetched |
reg_fetch_opt |
string indicating the option used to fetch the first k regions; it can assume the values:
if NULL, reg_num_fetch is not considered |
reg_num_fetch |
integer value identifying the number of regions to fetch; by default it is 0, which means that all regions are fetched |
.by_group |
parameters inherited, unused with GMQLDataset data object |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "data" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data = read_gmql(test_path)

## The following statement orders the samples according to the Region_Count
## metadata attribute and takes the two samples that have the highest count.

o = arrange(data, list(ASC("Region_Count")), fetch_opt = "mtop",
    num_fetch = 2)
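The example above orders samples by a metadata attribute; regions within each sample can be ordered analogously via regions_ordering and the region fetch options. A minimal sketch, assuming the same example dataset, that its regions carry a "score" attribute, and the "rtop" value of reg_fetch_opt for taking the first k regions:

```r
## The following statement orders the regions of each sample by decreasing
## score and keeps only the first three regions of each sorted sample.
o2 = arrange(data, regions_ordering = list(DESC("score")),
    reg_fetch_opt = "rtop", reg_num_fetch = 3L)
```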
Wrapper to GMQL MATERIALIZE operator
It saves the content of a dataset that contains sample metadata and regions. It is normally used to persist the content of any dataset generated during a GMQL query. Any dataset can be materialized, but the operation can be time-consuming. For best performance, materialize only the relevant data.
## S4 method for signature 'GMQLDataset'
collect(x, name = "ds1", dir_out = getwd())
x |
GMQLDataset class object |
name |
name of the result dataset. By default it is the string "ds1" |
dir_out |
destination folder path. By default it is the current working directory of the R process |
An error occurs if the directory already exists at the destination folder path
None
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "data" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data = read_gmql(test_path)

## The following statement materializes the dataset 'data', previously read,
## at the specific destination test_path, into the local folder "ds1",
## appropriately created

collect(data, dir_out = test_path)
It compiles a GMQL query, taken from a file or inserted as a text string, using the proper GMQL web service available on a remote server
compile_query(url, query)

compile_query_fromfile(url, filePath)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
query |
string text of a GMQL query |
filePath |
string path of txt file containing a GMQL query |
None
## Login to GMQL REST services suite as guest

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## This statement gets the query as a text string and runs the compile
## web service

compile_query(remote_url, "DATASET = SELECT() Example_Dataset_1;
    MATERIALIZE DATASET INTO RESULT_DS;")

## This statement defines the path to the file "query1.txt" in the
## subdirectory "example" of the package "RGMQL" and runs the compile
## web service

test_path <- system.file("example", package = "RGMQL")
test_query <- file.path(test_path, "query1.txt")
compile_query_fromfile(remote_url, test_query)

## Logout from GMQL REST services suite

logout_gmql(remote_url)
Wrapper to GMQL COVER operator
It takes as input a dataset containing one or more samples and returns another dataset (with a single sample, if no groupBy option is specified) by "collapsing" the input dataset samples and their regions according to certain rules specified by the input parameters. The attributes of the output genomic regions are only the region coordinates and the Jaccard indexes (JaccardIntersect and JaccardResult). Jaccard indexes are standard measures of similarity of the contributing regions, added as default region attributes. The JaccardIntersect index is calculated as the ratio between the lengths of the intersection and of the union of the contributing regions; the JaccardResult index is calculated as the ratio between the lengths of the result and of the union of the contributing regions. If aggregate functions are specified, a new region attribute is added for each aggregate function specified. Output metadata are the union of the input ones. If the groupBy clause is specified, the input samples are partitioned into groups, each with distinct values of the grouping metadata attributes, and the cover operation is separately applied to each group, yielding one sample in the result for each group. Input samples that do not satisfy the groupBy condition are disregarded.
cover(.data, ...)

## S4 method for signature 'GMQLDataset'
cover(.data, min_acc, max_acc, groupBy = conds(), variation = "cover", ...)
.data |
GMQLDataset class object |
... |
a series of expressions separated by comma in the form
key = aggregate. The aggregate is an object of
class AGGREGATES. The aggregate functions available are: SUM, COUNT, COUNTSAMP, MIN, MAX, AVG, MEDIAN, STD, BAG, BAGD, Q1, Q2, Q3.
"mixed style" is not allowed |
min_acc |
minimum number of overlapping regions to be considered during execution. It is an integer, which can also be declared as a string. min_acc also accepts ALL(), i.e. the number of samples in the input dataset, or an expression based on it (e.g., ALL()/2)
|
max_acc |
maximum number of overlapping regions to be considered during execution. It is an integer, which can also be declared as a string. max_acc also accepts ALL(), i.e. the number of samples in the input dataset, and ANY(), i.e. any amount of overlapping regions |
groupBy |
|
variation |
string identifying the cover GMQL operator variation. The admissible strings are:
It can be all caps or lowercase |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "exp" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
exp = read_gmql(test_path)

## The following statement produces an output dataset with a single output
## sample. The COVER operation considers all areas defined by a minimum
## of two overlapping regions in the input samples, up to any amount of
## overlapping regions.

res = cover(exp, 2, ANY())

## The following GMQL statement computes the result grouping the input
## exp samples by the values of their cell metadata attribute,
## thus one output res sample is generated for each cell value;
## output regions are produced where at least 2 and at most 3 regions
## of grouped exp samples overlap, setting as attributes of the resulting
## regions the minimum pvalue of the overlapping regions (min_pvalue)
## and their Jaccard indexes (JaccardIntersect and JaccardResult).

res = cover(exp, 2, 3, groupBy = conds("cell"),
    min_pValue = MIN("pvalue"))
This class constructor is used to create instances of PARAM objects, to be used in the GMQL cover method
ALL()

ANY()
ALL: It represents the number of samples in the input dataset.
ANY: It represents any amount of overlapping regions to be considered.
Param object
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "exp" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
exp = read_gmql(test_path)

## The following statement produces an output dataset with a single
## output sample. The COVER operation considers all areas defined by
## a minimum of two overlapping regions in the input samples,
## up to a maximum amount of overlapping regions equal to the number
## of input samples.

res = cover(exp, 2, ALL())

## The following statement produces an output dataset with a single
## output sample. The COVER operation considers all areas defined by
## a minimum of two overlapping regions in the input samples,
## up to any amount of overlapping regions.

res = cover(exp, 2, ANY())

## The following statement produces an output dataset with a single
## output sample. The COVER operation considers all areas defined by
## a minimum of overlapping regions in the input samples equal to half
## of the number of input samples, up to any amount of overlapping regions.

res = cover(exp, ALL()/2, ANY())
It deletes a single private dataset, specified by name, from the remote repository using the proper GMQL web service available on a remote server
delete_dataset(url, datasetName)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
datasetName |
string name of dataset to delete |
If no error occurs, it prints "Deleted Dataset"; otherwise, a specific error is printed
None
## Not run:

## This dataset does not exist

remote_url <- "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)
delete_dataset(remote_url, "test1_20170604_180908_RESULT_DS")

## End(Not run)
This class constructor is used to create instances of DISTAL objects, to be used in GMQL JOIN operations (RGMQL merge functions) that use the genometric predicate parameter requiring a distal condition on a value
DL(value)

DG(value)

DLE(value)

DGE(value)

MD(value)

UP()

DOWN()
value |
string identifying the distance between genomic regions, in base pairs |
DL: It denotes the less distance clause, which selects all the regions of a joined experiment dataset sample such that their distance from the anchor region of the joined reference dataset sample is less than 'value' bases.
DLE: It denotes the less equal distance clause, which selects all the regions of a joined experiment dataset sample such that their distance from the anchor region of the joined reference dataset sample is less than, or equal to, 'value' bases.
DG: It denotes the great distance clause, which selects all the regions of a joined experiment dataset sample such that their distance from the anchor region of the joined reference dataset sample is greater than 'value' bases.
DGE: It denotes the great equal distance clause, which selects all the regions of a joined experiment dataset sample such that their distance from the anchor region of the joined reference dataset sample is greater than, or equal to, 'value' bases.
MD: It denotes the minimum distance clause, which selects the first 'value' regions of the joined experiment dataset sample at minimal distance from the anchor region of the joined reference dataset sample.
UP: It denotes the upstream direction of the genome. It makes predicates hold on the upstream of the regions of the joined reference dataset sample. UP is true when the region of the joined experiment dataset sample is in the upstream genome of the anchor region of the joined reference dataset sample. When this clause is not present, distal conditions apply to both directions of the genome.

DOWN: It denotes the downstream direction of the genome. It makes predicates hold on the downstream of the regions of the joined reference dataset sample. DOWN is true when the region of the joined experiment dataset sample is in the downstream genome of the anchor region of the joined reference dataset sample. When this clause is not present, distal conditions apply to both directions of the genome.
Distal object
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the paths to the folders "DATASET" and "DATASET_GDM" in the subdirectory
## "example" of the package "RGMQL", and opens such folders as GMQL
## datasets named "TSS" and "HM", respectively, using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL")
TSS = read_gmql(test_path)
HM = read_gmql(test_path2)

## Given a dataset HM and one called TSS with a sample including
## Transcription Start Site annotations, this statement searches for those
## regions of HM that are at a minimal distance from a transcription
## start site (TSS) and takes the first/closest one for each TSS, provided
## that such distance is less than 1200 bases and joined TSS and HM
## samples are obtained from the same provider (joinby clause).

join_data = merge(TSS, HM,
    genometric_predicate = list(MD(1), DL(1200)), conds("provider"),
    region_output = "RIGHT")

## Given a dataset HM and one called TSS with a sample including
## Transcription Start Site annotations, this statement searches for those
## regions of HM that are downstream and at a minimal distance from a
## transcription start site (TSS) and takes the first/closest one for each
## TSS, provided that such distance is greater than 12K bases and joined
## TSS and HM samples are obtained from the same provider (joinby clause).

join_data = merge(TSS, HM,
    genometric_predicate = list(MD(1), DGE(12000), DOWN()),
    conds("provider"), region_output = "RIGHT")
It downloads a private dataset as a zip file from the remote repository to a local path, or downloads and saves it into the R environment as a GRangesList, using the proper GMQL web service available on a remote server
download_dataset(url, datasetName, path = getwd()) download_as_GRangesList(url, datasetName)
url |
string url of the server: it must contain the server address and base url; the service name is added automatically |
datasetName |
string name of dataset to download |
path |
string local folder path where to store the dataset; by default it is the R working directory |
If an error occurs, a specific error message is printed
None
GRangesList containing all GMQL samples in dataset
## Download dataset in R working directory
## In this case we try to download a dataset of the user
## (public datasets from remote repository cannot be downloaded)

## Not run:
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)
download_dataset(remote_url, "Example_Dataset_1", path = getwd())

## Create GRangesList from user dataset Example_Dataset_1 got
## from repository
download_as_GRangesList(remote_url, "Example_Dataset_1")

## End(Not run)
This function is used to support the joinBy and/or groupBy function parameters.
conds(default = c(""), full = c(""), exact = c(""))
default |
concatenation of strings identifying names of metadata attributes to be evaluated. It defines a DEFAULT evaluation of the input values. DEFAULT evaluation: the two attributes match if both end with value. |
full |
concatenation of strings identifying names of metadata attributes to be evaluated. It defines a FULL (FULLNAME) evaluation of the input values. FULL evaluation: two attributes match if they both end with value and, if they have further prefixes, the two prefix sequences are identical. |
exact |
concatenation of strings identifying names of metadata attributes to be evaluated. It defines an EXACT evaluation of the input values. EXACT evaluation: only attributes exactly equal to value match; no further prefixes are allowed. |
list of 2-D arrays, each containing the method of evaluation and a metadata attribute name
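As an illustrative sketch (assuming the RGMQL package is loaded; the metadata attribute names below are hypothetical examples, not values required by the package), the three evaluation modes can be combined in a single call, and the resulting object passed as the joinBy or groupBy parameter of RGMQL methods such as merge() or group_by():

```r
## Build a condition object that evaluates 'cell' with DEFAULT matching,
## 'provider' with FULL matching and 'antibody_target' with EXACT matching;
## all attribute names here are hypothetical illustrations
c1 <- conds(
    default = c("cell"),
    full = c("provider"),
    exact = c("antibody_target")
)
```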
It executes a GMQL query. The function works only after invoking at least one collect
execute()
None
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "data"

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data = read_gmql(test_path)

## The following statement materializes the dataset "data", previously
## read, at the specific destination test_path into the local folder
## "ds1" appropriately created

collect(data, dir_out = test_path)

## This statement executes the GMQL query.
## Not run:
execute()

## End(Not run)
It creates a GMQL dataset from a GRangesList. All samples are saved in GDM (tab-separated values) or GTF file format
export_gmql(samples, dir_out, is_gtf)
samples |
GRangesList |
dir_out |
folder path where a folder is created and the sample files are written |
is_gtf |
logical value indicating if samples have to be exported with GTF or GDM format |
The GMQL dataset is made up by two different file types:
metadata files: they contain metadata associated with the corresponding sample.
region files: they contain genomic regions data.
region schema file: XML file that contains region attribute names (e.g. chr, start, end, pvalue)
Sample region files and metadata files are associated through file name: for example, S_0001.gdm for the region file and S_0001.gdm.meta for its metadata file
None
## Load and attach add-on GenomicRanges package
library(GenomicRanges)

## These statements create two GRanges with the region attributes:
## seqnames, ranges (region coordinates) and strand, plus two column
## elements: score and GC

gr1 <- GRanges(seqnames = "chr2", ranges = IRanges(3, 6), strand = "+",
    score = 5L, GC = 0.45)
gr2 <- GRanges(seqnames = c("chr1", "chr1"),
    ranges = IRanges(c(7, 13), width = 3), strand = c("+", "-"),
    score = 3:4, GC = c(0.3, 0.5))

## This statement creates a GRangesList using the previous GRanges
grl = GRangesList(gr1, gr2)

## This statement defines the path to the subdirectory "example" of the
## package "RGMQL" and exports the GRangesList as GMQL datasets with
## sample files in GTF file format, using the last name of 'dir_out'
## path as dataset name

test_out_path <- system.file("example", package = "RGMQL")
export_gmql(grl, test_out_path, TRUE)
Wrapper to GMQL EXTEND operator
For each sample in an input dataset, it generates new metadata attributes as result of aggregate functions applied to sample region attributes and adds them to the existing metadata attributes of the sample. Aggregate functions are applied sample by sample.
extend(.data, ...) ## S4 method for signature 'GMQLDataset' extend(.data, ...)
.data |
GMQLDataset class object |
... |
a series of expressions separated by commas in the form
key = aggregate. The aggregate is an object of
class AGGREGATES. The aggregate functions available are:
"mixed style" is not allowed |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "data"

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data <- read_gmql(test_path)

## This statement counts the regions in each sample and stores their
## number as value of the new metadata attribute RegionCount of the sample

e <- extend(data, RegionCount = COUNT())

## This statement copies all samples of the data dataset into the 'res'
## dataset, and then calculates for each of them two new metadata
## attributes:
## 1. RegionCount is the number of sample regions;
## 2. MinP is the minimum pvalue of the sample regions.
## res sample regions are the same as the ones in data

res = extend(data, RegionCount = COUNT(), MinP = MIN("pvalue"))
Wrapper to GMQL SELECT operator
It creates a new dataset from an existing one by extracting a subset of samples and/or regions from the input dataset according to the predicate. Each sample in the output dataset has the same region attributes, values, and metadata as in the input dataset. When the semijoin function is defined, it extracts those samples containing all metadata attributes defined in the semijoin clause, with at least one metadata value in common with the semijoin dataset. If there are no metadata in common between the input dataset and the semijoin dataset, no sample is extracted.
## S4 method for signature 'GMQLDataset' filter( .data, m_predicate = NULL, r_predicate = NULL, semijoin = NULL, .by = NULL, .preserve = NULL )
.data |
GMQLDataset class object |
m_predicate |
logical predicate made up of R logical operations on metadata attributes. Only !, |, ||, &, && are admitted. |
r_predicate |
logical predicate made up of R logical operations on region attributes. Only !, |, ||, &, && are admitted. |
semijoin |
|
.by |
parameter inherited, unused with GMQLDataset data object |
.preserve |
parameter inherited, unused with GMQLDataset data object |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "data"

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data <- read_gmql(test_path)

## This statement selects from input the data samples of patients younger
## than 70 years old, based on filtering on sample metadata attribute
## 'patient_age'

filter_data <- filter(data, patient_age < 70, NULL)

## This statement defines the path to the folder "DATASET_GDM" in the
## subdirectory "example" of the package "RGMQL" and opens such folder
## as a GMQL dataset named "join_data"

test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL")
join_data <- read_gmql(test_path2)

## This statement creates a new dataset called 'jun_tf' by selecting those
## samples and their regions from the existing 'data' dataset such that:
## Each output sample has a metadata attribute called antibody_target
## with value JUN.
## Each output sample also does not have a metadata attribute called
## "cell" whose value matches at least one of the values that a metadata
## attribute equally called cell has in at least one sample
## of the 'join_data' dataset.
## For each sample satisfying the previous conditions, only its regions
## that have a region attribute called 'pvalue' with associated value
## less than 0.01 are kept in output

jun_tf <- filter(data, antibody_target == "JUN", pvalue < 0.01,
    semijoin(join_data, FALSE, conds("cell")))
This function lets the user create a new GRangesList with fixed information: seqnames, ranges and strand, and a variable part made up of the regions defined as input. The metadata and metadata_prefix arguments are used to filter the data and choose only the samples that match at least one metadata with its prefix. The input regions are shown for each sample obtained from filtering.
filter_and_extract( data, metadata = NULL, metadata_prefix = NULL, region_attributes = NULL, suffix = "antibody_target" )
data |
string GMQL dataset folder path or GRangesList object |
metadata |
vector of strings containing names of metadata attributes to be searched for in metadata files. Data will be extracted if at least one condition is satisfied; this condition is logically "ANDed" with prefix filtering (see below). If NULL, no filtering action occurs (i.e., every sample is taken for region filtering) |
metadata_prefix |
vector of strings that will support the metadata filtering. If defined, each 'metadata' is concatenated with the corresponding prefix. |
region_attributes |
vector of strings that extracts only the region
attributes specified; if NULL, no region attribute is taken and the output
is only a GRanges made up of the region coordinate attributes
(seqnames, start, end, strand);
It is also possible to assign the |
suffix |
name for each metadata column of GRanges. By default it is the value of the metadata attribute named "antibody_target". This string is taken from the sample metadata file or from the associated metadata(). If not present, the column name is the name of the selected regions specified by the 'region_attributes' input parameter |
This function works only with datasets or GRangesList all whose samples or GRanges have the same region coordinates (chr, ranges, strand) ordered in the same way for each sample
In case of GRangesList data input, the function searches for metadata into metadata() function associated to GRangesList.
GRanges with selected regions
## This statement defines the path to the folder "DATASET" in the
## subdirectory "example" of the package "RGMQL" and filters such folder
## dataset including at output only "pvalue" and "peak" region attributes

test_path <- system.file("example", "DATASET", package = "RGMQL")
filter_and_extract(test_path, region_attributes = c("pvalue", "peak"))

## This statement imports a GMQL dataset as GRangesList and filters it
## including at output only "pvalue" and "peak" region attributes; the
## sort function makes sure that the region coordinates (chr, ranges,
## strand) of all samples are ordered correctly

grl <- import_gmql(test_path, TRUE)
sorted_grl <- sort(grl)
filter_and_extract(sorted_grl, region_attributes = c("pvalue", "peak"))

## This statement imports a GMQL dataset as GRangesList and filters it
## including all the region attributes

sorted_grl_full <- sort(grl)
filter_and_extract(sorted_grl_full, region_attributes = FULL())

## This statement imports a GMQL dataset as GRangesList and filters it
## including all the region attributes except "jaccard"

sorted_grl_full_except <- sort(grl)
filter_and_extract(
    sorted_grl_full_except,
    region_attributes = FULL("jaccard")
)
This class constructor is used to create instances of the PARAM object to be used in the filter and extract function.
FULL(except = NULL)
except |
The list of attributes not to consider |
It is used to encompass all the region parameters already present in the dataset or GRangesList
FULL: it considers all the region parameters
Param object
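A minimal sketch of the two forms, assuming the RGMQL package is loaded and grl is a GRangesList obtained with import_gmql() ("jaccard" is an attribute name borrowed from the package examples, used here for illustration):

```r
## Keep every region attribute present in the GRangesList
res_all <- filter_and_extract(grl, region_attributes = FULL())

## Keep every region attribute except "jaccard"
res_except <- filter_and_extract(grl, region_attributes = FULL("jaccard"))
```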
Wrapper to GMQL GROUP operator
It performs the grouping of samples and/or sample regions of the input dataset based on one specified metadata and/or region attribute. If the metadata attribute is multi-value, i.e., it assumes multiple values for a sample (e.g., both <disease, cancer> and <disease, diabetes>), the grouping identifies different groups of samples for each attribute value combination (e.g., group1 for samples that feature the combination <disease, cancer>, group2 for samples that feature the combination <disease, diabetes>, and group3 for samples that feature both combinations <disease, cancer> and <disease, diabetes>). For each obtained group, it is possible to request the evaluation of aggregate functions on metadata attributes; these functions consider the metadata contained in all samples of the group. The regions, their attributes and their values in output are the same as the ones in input for each sample, and the total number of samples does not change. All metadata in the input samples are conserved with their values in the output samples, with the addition of the "_group" attribute, whose value is the identifier of the group to which the specific sample is assigned; other metadata attributes can be added as aggregate functions computed on specified metadata.

When used on region attributes, group_by can group regions of each sample individually, based on their coordinates (chr, start, stop, strand) and possibly also on other specified grouping region attributes (when these are present in the schema of the input dataset). In each sample, regions found in the same group (i.e., regions with the same coordinates and grouping attribute values) are combined into a single region; this allows merging regions that are duplicated inside the same sample (based on the values of their coordinates and of other possibly specified region attributes).
For each grouped region, it is possible to request the evaluation of aggregate functions on other region attributes (i.e., those which are not coordinates or grouping region attributes). This use is independent of the possible grouping realised based on metadata. The generated output schema only contains the original region attributes on which the grouping has been based, plus the attributes computed as aggregate functions, if any. If group_by is applied only on regions, the output metadata and their values are equal to the ones in input. Both when applied on metadata and on regions, the group_by operation returns a number of output samples equal to the number of input ones. Note that the two possible uses of group_by, on metadata and on regions, are perfectly orthogonal, therefore they can be used in combination or independently.
## S4 method for signature 'GMQLDataset' group_by( .data, groupBy_meta = conds(), groupBy_regions = c(""), region_aggregates = NULL, meta_aggregates = NULL, .add = NULL, .drop = NULL )
.data |
GMQLDataset object |
groupBy_meta |
|
groupBy_regions |
vector of strings made up by region attribute names |
region_aggregates |
It accepts a list of aggregate functions on |
meta_aggregates |
It accepts a list of aggregate functions on
metadata attributes.
All the elements are in the form key = aggregate.
The aggregate is an object of class AGGREGATES.
The aggregate functions available are:
"mixed style" is not allowed |
.add |
parameter inherited, unused with GMQLDataset data object |
.drop |
parameter inherited, unused with GMQLDataset data object |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "exp" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
exp = read_gmql(test_path)

## This GMQL statement groups samples of the input 'exp' dataset according
## to their value of the metadata attribute 'tumor_type' and computes the
## maximum value that the metadata attribute 'size' takes inside the
## samples belonging to each group. The samples in the output GROUPS_T
## dataset have a new _group metadata attribute which indicates which
## group they belong to, based on the grouping on the metadata attribute
## tumor_type. In addition, they present the new metadata aggregate
## attribute 'MaxSize'. Note that the samples without metadata attribute
## 'tumor_type' are assigned to a single group with _group value equal 0

GROUPS_T = group_by(exp, conds("tumor_type"),
    meta_aggregates = list(max_size = MAX("size")))

## This GMQL statement takes as input the same dataset as the previous
## example. Yet, it calculates new _group values based on the grouping
## attribute 'cell', and adds the metadata aggregate attribute 'n_samp',
## which counts the number of samples belonging to the respective group.
## It has the following output GROUPS_C dataset samples (note that now no
## sample has metadata attribute _group with value equal 0 since all
## input samples include the metadata attribute cell, with different
## values, on which the new grouping is based)

GROUPS_C = group_by(exp, conds("cell"),
    meta_aggregates = list(n_samp = COUNTSAMP()))

## This GMQL statement groups the regions of each 'exp' dataset sample by
## region coordinates chr, left, right, strand (these are implicitly
## considered) and the additional region attribute score (which is
## explicitly specified), and keeps only one region for each group.
## In the output GROUPS dataset schema, the new region attributes
## avg_pvalue and max_qvalue are added, respectively computed as the
## average of the values taken by the pvalue and the maximum of the
## values taken by the qvalue region attributes in the regions grouped
## together, and the computed value is assigned to each region of each
## output sample. Note that the region attributes which are not
## coordinates or score are discarded.

GROUPS = group_by(exp, groupBy_regions = "score",
    region_aggregates = list(avg_pvalue = AVG("pvalue"),
        max_qvalue = MAX("qvalue")))
It creates a GRangesList from GMQL samples in a dataset. It reads sample files in GTF or GDM/tab-delimited format.
import_gmql(dataset_path, is_gtf)
dataset_path |
string with GMQL dataset folder path |
is_gtf |
logical value indicating if dataset samples are in GTF format; if TRUE and dataset does not contain GTF samples, an error occurs |
GRangesList containing all GMQL samples in dataset
## This statement defines the path to the subdirectory "example" of the
## package "RGMQL" and imports as GRangesList the contained GMQL dataset

test_path <- system.file("example", "DATASET", package = "RGMQL")
grl = import_gmql(test_path, TRUE)
It initializes and runs the GMQL server for executing GMQL queries. It also performs a login to the GMQL REST services suite, if needed
init_gmql( output_format = "GTF", remote_processing = FALSE, url = NULL, username = NULL, password = NULL )
output_format |
string that identifies the output format of all sample files. It can be TAB, GTF or COLLECT:
|
remote_processing |
logical value specifying the processing mode: TRUE for remote processing on a cluster, FALSE for local processing. |
url |
string url of the server: it must contain the server address
and base url; the service name is added automatically.
If NULL, no login is performed.
You can always perform it later by calling the function |
username |
string name used during remote server signup |
password |
string password used during remote server signup |
None
## This statement initializes GMQL with local processing and sample file
## output format tab-delimited

init_gmql("tab", FALSE)

## This statement initializes GMQL with remote processing

remote_url = "http://www.gmql.eu/gmql-rest/"
init_gmql(remote_processing = TRUE, url = remote_url)
It shows a job log or traces a specific job
show_job_log(url, job_id) trace_job(url, job_id)
url |
string url of the server: it must contain the server address and base url; the service name is added automatically |
job_id |
string id of the job |
If an error occurs, a specific error message is printed
Log or trace text
## Login to GMQL REST services suite as guest

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## List all jobs
list_jobs <- show_jobs_list(remote_url)

## Not run:
jobs_1 <- list_jobs$jobs[[1]]

## Show jobs_1 log
show_job_log(remote_url, jobs_1)

## Trace jobs_1
trace_job(remote_url, jobs_1)

## End(Not run)
Login to GMQL REST services suite as a registered user, specifying username and password, or as guest, using the proper GMQL web service available on a remote server
login_gmql(url, username = NULL, password = NULL)
url |
string url of the server: it must contain the server address and base url; the service name is added automatically |
username |
string name used during signup |
password |
string password used during signup |
If both username and password are missing, you will be logged in as guest. After login you will receive an authentication token. As the token remains valid on the server (until the next login/registration or logout), a user can safely reuse a token from a previous session as a convenience; this token is saved in the R global environment to perform subsequent REST calls even after a complete R restart (if the environment has been saved). If an error occurs, a specific error message is printed
None
## Login to GMQL REST services suite as guest
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)
Logout from GMQL REST services suite using the proper GMQL web service available on a remote server
logout_gmql(url)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
After logout the authentication token will be invalidated. The authentication token is removed from R Global environment. If error occurs, a specific error is printed
None
## Login to GMQL REST services suite as guest, then logout
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)
logout_gmql(remote_url)
Wrapper to GMQL MAP operator
It computes, for each sample in the right dataset, aggregates over the values of the right dataset regions that intersect with a region in a left dataset sample, for each region of each sample in the left dataset. The number of generated output samples is the Cartesian product of the samples in the two input datasets; each output sample has the same regions as the related input left dataset sample, with their attributes and values, plus the attributes computed as aggregates over right region values. Output sample metadata are the union of the related input sample metadata, whose attribute names are prefixed with 'left' or 'right' respectively.
map(x, y, ...)

## S4 method for signature 'GMQLDataset'
map(x, y, ..., joinBy = conds(), count_name = "")
x |
GMQLDataset class object |
y |
GMQLDataset class object |
... |
a series of expressions separated by commas in the form key = aggregate, where aggregate is an object of class AGGREGATES (e.g., MIN, MAX, SUM, AVG, MEDIAN, STD, BAG, Q1, Q2, Q3); mixing the named (key = aggregate) and unnamed (aggregate) styles is not allowed |
joinBy |
conds() object containing the list of metadata attribute names whose values must match in the paired x and y samples |
count_name |
string defining the metadata count name; if it is not specified the name is "count_left_right" |
When the joinby clause is present, only pairs of samples of x dataset and of y dataset with metadata M1 and M2, respectively, that satisfy the joinby condition are considered.
The clause consists of a list of metadata attribute names that must be present with equal values in both M1 and M2
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folders "DATASET" and "DATASET_GDM" in the subdirectory
## "example" of the package "RGMQL", and opens such folders as a GMQL
## dataset named "exp" and "ref", respectively, using CustomParser
init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL")
exp = read_gmql(test_path)
ref = read_gmql(test_path2)

## This statement counts the number of regions in each sample from exp
## dataset that overlap with a ref dataset region, and for each ref region
## it computes the minimum score of all the regions in each exp sample that
## overlap with it. The MAP joinBy option ensures that only the exp samples
## referring to the same 'cell_tissue' of a ref sample are mapped on such
## ref sample; exp samples with no cell_tissue metadata attribute, or with
## such metadata attribute, but with a different value from the one(s)
## of ref sample(s), are disregarded.
out = map(ref, exp, minScore = MIN("score"), joinBy = conds("cell_tissue"))
Wrapper to GMQL JOIN operator
It takes as input two datasets, respectively known as anchor (left) and experiment (right), and returns a dataset of samples consisting of regions extracted from the operands according to the specified conditions (i.e., the genometric_predicate and/or the region attribute predicate). The number of generated output samples is the Cartesian product of the number of samples in the anchor and in the experiment dataset (if joinBy is not specified). The output metadata are the union of the input metadata, with their attribute names prefixed with the left or right dataset name, respectively.
## S4 method for signature 'GMQLDataset,GMQLDataset'
merge(
  x,
  y,
  genometric_predicate = NULL,
  region_output = "CAT",
  joinBy = conds(),
  reg_attr = c("")
)
x |
GMQLDataset class object |
y |
GMQLDataset class object |
genometric_predicate |
list of DISTAL objects defining the distal conditions of the genometric predicate; for details, see the documentation of the DISTAL class |
region_output |
single string that declares which region is given in output for each input pair of left dataset and right dataset regions satisfying the genometric predicate and/or the region attribute predicate; values include LEFT, RIGHT, INT (the intersection of the two paired regions) and CAT (the concatenation of the two paired regions, the default)
|
joinBy |
conds() object containing the list of metadata attribute names whose values must match in the paired x and y samples |
reg_attr |
vector of strings made up by region field attribute names, whose values in the paired left and right dataset regions must be equal in order to consider the two paired regions. If specified, region_output cannot be INT or CAT. |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folders "DATASET" and "DATASET_GDM" in the subdirectory
## "example" of the package "RGMQL" and opens such folders as GMQL
## datasets named TSS and HM, respectively, using CustomParser
init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL")
TSS = read_gmql(test_path)
HM = read_gmql(test_path2)

## Given a dataset HM and one called TSS with a sample including
## Transcription Start Site annotations, this statement searches for those
## regions of HM that are at a minimal distance from a transcription start
## site (TSS) and takes the first/closest one for each TSS, provided that
## such distance is less than 120K bases and joined TSS and HM
## samples are obtained from the same provider (joinby clause).
join_data = merge(TSS, HM,
    genometric_predicate = list(MD(1), DLE(120000)),
    conds("provider"), region_output = "RIGHT")
This class constructor is used to create instances of OPERATOR objects, to be used in GMQL functions that require an operator applied to a value.
META(value, type = NULL)

NIL(type)

SQRT(value)
value |
string identifying name of metadata attribute |
type |
string identifying the type of the attribute value; it must be: INTEGER, DOUBLE or STRING. For NIL() function, only INTEGER and DOUBLE are allowed |
META: It prepares the input parameter to be passed to the library function meta, performing all the needed type conversions
SQRT: It prepares the input parameter to be passed to the library function sqrt, performing all the needed type conversions
NIL: It prepares the input parameter to be passed to the library function null, performing all the needed type conversions
Operator object
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "exp"
init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
exp = read_gmql(test_path)

## This statement selects, in all input samples, all those regions
## for which the region attribute score has a value which is greater
## than the metadata attribute value "avg_score" in their sample.
data = filter(exp, r_predicate = score > META("avg_score"), NULL)

## This statement defines new numeric region attributes with "null" value.
## The syntax for creating a new attribute with null value is
## attribute_name = NULL(TYPE), where type may be INTEGER or DOUBLE.
out = select(exp, regions_update = list(signal = NIL("INTEGER"),
    pvalue = NIL("DOUBLE")))

## This statement builds an output dataset named 'out' such that
## all the samples from the input dataset 'exp' are conserved,
## as well as their region attributes (and their values)
## and their metadata attributes (and their values).
## The new metadata attribute 'concSq' is added to all output samples
## with value corresponding to the mathematical square root
## of the pre-existing metadata attribute 'concentration'.
out = select(exp, metadata_update = list(concSq = SQRT("concentration")))
These functions are used to create an ordering on a series of metadata attribute names, given as strings; they are used only in the arrange method (see example).
DESC(...)

ASC(...)
... |
series of metadata attribute names as strings |
ASC: It defines an ascending order for input value
DESC: It defines a descending order for input value
Ordering object
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset named
## "data" using CustomParser
init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data = read_gmql(test_path)

## This statement orders the samples according to the Region_Count metadata
## attribute and takes the two samples that have the lowest count.
asc = arrange(data, list(ASC("Region_Count")), fetch_opt = "mtop",
    num_fetch = 2)

## This statement orders the regions of each sample according to their
## pvalue attribute value and in each sample it takes the first seven
## regions with the highest pvalue
desc = arrange(data, regions_ordering = list(DESC("pvalue")),
    reg_fetch_opt = "rtop", reg_num_fetch = 7)
It reads a GMQL dataset, either given as a folder of homogeneous sample files on disk or as a GRangesList, saving it in Scala memory in a way that can be referenced in R. It is also used to read a repository dataset in case of remote processing.
read_gmql(dataset, parser = "CustomParser", is_local = TRUE)

read_GRangesList(samples)
dataset |
folder path for GMQL dataset or dataset name on repository |
parser |
string used to parse the dataset files; available parsers include CustomParser and NarrowPeakParser.
Default is CustomParser. |
is_local |
logical value indicating local or remote dataset |
samples |
GRangesList |
Normally, a GMQL dataset contains an XML schema file listing the names of the region attributes (e.g., chr, start, stop, strand). The CustomParser reads this XML schema; if you already know which schema your files follow, use one of the predefined parsers, which do not read any XML schema.
If the GRangesList has no metadata (i.e., metadata() is empty), two metadata attributes are generated:
"provider" = "PoliMi"
"application" = "RGMQL"
NOTE: The dataset folder must obey the following rules and adopt the following layout: the dataset folder can have any name, but it must contain a sub-folder named "files". The sub-folder "files" contains the dataset files and the schema XML file. The schema file must adopt one of the following naming conventions:
- "schema.xml"
- "test.schema"
The names must be in LOWERCASE. Any other schema file is not considered; if both are present, "test.schema" is used.
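The layout rules above can be sketched as follows; this is a minimal sketch in which the dataset name "my_dataset" and the sample file names are hypothetical — only the "files" sub-folder and the lowercase schema file name are required:

```shell
# Minimal GMQL dataset layout (dataset and sample file names are examples):
#   my_dataset/
#   └── files/
#       ├── S_00000.gdm        (region data)
#       ├── S_00000.gdm.meta   (metadata)
#       └── schema.xml         (or "test.schema"; lowercase, inside "files")
mkdir -p my_dataset/files
touch my_dataset/files/S_00000.gdm
touch my_dataset/files/S_00000.gdm.meta
touch my_dataset/files/schema.xml
ls my_dataset/files
```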
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "data" using CustomParser
init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data = read_gmql(test_path)

## This statement opens such folder as a GMQL dataset named "data" using
## "NarrowPeakParser"
dataPeak = read_gmql(test_path, "NarrowPeakParser")

## This statement reads a remote public dataset stored into GMQL system
## repository. For a public dataset in a (remote) GMQL repository the
## prefix "public." is needed before dataset name
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)
data1 = read_gmql("public.Example_Dataset_1", is_local = FALSE)
Register to GMQL REST services suite using the proper GMQL web service available on a remote server.
register_gmql(url, username, psw, email, first_name, last_name)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
username |
string user name used to log in |
psw |
string password used to log in |
email |
string user email |
first_name |
string user first name |
last_name |
string user last name |
After registration you will receive an authentication token. As the token remains valid on the server (until the next login, registration or logout), a user can safely reuse the token from a previous session as a convenience; this token is saved in the R global environment so that subsequent REST calls or batch processing can be performed even after a complete R restart (if the environment has been saved). If an error occurs, a specific error is printed.
None
## Register to GMQL REST services suite
remote_url = "http://www.gmql.eu/gmql-rest/"

## Not run: 
register_gmql(remote_url, "foo", "foo", "[email protected]", "foo", "foo")

## End(Not run)
It enables or disables remote processing
remote_processing(is_remote)
is_remote |
logical value used to set the processing mode: TRUE sets remote query processing, otherwise processing is local |
Invoking this function changes the processing mode. After the collect() function has been invoked, it is no longer possible to switch the processing mode.
None
## This statement initializes GMQL with local processing with sample
## files output format as tab-delimited, and then it changes processing
## mode to remote
init_gmql("tab", remote_processing = FALSE)
remote_processing(TRUE)
It runs a GMQL query on the remote repository, taken from a file or inserted as a text string, using the proper GMQL web service available on a remote server
run_query(url, queryName, query, output_gtf = TRUE)

run_query_fromfile(url, filePath, output_gtf = TRUE)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
queryName |
string name of the GMQL query file |
query |
string text of the GMQL query |
output_gtf |
logical value indicating the file format used for storing the samples generated by the query: TRUE for GTF format, FALSE for TAB-delimited format
|
filePath |
string path of a txt file containing a GMQL query |
If an error occurs, a specific error is printed
None
## Not run: 

## Login to GMQL REST services suite as guest
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## Run query as string input parameter
## NOTE: not very suitable for long queries
run_query(remote_url, "query_1", "DATASET = SELECT() Example_Dataset1;
    MATERIALIZE DATASET INTO RESULT_DS;", output_gtf = FALSE)

## With system.file() this statement defines the path to the folder
## "example" of the package "RGMQL", and then it executes the query
## written in the text file "query1.txt"
test_path <- system.file("example", package = "RGMQL")
test_query <- file.path(test_path, "query1.txt")
run_query_fromfile(remote_url, test_query, output_gtf = FALSE)

## End(Not run)
It retrieves the metadata of a specific sample in a dataset using the proper GMQL web service available on a remote server
sample_metadata(url, datasetName, sampleName)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
datasetName |
string name of dataset of interest |
sampleName |
string name of sample of interest |
If an error occurs, a specific error is printed
List of metadata in the form 'key = value'
## Login to GMQL REST services suite as guest
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## This statement retrieves metadata of sample 'S_00000' from public
## dataset 'Example_Dataset_1'
sample_metadata(remote_url, "public.Example_Dataset_1", "S_00000")
It retrieves the region data of a specific sample (whose name is specified in the parameter "sampleName") in a specific dataset (whose name is specified in the parameter "datasetName") using the proper GMQL web service available on a remote server
sample_region(url, datasetName, sampleName)
url |
string url of server. It must contain the server address and base url; service name is added automatically |
datasetName |
string name of dataset of interest |
sampleName |
string name of sample of interest |
If an error occurs, a specific error is printed
GRanges data containing regions of sample
## Not run: 

## Login to GMQL REST services suite as guest
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## This statement retrieves regions data of sample "S_00000" from public
## dataset "Example_Dataset_1"
sample_region(remote_url, "public.Example_Dataset_1", "S_00000")

## End(Not run)
It saves a GMQL query into the repository, taken from a file or inserted as a text string, using the proper GMQL web service available on a remote server
save_query(url, queryName, queryTxt)

save_query_fromfile(url, queryName, filePath)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
queryName |
string name of query |
queryTxt |
string text of GMQL query |
filePath |
string local file path of a txt file containing a GMQL query |
If you save a query with the same name as another query already stored in the repository, you will overwrite it; if no error occurs, it prints "Saved", otherwise it prints the error
None
## Login to GMQL REST services suite as guest
remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## This statement saves a query written directly as input string parameter
## with name "example_query"
save_query(remote_url, "example_query",
    "DATASET = SELECT() Example_Dataset_1;
    MATERIALIZE DATASET INTO RESULT_DS;")

## With system.file() this statement defines the path to the folder
## "example" of the package "RGMQL", and then it saves the query written
## in the text file "query1.txt" into the remote repository
test_path <- system.file("example", package = "RGMQL")
test_query <- file.path(test_path, "query1.txt")
save_query_fromfile(remote_url, "query1", test_query)
Wrapper to GMQL PROJECT operator
It creates, from an existing dataset, a new dataset with all the samples from the input dataset, but keeping for each sample only the specified metadata and/or region attributes. Region coordinates and the values of the remaining metadata and/or region attributes remain equal to those in the input dataset. It can be used to:
Remove existing metadata and/or region attributes from a dataset
Update or set new metadata and/or region attributes in the result
## S4 method for signature 'GMQLDataset'
select(
  .data,
  metadata = NULL,
  metadata_update = NULL,
  all_but_meta = FALSE,
  regions = NULL,
  regions_update = NULL,
  all_but_reg = FALSE
)
.data |
GMQLDataset class object |
metadata |
vector of strings made up by metadata attributes |
metadata_update |
list of updating rules in the form key = value, generating new metadata attributes and/or attribute values; rules may combine existing metadata attributes with basic mathematical operations and operators such as SQRT() (see the OPERATOR class)
|
all_but_meta |
logical value indicating which metadata you want to exclude; if FALSE, only the metadata attributes specified in the metadata argument are kept in the output of the operation; if TRUE, all metadata are kept except those in the metadata argument. If the metadata input parameter is not defined, all_but_meta is not considered. |
regions |
vector of strings made up by region attributes |
regions_update |
list of updating rules in the form key = value, generating new genomic region attributes and/or values; rules may combine existing region attributes with basic mathematical operations and operators such as NIL() (see the OPERATOR class)
|
all_but_reg |
logical value indicating which region attributes you want to exclude; if FALSE, only the region attributes specified in the regions argument are kept in the output of the operation; if TRUE, all region attributes are kept except those in the regions argument. If regions is not defined, all_but_reg is not considered. |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "data"
init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
data = read_gmql(test_path)

## This statement creates a new dataset called CTCF_NORM_SCORE by preserving
## all region attributes apart from score, and creating a new region
## attribute called new_score by dividing the existing score value of each
## region by 1000.0 and incrementing it by 100.
## It also generates, for each sample of the new dataset,
## a new metadata attribute called normalized with value 1,
## which can be used in future selections.
CTCF_NORM_SCORE = select(data, metadata_update = list(normalized = 1),
    regions_update = list(new_score = (score / 1000.0) + 100),
    regions = c("score"), all_but_reg = TRUE)
This function is used to support the filter method in defining semijoin conditions on metadata
semijoin(.data, is_in = TRUE, groupBy)
.data |
GMQLDataset class object |
is_in |
logical value: if TRUE, a sample of the input dataset '.data' is kept if its values for the groupBy metadata attributes match those of at least one sample of the semijoin dataset; if FALSE, the sample is kept if they do not match |
groupBy |
conds() object containing the list of metadata attribute names to be used in the semijoin condition |
semijoin condition as list
## This statement initializes and runs the GMQL server for local execution ## and creation of results on disk. Then, with system.file() it defines ## the path to the folders "DATASET" and "DATASET_GDM" in the subdirectory ## "example" of the package "RGMQL" and opens such folders as GMQL datasets ## named "data" and "join_data", respectively init_gmql() test_path <- system.file("example", "DATASET", package = "RGMQL") test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL") data <- read_gmql(test_path) join_data <- read_gmql(test_path2) ## This statement creates a new dataset called 'jun_tf' by selecting those ## samples and their regions from the existing 'data' dataset such that: ## Each output sample has a metadata attribute called antibody_target ## with value JUN. ## Each output sample also has not a metadata attribute called cell ## that has the same value of at least one of the values that a metadata ## attribute equally called cell has in at least one sample ## of the 'join_data' dataset. ## For each sample satisfying previous conditions, only its regions that ## have a region attribute called pValue with the associated value ## less than 0.01 are conserved in output jun_tf <- filter(data, antibody_target == "JUN", pvalue < 0.01, semijoin(join_data, FALSE, conds("cell")))
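The example above uses is_in = FALSE (a NOT IN semijoin). As a complementary sketch, reusing the 'data' and 'join_data' datasets from that example, setting is_in = TRUE keeps only the samples whose 'cell' metadata value also occurs in 'join_data' (passing the semijoin condition as a named argument is assumed here):

```r
## Hypothetical variant: keep only the samples of 'data' whose 'cell'
## metadata value matches the 'cell' value of at least one sample
## of 'join_data'
matched <- filter(data, semijoin = semijoin(join_data, TRUE, conds("cell")))
```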
Wrapper to GMQL DIFFERENCE operator
It produces one sample in the result for each sample of the left operand, by keeping the same metadata of the left input sample and only those regions (with their attributes and values) of the left input sample which do not intersect with any region in any right operand sample. The optional joinBy clause is used to extract a subset of pairs from the Cartesian product of the two input datasets x and y on which to apply the DIFFERENCE operator: only those samples that have the same value for each specified metadata attribute are considered when performing the difference.
## S4 method for signature 'GMQLDataset,GMQLDataset' setdiff(x, y, joinBy = conds(), is_exact = FALSE)
x |
GMQLDataset class object |
y |
GMQLDataset class object |
joinBy |
condition on metadata attributes, built with the conds() function: only the pairs of samples of 'x' and 'y' that share the same value for each specified metadata attribute are considered when performing the difference |
is_exact |
single logical value: TRUE means that the region difference is executed only on regions in 'x' dataset with exactly the same coordinates of at least one region present in 'y' dataset; if is_exact = FALSE, the difference is executed on all regions in 'x' dataset that overlap (even just one base) with at least one region in 'y' dataset |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folders "DATASET" and "DATASET_GDM" in the subdirectory
## "example" of the package "RGMQL" and opens such folders as GMQL
## datasets named "data1" and "data2", respectively, using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL")
data1 = read_gmql(test_path)
data2 = read_gmql(test_path2)

## This statement returns all the regions in the first dataset
## that do not overlap any region in the second dataset

out = setdiff(data1, data2)

## This statement extracts, for every pair of samples s1 in data1
## and s2 in data2 having the same value of the metadata
## attribute 'cell', the regions that appear in s1 but
## do not overlap any region in s2.
## Metadata of the result are the same as the metadata of s1

out_t = setdiff(data1, data2, conds("cell"))
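The is_exact parameter is not exercised in the examples above; a minimal sketch, reusing 'data1' and 'data2' from the example, could be:

```r
## With is_exact = TRUE, only the regions of 'data1' having exactly the
## same coordinates as at least one region of 'data2' are subtracted;
## regions that merely overlap are kept in the output
out_exact = setdiff(data1, data2, is_exact = TRUE)
```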
It shows the presence of the metadata keys in a specific region set, reporting for each key either its value or just the logical value TRUE.
show_all_metadata(dataset, show_value = FALSE)
dataset |
string with the GMQL dataset folder path or a remote dataset name. In the case of a remote dataset, to distinguish between private and public repository, each name must be prefixed with "private." or "public.", respectively. |
show_value |
whether or not to show the value associated with each metadata attribute; if FALSE, only logical values (TRUE or FALSE) are shown. |
A data frame containing the mapping between the metadata and the region sets
## This statement defines the path to the subdirectory "example" of the
## package "RGMQL" and shows all the metadata inside the GMQL dataset,
## scanning all the meta files; it returns a data frame reporting a
## logical value that represents the presence or absence of each metadata
## key in each region set

test_path <- system.file("example", "DATASET", package = "RGMQL")
show_all_metadata(test_path)

## The same as above, but the returned data frame also reports the value
## associated with each metadata key

test_path <- system.file("example", "DATASET", package = "RGMQL")
show_all_metadata(test_path, show_value = TRUE)

## This statement opens the remote dataset called "Example_Dataset_1" on
## the public repository and shows all the metadata inside the GMQL
## dataset, scanning all the meta files; the returned data frame also
## reports the metadata values

test_url = "http://www.gmql.eu/gmql-rest"
login_gmql(test_url)
show_all_metadata("public.Example_Dataset_1", show_value = TRUE)
It shows all GMQL datasets stored by the user, or publicly available, in the remote repository, using the proper GMQL web service available on a remote server
show_datasets_list(url)
url |
single string url of server: It must contain the server address and base url; service name is added automatically |
If error occurs, a specific error is printed
List of datasets. Every dataset in the list is described by:
name: name of dataset
owner: public or name of the user
## Login to GMQL REST services suite as guest

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## List all datasets

list <- show_datasets_list(remote_url)
It shows all jobs (running, succeeded or failed) invoked by the user on the remote server, using the proper GMQL web service available on a remote server
show_jobs_list(url)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
If error occurs, a specific error is printed
List of jobs. Every job in the list is described by:
id: unique job identifier
## Login to GMQL REST services suite as guest

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## List all jobs

list_jobs <- show_jobs_list(remote_url)
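The elements of the returned list can then be inspected in plain R; a minimal sketch, where the $jobs element is assumed (consistently with its use in the stop_job example):

```r
## Count the jobs and pick the description of the first one in the list
list_jobs <- show_jobs_list(remote_url)
length(list_jobs$jobs)
first_job <- list_jobs$jobs[[1]]
```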
It shows all the GMQL queries saved by the user on remote repository, using the proper GMQL web service available on a remote server
show_queries_list(url)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
If error occurs, a specific error is printed
List of queries. Every query in the list is described by:
name: name of query
text: text of GMQL query
## Login to GMQL REST services suite

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## List all queries executed on remote GMQL system

list <- show_queries_list(remote_url)
It shows all samples from a specific GMQL dataset on the remote repository, using the proper GMQL web service available on a remote server
show_samples_list(url, datasetName)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
datasetName |
name of the dataset containing the samples whose list we want to get; if the dataset is a public dataset, we have to add "public." as prefix, as shown in the example below; otherwise no prefix is needed |
If error occurs, a specific error is printed
List of samples in dataset. Every sample in the list is described by:
id: id of sample
name: name of sample
path: sample repository path
## Login to GMQL REST services suite as guest

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## This statement shows all samples present in the public dataset
## 'Example_Dataset_1'

list <- show_samples_list(remote_url, "public.Example_Dataset_1")
It shows the region attribute schema of a specific GMQL dataset on remote repository, using the proper GMQL web service available on a remote server
show_schema(url, datasetName)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
datasetName |
name of dataset to get the schema; if the dataset is a public dataset, we have to add "public." as prefix, as shown in the example below, otherwise no prefix is needed |
If error occurs, a specific error is printed
List of region schema fields. Every field in the list is described by:
name: name of field (e.g. chr, start, end, strand, ...)
fieldType: (e.g. STRING, DOUBLE, ...)
## Login to GMQL REST services suite as guest

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## Show schema of public dataset 'Example_Dataset_1'

list <- show_schema(remote_url, "public.Example_Dataset_1")
It stops GMQL server processing
stop_gmql()
None
## This statement first initializes GMQL with local processing and with
## sample file output format as tab-delimited, and then stops it

init_gmql("tab", FALSE)
stop_gmql()
It stops a specific current query job
stop_job(url, job_id)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
job_id |
string id of the job |
If error occurs, a specific error is printed
None
## Not run:

## Login to GMQL REST services suite at remote url

remote_url = "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## This statement shows all jobs on the GMQL remote system, selects one
## running job, saving it into 'jobs_1' (in this case the first of the
## list), and then stops it

list_jobs <- show_jobs_list(remote_url)
jobs_1 <- list_jobs$jobs[[1]]
stop_job(remote_url, jobs_1)

## End(Not run)
Wrapper to TAKE operation
It saves the content of a dataset, containing sample metadata and regions, as a GRangesList. It is normally used to store in memory the content of any dataset generated during a GMQL query. The operation can be very time-consuming. If you invoked any materialization before the take function, all those datasets are materialized as folders.
take(.data, ...) ## S4 method for signature 'GMQLDataset' take(.data, rows = 0L)
.data |
returned object from any GMQL function |
... |
Additional arguments for use in other specific methods of the generic take function |
rows |
number of region rows, for each sample, that you want to retrieve and store in memory. By default it is 0, which means take all rows for each sample |
GRangesList with associated metadata
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folder "DATASET" in the subdirectory "example"
## of the package "RGMQL" and opens such folder as a GMQL dataset
## named "rd" using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
rd = read_gmql(test_path)

## This statement creates a dataset called 'aggr' which contains one
## sample for each antibody_target and cell value found within the metadata
## of the 'rd' dataset samples; each created sample contains all regions
## from all 'rd' samples with a specific value for their
## antibody_target and cell metadata attributes

aggr = aggregate(rd, conds(c("antibody_target", "cell")))

## This statement performs the query and returns the resulting dataset as
## a GRangesList named 'taken'. It returns only the first 45 regions of
## each sample present in the GRangesList, and all the metadata associated
## with each sample

taken <- take(aggr, rows = 45)
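Since take returns a standard GRangesList, the usual Bioconductor accessors apply; a minimal sketch of how the 'taken' object from the example above might be inspected (metadata() is assumed to expose the associated sample metadata):

```r
## Inspect the materialized result: number of samples, their names,
## the GRanges of the first sample, and the associated metadata
length(taken)
names(taken)
taken[[1]]
metadata(taken)
```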
Wrapper to GMQL UNION operator
It is used to integrate samples of two homogeneous or heterogeneous datasets within a single dataset; for each sample of either input dataset, a result sample is created as follows:
Metadata are the same as in the original sample.
Resulting schema is the schema of the left input dataset.
Regions are the same (in coordinates and attribute values) as in the original sample, if it is from the left input dataset; if it is from the right input dataset, its regions are the same in coordinates, but only region attributes identical (in name and type) to those of the left input dataset are retained, with the same values. Region attributes which are missing in an input dataset sample w.r.t. the merged schema are set to null.
## S4 method for signature 'GMQLDataset,GMQLDataset' union(x, y)
x |
GMQLDataset object |
y |
GMQLDataset object |
GMQLDataset object. It contains the value to use as input for the subsequent GMQLDataset method
## This statement initializes and runs the GMQL server for local execution
## and creation of results on disk. Then, with system.file() it defines
## the path to the folders "DATASET" and "DATASET_GDM" in the subdirectory
## "example" of the package "RGMQL" and opens such folders as GMQL
## datasets named "data1" and "data2", respectively, using CustomParser

init_gmql()
test_path <- system.file("example", "DATASET", package = "RGMQL")
test_path2 <- system.file("example", "DATASET_GDM", package = "RGMQL")
data1 <- read_gmql(test_path)
data2 <- read_gmql(test_path2)

## This statement creates a dataset called 'res' which contains all samples
## from the datasets 'data1' and 'data2'

res <- union(data1, data2)
It uploads a folder (GMQL or not) containing sample files, using the proper GMQL web service available on a remote server: a new dataset is created on the remote repository
upload_dataset(url, datasetName, folderPath, schemaName = NULL)
url |
string url of server: It must contain the server address and base url; service name is added automatically |
datasetName |
name of dataset to create in repository |
folderPath |
string local path to the folder containing the samples files |
schemaName |
string name of the schema used to parse the samples; the available schemaName values are:
if schemaName is NULL, it looks for an XML schema file to read in the folderPath |
If no error occurs, it prints "Upload Complete", otherwise a specific error is printed
NOTE: The folder must adopt the following layout: the dataset folder can have any name, but it must contain a sub-folder named "files". The sub-folder "files" contains the dataset files and the schema XML file. The schema file must adopt one of the following naming conventions:
- "schema.xml" - "test.schema"
The names must be in LOWERCASE. Any other schema file will not be considered; if both are present, "test.schema" is used.
None
## Not run:

## This statement defines the path to the folder "DATASET_GDM" in the
## subdirectory "example" of the package "RGMQL"

test_path <- system.file("example", "DATASET_GDM", package = "RGMQL")

## Login to GMQL REST services suite at remote url

remote_url <- "http://www.gmql.eu/gmql-rest/"
login_gmql(remote_url)

## Upload of GMQL dataset with "dataset1" as name, without specifying any
## schema

upload_dataset(remote_url, "dataset1", folderPath = test_path)

## End(Not run)
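A hypothetical variant of the upload with an explicit schema name; "bed" is assumed here to be among the available schemaName values:

```r
## Not run:
## Upload the same folder, parsing the samples with the "bed" schema
## instead of relying on the XML schema file inside the folder
upload_dataset(remote_url, "dataset2", folderPath = test_path,
    schemaName = "bed")
## End(Not run)
```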