Package 'gemma.R' reference manual

Title:	A wrapper for Gemma's RESTful API to access curated gene expression data and differential expression analyses
Description:	Low- and high-level wrappers for Gemma's RESTful API. They enable access to curated expression and differential expression data from over 10,000 published studies. Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles.
Authors:	Javier Castillo-Arnemann [aut] , Jordan Sicherman [aut] , Ogan Mancarci [cre, aut] , Guillaume Poirier-Morency [aut]
Maintainer:	Ogan Mancarci <[email protected]>
License:	Apache License (>= 2)
Version:	3.1.9
Built:	2024-07-17 19:38:41 UTC
Source:	https://github.com/bioc/gemma.R

Return all supported filter properties

Description

Some functions, such as get_datasets and get_platforms_by_ids, include a filter argument that allows the creation of more complex queries. This function returns a list of supported properties to be used in those filters.

Usage

filter_properties()

Value

A list of data.tables that contain supported properties and their data types

Examples

filter_properties()

Clear gemma.R cache

Description

Forget past results from memoised calls to the Gemma API (i.e. calls to functions made with memoised = TRUE).

Usage

forget_gemma_memoised()

Value

TRUE to indicate cache was cleared.

Examples

forget_gemma_memoised()

Custom gemma call

Description

A minimal function to create custom calls. Can be used to acquire unimplemented endpoints and/or raw output without any processing. Refer to the API documentation.

Usage

gemma_call(call, ..., json = TRUE)

Arguments

`call`	Gemma API endpoint.
`...`	parameters included in the call
`json`	If `TRUE`, the content is parsed into a list

Value

A list if json = TRUE and an httr response if FALSE

Examples

# get singular value decomposition for the dataset
gemma_call('datasets/{dataset}/svd',dataset = 1)
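The `{dataset}` placeholder in the endpoint above is filled from the named argument of the same name. The following is a minimal sketch of that kind of substitution in base R. `fill_endpoint` is a hypothetical helper written for illustration only; it is not part of gemma.R and may differ from how gemma_call builds its request internally.

```r
# Hypothetical helper (not part of gemma.R): replace {name} placeholders in an
# endpoint template with the matching named arguments.
fill_endpoint <- function(template, ...) {
  params <- list(...)
  for (name in names(params)) {
    placeholder <- paste0("{", name, "}")
    # Percent-encode the value so it is safe to use inside a URL path
    value <- utils::URLencode(as.character(params[[name]]), reserved = TRUE)
    template <- gsub(placeholder, value, template, fixed = TRUE)
  }
  template
}

path <- fill_endpoint("datasets/{dataset}/svd", dataset = 1)
print(path)
```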

Create printable tables out of gemma.R outputs

Description

Creates a kable where certain columns are automatically shortened to better fit a document.

Usage

gemma_kable(table)

Arguments

`table`	A data.table or data.frame outputted by a gemma.R function

Enable and disable memoisation of gemma.R functions

Description

Enable and disable memoisation of gemma.R functions

Usage

gemma_memoise(
  memoised = FALSE,
  cache = rappdirs::user_cache_dir(appname = "gemmaR")
)

Arguments

`memoised`	boolean. If TRUE memoisation will be enabled
`cache`	File path or "cache_in_memory". A file path sets the location where the cache files for memoisation are saved. "cache_in_memory" stores the cache in the current R session
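For readers unfamiliar with memoisation, the idea can be sketched in base R as follows. This is an illustration of the general technique with an in-memory cache, not gemma.R's implementation; `memoise_in_memory`, `slow_lookup` and `cached_lookup` are hypothetical names, and the package's own caching may work differently.

```r
# Hypothetical in-memory memoiser; gemma.R's own caching may work differently.
memoise_in_memory <- function(fn) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    # Use the deparsed arguments as the cache key
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fn(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0
slow_lookup <- function(id) {
  calls <<- calls + 1  # count how often the real work runs
  paste("result for", id)
}

cached_lookup <- memoise_in_memory(slow_lookup)
cached_lookup("GSE2018")
cached_lookup("GSE2018")  # same arguments, so the stored result is reused
print(calls)
```

Repeating a call with identical arguments returns the stored result without running the wrapped function again, which is why clearing the cache matters when the remote data may have changed.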

gemma.R package: Access curated gene expression data and differential expression analyses

Description

This package contains wrappers and convenience functions for Gemma's RESTful API that enables access to curated expression and differential expression data from over 15,000 published studies (as of mid-2022). Gemma (https://gemma.msl.ubc.ca) is a web site, database and a set of tools for the meta-analysis, re-use and sharing of transcriptomics data, currently primarily targeted at the analysis of gene expression profiles.

Details

Most users will want to start with the high-level functions like get_dataset_object, get_differential_expression_values and get_platform_annotations. Additional lower-level methods are available that directly map to the Gemma RESTful API methods.

For more information and detailed usage instructions check the README, the function reference and the vignette.

All software-related questions should be posted to the Bioconductor Support Site: https://support.bioconductor.org

Author(s)

Javier Castillo-Arnemann, Jordan Sicherman, Ogan Mancarci, Guillaume Poirier-Morency

References

Lim, N. et al., Curation of over 10 000 transcriptomic studies to enable data reuse, Database, 2021. https://doi.org/10.1093/database/baab006

Get all pages of a paginated call

Description

Given a Gemma.R output from a function with offset and limit arguments, returns the output from all pages. All arguments other than offset, limit

Usage

get_all_pages(
  query,
  step_size = 100,
  binder = rbind,
  directory = NULL,
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`query`	Output from a gemma.R function with offset and limit argument
`step_size`	Size of individual calls to the server. 100 is the maximum value
`binder`	Binding function for the calls. If `raw = FALSE` use `rbind` to combine the data.tables. If not, use `c` to combine lists
`directory`	Directory to save the output from the individual calls to. If provided, each page is saved to separate files.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. This function always saves the output as an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data.table or a list containing data from all pages.
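The paging logic described above (fixed-size steps, combined with a binding function) can be sketched without any network access. Everything in this block is a toy stand-in: `fetch_page` imitates a paginated endpoint over a local data frame and `collect_pages` is a hypothetical helper, neither of which is part of gemma.R. The `totalElements` attribute is assumed to report the total number of available rows.

```r
# Toy stand-in for a paginated endpoint: 250 rows served in pages.
all_rows <- data.frame(id = seq_len(250))
fetch_page <- function(offset, limit) {
  idx <- seq_len(nrow(all_rows))
  page <- all_rows[idx > offset & idx <= offset + limit, , drop = FALSE]
  attr(page, "totalElements") <- nrow(all_rows)
  page
}

# Hypothetical helper: request pages until the total is covered, then bind them.
collect_pages <- function(fetch, step_size = 100, binder = rbind) {
  first <- fetch(offset = 0, limit = step_size)
  total <- attr(first, "totalElements")
  pages <- list(first)
  offset <- step_size
  while (offset < total) {
    pages[[length(pages) + 1]] <- fetch(offset = offset, limit = step_size)
    offset <- offset + step_size
  }
  do.call(binder, pages)
}

combined <- collect_pages(fetch_page)
print(nrow(combined))
```

With tabular pages `rbind` is the natural binder; for list-shaped raw output, `c` would concatenate the pages instead.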

Return child terms of a term

Description

When querying for ontology terms, Gemma propagates these terms to include any datasets with their child terms in the results. This function returns these children for any number of terms, including all children and the terms themselves in the output vector.

Usage

get_child_terms(terms)

Arguments

`terms`	An array of terms

Value

An array containing descendants of the annotation terms, including the terms themselves
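To make the notion of term propagation concrete, here is a small sketch that collects a term and all of its descendants from a parent-to-children map. The ontology below is made up for illustration and `collect_descendants` is a hypothetical helper; the real function queries Gemma and works on ontology URIs rather than a local list.

```r
# Toy parent -> children map; the term names are invented for illustration.
children <- list(
  disease = c("infection", "cancer"),
  cancer = c("lung cancer", "skin cancer"),
  "lung cancer" = character(0)
)

# Hypothetical helper: return the given terms plus all of their descendants.
collect_descendants <- function(terms, children) {
  seen <- character(0)
  queue <- terms
  while (length(queue) > 0) {
    term <- queue[1]
    queue <- queue[-1]
    if (term %in% seen) next  # skip terms that were already visited
    seen <- c(seen, term)
    kids <- children[[term]]  # NULL when the term has no entry in the map
    if (!is.null(kids)) queue <- c(queue, kids)
  }
  seen
}

print(collect_descendants("cancer", children))
```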

Examples

get_child_terms("http://purl.obolibrary.org/obo/MONDO_0000408")

Retrieve the annotations of a dataset

Description

Retrieve the annotations of a dataset

Usage

get_dataset_annotations(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the annotations of the queried dataset. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

class.name: Name of the annotation class (e.g. organism part)
class.URI: URI for the annotation class
term.name: Name of the annotation term (e.g. lung)
term.URI: URI for the annotation term
object.class: Class of object that the term originated from.

Examples

get_dataset_annotations("GSE2018")

Retrieve the design of a dataset

Description

Retrieve the design of a dataset

Usage

get_dataset_design(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table of the design matrix for the queried dataset. A 404 error if the given identifier does not map to any object.

Examples

head(get_dataset_design("GSE2018"))

Retrieve annotations and surface level stats for a dataset's differential analyses

Description

Retrieve annotations and surface level stats for a dataset's differential analyses

Usage

get_dataset_differential_expression_analyses(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the differential expression analysis of the queried dataset. Note that this function does not return differential expression values themselves. Use get_differential_expression_values to get differential expression values (see examples).

The fields of the output data.table are:

result.ID: Result set ID of the differential expression analysis. May represent multiple factors in a single model.
contrast.ID: ID of the specific contrast factor. Together with the result.ID they uniquely represent a given contrast.
experiment.ID: ID of the source experiment
factor.category: Category for the contrast
factor.category.URI: URI for the contrast category
factor.ID: ID of the factor
baseline.factors: Characteristics of the baseline. This field is a data.table
experimental.factors: Characteristics of the experimental group. This field is a data.table
isSubset: TRUE if the result set belongs to a subset, FALSE if not. Subsets are created when performing differential expression to avoid unhelpful comparisons.
subsetFactor: Characteristics of the subset. This field is a data.table
probes.analyzed: Number of probesets represented in the contrast
genes.analyzed: Number of genes represented in the contrast

Examples

result <- get_dataset_differential_expression_analyses("GSE2872")
get_differential_expression_values(resultSet = result$result.ID[1])

Retrieve the expression data matrix of a set of datasets and genes

Description

Retrieve the expression data matrix of a set of datasets and genes

Usage

get_dataset_expression_for_genes(
  datasets,
  genes,
  keepNonSpecific = FALSE,
  consolidate = NA_character_,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`datasets`	A vector of dataset IDs or short names
`genes`	A vector of NCBI IDs, Ensembl IDs or gene symbols.
`keepNonSpecific`	logical. `FALSE` by default. If `TRUE`, results from probesets that are not specific to the gene will also be returned.
`consolidate`	An option for gene expression level consolidation. If empty, will return every probe for the genes. "pickmax" to pick the probe with the highest expression, "pickvar" to pick the probe with the highest variance and "average" for returning the average expression
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.
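The three `consolidate` options can be illustrated on a toy matrix with several probes for a single gene. This is only a sketch of the idea: the values are invented, `consolidate_probes` is a hypothetical helper, and it assumes that "highest expression" means the highest mean across samples. The consolidation performed by Gemma itself happens server-side and may differ in detail.

```r
# Toy expression matrix: three probes for one gene across four samples.
expr <- rbind(
  probe_a = c(5.0, 5.2, 4.8, 5.0),
  probe_b = c(7.0, 9.0, 5.0, 7.2),
  probe_c = c(8.0, 8.1, 7.9, 8.0)
)

# Hypothetical helper mirroring the three documented options.
consolidate_probes <- function(expr, method = c("pickmax", "pickvar", "average")) {
  method <- match.arg(method)
  switch(method,
    pickmax = expr[which.max(rowMeans(expr)), ],       # probe with highest mean expression
    pickvar = expr[which.max(apply(expr, 1, var)), ],  # probe with highest variance
    average = colMeans(expr)                           # per-sample mean across probes
  )
}

print(consolidate_probes(expr, "pickmax"))
print(consolidate_probes(expr, "pickvar"))
print(consolidate_probes(expr, "average"))
```

In this toy data, "pickmax" selects probe_c and "pickvar" selects probe_b, while "average" returns one value per sample.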

Value

A list of data frames

Examples

get_dataset_expression_for_genes("GSE2018", genes = c(10225, 2841))

Compile gene expression data and metadata

Description

Return an annotated Bioconductor-compatible data structure or a long form tibble of the queried dataset, including expression data and the experimental design.

Usage

get_dataset_object(
  datasets,
  genes = NULL,
  keepNonSpecific = FALSE,
  consolidate = NA_character_,
  resultSets = NULL,
  contrasts = NULL,
  metaType = "text",
  type = "se",
  memoised = getOption("gemma.memoised", FALSE)
)

Arguments

`datasets`	A vector of dataset IDs or short names
`genes`	A vector of NCBI IDs, Ensembl IDs or gene symbols.
`keepNonSpecific`	logical. `FALSE` by default. If `TRUE`, results from probesets that are not specific to the gene will also be returned.
`consolidate`	An option for gene expression level consolidation. If empty, will return every probe for the genes. "pickmax" to pick the probe with the highest expression, "pickvar" to pick the probe with the highest variance and "average" for returning the average expression
`resultSets`	Result set IDs of a differential expression analysis. Optional. If provided, the output will only include the samples from the subset used in the result set ID. Must be the same length as `datasets`.
`contrasts`	Contrast IDs of a differential expression contrast. Optional. Requires `resultSets` to be defined. If provided, the output will only include samples relevant to the specific contrasts.
`metaType`	How the metadata information should be included. Can be "text", "uri" or "both". "text" and "uri" options
`type`	"se" for a SummarizedExperiment or "eset" for an ExpressionSet. We recommend using SummarizedExperiments, which are more recent. See the SummarizedExperiment vignette or the ExpressionSet vignette for more details. "tidy" for a long form data frame compatible with tidyverse functions. "list" to return a list containing individual data frames containing expression values, design and the experiment.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.

Value

A list of SummarizedExperiments, ExpressionSets or a tibble containing metadata and expression data for the queried datasets and genes. Metadata will be expanded to include a variable number of factors that annotate samples from a dataset but will always include a single "factorValues" column that houses data.tables that include all annotations for a given sample.

Examples

get_dataset_object("GSE2018")

Retrieve the platforms of a dataset

Description

Retrieve the platforms of a dataset

Usage

get_dataset_platforms(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the platform(s). A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

platform.ID: Internal identifier of the platform
platform.shortName: Shortname of the platform.
platform.name: Full name of the platform.
platform.description: Free text description of the platform
platform.troubled: Whether or not the platform was marked "troubled" by a Gemma process or a curator
platform.experimentCount: Number of experiments using the platform within Gemma
platform.type: Technology type for the platform.
taxon.name: Name of the species platform was made for
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_dataset_platforms("GSE2018")

Retrieve processed expression data of a dataset

Description

Retrieve processed expression data of a dataset

Usage

get_dataset_processed_expression(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

If raw is FALSE (default), a data table of the expression matrix for the queried dataset. If raw is TRUE, returns the binary file in raw form.

Examples

get_dataset_processed_expression("GSE2018")

Retrieve quantitation types of a dataset

Description

Retrieve quantitation types of a dataset

Usage

get_dataset_quantitation_types(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data.table containing the quantitation types

The fields of the output data.table are:

id: ID of the quantitation type. Any raw quantitation type can be accessed by the get_dataset_raw_expression function using this id.
name: Name of the quantitation type
description: Description of the quantitation type
type: Type of the quantitation type. Either raw or processed. Each dataset will have one processed quantitation type which is the data returned using get_dataset_processed_expression
ratio: Whether or not the quantitation type is a ratio of multiple quantitation types. Typically TRUE for processed TWOCOLOR quantitation type.
preferred: The preferred raw quantitation type. This version is used in generation of the processed data within gemma.
recomputed: If TRUE this quantitation type is generated by recomputing raw data files Gemma had access to.

Examples

get_dataset_quantitation_types("GSE59918")

Retrieve raw expression data of a dataset

Description

Retrieve raw expression data of a dataset

Usage

get_dataset_raw_expression(
  dataset,
  quantitationType,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`quantitationType`	Quantitation type id. These can be acquired using `get_dataset_quantitation_types` function. This endpoint can only return non-processed quantitation types.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

If raw is FALSE (default), a data table of the expression matrix for the queried dataset. If raw is TRUE, returns the binary file in raw form.

Examples

q_types <- get_dataset_quantitation_types("GSE59918")
get_dataset_raw_expression("GSE59918", q_types$id[q_types$name == "Counts"])

Retrieve the samples of a dataset

Description

Retrieve the samples of a dataset

Usage

get_dataset_samples(
  dataset,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`dataset`	A numerical dataset identifier or a dataset short name
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the samples of the queried dataset. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

sample.name: Internal name given to the sample.
sample.ID: Internal ID of the sample
sample.description: Free text description of the sample
sample.outlier: Whether or not the sample is marked as an outlier
sample.accession: Accession ID of the sample in its original database
sample.database: Database of origin for the sample
sample.characteristics: Characteristics of the sample. This field is a data table
sample.factorValues: Experimental factor values of the sample. This field is a data table

Examples

head(get_dataset_samples("GSE2018"))

Retrieve all datasets

Description

Retrieve all datasets

Usage

get_datasets(
  query = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`query`	The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").
`filter`	Filter results by matching expression. Use the `filter_properties` function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").
`taxa`	A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for `taxon.commonName` property
`uris`	A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and equivalent to filtering for `allCharacteristics.valueUri`
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`sort`	Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' sign indicates descending order.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the queried dataset(s). A list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID
experiment.name: Full title of the dataset
experiment.ID: Internal ID of the dataset.
experiment.description: Description of the dataset
experiment.troubled: Did an automatic process within gemma or a curator mark the dataset as "troubled"
experiment.accession: Accession ID of the dataset in the external database it was taken from
experiment.database: The name of the database where the dataset was taken from
experiment.URI: URI of the original database
experiment.sampleCount: Number of samples in the dataset
experiment.batchEffectText: A text field describing whether the dataset has batch effects
experiment.batchCorrected: Whether batch correction has been performed on the dataset.
experiment.batchConfound: 0 if batch info isn't available, -1 if a batch confound is detected, 1 if batch information is available and no batch confound is found
experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.
experiment.rawData: -1 if no raw data available, 1 if raw data was available. When available, Gemma reprocesses raw data to get expression values and batches
geeq.qScore: Data quality score given to the dataset by Gemma.
geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design
taxon.name: Name of the species
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_datasets()
get_datasets(taxa = c("mouse", "human"), uris = "http://purl.obolibrary.org/obo/UBERON_0002048")
# filter below is equivalent to the call above
get_datasets(filter = "taxon.commonName in (mouse,human) and allCharacteristics.valueUri = http://purl.obolibrary.org/obo/UBERON_0002048")
get_datasets(query = "lung")
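
The equivalence between the taxa/uris arguments and the filter string shown above can be sketched with a small helper. build_filter is a hypothetical function written for this illustration and is not part of gemma.R; the "in (...)" form for multiple URIs is an assumption extrapolated from the taxon clause.

```r
# Hypothetical helper: assemble a filter string from taxa and URIs.
build_filter <- function(taxa = character(0), uris = character(0)) {
  clause <- function(property, values) {
    if (length(values) == 0) return(NULL)
    # A single value uses "=", several values use "in (...)".
    if (length(values) == 1) return(paste(property, "=", values))
    paste0(property, " in (", paste(values, collapse = ","), ")")
  }
  parts <- c(
    clause("taxon.commonName", taxa),
    clause("allCharacteristics.valueUri", uris)
  )
  paste(parts, collapse = " and ")
}

lung_filter <- build_filter(
  taxa = c("mouse", "human"),
  uris = "http://purl.obolibrary.org/obo/UBERON_0002048"
)
print(lung_filter)
```

The resulting string could then be passed as get_datasets(filter = lung_filter).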

Retrieve datasets by their identifiers

Description

Retrieve datasets by their identifiers

Usage

get_datasets_by_ids(
  datasets = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`datasets`	Numerical dataset identifiers or dataset short names. If not specified, all datasets will be returned instead
`filter`	Filter results by matching expression. Use the `filter_properties` function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").
`taxa`	A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for `taxon.commonName` property
`uris`	A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and equivalent to filtering for `allCharacteristics.valueUri`
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`sort`	Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' sign indicates descending order.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the queried dataset(s). A list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID
experiment.name: Full title of the dataset
experiment.ID: Internal ID of the dataset.
experiment.description: Description of the dataset
experiment.troubled: Whether an automatic process within Gemma or a curator marked the dataset as "troubled"
experiment.accession: Accession ID of the dataset in the external database it was taken from
experiment.database: The name of the database where the dataset was taken from
experiment.URI: URI of the original database
experiment.sampleCount: Number of samples in the dataset
experiment.batchEffectText: A text field describing whether the dataset has batch effects
experiment.batchCorrected: Whether batch correction has been performed on the dataset.
experiment.batchConfound: 0 if batch info isn't available, -1 if a batch confound is detected, 1 if batch information is available and no batch confound is found
experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.
experiment.rawData: -1 if no raw data available, 1 if raw data was available. When available, Gemma reprocesses raw data to get expression values and batches
geeq.qScore: Data quality score given to the dataset by Gemma.
geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design
taxon.name: Name of the species
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_datasets_by_ids("GSE2018")
get_datasets_by_ids(c("GSE2018", "GSE2872"))
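
The numeric codes in experiment.batchConfound and experiment.batchEffect can be turned into readable labels before filtering. The following is a minimal sketch on a made-up table; the rows, the label wording and the helper batch_effect_label are illustrative assumptions, not output from Gemma.

```r
# Toy stand-in for a parsed result (values are invented for illustration).
datasets <- data.frame(
  experiment.shortName = c("toy1", "toy2", "toy3"),
  experiment.batchConfound = c(1L, -1L, 0L),
  experiment.batchEffect = c(1L, 0L, -1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map the documented batchEffect codes to labels.
batch_effect_label <- function(code) {
  labels <- c(
    "-1" = "batch effect (p < 0.0001)",
    "0" = "intermediate, unknown or confounded",
    "1" = "no batch effect (p > 0.1)"
  )
  unname(labels[as.character(code)])
}

datasets$batchEffectLabel <- batch_effect_label(datasets$experiment.batchEffect)

# Keep datasets with batch information, no confound and no batch effect.
clean <- datasets[datasets$experiment.batchConfound == 1L &
                    datasets$experiment.batchEffect == 1L, ]
print(clean)
```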

Retrieve differential expression results

Description

Retrieves the differential expression result set(s) associated with the dataset. To get more information about the contrasts in individual resultSets and the annotation terms associated with them, use get_dataset_differential_expression_analyses()

Usage

get_differential_expression_values(
  dataset = NA_character_,
  resultSets = NA_integer_,
  keepNonSpecific = FALSE,
  readableContrasts = FALSE,
  memoised = getOption("gemma.memoised", FALSE)
)

Arguments

`dataset`	A dataset identifier.
`resultSets`	resultSet identifiers. If a dataset is not provided, all result sets will be downloaded. If it is provided it will only be used to ensure all result sets belong to the dataset.
`keepNonSpecific`	logical. FALSE by default. If TRUE, results from probesets that are not specific to the gene will also be returned.
`readableContrasts`	If `FALSE` (default), the returned columns will use internal contrast IDs as names. Details about the contrasts can be accessed using `get_dataset_differential_expression_analyses`. If `TRUE`, IDs will be replaced with human-readable contrast information.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.

Details

In Gemma each result set corresponds to the estimated effects associated with a single factor in the design, and each can have multiple contrasts (for each level compared to baseline). Thus a dataset with a 2x3 factorial design will have two result sets, one of which will have one contrast, and one having two contrasts.

The methodology for differential expression is explained in "Curation of over 10000 transcriptomic studies to enable data reuse". Briefly, differential expression analysis is performed on the dataset based on the annotated experimental design with up to three potentially nested factors. Gemma attempts to automatically assign baseline conditions for each factor. In the absence of a clear control condition, a baseline is arbitrarily selected. A generalized linear model with empirical Bayes shrinkage of t-statistics is fit to the data for each platform element (probe/gene) using an implementation of the limma algorithm. For RNA-seq data, we use weighted regression, applying the voom algorithm to compute weights from the mean–variance relationship of the data. Contrasts of each condition are then computed compared to the selected baseline. In some situations, Gemma will split the data into subsets for analysis. A typical such situation is when a ‘batch’ factor is present and confounded with another factor, the subsets being determined by the levels of the confounding factor.

Value

A list of data tables with differential expression values per result set.

Examples

get_differential_expression_values("GSE2018")
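
The relationship between factors, result sets and contrasts described in Details can be checked with a little arithmetic. This sketch assumes a design with two hypothetical factors, factorA (2 levels) and factorB (3 levels), matching the 2x3 example; the factor names are placeholders.

```r
# One result set per factor; each result set has (levels - 1) contrasts,
# because every non-baseline level is compared to the baseline.
design_levels <- c(factorA = 2L, factorB = 3L)

n_result_sets <- length(design_levels)
contrasts_per_result_set <- design_levels - 1L

print(n_result_sets)
print(contrasts_per_result_set)
```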

Retrieve the differential expression results for a given gene among datasets matching the provided query and filter

Description

Retrieve the differential expression results for a given gene among datasets matching the provided query and filter

Usage

get_gene_differential_expression_values(
  gene,
  query = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  filter = NA_character_,
  threshold = 1,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`gene`	An Ensembl gene identifier (typically starting with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC
`query`	The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").
`taxa`	A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for `taxon.commonName` property
`uris`	A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and equivalent to filtering for `allCharacteristics.valueUri`
`filter`	Filter results by matching expression. Use the `filter_properties` function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").
`threshold`	number
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data.table containing differential expression results. Some relevant information is stripped from this table for speed of execution. Details about the contrasts can be accessed via the get_result_sets function.

The fields of the output data.table are:

result.ID: Result set ID of the differential expression analysis. May represent multiple factors in a single model.
contrast.ID: Id of the specific contrast factor. Together with the result.ID they uniquely represent a given contrast.
experiment.ID: Id of the source experiment
factor.coefficient: Model coefficient calculated for the specific contrast factor
factor.logfc: Log 2 fold change calculated for the specific contrast factor
factor.pvalue: p values calculated for the specific contrast factor

Examples

# get all differential expression results for ENO2
# from datasets marked with the ontology term for brain
head(get_gene_differential_expression_values(2026, uris = "http://purl.obolibrary.org/obo/UBERON_0000955"))
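
Since result.ID and contrast.ID together identify a contrast, a combined key is convenient when filtering the output. The sketch below works on a made-up table that only mimics the documented columns; all values and the contrast.key column are invented for illustration.

```r
# Toy table with the documented column names (values are not real results).
toy <- data.frame(
  result.ID = c(1L, 1L, 2L),
  contrast.ID = c(10L, 11L, 10L),
  experiment.ID = c(100L, 100L, 200L),
  factor.logfc = c(1.2, -0.1, -2.0),
  factor.pvalue = c(0.001, 0.6, 0.02)
)

# result.ID and contrast.ID together identify a contrast uniquely.
toy$contrast.key <- paste(toy$result.ID, toy$contrast.ID, sep = "_")

# Keep contrasts below an arbitrary p value cutoff, most significant first.
significant <- toy[toy$factor.pvalue < 0.05, ]
significant <- significant[order(significant$factor.pvalue), ]
print(significant)
```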

Retrieve the GO terms associated to a gene

Description

Retrieve the GO terms associated to a gene

Usage

get_gene_go_terms(
  gene,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`gene`	An Ensembl gene identifier (typically starting with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the GO terms assigned to the queried gene. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

term.name: Name of the term
term.ID: ID of the term
term.URI: URI of the term

Examples

get_gene_go_terms(3091)
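
The defaults in Usage are resolved through getOption, so session-wide behaviour can be changed with options. The sketch below uses only base R and the gemma.memoised option name from the Usage section; it does not call the Gemma API.

```r
# Usage defaults such as getOption("gemma.memoised", FALSE) fall back to
# FALSE unless the option has been set in the session.
default_value <- getOption("gemma.memoised", FALSE)

old <- options(gemma.memoised = TRUE)   # request caching for later calls
enabled_value <- getOption("gemma.memoised", FALSE)

options(old)                            # restore the previous setting
restored_value <- getOption("gemma.memoised", FALSE)

print(c(default_value, enabled_value, restored_value))
```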

Retrieve the physical locations of a given gene

Description

Retrieve the physical locations of a given gene

Usage

get_gene_locations(
  gene,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`gene`	An Ensembl gene identifier (typically starting with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the physical location of the queried gene. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

chromosome: Name of the chromosome the gene is located on
strand: Strand the gene is located on
nucleotide: Nucleotide number for the gene
length: Gene length
taxon.name: Name of the taxon
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal ID for the taxon given by Gemma
taxon.NCBI: NCBI ID for the taxon
taxon.database.name: Name of the database used in Gemma for the taxon

Examples

get_gene_locations("DYRK1A")
get_gene_locations(1859)

Retrieve the probes associated with a gene across all platforms

Description

Retrieve the probes associated with a gene across all platforms

Usage

get_gene_probes(
  gene,
  offset = 0L,
  limit = 20L,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`gene`	An Ensembl gene identifier (typically starting with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the probes representing a gene across all platforms. A list if raw = TRUE. A 404 error if the given identifier does not map to any genes.

The fields of the output data.table are:

element.name: Name of the element. Typically the probeset name
element.description: A free text field providing optional information about the element
platform.shortName: Shortname of the platform given by Gemma. Typically the GPL identifier.
platform.name: Full name of the platform
platform.ID: Id number of the platform given by Gemma
platform.type: Type of the platform.
platform.description: Free text field describing the platform.
platform.troubled: Whether the platform is marked as troubled by a Gemma curator.
taxon.name: Name of the species platform was made for
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_gene_probes(1859)
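
Because the output spans all platforms, a common follow-up is to drop troubled platforms and count probes per platform. This is a sketch on an invented table that only reuses the documented column names; the probe and platform names are placeholders.

```r
# Toy table mimicking a few documented columns (values are invented).
probes <- data.frame(
  element.name = c("probe_a", "probe_b", "probe_c"),
  platform.shortName = c("GPL_X", "GPL_X", "GPL_Y"),
  platform.troubled = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Drop probes from troubled platforms, then count probes per platform.
usable <- probes[!probes$platform.troubled, ]
probes_per_platform <- table(usable$platform.shortName)
print(probes_per_platform)
```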

Retrieve genes matching gene identifiers

Description

Retrieve genes matching gene identifiers

Usage

get_genes(
  genes,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`genes`	A vector of NCBI IDs, Ensembl IDs or gene symbols.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the queried gene(s). A list if raw = TRUE.

The fields of the output data.table are:

gene.symbol: Symbol for the gene
gene.ensembl: Ensembl ID for the gene
gene.NCBI: NCBI id for the gene
gene.name: Name of the gene
gene.aliases: Gene aliases. Each row includes a vector
gene.MFX.rank: Multifunctionality rank for the gene
taxon.name: Name of the species
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_genes("DYRK1A")
get_genes(c("DYRK1A", "PTEN"))
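
Since each row of gene.aliases holds a vector, the column behaves as a list column and usually needs flattening before export. The sketch below uses a made-up table with placeholder aliases rather than real Gemma output.

```r
# Toy table with a list column, mimicking gene.aliases (placeholder values).
genes <- data.frame(
  gene.symbol = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)
genes$gene.aliases <- list(c("alias_a", "alias_b"), "alias_c")

# Collapse each alias vector into a single "|"-separated string.
genes$gene.aliases.flat <- vapply(
  genes$gene.aliases, paste, character(1), collapse = "|"
)
print(genes$gene.aliases.flat)
```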

Retrieve Platform Annotations by Gemma

Description

Gets Gemma's platform annotations including mappings of microarray probes to genes.

Usage

get_platform_annotations(
  platform,
  annotType = c("noParents", "allParents", "bioProcess"),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE),
  memoised = getOption("gemma.memoise", FALSE),
  unzip = FALSE
)

Arguments

`platform`	A platform numerical identifier or platform short name.
`annotType`	Which GO terms should the output include
`file`	Where to save the annotation file to, or empty to just load into memory
`overwrite`	Whether or not to overwrite an existing file
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`unzip`	Whether or not to unzip the file (if `file` is not empty)

Value

A table of annotations

ProbeName: Probeset names provided by the platform. Gene symbols for generic annotations
GeneSymbols: Genes that were found to be aligned to the probe sequence. Note that it is possible for probes to be non-specific. Alignments to multiple genes are indicated with gene symbols separated by "|"s
GeneNames: Name of the gene
GOTerms: GO Terms associated with the genes. annotType argument can be used to choose which terms should be included.
GemmaIDs and NCBIids: respective IDs for the genes.

Examples

head(get_platform_annotations("GPL96"))
head(get_platform_annotations('Generic_human_ncbiIds'))
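
Non-specific probes can be identified by splitting GeneSymbols on the "|" separator. The sketch below runs on an invented two-row table with placeholder probe and gene names; n_genes is a column added for illustration.

```r
# Toy annotation table (placeholder values, not real platform annotations).
annot <- data.frame(
  ProbeName = c("probe_1", "probe_2"),
  GeneSymbols = c("GENE1", "GENE2|GENE3"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" literally instead of as a regex alternation.
symbols <- strsplit(annot$GeneSymbols, "|", fixed = TRUE)
annot$n_genes <- lengths(symbols)

# Probes aligned to exactly one gene.
specific <- annot[annot$n_genes == 1L, ]
print(specific)
```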

Retrieve all experiments using a given platform

Description

Retrieve all experiments using a given platform

Usage

get_platform_datasets(
  platform,
  offset = 0L,
  limit = 20L,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`platform`	A platform numerical identifier or a platform short name
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the queried dataset(s). A list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID
experiment.name: Full title of the dataset
experiment.ID: Internal ID of the dataset.
experiment.description: Description of the dataset
experiment.troubled: Whether an automatic process within Gemma or a curator marked the dataset as "troubled"
experiment.accession: Accession ID of the dataset in the external database it was taken from
experiment.database: The name of the database where the dataset was taken from
experiment.URI: URI of the original database
experiment.sampleCount: Number of samples in the dataset
experiment.batchEffectText: A text field describing whether the dataset has batch effects
experiment.batchCorrected: Whether batch correction has been performed on the dataset.
experiment.batchConfound: 0 if batch info isn't available, -1 if a batch confound is detected, 1 if batch information is available and no batch confound is found
experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.
experiment.rawData: -1 if no raw data available, 1 if raw data was available. When available, Gemma reprocesses raw data to get expression values and batches
geeq.qScore: Data quality score given to the dataset by Gemma.
geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design
taxon.name: Name of the species
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

head(get_platform_datasets("GPL1355"))
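
Because limit is capped at 100, retrieving every dataset for a platform means paging with offset until totalElements is covered. The sketch below shows one possible way to do that; page_offsets and fetch_all are hypothetical helpers written for this illustration, and fake_page stands in for a real API call so the block runs offline.

```r
# Offsets needed to cover `total` results in pages of size `limit`.
page_offsets <- function(total, limit = 100L) {
  if (total <= 0) return(integer(0))
  seq.int(0L, total - 1L, by = limit)
}

# Generic pager: `fetch_page(offset, limit)` must return a data.frame-like
# object carrying a `totalElements` attribute, as described for `limit`.
fetch_all <- function(fetch_page, limit = 100L) {
  first <- fetch_page(offset = 0L, limit = limit)
  total <- attr(first, "totalElements")
  if (is.null(total)) return(first)
  offsets <- page_offsets(total, limit)[-1]
  rest <- lapply(offsets, function(o) fetch_page(offset = o, limit = limit))
  do.call(rbind, c(list(first), rest))
}

# Offline stand-in for an API call: 250 fake rows served page by page.
fake_page <- function(offset, limit) {
  ids <- seq_len(250)
  out <- data.frame(id = ids[ids > offset & ids <= offset + limit])
  attr(out, "totalElements") <- 250
  out
}

all_rows <- fetch_all(fake_page, limit = 100L)
print(nrow(all_rows))
```

With gemma.R loaded, the same pager could presumably be used as fetch_all(function(offset, limit) get_platform_datasets("GPL1355", offset = offset, limit = limit)).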

Retrieve the genes associated to a probe in a given platform

Description

Retrieve the genes associated to a probe in a given platform

Usage

get_platform_element_genes(
  platform,
  probe,
  offset = 0L,
  limit = 20L,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`platform`	A platform numerical identifier or a platform short name
`probe`	A probe name or its numerical identifier
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

A data table with information about the queried gene(s). A list if raw = TRUE.

The fields of the output data.table are:

gene.symbol: Symbol for the gene
gene.ensembl: Ensembl ID for the gene
gene.NCBI: NCBI id for the gene
gene.name: Name of the gene
gene.aliases: Gene aliases. Each row includes a vector
gene.MFX.rank: Multifunctionality rank for the gene
taxon.name: Name of the species
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_platform_element_genes("GPL1355", "AFFX_Rat_beta-actin_M_at")

Retrieve all platforms matching a set of platform identifiers

Description

Retrieve all platforms matching a set of platform identifiers

Usage

get_platforms_by_ids(
  platforms = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`platforms`	Platform numerical identifiers or platform short names. If not specified, all platforms will be returned instead
`filter`	Filter results by matching expression. Use the `filter_properties` function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").
`taxa`	A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for `taxon.commonName` property
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to the specified number of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`sort`	Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' sign indicates descending order.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw response from the API endpoint, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.
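
The offset and limit pairing described above can be sketched as follows. page_offsets is a hypothetical helper, not part of gemma.R; it only computes the offsets needed to cover a given totalElements count at a given page size. The package's own get_all_pages function, referenced elsewhere in this manual, is the supported way to collect every page.

```r
# Hypothetical helper: offsets needed to cover `total_elements` results
page_offsets <- function(total_elements, limit = 100L) {
  # The manual states a maximum page size of 100
  stopifnot(limit >= 1, limit <= 100, total_elements >= 0)
  if (total_elements == 0) {
    return(integer(0))
  }
  seq.int(0L, as.integer(total_elements) - 1L, by = as.integer(limit))
}

# For a query reporting 250 total elements, three pages are needed
offsets <- page_offsets(250L, limit = 100L)
```

Each offset would then be passed on in a call such as get_platforms_by_ids(offset = o, limit = 100L), and the pages combined afterwards.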

Value

A data table with information about the platform(s). A list if raw = TRUE. A 404 error if the given identifier does not map to any object

The fields of the output data.table are:

platform.ID: Internal identifier of the platform
platform.shortName: Shortname of the platform.
platform.name: Full name of the platform.
platform.description: Free text description of the platform
platform.troubled: Whether or not the platform was marked "troubled" by a Gemma process or a curator
platform.experimentCount: Number of experiments using the platform within Gemma
platform.type: Technology type for the platform.
taxon.name: Name of the species platform was made for
taxon.scientific: Scientific name for the taxon
taxon.ID: Internal identifier given to the species by Gemma
taxon.NCBI: NCBI ID of the taxon
taxon.database.name: Underlying database used in Gemma for the taxon
taxon.database.ID: ID of the underlying database used in Gemma for the taxon

Examples

get_platforms_by_ids("GPL1355")
get_platforms_by_ids(c("GPL1355", "GPL96"))

Retrieve all result sets matching the provided criteria

Description

Returns queried result set

Usage

get_result_sets(
  datasets = NA_character_,
  resultSets = NA_character_,
  filter = NA_character_,
  offset = 0,
  limit = 20,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`datasets`	A vector of dataset IDs or short names
`resultSets`	A resultSet identifier. Note that result set identifiers are not static and can change when Gemma re-runs analyses internally. When using these as inputs, try to make sure you access a currently existing result set ID by basing them on result sets returned for a particular dataset or filter used in `get_result_sets`.
`filter`	Filter results by matching expression. Use the `filter_properties` function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").
`offset`	The offset of the first retrieved result.
`limit`	Defaults to 20. Limits the result to the specified number of objects. Has a maximum value of 100. Use together with `offset` and the `totalElements` attribute in the output to compile all data if needed.
`sort`	Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' sign indicates descending order.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw response from the API endpoint, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.
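
The filter syntax described for the `filter` argument can be assembled programmatically. The sketch below uses only base R string functions; in_filter is a hypothetical helper, not part of gemma.R, and the property names are the ones used in the examples above.

```r
# Hypothetical helper: build an "in" clause for a filter expression
in_filter <- function(property, values) {
  paste0(property, " in (", paste(values, collapse = ","), ")")
}

taxon_clause <- in_filter("taxon.commonName", c("human", "mouse"))

# Clauses are joined with "and" / "or"
combined <- paste(taxon_clause, "and", "id < 1000")
```

The resulting string would be passed as the `filter` argument. Whether a given property is accepted by a particular function should be checked against the output of filter_properties.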

Details

The output and usage of this function are mostly identical to get_dataset_differential_expression_analyses. The principal differences are the ability to restrict your result sets, to query across multiple datasets, and to use the filter argument to search based on result set properties.

Value

A data table with information about the queried result sets. Note that this function does not return differential expression values themselves. Use get_differential_expression_values to get differential expression values.

result.ID: Result set ID of the differential expression analysis. May represent multiple factors in a single model.
contrast.ID: ID of the specific contrast factor. Together with the result.ID, it uniquely identifies a given contrast.
experiment.ID: ID of the source experiment
factor.category: Category for the contrast
factor.category.URI: URI for the contrast category
factor.ID: ID of the factor
baseline.factors: Characteristics of the baseline. This field is a data.table
experimental.factors: Characteristics of the experimental group. This field is a data.table
isSubset: TRUE if the result set belongs to a subset, FALSE if not. Subsets are created when performing differential expression to avoid unhelpful comparisons.
subsetFactor: Characteristics of the subset. This field is a data.table

Examples

get_result_sets(datasets = 1)
# get all contrasts comparing disease states. use filter_properties to see available options
get_result_sets(filter = "baselineGroup.characteristics.value = disease")
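
The relationship between result.ID and contrast.ID described under Value can be illustrated with a small stand-in table. The IDs below are made up for illustration and do not come from Gemma; only the column names are taken from the field list above.

```r
# Toy stand-in for a parsed get_result_sets() output; the IDs are invented
result_sets <- data.frame(
  result.ID   = c(101L, 101L, 102L),
  contrast.ID = c(7L, 8L, 7L)
)

# One result set can hold several contrasts, so result.ID alone repeats
result_ids <- unique(result_sets$result.ID)

# The (result.ID, contrast.ID) pair identifies a single row
pairs_unique <- !any(duplicated(result_sets[, c("result.ID", "contrast.ID")]))
```

The unique result set IDs collected this way are what one would hand to get_differential_expression_values; the exact argument that function expects is not shown in this section and should be checked in its own help page.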

Get taxa

Description

Returns taxa and their versions used in Gemma

Usage

get_taxa(memoised = getOption("gemma.memoised", FALSE))

Arguments

`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.

Value

A data frame including the names, IDs and database information about the taxa

Examples

get_taxa()

Make simplified design frames

Description

Using the output of get_dataset_samples, this function creates a simplified design table, granting one column to each experimental variable

Usage

make_design(samples, metaType = "text")

Arguments

`samples`	An output from get_dataset_samples. The output should not be raw
`metaType`	Type of metadata to include in the output. "text", "uri" or "both"

Value

A data.frame including the design table for the dataset

Examples

samples <- get_dataset_samples('GSE46416') 
make_design(samples)

Search for annotation tags

Description

Search for annotation tags

Usage

search_annotations(
  query,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`query`	The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw response from the API endpoint, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.
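
The query operators listed for the `query` argument can be combined with plain string operations. The sketch below rebuilds the grouping example from the argument description; any_of is a hypothetical helper, not part of gemma.R.

```r
# Hypothetical helper: join terms into a parenthesised disjunction
any_of <- function(terms) {
  paste0("(", paste(terms, collapse = " OR "), ")")
}

# Reproduces the grouping example from the argument description
query <- paste(any_of(c("alpha", "beta")), "AND", "gamma")
```

The resulting string would be passed as the `query` argument of search_annotations or search_gemma.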

Value

A data table with annotations (annotation search result value objects) matching the given query. A list if raw = TRUE. A 400 error if required parameters are missing.

The fields of the output data.table are:

category.name: Category that the annotation belongs to
category.URI: URI for the category.name
value.name: Annotation term
value.URI: URI for the value.name

Examples

search_annotations("traumatic")

Search everything in Gemma

Description

Search everything in Gemma

Usage

search_gemma(
  query,
  taxon = NA_character_,
  platform = NA_character_,
  limit = 100,
  resultType = "experiment",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)
)

Arguments

`query`	The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").
`taxon`	A numerical taxon identifier or an NCBI taxon identifier or a taxon identifier that matches either its scientific or common name
`platform`	A platform numerical identifier or a platform short name
`limit`	Defaults to 100 with a maximum value of 2000. Limits the number of returned results. Note that this function does not support pagination.
`resultType`	The kind of results that should be included in the output. Can be experiment, gene, platform or a long object type name, documented in the API documentation.
`raw`	`TRUE` to receive results as-is from Gemma, or `FALSE` to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.
`memoised`	Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing `options(gemma.memoised = TRUE)` will ensure that the cache is always used. Use `forget_gemma_memoised` to clear the cache.
`file`	The name of a file to save the results to, or `NULL` to not write results to a file. If `raw == TRUE`, the output will be the raw response from the API endpoint, likely a JSON or a gzip file. Otherwise, it will be an RDS file.
`overwrite`	Whether or not to overwrite if a file exists at the specified filename.

Value

If raw = FALSE and resultType is experiment, gene or platform, a data.table containing the search results. If it is any other type, a list of results. A list with additional details about the search if raw = TRUE

Examples

search_gemma("bipolar")

Authentication by user name

Description

Allows the user to access information that requires logging in to Gemma. To log out, run set_gemma_user without specifying the username or password.

Usage

set_gemma_user(username = NULL, password = NULL)

Arguments

`username`	Your username (or empty, if logging out)
`password`	Your password (or empty, if logging out)

Value

TRUE if authentication is successful, FALSE if not
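
One way to avoid writing credentials into scripts is to read them from environment variables before calling set_gemma_user. This is a sketch under assumptions: the variable names GEMMA_USERNAME and GEMMA_PASSWORD are illustrative choices for this example, and the call is skipped when the variables or the package are unavailable.

```r
# Illustrative variable names; define them in ~/.Renviron, not in scripts
username <- Sys.getenv("GEMMA_USERNAME", unset = NA)
password <- Sys.getenv("GEMMA_PASSWORD", unset = NA)

have_credentials <- !is.na(username) && !is.na(password)

# Only attempt to log in when both values and the package are available
if (have_credentials && requireNamespace("gemma.R", quietly = TRUE)) {
  gemma.R::set_gemma_user(username = username, password = password)
}
```

Calling set_gemma_user() with no arguments afterwards logs out, as noted in the Description.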

Update result

Description

Re-runs the function used to create a gemma.R output to update the data at hand. Useful if you have reason to believe parts of the data have changed since you last accessed them and you wish to update them without re-running the original code that generated the data.

Usage

update_result(query)

Arguments

`query`	Output from a gemma.R function

Details

Note that if you used the file and overwrite arguments in the original call, this will also regenerate the file according to your initial preferences.

Examples

annots <- get_dataset_annotations(1)
# wait for a couple of years..
# wonder if the results are the same
updated_annots <- update_result(annots)

# also works with outputs of get_all_pages
platforms <- get_all_pages(get_platforms_by_ids())
updated_platforms <- update_result(platforms)
