Package 'gemma.R'

Title: A wrapper for Gemma's RESTful API to access curated gene expression data and differential expression analyses
Description: Low- and high-level wrappers for Gemma's RESTful API. They enable access to curated expression and differential expression data from over 10,000 published studies. Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles.
Authors: Javier Castillo-Arnemann [aut], Jordan Sicherman [aut], Ogan Mancarci [cre, aut], Guillaume Poirier-Morency [aut]
Maintainer: Ogan Mancarci <[email protected]>
License: Apache License (>= 2)
Version: 3.1.9
Built: 2024-07-17 19:38:41 UTC

Help Index

Return all supported filter properties


Some functions such as get_datasets and get_platforms_by_ids include a filter argument that allows creation of more complex queries. This function returns a list of supported properties to be used in those filters.




A list of data.tables that contain supported properties and their data types



Clear gemma.R cache


Forget past results from memoised calls to the Gemma API (i.e. calls made using functions with memoised = TRUE).




TRUE to indicate cache was cleared.



Custom gemma call


A minimal function to create custom calls. Can be used to acquire unimplemented endpoints and/or raw output without any processing. Refer to the API documentation.


gemma_call(call, ..., json = TRUE)



Gemma API endpoint.


parameters included in the call


If TRUE will parse the content as a list


A list if json = TRUE and an httr response if FALSE


# get singular value decomposition for the dataset
gemma_call('datasets/{dataset}/svd', dataset = 1)

Create printable tables out of gemma.R outputs


Creates a kable where certain columns are automatically shortened to better fit a document.





A data.table or data.frame outputted by a gemma.R function
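
As a rough illustration of what shortening columns for a printed table can look like, here is a minimal base R sketch. It is not the package's implementation: `shorten_columns` and `max_chars` are hypothetical names, and the choice to truncate every long character column is an assumption.

```r
# Hypothetical helper: truncate long character values so a table fits a page.
shorten_columns <- function(df, max_chars = 20) {
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      too_long <- !is.na(df[[col]]) & nchar(df[[col]]) > max_chars
      if (any(too_long)) {
        # Keep the first max_chars - 3 characters and mark the cut with "..."
        df[[col]][too_long] <- paste0(substr(df[[col]][too_long], 1, max_chars - 3), "...")
      }
    }
  }
  df
}

# Toy input mimicking two columns of a dataset table
toy <- data.frame(
  experiment.shortName = "GSE2018",
  experiment.description = "A long free-text description that would overflow a printed table",
  stringsAsFactors = FALSE
)
short <- shorten_columns(toy)
```

Short values such as the dataset short name are left untouched, so identifiers stay usable after shortening.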

Enable and disable memoisation of gemma.R functions


Enable and disable memoisation of gemma.R functions


  memoised = FALSE,
  cache = rappdirs::user_cache_dir(appname = "gemmaR")



boolean. If TRUE memoisation will be enabled


File path or "cache_in_memory". A file path sets the location where the cache files for memoisation are saved. "cache_in_memory" stores the cache in the current R session.
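
To make the idea of memoisation concrete, the following is a minimal base R sketch rather than the package's own mechanism. `slow_lookup` and `memoise_in_memory` are hypothetical names, and the cache lives only in the current session, loosely analogous to the "cache_in_memory" option.

```r
# Hypothetical stand-in for an API call; counts how often it really runs.
call_count <- 0
slow_lookup <- function(dataset) {
  call_count <<- call_count + 1
  paste("result for", dataset)
}

# Wrap a one-argument function so repeated inputs are served from a cache.
memoise_in_memory <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

cached_lookup <- memoise_in_memory(slow_lookup)
first <- cached_lookup("GSE2018")
second <- cached_lookup("GSE2018")  # served from the cache, no second call
```

The second call returns the stored value without running `slow_lookup` again, which is the behaviour a memoised API wrapper relies on to avoid repeated requests.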

gemma.R package: Access curated gene expression data and differential expression analyses


This package contains wrappers and convenience functions for Gemma's RESTful API that enables access to curated expression and differential expression data from over 15,000 published studies (as of mid-2022). Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of transcriptomics data, currently primarily targeted at the analysis of gene expression profiles.


Most users will want to start with the high-level functions like get_dataset_object, get_differential_expression_values and get_platform_annotations. Additional lower-level methods are available that directly map to the Gemma RESTful API methods.

For more information and detailed usage instructions check the README, the function reference and the vignette.

All software-related questions should be posted to the Bioconductor Support Site:


Javier Castillo-Arnemann, Jordan Sicherman, Ogan Mancarci, Guillaume Poirier-Morency


Lim, N. et al., Curation of over 10 000 transcriptomic studies to enable data reuse, Database, 2021.

See Also

Useful links:

Get all pages of a paginated call


Given a Gemma.R output from a function with offset and limit arguments, returns the output from all pages. All arguments other than offset, limit


  step_size = 100,
  binder = rbind,
  directory = NULL,
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



Output from a gemma.R function with offset and limit argument


Size of individual calls to the server. 100 is the maximum value


Binding function for the calls. If raw = FALSE use rbind to combine the data.tables. If not, use c to combine lists


Directory to save the output from the individual calls to. If provided, each page is saved to separate files.


The name of a file to save the results to, or NULL to not write results to a file. This function always saves the output as an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data.table or a list containing data from all pages.
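
The paging logic described above can be sketched in base R. This is an illustration under stated assumptions, not the package's code: `fetch_page` is a hypothetical stand-in for any function with offset and limit arguments, `fetch_all_pages` is a hypothetical name, and the total number of rows is assumed to be known in advance.

```r
# Hypothetical paginated endpoint: returns rows offset+1 .. offset+limit of a toy table.
all_rows <- data.frame(id = 1:250, name = paste0("dataset_", 1:250))
fetch_page <- function(offset = 0L, limit = 20L) {
  idx <- seq_len(nrow(all_rows))
  keep <- idx > offset & idx <= offset + limit
  all_rows[keep, , drop = FALSE]
}

# Request consecutive pages until the known total is covered, then bind them.
fetch_all_pages <- function(total, step_size = 100L, binder = rbind) {
  offsets <- seq(0L, total - 1L, by = step_size)
  pages <- lapply(offsets, function(o) fetch_page(offset = o, limit = step_size))
  do.call(binder, pages)
}

combined <- fetch_all_pages(total = nrow(all_rows))
```

With 250 rows and a step size of 100, three calls are made (offsets 0, 100 and 200) and the pages are combined with `rbind`; for list outputs, `c` would play the role of the binder.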

Return child terms of a term


When querying for ontology terms, Gemma propagates these terms to include any datasets with their child terms in the results. This function returns these children for any number of terms, including all children and the terms themselves in the output vector.





An array of terms


An array containing descendants of the annotation terms, including the terms themselves
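
The propagation to child terms can be pictured with a toy ontology. The sketch below is a base R illustration only: the edge table, the term labels and the `with_descendants` helper are hypothetical, and real queries use ontology term URIs rather than plain labels.

```r
# Toy ontology as parent -> child edges (hypothetical terms, not real URIs).
edges <- data.frame(
  parent = c("brain", "brain", "cerebral cortex"),
  child  = c("cerebral cortex", "cerebellum", "prefrontal cortex"),
  stringsAsFactors = FALSE
)

# Collect the given terms plus every term reachable through child edges.
with_descendants <- function(terms, edges) {
  found <- unique(terms)
  frontier <- found
  while (length(frontier) > 0) {
    children <- edges$child[edges$parent %in% frontier]
    frontier <- setdiff(children, found)  # only terms not seen before
    found <- c(found, frontier)
  }
  found
}

expanded <- with_descendants("brain", edges)
```

The result contains the queried term itself followed by its direct and indirect children, matching the shape of the output described above.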



Retrieve the annotations of a dataset


Retrieve the annotations of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the annotations of the queried dataset. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

  • Name of the annotation class (e.g. organism part)

  • class.URI: URI for the annotation class

  • Name of the annotation term (e.g. lung)

  • term.URI: URI for the annotation term

  • object.class: Class of object that the term originated from.



Retrieve the design of a dataset


Retrieve the design of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table of the design matrix for the queried dataset. A 404 error if the given identifier does not map to any object.



Retrieve annotations and surface level stats for a dataset's differential analyses


Retrieve annotations and surface level stats for a dataset's differential analyses


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the differential expression analysis of the queried dataset. Note that this function does not return differential expression values themselves. Use get_differential_expression_values to get differential expression values (see examples).

The fields of the output data.table are:

  • result.ID: Result set ID of the differential expression analysis. May represent multiple factors in a single model.

  • contrast.ID: ID of the specific contrast factor. Together with the result.ID, it uniquely represents a given contrast.

  • experiment.ID: ID of the source experiment

  • factor.category: Category for the contrast

  • factor.category.URI: URI for the contrast category

  • factor.ID: ID of the factor

  • baseline.factors: Characteristics of the baseline. This field is a data.table

  • experimental.factors: Characteristics of the experimental group. This field is a data.table

  • isSubset: TRUE if the result set belongs to a subset, FALSE if not. Subsets are created when performing differential expression to avoid unhelpful comparisons.

  • subsetFactor: Characteristics of the subset. This field is a data.table

  • probes.analyzed: Number of probesets represented in the contrast

  • genes.analyzed: Number of genes represented in the contrast


result <- get_dataset_differential_expression_analyses("GSE2872")
get_differential_expression_values(resultSet = result$result.ID[1])

Retrieve the expression data matrix of a set of datasets and genes


Retrieve the expression data matrix of a set of datasets and genes


  keepNonSpecific = FALSE,
  consolidate = NA_character_,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A vector of dataset IDs or short names


A vector of NCBI IDs, Ensembl IDs or gene symbols.


logical. FALSE by default. If TRUE, results from probesets that are not specific to the gene will also be returned.


An option for gene expression level consolidation. If empty, will return every probe for the genes. "pickmax" to pick the probe with the highest expression, "pickvar" to pick the probe with the highest variance and "average" for returning the average expression


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A list of data frames


get_dataset_expression_for_genes("GSE2018", genes = c(10225, 2841))
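
The three consolidation options can be illustrated on a toy matrix. This base R sketch is not the package's or the server's implementation: `consolidate_probes` is a hypothetical helper, the values are made up, and "highest expression" is assumed here to mean the highest mean across samples.

```r
# Toy data: three probes mapping to one gene, measured in four samples.
expr <- matrix(
  c(5, 6, 5, 6,
    9, 9, 9, 9,
    2, 8, 2, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_a", "probe_b", "probe_c"), paste0("s", 1:4))
)

consolidate_probes <- function(m, method = c("pickmax", "pickvar", "average")) {
  method <- match.arg(method)
  switch(method,
    pickmax = m[which.max(rowMeans(m)), ],       # probe with highest mean expression
    pickvar = m[which.max(apply(m, 1, var)), ],  # probe with highest variance
    average = colMeans(m)                        # per-sample mean across probes
  )
}

by_max <- consolidate_probes(expr, "pickmax")
by_var <- consolidate_probes(expr, "pickvar")
by_avg <- consolidate_probes(expr, "average")
```

Here "pickmax" selects probe_b, "pickvar" selects probe_c, and "average" returns one value per sample computed across all three probes.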

Compile gene expression data and metadata


Return an annotated Bioconductor-compatible data structure or a long form tibble of the queried dataset, including expression data and the experimental design.


  genes = NULL,
  keepNonSpecific = FALSE,
  consolidate = NA_character_,
  resultSets = NULL,
  contrasts = NULL,
  metaType = "text",
  type = "se",
  memoised = getOption("gemma.memoised", FALSE)



A vector of dataset IDs or short names


A vector of NCBI IDs, Ensembl IDs or gene symbols.


logical. FALSE by default. If TRUE, results from probesets that are not specific to the gene will also be returned.


An option for gene expression level consolidation. If empty, will return every probe for the genes. "pickmax" to pick the probe with the highest expression, "pickvar" to pick the probe with the highest variance and "average" for returning the average expression


Result set IDs of a differential expression analysis. Optional. If provided, the output will only include the samples from the subset used in the result set ID. Must be the same length as datasets.


Contrast IDs of a differential expression contrast. Optional. Requires resultSets to be defined. If provided, the output will only include samples relevant to the specific contrasts.


How the metadata information should be included. Can be "text", "uri" or "both". "text" and "uri" options


"se" for a SummarizedExperiment or "eset" for an ExpressionSet. We recommend using SummarizedExperiments, which are more recent. See the SummarizedExperiment vignette or the ExpressionSet vignette for more details. "tidy" for a long form data frame compatible with tidyverse functions. "list" to return a list containing individual data frames containing expression values, design and the experiment.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


A list of SummarizedExperiments, ExpressionSets or a tibble containing metadata and expression data for the queried datasets and genes. Metadata will be expanded to include a variable number of factors that annotate samples from a dataset but will always include a single "factorValues" column that houses data.tables that include all annotations for a given sample.



Retrieve the platforms of a dataset


Retrieve the platforms of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the platform(s). A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

  • platform.ID: Internal identifier of the platform

  • platform.shortName: Shortname of the platform.

  • Full name of the platform.

  • platform.description: Free text description of the platform

  • platform.troubled: Whether or not the platform was marked "troubled" by a Gemma process or a curator

  • platform.experimentCount: Number of experiments using the platform within Gemma

  • platform.type: Technology type for the platform.

  • Name of the species platform was made for

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon



Retrieve processed expression data of a dataset


Retrieve processed expression data of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


If raw is FALSE (default), a data table of the expression matrix for the queried dataset. If raw is TRUE, returns the binary file in raw form.



Retrieve quantitation types of a dataset


Retrieve quantitation types of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data.table containing the quantitation types

The fields of the output data.table are:

  • id: ID of the quantitation type. Any raw quantitation type can be accessed by the get_dataset_raw_expression function using this id.

  • name: Name of the quantitation type

  • description: Description of the quantitation type

  • type: Type of the quantitation type. Either raw or processed. Each dataset will have one processed quantitation type which is the data returned using get_dataset_processed_expression

  • ratio: Whether or not the quantitation type is a ratio of multiple quantitation types. Typically TRUE for processed TWOCOLOR quantitation type.

  • preferred: The preferred raw quantitation type. This version is used in generation of the processed data within Gemma.

  • recomputed: If TRUE this quantitation type is generated by recomputing raw data files Gemma had access to.



Retrieve raw expression data of a dataset


Retrieve raw expression data of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


Quantitation type id. These can be acquired using get_dataset_quantitation_types function. This endpoint can only return non-processed quantitation types.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


If raw is FALSE (default), a data table of the expression matrix for the queried dataset. If raw is TRUE, returns the binary file in raw form.


q_types <- get_dataset_quantitation_types("GSE59918")
get_dataset_raw_expression("GSE59918", q_types$id[q_types$name == "Counts"])

Retrieve the samples of a dataset


Retrieve the samples of a dataset


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A numerical dataset identifier or a dataset short name


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the samples of the queried dataset. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

  • Internal name given to the sample.

  • sample.ID: Internal ID of the sample

  • sample.description: Free text description of the sample

  • sample.outlier: Whether or not the sample is marked as an outlier

  • sample.accession: Accession ID of the sample in its original database

  • sample.database: Database of origin for the sample

  • sample.characteristics: Characteristics of the sample. This field is a data table

  • sample.factorValues: Experimental factor values of the sample. This field is a data table



Retrieve all datasets


Retrieve all datasets


  query = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").


Filter results by matching expression. Use the filter_properties function to get a list of all available parameters. These properties can be combined using "and" "or" clauses and may contain common operators such as "=", "<" or "in". (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000")


A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for taxon.commonName property


A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and equivalent to filtering for allCharacteristics.valueUri


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' indicates descending.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the queried dataset(s). A list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

  • experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID

  • Full title of the dataset

  • experiment.ID: Internal ID of the dataset.

  • experiment.description: Description of the dataset

  • experiment.troubled: Whether an automatic process within Gemma or a curator marked the dataset as "troubled"

  • experiment.accession: Accession ID of the dataset in the external database it was taken from

  • experiment.database: The name of the database where the dataset was taken from

  • experiment.URI: URI of the original database

  • experiment.sampleCount: Number of samples in the dataset

  • experiment.batchEffectText: A text field describing whether the dataset has batch effects

  • experiment.batchCorrected: Whether batch correction has been performed on the dataset.

  • experiment.batchConfound: 0 if batch info isn't available, -1 if a batch confound is detected, 1 if batch information is available and no batch confound is found

  • experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.

  • experiment.rawData: -1 if no raw data available, 1 if raw data was available. When available, Gemma reprocesses raw data to get expression values and batches

  • geeq.qScore: Data quality score given to the dataset by Gemma.

  • geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design

  • Name of the species

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon
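
The experiment.batchEffect coding listed above can be written out as a small function. This is a sketch of the rule as stated in the field description, not code taken from Gemma: `code_batch_effect` and its `usable` argument are hypothetical names.

```r
# Sketch of the experiment.batchEffect coding described above.
# p_value: batch p value; usable: FALSE when batch information is missing
# or the data are confounded with batches.
code_batch_effect <- function(p_value, usable = TRUE) {
  if (!usable || is.na(p_value)) return(0L)
  if (p_value < 0.0001) return(-1L)  # strong evidence of a batch effect
  if (p_value > 0.1) return(1L)      # no evidence of a batch effect
  0L                                 # everything in between
}

codes <- vapply(c(0.00001, 0.05, 0.5), code_batch_effect, integer(1))
```

A value of 0 is therefore ambiguous on its own and is best read together with experiment.batchConfound and experiment.batchEffectText.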


get_datasets(taxa = c("mouse", "human"), uris = "")
# filter below is equivalent to the call above
get_datasets(filter = "taxon.commonName in (mouse,human) and allCharacteristics.valueUri =")
get_datasets(query = "lung")
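
Since the taxa argument is described as equivalent to a taxon.commonName filter, the translation can be sketched in base R. The helpers below are hypothetical and only build the filter string; they are not functions of this package and do not contact the server.

```r
# Hypothetical helper: turn a vector of taxon common names into a filter clause.
taxa_to_filter <- function(taxa) {
  if (length(taxa) == 1) {
    paste0("taxon.commonName = ", taxa)
  } else {
    paste0("taxon.commonName in (", paste(taxa, collapse = ","), ")")
  }
}

# Combine clauses with "and", skipping missing or empty ones.
combine_filters <- function(...) {
  clauses <- c(...)
  clauses <- clauses[!is.na(clauses) & nzchar(clauses)]
  paste(clauses, collapse = " and ")
}

filter_string <- combine_filters(taxa_to_filter(c("mouse", "human")), "id < 1000")
```

The resulting string has the same shape as the filter examples given for the filter argument and could be passed as `filter` in place of `taxa`.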

Retrieve datasets by their identifiers


Retrieve datasets by their identifiers


  datasets = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



Numerical dataset identifiers or dataset short names. If not specified, all datasets will be returned instead


Filter results by matching expression. Use the filter_properties function to get a list of all available parameters. These properties can be combined using "and" "or" clauses and may contain common operators such as "=", "<" or "in". (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000")


A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for taxon.commonName property


A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and equivalent to filtering for allCharacteristics.valueUri


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


Order results by the given property and direction. The '+' sign indicates ascending order whereas the '-' indicates descending.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw endpoint from the API, likely a JSON or a gzip file. Otherwise, it will be a RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the queried dataset(s). A list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

  • experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID

  • Full title of the dataset

  • experiment.ID: Internal ID of the dataset.

  • experiment.description: Description of the dataset

  • experiment.troubled: Whether an automatic process within Gemma or a curator marked the dataset as "troubled"

  • experiment.accession: Accession ID of the dataset in the external database it was taken from

  • experiment.database: The name of the database where the dataset was taken from

  • experiment.URI: URI of the original database

  • experiment.sampleCount: Number of samples in the dataset

  • experiment.batchEffectText: A text field describing whether the dataset has batch effects

  • experiment.batchCorrected: Whether batch correction has been performed on the dataset.

  • experiment.batchConfound: 0 if batch info isn't available, -1 if a batch confound is detected, 1 if batch information is available and no batch confound is found

  • experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.

  • experiment.rawData: -1 if no raw data available, 1 if raw data was available. When available, Gemma reprocesses raw data to get expression values and batches

  • geeq.qScore: Data quality score given to the dataset by Gemma.

  • geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design

  • Name of the species

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon


get_datasets_by_ids(c("GSE2018", "GSE2872"))

Retrieve differential expression results


Retrieves the differential expression result set(s) associated with the dataset. To get more information about the contrasts in individual resultSets and the annotation terms associated with them, use get_dataset_differential_expression_analyses().


  dataset = NA_character_,
  resultSets = NA_integer_,
  keepNonSpecific = FALSE,
  readableContrasts = FALSE,
  memoised = getOption("gemma.memoised", FALSE)



A dataset identifier.


resultSet identifiers. If a dataset is not provided, all result sets will be downloaded. If it is provided it will only be used to ensure all result sets belong to the dataset.


logical. FALSE by default. If TRUE, results from probesets that are not specific to the gene will also be returned.


If FALSE (default), the returned columns will use internal contrast IDs as names. Details about the contrasts can be accessed using get_dataset_differential_expression_analyses. If TRUE, IDs will be replaced with human-readable contrast information.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


In Gemma each result set corresponds to the estimated effects associated with a single factor in the design, and each can have multiple contrasts (for each level compared to baseline). Thus a dataset with a 2x3 factorial design will have two result sets, one of which will have one contrast, and one having two contrasts.
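The relationship between factors, result sets and contrasts can be checked with a little arithmetic. In this minimal sketch, contrasts_per_factor is a hypothetical helper (not part of gemma.R) that assumes one result set per factor and one contrast per non-baseline level, as described above.

```r
# One contrast for each level of a factor other than its baseline
contrasts_per_factor <- function(levels_per_factor) {
  stopifnot(all(levels_per_factor >= 2))
  levels_per_factor - 1L
}

# A 2x3 factorial design: one factor with 2 levels, one with 3
design <- c(factorA = 2L, factorB = 3L)
n_result_sets <- length(design)
n_contrasts <- contrasts_per_factor(design)
```

Here n_result_sets is 2 and n_contrasts holds 1 and 2, matching the description of the 2x3 design.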

The methodology for differential expression is explained in Curation of over 10000 transcriptomic studies to enable data reuse. Briefly, differential expression analysis is performed on the dataset based on the annotated experimental design with up to three potentially nested factors. Gemma attempts to automatically assign baseline conditions for each factor. In the absence of a clear control condition, a baseline is arbitrarily selected. A generalized linear model with empirical Bayes shrinkage of t-statistics is fit to the data for each platform element (probe/gene) using an implementation of the limma algorithm. For RNA-seq data, we use weighted regression, applying the voom algorithm to compute weights from the mean–variance relationship of the data. Contrasts of each condition are then computed compared to the selected baseline. In some situations, Gemma will split the data into subsets for analysis. A typical such situation is when a ‘batch’ factor is present and confounded with another factor, the subsets being determined by the levels of the confounding factor.


A list of data tables with differential expression values per result set.



Retrieve the differential expression results for a given gene among datasets matching the provided query and filter


Retrieve the differential expression results for a given gene among datasets matching the provided query and filter


  query = NA_character_,
  taxa = NA_character_,
  uris = NA_character_,
  filter = NA_character_,
  threshold = 1,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



An Ensembl gene identifier (which typically starts with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC


The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").


A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for taxon.commonName property


A vector of ontology term URIs. Providing multiple terms will return results containing any of the terms and their children. These are appended to the filter and equivalent to filtering for allCharacteristics.valueUri


Filter results by matching expression. Use the filter_properties function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").




TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data.table containing differential expression results. This table is stripped of some relevant information for speed of execution. Details about the contrasts can be accessed via the get_result_sets function.

The fields of the output data.table are:

  • result.ID: Result set ID of the differential expression analysis. May represent multiple factors in a single model.

  • contrast.ID: Id of the specific contrast factor. Together with the result.ID they uniquely represent a given contrast.

  • experiment.ID: Id of the source experiment

  • factor.coefficient: Model coefficient calculated for the specific contrast factor

  • factor.logfc: Log 2 fold change calculated for the specific contrast factor

  • factor.pvalue: p values calculated for the specific contrast factor


# get all differential expression results for ENO2
# from datasets marked with the ontology term for brain
head(get_gene_differential_expression_values(2026, uris = ""))
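Because result.ID and contrast.ID together identify a contrast, a combined key is convenient when filtering the output. The sketch below works on a mock table with the documented column names (the values are invented for illustration, not real results) and keeps rows under an arbitrary p value cutoff of 0.05.

```r
# Mock output with the documented fields; values are illustrative only
de <- data.frame(
  result.ID = c(101, 101, 202),
  contrast.ID = c(1, 2, 1),
  experiment.ID = c(10, 10, 20),
  factor.logfc = c(1.5, -0.2, 0.8),
  factor.pvalue = c(0.001, 0.4, 0.03)
)

# result.ID and contrast.ID together uniquely represent a contrast
de$contrast.key <- paste(de$result.ID, de$contrast.ID, sep = "_")

# Keep contrasts below the chosen p value cutoff
significant <- de[de$factor.pvalue < 0.05, ]
```

The contrast.key column can then be matched against the result.ID and contrast.ID columns returned by get_result_sets to look up contrast details.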

Retrieve the GO terms associated to a gene


Retrieve the GO terms associated to a gene


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



An Ensembl gene identifier (which typically starts with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the GO terms assigned to the queried gene. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

  • Name of the term

  • term.ID: ID of the term

  • term.URI: URI of the term



Retrieve the physical locations of a given gene


Retrieve the physical locations of a given gene


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



An Ensembl gene identifier (which typically starts with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the physical location of the queried gene. A list if raw = TRUE. A 404 error if the given identifier does not map to any object.

The fields of the output data.table are:

  • chromosome: Name of the chromosome the gene is located on

  • strand: Which strand the gene is located on

  • nucleotide: Nucleotide number for the gene

  • length: Gene length

  • Name of the taxon

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal ID for the taxon given by Gemma

  • taxon.NCBI: NCBI ID for the taxon

  • Name of the database used in Gemma for the taxon



Retrieve the probes associated with a gene across all platforms


Retrieve the probes associated with a gene across all platforms


  offset = 0L,
  limit = 20L,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



An Ensembl gene identifier (which typically starts with ENSG), an NCBI gene identifier, or an official gene symbol approved by HGNC


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the probes representing a gene across all platforms. A list if raw = TRUE. A 404 error if the given identifier does not map to any genes.

The fields of the output data.table are:

  • Name of the element. Typically the probeset name

  • element.description: A free text field providing optional information about the element

  • platform.shortName: Shortname of the platform given by Gemma. Typically the GPL identifier.

  • Full name of the platform

  • platform.ID: Id number of the platform given by Gemma

  • platform.type: Type of the platform.

  • platform.description: Free text field describing the platform.

  • platform.troubled: Whether the platform is marked as troubled by a Gemma curator.

  • Name of the species platform was made for

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon
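The offset, limit and totalElements description above amounts to a simple loop. This is a minimal sketch under stated assumptions: collect_pages and mock_fetch are hypothetical helpers, not part of gemma.R, and the mock stands in for a real paginated call so the example runs offline. It assumes each page is a data frame carrying a totalElements attribute.

```r
# Repeatedly request pages until totalElements rows have been collected
collect_pages <- function(fetch_page, limit = 100L) {
  pages <- list()
  offset <- 0L
  repeat {
    page <- fetch_page(offset = offset, limit = limit)
    pages[[length(pages) + 1L]] <- page
    offset <- offset + nrow(page)
    total <- attr(page, "totalElements")
    if (nrow(page) == 0L || offset >= total) break
  }
  do.call(rbind, pages)
}

# Offline stand-in for a paginated endpoint with 250 records
mock_fetch <- function(offset, limit) {
  source_rows <- data.frame(id = 1:250)
  idx <- seq_len(nrow(source_rows))
  page <- source_rows[idx > offset & idx <= offset + limit, , drop = FALSE]
  attr(page, "totalElements") <- nrow(source_rows)
  page
}

combined <- collect_pages(mock_fetch)
```

With a real call, fetch_page would wrap the paginated function, for example function(offset, limit) applied to a fixed gene. The package's get_all_pages, mentioned in the update_result examples, may already cover this use case.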



Retrieve genes matching gene identifiers


Retrieve genes matching gene identifiers


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A vector of NCBI IDs, Ensembl IDs or gene symbols.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the queried gene(s). A list if raw = TRUE.

The fields of the output data.table are:

  • gene.symbol: Symbol for the gene

  • gene.ensembl: Ensembl ID for the gene

  • gene.NCBI: NCBI id for the gene

  • Name of the gene

  • gene.aliases: Gene aliases. Each row includes a vector

  • gene.MFX.rank: Multifunctionality rank for the gene

  • Name of the species

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon


get_genes(c("DYRK1A", "PTEN"))
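Since gene.aliases holds a vector in each row, it often needs flattening before the table is printed or written to a delimited file. The sketch below uses a mock table with placeholder symbols and aliases (not real gene data) and a base data.frame in place of the data.table that the function returns.

```r
# Mock table; symbols and aliases are placeholders
genes <- data.frame(gene.symbol = c("GENE1", "GENE2"), stringsAsFactors = FALSE)
genes$gene.aliases <- list(c("alias1a", "alias1b"), "alias2a")

# Collapse each alias vector into a single string
genes$aliases.flat <- vapply(
  genes$gene.aliases, paste, character(1), collapse = "; "
)
```

The aliases.flat column is an ordinary character column and can be exported without special handling.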

Retrieve Platform Annotations by Gemma


Gets Gemma's platform annotations including mappings of microarray probes to genes.


  annotType = c("noParents", "allParents", "bioProcess"),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE),
  memoised = getOption("gemma.memoise", FALSE),
  unzip = FALSE



A platform numerical identifier or a platform short name.


Which GO terms should the output include


Where to save the annotation file to, or empty to just load into memory


Whether or not to overwrite an existing file


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


Whether or not to unzip the file (if file is not empty)


A table of annotations

  • ProbeName: Probeset names provided by the platform. Gene symbols for generic annotations

  • GeneSymbols: Genes that were found to be aligned to the probe sequence. Note that it is possible for probes to be non-specific. Alignments to multiple genes are indicated with gene symbols separated by "|".

  • GeneNames: Name of the gene

  • GOTerms: GO Terms associated with the genes. annotType argument can be used to choose which terms should be included.

  • GemmaIDs and NCBIids: respective IDs for the genes.
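Because non-specific probes list several symbols separated by "|", the GeneSymbols column can be split to count genes per probe. This is a minimal sketch on a mock annotation table (probe names are placeholders), assuming an empty string marks a probe with no aligned gene.

```r
# Mock annotation table using the documented column names
annots <- data.frame(
  ProbeName = c("probe1", "probe2", "probe3"),
  GeneSymbols = c("ENO2", "PTEN|DYRK1A", ""),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" literally rather than as a regex alternation
symbols <- strsplit(annots$GeneSymbols, "|", fixed = TRUE)
annots$n_genes <- lengths(symbols)

# Probes aligned to exactly one gene
specific <- annots[annots$n_genes == 1L, ]
```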



Retrieve all experiments using a given platform


Retrieve all experiments using a given platform


  offset = 0L,
  limit = 20L,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A platform numerical identifier or a platform short name


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the queried dataset(s). A list if raw = TRUE. Returns an empty list if no datasets matched.

The fields of the output data.table are:

  • experiment.shortName: Shortname given to the dataset within Gemma. Often corresponds to accession ID

  • Full title of the dataset

  • experiment.ID: Internal ID of the dataset.

  • experiment.description: Description of the dataset

  • experiment.troubled: Whether an automatic process within Gemma or a curator marked the dataset as "troubled"

  • experiment.accession: Accession ID of the dataset in the external database it was taken from

  • experiment.database: The name of the database where the dataset was taken from

  • experiment.URI: URI of the original database

  • experiment.sampleCount: Number of samples in the dataset

  • experiment.batchEffectText: A text field describing whether the dataset has batch effects

  • experiment.batchCorrected: Whether batch correction has been performed on the dataset.

  • experiment.batchConfound: 0 if batch info isn't available, -1 if batch confound is detected, 1 if batch information is available and no batch confound is found

  • experiment.batchEffect: -1 if batch p value < 0.0001, 1 if batch p value > 0.1, 0 otherwise, including when no batch information is available or when the data is confounded with batches.

  • experiment.rawData: -1 if no raw data available, 1 if raw data was available. When available, Gemma reprocesses raw data to get expression values and batches

  • geeq.qScore: Data quality score given to the dataset by Gemma.

  • geeq.sScore: Suitability score given to the dataset by Gemma. Refers to factors like batches, platforms and other aspects of experimental design

  • Name of the species

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon



Retrieve the genes associated to a probe in a given platform


Retrieve the genes associated to a probe in a given platform


  offset = 0L,
  limit = 20L,
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A platform numerical identifier or a platform short name


A probe name or its numerical identifier


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the queried gene(s). A list if raw = TRUE.

The fields of the output data.table are:

  • gene.symbol: Symbol for the gene

  • gene.ensembl: Ensembl ID for the gene

  • gene.NCBI: NCBI id for the gene

  • Name of the gene

  • gene.aliases: Gene aliases. Each row includes a vector

  • gene.MFX.rank: Multifunctionality rank for the gene

  • Name of the species

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon


get_platform_element_genes("GPL1355", "AFFX_Rat_beta-actin_M_at")

Retrieve all platforms matching a set of platform identifiers


Retrieve all platforms matching a set of platform identifiers


  platforms = NA_character_,
  filter = NA_character_,
  taxa = NA_character_,
  offset = 0L,
  limit = 20L,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



Platform numerical identifiers or platform short names. If not specified, all platforms will be returned instead


Filter results by matching expression. Use the filter_properties function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").


A vector of taxon common names (e.g. human, mouse, rat). Providing multiple species will return results for all species. These are appended to the filter and equivalent to filtering for taxon.commonName property


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


Order results by the given property and direction. The '+' sign indicates ascending order, whereas the '-' sign indicates descending order.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with information about the platform(s). A list if raw = TRUE. A 404 error if the given identifier does not map to any object

The fields of the output data.table are:

  • platform.ID: Internal identifier of the platform

  • platform.shortName: Shortname of the platform.

  • Full name of the platform.

  • platform.description: Free text description of the platform

  • platform.troubled: Whether or not the platform was marked "troubled" by a Gemma process or a curator

  • platform.experimentCount: Number of experiments using the platform within Gemma

  • platform.type: Technology type for the platform.

  • Name of the species platform was made for

  • taxon.scientific: Scientific name for the taxon

  • taxon.ID: Internal identifier given to the species by Gemma

  • taxon.NCBI: NCBI ID of the taxon

  • Underlying database used in Gemma for the taxon

  • taxon.database.ID: ID of the underlying database used in Gemma for the taxon


get_platforms_by_ids(c("GPL1355", "GPL96"))
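Filter expressions for the filter argument are plain strings, so they can be assembled programmatically. In the sketch below, in_clause and combine_clauses are hypothetical helpers (not part of gemma.R) that only reproduce the syntax shown in the argument description; whether a given property is accepted should be checked with filter_properties.

```r
# Build an "in" clause such as: taxon.commonName in (human,mouse)
in_clause <- function(property, values) {
  sprintf("%s in (%s)", property, paste(values, collapse = ","))
}

# Join clauses with "and" or "or"
combine_clauses <- function(..., operator = c("and", "or")) {
  operator <- match.arg(operator)
  paste(c(...), collapse = paste0(" ", operator, " "))
}

platform_filter <- combine_clauses(
  in_clause("taxon.commonName", c("human", "mouse")),
  "id < 1000"
)
# The resulting string can be passed on, e.g.
# get_platforms_by_ids(filter = platform_filter)
```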

Retrieve all result sets matching the provided criteria


Returns queried result set


  datasets = NA_character_,
  resultSets = NA_character_,
  filter = NA_character_,
  offset = 0,
  limit = 20,
  sort = "+id",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



A vector of dataset IDs or short names


A resultSet identifier. Note that result set identifiers are not static and can change when Gemma re-runs analyses internally. When using these as inputs, try to make sure you access a currently existing result set ID by basing them on result sets returned for a particular dataset or filter used in get_result_sets.


Filter results by matching expression. Use the filter_properties function to get a list of all available parameters. These properties can be combined using "and" and "or" clauses and may contain common operators such as "=", "<" or "in" (e.g. "taxon.commonName = human", "taxon.commonName in (human,mouse)", "id < 1000").


The offset of the first retrieved result.


Defaults to 20. Limits the result to specified amount of objects. Has a maximum value of 100. Use together with offset and the totalElements attribute in the output to compile all data if needed.


Order results by the given property and direction. The '+' sign indicates ascending order, whereas the '-' sign indicates descending order.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


The output and usage of this function are mostly identical to get_dataset_differential_expression_analyses. The principal differences are the ability to restrict your result sets, to query across multiple datasets, and to use the filter argument to search based on result set properties.


A data table with information about the queried result sets. Note that this function does not return differential expression values themselves. Use get_differential_expression_values to get differential expression values.

  • result.ID: Result set ID of the differential expression analysis. May represent multiple factors in a single model.

  • contrast.ID: Id of the specific contrast factor. Together with the result.ID they uniquely represent a given contrast.

  • experiment.ID: Id of the source experiment

  • factor.category: Category for the contrast

  • factor.category.URI: URI for the contrast category

  • factor.ID: ID of the factor

  • baseline.factors: Characteristics of the baseline. This field is a data.table

  • experimental.factors: Characteristics of the experimental group. This field is a data.table

  • isSubset: TRUE if the result set belongs to a subset, FALSE if not. Subsets are created when performing differential expression to avoid unhelpful comparisons.

  • subsetFactor: Characteristics of the subset. This field is a data.table


get_result_sets(datasets = 1)
# get all contrasts comparing disease states. use filter_properties to see available options
get_result_sets(filter = "baselineGroup.characteristics.value = disease")

Get taxa


Returns taxa and their versions used in Gemma


get_taxa(memoised = getOption("gemma.memoised", FALSE))



Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


A data frame including the names, IDs and database information about the taxa
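The memoised argument defaults to getOption("gemma.memoised", FALSE), so caching can be switched on for a block of calls and then restored. A minimal sketch using only base R:

```r
# Turn memoisation on, keeping the previous setting so it can be restored
old <- options(gemma.memoised = TRUE)

# This is the value the memoised argument picks up by default
use_cache <- getOption("gemma.memoised", FALSE)

# ... calls such as get_taxa() made here would use the cache ...

# Restore the previous setting
options(old)
```

Setting the option in a session start-up file has the same effect for every call that relies on the default.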



Make simplified design frames


Using the output of get_dataset_samples, this function creates a simplified design table, granting one column to each experimental variable


make_design(samples, metaType = "text")



An output from get_dataset_samples. The output should not be raw


Type of metadata to include in the output. "text", "uri" or "both"


A data.frame including the design table for the dataset


samples <- get_dataset_samples('GSE46416')
make_design(samples)

Search for annotation tags


Search for annotation tags


  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


A data table with annotations (annotation search result value objects) matching the given identifiers. A list if raw = TRUE. A 400 error if required parameters are missing.

The fields of the output data.table are:

  • Category that the annotation belongs to

  • category.URI: URI for the category

  • Annotation term

  • value.URI: URI for the annotation term



Search everything in Gemma


Search everything in Gemma


  taxon = NA_character_,
  platform = NA_character_,
  limit = 100,
  resultType = "experiment",
  raw = getOption("gemma.raw", FALSE),
  memoised = getOption("gemma.memoised", FALSE),
  file = getOption("gemma.file", NA_character_),
  overwrite = getOption("gemma.overwrite", FALSE)



The search query. Queries can include plain text or ontology terms. They also support conjunctions ("alpha AND beta"), disjunctions ("alpha OR beta"), grouping ("(alpha OR beta) AND gamma"), prefixing ("alpha*"), wildcard characters ("BRCA?") and fuzzy matches ("alpha~").


A numerical taxon identifier, an NCBI taxon identifier, or a taxon identifier that matches either its scientific or common name


A platform numerical identifier or a platform short name


Defaults to 100 with a maximum value of 2000. Limits the number of returned results. Note that this function does not support pagination.


The kind of results that should be included in the output. Can be experiment, gene, platform or a long object type name, documented in the API documentation.


TRUE to receive results as-is from Gemma, or FALSE to enable parsing. Raw results usually contain additional fields and flags that are omitted in the parsed results.


Whether or not to save to cache for future calls with the same inputs and use the result saved in cache if a result is already saved. Doing options(gemma.memoised = TRUE) will ensure that the cache is always used. Use forget_gemma_memoised to clear the cache.


The name of a file to save the results to, or NULL to not write results to a file. If raw == TRUE, the output will be the raw response from the API, likely a JSON or a gzip file. Otherwise, it will be an RDS file.


Whether or not to overwrite if a file exists at the specified filename.


If raw = FALSE and resultType is experiment, gene or platform, a data.table containing the search results. If it is any other type, a list of results. A list with additional details about the search if raw = TRUE.
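The query syntax described for the query argument (conjunctions, disjunctions and grouping) can be composed from term vectors. In this sketch, any_of and all_of are hypothetical helpers, not part of gemma.R, and the terms are the placeholders used in the argument description.

```r
# Group alternatives: (alpha OR beta)
any_of <- function(terms) {
  paste0("(", paste(terms, collapse = " OR "), ")")
}

# Require every part: part1 AND part2
all_of <- function(...) {
  paste(c(...), collapse = " AND ")
}

query <- all_of(any_of(c("alpha", "beta")), "gamma")
# The resulting string is what would be supplied as the query argument
```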



Authentication by user name


Allows the user to access information that requires logging in to Gemma. To log out, run set_gemma_user without specifying the username or password.


set_gemma_user(username = NULL, password = NULL)



Your username (or empty, if logging out)


Your password (or empty, if logging out)


TRUE if authentication is successful, FALSE if not
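To keep credentials out of scripts, the username and password can be read from environment variables before calling set_gemma_user. This is a minimal sketch: login_from_env is a hypothetical helper, and the variable names GEMMA_USERNAME and GEMMA_PASSWORD are assumptions chosen for illustration, not names the package defines.

```r
# Log in only when both environment variables are set
login_from_env <- function(user_var = "GEMMA_USERNAME",
                           pass_var = "GEMMA_PASSWORD") {
  username <- Sys.getenv(user_var, unset = NA)
  password <- Sys.getenv(pass_var, unset = NA)
  if (is.na(username) || is.na(password)) {
    return(FALSE)  # nothing to authenticate with
  }
  gemma.R::set_gemma_user(username = username, password = password)
}

# Usage: logged_in <- login_from_env()
```

The helper returns FALSE without contacting Gemma when either variable is missing, mirroring the documented FALSE return for unsuccessful authentication.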

Update result


Re-runs the function used to create a gemma.R output to update the data at hand. Useful if you have reason to believe parts of the data have changed since you last accessed it and you wish to update while decoupling the update process from the original code used to generate the data.





Output from a gemma.R function


Note that if you used the file and overwrite arguments in the original call, this will also regenerate the file according to your initial preferences.


annots <- get_dataset_annotations(1)
# wait for a couple of years..
# wonder if the results are the same
updated_annots <- update_result(annots)

# also works with outputs of get_all_pages
platforms <- get_all_pages(get_platforms_by_ids())
updated_platforms <- update_result(platforms)