Package 'pairedGSEA' reference manual

Title:	Paired DGE and DGS analysis for gene set enrichment analysis
Description:	pairedGSEA makes it simple to run a paired Differential Gene Expression (DGE) and Differential Gene Splicing (DGS) analysis. The package allows you to store intermediate results for further investigation, if desired. pairedGSEA comes with a wrapper function for running an Over-Representation Analysis (ORA) and functionalities for plotting the results.
Authors:	Søren Helweg Dam [cre, aut] , Lars Rønn Olsen [aut] , Kristoffer Vitting-Seerup [aut]
Maintainer:	Søren Helweg Dam <[email protected]>
License:	MIT + file LICENSE
Version:	1.7.0
Built:	2025-03-27 03:38:39 UTC
Source:	https://github.com/bioc/pairedGSEA

Output of running paired_diff on example_se.

Description

This example result is used primarily for package tests and function man pages.

Usage

data("example_diff_result")

Format

A 'DataFrame' with 954 rows and 7 columns.

Value

A 'DataFrame'.

MSigDB gene sets from humans, category C5, with Ensembl gene IDs

Description

This example gene set list is used primarily for package tests and function man pages.

Usage

data("example_gene_sets")

Format

A list of 77 human gene sets

Value

A list of gene sets

Output of running paired_ora on example_diff_result and gene sets extracted from MSigDB

Description

This example result is used primarily for package tests and function man pages.

Usage

data("example_ora_results")

Format

A 'DataFrame' with 559 rows and 18 columns.

Value

A 'DataFrame'

A small subset of the GEO:GSE61220 data set.

Description

The subset is used in the vignettes and function man pages. It was created by extracting the genes belonging to telomere-related gene sets and randomly selecting 900 other genes from the original data set.

Usage

data("example_se")

Format

A 'SummarizedExperiment'

assay: Count matrix with 5611 transcripts and 6 samples
colData: The metadata associated with the count matrix

Value

A 'SummarizedExperiment'

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61220

Run paired DESeq2 and DEXSeq analyses

Description

With paired_diff you can run a paired differential gene expression and splicing analysis. The function expects a count matrix, a SummarizedExperiment, or a DESeqDataSet object as input. A preliminary prefiltering step removes genes whose summed count is lower than the provided threshold, as well as genes with counts in only one sample. This step mainly speeds up the differential analyses, as DESeq2 performs stricter filtering of its own. Running Surrogate Variable Analysis (SVA) is recommended so that the analyses can account for batch effects and other unwanted variation. After the two differential analyses, the transcript-level p-values are aggregated to gene level to allow subsequent gene set enrichment analysis. Transcript-level results can be stored for later inspection by setting store_results = TRUE.
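The prefiltering rule described above can be illustrated with a minimal base-R sketch. It assumes a plain count matrix with features (transcripts, as in example_se) in rows and samples in columns; `prefilter_counts()` and the toy matrix are hypothetical and are not part of pairedGSEA, whose internal implementation may differ.

```r
# Hypothetical toy count matrix: 4 transcripts x 3 samples
counts <- matrix(
  c(50, 40, 30,   # high counts in all samples: kept
     3,  2,  1,   # summed count 6 < 10: removed
     0, 25,  0,   # counts in only one sample: removed
     5,  5,  5),  # summed count 15 >= 10, nonzero in 3 samples: kept
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tx", 1:4), paste0("s", 1:3))
)

# Hypothetical helper mimicking the documented prefilter rule
prefilter_counts <- function(counts, prefilter = 10) {
  # Keep rows whose summed count reaches the threshold ...
  enough_counts <- rowSums(counts) >= prefilter
  # ... and that have nonzero counts in more than one sample
  multi_sample <- rowSums(counts > 0) > 1
  counts[enough_counts & multi_sample, , drop = FALSE]
}

filtered <- prefilter_counts(counts, prefilter = 10)
rownames(filtered)
```

Only `tx1` and `tx4` survive: `tx2` fails the summed-count threshold and `tx3` is expressed in a single sample.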

Usage

paired_diff(
    object,
    group_col,
    sample_col,
    baseline,
    case,
    metadata = NULL,
    covariates = NULL,
    experiment_title = NULL,
    store_results = FALSE,
    run_sva = TRUE,
    use_limma = FALSE,
    prefilter = 10,
    test = "LRT",
    fit_type = "local",
    quiet = FALSE,
    parallel = FALSE,
    BPPARAM = BiocParallel::bpparam(),
    expression_only = FALSE,
    custom_design = FALSE,
    ...
    )

Arguments

`object`	A data object of the types matrix, `SummarizedExperiment`, or `DESeqDataSet`. If a matrix is used, please also provide metadata.
`group_col`	The metadata column specifying which group each sample belongs to
`sample_col`	The column in the metadata that specifies the sample IDs (should correspond to column names in `object`). Set to `"rownames"` if the rownames should be used.
`baseline`	Group value of baseline samples
`case`	Group value of case samples
`metadata`	(Default: `NULL`) A metadata file or `data.frame` object
`covariates`	Name of column(s) in the `metadata` that indicate(s) covariates. E.g., c("gender", "tissue_type")
`experiment_title`	Title of your experiment. Your results will be stored in `paste0("results/", experiment_title, "_pairedGSEA.RDS")`.
`store_results`	(Default: `FALSE`) A logical indicating if results should be stored in the folder `"results/"`.
`run_sva`	(Default: `TRUE`) A logical stating whether SVA should be run.
`use_limma`	(Default: `FALSE`) A logical determining if `limma+voom` or `DESeq2` + `DEXSeq` should be used for the analysis
`prefilter`	(Default: `10`) The prefilter threshold, where `rowSums` lower than the prefilter threshold will be removed from the count matrix. Set to 0 or `FALSE` to prevent prefiltering
`test`	(Default: `"LRT"`) Either `"Wald"` or `"LRT"`, which will use either Wald significance tests (defined by `nbinomWaldTest`) or the likelihood ratio test on the difference in deviance between a full and reduced model formula (defined by `nbinomLRT`)
`fit_type`	(Default: `"local"`) Either `"parametric"`, `"local"`, `"mean"`, or `"glmGamPoi"` for the type of fitting of dispersions to the mean intensity.
`quiet`	(Default: `FALSE`) A logical; if `TRUE`, messages are suppressed
`parallel`	(Default: `FALSE`) If FALSE, no parallelization. If TRUE, parallel execution using `BiocParallel`, see next argument `BPPARAM`.
`BPPARAM`	(Default: `bpparam()`) An optional parameter object passed internally to `bplapply` when `parallel = TRUE`. If not specified, the parameters last registered with `register()` will be used.
`expression_only`	(Default: `FALSE`) A logical indicating whether to run only the `DESeq2` analysis. Not generally recommended; the setting was implemented to make the SVA impact analysis easier.
`custom_design`	(Default: `FALSE`) A logical or formula. Can be used to apply a custom design formula for the analysis. Generally not recommended, as `pairedGSEA` will make its own design formula from the group and `covariate` columns
`...`	Additional parameters passed to `DESeq()`

Value

A `DFrame` of gene-level aggregated p-values
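This manual does not state which method paired_diff uses to aggregate transcript-level p-values to gene level. As an illustration only, the sketch below swaps in Fisher's method, which combines k p-values via X = -2 * sum(log(p)) compared against a chi-squared distribution with 2k degrees of freedom. Fisher's method assumes independent p-values, which transcripts of one gene need not be, and the package may use a different (for example weighted) method. The data frame and `fisher_combine()` are hypothetical.

```r
# Hypothetical transcript-level results: transcript ID, parent gene, p-value
tx_results <- data.frame(
  transcript = c("tx1", "tx2", "tx3", "tx4"),
  gene       = c("geneA", "geneA", "geneA", "geneB"),
  pvalue     = c(0.01, 0.20, 0.50, 0.03),
  stringsAsFactors = FALSE
)

# Fisher's method (stand-in technique): X = -2 * sum(log(p)) ~ chi^2 with 2k df
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# One combined p-value per gene
gene_p <- tapply(tx_results$pvalue, tx_results$gene, fisher_combine)
gene_p
```

For a gene with a single transcript (here `geneB`), Fisher's method returns that transcript's p-value unchanged.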

Examples


# Run analysis on included example data
data("example_se")

diff_results <- paired_diff(
    object = example_se[1:15, ],
    group_col = "group_nr",
    sample_col = "id",
    baseline = 1,
    case = 2,
    experiment_title = "Example",
    store_results = FALSE 
)

Paired Over-Representation Analysis

Description

paired_ora uses fora to run the over-representation analysis. First, the aggregated p-values are adjusted using the Benjamini-Hochberg method. The analysis is then run separately on the significant genes found by DESeq2 (expression) and by DEXSeq (splicing); that is, fora is run twice and the two results are joined into a single object. You can use prepare_msigdb to create a list of gene sets.
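The first two steps, Benjamini-Hochberg adjustment and selecting one significant gene list per analysis, can be sketched in base R with `p.adjust()`. The column names `pvalue_expression` and `pvalue_splicing` are assumptions for illustration and need not match the columns of a real paired_diff result.

```r
# Hypothetical gene-level results; column names are illustrative only
diff_result <- data.frame(
  gene              = paste0("gene", 1:6),
  pvalue_expression = c(0.001, 0.002, 0.300, 0.040, 0.800, 0.010),
  pvalue_splicing   = c(0.500, 0.001, 0.004, 0.900, 0.030, 0.600),
  stringsAsFactors  = FALSE
)

cutoff <- 0.05
# Benjamini-Hochberg adjustment, done separately for each analysis
padj_expression <- p.adjust(diff_result$pvalue_expression, method = "BH")
padj_splicing   <- p.adjust(diff_result$pvalue_splicing,   method = "BH")

# Two gene lists, one for each ORA run
sig_dge <- diff_result$gene[padj_expression < cutoff]
sig_dgs <- diff_result$gene[padj_splicing   < cutoff]
sig_dge
sig_dgs
```

Note that `gene4` has a raw expression p-value of 0.04 but does not survive adjustment (adjusted value 0.06).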

Usage

paired_ora(
    paired_diff_result,
    gene_sets,
    cutoff = 0.05,
    min_size = 25,
    experiment_title = NULL,
    expression_only = FALSE,
    quiet = FALSE
    )

Arguments

`paired_diff_result`	The output of `paired_diff`
`gene_sets`	List of gene sets to analyse
`cutoff`	(Default: `0.05`) Adjusted p-value cutoff for selecting significant genes
`min_size`	(Default: `25`) Minimal size of a gene set to test. All pathways below the threshold are excluded.
`experiment_title`	Title of your experiment. Your results will be stored in `paste0("results/", experiment_title, "_fora.RDS")`.
`expression_only`	(Default: `FALSE`) A logical that indicates whether to only run `DESeq2` analysis. Not generally recommended.
`quiet`	(Default: `FALSE`) A logical; if `TRUE`, messages are suppressed
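The over-representation test run by fora is not reproduced here, but a common formulation is a one-sided hypergeometric test of the overlap between a gene set and the significant genes. The hypothetical `ora_sketch()` below shows that test together with the `min_size` filter using base R's `phyper()`; the gene universe and toy gene sets are assumptions, and fora's exact behaviour may differ.

```r
# Hypothetical inputs: a gene universe, significant genes, and two gene sets
universe  <- paste0("gene", 1:100)
sig_genes <- paste0("gene", 1:10)
gene_sets <- list(
  SET_ENRICHED = paste0("gene", c(1:8, 51:72)),  # 30 genes, 8 significant
  SET_SMALL    = paste0("gene", 1:5)             # 5 genes, below min_size
)

ora_sketch <- function(sig_genes, gene_sets, universe, min_size = 25) {
  # Restrict each gene set to the universe, then drop sets smaller than min_size
  gene_sets <- lapply(gene_sets, intersect, universe)
  gene_sets <- gene_sets[lengths(gene_sets) >= min_size]
  if (length(gene_sets) == 0) return(NULL)
  sig_genes <- intersect(sig_genes, universe)
  res <- lapply(names(gene_sets), function(name) {
    set     <- gene_sets[[name]]
    overlap <- length(intersect(set, sig_genes))
    # P(X >= overlap) when drawing length(sig_genes) genes from the universe
    p <- phyper(overlap - 1, m = length(set), n = length(universe) - length(set),
                k = length(sig_genes), lower.tail = FALSE)
    data.frame(pathway = name, size = length(set), overlap = overlap, pval = p,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, res)
  # Multiple-testing correction across the tested gene sets
  out$padj <- p.adjust(out$pval, method = "BH")
  out
}

ora_sketch(sig_genes, gene_sets, universe)
```

`SET_SMALL` is never tested because it falls below `min_size`, while `SET_ENRICHED` contains 8 significant genes where about 3 would be expected by chance.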

Value

A data.table of merged ORA results

Examples

data("example_diff_result")
data("example_gene_sets")

ora <- paired_ora(
    example_diff_result,
    example_gene_sets)


Scatter plot of Over-Representation Analysis results

Description

Scatter plot of Over-Representation Analysis results

Usage

plot_ora(
  ora,
  pattern = NULL,
  paired = FALSE,
  plotly = FALSE,
  cutoff = 0.05,
  lines = TRUE,
  colors = c("darkgray", "purple", "lightblue", "maroon")
)

Arguments

`ora`	Output of `paired_ora`
`pattern`	Highlight pathways containing a specific regex pattern
`paired`	(Default: `FALSE`) New plotting mode for paired ORA analysis
`plotly`	(Default: `FALSE`) Logical on whether to return plot as an interactive `plotly` plot or a simple ggplot.
`cutoff`	(Default: `0.05`) Adjusted p-value cutoff for pathways to include
`lines`	(Default: `TRUE`) Whether to show dashed lines
`colors`	(Default: `c("darkgray", "purple", "lightblue", "maroon")`) Colors to use in the plot. The colors are ordered as "Both", "DGS", and "DGE"
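As a rough illustration of what `cutoff` and `pattern` control, the sketch below filters a toy ORA table by adjusted p-value and flags pathway names that match a regular expression with `grepl()`. The column names `pathway` and `padj` and the strict `<` comparison are assumptions; plot_ora's internal handling may differ.

```r
# Hypothetical ORA-like table; column names are illustrative only
ora <- data.frame(
  pathway = c("GOBP_TELOMERE_MAINTENANCE", "GOBP_DNA_REPAIR",
              "GOCC_TELOMERIC_REGION", "GOBP_CELL_CYCLE"),
  padj    = c(0.001, 0.030, 0.200, 0.040),
  stringsAsFactors = FALSE
)

cutoff  <- 0.05
pattern <- "TELOMER"

# Keep pathways passing the adjusted p-value cutoff ...
kept <- ora[ora$padj < cutoff, ]
# ... and flag those whose names match the regex, for highlighting
kept$highlight <- grepl(pattern, kept$pathway)
kept
```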

Value

A ggplot

Examples

data(example_ora_results)

plot_ora(example_ora_results)

Load MSigDB and convert to a named list of gene sets

Description

This function is a wrapper around msigdbr(). Please see its manual for details on its use. The function extracts the gene set name and a user-defined gene ID type (Default: "ensembl_gene"). Please make sure the gene IDs match those from your DE analysis. The function formats the gene sets so that they can be used directly with paired_ora().
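The conversion from msigdbr's long table format (one row per gene set and gene) to a named list can be sketched with base R's `split()`. The toy data frame mimics msigdbr's `gs_name` and `ensembl_gene` columns using placeholder IDs; this is an illustration under those assumptions, not prepare_msigdb's actual code.

```r
# Hypothetical long-format table in the style of msigdbr() output (placeholder IDs)
msig <- data.frame(
  gs_name      = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  ensembl_gene = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2",
                   "ENSG_PLACEHOLDER_2", "ENSG_PLACEHOLDER_3", "ENSG_PLACEHOLDER_4"),
  stringsAsFactors = FALSE
)

gene_id_type <- "ensembl_gene"
# One list element per gene set name, holding that set's gene IDs
gene_sets <- split(msig[[gene_id_type]], msig$gs_name)
# Drop duplicate IDs within a set, if any
gene_sets <- lapply(gene_sets, unique)
str(gene_sets)
```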

Usage

prepare_msigdb(
  gene_id_type = "ensembl_gene",
  species = "Homo sapiens",
  db_species = c("HS", "MM"),
  collection = "C5",
  subcollection = NULL,
  category = NULL,
  subcategory = NULL
)
prepare_msigdb(
  gene_id_type = "ensembl_gene",
  species = "Homo sapiens",
  db_species = c("HS", "MM"),
  collection = "C5",
  subcollection = NULL,
  category = NULL,
  subcategory = NULL
)

Arguments

`gene_id_type`	(Default: "ensembl_gene") The gene ID type to extract. The IDs should match the gene IDs from your DE analysis.
`species`	Species name for output genes, such as `"Homo sapiens"` or `"Mus musculus"`. Use `msigdbr_species()` for available options.
`db_species`	Species abbreviation for the human or mouse databases (`"HS"` or `"MM"`).
`collection`	Collection abbreviation, such as `"H"` or `"C1"`. Use `msigdbr_collections()` for the available options.
`subcollection`	Sub-collection abbreviation, such as `"CGP"` or `"BP"`. Use `msigdbr_collections()` for the available options.
`category`	Use the `collection` argument instead.
`subcategory`	Use the `subcollection` argument instead.

Value

A list of gene sets

Examples

gene_sets <- prepare_msigdb(species = "Homo sapiens")
