Title: | Paired DGE and DGS analysis for gene set enrichment analysis |
---|---|
Description: | pairedGSEA makes it simple to run a paired Differential Gene Expression (DGE) and Differential Gene Splicing (DGS) analysis. The package allows you to store intermediate results for further investigation, if desired. pairedGSEA comes with a wrapper function for running an Over-Representation Analysis (ORA) and functionality for plotting the results. |
Authors: | Søren Helweg Dam [cre, aut] , Lars Rønn Olsen [aut] , Kristoffer Vitting-Seerup [aut] |
Maintainer: | Søren Helweg Dam <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.0 |
Built: | 2024-11-18 03:42:33 UTC |
Source: | https://github.com/bioc/pairedGSEA |
This example result is used primarily for package tests and function man pages.
data("example_diff_result")
A 'DataFrame' with 954 rows and 7 columns.
A 'DataFrame'.
This example gene set list is used primarily to do package tests and for function man pages.
data("example_gene_sets")
A list of 77 human gene sets
A list of gene sets
This example result is used primarily to do package tests and for function man pages.
data("example_ora_results")
A 'DataFrame' with 559 rows and 18 columns.
A 'DataFrame'
The subset is used in the vignettes and function man pages. The subset was created by extracting genes belonging to Telomere-related gene sets and randomly selecting 900 other genes from the original dataset.
data("example_se")
A 'SummarizedExperiment'
Count matrix with 5611 transcripts and 6 samples
The metadata associated with the count matrix
A 'SummarizedExperiment'
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61220
With paired_diff you can run a paired differential gene expression and splicing analysis. The function expects a counts matrix, a SummarizedExperiment, or a DESeqDataSet object as input.
A preliminary prefiltering step is performed to remove genes with a summed count lower than the provided threshold. Likewise, genes with counts in only one sample are removed. This step is mostly to speed up differential analyses, as DESeq2 will do a stricter filtering.
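The prefiltering rule described above can be sketched in base R. This is a minimal illustration on a made-up count matrix, not the package's internal code; the object names (counts, enough_counts, multi_sample) are hypothetical, and only the threshold of 10 is taken from the default of the prefilter argument.

```r
# Toy count matrix (made-up values): rows are features, columns are samples.
counts <- matrix(
    c(0, 0, 0, 0,    # gene1: all zero, summed count below the threshold
      50, 0, 0, 0,   # gene2: counts in only one sample
      5, 6, 7, 8,    # gene3: sum 26, counts in every sample
      1, 2, 0, 3),   # gene4: sum 6, below the threshold
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

prefilter <- 10  # same value as the default of the prefilter argument

# Keep features whose summed count reaches the threshold ...
enough_counts <- rowSums(counts) >= prefilter
# ... and that have non-zero counts in more than one sample.
multi_sample <- rowSums(counts > 0) > 1

filtered <- counts[enough_counts & multi_sample, , drop = FALSE]
rownames(filtered)
```

Only gene3 passes both conditions in this toy example.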
Surrogate Variable Analysis is recommended to allow the analyses to take
batch effects etc. into account.
After the two differential analyses, the transcript-level p-values will be aggregated to gene-level to allow subsequent Gene-Set Enrichment Analysis. Transcript-level results can be extracted by setting store_results = TRUE.
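To illustrate what aggregating transcript-level p-values to gene level means, here is a small base R sketch. The text above does not state which aggregation method the package uses, so this example swaps in Fisher's method purely for illustration; the data frame tx and the helper fisher_combine are hypothetical.

```r
# Hypothetical transcript-level results: one p-value per transcript.
tx <- data.frame(
    gene = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"),
    pvalue = c(0.01, 0.20, 0.50, 0.03, 0.04, 0.90)
)

# Fisher's method (an assumption of this sketch, not necessarily the
# package's method): -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null, where k is the number of
# p-values being combined.
fisher_combine <- function(p) {
    stat <- -2 * sum(log(p))
    pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# One aggregated p-value per gene.
gene_p <- tapply(tx$pvalue, tx$gene, fisher_combine)
gene_p
```

A gene with a single transcript keeps its original p-value under this method, which is a quick sanity check for the helper.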
paired_diff( object, group_col, sample_col, baseline, case, metadata = NULL, covariates = NULL, experiment_title = NULL, store_results = FALSE, run_sva = TRUE, use_limma = FALSE, prefilter = 10, test = "LRT", fit_type = "local", quiet = FALSE, parallel = FALSE, BPPARAM = BiocParallel::bpparam(), expression_only = FALSE, custom_design = FALSE, ... )
object | A data object of the types matrix, SummarizedExperiment, or DESeqDataSet. |
group_col | The metadata column specifying which group each sample is associated with |
sample_col | The column in the metadata that specifies the sample IDs (should correspond to column names in |
baseline | Group value of baseline samples |
case | Group value of case samples |
metadata | (Default: NULL) |
covariates | Name of column(s) in the |
experiment_title | Title of your experiment. Your results will be stored in |
store_results | (Default: FALSE) |
run_sva | (Default: TRUE) |
use_limma | (Default: FALSE) |
prefilter | (Default: 10) |
test | (Default: "LRT") either "Wald" or "LRT", which will then use either Wald significance tests (defined by |
fit_type | (Default: "local") |
quiet | (Default: FALSE) |
parallel | (Default: FALSE) |
BPPARAM | (Default: BiocParallel::bpparam()) |
expression_only | (Default: FALSE) |
custom_design | (Default: FALSE) |
... | Additional parameters passed to |
A DFrame of aggregated pvalues
Other paired:
paired_ora()
# Run analysis on included example data
data("example_se")
diff_results <- paired_diff(
    object = example_se[1:15, ],
    group_col = "group_nr",
    sample_col = "id",
    baseline = 1,
    case = 2,
    experiment_title = "Example",
    store_results = FALSE
)
paired_ora uses fora to run the over-representation analysis. First, the aggregated p-values are adjusted using the Benjamini & Hochberg method. The analysis is run on all significant genes found by DESeq2 and DEXSeq individually. I.e., two runs of fora are executed and subsequently joined into a single object. You can use prepare_msigdb to create a list of gene_sets.
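The two steps described above (Benjamini & Hochberg adjustment, then an over-representation test on the significant genes) can be sketched with base R alone. This is a rough illustration of the general idea using a hypergeometric test; it does not reproduce the internals of fora or paired_ora, and all gene names, p-values, and the gene set are made up.

```r
# Hypothetical gene-level p-values for a universe of 10 genes.
gene_p <- c(g1 = 0.001, g2 = 0.002, g3 = 0.004, g4 = 0.20, g5 = 0.35,
            g6 = 0.50, g7 = 0.65, g8 = 0.80, g9 = 0.90, g10 = 0.95)

# Benjamini & Hochberg adjustment, then select significant genes
# (0.05 mirrors the default of the cutoff argument).
padj <- p.adjust(gene_p, method = "BH")
significant <- names(padj)[padj < 0.05]

# A toy gene set and the gene universe (both assumptions of this sketch).
gene_set <- c("g1", "g2", "g3", "g9")
universe <- names(gene_p)

# Hypergeometric test for over-representation: the probability of an
# overlap at least as large as observed when drawing
# length(significant) genes at random from the universe.
overlap <- length(intersect(significant, gene_set))
p_ora <- phyper(
    q = overlap - 1,
    m = length(gene_set),
    n = length(universe) - length(gene_set),
    k = length(significant),
    lower.tail = FALSE
)
p_ora
```

In a paired analysis, the same test would be run once on the genes that are significant in the expression analysis and once on those significant in the splicing analysis.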
paired_ora( paired_diff_result, gene_sets, cutoff = 0.05, min_size = 25, experiment_title = NULL, expression_only = FALSE, quiet = FALSE )
paired_diff_result | The output of paired_diff() |
gene_sets | List of gene sets to analyse |
cutoff | (Default: 0.05) |
min_size | (Default: 25) |
experiment_title | Title of your experiment. Your results will be stored in |
expression_only | (Default: FALSE) |
quiet | (Default: FALSE) |
A data.table of merged ORA results
Other paired:
paired_diff()
data("example_diff_result")
data("example_gene_sets")
ora <- paired_ora(example_diff_result, example_gene_sets)
Scatter plot of Over-Representation Analysis results
plot_ora( ora, pattern = NULL, paired = TRUE, plotly = FALSE, cutoff = 0.05, lines = TRUE, colors = c("darkgray", "purple", "lightblue", "maroon") )
ora | Output of paired_ora() |
pattern | Highlight pathways containing a specific regex pattern |
paired | (Default: TRUE) New plotting mode for paired ora analysis |
plotly | (Default: FALSE) |
cutoff | (Default: 0.05) |
lines | (Default: TRUE) |
colors | (Default: c("darkgray", "purple", "lightblue", "maroon")) |
A ggplot
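As a rough illustration of how a regex pattern selects pathways to highlight, the base R sketch below matches a pattern against made-up pathway names. Whether plot_ora matches case-insensitively is not stated here, so ignore.case = TRUE is an assumption of this sketch, as are the pathway names.

```r
# Hypothetical pathway names (not taken from the example data).
pathways <- c(
    "GOBP_TELOMERE_MAINTENANCE",
    "GOBP_DNA_REPAIR",
    "GOCC_TELOMERIC_REGION"
)

# Flag pathways whose name contains the pattern.
# ignore.case = TRUE is an assumption, see the note above.
highlight <- grepl("Telomer", pathways, ignore.case = TRUE)

pathways[highlight]
```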
Suggested: importFrom plotly ggplotly
data(example_ora_results)
plot_ora(example_ora_results, pattern = "Telomer")
This function is a wrapper around msigdbr(). Please see their manual for details on its use. The function extracts the gene set name and a user-defined gene ID type (Default: "ensembl_gene"). Please make sure the gene IDs match those from your DE analysis. This function formats the gene sets such that they can be used directly with paired_ora().
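The formatting step described above can be pictured as turning a long table of (gene set, gene) pairs into a list with one element per gene set. The sketch below assumes that a named list of character vectors is the shape paired_ora() expects; the column names and the gene IDs are made-up placeholders, not real Ensembl identifiers.

```r
# Hypothetical long-format table: one row per (gene set, gene) pair.
long <- data.frame(
    gs_name = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
    ensembl_gene = c("ENSG_demo_1", "ENSG_demo_2",
                     "ENSG_demo_2", "ENSG_demo_3", "ENSG_demo_4")
)

# One character vector of gene IDs per gene set, named by gene set.
gene_sets <- split(long$ensembl_gene, long$gs_name)

# Quick check that the IDs match those from a (hypothetical) DE analysis:
# the fraction of gene set IDs that also occur among the analysed genes.
de_ids <- c("ENSG_demo_1", "ENSG_demo_2", "ENSG_demo_3")
id_overlap <- mean(unique(unlist(gene_sets)) %in% de_ids)
id_overlap
```

An overlap close to zero usually means the gene ID types differ between the gene sets and the differential analysis.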
prepare_msigdb( gene_id_type = "ensembl_gene", species = "Homo sapiens", category = "C5", subcategory = NULL )
gene_id_type |
(Default: "ensembl_gene") The gene ID type to extract. The IDs should match the gene IDs from your DE analysis. |
species |
Species name, such as Homo sapiens or Mus musculus. |
category |
MSigDB collection abbreviation, such as H or C1. |
subcategory |
MSigDB sub-collection abbreviation, such as CGP or BP. |
A list of gene sets
Suggested: importFrom msigdbr msigdbr
gene_sets <- prepare_msigdb(species = "Homo sapiens")