Package 'SeqGSEA'

Title:	Gene Set Enrichment Analysis (GSEA) of RNA-Seq Data: integrating differential expression and splicing
Description:	The package provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses the negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, the statistical significance of each gene's differential expression and of its differential splicing can also be assessed separately.
Authors:	Xi Wang <[email protected]>
Maintainer:	Xi Wang <[email protected]>
License:	GPL (>= 3)
Version:	1.47.0
Built:	2025-03-30 06:29:55 UTC
Source:	https://github.com/bioc/SeqGSEA

Help Index

SeqGSEA: a Bioconductor package for gene set enrichment analysis of RNA-Seq data
Calculate running enrichment scores of gene sets
Calculate enrichment scores for gene sets in the permutation data sets
Convert ensembl gene IDs to gene symbols
Convert gene symbols to ensembl gene IDs
Accessors for the 'counts' slot of a ReadCountSet object.
Calculate NB-statistics quantifying differential expression for each gene
Calculate NB-statistics quantifying DE for each gene in the permutation data sets
Perform negative binomial exact test for differential expression
Permutation for p-values in differential expression analysis
Pre-calculated DE/DS scores
Compute NB-statistics quantifying differential splicing on the permutation data set.
Permutation for p-values in differential splicing analysis
Form a table for DS analysis results at the Exon level
Form a table for DS analysis results at the gene level
Calculate NB-statistics quantifying differential splicing for individual exons
Calculate NB-statistics quantifying differential splicing for each gene
Accessor to the exonID slot of ReadCountSet objects
Check exon testability
Accessor to the geneID slot of ReadCountSet objects
Get the gene list in a SeqGeneSet object
Calculate gene scores on permutation data sets
Calculate gene scores by integrating DE and DS scores
Get the descriptions of gene sets in a SeqGeneSet object
Get the names of gene sets in a SeqGeneSet object
Get the numbers of genes in each gene set in a SeqGeneSet object
Check gene testability
Generate permutation matrix
Calculate read counts of genes from a ReadCountSet object
SeqGeneSet object example
Form a table for GSEA results
Main function of gene set enrichment analysis
Get the labels of samples in a ReadCountSet object
Load Exon Count Data
Load gene sets from files
Initialize a new SeqGeneSet object
Generate a new ReadCountSet object
Normalize enrichment scores
Get normalization factors for normalizing DE or DS scores
Plot the distribution of enrichment scores
Plot gene (DE/DS) scores
Plot showing SeqGeneSet's p-values/FDRs vs. NESs
Plot gene set details
Integration of differential expression and differential splice scores with a rank-based strategy
ReadCountSet object example
Class "ReadCountSet"
Run DESeq for differential expression analysis
An all-in function that allows end users to apply SeqGSEA to their data with one step.
Normalization of DE/DS scores
Class "SeqGeneSet"
Calculate significance of ESs
Number of gene sets in a SeqGeneSet object
Get a new ReadCountSet with specified gene IDs.
Extract top differentially expressed genes.
Extract top differentially spliced exons
Extract top differentially spliced genes
Extract top significant gene sets
Write DE/DS scores and gene scores
Write gene set supporting information

SeqGSEA: a Bioconductor package for gene set enrichment analysis of RNA-Seq data

Description

SeqGSEA is an R package for gene set enrichment analysis of RNA-Seq data that can integrate differential expression and differential splicing in functional analysis.

Details

Package:	SeqGSEA
Type:	Package
License:	GPL (>= 3)

A User's Guide is available as well as the usual help page documentation for each of the individual functions.

The most useful functions are listed below:

* ReadCountSet class

ReadCountSet-class
ReadCountSet
exonID
geneID
counts-methods
label
subsetByGenes

* SeqGeneSet class

SeqGeneSet-class
geneSetDescs
geneSetNames
geneSetSize
size

* Load data

newReadCountSet
loadExonCountData
newGeneSets
loadGenesets

* DE analysis

getGeneCount
runDESeq
DENBStat4GSEA
DENBStatPermut4GSEA
DENBTest
DEpermutePval

* DS analysis

DSpermute4GSEA
DSpermutePval
exonTestability
geneTestability
estiExonNBstat
estiGeneNBstat

* GSEA main

GSEnrichAnalyze
calES
calES.perm
genePermuteScore
geneScore
rankCombine
normES
normFactor
scoreNormalization
signifES

* Result tables

GSEAresultTable
DSresultExonTable
DSresultGeneTable
topDEGenes
topDSExons
topDSGenes
topGeneSets

* Result displays

plotES
plotGeneScore
plotSig
plotSigGeneSet
writeSigGeneSet

* Miscellaneous

genpermuteMat
convertEnsembl2Symbol
convertSymbol2Ensembl

Author(s)

Xi Wang and Murray J. Cairns

Maintainer: Xi Wang <[email protected]>

References

Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.

Calculate running enrichment scores of gene sets

Description

This is an internal function that calculates running enrichment scores for each gene set in the specified SeqGeneSet object.

Usage

calES(gene.set, gene.score, weighted.type = 1)

Arguments

`gene.set`	a SeqGeneSet object.
`gene.score`	a vector of gene scores corresponding to the `geneList` slot of `gene.set`.
`weighted.type`	gene score weight type.
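
The idea of a running enrichment score can be illustrated with a minimal base R sketch in the style of classic GSEA. This is not the SeqGSEA implementation: `running_es` is a hypothetical helper, and it assumes that `weighted.type` acts as an exponent on the absolute gene score (so 0 gives equal steps).

```r
# Illustrative running enrichment score; hypothetical helper, not calES itself.
running_es <- function(set.idx, gene.score, weighted.type = 1) {
  ord <- order(gene.score, decreasing = TRUE)    # rank genes by score
  in.set <- ord %in% set.idx                     # TRUE where a ranked gene is in the set
  w <- abs(gene.score[ord])^weighted.type        # step weights (assumed exponent form)
  hit <- cumsum(ifelse(in.set, w, 0)) / sum(w[in.set])
  miss <- cumsum(!in.set) / sum(!in.set)
  hit - miss                                     # running score at each rank
}

gene.score <- c(g1 = 5, g2 = 4, g3 = 3, g4 = 2, g5 = 1)
set.idx <- c(1, 3)                               # indexes into gene.score
res <- running_es(set.idx, gene.score, weighted.type = 1)
es <- res[which.max(abs(res))]                   # most extreme deviation from zero
```

The running score always returns to zero at the end of the ranked list, and the enrichment score is taken here as its most extreme deviation.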

`gene.set`	a SeqGeneSet object.
`gene.score.perm`	a matrix of gene scores on the permutation data sets.
`weighted.type`	gene score weight type.

`dds`	a DESeqDataSet object, can be the output of `runDESeq`.
`permuteMat`	a permutation matrix generated by `genpermuteMat`.

`DEGres`	the output of `DENBStat4GSEA`.
`permuteNBstat`	the output of `DENBStatPermut4GSEA`.
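
One common way to turn observed and permuted statistics into p-values is to count how often the permuted statistic is at least as large as the observed one. The sketch below assumes that approach; `perm_pval` is a hypothetical helper and the package may compute its p-values differently.

```r
# Hypothetical helper: empirical p-value per gene from permutation statistics.
# obs:  numeric vector of observed statistics, one per gene
# perm: numeric matrix, rows = genes, columns = permutations
perm_pval <- function(obs, perm) {
  stopifnot(length(obs) == nrow(perm))
  # obs is recycled down the columns, so row i is compared with obs[i]
  rowMeans(perm >= obs)
}

obs <- c(3, 0.5)
perm <- matrix(c(1,   2,   4,   5,
                 0.1, 0.2, 0.3, 0.4),
               nrow = 2, byrow = TRUE)
p <- perm_pval(obs, perm)
```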

`RCS`	a ReadCountSet object after running `exonTestability`.
`permuteMat`	a permutation matrix generated by `genpermuteMat`.

`RCS`	a ReadCountSet object after running `estiExonNBstat` and `estiGeneNBstat`.
`permuteMat`	a permutation matrix generated by `genpermuteMat`.

`RCS`	a ReadCountSet object.
`cutoff`	exons with read counts less than this cutoff are to be marked as untestable.
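
The cutoff rule can be sketched in base R as follows. The example assumes that the read count compared with `cutoff` is the exon's total across all samples; the count matrix and exon names are made up for illustration.

```r
# Toy exon-by-sample count matrix (illustrative values only).
counts <- matrix(c(0,  1,  2,
                   10, 12, 9,
                   1,  1,  1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("E001", "E002", "E003"),
                                 c("S1", "S2", "S3")))
cutoff <- 5

# An exon is treated as testable when its total count reaches the cutoff.
testable <- rowSums(counts) >= cutoff
```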

`DEscoreMat`	normalized DE scores on permutation data sets.
`DSscoreMat`	normalized DS scores on permutation data sets.
`method`	one of the integration methods: linear, quadratic, or rank; default: linear.
`DEweight`	any number between 0 and 1 (inclusive), the weight of differential expression scores (the weight for differential splicing is (1-DEweight)).

`DEscore`	normalized DE scores.
`DSscore`	normalized DS scores.
`method`	one of the integration methods: linear, quadratic, or rank; default: linear.
`DEweight`	any number between 0 and 1 (inclusive), the weight of differential expression scores (the weight for differential splicing is (1-DEweight)).
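
A minimal sketch of how `DEweight` could combine the two normalized scores is given below. It assumes that the linear method is a weighted sum and the quadratic method a weighted Euclidean norm; `integrate_scores` is a hypothetical helper, the rank method is omitted, and the package's own formulas should be checked before relying on these.

```r
# Hypothetical helper: combine DE and DS scores with a DE weight.
integrate_scores <- function(DEscore, DSscore,
                             method = c("linear", "quadratic"),
                             DEweight = 0.5) {
  method <- match.arg(method)
  stopifnot(DEweight >= 0, DEweight <= 1,
            length(DEscore) == length(DSscore))
  switch(method,
         # weighted sum (assumed form of the linear method)
         linear = DEweight * DEscore + (1 - DEweight) * DSscore,
         # weighted Euclidean norm (assumed form of the quadratic method)
         quadratic = sqrt(DEweight * DEscore^2 + (1 - DEweight) * DSscore^2))
}

DEscore <- c(2, 0)
DSscore <- c(0, 4)
lin  <- integrate_scores(DEscore, DSscore, "linear", DEweight = 0.5)
quad <- integrate_scores(DEscore, DSscore, "quadratic", DEweight = 0.5)
```

With `DEweight = 1` both forms reduce to the DE score alone, which matches the description of the DS weight as (1-DEweight).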

`obj`	a ReadCountSet object or a label vector. This function needs the original sample label information to generate the permutation matrix.
`times`	an integer indicating the number of permutations.
`seed`	an integer or NULL, used to set the random seed for generating the permutation matrix. The same seed generates the same permutation matrix, which makes results reproducible.
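
The role of `times` and `seed` can be illustrated with a small base R sketch. It assumes that each column of the permutation matrix is one random shuffle of the sample labels; `make_perm_mat` is a hypothetical helper, not the package function.

```r
# Hypothetical helper: one shuffled copy of the labels per column.
make_perm_mat <- function(label, times, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # same seed, same matrix
  vapply(seq_len(times), function(i) sample(label), label)
}

label <- c(0, 0, 1, 1)                 # two controls, two cases
m1 <- make_perm_mat(label, times = 10, seed = 42)
m2 <- make_perm_mat(label, times = 10, seed = 42)
```

Calling the helper twice with the same seed returns identical matrices, which is the reproducibility property described for `seed`.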

`gene.set`	a SeqGeneSet object after running `GSEnrichAnalyze`.
`GSDesc`	logical indicating whether to output gene set descriptions. default: FALSE

`case.files`	a character vector containing the exon count file names for case samples
`control.files`	a character vector containing the exon count file names for control samples

`geneset.file`	the file containing the gene set annotation.
`geneIDs`	gene IDs that have expression values in the studied data set.
`geneID.type`	indicating the type of gene IDs, gene symbol or Ensembl gene IDs.
`genesetsize.min`	the minimum number of genes in a gene set that will be treated in the analysis.
`genesetsize.max`	the maximum number of genes in a gene set that will be treated in the analysis.
`singleCell`	logical, whether to create a SeqGeneSet object for scGSEA.

`GS`	a list in which each element is an integer vector giving the indexes of the genes in one gene set. See Details below.
`GSNames`	a character vector; each element is the name of a gene set.
`GSDescs`	a character vector; each element is the description of a gene set.
`geneList`	a character vector of gene IDs. See Details below.
`scGSEA`	logical, whether this object is used for scGSEA.
`name`	the name of this category of gene sets.
`sourceFile`	the source file name of this category of gene sets.
`GSSizeMin`	the minimum number of genes in a gene set to be analyzed. Default: 5
`GSSizeMax`	the maximum number of genes in a gene set to be analyzed. Default: 1000
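
The relationship between `GS`, `GSNames`, and `geneList` can be sketched in base R. The example assumes that the integer vectors in `GS` index into `geneList` and that both size limits are inclusive; the gene IDs and set names are made up.

```r
geneList <- c("A", "B", "C", "D", "E", "F")      # illustrative gene IDs
GS <- list(c(1L, 2L, 3L, 4L, 5L),                # gene set 1: indexes into geneList
           c(2L, 6L))                            # gene set 2: too small to keep
GSNames <- c("set1", "set2")
GSSizeMin <- 5
GSSizeMax <- 1000

# Keep only gene sets whose size lies within [GSSizeMin, GSSizeMax].
sizes <- lengths(GS)
keep <- sizes >= GSSizeMin & sizes <= GSSizeMax

# Map the kept index vectors back to gene IDs.
kept.genes <- lapply(GS[keep], function(idx) geneList[idx])
names(kept.genes) <- GSNames[keep]
```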

`readCounts`	a data frame of read counts for each exon in each sample. Must have colnames, which indicate the labels of the samples.
`exonIDs`	a character vector indicating exon IDs.
`geneIDs`	a character vector indicating gene IDs.

`gene.set`	a SeqGeneSet object after running `GSEnrichAnalyze`.
`pdf`	whether to save the plot to PDF file; if yes, provide the name of the PDF file.

`score`	the gene/DE/DS score vector.
`perm.score`	a matrix of the corresponding gene/DE/DS scores on the permutation data sets.
`pdf`	if a PDF file name is provided, the plot will be saved to that file.
`main`	the key words representing the type of scores that will be shown in the plot main title.

`gene.set`	a SeqGeneSet object after running `GSEnrichAnalyze`.
`i`	the i-th gene set in the SeqGeneSet object. `topGeneSets` is useful to find the most significantly overrepresented gene set.
`gene.score`	the gene score vector containing gene scores for each gene.
`pdf`	whether to save the plot to PDF file; if yes, provide the name of the PDF file.

`DEscore`	differential expression scores, normalized.
`DSscore`	differential splice scores, normalized.
`DEscoreMat`	differential expression scores in permuted data sets, normalized.
`DSscoreMat`	differential splice scores in permuted data sets, normalized.
`DEweight`	any number between 0 and 1 (inclusive), the weight of differential expression scores (so the weight for differential splicing is (1-DEweight)).

`geneCounts`	a matrix containing read counts for each gene, can be the output of `getGeneCount`.
`label`	the sample classification labels.

`data.dir`	a character vector, the path to your count data directory.
`case.pattern`	a character vector, the unique pattern in the file names of case samples. E.g., if the file names start with "SC", the pattern is "^SC".
`ctrl.pattern`	a character vector, the unique pattern in the file names of control samples.
`geneset.file`	a character vector, the path to your gene set file. The gene set file must be in GMT format. See the following link for details: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
`output.prefix`	a character vector, the path with prefix for output files.
`topGS`	an integer, the number of top-ranked gene sets to output with details; if geneset.file contains fewer gene sets than this number, result details are output for all gene sets. Default: 10.
`geneID.type`	the gene ID type in geneset.file. Currently only "gene.symbol" and "ensembl" are supported. Default: gene.symbol.
`nCores`	an integer. The number of cores for running SeqGSEA. Default: 1
`perm.times`	an integer. The number of times for permutation, which will be used for normalizing DE and DS scores and for GSEA significance analysis. Recommended values are greater than 1000. Default: 1000.
`seed`	an integer or NULL, used for setting the seeds to generate random numbers. The same seed will guarantee the same analysis results given by SeqGSEA. Default: NULL.
`minExonReadCount`	an integer. An exon with total read count across all samples less than this number will be marked as untestable and be excluded in SeqGSEA analysis. Default: 5.
`integrationMethod`	one of the three integration methods for DE and DS score integration: linear, quadratic, or rank. Default: linear.
`DEweight`	a real number between 0 and 1 OR a vector of those. Each number is the DE weight in DE and DS integration. If using a vector of real numbers, SeqGSEA will run with each of them individually. Default: 0.5.
`DEonly`	logical, whether to run SeqGSEA only considering DE. Default: FALSE
`minGSsize`	an integer. The minimum gene set size: gene sets with fewer genes than this number will be skipped. Default: 5.
`maxGSsize`	an integer. The maximum gene set size: gene sets with more genes than this number will be skipped. Default: 1000.
`GSEA.WeightedType`	the weight type of the main GSEA algorithm, can be 0 (unweighted = Kolmogorov-Smirnov), 1 (weighted), and 2 (over-weighted). Default: 1. It is recommended not to change it.

`scores`	a vector (an n x 1 matrix) or a matrix of scores, with rows corresponding to genes and columns corresponding to a study or permutation.
`norm.factor`	normalization factor, output of the function `normFactor`.
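
The interplay of `scores` and `norm.factor` can be sketched as below. This assumes, purely for illustration, that the normalization factor is the per-gene mean of the scores across permutations and that normalization is a per-gene division; `norm_factor` and `normalize_scores` are hypothetical helpers and may not match what `normFactor` actually computes.

```r
# Hypothetical helpers (assumed definitions, see the note above).
norm_factor <- function(perm.scores) rowMeans(perm.scores)       # one factor per gene
normalize_scores <- function(scores, norm.factor) scores / norm.factor

perm.scores <- matrix(c(1, 3,
                        2, 6),
                      nrow = 2, byrow = TRUE)    # rows = genes, columns = permutations
nf <- norm_factor(perm.scores)

obs.norm  <- normalize_scores(c(4, 4), nf)       # observed score vector
perm.norm <- normalize_scores(perm.scores, nf)   # each row divided by its own factor
```

Because R recycles the factor vector down the columns, the same division works for a score vector and for a gene-by-permutation matrix.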

`DEGres`	DE analysis results.
`n`	the number of top DE genes.
`sortBy`	indicating which method to rank genes.

`RCS`	a ReadCountSet object after running `DSpermutePval`.
`n`	the number of top genes.
`sortBy`	indicating whether p-value or NBstat is to be used for ranking genes.

`gene.set`	an object of class SeqGeneSet after GSEA runs.
`n`	the number of top gene sets.
`sortBy`	indicating which method to rank gene sets.
`GSDesc`	logical indicating whether or not to output gene set descriptions.

`gene.set`	an object of class SeqGeneSet after running `GSEnrichAnalyze`.
`i`	the i-th gene set in the SeqGeneSet object. `topGeneSets` is useful to find the most significantly overrepresented gene set.
`gene.score`	the vector of gene scores for running GSEA.
`file`	output file name; if not specified, the output is printed to the screen.