Title: | Gene Set Enrichment Analysis (GSEA) of RNA-Seq Data: integrating differential expression and splicing |
---|---|
Description: | The package generally provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, statistical significance can also be achieved regarding each gene's differential expression and splicing, respectively. |
Authors: | Xi Wang <[email protected]> |
Maintainer: | Xi Wang <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.47.0 |
Built: | 2024-10-31 05:21:49 UTC |
Source: | https://github.com/bioc/SeqGSEA |
SeqGSEA is an R package for gene set enrichment analysis of RNA-Seq data with the ability to integrate differential expression and differential splice in functional analysis.
Package: | SeqGSEA |
Type: | Package |
License: | GPL (>= 3) |
A User's Guide is available as well as the usual help page documentation for each of the individual functions.
The most useful functions are listed below:
* ReadCountSet class
* SeqGeneSet class
* Load data
* DE analysis
* DS analysis
* GSEA main
* Result tables
* Result displays
* Miscellaneous
Xi Wang and Murray J. Cairns
Maintainer: Xi Wang <[email protected]>
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
This is an internal function to calculate running enrichment scores of each gene set in the SeqGeneSet object specified
calES(gene.set, gene.score, weighted.type = 1)
calES(gene.set, gene.score, weighted.type = 1)
gene.set |
a SeqGeneSet object. |
gene.score |
a vector of gene scores corresponding to the |
weighted.type |
gene score weight type. |
Xi Wang, [email protected]
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(GS_example, package="SeqGSEA") rES <- calES(GS_example, gene.score) rES[1,]
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(GS_example, package="SeqGSEA") rES <- calES(GS_example, gene.score) rES[1,]
This is an internal function to calculate enrichment scores for gene sets in the permutation data sets.
calES.perm(gene.set, gene.score.perm, weighted.type = 1)
calES.perm(gene.set, gene.score.perm, weighted.type = 1)
gene.set |
a SeqGeneSet object. |
gene.score.perm |
a matrix of gene scores on the permutation data sets. |
weighted.type |
gene score weight type. |
Xi Wang, [email protected]
data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") ES.perm <- calES.perm(GS_example, gene.score.perm) ES.perm[1:5,1:5]
data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") ES.perm <- calES.perm(GS_example, gene.score.perm) ES.perm[1:5,1:5]
Convert ensembl gene IDs to gene symbols
convertEnsembl2Symbol(ensembl.genes)
convertEnsembl2Symbol(ensembl.genes)
ensembl.genes |
ensembl gene ID(s). |
A 2-column matrix showing the correspondence of ensembl gene IDs and gene symbols.
Xi Wang, [email protected]
## Not run: convertEnsembl2Symbol("ENSG00000162946") #DISC1 ## End(Not run)
## Not run: convertEnsembl2Symbol("ENSG00000162946") #DISC1 ## End(Not run)
Convert gene symbols to ensembl gene IDs
convertSymbol2Ensembl(symbols)
convertSymbol2Ensembl(symbols)
symbols |
gene symbol(s). |
A 2-column matrix showing the correspondence of gene symbols and ensembl gene IDs.
Xi Wang, [email protected]
## Not run: convertSymbol2Ensembl("DISC1") #ENSG00000162946 ## End(Not run)
## Not run: convertSymbol2Ensembl("DISC1") #ENSG00000162946 ## End(Not run)
Accessors for the 'counts' slot of a ReadCountSet object.
## S4 method for signature 'ReadCountSet' counts(object) ## S4 replacement method for signature 'ReadCountSet,matrix' counts(object) <- value
## S4 method for signature 'ReadCountSet' counts(object) ## S4 replacement method for signature 'ReadCountSet,matrix' counts(object) <- value
object |
a ReadCountSet object |
value |
a matrix of read counts |
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") readCounts <- counts(RCS_example) head(readCounts)
data(RCS_example, package="SeqGSEA") readCounts <- counts(RCS_example) head(readCounts)
Calculate NB-statistics quantifying differential expression between two groups of samples compared.
The results will be used for GSEA run. Comparing with DENBTest
, this function will not
calculate NB test p-values.
This function only works with two-group comparison.
DENBStat4GSEA(dds)
DENBStat4GSEA(dds)
dds |
A DESeqDataSet object with size factors and dispersion parameters estimated. Recommended to take the output of |
A data frame containing each gene's expression means and variances in each group, and each gene's DE NB-statistics.
The results with the output of DENBStatPermut4GSEA
can also be used to run DEpermutePval
.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
DENBTest
,
runDESeq
,
DENBStatPermut4GSEA
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) DEGres <- DENBStat4GSEA(DEG) head(DEGres)
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) DEGres <- DENBStat4GSEA(DEG) head(DEGres)
Calculate NB-statistics quantifying differential expression for each gene in the permutation data sets. The results will be used for GSEA run.
DENBStatPermut4GSEA(dds, permuteMat)
DENBStatPermut4GSEA(dds, permuteMat)
dds |
a DESeqDataSet object, can be the output of |
permuteMat |
a permutation matrix generated by |
A matrix of NB-statistics. Each row corresponds to each gene, and each column to each permutation.
The results with the output of DENBStat4GSEA
can also be used to run DEpermutePval
.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
DENBStat4GSEA
,
runDESeq
,
DEpermutePval
,
genpermuteMat
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) dds <- runDESeq(geneCounts, label) DEpermNBstat <- DENBStatPermut4GSEA(dds, permuteMat) DEpermNBstat[1:10,1:10]
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) dds <- runDESeq(geneCounts, label) DEpermNBstat <- DENBStatPermut4GSEA(dds, permuteMat) DEpermNBstat[1:10,1:10]
Perform negative binomial exact test for differential expression - a modified version of nbinomTest in DESeq package.
DENBTest(dds)
DENBTest(dds)
dds |
A DESeqDataSet object with size factors and dispersion parameters estimated. Recommended to take the output of |
A data frame of the test results. Information contains mean expression values, NB-statistics, (log) fold-changes, p-values, and adjusted p-values.
Xi Wang, [email protected]
Anders, S. and Huber, W. (2010) Differential expression analysis for sequence count data, Genome Biol, 11, R106.
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) DEGres <- DENBTest(DEG) head(DEGres)
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) DEGres <- DENBTest(DEG) head(DEGres)
Calculate permutation p-values in differential expression analysis for each genes.
DEpermutePval(DEGres, permuteNBstat)
DEpermutePval(DEGres, permuteNBstat)
DEGres |
the output of |
permuteNBstat |
the output of |
A data frame containing the expression means and variances for each gene in each group compared, and NB-stats, permutation p-values and adjusted p-values for each gene.
Xi Wang, [email protected]
runDESeq
,
DENBStat4GSEA
,
DENBStatPermut4GSEA
,
DENBTest
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) DEGres <- DENBStat4GSEA(DEG) DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat) DEGres <- DEpermutePval(DEGres, DEpermNBstat) head(DEGres)
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) DEGres <- DENBStat4GSEA(DEG) DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat) DEGres <- DEpermutePval(DEGres, DEpermNBstat) head(DEGres)
DEscore
and DSscore
are pre-calculated DE and DS scores, respectively;
DEscore.perm
and DSscore.perm
are pre-calculated DE and DS scores on the permutation data sets, respectively;
They are used in examples of the SeqGSEA
package.
Note that these scores are of no meaning but to demonstrate the usage of functions.
data("DEscore") data("DEscore.perm") data("DSscore") data("DSscore.perm")
data("DEscore") data("DEscore.perm") data("DSscore") data("DSscore.perm")
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
This function is to calculate NB-statistics quantifying differential splicing for each gene on each permutation data set. The results will be used for GSEA run as DS background.
DSpermute4GSEA(RCS, permuteMat)
DSpermute4GSEA(RCS, permuteMat)
RCS |
a ReadCountSet object after running |
permuteMat |
a permutation matrix generated by |
Parallel running configuration: TODO
A ReadCountSet object with slot permute_NBstat_gene updated.
Please run exonTestability
before run this function.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
exonTestability
,
genpermuteMat
,
DENBStatPermut4GSEA
,
DSpermutePval
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat) head(RCS_example@permute_NBstat_gene)
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat) head(RCS_example@permute_NBstat_gene)
Calculate permutation p-values in differential splicing analysis.
DSpermutePval(RCS, permuteMat)
DSpermutePval(RCS, permuteMat)
RCS |
a ReadCountSet object after running |
permuteMat |
a permutation matrix generated by |
Permutation p-values are computed based on NB-statistics for comparison of the studied groups and NB-statistics from the permutation data sets.
A ReadCountSet object with slots permute_NBstat_exon
, permute_NBstat_gene
, featureData
, and featureData_gene
updated.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
estiExonNBstat
,
estiGeneNBstat
,
genpermuteMat
,
DSpermute4GSEA
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) head(DSresultExonTable(RCS_example)) head(DSresultGeneTable(RCS_example))
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) head(DSresultExonTable(RCS_example)) head(DSresultGeneTable(RCS_example))
Form a table for differential splicing analysis results at the Exon level.
DSresultExonTable(RCS)
DSresultExonTable(RCS)
RCS |
A ReadCountSet object with |
A data frame containing each exon's NB-statistics, p-values and adjusted p-values for differential splicing analysis.
A matrix containing exon DS analysis results, including testability, NBstats, p-values and adjusted p-values.
Xi Wang, [email protected]
DSresultGeneTable
,
DSpermutePval
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) head(DSresultExonTable(RCS_example))
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) head(DSresultExonTable(RCS_example))
Form a table for differential splicing analysis results at the gene level.
DSresultGeneTable(RCS)
DSresultGeneTable(RCS)
RCS |
A ReadCountSet object with |
A data frame containing each gene's NB-statistics, p-values and adjusted p-values for differential splicing analysis.
Xi Wang, [email protected]
DSresultExonTable
,
DSpermutePval
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) head(DSresultGeneTable(RCS_example))
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) head(DSresultGeneTable(RCS_example))
Calculate NB-statistics quantifying differential splicing for individual exons between two groups of samples compared.
estiExonNBstat(RCS)
estiExonNBstat(RCS)
RCS |
a ReadCountSet object after running |
A ReadCountSet object with the slot featureData
updated.
Please run exonTestability
before you run this function.
Xi Wang, [email protected]
Weichen Wang, Zhiyi Qin, Zhixing Feng, Xi Wang and Xuegong Zhang (2013). Identifying differentially spliced genes from two groups of RNA-seq samples. Gene, 518(1):164-170.
exonTestability
,
estiGeneNBstat
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) RCS_example <- estiExonNBstat(RCS_example) head(fData(RCS_example))
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) RCS_example <- estiExonNBstat(RCS_example) head(fData(RCS_example))
Calculate NB-statistics quantifying differential splicing for each gene between two groups of samples compared. The results will be used for GSEA run (as DS-scores).
estiGeneNBstat(RCS)
estiGeneNBstat(RCS)
RCS |
a ReadCountSet object after running |
A ReadCountSet object with slot featureData_gene
updated.
Please run estiExonNBstat
before run this function.
Xi Wang, [email protected]
Weichen Wang, Zhiyi Qin, Zhixing Feng, Xi Wang and Xuegong Zhang (2013). Identifying differentially spliced genes from two groups of RNA-seq samples. Gene, 518(1):164-170.
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) head(RCS_example@featureData_gene)
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) head(RCS_example@featureData_gene)
Accessor to the exonID slot of ReadCountSet objects
exonID(RCS) exonID(RCS) <- value
exonID(RCS) exonID(RCS) <- value
RCS |
a ReadCountSet object |
value |
a vector of exon IDs |
A character vector of exon IDs; or a ReadCountSet object.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") exonID(RCS_example)
data(RCS_example, package="SeqGSEA") exonID(RCS_example)
Check exon testability, filtering out exons with very few (default: 5) read counts
exonTestability(RCS, cutoff = 5)
exonTestability(RCS, cutoff = 5)
RCS |
a ReadCountSet object. |
cutoff |
exons with read counts less than this cutoff are to be marked as untestable. |
a ReadCountSet object with slot fData
updated.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) head(fData(RCS_example))
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) head(fData(RCS_example))
Accessor to the geneID slot of ReadCountSet objects
geneID(RCS) geneID(RCS) <- value
geneID(RCS) geneID(RCS) <- value
RCS |
a ReadCountSet object |
value |
a vector of gene IDs |
A character vector of gene IDs, which can be duplicated; or a ReadCountSet object.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") geneID(RCS_example)
data(RCS_example, package="SeqGSEA") geneID(RCS_example)
Get the gene list in a SeqGeneSet object
geneList(GS)
geneList(GS)
GS |
A SeqGeneSet object. |
TBA
A vector of gene IDs.
Xi Wang, [email protected]
loadGenesets
,
SeqGeneSet-class
## gs <- newGeneSets(GS=list(1:10, 6:15, 11:20), geneList=paste("Gene", 1:22, sep=""), GSNames=c("gs1","gs2","gs3"), GSDescs=c("test1","test2","test3"), name="gs examples") geneList(gs) ## End
## gs <- newGeneSets(GS=list(1:10, 6:15, 11:20), geneList=paste("Gene", 1:22, sep=""), GSNames=c("gs1","gs2","gs3"), GSDescs=c("test1","test2","test3"), name="gs examples") geneList(gs) ## End
Calculate gene scores on permutation data sets
genePermuteScore(DEscoreMat, DSscoreMat = NULL, method = c("linear", "quadratic", "rank"), DEweight = 0.5)
genePermuteScore(DEscoreMat, DSscoreMat = NULL, method = c("linear", "quadratic", "rank"), DEweight = 0.5)
DEscoreMat |
normalized DE scores on permutation data sets. |
DSscoreMat |
normalized DS scores on permutation data sets. |
method |
one of the integration methods: linear, quadratic, or rank; default: linear. |
DEweight |
any number between 0 and 1 (included), the weight of differential expression scores (the weight for differential splice is (1-DEweight)). |
The integration methods including "linear", "quadratic", and "rank" are detailed in Wang and Cairns (2013). Here the rank method refers only to the method using data-set-specific ranks.
For DE-only analysis, just specify DEweight to be 1, and the DSscoreMat value can be NULL.
A gene score matrix.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") # linear combination with weight for DE 0.3 gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) # DE only analysis gene.score.perm <- genePermuteScore(DEscore.perm, DEweight=1)
data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") # linear combination with weight for DE 0.3 gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) # DE only analysis gene.score.perm <- genePermuteScore(DEscore.perm, DEweight=1)
Calculate gene scores by integrating DE and DS scores
geneScore(DEscore, DSscore = NULL, method = c("linear", "quadratic", "rank"), DEweight = 0.5)
geneScore(DEscore, DSscore = NULL, method = c("linear", "quadratic", "rank"), DEweight = 0.5)
DEscore |
normalized DE scores. |
DSscore |
normalized DS scores. |
method |
one of the integration methods: linear, quadratic, or rank; default: linear. |
DEweight |
any number between 0 and 1 (included), the weight of differential expression scores (the weight for differential splice is (1-DEweight)). |
The integration methods including "linear", "quadratic", and "rank" are detailed in Wang and Cairns (2013). Here the rank method refers only to the method using data-set-specific ranks.
For DE-only analysis, just specify DEweight to be 1, and the DSscore value can be NULL.
A vector of gene scores.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") # linear combination with weight for DE 0.3 gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) # DE only analysis gene.score <- geneScore(DEscore, DEweight = 1)
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") # linear combination with weight for DE 0.3 gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) # DE only analysis gene.score <- geneScore(DEscore, DEweight = 1)
Get the descriptions of gene sets in a SeqGeneSet object
geneSetDescs(GS)
geneSetDescs(GS)
GS |
a SeqGeneSet object. |
Gene sets with size less than GSSizeMin
or more than GSSizeMax
are not included.
A vector of descriptions of each gene set in the SeqGeneSet object.
Xi Wang, [email protected]
geneSetNames
,
geneSetSize
,
SeqGeneSet-class
,
loadGenesets
data(GS_example, package="SeqGSEA") geneSetDescs(GS_example)
data(GS_example, package="SeqGSEA") geneSetDescs(GS_example)
Get the names of gene set in a SeqGeneSet object
geneSetNames(GS)
geneSetNames(GS)
GS |
a SeqGeneSet object. |
Gene sets with size less than GSSizeMin
or more than GSSizeMax
are not included.
A vector of gene set names in this SeqGeneSet object.
Xi Wang, [email protected]
geneSetDescs
,
geneSetSize
,
SeqGeneSet-class
,
loadGenesets
data(GS_example, package="SeqGSEA") geneSetNames(GS_example)
data(GS_example, package="SeqGSEA") geneSetNames(GS_example)
Get the numbers of genes in each gene set in a SeqGeneSet object
geneSetSize(GS)
geneSetSize(GS)
GS |
a SeqGeneSet object. |
Gene sets with size less than GSSizeMin
or more than GSSizeMax
are not included.
A vector of integers indicating the number of genes in each gene set in this SeqGeneSet object.
Xi Wang, [email protected]
geneSetNames
,
geneSetDescs
,
SeqGeneSet-class
,
loadGenesets
data(GS_example, package="SeqGSEA") geneSetSize(GS_example)
data(GS_example, package="SeqGSEA") geneSetSize(GS_example)
This function is to determine each gene's testability. A gene is testable if at least one of its exons are testable.
geneTestability(RCS)
geneTestability(RCS)
RCS |
a ReadCountSet object after exon testability checked, usually the output of |
This result can applied to filter out genes not expressed.
A logical vector indicating which genes are testable, i.e., having at least one exon testable.
Please run exonTestability
before run this function.
Xi Wang, [email protected]
exonTestability
,
subsetByGenes
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) geneTestable <- geneTestability(RCS_example) head(geneTestable)
data(RCS_example, package="SeqGSEA") RCS_example <- exonTestability(RCS_example, cutoff=5) geneTestable <- geneTestability(RCS_example) head(geneTestable)
Generate permutation matrix from ReadCountSet objects or from label vectors.
genpermuteMat(obj, times = 1000, seed = NULL)
genpermuteMat(obj, times = 1000, seed = NULL)
obj |
a ReadCountSet object or a label vector. This function needs the original sample label information to generate permutation matrix. |
times |
an integer indication the times of permutation. |
seed |
an integer or NULL, to produce the random seed (an integer vector) for generating random permutation matrix: the same seed generates the same permutation matrix, which is introduced for reproducibility. |
A sample label shuffled matrix, rows corresponding to samples and columns for each permutation.
Xi Wang, [email protected]
DSpermute4GSEA
,
DENBStatPermut4GSEA
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10, seed=0) RCS_example <- exonTestability(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat)
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10, seed=0) RCS_example <- exonTestability(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat)
Calculate read counts of genes from a ReadCountSet object
getGeneCount(RCS)
getGeneCount(RCS)
RCS |
a ReadCountSet object |
This function can be used to get gene read counts from exon read counts.
a matrix of gene read counts for each gene (row) and each sample (col).
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example)
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example)
An exemplified SeqGeneSet object to demonstrate functions in the SeqGSEA
package.
This object was generated with collection #6 (C6) gene sets of the Molecular Signatures
Database (MSigDB) v3.1.
data("GS_example")
data("GS_example")
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., and Mesirov, J. P. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 102(43): 15545-50.
Form a table for GSEA results.
GSEAresultTable(gene.set, GSDesc = FALSE)
GSEAresultTable(gene.set, GSDesc = FALSE)
gene.set |
a SeqGeneSet object after running |
GSDesc |
logical indicating whether to output gene set descriptions. default: FALSE |
A data frame containing columns of GSName, GSSize, ES, ES.pos, pval, FDR, and FWER.
Xi Wang, [email protected]
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) head(GSEAresultTable(GS_example))
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) head(GSEAresultTable(GS_example))
The main function of gene set enrichment analysis
GSEnrichAnalyze(gene.set, gene.score, gene.score.perm, weighted.type = 1)
GSEnrichAnalyze(gene.set, gene.score, gene.score.perm, weighted.type = 1)
gene.set |
a SeqGeneSet object. |
gene.score |
a vector of integrated gene scores in the same order as genes listed in the |
gene.score.perm |
a matrix of integrated gene scores on permutation data sets; row: genes; col: permutation. |
weighted.type |
weight type for gene scores; default: 1. |
A SeqGeneSet object with many slots updated, such as GSEA.ES
and GSEA.pval
.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, 5)
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, 5)
Get the labels of samples in a ReadCountSet object
label(RCS)
label(RCS)
RCS |
a ReadCountSet object |
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") label(RCS_example)
data(RCS_example, package="SeqGSEA") label(RCS_example)
This function is used to load (sub-)exon count data. Exon count data can
be got by the Python script count_in_exons.py
.
loadExonCountData(case.files, control.files)
loadExonCountData(case.files, control.files)
case.files |
a character vector containing the exon count file names for case samples |
control.files |
a character vector containing the exon count file names for control samples |
You may need the Python script count_in_exons.py (released with this package)
to generate your exon count files from read mapping results (say BAM files).
The detailed usage can be obtained by simply typing
python \path\to\count_in_exons.py
. Users can also use other scripts or
software for exon read counting.
The format of the exon count file is:
GeneName1:001[tab]Count11 GeneName1:002[tab]Count12 ... GeneName1:00N[tab]Count1N GeneName2:001[tab]Count21 ...
This function returns a ReadCountSet object.
Xi Wang, [email protected]
newReadCountSet
,
ReadCountSet-class
library(SeqGSEA) dat.dir = system.file("extdata", package="SeqGSEA", mustWork=TRUE) case.pattern <- "^SC" ctrl.pattern <- "^SN" case.files <- dir(dat.dir, pattern=case.pattern, full.names = TRUE) control.files <- dir(dat.dir, pattern=ctrl.pattern, full.names = TRUE) ## Not run: RCS <- loadExonCountData(case.files, control.files) RCS ## End(Not run)
library(SeqGSEA) dat.dir = system.file("extdata", package="SeqGSEA", mustWork=TRUE) case.pattern <- "^SC" ctrl.pattern <- "^SN" case.files <- dir(dat.dir, pattern=case.pattern, full.names = TRUE) control.files <- dir(dat.dir, pattern=ctrl.pattern, full.names = TRUE) ## Not run: RCS <- loadExonCountData(case.files, control.files) RCS ## End(Not run)
This function is to load annotation of gene sets from files. The files are in the format of Molecular Signatures Database (MSigDB), and those files can be downloaded at http://www.broadinstitute.org/gsea/msigdb/index.jsp.
loadGenesets(geneset.file, geneIDs, geneID.type = c("gene.symbol", "ensembl"), genesetsize.min = 5, genesetsize.max = 1000, singleCell = FALSE)
loadGenesets(geneset.file, geneIDs, geneID.type = c("gene.symbol", "ensembl"), genesetsize.min = 5, genesetsize.max = 1000, singleCell = FALSE)
geneset.file |
the file containing the gene set annotation. |
geneIDs |
gene IDs that have expression values in the studied data set. |
geneID.type |
indicating the type of gene IDs, gene symbol or emsembl gene IDs. |
genesetsize.min |
the minimum number of genes in a gene set that will be treated in the analysis. |
genesetsize.max |
the maximum number of genes in a gene set that will be treated in the analysis. |
singleCell |
logical, whether to creat a SeqGeneSet object for scGSEA. |
TBA
A SeqGeneSet object.
Xi Wang, [email protected]
## Not run: data(RCS_example, package="SeqGSEA") geneIDs <- geneID(RCS_example) geneID.type <- "ensembl" geneset.file <- system.file("extdata", "gs_symb.txt", package="SeqGSEA", mustWork=TRUE) GS <- loadGenesets(geneset.file, geneIDs, geneID.type = geneID.type) GS ## End(Not run)
## Not run: data(RCS_example, package="SeqGSEA") geneIDs <- geneID(RCS_example) geneID.type <- "ensembl" geneset.file <- system.file("extdata", "gs_symb.txt", package="SeqGSEA", mustWork=TRUE) GS <- loadGenesets(geneset.file, geneIDs, geneID.type = geneID.type) GS ## End(Not run)
This is an internal function to generate a new SeqGeneSet object.
newGeneSets(GS, GSNames, GSDescs, geneList, scGSEA = FALSE, name = NA_character_, sourceFile = NA_character_, GSSizeMin = 5, GSSizeMax = 1000)
newGeneSets(GS, GSNames, GSDescs, geneList, scGSEA = FALSE, name = NA_character_, sourceFile = NA_character_, GSSizeMin = 5, GSSizeMax = 1000)
GS |
a list, each element is an integer vector, indicating the indexes of genes in each gene set. See Details below. |
GSNames |
a character string vector, each is the name of each gene set. |
GSDescs |
a character string vector, each is the description of each gene set. |
geneList |
a character string vector of gene IDs. See Details below. |
scGSEA |
logical, if this object used for scGSEA. |
name |
the name of this category of gene sets. |
sourceFile |
the source file name of this category of gene sets. |
GSSizeMin |
the minimum number of genes in a gene set to be analyzed. Default: 5 |
GSSizeMax |
the maximum number of genes in a gene set to be analyzed. Default: 1000 |
TBA
A SeqGeneSet object.
Xi Wang, [email protected]
loadGenesets
,
SeqGeneSet-class
## gs <- newGeneSets(GS=list(1:10, 6:15, 11:20), geneList=paste("Gene", 1:22, sep=""), GSNames=c("gs1","gs2","gs3"), GSDescs=c("test1","test2","test3"), name="gs examples") gs ## End
## gs <- newGeneSets(GS=list(1:10, 6:15, 11:20), geneList=paste("Gene", 1:22, sep=""), GSNames=c("gs1","gs2","gs3"), GSDescs=c("test1","test2","test3"), name="gs examples") gs ## End
This is a internal function to generate a new ReadCountSet object.
newReadCountSet(readCounts, exonIDs, geneIDs)
newReadCountSet(readCounts, exonIDs, geneIDs)
readCounts |
a data frame, read counts for each exon of each samples. Must have colnames, which indicate the label of samples. |
exonIDs |
a character vector indicating exon IDs. |
geneIDs |
a character vector indicating gene IDs. |
A object of the ReadCountSet class.
Xi Wang, [email protected]
loadExonCountData
,
ReadCountSet-class
rcounts <- cbind(t(sapply(1:10, function(x) {rnbinom(5, size=10, prob=runif(1))} ) ) , t(sapply(1:10, function(x) {rnbinom(5, size=10, prob=runif(1))} ) ) ) colnames(rcounts) <- c(paste("S", 1:5, sep=""), paste("C", 1:5, sep="")) geneIDs <- c(rep("G1", 4), rep("G2", 6)) exonIDs <- c(paste("E", 1:4, sep=""), paste("E", 1:6, sep="")) ## RCS <- newReadCountSet(rcounts, exonIDs, geneIDs) RCS ## End
rcounts <- cbind(t(sapply(1:10, function(x) {rnbinom(5, size=10, prob=runif(1))} ) ) , t(sapply(1:10, function(x) {rnbinom(5, size=10, prob=runif(1))} ) ) ) colnames(rcounts) <- c(paste("S", 1:5, sep=""), paste("C", 1:5, sep="")) geneIDs <- c(rep("G1", 4), rep("G2", 6)) exonIDs <- c(paste("E", 1:4, sep=""), paste("E", 1:6, sep="")) ## RCS <- newReadCountSet(rcounts, exonIDs, geneIDs) RCS ## End
This is an internal function to normalize enrichment scores. For advanced users only.
normES(gene.set)
normES(gene.set)
gene.set |
a SeqGeneSet object after running |
A SeqGeneSet object with ES scores normalized.
Xi Wang, [email protected]
Get normalization factors from permutation scores for normalization DE or DS scores
normFactor(permStat)
normFactor(permStat)
permStat |
a matrix of NB-statistics from permutation data sets, with row corresponding to genes and columns to permutations. |
A vector of normalization factors, each for one gene.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat) ## (not run) DSscore.normFac <- normFactor(RCS_example@permute_NBstat_gene) DSscore <- scoreNormalization(RCS_example@featureData_gene$NBstat, DSscore.normFac) DSscore.perm <- scoreNormalization(RCS_example@permute_NBstat_gene, DSscore.normFac) ## End (not run)
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat) ## (not run) DSscore.normFac <- normFactor(RCS_example@permute_NBstat_gene) DSscore <- scoreNormalization(RCS_example@featureData_gene$NBstat, DSscore.normFac) DSscore.perm <- scoreNormalization(RCS_example@permute_NBstat_gene, DSscore.normFac) ## End (not run)
This function is to plot the distribution of enrichment scores, with comparison with permutation enrichment scores.
plotES(gene.set, pdf = NULL)
plotES(gene.set, pdf = NULL)
gene.set |
a SeqGeneSet object after running |
pdf |
whether to save the plot to PDF file; if yes, provide the name of the PDF file. |
Xi Wang, [email protected]
GSEnrichAnalyze
,
plotSigGeneSet
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) plotES(GS_example)
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) plotES(GS_example)
This function is to plot gene scores, as well as DE scores and DS scores
plotGeneScore(score, perm.score = NULL, pdf = NULL, main = c("Overall", "Expression", "Splicing"))
plotGeneScore(score, perm.score = NULL, pdf = NULL, main = c("Overall", "Expression", "Splicing"))
score |
the gene/DE/DS score vector. |
perm.score |
a matrix of the corresponding gene/DE/DS scores on the permutation data sets. |
pdf |
if a PDF file name provided, plot will be save to that file. |
main |
the key words representing the type of scores that will be shown in the plot main title. |
The plot shows the ranked scores from the largest to the smallest. Lines also show the maximum and average scores, values shown on the top left.
Xi Wang, [email protected]
data(DEscore, package="SeqGSEA") plotGeneScore(DEscore, main="Expression") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) plotGeneScore(gene.score)
data(DEscore, package="SeqGSEA") plotGeneScore(DEscore, main="Expression") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) plotGeneScore(gene.score)
The function is to generate a plot of p-values (FDRs) versus normalized enrichment scores (NES). It also shows the distribution of p-values (FDRs) in this gene set category.
plotSig(gene.set, pdf = NULL)
plotSig(gene.set, pdf = NULL)
gene.set |
a SeqGeneSet object after running |
pdf |
whether to save the plot to PDF file; if yes, provide the name of the PDF file. |
Xi Wang, [email protected]
GSEnrichAnalyze
,
plotSigGeneSet
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) plotSig(GS_example)
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) plotSig(GS_example)
This function is to generate a two-panel plot showing detailed information of the gene set specified. One panel is showing the running enrichment scores and the position where the ES appear. The other panel shows the significance level of the ES, comparing with permutation ESs.
plotSigGeneSet(gene.set, i, gene.score, pdf = NULL)
plotSigGeneSet(gene.set, i, gene.score, pdf = NULL)
gene.set |
a SeqGeneSet object after running |
i |
the i-th gene set in the SeqGeneSet object. |
gene.score |
the gene score vector containing gene scores for each gene. |
pdf |
whether to save the plot to PDF file; if yes, provide the name of the PDF file. |
See writeSigGeneSet
, which writes the detailed gene set information to a file or to the screen.
Xi Wang, [email protected]
GSEnrichAnalyze
,
topGeneSets
,
plotSig
,
plotES
,
writeSigGeneSet
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, n=5) plotSigGeneSet(GS_example, 9, gene.score) # 9th gene set is the most significant one.
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, n=5) plotSigGeneSet(GS_example, 9, gene.score) # 9th gene set is the most significant one.
Integration of differential expression and differential splice scores with a rank-based strategy, which simultaneously integrates observed scores and permutation scores using the same ranks.
rankCombine(DEscore, DSscore, DEscoreMat, DSscoreMat, DEweight = 0.5)
rankCombine(DEscore, DSscore, DEscoreMat, DSscoreMat, DEweight = 0.5)
DEscore |
differential expression scores, normalized. |
DSscore |
differential splice scores, normalized. |
DEscoreMat |
differential expression scores in permuted data sets, normalized. |
DSscoreMat |
differential splice scores in permuted data sets, normalized. |
DEweight |
any number between 0 and 1 (included), the weight of differential expression scores (so the weight for differential splice is (1-DEweight)). |
This integration method is also known as integration with global ranks. See Wang and Cairns (2013) for details.
A list with two elements geneScore
and genePermuteScore
.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") combine <- rankCombine(DEscore, DSscore, DEscore.perm, DSscore.perm, DEweight=0.3) gene.score <- combine$geneScore gene.score.perm <- combine$genePermuteScore
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") combine <- rankCombine(DEscore, DSscore, DEscore.perm, DSscore.perm, DEweight=0.3) gene.score <- combine$geneScore gene.score.perm <- combine$genePermuteScore
An exemplified ReadCountSet object to demonstrate functions in the SeqGSEA
package.
This object is comprised of 20 samples across 5,000 exons, a part of the prostate cancer
RNA-Seq data set from Kannan et al (2011).
Please note that the count data in this example object is incomplete.
data("RCS_example")
data("RCS_example")
Kannan, K., Wang, L., Wang, J., Ittmann, M. M., Li, W., and Yen, L. (2001). Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing. Proc Natl Acad Sci USA, 108(22): 9172-7.
"ReadCountSet"
ReadCountSet class
Objects can be created by calls of the form newReadCountSet
.
featureData_gene
:Object of class "data.frame"
. Data for each genes.
permute_NBstat_exon
:Object of class "matrix"
. NB statistics of exons on the permutation data sets.
permute_NBstat_gene
:Object of class "matrix"
. NB statistics of genes on the permutation data sets.
assayData
:Object of class "AssayData"
. The read count data.
phenoData
:Object of class "AnnotatedDataFrame"
. Data for each samples.
featureData
:Object of class "AnnotatedDataFrame"
. Data for each exons.
experimentData
:Object of class "MIAxE"
. Experiment data.
annotation
:Object of class "character"
. Not used.
protocolData
:Object of class "AnnotatedDataFrame"
. Protocol information.
.__classVersion__
:Object of class "Versions"
. Version information.
Get counts from a ReadCountSet object. See counts
.
Set counts to a ReadCountSet object. See counts
.
Class "eSet"
, directly.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
newReadCountSet
,
loadExonCountData
,
exonID
,
geneID
,
counts-methods
,
label
,
subsetByGenes
showClass("ReadCountSet")
showClass("ReadCountSet")
This function provides a wrapper to run DESeq for differential expression analysis. It includes two steps,
DESeq::estimateSizeFactors
and DESeq::estimateDispersions
.
runDESeq(geneCounts, label)
runDESeq(geneCounts, label)
geneCounts |
a matrix containing read counts for each gene, can be the output of |
label |
the sample classification labels. |
A DESeqDataSet object with size factors and dispersion parameters been estimated.
Xi Wang, [email protected]
Anders, S. and Huber, W. (2010) Differential expression analysis for sequence count data, Genome Biol, 11, R106.
getGeneCount
,
DENBTest
,
DENBStat4GSEA
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) dds <- runDESeq(geneCounts, label)
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) dds <- runDESeq(geneCounts, label)
This function provides typical SeqGSEA analysis pipelines for end users to apply the SeqGSEA method in the easiest fashion. It assumes the pipelines start with exon reads counts, even for the DE-only analysis. Users should specify their file locations and a few parameters before running this pipeline.
It allows DE-only analysis, which will skip the DS analysis portion, and it also allows users to try different weights in integrating DE and DS scores, which will save time in computing the DE and DS scores.
The function returns a list of SeqGSEA analysis results in the format of GSEAresultTable
, and
generates a few plots and writes a few files, whose name prefix can be specified. The output files will either
be in PDF format or TXT format, and generated by plotGeneScore
, writeScores
,
plotES
, plotSig
, plotSigGeneSet
, and writeSigGeneSet
.
runSeqGSEA(data.dir, case.pattern, ctrl.pattern, geneset.file, output.prefix, topGS=10, geneID.type=c("gene.symbol", "ensembl"), nCores=1, perm.times=1000, seed=NULL, minExonReadCount=5, integrationMethod=c("linear", "quadratic", "rank"), DEweight=c(0.5), DEonly=FALSE, minGSsize=5, maxGSsize=1000, GSEA.WeightedType=1)
runSeqGSEA(data.dir, case.pattern, ctrl.pattern, geneset.file, output.prefix, topGS=10, geneID.type=c("gene.symbol", "ensembl"), nCores=1, perm.times=1000, seed=NULL, minExonReadCount=5, integrationMethod=c("linear", "quadratic", "rank"), DEweight=c(0.5), DEonly=FALSE, minGSsize=5, maxGSsize=1000, GSEA.WeightedType=1)
data.dir |
a character vector, the path to your count data directory. |
case.pattern |
a character vector, the unique pattern in the file names of case samples. E.g, if file names starting with "SC", the pattern writes "^SC". |
ctrl.pattern |
a character vector, the unique pattern in the file names of control samples. |
geneset.file |
a character vector, the path to your gene set file. The gene set file must be in GMT format. Please refer to the link follows for details. http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29 |
output.prefix |
a character vector, the path with prefix for output files. |
topGS |
an integer, this number of top ranked gene sets will be output with details; if geneset.file contains less than this number of gene sets, all gene sets' result details will be output. Default: 10. |
geneID.type |
the gene ID type in geneset.file. Currently only support "gene.symbol" and "ensembl". Default: gene.symbol. |
nCores |
an integer. The number of cores for running SeqGSEA. Default: 1 |
perm.times |
an integer. The number of times for permutation, which will be used for normalizing DE and DS scores and for GSEA significance analysis. Recommended values are greater than 1000. Default: 1000. |
seed |
an integer or NULL, used for setting the seeds to generate random numbers. The same seed will guarantee the same analysis results given by SeqGSEA. Default: NULL. |
minExonReadCount |
an integer. An exon with total read count across all samples less than this number will be marked as untestable and be excluded in SeqGSEA analysis. Default: 5. |
integrationMethod |
one of the three integration methods for DE and DS score integration: linear, quadratic, or rank. Default: linear. |
DEweight |
a real number between 0 and 1 OR a vector of those. Each number is the DE weight in DE and DS integration. If using a vector of real numbers, SeqGSEA will run with each of them individually. Default: 0.5. |
DEonly |
logical, whether to run SeqGSEA only considering DE. Default: FALSE |
minGSsize |
an integer. The minimum gene set size: gene sets with genes less than this number will be skipped. Default: 5. |
maxGSsize |
an integer. The maximum gene set size: gene sets with genes greater than this number will be skipped. Default: 1000. |
GSEA.WeightedType |
the weight type of the main GSEA algorithm, can be 0 (unweighted = Kolmogorov-Smirnov), 1 (weighted), and 2 (over-weighted). Default: 1. It is recommended not to change it. |
A list of SeqGSEA analysis results in the format of GSEAresultTable
, which allows users for meta-analysis.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
GSEAresultTable
,
geneScore
,
GSEnrichAnalyze
### Initialization ### # input file location and pattern data.dir <- system.file("extdata", package="SeqGSEA", mustWork=TRUE) case.pattern <- "^SC" # file name starting with "SC" ctrl.pattern <- "^SN" # file name starting with "SN" # gene set file and type geneset.file <- system.file("extdata", "gs_symb.txt", package="SeqGSEA", mustWork=TRUE) geneID.type <- "ensembl" # output file prefix output.prefix <- "SeqGSEAexample" # analysis parameters nCores <- 1 perm.times <- 10 DEonly <- FALSE DEweight <- c(0.2, 0.5, 0.8) # a vector for different weights integrationMethod <- "linear" ### one step SeqGSEA running ### # Caution: if running the following command line, it will generate many files in your working directory ## Not run: runSeqGSEA(data.dir=data.dir, case.pattern=case.pattern, ctrl.pattern=ctrl.pattern, geneset.file=geneset.file, geneID.type=geneID.type, output.prefix=output.prefix, nCores=nCores, perm.times=perm.times, integrationMethod=integrationMethod, DEonly=DEonly, DEweight=DEweight) ## End(Not run)
### Initialization ### # input file location and pattern data.dir <- system.file("extdata", package="SeqGSEA", mustWork=TRUE) case.pattern <- "^SC" # file name starting with "SC" ctrl.pattern <- "^SN" # file name starting with "SN" # gene set file and type geneset.file <- system.file("extdata", "gs_symb.txt", package="SeqGSEA", mustWork=TRUE) geneID.type <- "ensembl" # output file prefix output.prefix <- "SeqGSEAexample" # analysis parameters nCores <- 1 perm.times <- 10 DEonly <- FALSE DEweight <- c(0.2, 0.5, 0.8) # a vector for different weights integrationMethod <- "linear" ### one step SeqGSEA running ### # Caution: if running the following command line, it will generate many files in your working directory ## Not run: runSeqGSEA(data.dir=data.dir, case.pattern=case.pattern, ctrl.pattern=ctrl.pattern, geneset.file=geneset.file, geneID.type=geneID.type, output.prefix=output.prefix, nCores=nCores, perm.times=perm.times, integrationMethod=integrationMethod, DEonly=DEonly, DEweight=DEweight) ## End(Not run)
Normalization of DE/DS scores or permutation DE/DS scores.
scoreNormalization(scores, norm.factor)
scoreNormalization(scores, norm.factor)
scores |
a vector (a nX1 matrix) of a matrix of scores, rows corresponding to genes and columns corresponding to a study or permutation. |
norm.factor |
normalization factor, output of the function |
A normalized vector or matrix depending on the input: with the same dimensions as the input.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat) ## (not run) DSscore.normFac <- normFactor(RCS_example@permute_NBstat_gene) DSscore <- scoreNormalization(RCS_example@featureData_gene$NBstat, DSscore.normFac) DSscore.perm <- scoreNormalization(RCS_example@permute_NBstat_gene, DSscore.normFac) ## End (not run)
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermute4GSEA(RCS_example, permuteMat) ## (not run) DSscore.normFac <- normFactor(RCS_example@permute_NBstat_gene) DSscore <- scoreNormalization(RCS_example@featureData_gene$NBstat, DSscore.normFac) DSscore.perm <- scoreNormalization(RCS_example@permute_NBstat_gene, DSscore.normFac) ## End (not run)
"SeqGeneSet"
SeqGeneSet class
Objects can be created by calls of the function newGeneSets
.
name
:Object of class "character"
the name of this gene set category
sourceFile
:Object of class "character"
the source file of gene set category
geneList
:Object of class "character"
the gene ID list indicating genes involved in this GSEA
GS
:Object of class "list"
a list of gene indexes corresponding to geneList
, each element in the list
indicating which genes are in each gene set of this SeqGeneSet object
GSNames
:Object of class "character"
. Gene set names.
GSDescs
:Object of class "character"
. Gene set descriptions.
GSSize
:Object of class "numeric"
. Gene set sizes.
GSSizeMin
:Object of class "numeric"
. The minimum gene set size to be analyzed.
GSSizeMax
:Object of class "numeric"
. The maximum gene set size to be analyzed.
GS.Excluded
:Object of class "list"
. Gene sets excluded to be analyzed.
GSNames.Excluded
:Object of class "character"
. Gene set names excluded to be analyzed.
GSDescs.Excluded
:Object of class "character"
. Gene set descriptions excluded to be analyzed.
GSEA.ES
:Object of class "numeric"
. Enrichment scores.
GSEA.ES.pos
:Object of class "numeric"
. The positions where enrichment scores appear.
GSEA.ES.perm
:Object of class "matrix"
. The enrichment scores of the permutation data sets.
GSEA.score.cumsum
:Object of class "matrix"
. Running enrichment scores.
GSEA.normFlag
:Object of class "logical"
. Logical indicating whether GSEA.ES
has been normalized.
GSEA.pval
:Object of class "numeric"
. P-values of each gene set.
GSEA.FWER
:Object of class "numeric"
. Family-wise error rate of each gene set.
GSEA.FDR
:Object of class "numeric"
. False discovery rate of each gene set.
sc.ES
:Object of class "numeric"
. Enrichment scores in scGSEA.
sc.ES.perm
:Object of class "matrix"
. The enrichment scores of the permutation data sets in scGSEA.
sc.normFlag
:Object of class "logical"
. Logical indicating whether sc.ES
has been normalized in scGSEA.
scGSEA
:Object of class "logical"
. Whether or not used for scGSEA.
sc.pval
:Object of class "numeric"
. P-values of each gene set in scGSEA.
sc.FWER
:Object of class "numeric"
. Family-wise error rate of each gene set in scGSEA.
sc.FDR
:Object of class "numeric"
. False discovery rate of each gene set in scGSEA.
version
:Object of class "Versions"
. Version information.
Get a sub-list of gene sets, and return a SeqGeneSet object.
Show basic information of the SeqGeneSet object.
Xi Wang, [email protected]
Xi Wang and Murray J. Cairns (2013). Gene Set Enrichment Analysis of RNA-Seq Data: Integrating Differential Expression and Splicing. BMC Bioinformatics, 14(Suppl 5):S16.
newGeneSets
,
size
,
geneSetNames
,
geneSetDescs
,
geneSetSize
showClass("SeqGeneSet")
showClass("SeqGeneSet")
The is an internal function to calculate significance of ESs of each gene set. For advanced users only.
signifES(gene.set)
signifES(gene.set)
gene.set |
a GeneSet object after running |
A SeqGeneSet object with gene set enrichment significance metrics calculated.
Xi Wang, [email protected]
This function to get the number of gene sets in a SeqGeneSet object.
size(GS)
size(GS)
GS |
an object of class SeqGeneSet. |
Gene sets with size less than GSSizeMin
or more than GSSizeMax
are not included.
The number of gene sets in this SeqGeneSet object.
Xi Wang, [email protected]
SeqGeneSet-class
,
loadGenesets
data(GS_example, package="SeqGSEA") size(GS_example)
data(GS_example, package="SeqGSEA") size(GS_example)
Get a new ReadCountSet with specified gene IDs.
subsetByGenes(RCS, genes)
subsetByGenes(RCS, genes)
RCS |
a ReadCountSet object. |
genes |
a list of gene IDS. |
This function returns a new ReadCountSet object, with changes in slots assayData
, featureData
, featureData_gene
, and permute_NBstat_exon
and permute_NBstat_gene
if they have been calculated.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") RCS_example genes <- c("ENSG00000000938", "ENSG00000000005") RCS_sub <- subsetByGenes(RCS_example, genes) RCS_sub
data(RCS_example, package="SeqGSEA") RCS_example genes <- c("ENSG00000000938", "ENSG00000000005") RCS_sub <- subsetByGenes(RCS_example, genes) RCS_sub
This function is to extract top n differentially expressed genes, ranked by either DESeq p-values, DESeq adjusted p-values, permutation p-values, permutation adjusted p-values, or NB-statistics.
topDEGenes(DEGres, n = 20, sortBy = c("padj", "pval", "perm.pval", "perm.padj", "NBstat", "foldChange"))
topDEGenes(DEGres, n = 20, sortBy = c("padj", "pval", "perm.pval", "perm.padj", "NBstat", "foldChange"))
DEGres |
DE analysis results. |
n |
the number of top DE genes. |
sortBy |
indicating which method to rank genes. |
If the sortBy method is not among the column names, the function will result in an error.
A table for top n DE genes with significance metrics.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) permuteMat <- genpermuteMat(RCS_example, times=10) DEGres <- DENBTest(DEG) DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat) DEGres <- DEpermutePval(DEGres, DEpermNBstat) topDEGenes(DEGres, n = 10, sortBy = "NBstat")
data(RCS_example, package="SeqGSEA") geneCounts <- getGeneCount(RCS_example) label <- label(RCS_example) DEG <- runDESeq(geneCounts, label) permuteMat <- genpermuteMat(RCS_example, times=10) DEGres <- DENBTest(DEG) DEpermNBstat <- DENBStatPermut4GSEA(DEG, permuteMat) DEGres <- DEpermutePval(DEGres, DEpermNBstat) topDEGenes(DEGres, n = 10, sortBy = "NBstat")
This function is to extract top n differentially spliced exons, ranked by p-values or NB-stats.
topDSExons(RCS, n = 20, sortBy = c("pvalue", "NBstat"))
topDSExons(RCS, n = 20, sortBy = c("pvalue", "NBstat"))
RCS |
a ReadCountSet object after running |
n |
the number of top genes. |
sortBy |
indicating whether p-value or NBstat to be used for ranking genes. |
A table for top n exons. Columns include: geneID, exonID, testable, NBstat, pvalue, padjust, and meanCounts.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) topDSExons(RCS_example, 10, "NB")
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) topDSExons(RCS_example, 10, "NB")
This function to extract top n differentially spliced genes, ranked by p-values or NBstats.
topDSGenes(RCS, n = 20, sortBy = c("pvalue", "NBstat"))
topDSGenes(RCS, n = 20, sortBy = c("pvalue", "NBstat"))
RCS |
a ReadCountSet object after running |
n |
the number of top genes. |
sortBy |
indicating whether p-value or NBstat to be used for ranking genes. |
A table for top n genes. Columns include: geneID, NBstat, pvalue, and padjust.
Xi Wang, [email protected]
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) topDSGenes(RCS_example, 10, "NB")
data(RCS_example, package="SeqGSEA") permuteMat <- genpermuteMat(RCS_example, times=10) RCS_example <- exonTestability(RCS_example) RCS_example <- estiExonNBstat(RCS_example) RCS_example <- estiGeneNBstat(RCS_example) RCS_example <- DSpermutePval(RCS_example, permuteMat) topDSGenes(RCS_example, 10, "NB")
This function is to extract n top significant gene sets overrepresented in the samples studied, ranked by FDR, p-values, or FWER.
topGeneSets(gene.set, n = 20, sortBy = c("FDR", "pvalue", "FWER"), GSDesc = FALSE)
topGeneSets(gene.set, n = 20, sortBy = c("FDR", "pvalue", "FWER"), GSDesc = FALSE)
gene.set |
an object of class SeqGeneSet after GSEA runs. |
n |
the number of top gene sets. |
sortBy |
indicating which method to rank gene sets. |
GSDesc |
logical indicating whether or not to output gene set descriptions. |
A data frame for top n gene sets detected with respect to the ranking method specified. Information includes: GSName, GSSize, ES, ES.pos, pval, FDR, and FWER.
Xi Wang, [email protected]
GSEnrichAnalyze
,
GSEAresultTable
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, n=5)
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, n=5)
This function is to write DE and DS scores, and optionally gene scores.
writeScores(DEscore, DSscore, geneScore=NULL, geneScoreAttr=NULL, file="")
writeScores(DEscore, DSscore, geneScore=NULL, geneScoreAttr=NULL, file="")
DEscore |
normalized DE scores. |
DSscore |
normalized DS scores. |
geneScore |
gene scores integrated from DE and DS scores. |
geneScoreAttr |
the parameters for integrating DE and DS scores. |
file |
output file name, if not specified print to screen. |
Xi Wang, [email protected]
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) writeScores(DEscore, DSscore) # without gene scores writeScores(DEscore, DSscore, geneScore = gene.score, geneScoreAttr = "linear,0.3") # gene scores with attr.
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) writeScores(DEscore, DSscore) # without gene scores writeScores(DEscore, DSscore, geneScore = gene.score, geneScoreAttr = "linear,0.3") # gene scores with attr.
This function is to write the specified gene set (whose index is i) with significance information, including p-value and FDR, and gene scores for each gene in this set.
writeSigGeneSet(gene.set, i, gene.score, file = "")
writeSigGeneSet(gene.set, i, gene.score, file = "")
gene.set |
an object of class SeqGeneSet with |
i |
the i-th gene set in the SeqGeneSet object. |
gene.score |
the vector of gene scores for running GSEA. |
file |
output file name, if not specified print to screen. |
See plotSigGeneSet
, which shows graphic information of the gene set specified.
Xi Wang, [email protected]
GSEnrichAnalyze
,
topGeneSets
,
plotSigGeneSet
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, n=5) writeSigGeneSet(GS_example, 9, gene.score) # 9th gene set is the most significant one.
data(DEscore, package="SeqGSEA") data(DSscore, package="SeqGSEA") gene.score <- geneScore(DEscore, DSscore, method="linear", DEweight = 0.3) data(DEscore.perm, package="SeqGSEA") data(DSscore.perm, package="SeqGSEA") gene.score.perm <- genePermuteScore(DEscore.perm, DSscore.perm, method="linear", DEweight=0.3) data(GS_example, package="SeqGSEA") GS_example <- GSEnrichAnalyze(GS_example, gene.score, gene.score.perm) topGeneSets(GS_example, n=5) writeSigGeneSet(GS_example, 9, gene.score) # 9th gene set is the most significant one.