Title: | Rotation based gene set analysis |
---|---|
Description: | This package implements a variety of functions useful for gene set analysis using rotations to approximate the null distribution. It contributes with the implementation of seven test statistic scores that can be used with different goals and interpretations. Several functions are available to complement the statistical results with graphical representations. |
Authors: | Adria Caballe [aut, cre] |
Maintainer: | Adria Caballe <[email protected]> |
License: | GPL-3 |
Version: | 1.5.0 |
Built: | 2024-10-31 04:31:50 UTC |
Source: | https://github.com/bioc/roastgsa |
from dragtable v1.0 of Dan Vanderkam.
dragtable
dragtable
character vector
Character vector with dragtable
http://danvk.org/dragtable/
kryogenix.org/code/browser/sorttable
Counts matrix of RNA-seq study with 19 tumor Bladder Urothelial Carcinoma samples and 19 adjacent healthy tissues
expr.tcga
expr.tcga
matrix
Matrix with expression matrix
https://bioconductor.org/packages/release/bioc/html/GSEABenchmarkeR.html
Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Law C, Turaga N, Davis S, Carey V, Morgan M, Zimmer R, Waldron L (2020). Toward a gold standard for benchmarking gene set enrichment analysis. Briefings in Bioinformatics. doi:10.1093/bib/bbz158.
Gene information of RNA-seq study with 19 tumor Bladder Urothelial Carcinoma samples and 19 adjacent healthy tissues
fd.tcga
fd.tcga
DFrame
Data frame with gene symbols
https://bioconductor.org/packages/release/bioc/html/GSEABenchmarkeR.html
Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Law C, Turaga N, Davis S, Carey V, Morgan M, Zimmer R, Waldron L (2020). Toward a gold standard for benchmarking gene set enrichment analysis. Briefings in Bioinformatics. doi:10.1093/bib/bbz158.
Hallmark geneset collection from msigdb
hallmarks.hs
hallmarks.hs
character list
List with hallmark genes
https://www.gsea-msigdb.org/gsea/downloads.jsp
Liberzon, A. et al.: The Molecular Signatures Database Hallmark Gene Set Collection. Cell Systems 1(6), 417-425 (2015). doi:10.1016/j.cels.2015.12.004
Heatmap showing sample variation for either genes (in a particular gene set) or summarized gene signatures.
heatmaprgsa_hm(obj, y, intvar, adj.var = NULL, whplot = 1, toplot = TRUE, pathwaylevel = FALSE, mycol = c("black","orange","green","white"), sample2zero = FALSE, rgsa.like=FALSE, psel = NULL, dendrogram = "n", col= bluered(100), trace='none', notecol='black', notecex=1, keysize=.9, cexCol=1.5, Rowv = NULL, Colv = FALSE, las =2, fdrkey = FALSE, quantile.sat = 0.95, order1= NULL, order2 = NULL, sizex =8, sizey =5, ...)
heatmaprgsa_hm(obj, y, intvar, adj.var = NULL, whplot = 1, toplot = TRUE, pathwaylevel = FALSE, mycol = c("black","orange","green","white"), sample2zero = FALSE, rgsa.like=FALSE, psel = NULL, dendrogram = "n", col= bluered(100), trace='none', notecol='black', notecex=1, keysize=.9, cexCol=1.5, Rowv = NULL, Colv = FALSE, las =2, fdrkey = FALSE, quantile.sat = 0.95, order1= NULL, order2 = NULL, sizex =8, sizey =5, ...)
obj |
an object of class 'roastgsa' |
y |
data used for roastgsa |
intvar |
name of variable of
interest in |
adj.var |
name of covariates in |
whplot |
selected pathway. If integer vector, the pathways are
selected in the same order as the table in |
toplot |
whether to plot the heatmap or just return the adjusted expression matrix |
pathwaylevel |
If TRUE, the heatmap shows the variation at the pathway level. Otherwise, the heatmap shows the variation of all genes in the selected pathways. |
mycol |
color for heatmap columns defining the groups of the variable of interest |
sample2zero |
Only applicable for |
rgsa.like |
apply roastgsa transformations of data (restandarization and set.statistic operations) samplewise (see details below). |
psel |
character vector with probesets (one per gene) to be used for roastgsa statistic in a microarray experiment |
dendrogram |
|
col |
|
trace |
|
notecol |
|
notecex |
|
keysize |
|
cexCol |
|
Rowv |
|
Colv |
|
las |
orientation of x axis |
fdrkey |
if |
quantile.sat |
numeric between 0.5 and 1 used to saturate high values at such specified quantile (used to avoid extreme values in the visualization) |
order1 |
genes order. If NULL its ordered based on the moderated-t statistics |
order2 |
samples order. If NULL its ordered using the information of the variable of interest. |
sizex |
size of x axis |
sizey |
size of y axis |
... |
Arguments passed to or from other methods to the low level. |
This heatmap considers columns (
being the sample size). The
first column represents the moderated-t statistic (or a
restandarization of the same in case of competitive testing). The
other columns confine the expression data scaled by the standard error of
the estimated coefficient in the model and centered (if
rgsa.like = TRUE
). In such case, the cross product of
all data columns and the design matrix equals the first column of the
heatmap, and the average of the first column of the heatmap equals
the observed roastgsa test statistic (at least when the set.statistic
used is either mean
or maxmean
).
a data.frame object with source data for heatmap representation
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604;
roastgsa
and plotStats
and
plotGSEA
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) heatmaprgsa_hm(roastgsa1, y, intvar = "voi", whplot = 1, toplot = TRUE, pathwaylevel = FALSE, mycol = c("black","orange","green","white"), sample2zero = FALSE) heatmaprgsa_hm(roastgsa1, y, intvar = "voi", whplot = 1:10, toplot = TRUE, pathwaylevel = TRUE, mycol = c("black","orange","green","white"), sample2zero = FALSE)
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) heatmaprgsa_hm(roastgsa1, y, intvar = "voi", whplot = 1, toplot = TRUE, pathwaylevel = FALSE, mycol = c("black","orange","green","white"), sample2zero = FALSE) heatmaprgsa_hm(roastgsa1, y, intvar = "voi", whplot = 1:10, toplot = TRUE, pathwaylevel = TRUE, mycol = c("black","orange","green","white"), sample2zero = FALSE)
Writing html document with roastgsa output
htmlrgsa(obj, htmlpath = "", htmlname = "file.html", plotpath ="", plotstats = TRUE, plotgsea = TRUE, indheatmap = TRUE, ploteffsize = TRUE, links_plots = list(stats= NULL, gsea = NULL, heatmap = NULL, effsize = NULL), y, whplots = NULL, geneDEhtmlfiles = NULL, tit = "", margins = c(5,16), sizesHeatmap = c(1200, 800), typeheatmap = c("heatmap.2","ggplot2"), intvar, adj.var = NULL, mycol, varrot, psel = NULL, sorttable, dragtable, ...)
htmlrgsa(obj, htmlpath = "", htmlname = "file.html", plotpath ="", plotstats = TRUE, plotgsea = TRUE, indheatmap = TRUE, ploteffsize = TRUE, links_plots = list(stats= NULL, gsea = NULL, heatmap = NULL, effsize = NULL), y, whplots = NULL, geneDEhtmlfiles = NULL, tit = "", margins = c(5,16), sizesHeatmap = c(1200, 800), typeheatmap = c("heatmap.2","ggplot2"), intvar, adj.var = NULL, mycol, varrot, psel = NULL, sorttable, dragtable, ...)
obj |
an object of class 'roastgsa' |
htmlpath |
path for html file to be placed |
htmlname |
name of html file |
plotpath |
added path from argument |
plotstats |
plots using |
plotgsea |
plots using |
indheatmap |
plots using |
ploteffsize |
plots using |
links_plots |
list with 4 elements (stats, gsea, heatmap and
effsize) specifying the path of all plots (paths set from htmlpath)
in case these were already created. If NULL, links are obtained from
|
y |
data used for |
whplots |
selected pathways. If integer vector, the pathways are selected in the same order as the table in obj$res. If null all tested pathways are selected |
geneDEhtmlfiles |
vector with links to html-tables showing the
differential expression results for the subsets of genes determined by
|
tit |
title of the html file |
margins |
margins for the heatmap plots |
sizesHeatmap |
vector with two elements providing png sizes (width, height) |
typeheatmap |
either ggplot2 type or heatmap.2 type |
intvar |
for |
adj.var |
for |
mycol |
color for heatmap columns defining the groups of the variable of interest |
varrot |
an object of class 'varrotrand'
(see |
psel |
character vector with probesets (one per gene) to be used for roastgsa statistic in a microarray experiment |
sorttable |
internal data loaded with |
dragtable |
internal data loaded with |
... |
Arguments passed to or from other methods to the low level. |
This function permits to explore a html-table with the statistical
results and graphical representation of the top gene sets obtained
from an object of class roastgsa
.
By default four plots are considered for each gene set of interest:
plotStats
, plotGSEA
,
heatmaprgsa_hm
and ploteffsignaturesize
.
The first three can be computed from the 'roastgsa' object, whereas
for ploteffsignaturesize
, an object of class 'varrotrand'
(see varrotrand
) with the estimated rotation score
variances for randomly selected gene sets of several sizes has to
be defined at first.
It saves an html table with the main results of the roastgsa hypothesis tesing.
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604;
data(sorttable) data(dragtable) y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) htmlrgsa(roastgsa1, htmlpath = "", htmlname = "test.html", plotpath ="plots/", plotstats = FALSE, plotgsea = FALSE, indheatmap = FALSE, ploteffsize = FALSE, links_plots = list(stats= NULL, gsea = NULL, heatmap = NULL, effsize = NULL), y = y, sorttable = sorttable, dragtable = dragtable)
data(sorttable) data(dragtable) y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) htmlrgsa(roastgsa1, htmlpath = "", htmlname = "test.html", plotpath ="plots/", plotstats = FALSE, plotgsea = FALSE, indheatmap = FALSE, ploteffsize = FALSE, links_plots = list(stats= NULL, gsea = NULL, heatmap = NULL, effsize = NULL), y = y, sorttable = sorttable, dragtable = dragtable)
KEGG genesets obtained with limma function getGeneKEGGLinks
kegg.hs
kegg.hs
character list
List with KEGG genes
https://www.kegg.jp/kegg/rest/keggapi.html
Kanehisa, M. and Goto, S.; KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27-30 (2000).
Sample information of RNA-seq study with 19 tumor Bladder Urothelial Carcinoma samples and 19 adjacent healthy tissues
pd.tcga
pd.tcga
DFrame
Data frame with sample info
https://bioconductor.org/packages/release/bioc/html/GSEABenchmarkeR.html
Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Law C, Turaga N, Davis S, Carey V, Morgan M, Zimmer R, Waldron L (2020). Toward a gold standard for benchmarking gene set enrichment analysis. Briefings in Bioinformatics. doi:10.1093/bib/bbz158.
Plot for roastgsa
objects
## S3 method for class 'roastgsa' plot(x, type = c("stats","GSEA"), whplot = 1, maintitle = "", gsainfo = TRUE, cex.sub = 0.8, lwd = 2, ...)
## S3 method for class 'roastgsa' plot(x, type = c("stats","GSEA"), whplot = 1, maintitle = "", gsainfo = TRUE, cex.sub = 0.8, lwd = 2, ...)
x |
an object of class |
type |
plot type, either 'stats' or 'GSEA' |
whplot |
selected pathway. If integer vector, the pathways are
selected in the same order as observed in the |
maintitle |
plot main title. If |
gsainfo |
if |
cex.sub |
cex for subtitle |
lwd |
line width |
... |
Arguments passed to or from other methods to the low level. |
Details for using 'type = stats' in the plot are given in
plotStats
.
Details for using 'type = GSEA' in the plot are given in plotGSEA
.
plot object with the graphical representation of roastgsa
results.
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604;
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) plot(roastgsa1, type = "stats", whplot = 1, gsainfo =TRUE, maintitle = "", statistic = "mean")
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) plot(roastgsa1, type = "stats", whplot = 1, gsainfo =TRUE, maintitle = "", statistic = "mean")
Scatter plot of single sample z-score summarized data
## S3 method for class 'ssGSA' plot(x, orderby, whplot = 1, col = "black", samplename =FALSE, maintitle = "", ssgsaInfo = TRUE, cex.sub = 0.8, ...)
## S3 method for class 'ssGSA' plot(x, orderby, whplot = 1, col = "black", samplename =FALSE, maintitle = "", ssgsaInfo = TRUE, cex.sub = 0.8, ...)
x |
object of class 'ssGSA' |
orderby |
numeric or factor vector of the same size and order of data columns used for ssGSA. It sets the x-axis of the plot |
whplot |
selected pathway. If integer vector, the pathways are
selected in the same order as the table in |
col |
color of scatterplot points |
samplename |
whether to show or not the names of the samples instead of points |
maintitle |
plot main title. If |
ssgsaInfo |
if TRUE, the subtitle shows the ssGSA results |
cex.sub |
cex for subtitle |
... |
Arguments passed to or from other methods to the low level. |
This graphic is a great alternative to explore gene set variation at
sample level. This is sometimes ignored when doing GSEA, where classic
representations (e.g., plotGSEA
) show gene variation
after averaging out the sample differences within each experimental
condition.
plot object with the graphical representation of ssGSA results
Adria Caballe Mestres
[1] Caballe Mestres A, Berenguer Llergo A and Stephan-Otto Attolini C. Adjusting for systematic technical biases in risk assessment of gene signatures in transcriptomic cancer cohorts. bioRxiv (2018).
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) design <- model.matrix(form, covar) ssgsa1 <- ssGSA(y, obj=NULL, design = design, contrast = 2, index = index, method = c("GScor")) plot(ssgsa1, orderby = covar$voi, whplot = 1 )
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) design <- model.matrix(form, covar) ssgsa1 <- ssGSA(y, obj=NULL, design = design, contrast = 2, index = index, method = c("GScor")) plot(ssgsa1, orderby = covar$voi, whplot = 1 )
Approximation of effective signature size under gene randomization
ploteffsignaturesize(obj, varrot, whplot = 1, ...)
ploteffsignaturesize(obj, varrot, whplot = 1, ...)
obj |
an object of class 'roastgsa' |
varrot |
an object of class 'varrotrand'
(see |
whplot |
selected pathway. If integer vector, the pathways are
selected in the same order as the table in |
... |
Arguments passed to or from other methods to the low level. |
The plot shows the approximated probability of obtaining a test statistic variance (under rotations of the residual space of the data) as extreme as the observed when generating randomly gene sets of several sizes.
plot object with the effective signature size representation of roastgsa results
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604;
varrotrand
and roastgsa
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 100, mccores = 1, executation.info = FALSE) varrot <- varrotrand(roastgsa1, y, testedsizes = c(seq(5,50, by=5), seq(55,200,by=10)), nrep = 50) ploteffsignaturesize(roastgsa1, varrot, whplot = 2)
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 100, mccores = 1, executation.info = FALSE) varrot <- varrotrand(roastgsa1, y, testedsizes = c(seq(5,50, by=5), seq(55,200,by=10)), nrep = 50) ploteffsignaturesize(roastgsa1, varrot, whplot = 2)
GSEA plot for roastgsa
objects
plotGSEA(obj, whplot = 1, maintitle = "", gsainfo = TRUE, cex.sub = 0.8, lwd = 2, ...)
plotGSEA(obj, whplot = 1, maintitle = "", gsainfo = TRUE, cex.sub = 0.8, lwd = 2, ...)
obj |
an object of class |
whplot |
selected pathway. If integer vector, the pathways are
selected in the same order as observed in the |
maintitle |
plot main title. If |
gsainfo |
if |
cex.sub |
cex for subtitle |
lwd |
line width |
... |
Arguments passed to or from other methods to the low level. |
Standard representation of Kolmogorov-Smirnov GSEA enrichment score.
plot object with the GSEA representation of roastgsa results
Adria Caballe Mestres
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005;102(43);15545-15550.
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) plotGSEA(roastgsa1, whplot = 1, gsainfo =TRUE, maintitle = "", statistic = "mean")
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) plotGSEA(roastgsa1, whplot = 1, gsainfo =TRUE, maintitle = "", statistic = "mean")
General gene set analysis plot showing the ordered moderated-t statistics for the selected pathway
plotStats(obj, whplot = 1, maintitle = "", statistic = "mean", ylimAll = TRUE, ylim = NULL, minpointsDens = 20, gsainfo = TRUE, cex.sub = 0.8, lwd = 2, ...)
plotStats(obj, whplot = 1, maintitle = "", statistic = "mean", ylimAll = TRUE, ylim = NULL, minpointsDens = 20, gsainfo = TRUE, cex.sub = 0.8, lwd = 2, ...)
obj |
an object of class 'roastgsa' |
whplot |
selected pathway. If integer vector, the pathways are
selected in the same order as the table in |
maintitle |
plot main title. If |
statistic |
to be selected from 'mean' or 'median' |
ylimAll |
y limits are found using data from all genesets (if
TRUE) or using data from only the plotted geneset (if FALSE). Only
if |
ylim |
vector of size two with y limits |
minpointsDens |
minum number of genes needed to draw the density plot |
gsainfo |
if TRUE, the subtitle shows the enrichment results |
cex.sub |
cex for subtitle |
lwd |
line width |
... |
Arguments passed to or from other methods to the low level. |
The statistic
argument is used for competitive testing computations of
restandardized moderated-t statistics. If "median"
, the median
of all stats is used for centering and the median absolute deviation
is used for scaling. If "mean"
, standard normalization
applies.
It shows the ordered moderated t-statistics in various formats, area for up- and down- expressed genes, barcode plot for these ordered values and density.
plot object with a general representation of roastgsa results
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604;
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) plotStats(roastgsa1, whplot = 1, maintitle = "general plot", statistic = "mean")
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) plotStats(roastgsa1, whplot = 1, maintitle = "general plot", statistic = "mean")
Gene set analysis using rotations for hypothesis testing. Test statistic options include KS-based statistics used in GSEA or GSVA as well as summary statistics such as mean, maxmean, median, absmean and mean.rank
roastgsa(y, covar, form, contrast = NA, design = NULL, gsetsel, gspath, index = NULL, self.contained = FALSE, set.statistic = "maxmean", psel = NULL, nrot = 9999, minsize = 10, maxsize = 500, mccores = 1, executation.info = TRUE, weights = NULL, shrink.resid = TRUE, normalizeScores = TRUE, ...)
roastgsa(y, covar, form, contrast = NA, design = NULL, gsetsel, gspath, index = NULL, self.contained = FALSE, set.statistic = "maxmean", psel = NULL, nrot = 9999, minsize = 10, maxsize = 500, mccores = 1, executation.info = TRUE, weights = NULL, shrink.resid = TRUE, normalizeScores = TRUE, ...)
y |
expression matrix with columns indicating samples and rows indicating genes |
covar |
data frame with the covariates |
form |
description of the model to be fitted |
contrast |
comparison to consider in the model. If NA, the last column of the design matrix is used |
design |
the design matrix of the experiment. If null, this is
calculated using the |
gsetsel |
character string with gene set database to be used in
format |
gspath |
path for the gene set database |
index |
list with index vectors specifying which rows of |
self.contained |
competitive test ( |
set.statistic |
to be chosen from |
psel |
character vector with probesets (one per gene) to be used for roastgsa statistic in a microarray experiment |
nrot |
number of rotations used for hypothesis testing |
minsize |
minimum size of the testing sets allowed for hypothesis testing |
maxsize |
maximum size of the testing sets allowed for hypothesis testing |
mccores |
the number of cores to use for parallel executions |
executation.info |
Show (if set to TRUE) the progress-bar of the iterative process |
weights |
list with the gene weights in each testing set. Only
for set.statistic = |
shrink.resid |
if |
normalizeScores |
transform the moderated t-statistics to z-scores |
... |
Arguments passed to or from other methods to the low level. |
We consider 7 different enrichment score functions which we refer by the names of mean, maxmean, median, absmean, mean.rank, GSEA and GSVA. The first four functions (mean, maxmean, median, absmean) are formulated for the two type of testing problems (self-contained and competitive). The mean.rank, GSEA and GSVA are exclusive scores for the competitive approach. The absmean is a non-directional score that can be used to give priority to gene sets with both activator and inhibitor genes. The mean is a democratic score that gives priority to detecting gene sets in which a large fraction of their genes present similar effect sizes going at the same direction. The maxmean (default) falls in between the mean and the absmean scores, being capable to recover both type of gene sets consistently.
Some of the defined sets are composed by genes that interact together in
any particular biological condition, leading to intra-gene set
correlation structures with high levels of correlation. We encourage the
usage of effective signatures size, that can be a proxy for the number
of uncorrelated genes in the gene set used for GSA
(varrotrand
and
ploteffsignaturesize
). Through the argument
weights
, we provide the possibility to redefining the gene set by
weighting the importance of each gene in the list.
GSEA and GSVA scores are computationally much more intensive than the other scores.
return an object of class roastgsa
with attributes
"res" |
data.frame with main results obtained in hypothesis
testing. Total genes in the geneset, the number of genes also in the
|
"stats" |
Moderated t-statistics for all genes |
"contrast" |
contrast used in a vector form |
"index" |
list with gene set symbols |
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604; doi:
[2] E. Lim, D. Wu, G. K. Smyth, M.-L. Asselin-Labat, F. Vaillant, and J. E. Visvader. ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics, 26(17):2176-2182, 2010.
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) print(roastgsa1)
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 200, mccores = 1, executation.info = FALSE) print(roastgsa1)
from sorttable v2.0 of Stuart Langridge.
sorttable
sorttable
character vector
Character vector with sorttable
http://www.kryogenix.org/code/browser/sorttable/
http://www.kryogenix.org/code/browser/sorttable/
Single sample gene set analysis using z-score summarized data for linear model hypothesis testing
ssGSA(y, obj = NULL, design = NULL, contrast = NULL, index = NULL, method = c("GScor","GSadj","zscore"))
ssGSA(y, obj = NULL, design = NULL, contrast = NULL, index = NULL, method = c("GScor","GSadj","zscore"))
y |
expression matrix with columns indicating samples and rows indicating genes |
obj |
object of class 'roastgsa' used to extract the design, the contrast and the index arguments |
design |
the design matrix of the experiment. Considered only
if |
contrast |
comparison to consider in the model. Considered only
if |
index |
list with index vectors specifying which rows of |
method |
If |
A correction by the overall tendency can be done a priori (GScor) or it
can be incorporated as a covariate in the linear model (GSadj).
The correction variable used here is what we have called the global signature
(GS) of the experiment, that for each sample can be calculated as the
average z-score of all genes measured in y
. This GS corrects or centers
global technical / sampling directions in the data.
return an object of class ssGSA
with attributes
"res" |
data.frame with main results obtained in hypothesis testing. Total genes in the gene set, the average score, the test statistic, p-value and adjusted pvalue. |
"stats" |
adjusted z-scores matrix |
Adria Caballe Mestres
[1] Caballe Mestres A, Berenguer Llergo A and Stephan-Otto Attolini C. Adjusting for systematic technical biases in risk assessment of gene signatures in transcriptomic cancer cohorts. bioRxiv (2018).
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) design <- model.matrix(form, covar) ssgsa1 <- ssGSA(y, obj=NULL, design = design, contrast = 2, index = index, method = c("GScor"))
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) design <- model.matrix(form, covar) ssgsa1 <- ssGSA(y, obj=NULL, design = design, contrast = 2, index = index, method = c("GScor"))
Computation of the sample variance of rotation scores under gene randomization
varrotrand(obj, y, testedsizes = c(3:30,seq(32,50, by=2), seq(55,200,by=5)), nrep = 200, nrot = NULL, mccores = NULL, psel = NULL)
varrotrand(obj, y, testedsizes = c(3:30,seq(32,50, by=2), seq(55,200,by=5)), nrep = 200, nrot = NULL, mccores = NULL, psel = NULL)
obj |
an object of class 'roastgsa' |
y |
data used in |
testedsizes |
effective sizes to be tested |
nrep |
number of randomly selected gene sets created for each tested effective size |
nrot |
number of rotations used for hypothesis testing |
mccores |
the number of cores to use for parallel executions |
psel |
character vector with probesets (one per gene) to be used for roastgsa statistic in a microarray experiment |
When a specific gene that is highly correlated to the rest of the gene set
finds an extreme value, even under H, it is likely that many
other genes in the gene set follow it with large values as well. We define
the concept of effective signature size of a gene set by the number of
randomly selected (not necessarily independent) genes that are needed to
achieve comparable variability levels on rotation summary test statistics.
This can be viewed as a realistic measure of the total number of independent
variables that contribute to the power of the test. The function presented
here computes the sample variance of the rotation scores in randomly generated
signatures of several sizes. The comparison to the observed variances (using
the testing gene sets in the
roastgsa
call) is done through the function
ploteffsignaturesize
.
return an object of class varrotrand
with attributes
"varrot" |
matrix |
"testedsizes" |
effective sizes being tested |
"nrep" |
number of gene sets created for each tested effective size |
Adria Caballe Mestres
[1] A comparison of rotation-based scores for gene set analysis Adria Caballe Mestres, Antonio Berenguer Llergo, Camille Stephan-Otto Attolini bioRxiv 2021.03.23.436604;
ploteffsignaturesize
to visualize results and
roastgsa
for gsa approach
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 100, mccores = 1, executation.info = FALSE) varrot <- varrotrand(roastgsa1, y, testedsizes = c(seq(5,50, by=5), seq(55,200,by=10)), nrep = 50)
y <- array(rnorm(10000),dim = c(1000,10)) covar <- data.frame(voi = factor(c(rep(0,5),rep(1,5)))) colnames(y) <- rownames(covar) <- paste0("sample",1:10) rownames(y) <- paste0("gene",1:1000) form <- as.formula("~ voi") index <- lapply(1:10, function(o) sample(1:1000,50)) names(index) <- paste0("gset",1:10) roastgsa1 <- roastgsa(y, covar, form = form, self.contained = TRUE, set.statistic = "maxmean", index = index, nrot = 100, mccores = 1, executation.info = FALSE) varrot <- varrotrand(roastgsa1, y, testedsizes = c(seq(5,50, by=5), seq(55,200,by=10)), nrep = 50)