Title: | Gene set analysis following differential expression using linear (mixed) modeling with dream |
---|---|
Description: | Zenith performs gene set analysis on the result of differential expression using linear (mixed) modeling with dream by considering the correlation between gene expression traits. This package implements the camera method from the limma package proposed by Wu and Smyth (2012). Zenith is a simple extension of camera to be compatible with linear mixed models implemented in variancePartition::dream(). |
Authors: | Gabriel Hoffman [aut, cre] |
Maintainer: | Gabriel Hoffman <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.9.0 |
Built: | 2024-12-02 06:17:03 UTC |
Source: | https://github.com/bioc/zenith |
Same as limma::.rankSumTestWithCorrelation
, but returns effect size.
.rankSumTestWithCorrelation(index, statistics, correlation = 0, df = Inf)
.rankSumTestWithCorrelation(index, statistics, correlation = 0, df = Inf)
index |
any index vector such that |
statistics |
numeric vector giving values of the test statistic. |
correlation |
numeric scalar, average correlation between cases in the test group. Cases in the second group are assumed independent of each other and other the first group. |
df |
degrees of freedom which the correlation has been estimated. |
See limma::.rankSumTestWithCorrelation
data.frame storing results of hypothesis test
Evaluate mean correlation between residuals in gene set based on results from dream
corInGeneSet(fit, idx, squareCorr = FALSE)
corInGeneSet(fit, idx, squareCorr = FALSE)
fit |
result of differential expression with dream |
idx |
indeces or rownames to extract |
squareCorr |
compute the mean squared correlation instead |
list storing correlation and variance inflation factor
Load Gene Ontology genesets
get_GeneOntology( onto = c("BP", "MF", "CC"), to = "ENSEMBL", includeOffspring = TRUE, org = "hsa" )
get_GeneOntology( onto = c("BP", "MF", "CC"), to = "ENSEMBL", includeOffspring = TRUE, org = "hsa" )
onto |
array of categories to load |
to |
convert gene names to this type using |
includeOffspring |
if TRUE, follow the GO hierarchy down and include all genes in offspring sets for a given gene set |
org |
organism. human ( |
This function loads the GO gene sets using the packages EnrichmentBrowser
and GO.db
It can take a mintute to load because converting gene name type is slow.
Gene sets stored as GeneSetCollection
# load GO Biological Process # gs = get_GeneOntology('BP') # load all gene sets # gs = get_GeneOntology()
# load GO Biological Process # gs = get_GeneOntology('BP') # load all gene sets # gs = get_GeneOntology()
Load MSigDB genesets
get_MSigDB( cat = unique(msigdbr_collections()$gs_cat), to = "ENSEMBL", org = "hsa" )
get_MSigDB( cat = unique(msigdbr_collections()$gs_cat), to = "ENSEMBL", org = "hsa" )
cat |
array of categories to load. Defaults to array of all MSigDB categories |
to |
convert gene names to this type using |
org |
organism. human ( |
This function loads the MSigDB gene sets using the packages EnrichmentBrowser
and msigdbr
. It can take a mintute to load because converting gene name type is slow.
Gene sets stored as GeneSetCollection
# load Hallmark gene sets gs = get_MSigDB('H') # load all gene sets # gs = get_MSigDB()
# load Hallmark gene sets gs = get_MSigDB('H') # load all gene sets # gs = get_MSigDB()
Heatmap of zenith results showing genesets that have the top and bottom t-statistics from each assay.
plotZenithResults( df, ntop = 5, nbottom = 5, label.angle = 45, zmax = NULL, transpose = FALSE, sortByGeneset = TRUE )
plotZenithResults( df, ntop = 5, nbottom = 5, label.angle = 45, zmax = NULL, transpose = FALSE, sortByGeneset = TRUE )
df |
result |
ntop |
number of gene sets with highest t-statistic to show |
nbottom |
number of gene sets with lowest t-statistic to show |
label.angle |
angle of x-axis label |
zmax |
maxium of the color scales. If not specified, used range of the observed t-statistics |
transpose |
transpose the axes of the plot |
sortByGeneset |
use hierarchical clustering to sort gene sets. Default is TRUE |
Heatmap showing enrichment for gene sets and cell types
# Load packages library(edgeR) library(variancePartition) library(tweeDEseqCountData) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender,df_metadata) fit = eBayes(fit) # Load Hallmark genes from MSigDB # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology gs = get_MSigDB("H", to="ENSEMBL") # Run zenith analysis res.gsa = zenith_gsa(fit, gs, 'gendermale', progressbar=FALSE ) # Show top gene sets head(res.gsa, 2) # for each cell type select 3 genesets with largest t-statistic # and 1 geneset with the lowest # Grey boxes indicate the gene set could not be evaluted because # to few genes were represented plotZenithResults(res.gsa)
# Load packages library(edgeR) library(variancePartition) library(tweeDEseqCountData) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender,df_metadata) fit = eBayes(fit) # Load Hallmark genes from MSigDB # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology gs = get_MSigDB("H", to="ENSEMBL") # Run zenith analysis res.gsa = zenith_gsa(fit, gs, 'gendermale', progressbar=FALSE ) # Show top gene sets head(res.gsa, 2) # for each cell type select 3 genesets with largest t-statistic # and 1 geneset with the lowest # Grey boxes indicate the gene set could not be evaluted because # to few genes were represented plotZenithResults(res.gsa)
Perform gene set analysis on the result of differential expression using linear (mixed) modeling with variancePartition::dream
by considering the correlation between gene expression traits. This package is a slight modification of limma::camera
to 1) be compatible with dream, and 2) allow identification of gene sets with log fold changes with mixed sign.
zenith( fit, coef, index, use.ranks = FALSE, allow.neg.cor = FALSE, progressbar = TRUE, inter.gene.cor = 0.01 )
zenith( fit, coef, index, use.ranks = FALSE, allow.neg.cor = FALSE, progressbar = TRUE, inter.gene.cor = 0.01 )
fit |
result of differential expression with dream |
coef |
coefficient to test using |
index |
an index vector or a list of index vectors. Can be any vector such that |
use.ranks |
do a rank-based test ( |
allow.neg.cor |
should reduced variance inflation factors be allowed for negative correlations? |
progressbar |
if TRUE, show progress bar |
inter.gene.cor |
if NA, estimate correlation from data. Otherwise, use specified value |
zenith
gives the same results as camera(..., inter.gene.cor=NA)
which estimates the correlation with each gene set.
For differential expression with dream using linear (mixed) models see Hoffman and Roussos (2020). For the original camera gene set test see Wu and Smyth (2012).
NGenes
: number of genes in this set
Correlation
: mean correlation between expression of genes in this set
delta
: difference in mean t-statistic for genes in this set compared to genes not in this set
se
: standard error of delta
p.less
: p-value for hypothesis test of H0: delta < 0
p.greater
: p-value for hypothesis test of H0: delta > 0
PValue
: p-value for hypothesis test H0: delta != 0
Direction
: direction of effect based on sign(delta)
FDR
: false discovery rate based on Benjamini-Hochberg method in p.adjust
Hoffman GE, Roussos P (2020). “dream: Powerful differential expression analysis for repeated measures designs.” Bioinformatics. doi:10.1093/bioinformatics/btaa687.
Wu D, Smyth GK (2012). “Camera: a competitive gene set test accounting for inter-gene correlation.” Nucleic acids research, 40(17), e133. doi:10.1093/nar/gks461.
library(variancePartition) # simulate meta-data info <- data.frame(Age=c(20, 31, 52, 35, 43, 45),Group=c(0,0,0,1,1,1)) # simulate expression data y <- matrix(rnorm(1000*6),1000,6) rownames(y) = paste0("gene", 1:1000) colnames(y) = rownames(info) # First set of 20 genes are genuinely differentially expressed index1 <- 1:20 y[index1,4:6] <- y[index1,4:6]+1 # Second set of 20 genes are not DE index2 <- 21:40 # perform differential expression analysis with dream fit = dream(y, ~ Age + Group, info) fit = eBayes(fit) # perform gene set analysis testing Age res = zenith(fit, "Age", list(set1=index1,set2=index2) ) head(res)
library(variancePartition) # simulate meta-data info <- data.frame(Age=c(20, 31, 52, 35, 43, 45),Group=c(0,0,0,1,1,1)) # simulate expression data y <- matrix(rnorm(1000*6),1000,6) rownames(y) = paste0("gene", 1:1000) colnames(y) = rownames(info) # First set of 20 genes are genuinely differentially expressed index1 <- 1:20 y[index1,4:6] <- y[index1,4:6]+1 # Second set of 20 genes are not DE index2 <- 21:40 # perform differential expression analysis with dream fit = dream(y, ~ Age + Group, info) fit = eBayes(fit) # perform gene set analysis testing Age res = zenith(fit, "Age", list(set1=index1,set2=index2) ) head(res)
Perform a competitive gene set analysis accounting for correlation between genes.
zenith_gsa( fit, geneSets, coefs, use.ranks = FALSE, n_genes_min = 10, inter.gene.cor = 0.01, progressbar = TRUE, ... ) ## S4 method for signature 'MArrayLM,GeneSetCollection' zenith_gsa( fit, geneSets, coefs, use.ranks = FALSE, n_genes_min = 10, inter.gene.cor = 0.01, progressbar = TRUE, ... )
zenith_gsa( fit, geneSets, coefs, use.ranks = FALSE, n_genes_min = 10, inter.gene.cor = 0.01, progressbar = TRUE, ... ) ## S4 method for signature 'MArrayLM,GeneSetCollection' zenith_gsa( fit, geneSets, coefs, use.ranks = FALSE, n_genes_min = 10, inter.gene.cor = 0.01, progressbar = TRUE, ... )
fit |
results from |
geneSets |
|
coefs |
list of coefficients to test using |
use.ranks |
do a rank-based test |
n_genes_min |
minumum number of genes in a geneset |
inter.gene.cor |
if NA, estimate correlation from data. Otherwise, use specified value |
progressbar |
if TRUE, show progress bar |
... |
other arguments |
This code adapts the widely used camera()
analysis (Wu and Smyth 2012) in the limma
package (Ritchie et al. 2015) to the case of linear (mixed) models used by variancePartition::dream()
.
data.frame
of results for each gene set and cell type
Ritchie ME, Phipson B, Wu DI, Hu Y, Law CW, Shi W, Smyth GK (2015).
“limma powers differential expression analyses for RNA-sequencing and microarray studies.”
Nucleic acids research, 43(7), e47–e47.
Wu D, Smyth GK (2012).
“Camera: a competitive gene set test accounting for inter-gene correlation.”
Nucleic acids research, 40(17), e133.
doi:10.1093/nar/gks461.
limma::camera
# Load packages library(edgeR) library(variancePartition) library(tweeDEseqCountData) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender, df_metadata) fit = eBayes(fit) # Load Hallmark genes from MSigDB # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology gs = get_MSigDB("H", to="ENSEMBL") # Run zenith analysis res.gsa = zenith_gsa(fit, gs, 'gendermale', progressbar=FALSE ) # Show top gene sets head(res.gsa, 2) # for each cell type select 3 genesets with largest t-statistic # and 1 geneset with the lowest # Grey boxes indicate the gene set could not be evaluted because # to few genes were represented plotZenithResults(res.gsa)
# Load packages library(edgeR) library(variancePartition) library(tweeDEseqCountData) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender, df_metadata) fit = eBayes(fit) # Load Hallmark genes from MSigDB # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology gs = get_MSigDB("H", to="ENSEMBL") # Run zenith analysis res.gsa = zenith_gsa(fit, gs, 'gendermale', progressbar=FALSE ) # Show top gene sets head(res.gsa, 2) # for each cell type select 3 genesets with largest t-statistic # and 1 geneset with the lowest # Grey boxes indicate the gene set could not be evaluted because # to few genes were represented plotZenithResults(res.gsa)
Perform gene set analysis on the result of a pre-computed test statistic. Test whether statistics in a gene set are larger/smaller than statistics not in the set.
zenithPR_gsa( statistics, ids, geneSets, use.ranks = FALSE, n_genes_min = 10, progressbar = TRUE, inter.gene.cor = 0.01, coef.name = "zenithPR" )
zenithPR_gsa( statistics, ids, geneSets, use.ranks = FALSE, n_genes_min = 10, progressbar = TRUE, inter.gene.cor = 0.01, coef.name = "zenithPR" )
statistics |
pre-computed test statistics |
ids |
name of gene for each entry in |
geneSets |
|
use.ranks |
do a rank-based test |
n_genes_min |
minumum number of genes in a geneset |
progressbar |
if TRUE, show progress bar |
inter.gene.cor |
correlation of test statistics with in gene set |
coef.name |
name of column to store test statistic |
This is the same as zenith_gsa()
, but uses pre-computed test statistics. Note that zenithPR_gsa()
may give slightly different results for small samples sizes, if zenithPR_gsa()
is fed t-statistics instead of z-statistics.
NGenes
: number of genes in this set
Correlation
: mean correlation between expression of genes in this set
delta
: difference in mean t-statistic for genes in this set compared to genes not in this set
se
: standard error of delta
p.less
: p-value for hypothesis test of H0: delta < 0
p.greater
: p-value for hypothesis test of H0: delta > 0
PValue
: p-value for hypothesis test H0: delta != 0
Direction
: direction of effect based on sign(delta)
FDR
: false discovery rate based on Benjamini-Hochberg method in p.adjust
coef.name
: name for pre-computed test statistics. Default: zenithPR
zenith_gsa()
, limma::cameraPR()
# Load packages library(edgeR) library(variancePartition) library(tweeDEseqCountData) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender, df_metadata) fit = eBayes(fit) # Load Hallmark genes from MSigDB # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology gs = get_MSigDB("H", to="ENSEMBL") # Run zenithPR analysis with a test statistic for each gene tab = topTable(fit, coef='gendermale', number=Inf) res.gsa = zenithPR_gsa(tab$t, rownames(tab), gs)
# Load packages library(edgeR) library(variancePartition) library(tweeDEseqCountData) # Load RNA-seq data from LCL's data(pickrell) geneCounts = exprs(pickrell.eset) df_metadata = pData(pickrell.eset) # Filter genes # Note this is low coverage data, so just use as code example dsgn = model.matrix(~ gender, df_metadata) keep = filterByExpr(geneCounts, dsgn, min.count=5) # Compute library size normalization dge = DGEList(counts = geneCounts[keep,]) dge = calcNormFactors(dge) # Estimate precision weights using voom vobj = voomWithDreamWeights(dge, ~ gender, df_metadata) # Apply dream analysis fit = dream(vobj, ~ gender, df_metadata) fit = eBayes(fit) # Load Hallmark genes from MSigDB # use gene 'SYMBOL', or 'ENSEMBL' id # use get_GeneOntology() to load Gene Ontology gs = get_MSigDB("H", to="ENSEMBL") # Run zenithPR analysis with a test statistic for each gene tab = topTable(fit, coef='gendermale', number=Inf) res.gsa = zenithPR_gsa(tab$t, rownames(tab), gs)