Title: | Tissue-specific gene enrichment analysis |
---|---|
Description: | The TissueEnrich package is used to calculate enrichment of tissue-specific genes in a set of input genes. For example, the user can input the most highly expressed genes from RNA-Seq data, or gene co-expression modules to determine which tissue-specific genes are enriched in those datasets. Tissue-specific genes were defined by processing RNA-Seq data from the Human Protein Atlas (HPA) (Uhlén et al. 2015), GTEx (Ardlie et al. 2015), and mouse ENCODE (Shen et al. 2012) using the algorithm from the HPA (Uhlén et al. 2015).The hypergeometric test is being used to determine if the tissue-specific genes are enriched among the input genes. Along with tissue-specific gene enrichment, the TissueEnrich package can also be used to define tissue-specific genes from expression datasets provided by the user, which can then be used to calculate tissue-specific gene enrichments. |
Authors: | Ashish Jain [aut, cre], Geetu Tuteja [aut] |
Maintainer: | Ashish Jain <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.27.0 |
Built: | 2024-11-19 04:41:48 UTC |
Source: | https://github.com/bioc/TissueEnrich |
The teEnrichment function is used to calculate the enrichment of tissue-specific genes, given an input gene set. It uses tissue-specific genes defined by processing RNA-Seq datasets from human and mouse.
teEnrichment( inputGenes = NULL, rnaSeqDataset = 1, tissueSpecificGeneType = 1, multiHypoCorrection = TRUE, backgroundGenes = NULL )
teEnrichment( inputGenes = NULL, rnaSeqDataset = 1, tissueSpecificGeneType = 1, multiHypoCorrection = TRUE, backgroundGenes = NULL )
inputGenes |
A GeneSet object containing the input genes, organism type ('Homo Sapiens' or 'Mus Musculus'), and gene id identifier (Gene Symbol or ENSEMBL identifier). |
rnaSeqDataset |
An integer describing the dataset to be used for enrichment analysis. 1 for 'Human Protein Atlas' (default), 2 for 'GTEx', 3 for 'Mouse ENCODE'. Default 1. |
tissueSpecificGeneType |
An integer describing the type of tissue- specific genes to be used. 1 for 'All' (default), 2 for 'Tissue-Enriched', 3 for 'Tissue-Enhanced', and 4 for 'Group-Enriched'. Default 1. |
multiHypoCorrection |
Flag to correct P-values for multiple hypothesis using BH method. Default TRUE. |
backgroundGenes |
A GeneSet object containing the background gene list, organism type ('Homo Sapiens' or 'Mus Musculus'), and gene id identifier (Gene Symbol or ENSEMBL identifier). The input genes must be present in the background gene list. If not provided all the genes will be used as background. |
The output is a list with four objects. The first object is the SummarizedExperiment object containing the enrichment results, the second and the third object contains the expression values and tissue-specificity information of the tissue-specific genes for genes from the input gene set, and the fourth is a GeneSet object containing genes that were not identified in the tissue-specific gene data.
Ashish Jain, Geetu Tuteja
library(dplyr) library(ggplot2) genes<-system.file('extdata', 'inputGenes.txt', package = 'TissueEnrich') inputGenes<-scan(genes,character()) gs<-GeneSet(geneIds=inputGenes,organism='Homo Sapiens', geneIdType=SymbolIdentifier()) output<-teEnrichment(gs) seEnrichmentOutput<-output[[1]] enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput), row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1]) enrichmentOutput$Tissue<-row.names(enrichmentOutput) #Plotting the P-Values ggplot(enrichmentOutput,aes(x=reorder(Tissue,-Log10PValue),y=Log10PValue, label = Tissue.Specific.Genes,fill = Tissue))+ geom_bar(stat = 'identity')+ labs(x='', y = '-LOG10(P-Value)')+ theme_bw()+ theme(legend.position='none')+ theme(plot.title = element_text(hjust = 0.5,size = 20),axis.title = element_text(size=15))+ theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), panel.grid.major= element_blank(),panel.grid.minor = element_blank())
library(dplyr) library(ggplot2) genes<-system.file('extdata', 'inputGenes.txt', package = 'TissueEnrich') inputGenes<-scan(genes,character()) gs<-GeneSet(geneIds=inputGenes,organism='Homo Sapiens', geneIdType=SymbolIdentifier()) output<-teEnrichment(gs) seEnrichmentOutput<-output[[1]] enrichmentOutput<-setNames(data.frame(assay(seEnrichmentOutput), row.names = rowData(seEnrichmentOutput)[,1]), colData(seEnrichmentOutput)[,1]) enrichmentOutput$Tissue<-row.names(enrichmentOutput) #Plotting the P-Values ggplot(enrichmentOutput,aes(x=reorder(Tissue,-Log10PValue),y=Log10PValue, label = Tissue.Specific.Genes,fill = Tissue))+ geom_bar(stat = 'identity')+ labs(x='', y = '-LOG10(P-Value)')+ theme_bw()+ theme(legend.position='none')+ theme(plot.title = element_text(hjust = 0.5,size = 20),axis.title = element_text(size=15))+ theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), panel.grid.major= element_blank(),panel.grid.minor = element_blank())
The teEnrichmentCustom function is used to calculate tissue-specific gene enrichment using tissue-specific genes defined using the teGeneRetrieval function.
teEnrichmentCustom( inputGenes = NULL, tissueSpecificGenes = NULL, tissueSpecificGeneType = 1, multiHypoCorrection = TRUE, backgroundGenes = NULL )
teEnrichmentCustom( inputGenes = NULL, tissueSpecificGenes = NULL, tissueSpecificGeneType = 1, multiHypoCorrection = TRUE, backgroundGenes = NULL )
inputGenes |
An GeneSet object containing the input genes. |
tissueSpecificGenes |
A SummarizedExperiment object. Output from 'teGeneRetrieval' function. Default NULL. |
tissueSpecificGeneType |
An integer describing the type of tissue-specific genes to be used. 1 for 'All' (default), 2 for 'Tissue-Enriched',3 for 'Tissue-Enhanced', and 4 for 'Group-Enriched'. Default 1. |
multiHypoCorrection |
Flag to correct P-values for multiple hypothesis using BH method. Default TRUE. |
backgroundGenes |
A GeneSet object containing the background gene list, organism type ('Homo Sapiens' or 'Mus Musculus'), and gene id identifier (Gene Symbol or ENSEMBL identifier). The input genes must be present in the background gene list. If not provided all the genes will be used as background. |
The output is a list with three objects. The first object is the SummarizedExperiment object containing the enrichment results, the second object contains the tissue-specificity information of the tissue-specific genes for genes from the input gene set, and the third is a GeneSet object containing genes that were not identified in the tissue-specific gene data.
Ashish Jain, Geetu Tuteja
library(dplyr) data<-system.file('extdata', 'test.expressiondata.txt', package = 'TissueEnrich') expressionData<-read.table(data,header=TRUE,row.names=1,sep='\t') se<-SummarizedExperiment(assays = SimpleList(as.matrix(expressionData)), rowData = row.names(expressionData),colData = colnames(expressionData)) output<-teGeneRetrieval(se) head(metadata(output)[['TissueSpecificGenes']]) genes<-system.file('extdata', 'inputGenesEnsembl.txt', package = 'TissueEnrich') inputGenes<-scan(genes,character()) gs<-GeneSet(geneIds=inputGenes) output2<-teEnrichmentCustom(gs,output) #Plotting the P-Values enrichmentOutput<-setNames(data.frame(assay(output2[[1]]), row.names = rowData(output2[[1]])[,1]), colData(output2[[1]])[,1]) enrichmentOutput$Tissue<-row.names(enrichmentOutput) ggplot(enrichmentOutput,aes(x=reorder(Tissue,-Log10PValue),y=Log10PValue, label = Tissue.Specific.Genes,fill = Tissue))+ geom_bar(stat = 'identity')+ labs(x='', y = '-LOG10(P-Value)')+ theme_bw()+ theme(legend.position='none')+ theme(plot.title = element_text(hjust = 0.5,size = 20),axis.title = element_text(size=15))+ theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), panel.grid.major= element_blank(),panel.grid.minor = element_blank())
library(dplyr) data<-system.file('extdata', 'test.expressiondata.txt', package = 'TissueEnrich') expressionData<-read.table(data,header=TRUE,row.names=1,sep='\t') se<-SummarizedExperiment(assays = SimpleList(as.matrix(expressionData)), rowData = row.names(expressionData),colData = colnames(expressionData)) output<-teGeneRetrieval(se) head(metadata(output)[['TissueSpecificGenes']]) genes<-system.file('extdata', 'inputGenesEnsembl.txt', package = 'TissueEnrich') inputGenes<-scan(genes,character()) gs<-GeneSet(geneIds=inputGenes) output2<-teEnrichmentCustom(gs,output) #Plotting the P-Values enrichmentOutput<-setNames(data.frame(assay(output2[[1]]), row.names = rowData(output2[[1]])[,1]), colData(output2[[1]])[,1]) enrichmentOutput$Tissue<-row.names(enrichmentOutput) ggplot(enrichmentOutput,aes(x=reorder(Tissue,-Log10PValue),y=Log10PValue, label = Tissue.Specific.Genes,fill = Tissue))+ geom_bar(stat = 'identity')+ labs(x='', y = '-LOG10(P-Value)')+ theme_bw()+ theme(legend.position='none')+ theme(plot.title = element_text(hjust = 0.5,size = 20),axis.title = element_text(size=15))+ theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), panel.grid.major= element_blank(),panel.grid.minor = element_blank())
The teGeneRetrieval function is used to define tissue-specific genes, using the algorithm from the HPA (Uhlén et al. 2015). It takes a gene expression SummarizedExperiment object as input (rows as genes and columns as tissue) and classifies the genes into different gene groups. The users also have the option of changing the default thresholds to vary the degree of tissue specificity of genes. More details about the gene groups and HPA thresholds are provided below. More details about the gene groups are provided in the vignette.
teGeneRetrieval( expressionData, foldChangeThreshold = 5, maxNumberOfTissues = 7, expressedGeneThreshold = 1 )
teGeneRetrieval( expressionData, foldChangeThreshold = 5, maxNumberOfTissues = 7, expressedGeneThreshold = 1 )
expressionData |
A SummarizedExperiment object containing gene expression values. |
foldChangeThreshold |
A numeric Threshold of fold change, default 5. |
maxNumberOfTissues |
A numeric Maximum number of tissues in a group for group enriched genes, default 7. |
expressedGeneThreshold |
A numeric Minimum gene expression cutoff for the gene to be called as expressed, default 1. |
The output is a SummarizedExperiment object containing the information about the tissue-specific genes with three columns: Gene, Tissue, and Enrichment group of the gene in the given tissue.
Ashish Jain, Geetu Tuteja
library(SummarizedExperiment) data<-system.file('extdata', 'test.expressiondata.txt', package = 'TissueEnrich') expressionData<-read.table(data,header=TRUE,row.names=1,sep='\t') se<-SummarizedExperiment(assays = SimpleList(as.matrix(expressionData)), rowData = row.names(expressionData),colData = colnames(expressionData)) output<-teGeneRetrieval(se) head(assay(output))
library(SummarizedExperiment) data<-system.file('extdata', 'test.expressiondata.txt', package = 'TissueEnrich') expressionData<-read.table(data,header=TRUE,row.names=1,sep='\t') se<-SummarizedExperiment(assays = SimpleList(as.matrix(expressionData)), rowData = row.names(expressionData),colData = colnames(expressionData)) output<-teGeneRetrieval(se) head(assay(output))