Title: | Multi-Contrast Gene Set Enrichment Analysis |
---|---|
Description: | mitch is an R package for multi-contrast enrichment analysis. At its heart, it uses a rank-MANOVA based statistical approach to detect sets of genes that exhibit enrichment in the multidimensional space as compared to the background. The rank-MANOVA concept dates to work by Cox and Mann (https://doi.org/10.1186/1471-2105-13-S16-S12). mitch is useful for pathway analysis of profiling studies with one, two or more contrasts, or in studies with multiple omics profiling, for example proteomic, transcriptomic and epigenomic analysis of the same samples. mitch is well suited to pathway-level differential analysis of scRNA-seq data. We have an established routine for pathway enrichment of Infinium Methylation Array data (see vignette). The main strengths of mitch are that it can import datasets easily from many upstream tools and has advanced plotting features to visualise these enrichments. |
Authors: | Mark Ziemann [aut, cre, cph] , Antony Kaspi [aut, cph] |
Maintainer: | Mark Ziemann <[email protected]> |
License: | CC BY-SA 4.0 + file LICENSE |
Version: | 1.19.3 |
Built: | 2024-12-04 18:02:26 UTC |
Source: | https://github.com/bioc/mitch |
Genesets from Reactome database suitable for enrichment analysis. Acquired August 2019. The structure of this data is a named list of vectors, containing human gene names (character strings). This is a sample of 200 gene sets from the approximately 2000 present in the full dataset.
data(genesetsExample)
A list of gene sets
Reactome website: https://reactome.org/
Fabregat et al. (2017) BMC Bioinformatics volume 18, Article number: 142, https://www.ncbi.nlm.nih.gov/pubmed/28249561
data(genesetsExample)
This function imports GMT files into a list of character vectors for mitch analysis. GMT files are a commonly used format for lists of genes used in pathway enrichment analysis. GMT files can be obtained from Reactome, MSigDB, etc.
gmt_import(gmtfile)
gmtfile |
path to a GMT file. |
a list of gene sets.
# Import some gene sets
genesetsExample <- gmt_import(system.file('extdata/sample_genesets.gmt', package = 'mitch'))
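To make the "list of character vectors" structure concrete, here is a minimal base-R sketch of reading a GMT file, in which each line holds a set name, a description field, and then the member genes, all tab-separated. The helper name read_gmt_sketch and the toy gene sets are hypothetical; this is not the gmt_import implementation, only an illustration of the format.

```r
# Write a tiny two-set GMT file to a temporary location (toy data).
gmt_lines <- c(
  "SET_A\thttp://example.org/a\tTP53\tBRCA1\tMYC",
  "SET_B\thttp://example.org/b\tEGFR\tKRAS"
)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)

# Hypothetical helper: parse a GMT file into a named list of gene vectors.
read_gmt_sketch <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  # Fields 1 and 2 are the set name and description; the rest are genes.
  sets <- lapply(fields, function(f) f[-(1:2)])
  names(sets) <- vapply(fields, function(f) f[1], character(1))
  sets
}

sets <- read_gmt_sketch(gmt_path)
str(sets)
```

The result has the same shape as the genesetsExample data described above: a named list whose elements are character vectors of gene names.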
Example edgeR result of differential ChIP-seq H3K36ac. This is a data frame which contains columns for log fold change, log counts per million, p-value and FDR adjusted p-value. These columns consist of numerical values. The row names represent human gene names. This is a sample of 1000 genes from an original dataset that contains measurements of ~30000 genes.
data(k36a)
data frame
data(k36a)
Example edgeR result of differential ChIP-seq H3K9ac. This is a data frame which contains columns for log fold change, log counts per million, p-value and FDR adjusted p-value. These columns consist of numerical values. The row names represent human gene names. This is a sample of 1000 genes from an original dataset that contains measurements of ~30000 genes.
data(k9a)
data frame
data(k9a)
mitch is an R package for multi-dimensional enrichment analysis. At its heart, it uses a rank-MANOVA based statistical approach to detect sets of genes that exhibit enrichment in the multidimensional space as compared to the background. mitch is useful for pathway analysis of profiling studies with two or more contrasts, or in studies with multiple omics profiling, for example proteomic, transcriptomic and epigenomic analysis of the same samples. mitch is well suited to pathway-level differential analysis of scRNA-seq data.
A typical mitch workflow consists of:
1) Import gene sets with gmt_import()
2) Import profiling data with mitch_import()
3) Calculate enrichments with mitch_calc()
4) Generate plots and reports with mitch_plots() and mitch_report()
More documentation is available on the GitHub page https://github.com/markziemann/mitch or with ?<function>, e.g. ?mitch_import
Maintainer: Mark Ziemann [email protected] (ORCID) [copyright holder]
Authors:
Antony Kaspi [copyright holder]
# Example workflow
# Import some gene sets
genesetsExample <- gmt_import(system.file('extdata/sample_genesets.gmt', package = 'mitch'))
# Load some edgeR tables (rna, k9a, k36a)
data(rna, k9a, k36a)
# Create a list of differential profiles
myList <- list('rna' = rna, 'k9a' = k9a, 'k36a' = k36a)
# Import as edgeR table
myImportedData <- mitch_import(myList, DEtype = 'edger')
# Calculate enrichment using MANOVA
resExample <- mitch_calc(myImportedData, genesetsExample, priority = 'effect', resrows = 5, cores = 2)
# Generate some high res plots in PDF format
mitch_plots(resExample, outfile = 'outres.pdf')
# Generate a report of the analysis in HTML format
mitch_report(resExample, 'outres.html')
This function performs multivariate gene set enrichment analysis.
mitch_calc( x, genesets, minsetsize = 10, cores = 1, resrows = 50, priority = NULL )
x |
a multicolumn numerical table with each column containing differential expression scores for a contrast. Row names must match the gene identifiers used in genesets. |
genesets |
lists of genes imported by the gmt_import function or similar. |
minsetsize |
the minimum number of genes required in a set for it to be included in the statistical analysis. Default is 10. |
cores |
the number of parallel threads for computation. Defaults to 1. |
resrows |
an integer value representing the number of top genesets for which a detailed report is to be generated. Default is 50. |
priority |
the prioritisation metric used to select top gene sets. Valid options are 'significance', 'effect' and 'SD'. |
mitch res object with the following parts:
$input_profile: the supplied input differential profile
$input_genesets: the supplied input gene sets
$ranked_profile: the differential profile after ranking
$enrichment_result: the table of MANOVA/ANOVA enrichment results for each gene set
$analysis_metrics: several metrics that are important to the interpretation of the results
$detailed_sets: a list of dataframes containing ranks of members of prioritised gene sets.
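The rank-MANOVA idea behind the enrichment result can be illustrated for a single gene set using base R alone. The sketch below is run on simulated scores and is not the mitch_calc implementation: the data, the object names, and the rank-difference effect score (twice the difference in mean rank between set members and non-members, divided by the number of genes) are all assumptions made for illustration.

```r
set.seed(42)
n_genes <- 500

# Simulated scores for two contrasts (toy data, not from mitch).
scores <- data.frame(
  rna = rnorm(n_genes),
  k9a = rnorm(n_genes),
  row.names = paste0("gene", seq_len(n_genes))
)

# A toy gene set whose members are shifted upwards in the rna contrast.
gene_set <- paste0("gene", 1:30)
scores[gene_set, "rna"] <- scores[gene_set, "rna"] + 1.5

# Rank each contrast separately, then test set membership against
# the joint ranks with a MANOVA.
ranked <- sapply(scores, rank)
rownames(ranked) <- rownames(scores)
in_set <- rownames(ranked) %in% gene_set

fit <- manova(ranked ~ in_set)
p_manova <- summary(fit)$stats[1, "Pr(>F)"]

# Illustrative per-contrast effect score (an assumption, see lead-in).
s_scores <- 2 * (colMeans(ranked[in_set, ]) - colMeans(ranked[!in_set, ])) / n_genes

p_manova
s_scores
```

In this toy case the set should show a small MANOVA p-value and a positive score for rna, while the score for k9a stays near zero.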
# Example using mitch to calculate multivariate enrichments and
# prioritise based on effect size
data(myImportedData, genesetsExample)
resExample <- mitch_calc(myImportedData, genesetsExample, priority = 'effect', minsetsize = 5, cores = 2)
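The three priority options can be pictured as three ways of ordering a results table. The sketch below uses a mock table with hypothetical column names. It assumes that 'effect' and 'SD' first keep only sets passing an FDR threshold of 0.05, that the effect magnitude is the Euclidean length of the per-contrast scores, and that 'SD' refers to the standard deviation of those scores. None of these details is stated on this page, so treat it as an illustration rather than a description of mitch_calc internals.

```r
# Mock enrichment table (toy values, hypothetical column names).
res <- data.frame(
  set   = c("SET_A", "SET_B", "SET_C", "SET_D"),
  p_adj = c(0.001, 0.03, 0.20, 0.004),
  s_rna = c(0.10, 0.60, 0.80, -0.45),
  s_k9a = c(0.12, 0.55, 0.70, 0.50)
)

# Assumed effect magnitude: distance of the score vector from the origin.
res$s_dist <- sqrt(res$s_rna^2 + res$s_k9a^2)
# Assumed discordance measure: SD of the scores across contrasts.
res$s_sd <- apply(res[, c("s_rna", "s_k9a")], 1, sd)

# 'significance': smallest adjusted p-value first.
by_significance <- res[order(res$p_adj), ]

# 'effect' and 'SD': order the significant sets by magnitude or spread.
sig <- res[res$p_adj < 0.05, ]
by_effect <- sig[order(-sig$s_dist), ]
by_sd <- sig[order(-sig$s_sd), ]

by_significance$set
by_effect$set
by_sd$set
```

Ordering by spread brings discordant sets such as SET_D, which moves in opposite directions in the two contrasts, to the top.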
This function imports differential omics data from common differential tools like edgeR, limma and DESeq2. It calculates a summarised differential expression metric by multiplying the sign of the log fold change by the -log10 of the p-value. If this behaviour is not desired, mitch_import can be bypassed in favour of another scoring metric.
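The scoring metric described above is simple enough to reproduce by hand. The following base-R sketch applies it to a toy table; the column names logFC and PValue and the gene names are illustrative assumptions, and real tools name these columns differently.

```r
# Toy differential expression table (illustrative values only).
de <- data.frame(
  logFC  = c(2.1, -1.3, 0.4, -0.2),
  PValue = c(1e-6, 1e-3, 0.2, 0.9),
  row.names = c("GENE1", "GENE2", "GENE3", "GENE4")
)

# Directional score: sign of the fold change times -log10(p-value).
score <- sign(de$logFC) * -log10(de$PValue)
names(score) <- rownames(de)
score
```

Strongly up-regulated genes receive large positive scores, strongly down-regulated genes receive large negative scores, and genes with weak evidence sit near zero.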
mitch_import(x, DEtype, geneIDcol = NULL, geneTable = NULL, joinType = NULL)
x |
a list of differential expression tables. |
DEtype |
the program that generated the differential expression table. Valid options are 'edgeR', 'DESeq2', 'limma', 'ABSSeq', 'Sleuth', 'Seurat', 'topConfects', 'muscat', 'Swish', 'scDE', 'MAST', 'DEsingle', 'ballgown', 'NOIseq', 'TCC', 'DEDS', 'cuffdiff', 'fishpond', 'missMethyl', 'DMRcate', 'DEP', 'msmsTests', 'plgem', 'SDAMS', 'DEqMS', 'DiffBind' and 'prescored'. 'prescored' denotes a data frame containing only the test statistic and gene ID (either as row names or in a separate column). 'preranked' is an alias for 'prescored'. |
geneIDcol |
the column containing gene names. If gene names are specified as row names, then geneIDcol=NULL. |
geneTable |
a 2 column table mapping gene identifiers in the profile to gene identifiers in the gene sets. |
joinType |
the type of join to perform, either 'inner' or 'full'. By default, joins are 'inner' except for Seurat and muscat where full is used. |
a multi-column table compatible with mitch_calc analysis.
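The roles of geneTable and joinType can be illustrated with base R's merge(). In this sketch the identifiers, the helper map_ids, and the column names are hypothetical; it shows the general idea of mapping identifiers and then combining contrasts with an inner or full join, not the internals of mitch_import.

```r
# Two scored contrasts keyed by a hypothetical identifier.
rna <- data.frame(id = c("ENSG01", "ENSG02", "ENSG03"), score = c(1.2, -0.5, 3.0))
k9a <- data.frame(id = c("ENSG02", "ENSG03", "ENSG04"), score = c(0.7, -2.1, 0.3))

# Two-column table mapping profile identifiers to gene set identifiers.
gene_table <- data.frame(
  id     = c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
  symbol = c("GENEA", "GENEB", "GENEC", "GENED")
)

# Hypothetical helper: swap identifiers for symbols and name the score column.
map_ids <- function(df, name) {
  mapped <- merge(df, gene_table, by = "id")
  out <- mapped[, c("symbol", "score")]
  names(out)[2] <- name
  out
}
rna_sym <- map_ids(rna, "rna")
k9a_sym <- map_ids(k9a, "k9a")

# Inner join keeps only genes measured in every contrast.
inner <- merge(rna_sym, k9a_sym, by = "symbol")
# Full join keeps all genes and fills missing scores with NA.
full <- merge(rna_sym, k9a_sym, by = "symbol", all = TRUE)

inner
full
```

An inner join gives a complete table with fewer genes, whereas a full join retains genes detected in only some contrasts, which may matter for sparse data such as single-cell profiles.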
# first step is to create a list of differential profiles
data(rna, k9a, k36a)
x <- list('rna' = rna, 'k9a' = k9a, 'k36a' = k36a)
# import as edgeR table
imported <- mitch_import(x, DEtype = 'edger')
This function generates several plots of multivariate gene set enrichment in high resolution PDF format. The number of detailed sets to generate is dictated by the resrows set in the mitch_calc command.
mitch_plots(res, outfile = "Rplots.pdf")
res |
a mitch results object. |
outfile |
the destination file for the plots in PDF format. Should have a 'pdf' suffix. Defaults to 'Rplots.pdf'. |
generates a PDF file containing enrichment plots.
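As a rough picture of what a PDF of enrichment plots involves, the base-R sketch below writes one scatter plot of gene ranks in two contrasts to a PDF device, with members of a toy gene set highlighted. The data, colours, and file name are assumptions for illustration; the plots produced by mitch_plots are more extensive and are not reproduced here.

```r
set.seed(1)
n_genes <- 200

# Simulated ranks for two contrasts (toy data).
ranks <- data.frame(rna = rank(rnorm(n_genes)), k9a = rank(rnorm(n_genes)))
in_set <- seq_len(n_genes) <= 20  # first 20 genes form the toy set

out_pdf <- file.path(tempdir(), "sketch_plots.pdf")
pdf(out_pdf, width = 6, height = 6)  # open a vector graphics device
plot(ranks$rna, ranks$k9a, pch = 19, col = "grey70",
     xlab = "rna rank", ylab = "k9a rank", main = "Toy gene set")
points(ranks$rna[in_set], ranks$k9a[in_set], pch = 19, col = "red")
invisible(dev.off())  # close the device so the file is written
```

Because PDF is a vector format, plots written this way stay sharp at any zoom level.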
data(resExample)
mitch_plots(resExample, outfile = 'outres.pdf')
This function generates an R Markdown based HTML report containing tables and several plots of mitch results. The plots are in PNG format, so they are lower in resolution than the PDF generated by the mitch_plots function. The number of detailed sets to generate is dictated by the resrows set in the mitch_calc command.
mitch_report(res, outfile, overwrite = FALSE)
res |
a mitch results object. |
outfile |
the destination file for the HTML report. Should have an 'html' suffix. |
overwrite |
whether to overwrite the report file if it already exists. Defaults to FALSE. |
generates an HTML file containing enrichment plots.
data(resExample)
mitch_report(resExample, 'outres2.html')
Example of three edgeR profiles imported using mitch. The structure of this data is a dataframe where each column represents one of the following profiling datasets after scoring: RNA, H3K9ac and H3K36ac. Each row represents one gene and this dataset contains just 1000 rows to keep the example dataset small.
data(myImportedData)
data frame
data(myImportedData)
Example edgeR results of differential RNA, H3K9ac and H3K36ac profiling. The structure of this data is a list of three data frames. Each data frame has only 1000 rows.
data(myList)
list of three data frames
data(myList)
Example of mitch results: enrichment of the Reactome gene sets in the RNA, H3K9ac and H3K36ac datasets. The structure of this data set is a list of six elements. The 1st element is "input_profile", the imported profiling data (data frame). The 2nd is "input_genesets" (named list of character vectors of gene names). The 3rd is "ranked_profile", the input profiling data after ranking (data frame). The 4th is "enrichment_result", a data frame that provides enrichment information for each gene set, including s scores, p-values and FDR adjusted p-values. The 5th is "analysis_metrics" (list). The 6th is "detailed_sets", a list of 5 matrices detailing the enrichment of members of selected gene sets.
data(resExample)
list of mixed data types
data(resExample)
Example edgeR result of differential RNA expression. This is a data frame which contains columns for log fold change, log counts per million, p-value and FDR adjusted p-value. These columns consist of numerical values. The row names represent human gene names. This is a sample of 1000 genes from an original dataset that contains measurements of ~15000 genes.
data(rna)
data frame
data(rna)