Title: | Interactive Differential Expression AnaLysis |
---|---|
Description: | This package provides functions for an Interactive Differential Expression AnaLysis of RNA-sequencing datasets, to extract quickly and effectively information downstream the step of differential expression. A Shiny application encapsulates the whole package. |
Authors: | Federico Marini [aut, cre] |
Maintainer: | Federico Marini <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.29.0 |
Built: | 2024-07-05 02:51:52 UTC |
Source: | https://github.com/bioc/ideal |
Generate a tidy table with the DE genes from the results of DESeq
deseqresult2DEgenes(deseqresult, FDR = 0.05)
deseqresult2DEgenes(deseqresult, FDR = 0.05)
deseqresult |
A |
FDR |
Numeric value, the significance level for thresholding adjusted p-values |
A "tidy" data.frame with only genes marked as differentially expressed
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 100, m = 8, betaSD = 2) dds <- DESeq(dds) res <- results(dds) deseqresult2DEgenes(res)
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 100, m = 8, betaSD = 2) dds <- DESeq(dds) res <- results(dds) deseqresult2DEgenes(res)
Generate a tidy table with the results of DESeq
deseqresult2tbl(deseqresult)
deseqresult2tbl(deseqresult)
deseqresult |
A |
A "tidy" data.frame with all genes
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 100, m = 8, betaSD = 1) dds <- DESeq2::DESeq(dds) res <- DESeq2::results(dds) deseqresult2tbl(res)
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 100, m = 8, betaSD = 1) dds <- DESeq2::DESeq(dds) res <- DESeq2::results(dds) deseqresult2tbl(res)
Plot for normalized counts of a single gene, with jittered points superimposed on the boxplot
ggplotCounts( dds, gene, intgroup = "condition", annotation_obj = NULL, transform = TRUE, labels_repel = TRUE )
ggplotCounts( dds, gene, intgroup = "condition", annotation_obj = NULL, transform = TRUE, labels_repel = TRUE )
dds |
A |
gene |
A character, specifying the name of the gene to plot |
intgroup |
Interesting groups: a character vector of
names in |
annotation_obj |
A |
transform |
Logical value, corresponding whether to have log scale y-axis or not. Defaults to TRUE. |
labels_repel |
Logical value. Whether to use |
Note: this function relies on the plotCounts
function of DESeq2,
therefore pseudocounts of 0.5 are added to each point
An object created by ggplot
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) ggplotCounts(dds_airway, gene = "ENSG00000103196", # CRISPLD2 in the original publication intgroup = "dex" )
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) ggplotCounts(dds_airway, gene = "ENSG00000103196", # CRISPLD2 in the original publication intgroup = "dex" )
A wrapper for extracting functional GO terms enriched in a list of (DE) genes, based on the algorithm and the implementation in the goseq package
goseqTable( de.genes, assayed.genes, genome = "hg38", id = "ensGene", testCats = c("GO:BP", "GO:MF", "GO:CC"), FDR_GO_cutoff = 1, nTop = 200, orgDbPkg = "org.Hs.eg.db", addGeneToTerms = TRUE )
goseqTable( de.genes, assayed.genes, genome = "hg38", id = "ensGene", testCats = c("GO:BP", "GO:MF", "GO:CC"), FDR_GO_cutoff = 1, nTop = 200, orgDbPkg = "org.Hs.eg.db", addGeneToTerms = TRUE )
de.genes |
A vector of (differentially expressed) genes |
assayed.genes |
A vector of background genes, e.g. all (expressed) genes in the assays |
genome |
A string identifying the genome that genes refer to, as in the
|
id |
A string identifying the gene identifier used by genes, as in the
|
testCats |
A vector specifying which categories to test for over representation amongst DE genes - can be any combination of "GO:CC", "GO:BP", "GO:MF" & "KEGG" |
FDR_GO_cutoff |
Numeric value for subsetting the results |
nTop |
Number of categories to extract, and optionally process for adding genes to the respective terms |
orgDbPkg |
Character string, named as the |
addGeneToTerms |
Logical, whether to add a column with all genes annotated to each GO term |
Note: the feature length retrieval is based on the goseq
function,
and requires that the corresponding TxDb packages are installed and available
A table containing the computed GO Terms and related enrichment scores
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) res_subset <- deseqresult2DEgenes(res_airway)[1:100, ] myde <- res_subset$id myassayed <- rownames(res_airway) ## Not run: mygo <- goseqTable(myde, myassayed, testCats = "GO:BP", addGeneToTerms = FALSE ) head(mygo) ## End(Not run)
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) res_subset <- deseqresult2DEgenes(res_airway)[1:100, ] myde <- res_subset$id myassayed <- rownames(res_airway) ## Not run: mygo <- goseqTable(myde, myassayed, testCats = "GO:BP", addGeneToTerms = FALSE ) head(mygo) ## End(Not run)
ideal makes differential expression analysis interactive, easy and reproducible. This function launches the main application included in the package.
ideal( dds_obj = NULL, res_obj = NULL, annotation_obj = NULL, countmatrix = NULL, expdesign = NULL, gene_signatures = NULL )
ideal( dds_obj = NULL, res_obj = NULL, annotation_obj = NULL, countmatrix = NULL, expdesign = NULL, gene_signatures = NULL )
dds_obj |
A |
res_obj |
A |
annotation_obj |
A |
countmatrix |
A count matrix, with genes as rows and samples as columns. If not provided, it is possible to upload the data during the execution of the Shiny App |
expdesign |
A |
gene_signatures |
A list of vectors, one for each pathway/signature. This
is for example the output of the |
A Shiny App is launched for interactive data exploration and differential expression analysis
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 100, m = 8) cm <- counts(dds) cd <- colData(dds) # with the well known airway package... library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) ## Not run: ideal() ideal(dds) ideal(dds_airway) dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) ideal(dds_airway, res_airway) ## End(Not run)
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 100, m = 8) cm <- counts(dds) cd <- colData(dds) # with the well known airway package... library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) ## Not run: ideal() ideal(dds) ideal(dds_airway) dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) ideal(dds_airway, res_airway) ## End(Not run)
ideal makes differential expression analysis interactive, easy and reproducible. The analysis of RNA-seq datasets is guided by the Shiny app as main component of the package, which also provides a wide set of functions to efficiently extract information from the existing data. The app can be also deployed on a Shiny server, to allow its usage without any installation on the user's side.
ideal makes differential expression analysis interactive, easy and reproducible. The analysis of RNA-seq datasets is guided by the Shiny app as main component of the package, which also provides a wide set of functions to efficiently extract information from the existing data. The app can be also deployed on a Shiny server, to allow its usage without any installation on the user's side.
Federico Marini [email protected], 2016-2017
Maintainer: Federico Marini [email protected]
Useful links:
Report bugs at https://github.com/federicomarini/ideal/issues
MA-plot from base means and log fold changes, in the ggplot2 framework, with additional support to annotate genes if provided.
plot_ma( res_obj, FDR = 0.05, point_alpha = 0.2, sig_color = "red", annotation_obj = NULL, draw_y0 = TRUE, hlines = NULL, title = NULL, xlab = "mean of normalized counts - log10 scale", ylim = NULL, add_rug = TRUE, intgenes = NULL, intgenes_color = "steelblue", labels_intgenes = TRUE, labels_repel = TRUE )
plot_ma( res_obj, FDR = 0.05, point_alpha = 0.2, sig_color = "red", annotation_obj = NULL, draw_y0 = TRUE, hlines = NULL, title = NULL, xlab = "mean of normalized counts - log10 scale", ylim = NULL, add_rug = TRUE, intgenes = NULL, intgenes_color = "steelblue", labels_intgenes = TRUE, labels_repel = TRUE )
res_obj |
A |
FDR |
Numeric value, the significance level for thresholding adjusted p-values |
point_alpha |
Alpha transparency value for the points (0 = transparent, 1 = opaque) |
sig_color |
Color to use to mark differentially expressed genes. Defaults to red |
annotation_obj |
A |
draw_y0 |
Logical, whether to draw the horizontal line at y=0. Defaults to TRUE. |
hlines |
The y coordinate (in absolute value) where to draw horizontal lines, optional |
title |
A title for the plot, optional |
xlab |
X axis label, defaults to "mean of normalized counts - log10 scale" |
ylim |
Vector of two numeric values, Y axis limits to restrict the view |
add_rug |
Logical, whether to add rug plots in the margins |
intgenes |
Vector of genes of interest. Gene symbols if a |
intgenes_color |
The color to use to mark the genes on the main plot. |
labels_intgenes |
Logical, whether to add the gene identifiers/names close to the marked plots |
labels_repel |
Logical, whether to use |
The genes of interest are to be provided as gene symbols if a symbol
column is provided in res_obj
, or else by using the identifiers specified
in the row names
An object created by ggplot
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) # subsetting for quicker run, ignore the next two commands if regularly using the function gene_subset <- c( "ENSG00000103196", # CRISPLD2 "ENSG00000120129", # DUSP1 "ENSG00000163884", # KLF15 "ENSG00000179094", # PER1 rownames(dds_airway)[rep(c(rep(FALSE, 99), TRUE), length.out = nrow(dds_airway))] ) # 1% of ids dds_airway <- dds_airway[gene_subset, ] dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) plot_ma(res_airway, FDR = 0.05, hlines = 1) plot_ma(res_airway, FDR = 0.1, intgenes = c( "ENSG00000103196", # CRISPLD2 "ENSG00000120129", # DUSP1 "ENSG00000163884", # KLF15 "ENSG00000179094" # PER1 ) )
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) # subsetting for quicker run, ignore the next two commands if regularly using the function gene_subset <- c( "ENSG00000103196", # CRISPLD2 "ENSG00000120129", # DUSP1 "ENSG00000163884", # KLF15 "ENSG00000179094", # PER1 rownames(dds_airway)[rep(c(rep(FALSE, 99), TRUE), length.out = nrow(dds_airway))] ) # 1% of ids dds_airway <- dds_airway[gene_subset, ] dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) plot_ma(res_airway, FDR = 0.05, hlines = 1) plot_ma(res_airway, FDR = 0.1, intgenes = c( "ENSG00000103196", # CRISPLD2 "ENSG00000120129", # DUSP1 "ENSG00000163884", # KLF15 "ENSG00000179094" # PER1 ) )
Volcano plot for log fold changes and log p-values in the ggplot2 framework, with additional support to annotate genes if provided.
plot_volcano( res_obj, FDR = 0.05, ylim_up = NULL, vlines = NULL, title = NULL, intgenes = NULL, intgenes_color = "steelblue", labels_intgenes = TRUE, labels_repel = TRUE )
plot_volcano( res_obj, FDR = 0.05, ylim_up = NULL, vlines = NULL, title = NULL, intgenes = NULL, intgenes_color = "steelblue", labels_intgenes = TRUE, labels_repel = TRUE )
res_obj |
A |
FDR |
Numeric value, the significance level for thresholding adjusted p-values |
ylim_up |
Numeric value, Y axis upper limits to restrict the view |
vlines |
The x coordinate (in absolute value) where to draw vertical lines, optional |
title |
A title for the plot, optional |
intgenes |
Vector of genes of interest. Gene symbols if a |
intgenes_color |
The color to use to mark the genes on the main plot. |
labels_intgenes |
Logical, whether to add the gene identifiers/names close to the marked plots |
labels_repel |
Logical, whether to use |
The genes of interest are to be provided as gene symbols if a symbol
column is provided in res_obj
, or else b< using the identifiers specified
in the row names
An object created by ggplot
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) # subsetting for quicker run, ignore the next two commands if regularly using the function gene_subset <- c( "ENSG00000103196", # CRISPLD2 "ENSG00000120129", # DUSP1 "ENSG00000163884", # KLF15 "ENSG00000179094", # PER1 rownames(dds_airway)[rep(c(rep(FALSE, 99), TRUE), length.out = nrow(dds_airway))] ) # 1% of ids dds_airway <- dds_airway[gene_subset, ] dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) plot_volcano(res_airway)
library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) # subsetting for quicker run, ignore the next two commands if regularly using the function gene_subset <- c( "ENSG00000103196", # CRISPLD2 "ENSG00000120129", # DUSP1 "ENSG00000163884", # KLF15 "ENSG00000179094", # PER1 rownames(dds_airway)[rep(c(rep(FALSE, 99), TRUE), length.out = nrow(dds_airway))] ) # 1% of ids dds_airway <- dds_airway[gene_subset, ] dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) plot_volcano(res_airway)
Returns a list of pathways from a GMT file.
read_gmt(gmtfile)
read_gmt(gmtfile)
gmtfile |
A character value, containing the location of the GMT formatted file. It can also be a file found online |
A list of vectors, one for each pathway in the GMT file.
# this example reads in the freely available pathways from wikipathways ## Not run: mysigs <- read_gmt( "http://data.wikipathways.org/20180910/gmt/wikipathways-20180910-gmt-Homo_sapiens.gmt" ) head(mysigs) # see how the gene identifiers are encoded as ENTREZ id ## End(Not run)
# this example reads in the freely available pathways from wikipathways ## Not run: mysigs <- read_gmt( "http://data.wikipathways.org/20180910/gmt/wikipathways-20180910-gmt-Homo_sapiens.gmt" ) head(mysigs) # see how the gene identifiers are encoded as ENTREZ id ## End(Not run)
This function tries to guess which separator was used in a text delimited file
sepguesser(file, sep_list = c(",", "\t", ";", " "))
sepguesser(file, sep_list = c(",", "\t", ";", " "))
file |
The name of the file which the data are to be read from |
sep_list |
A vector containing the candidates for being identified as
separators. Defaults to |
A character value, corresponding to the guessed separator. One of "," (comma), "\t" (tab), ";" (semicolon)," " (whitespace)
sepguesser(system.file("extdata/design_commas.txt", package = "ideal")) sepguesser(system.file("extdata/design_semicolons.txt", package = "ideal")) sepguesser(system.file("extdata/design_spaces.txt", package = "ideal")) mysep <- sepguesser(system.file("extdata/design_tabs.txt", package = "ideal")) # to be used for reading in the same file, without having to specify the sep
sepguesser(system.file("extdata/design_commas.txt", package = "ideal")) sepguesser(system.file("extdata/design_semicolons.txt", package = "ideal")) sepguesser(system.file("extdata/design_spaces.txt", package = "ideal")) mysep <- sepguesser(system.file("extdata/design_tabs.txt", package = "ideal")) # to be used for reading in the same file, without having to specify the sep
Plot a heatmap for the selected gene signature on the provided data, with the possibility to compactly display also DE only genes
sig_heatmap( vst_data, my_signature, res_data = NULL, FDR = 0.05, de_only = FALSE, annovec, title = "", cluster_rows = TRUE, cluster_cols = FALSE, anno_colData = NULL, center_mean = TRUE, scale_row = FALSE )
sig_heatmap( vst_data, my_signature, res_data = NULL, FDR = 0.05, de_only = FALSE, annovec, title = "", cluster_rows = TRUE, cluster_cols = FALSE, anno_colData = NULL, center_mean = TRUE, scale_row = FALSE )
vst_data |
A |
my_signature |
A character vector, usually named, containing the genes which compose the gene signature |
res_data |
A |
FDR |
Numeric value between 0 and 1, the False Discovery Rate |
de_only |
Logical, whether to display only DE genes belonging to the pathway - defaults to FALSE |
annovec |
A named character vector, with the corresponding annotation across IDs |
title |
Character, title for the heatmap |
cluster_rows |
Logical, whether to cluster rows - defaults to TRUE |
cluster_cols |
Logical, whether to cluster column - defaults to FALSE. Recommended to be set to TRUE if de_only is also set to TRUE |
anno_colData |
Character vector, specifying the elements of the colData information to be displayed as a decoration of the heatmap. Can be a vector of any length, as long as these names are included as colData. Defaults to NULL, which would plot no annotation on the samples. |
center_mean |
Logical, whether to perform mean centering on the expression values. Defaults to TRUE, as it improves the general readability of the heatmap |
scale_row |
Logical, whether to perform row-based standardization of the expression values |
A plot based on the pheatmap
function
# with the well known airway package... library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) ## Not run: dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) vst_airway <- DESeq2::vst(dds_airway) library(org.Hs.eg.db) annovec <- mapIds(org.Hs.eg.db, rownames(dds_airway), "ENTREZID", "ENSEMBL") mysignatures <- read_gmt( "http://data.wikipathways.org/20190210/gmt/wikipathways-20190210-gmt-Homo_sapiens.gmt" ) mysignature_name <- "Lung fibrosis%WikiPathways_20190210%WP3624%Homo sapiens" library(pheatmap) sig_heatmap(vst_airway, mysignatures[[mysignature_name]], res_data = res_airway, de_only = TRUE, annovec = annovec, title = mysignature_name, cluster_cols = TRUE ) ## End(Not run)
# with the well known airway package... library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) ## Not run: dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) vst_airway <- DESeq2::vst(dds_airway) library(org.Hs.eg.db) annovec <- mapIds(org.Hs.eg.db, rownames(dds_airway), "ENTREZID", "ENSEMBL") mysignatures <- read_gmt( "http://data.wikipathways.org/20190210/gmt/wikipathways-20190210-gmt-Homo_sapiens.gmt" ) mysignature_name <- "Lung fibrosis%WikiPathways_20190210%WP3624%Homo sapiens" library(pheatmap) sig_heatmap(vst_airway, mysignatures[[mysignature_name]], res_data = res_airway, de_only = TRUE, annovec = annovec, title = mysignature_name, cluster_cols = TRUE ) ## End(Not run)
Combine data from a typical DESeq2 run
wrapup_for_iSEE(dds, res)
wrapup_for_iSEE(dds, res)
dds |
A |
res |
A |
Combines the DESeqDataSet input and DESeqResults into a SummarizedExperiment object, which can be readily explored with iSEE.
A typical usage would be after running the DESeq2 pipeline as specified in one of the workflows which include this package, e.g. in the context of the ideal package.
A SummarizedExperiment object, with raw counts, normalized counts, and variance-stabilizing transformed counts in the assay slots; and with colData and rowData extracted from the corresponding input parameters
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 10000, m = 8) dds <- DESeq(dds) res <- results(dds) se <- wrapup_for_iSEE(dds, res) # library(iSEE) # iSEE(se) ## Not run: # or with the well known airway package... library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) se_airway <- wrapup_for_iSEE(dds_airway, res_airway) # iSEE(se_airway) ## End(Not run)
# with simulated data... library(DESeq2) dds <- DESeq2::makeExampleDESeqDataSet(n = 10000, m = 8) dds <- DESeq(dds) res <- results(dds) se <- wrapup_for_iSEE(dds, res) # library(iSEE) # iSEE(se) ## Not run: # or with the well known airway package... library(airway) data(airway) airway dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway), colData = colData(airway), design = ~ cell + dex ) dds_airway <- DESeq2::DESeq(dds_airway) res_airway <- DESeq2::results(dds_airway) se_airway <- wrapup_for_iSEE(dds_airway, res_airway) # iSEE(se_airway) ## End(Not run)