Package 'pcaExplorer' reference manual

Title:	Interactive Visualization of RNA-seq Data Using a Principal Components Approach
Description:	This package provides functionality for interactive visualization of RNA-seq datasets based on Principal Components Analysis. The methods provided allow for quick information extraction and effective data exploration. A Shiny application encapsulates the whole analysis.
Authors:	Federico Marini [aut, cre]
Maintainer:	Federico Marini <[email protected]>
License:	MIT + file LICENSE
Version:	3.1.1
Built:	2025-03-24 06:46:10 UTC
Source:	https://github.com/bioc/pcaExplorer

Principal components (cor)relation with experimental covariates

Description

Computes the significance of (cor)relations between PCA scores and the sample experimental covariates, using Kruskal-Wallis test for categorial variables and the cor.test based on Spearman's correlation for continuous variables

Usage

correlatePCs(pcaobj, coldata, pcs = 1:4)
correlatePCs(pcaobj, coldata, pcs = 1:4)

Arguments

`pcaobj`	A `prcomp` object
`coldata`	A `data.frame` object containing the experimental covariates
`pcs`	A numeric vector, containing the corresponding PC number

Value

A data.frame object with computed p values for each covariate and for each principal component

Examples

library(DESeq2)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaobj <- prcomp(t(assay(rlt)))
correlatePCs(pcaobj, colData(dds))

library(DESeq2)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaobj <- prcomp(t(assay(rlt)))
correlatePCs(pcaobj, colData(dds))

Deprecated functions in pcaExplorer

Description

Functions that are on their way to the function afterlife. Their successors are also listed.

Arguments

...

Ignored arguments.

Details

The successors of these functions are likely coming after the rework that led to the creation of the mosdef package. See more into its documentation for more details.

Value

All functions throw a warning, with a deprecation message pointing towards its descendent (if available).

Transitioning to the mosdef framework

topGOtable() is now being replaced by the more flexible mosdef::run_topGO() function

Author(s)

Federico Marini

Examples

# try(topGOtable())

# try(topGOtable())

Plot distribution of expression values

Description

Plot distribution of expression values

Usage

distro_expr(rld, plot_type = "density")
distro_expr(rld, plot_type = "density")

Arguments

`rld`	A `DESeq2::DESeqTransform()` object.
`plot_type`	Character, choose one of `boxplot`, `violin` or `density`. Defaults to `density`

Value

A plot with the distribution of the expression values

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
distro_expr(rlt)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
distro_expr(rlt)

Extract and plot the expression profile of genes

Description

Extract and plot the expression profile of genes

Usage

geneprofiler(se, genelist = NULL, intgroup = "condition", plotZ = FALSE)
geneprofiler(se, genelist = NULL, intgroup = "condition", plotZ = FALSE)

Arguments

`se`	A `DESeq2::DESeqDataSet()` object, or a `DESeq2::DESeqTransform()` object.
`genelist`	An array of characters, including the names of the genes of interest of which the profile is to be plotted
`intgroup`	A factor, needs to be in the `colnames` of `colData(se)`
`plotZ`	Logical, whether to plot the scaled expression values. Defaults to `FALSE`

Value

A plot of the expression profile for the genes

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
geneprofiler(rlt, paste0("gene", sample(1:1000, 20)))
geneprofiler(rlt, paste0("gene", sample(1:1000, 20)), plotZ = TRUE)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
geneprofiler(rlt, paste0("gene", sample(1:1000, 20)))
geneprofiler(rlt, paste0("gene", sample(1:1000, 20)), plotZ = TRUE)

Principal components analysis on the genes

Description

Computes and plots the principal components of the genes, eventually displaying the samples as in a typical biplot visualization.

Usage

genespca(
  x,
  ntop,
  choices = c(1, 2),
  arrowColors = "steelblue",
  groupNames = "group",
  biplot = TRUE,
  scale = 1,
  pc.biplot = TRUE,
  obs.scale = 1 - scale,
  var.scale = scale,
  groups = NULL,
  ellipse = FALSE,
  ellipse.prob = 0.68,
  labels = NULL,
  labels.size = 3,
  alpha = 1,
  var.axes = TRUE,
  circle = FALSE,
  circle.prob = 0.69,
  varname.size = 4,
  varname.adjust = 1.5,
  varname.abbrev = FALSE,
  returnData = FALSE,
  coordEqual = FALSE,
  scaleArrow = 1,
  useRownamesAsLabels = TRUE,
  point_size = 2,
  annotation = NULL
)
genespca(
  x,
  ntop,
  choices = c(1, 2),
  arrowColors = "steelblue",
  groupNames = "group",
  biplot = TRUE,
  scale = 1,
  pc.biplot = TRUE,
  obs.scale = 1 - scale,
  var.scale = scale,
  groups = NULL,
  ellipse = FALSE,
  ellipse.prob = 0.68,
  labels = NULL,
  labels.size = 3,
  alpha = 1,
  var.axes = TRUE,
  circle = FALSE,
  circle.prob = 0.69,
  varname.size = 4,
  varname.adjust = 1.5,
  varname.abbrev = FALSE,
  returnData = FALSE,
  coordEqual = FALSE,
  scaleArrow = 1,
  useRownamesAsLabels = TRUE,
  point_size = 2,
  annotation = NULL
)

Arguments

`x`	A `DESeq2::DESeqTransform()` object, with data in `assay(x)`, produced for example by either `DESeq2::rlog()` or `DESeq2::varianceStabilizingTransformation()`
`ntop`	Number of top genes to use for principal components, selected by highest row variance
`choices`	Vector of two numeric values, to select on which principal components to plot
`arrowColors`	Vector of character, either as long as the number of the samples, or one single value
`groupNames`	Factor containing the groupings for the input data. Is efficiently chosen as the (interaction of more) factors in the colData for the object provided
`biplot`	Logical, whether to additionally draw the samples labels as in a biplot representation
`scale`	Covariance biplot (scale = 1), form biplot (scale = 0). When scale = 1, the inner product between the variables approximates the covariance and the distance between the points approximates the Mahalanobis distance.
`pc.biplot`	Logical, for compatibility with biplot.princomp()
`obs.scale`	Scale factor to apply to observations
`var.scale`	Scale factor to apply to variables
`groups`	Optional factor variable indicating the groups that the observations belong to. If provided the points will be colored according to groups
`ellipse`	Logical, draw a normal data ellipse for each group
`ellipse.prob`	Size of the ellipse in Normal probability
`labels`	optional Vector of labels for the observations
`labels.size`	Size of the text used for the labels
`alpha`	Alpha transparency value for the points (0 = transparent, 1 = opaque)
`var.axes`	Logical, draw arrows for the variables?
`circle`	Logical, draw a correlation circle? (only applies when prcomp was called with scale = TRUE and when var.scale = 1)
`circle.prob`	Size of the correlation circle in Normal probability
`varname.size`	Size of the text for variable names
`varname.adjust`	Adjustment factor the placement of the variable names, '>= 1' means farther from the arrow
`varname.abbrev`	Logical, whether or not to abbreviate the variable names
`returnData`	Logical, if TRUE returns a data.frame for further use, containing the selected principal components for custom plotting
`coordEqual`	Logical, default FALSE, for allowing brushing. If TRUE, plot using equal scale cartesian coordinates
`scaleArrow`	Multiplicative factor, usually >=1, only for visualization purposes, to allow for distinguishing where the variables are plotted
`useRownamesAsLabels`	Logical, if TRUE uses the row names as labels for plotting
`point_size`	Size of the points to be plotted for the observations (genes)
`annotation`	A `data.frame` object, with row.names as gene identifiers (e.g. ENSEMBL ids) and a column, `gene_name`, containing e.g. HGNC-based gene symbols

Details

The implementation of this function is based on the beautiful ggbiplot package developed by Vince Vu, available at https://github.com/vqv/ggbiplot. The adaptation and additional parameters are tailored to display typical genomics data such as the transformed counts of RNA-seq experiments

Value

An object created by ggplot, which can be assigned and further customized.

Examples


library(DESeq2)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- rlogTransformation(dds)
groups <- colData(dds)$condition
groups <- factor(groups, levels = unique(groups))
cols <- scales::hue_pal()(2)[groups]
genespca(rlt, ntop=100, arrowColors = cols, groupNames = groups)

groups_multi <- interaction(as.data.frame(colData(rlt)[, c("condition", "tissue")]))
groups_multi <- factor(groups_multi, levels = unique(groups_multi))
cols_multi <- scales::hue_pal()(length(levels(groups_multi)))[factor(groups_multi)]
genespca(rlt, ntop = 100, arrowColors = cols_multi, groupNames = groups_multi)

library(DESeq2)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- rlogTransformation(dds)
groups <- colData(dds)$condition
groups <- factor(groups, levels = unique(groups))
cols <- scales::hue_pal()(2)[groups]
genespca(rlt, ntop=100, arrowColors = cols, groupNames = groups)

groups_multi <- interaction(as.data.frame(colData(rlt)[, c("condition", "tissue")]))
groups_multi <- factor(groups_multi, levels = unique(groups_multi))
cols_multi <- scales::hue_pal()(length(levels(groups_multi)))[factor(groups_multi)]
genespca(rlt, ntop = 100, arrowColors = cols_multi, groupNames = groups_multi)

Get an annotation data frame from biomaRt

Description

Get an annotation data frame from biomaRt

Usage

get_annotation(dds, biomart_dataset, idtype)
get_annotation(dds, biomart_dataset, idtype)

Arguments

`dds`	A `DESeq2::DESeqDataSet()` object
`biomart_dataset`	A biomaRt dataset to use. To see the list, type `mart = useMart('ensembl')`, followed by `listDatasets(mart)`.
`idtype`	Character, the ID type of the genes as in the row names of `dds`, to be used for the call to `biomaRt::getBM()`

Value

A data frame for ready use in pcaExplorer, retrieved from biomaRt.

Examples

library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
## Not run: 
get_annotation(dds_airway, "hsapiens_gene_ensembl", "ensembl_gene_id")

## End(Not run)
library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
## Not run: 
get_annotation(dds_airway, "hsapiens_gene_ensembl", "ensembl_gene_id")

## End(Not run)

Get an annotation data frame from org db packages

Description

Get an annotation data frame from org db packages

Usage

get_annotation_orgdb(dds, orgdb_species, idtype, key_for_genenames = "SYMBOL")
get_annotation_orgdb(dds, orgdb_species, idtype, key_for_genenames = "SYMBOL")

Arguments

`dds`	A `DESeq2::DESeqDataSet()` object
`orgdb_species`	Character string, named as the `org.XX.eg.db` package which should be available in Bioconductor
`idtype`	Character, the ID type of the genes as in the row names of `dds`, to be used for the call to `AnnotationDbi::mapIds()`
`key_for_genenames`	Character, corresponding to the column name for the key in the orgDb package containing the official gene name (often called gene symbol). This parameter defaults to "SYMBOL", but can be adjusted in case the key is not found in the annotation package (e.g. for `org.Sc.sgd.db`).

Value

A data frame for ready use in pcaExplorer, retrieved from the org db packages

Examples

library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
anno_df <- get_annotation_orgdb(dds_airway, "org.Hs.eg.db", "ENSEMBL")
head(anno_df)
library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
anno_df <- get_annotation_orgdb(dds_airway, "org.Hs.eg.db", "ENSEMBL")
head(anno_df)

Extract genes with highest loadings

Description

Extract genes with highest loadings

Usage

hi_loadings(
  pcaobj,
  whichpc = 1,
  topN = 10,
  exprTable = NULL,
  annotation = NULL,
  title = "Top/bottom loadings"
)
hi_loadings(
  pcaobj,
  whichpc = 1,
  topN = 10,
  exprTable = NULL,
  annotation = NULL,
  title = "Top/bottom loadings"
)

Arguments

`pcaobj`	A `prcomp` object
`whichpc`	An integer number, corresponding to the principal component of interest
`topN`	Integer, number of genes with top and bottom loadings
`exprTable`	A `matrix` object, e.g. the counts of a `DESeq2::DESeqDataSet()`. If not NULL, returns the counts matrix for the selected genes
`annotation`	A `data.frame` object, with row.names as gene identifiers (e.g. ENSEMBL ids) and a column, `gene_name`, containing e.g. HGNC-based gene symbols
`title`	The title of the plot

Value

A ggplot2 object, or a matrix, if exprTable is not null

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaobj <- prcomp(t(SummarizedExperiment::assay(rlt)))
hi_loadings(pcaobj, topN = 20)
hi_loadings(pcaobj, topN = 10, exprTable = dds)
hi_loadings(pcaobj, topN = 10, exprTable = counts(dds))

dds <- makeExampleDESeqDataSet_multifac(betaSD = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaobj <- prcomp(t(SummarizedExperiment::assay(rlt)))
hi_loadings(pcaobj, topN = 20)
hi_loadings(pcaobj, topN = 10, exprTable = dds)
hi_loadings(pcaobj, topN = 10, exprTable = counts(dds))

Functional interpretation of the principal components, based on simple overrepresentation analysis

Description

Extracts the genes with the highest loadings for each principal component, and performs functional enrichment analysis on them using the simple and quick routine provided by the limma package

Usage

limmaquickpca2go(
  se,
  pca_ngenes = 10000,
  inputType = "ENSEMBL",
  organism = "Mm",
  loadings_ngenes = 500,
  background_genes = NULL,
  scale = FALSE,
  ...
)
limmaquickpca2go(
  se,
  pca_ngenes = 10000,
  inputType = "ENSEMBL",
  organism = "Mm",
  loadings_ngenes = 500,
  background_genes = NULL,
  scale = FALSE,
  ...
)

Arguments

`se`	A `DESeq2::DESeqTransform()` object, with data in `assay(se)`, produced for example by either `DESeq2::rlog()` or `DESeq2::varianceStabilizingTransformation()`
`pca_ngenes`	Number of genes to use for the PCA
`inputType`	Input format type of the gene identifiers. Deafults to `ENSEMBL`, that then will be converted to ENTREZ ids. Can assume values such as `ENTREZID`,`GENENAME` or `SYMBOL`, like it is normally used with the `select` function of `AnnotationDbi`
`organism`	Character abbreviation for the species, using `org.XX.eg.db` for annotation
`loadings_ngenes`	Number of genes to extract the loadings (in each direction)
`background_genes`	Which genes to consider as background.
`scale`	Logical, defaults to FALSE, scale values for the PCA
`...`	Further parameters to be passed to the goana routine

Value

A nested list object containing for each principal component the terms enriched in each direction. This object is to be thought in combination with the displaying feature of the main pcaExplorer() function

Examples

library("airway")
library("DESeq2")
library("limma")
data("airway", package = "airway")
airway
dds_airway <- DESeqDataSet(airway, design = ~ cell + dex)
## Not run: 
rld_airway <- rlogTransformation(dds_airway)
goquick_airway <- limmaquickpca2go(rld_airway,
                                   pca_ngenes = 10000,
                                   inputType = "ENSEMBL",
                                   organism = "Hs")

## End(Not run)

library("airway")
library("DESeq2")
library("limma")
data("airway", package = "airway")
airway
dds_airway <- DESeqDataSet(airway, design = ~ cell + dex)
## Not run: 
rld_airway <- rlogTransformation(dds_airway)
goquick_airway <- limmaquickpca2go(rld_airway,
                                   pca_ngenes = 10000,
                                   inputType = "ENSEMBL",
                                   organism = "Hs")

## End(Not run)

Make a simulated DESeqDataSet for two or more experimental factors

Description

Constructs a simulated dataset of Negative Binomial data from different conditions. The fold changes between the conditions can be adjusted with the betaSD_condition and the betaSD_tissue arguments.

Usage

makeExampleDESeqDataSet_multifac(
  n = 1000,
  m = 12,
  betaSD_condition = 1,
  betaSD_tissue = 3,
  interceptMean = 4,
  interceptSD = 2,
  dispMeanRel = function(x) 4/x + 0.1,
  sizeFactors = rep(1, m)
)
makeExampleDESeqDataSet_multifac(
  n = 1000,
  m = 12,
  betaSD_condition = 1,
  betaSD_tissue = 3,
  interceptMean = 4,
  interceptSD = 2,
  dispMeanRel = function(x) 4/x + 0.1,
  sizeFactors = rep(1, m)
)

Arguments

`n`	number of rows (genes)
`m`	number of columns (samples)
`betaSD_condition`	the standard deviation for condition betas, i.e. beta ~ N(0,betaSD)
`betaSD_tissue`	the standard deviation for tissue betas, i.e. beta ~ N(0,betaSD)
`interceptMean`	the mean of the intercept betas (log2 scale)
`interceptSD`	the standard deviation of the intercept betas (log2 scale)
`dispMeanRel`	a function specifying the relationship of the dispersions on `2^trueIntercept`
`sizeFactors`	multiplicative factors for each sample

Details

This function is designed and inspired following the proposal of DESeq2::makeExampleDESeqDataSet() from the DESeq2 package. Credits are given to Mike Love for the nice initial implementation

Value

a DESeq2::DESeqDataSet() with true dispersion, intercept for two factors (condition and tissue) and beta values in the metadata columns. Note that the true betas are provided on the log2 scale.

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
dds
dds2 <- makeExampleDESeqDataSet_multifac(betaSD_condition = 1, betaSD_tissue = 4)
dds2

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
dds
dds2 <- makeExampleDESeqDataSet_multifac(betaSD_condition = 1, betaSD_tissue = 4)
dds2

Pairwise scatter and correlation plot of counts

Description

Pairwise scatter and correlation plot of counts

Usage

pair_corr(df, log = FALSE, method = "pearson", use_subset = TRUE)
pair_corr(df, log = FALSE, method = "pearson", use_subset = TRUE)

Arguments

`df`	A data frame, containing the (raw/normalized/transformed) counts
`log`	Logical, whether to convert the input values to log2 (with addition of a pseudocount). Defaults to FALSE.
`method`	Character string, one of `pearson` (default), `kendall`, or `spearman` as in `cor`
`use_subset`	Logical value. If TRUE, only 1000 values per sample will be used to speed up the plotting operations.

Value

A plot with pairwise scatter plots and correlation coefficients

Examples

library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
pair_corr(counts(dds_airway)[1:100, ]) # use just a subset for the example
library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
pair_corr(counts(dds_airway)[1:100, ]) # use just a subset for the example

Functional interpretation of the principal components

Description

Extracts the genes with the highest loadings for each principal component, and performs functional enrichment analysis on them using routines and algorithms from the topGO package

Usage

pca2go(
  se,
  pca_ngenes = 10000,
  annotation = NULL,
  inputType = "geneSymbol",
  organism = "Mm",
  ensToGeneSymbol = FALSE,
  loadings_ngenes = 500,
  background_genes = NULL,
  scale = FALSE,
  return_ranked_gene_loadings = FALSE,
  annopkg = NULL,
  ...
)
pca2go(
  se,
  pca_ngenes = 10000,
  annotation = NULL,
  inputType = "geneSymbol",
  organism = "Mm",
  ensToGeneSymbol = FALSE,
  loadings_ngenes = 500,
  background_genes = NULL,
  scale = FALSE,
  return_ranked_gene_loadings = FALSE,
  annopkg = NULL,
  ...
)

Arguments

`se`	A `DESeq2::DESeqTransform()` object, with data in `assay(se)`, produced for example by either `DESeq2::rlog()` or `DESeq2::varianceStabilizingTransformation()`
`pca_ngenes`	Number of genes to use for the PCA
`annotation`	A `data.frame` object, with row.names as gene identifiers (e.g. ENSEMBL ids) and a column, `gene_name`, containing e.g. HGNC-based gene symbols
`inputType`	Input format type of the gene identifiers. Will be used by the routines of `topGO`
`organism`	Character abbreviation for the species, using `org.XX.eg.db` for annotation
`ensToGeneSymbol`	Logical, whether to expect ENSEMBL gene identifiers, to convert to gene symbols with the `annotation` provided
`loadings_ngenes`	Number of genes to extract the loadings (in each direction)
`background_genes`	Which genes to consider as background.
`scale`	Logical, defaults to FALSE, scale values for the PCA
`return_ranked_gene_loadings`	Logical, defaults to FALSE. If TRUE, simply returns a list containing the top ranked genes with hi loadings in each PC and in each direction
`annopkg`	String containing the name of the organism annotation package. Can be used to override the `organism` parameter, e.g. in case of alternative identifiers used in the annotation package (Arabidopsis with TAIR)
`...`	Further parameters to be passed to the topGO routine

Value

Examples

library("airway")
library("DESeq2")
data("airway", package = "airway")
airway
dds_airway <- DESeqDataSet(airway, design= ~ cell + dex)
## Not run: 
rld_airway <- rlogTransformation(dds_airway)
# constructing the annotation object
anno_df <- data.frame(gene_id = rownames(dds_airway),
                      stringsAsFactors = FALSE)
library("AnnotationDbi")
library("org.Hs.eg.db")
anno_df$gene_name <- mapIds(org.Hs.eg.db,
                            keys = anno_df$gene_id,
                            column = "SYMBOL",
                            keytype = "ENSEMBL",
                            multiVals = "first")
rownames(anno_df) <- anno_df$gene_id
bg_ids <- rownames(dds_airway)[rowSums(counts(dds_airway)) > 0]
library(topGO)
pca2go_airway <- pca2go(rld_airway,
                        annotation = anno_df,
                        organism = "Hs",
                        ensToGeneSymbol = TRUE,
                        background_genes = bg_ids)

## End(Not run)

library("airway")
library("DESeq2")
data("airway", package = "airway")
airway
dds_airway <- DESeqDataSet(airway, design= ~ cell + dex)
## Not run: 
rld_airway <- rlogTransformation(dds_airway)
# constructing the annotation object
anno_df <- data.frame(gene_id = rownames(dds_airway),
                      stringsAsFactors = FALSE)
library("AnnotationDbi")
library("org.Hs.eg.db")
anno_df$gene_name <- mapIds(org.Hs.eg.db,
                            keys = anno_df$gene_id,
                            column = "SYMBOL",
                            keytype = "ENSEMBL",
                            multiVals = "first")
rownames(anno_df) <- anno_df$gene_id
bg_ids <- rownames(dds_airway)[rowSums(counts(dds_airway)) > 0]
library(topGO)
pca2go_airway <- pca2go(rld_airway,
                        annotation = anno_df,
                        organism = "Hs",
                        ensToGeneSymbol = TRUE,
                        background_genes = bg_ids)

## End(Not run)

Explore a dataset from a PCA perspective

Description

Launch a Shiny App for interactive exploration of a dataset from the perspective of Principal Components Analysis

Usage

pcaExplorer(
  dds = NULL,
  dst = NULL,
  countmatrix = NULL,
  coldata = NULL,
  pca2go = NULL,
  annotation = NULL,
  runLocal = TRUE
)
pcaExplorer(
  dds = NULL,
  dst = NULL,
  countmatrix = NULL,
  coldata = NULL,
  pca2go = NULL,
  annotation = NULL,
  runLocal = TRUE
)

Arguments

`dds`	A `DESeq2::DESeqDataSet()` object. If not provided, then a `countmatrix` and a `coldata` need to be provided. If none of the above is provided, it is possible to upload the data during the execution of the Shiny App
`dst`	A `DESeq2::DESeqTransform()` object. Can be computed from the `dds` object if left NULL. If none is provided, then a `countmatrix` and a `coldata` need to be provided. If none of the above is provided, it is possible to upload the data during the execution of the Shiny App
`countmatrix`	A count matrix, with genes as rows and samples as columns. If not provided, it is possible to upload the data during the execution of the Shiny App
`coldata`	A data.frame containing the info on the covariates of each sample. If not provided, it is possible to upload the data during the execution of the Shiny App
`pca2go`	An object generated by the `pca2go()` function, which contains the information on enriched functional categories in the genes that show the top or bottom loadings in each principal component of interest. If not provided, it is possible to compute live during the execution of the Shiny App
`annotation`	A `data.frame` object, with row.names as gene identifiers (e.g. ENSEMBL ids) and a column, `gene_name`, containing e.g. HGNC-based gene symbols
`runLocal`	A logical indicating whether the app is to be run locally or remotely on a server, which determines how documentation will be accessed.

Value

A Shiny App is launched for interactive data exploration

Examples

library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
## Not run: 
rld_airway <- DESeq2::rlogTransformation(dds_airway)

pcaExplorer(dds_airway, rld_airway)

pcaExplorer(countmatrix = counts(dds_airway), coldata = colData(dds_airway))

pcaExplorer() # and then upload count matrix, covariate matrix (and eventual annotation)

## End(Not run)

library("airway")
data("airway", package = "airway")
airway
dds_airway <- DESeq2::DESeqDataSetFromMatrix(assay(airway),
                                             colData = colData(airway),
                                             design = ~dex+cell)
## Not run: 
rld_airway <- DESeq2::rlogTransformation(dds_airway)

pcaExplorer(dds_airway, rld_airway)

pcaExplorer(countmatrix = counts(dds_airway), coldata = colData(dds_airway))

pcaExplorer() # and then upload count matrix, covariate matrix (and eventual annotation)

## End(Not run)

pcaExplorer: analyzing time-lapse microscopy imaging, from detection to tracking

Description

pcaExplorer provides functionality for interactive visualization of RNA-seq datasets based on Principal Components Analysis. The methods provided allow for quick information extraction and effective data exploration. A Shiny application encapsulates the whole analysis.

Details

Author(s)

Federico Marini [email protected], 2016

Maintainer: Federico Marini [email protected]

Sample PCA plot for transformed data

Description

Plots the results of PCA on a 2-dimensional space

Usage

pcaplot(
  x,
  intgroup = NULL,
  ntop = 500,
  returnData = FALSE,
  title = NULL,
  pcX = 1,
  pcY = 2,
  text_labels = TRUE,
  point_size = 3,
  ellipse = TRUE,
  ellipse.prob = 0.95
)
pcaplot(
  x,
  intgroup = NULL,
  ntop = 500,
  returnData = FALSE,
  title = NULL,
  pcX = 1,
  pcY = 2,
  text_labels = TRUE,
  point_size = 3,
  ellipse = TRUE,
  ellipse.prob = 0.95
)

Arguments

`x`	A `DESeq2::DESeqTransform()` object, with data in `assay(x)`, produced for example by either `DESeq2::rlog()` or `DESeq2::varianceStabilizingTransformation()`/`DESeq2::vst()`
`intgroup`	Interesting groups: a character vector of names in `colData(x)` to use for grouping. Defaults to NULL, which would then select the first column of the `colData` slot
`ntop`	Number of top genes to use for principal components, selected by highest row variance
`returnData`	logical, if TRUE returns a data.frame for further use, containing the selected principal components and intgroup covariates for custom plotting
`title`	The plot title
`pcX`	The principal component to display on the x axis
`pcY`	The principal component to display on the y axis
`text_labels`	Logical, whether to display the labels with the sample identifiers
`point_size`	Integer, the size of the points for the samples
`ellipse`	Logical, whether to display the confidence ellipse for the selected groups
`ellipse.prob`	Numeric, a value in the interval [0;1)

Value

An object created by ggplot, which can be assigned and further customized.

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaplot(rlt, ntop = 200)

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaplot(rlt, ntop = 200)

Sample PCA plot for transformed data

Description

Plots the results of PCA on a 3-dimensional space, interactively

Usage

pcaplot3d(
  x,
  intgroup = "condition",
  ntop = 500,
  returnData = FALSE,
  title = NULL,
  pcX = 1,
  pcY = 2,
  pcZ = 3,
  text_labels = TRUE,
  point_size = 3
)
pcaplot3d(
  x,
  intgroup = "condition",
  ntop = 500,
  returnData = FALSE,
  title = NULL,
  pcX = 1,
  pcY = 2,
  pcZ = 3,
  text_labels = TRUE,
  point_size = 3
)

Arguments

`x`	A `DESeq2::DESeqTransform()` object, with data in `assay(x)`, produced for example by either `DESeq2::rlog()` or `DESeq2::varianceStabilizingTransformation()`
`intgroup`	Interesting groups: a character vector of names in `colData(x)` to use for grouping
`ntop`	Number of top genes to use for principal components, selected by highest row variance
`returnData`	logical, if TRUE returns a data.frame for further use, containing the selected principal components and intgroup covariates for custom plotting
`title`	The plot title
`pcX`	The principal component to display on the x axis
`pcY`	The principal component to display on the y axis
`pcZ`	The principal component to display on the z axis
`text_labels`	Logical, whether to display the labels with the sample identifiers
`point_size`	Integer, the size of the points for the samples

Value

A html-based visualization of the 3d PCA plot

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaplot3d(rlt, ntop = 200)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaplot3d(rlt, ntop = 200)

Scree plot of the PCA on the samples

Description

Produces a scree plot for investigating the proportion of explained variance, or alternatively the cumulative value

Usage

pcascree(obj, type = c("pev", "cev"), pc_nr = NULL, title = NULL)
pcascree(obj, type = c("pev", "cev"), pc_nr = NULL, title = NULL)

Arguments

`obj`	A `prcomp` object
`type`	Display absolute proportions or cumulative proportion. Possible values: "pev" or "cev"
`pc_nr`	How many principal components to display max
`title`	Title of the plot

Value

An object created by ggplot, which can be assigned and further customized.

Examples

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaobj <- prcomp(t(SummarizedExperiment::assay(rlt)))
pcascree(pcaobj, type = "pev")
pcascree(pcaobj, type = "cev", title = "Cumulative explained proportion of variance - Test dataset")

dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- DESeq2::rlogTransformation(dds)
pcaobj <- prcomp(t(SummarizedExperiment::assay(rlt)))
pcascree(pcaobj, type = "pev")
pcascree(pcaobj, type = "cev", title = "Cumulative explained proportion of variance - Test dataset")

Plot significance of (cor)relations of covariates VS principal components

Description

Plots the significance of the (cor)relation of each covariate vs a principal component

Usage

plotPCcorrs(pccorrs, pc = 1, logp = TRUE)
plotPCcorrs(pccorrs, pc = 1, logp = TRUE)

Arguments

`pccorrs`	A `data.frame` object generated by correlatePCs
`pc`	An integer number, corresponding to the principal component of interest
`logp`	Logical, defaults to `TRUE`, displays the -`log10` of the pvalue instead of the p value itself

Value

A base plot object

Examples

library(DESeq2)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- rlogTransformation(dds)
pcaobj <- prcomp(t(assay(rlt)))
res <- correlatePCs(pcaobj, colData(dds))
plotPCcorrs(res)

library(DESeq2)
dds <- makeExampleDESeqDataSet_multifac(betaSD_condition = 3, betaSD_tissue = 1)
rlt <- rlogTransformation(dds)
pcaobj <- prcomp(t(assay(rlt)))
res <- correlatePCs(pcaobj, colData(dds))
plotPCcorrs(res)

Extract functional terms enriched in the DE genes, based on topGO

Description

A wrapper for extracting functional GO terms enriched in the DE genes, based on the algorithm and the implementation in the topGO package

Usage

topGOtable(
  DEgenes,
  BGgenes,
  ontology = "BP",
  annot = annFUN.org,
  mapping = "org.Mm.eg.db",
  geneID = "symbol",
  topTablerows = 200,
  fullNamesInRows = TRUE,
  addGeneToTerms = TRUE,
  plotGraph = FALSE,
  plotNodes = 10,
  writeOutput = FALSE,
  outputFile = "",
  topGO_method2 = "elim",
  do_padj = FALSE
)
topGOtable(
  DEgenes,
  BGgenes,
  ontology = "BP",
  annot = annFUN.org,
  mapping = "org.Mm.eg.db",
  geneID = "symbol",
  topTablerows = 200,
  fullNamesInRows = TRUE,
  addGeneToTerms = TRUE,
  plotGraph = FALSE,
  plotNodes = 10,
  writeOutput = FALSE,
  outputFile = "",
  topGO_method2 = "elim",
  do_padj = FALSE
)

Arguments

`DEgenes`	A vector of (differentially expressed) genes
`BGgenes`	A vector of background genes, e.g. all (expressed) genes in the assays
`ontology`	Which Gene Ontology domain to analyze: `BP` (Biological Process), `MF` (Molecular Function), or `CC` (Cellular Component)
`annot`	Which function to use for annotating genes to GO terms. Defaults to `annFUN.org`
`mapping`	Which `org.XX.eg.db` to use for annotation - select according to the species
`geneID`	Which format the genes are provided. Defaults to `symbol`, could also be `entrez` or `ENSEMBL`
`topTablerows`	How many rows to report before any filtering
`fullNamesInRows`	Logical, whether to display or not the full names for the GO terms
`addGeneToTerms`	Logical, whether to add a column with all genes annotated to each GO term
`plotGraph`	Logical, if TRUE additionally plots a graph on the identified GO terms
`plotNodes`	Number of nodes to plot
`writeOutput`	Logical, if TRUE additionally writes out the result to a file
`outputFile`	Name of the file the result should be written into
`topGO_method2`	Character, specifying which of the methods implemented by `topGO` should be used, in addition to the `classic` algorithm. Defaults to `elim`
`do_padj`	Logical, whether to perform the adjustment on the p-values from the specific topGO method, based on the FDR correction. Defaults to FALSE, since the assumption of independent hypotheses is somewhat violated by the intrinsic DAG-structure of the Gene Ontology Terms

Details

Allowed values assumed by the topGO_method2 parameter are one of the following: elim, weight, weight01, lea, parentchild. For more details on this, please refer to the original documentation of the topGO package itself

Value

A table containing the computed GO Terms and related enrichment scores

Examples

library("airway")
library("DESeq2")
data("airway", package = "airway")
airway
dds_airway <- DESeqDataSet(airway, design= ~ cell + dex)
# Example, performing extraction of enriched functional categories in
# detected significantly expressed genes
## Not run: 
dds_airway <- DESeq(dds_airway)
res_airway <- results(dds_airway)
library("AnnotationDbi")
library("org.Hs.eg.db")
res_airway$symbol <- mapIds(org.Hs.eg.db,
                            keys = row.names(res_airway),
                            column = "SYMBOL",
                            keytype = "ENSEMBL",
                            multiVals = "first")
res_airway$entrez <- mapIds(org.Hs.eg.db,
                            keys = row.names(res_airway),
                            column = "ENTREZID",
                            keytype = "ENSEMBL",
                            multiVals = "first")
resOrdered <- as.data.frame(res_airway[order(res_airway$padj),])
de_df <- resOrdered[resOrdered$padj < .05 & !is.na(resOrdered$padj),]
de_symbols <- de_df$symbol
bg_ids <- rownames(dds_airway)[rowSums(counts(dds_airway)) > 0]
bg_symbols <- mapIds(org.Hs.eg.db,
                     keys = bg_ids,
                     column = "SYMBOL",
                     keytype = "ENSEMBL",
                     multiVals = "first")
library(topGO)
topgoDE_airway <- topGOtable(de_symbols, bg_symbols,
                             ontology = "BP",
                             mapping = "org.Hs.eg.db",
                             geneID = "symbol")

## End(Not run)

library("airway")
library("DESeq2")
data("airway", package = "airway")
airway
dds_airway <- DESeqDataSet(airway, design= ~ cell + dex)
# Example, performing extraction of enriched functional categories in
# detected significantly expressed genes
## Not run: 
dds_airway <- DESeq(dds_airway)
res_airway <- results(dds_airway)
library("AnnotationDbi")
library("org.Hs.eg.db")
res_airway$symbol <- mapIds(org.Hs.eg.db,
                            keys = row.names(res_airway),
                            column = "SYMBOL",
                            keytype = "ENSEMBL",
                            multiVals = "first")
res_airway$entrez <- mapIds(org.Hs.eg.db,
                            keys = row.names(res_airway),
                            column = "ENTREZID",
                            keytype = "ENSEMBL",
                            multiVals = "first")
resOrdered <- as.data.frame(res_airway[order(res_airway$padj),])
de_df <- resOrdered[resOrdered$padj < .05 & !is.na(resOrdered$padj),]
de_symbols <- de_df$symbol
bg_ids <- rownames(dds_airway)[rowSums(counts(dds_airway)) > 0]
bg_symbols <- mapIds(org.Hs.eg.db,
                     keys = bg_ids,
                     column = "SYMBOL",
                     keytype = "ENSEMBL",
                     multiVals = "first")
library(topGO)
topgoDE_airway <- topGOtable(de_symbols, bg_symbols,
                             ontology = "BP",
                             mapping = "org.Hs.eg.db",
                             geneID = "symbol")

## End(Not run)

Package 'pcaExplorer'

Help Index

Principal components (cor)relation with experimental covariates

Description

Usage

Arguments

Value

Examples

Deprecated functions in pcaExplorer

Description

Arguments

Details

Value

Transitioning to the mosdef framework

Author(s)

Examples

Plot distribution of expression values

Description

Usage

Arguments

Value

Examples

Extract and plot the expression profile of genes

Description

Usage

Arguments

Value

Examples

Principal components analysis on the genes

Description

Usage

Arguments

Details

Value

Examples

Get an annotation data frame from biomaRt

Description

Usage

Arguments

Value

Examples

Get an annotation data frame from org db packages

Description

Usage

Arguments

Value

Examples

Extract genes with highest loadings

Description

Usage

Arguments

Value

Examples

Functional interpretation of the principal components, based on simple overrepresentation analysis

Description

Usage

Arguments

Value

Examples

Make a simulated DESeqDataSet for two or more experimental factors

Description

Usage

Arguments

Details

Value

Examples

Pairwise scatter and correlation plot of counts

Description

Usage

Arguments

Value

Examples

Functional interpretation of the principal components

Description

Usage

Arguments

Value

Examples

Explore a dataset from a PCA perspective

Description