Package 'HybridExpress' reference manual

Title:	Comparative analysis of RNA-seq data for hybrids and their progenitors
Description:	HybridExpress can be used to perform comparative transcriptomics analysis of hybrids (or allopolyploids) relative to their progenitor species. The package features functions to perform exploratory analyses of sample grouping, identify differentially expressed genes in hybrids relative to their progenitors, classify genes in expression categories (N = 12) and classes (N = 5), and perform functional analyses. We also provide users with graphical functions for the seamless creation of publication-ready figures that are commonly used in the literature.
Authors:	Fabricio Almeida-Silva [aut, cre], Lucas Prost-Boxoen [aut], Yves Van de Peer [aut]
Maintainer:	Fabricio Almeida-Silva <[email protected]>
License:	GPL-3
Version:	1.3.0
Built:	2025-03-20 05:33:26 UTC
Source:	https://github.com/bioc/HybridExpress

Add midparent expression to `SummarizedExperiment` object

Description

Add midparent expression to SummarizedExperiment object

Usage

add_midparent_expression(
  se,
  coldata_column = "Generation",
  parent1 = "P1",
  parent2 = "P2",
  method = "mean",
  weights = c(1, 1)
)

Arguments

`se`	A `SummarizedExperiment` object with a count matrix and sample metadata.
`coldata_column`	Character indicating the name of the column in `colData(se)` where information on the generation is stored. Default: "Generation".
`parent1`	Character indicating which level of the variable coldata_column represents parent 1. Default: "P1".
`parent2`	Character indicating which level of the variable coldata_column represents parent 2. Default: "P2".
`method`	Character indicating the method to use to create midparent values. One of 'mean' (default), 'sum', or 'weightedmean'.
`weights`	Numeric vector of length 2 indicating the weights to give to parents 1 and 2 (respectively) if `method = "weightedmean"`. A weighted mean is sometimes used when parents have different ploidy levels; in such cases, the ploidy levels of parents 1 and 2 can be passed as the weights. Default: `c(1, 1)`.

Value

A SummarizedExperiment object.

Examples

data(se_chlamy)
new_se <- add_midparent_expression(se_chlamy)
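
The three values of `method` correspond to simple per-gene arithmetic on parental counts. The base-R sketch below illustrates that arithmetic on made-up counts. The helper `midparent_values()` and the pairing of one P1 sample with one P2 sample are assumptions made for illustration, not the package's implementation.

```r
# Toy counts (made up): 3 genes x 2 samples per parent
p1 <- matrix(c(10, 20, 30, 12, 18, 33), nrow = 3,
             dimnames = list(paste0("gene", 1:3), c("P1_a", "P1_b")))
p2 <- matrix(c(40, 0, 30, 44, 2, 27), nrow = 3,
             dimnames = list(paste0("gene", 1:3), c("P2_a", "P2_b")))

# Hypothetical helper: combine paired parental samples gene by gene
midparent_values <- function(p1, p2, method = "mean", weights = c(1, 1)) {
  switch(method,
    mean = (p1 + p2) / 2,
    sum = p1 + p2,
    weightedmean = (weights[1] * p1 + weights[2] * p2) / sum(weights),
    stop("method must be 'mean', 'sum', or 'weightedmean'")
  )
}

mp_mean <- midparent_values(p1, p2)
mp_sum <- midparent_values(p1, p2, method = "sum")
# Weights could hold ploidy levels, e.g. parent 2 counted twice
mp_weighted <- midparent_values(p1, p2, method = "weightedmean",
                                weights = c(1, 2))
```

With `weights = c(1, 2)`, parent 2 contributes twice as much as parent 1 to each midparent value. Real count data would likely also need rounding to integers before differential expression analysis.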

Add size factors to normalize count data by library size or by biomass

Description

Add size factors to normalize count data by library size or by biomass

Usage

add_size_factors(se, spikein = FALSE, spikein_pattern = "ERCC")

Arguments

`se`	A `SummarizedExperiment` object with a count matrix and sample metadata.
`spikein`	Logical indicating whether or not to normalize data using spike-ins. If FALSE, data will be normalized by library size. Default: FALSE.
`spikein_pattern`	Character with the pattern (regex) to use to identify spike-in features in the count matrix. Only used if `spikein = TRUE`. Default: "ERCC".

Value

A SummarizedExperiment object as in se, but with an extra column in the colData slot named "sizeFactor". This column contains size factors that will be used by DESeq2 when performing differential expression analyses.

Examples

data(se_chlamy)
se_norm <- add_size_factors(se_chlamy)
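
To make the difference between the two normalization modes concrete, the sketch below computes size factors with the median-of-ratios estimator, which is DESeq2's default. Whether `add_size_factors()` calls it in exactly this way is an assumption; the helper `median_of_ratios()` and the toy counts are hypothetical, and the toy data contain no zeros (a real implementation must handle them).

```r
# Toy counts (made up): 3 genes plus one spike-in feature, 3 samples
counts <- matrix(
  c(100, 200,  50, 80,   # sample S1
    200, 400, 100, 80,   # sample S2
     50, 100,  25, 80),  # sample S3
  nrow = 4,
  dimnames = list(c("gene1", "gene2", "gene3", "ERCC-00001"),
                  c("S1", "S2", "S3"))
)

# Hypothetical helper: per-sample median of ratios to the
# per-feature geometric mean, computed on the log scale
median_of_ratios <- function(m) {
  log_geo_means <- rowMeans(log(m))
  apply(m, 2, function(x) exp(median(log(x) - log_geo_means)))
}

# Normalization by library size: use all features
sf_library <- median_of_ratios(counts)

# Normalization by spike-ins: use only features matching the pattern
spike_rows <- grepl("ERCC", rownames(counts))
sf_spikein <- median_of_ratios(counts[spike_rows, , drop = FALSE])
```

In this toy example the spike-in counts are identical across samples, so the spike-in size factors are all equal, while the library-size factors reflect the differences in sequencing depth.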

Data frame with frequencies (absolute and relative) of DEGs per contrast

Description

This object was obtained with get_deg_counts() using the example data set deg_list.

Usage

data(deg_counts)

Format

A data frame with the frequencies (absolute and relative) of up- and down-regulated genes in each contrast. Relative frequencies are calculated relative to the total number of genes in the count matrix used for differential expression analysis.

Examples

data(deg_counts)

List of differentially expressed genes for all contrasts

Description

This object was obtained with get_deg_list() using the example data set se_chlamy.

Usage

data(deg_list)

Format

A list of data frames with gene-wise test statistics for differentially expressed genes for each contrast. Contrasts are "P2_vs_P1", "F1_vs_P1", "F1_vs_P2", and "F1_vs_midparent", where the ID before 'vs' represents the numerator, and the ID after 'vs' represents the denominator.

Examples

data(deg_list)

Partition genes in groups based on their expression patterns

Description

Partition genes in groups based on their expression patterns

Usage

expression_partitioning(deg_list)

Arguments

`deg_list`	A list of data frames with gene-wise test statistics for differentially expressed genes as returned by `get_deg_list()`.

Value

A data frame with the following variables:

Gene: Character, gene ID.
Category: Factor, expression group. Category names are numbers from 1 to 12.
Class: Factor, expression group class. One of "UP" (transgressive up-regulation), "DOWN" (transgressive down-regulation), "ADD" (additivity), "ELD_P1" (expression-level dominance toward the parent 1), or "ELD_P2" (expression-level dominance toward the parent 2).

Examples

data(deg_list)
exp_partitions <- expression_partitioning(deg_list)
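
The five classes can be read as rules on how the offspring compares with each parent. The sketch below encodes one simplified reading of those rules. It is an assumption for illustration only: the helper `classify_class()`, the -1/0/1 encoding of test outcomes, and the exact rules are not taken from the package, which additionally distinguishes 12 categories.

```r
# Hypothetical encoding of each contrast per gene:
#  1 = F1 significantly higher, -1 = F1 significantly lower, 0 = no difference
classify_class <- function(f1_vs_p1, f1_vs_p2) {
  if (f1_vs_p1 > 0 && f1_vs_p2 > 0) return("UP")        # above both parents
  if (f1_vs_p1 < 0 && f1_vs_p2 < 0) return("DOWN")      # below both parents
  if (f1_vs_p1 == 0 && f1_vs_p2 != 0) return("ELD_P1")  # same as P1 only
  if (f1_vs_p2 == 0 && f1_vs_p1 != 0) return("ELD_P2")  # same as P2 only
  if (f1_vs_p1 != 0 && f1_vs_p2 != 0) return("ADD")     # between the parents
  NA_character_                                         # no change at all
}

calls <- data.frame(
  Gene = paste0("gene", 1:6),
  F1_vs_P1 = c(1, -1,  0, 1,  1, 0),
  F1_vs_P2 = c(1, -1, -1, 0, -1, 0)
)
calls$Class <- mapply(classify_class, calls$F1_vs_P1, calls$F1_vs_P2)
```

Genes that differ from neither parent are left unclassified (`NA`) in this sketch.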

Get a count table of differentially expressed genes per contrast

Description

Get a count table of differentially expressed genes per contrast

Usage

get_deg_counts(deg_list)

Arguments

`deg_list`	A list of data frames with gene-wise test statistics for differentially expressed genes as returned by `get_deg_list()`.

Value

A data frame with the following variables:

contrast: Character, contrast name.
up: Numeric, number of up-regulated genes.
down: Numeric, number of down-regulated genes.
total: Numeric, total number of differentially expressed genes.
perc_up: Numeric, percentage of up-regulated genes.
perc_down: Numeric, percentage of down-regulated genes.
perc_total: Numeric, percentage of differentially expressed genes.

Examples

data(deg_list)
deg_counts <- get_deg_counts(deg_list)
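
The columns of the returned table can be derived from the sign of the log2 fold changes in each contrast. The following base-R sketch shows that bookkeeping on a made-up list. The object `toy_deg_list` and the value of `ngenes` are hypothetical, and the code is not the package's implementation.

```r
# Toy DEG list (made up): each element holds only significant genes
toy_deg_list <- list(
  F1_vs_P1 = data.frame(log2FoldChange = c(1.2, -0.8, 2.1),
                        row.names = c("g1", "g2", "g3")),
  F1_vs_P2 = data.frame(log2FoldChange = c(-1.5, -2.0),
                        row.names = c("g2", "g4"))
)
ngenes <- 10  # assumed total number of genes in the count matrix

# One row per contrast, counting positive and negative fold changes
counts <- do.call(rbind, lapply(names(toy_deg_list), function(n) {
  lfc <- toy_deg_list[[n]]$log2FoldChange
  data.frame(contrast = n, up = sum(lfc > 0), down = sum(lfc < 0))
}))
counts$total <- counts$up + counts$down

# Percentages are relative to all genes, not only to DEGs
counts$perc_up <- 100 * counts$up / ngenes
counts$perc_down <- 100 * counts$down / ngenes
counts$perc_total <- 100 * counts$total / ngenes
```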

Get a table of differential expression statistics with DESeq2

Description

Get a table of differential expression statistics with DESeq2

Usage

get_deg_list(
  se,
  coldata_column = "Generation",
  parent1 = "P1",
  parent2 = "P2",
  offspring = "F1",
  midparent = "midparent",
  lfcThreshold = 0,
  alpha = 0.01,
  ...
)

Arguments

`se`	A `SummarizedExperiment` object with a count matrix and sample metadata.
`coldata_column`	Character indicating the name of the column in `colData(se)` where information on the generation is stored. Default: "Generation".
`parent1`	Character indicating which level of the variable coldata_column represents parent 1. Default: "P1".
`parent2`	Character indicating which level of the variable coldata_column represents parent 2. Default: "P2".
`offspring`	Character indicating which level of the variable coldata_column represents the offspring (hybrid or allopolyploid). Default: "F1".
`midparent`	Character indicating which level of the variable coldata_column represents the midparent value. Default: "midparent", as returned by `add_midparent_expression()`.
`lfcThreshold`	Numeric indicating the log2 fold-change threshold to use to consider differentially expressed genes. Default: 0.
`alpha`	Numeric indicating the adjusted P-value threshold to use to consider differentially expressed genes. Default: 0.01.
`...`	Additional arguments to be passed to `DESeq2::results()`.

Value

A list of data frames with DESeq2's gene-wise test statistics for each contrast. Each data frame contains the same columns as the output of DESeq2::results(). Contrasts (list names) are:

P2_vs_P1: Parent 2 (numerator) versus parent 1 (denominator).
F1_vs_P1: Offspring (numerator) versus parent 1 (denominator).
F1_vs_P2: Offspring (numerator) versus parent 2 (denominator).
F1_vs_midparent: Offspring (numerator) versus midparent (denominator).

The data frame with gene-wise test statistics in each list element contains the following variables:

baseMean: Numeric, base mean.
log2FoldChange: Numeric, log2-transformed fold changes.
lfcSE: Numeric, standard error of the log2-transformed fold changes.
stat: Numeric, observed test statistic.
pvalue: Numeric, p-value.
padj: Numeric, P-value adjusted for multiple testing.

The list contains two additional attributes named ngenes (numeric, total number of genes), and plotdata, which is a 3-column data frame with variables "gene" (character, gene ID), "lFC_F1_vs_P1" (numeric, log2 fold change between F1 and P1), and "lFC_F1_vs_P2" (numeric, log2 fold change between F1 and P2).

Examples

data(se_chlamy)
se <- add_midparent_expression(se_chlamy)
se <- add_size_factors(se, spikein = TRUE)
deg_list <- get_deg_list(se)
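
The `alpha` argument acts on P-values adjusted for multiple testing. As a rough illustration of that thresholding step, the sketch below applies the Benjamini-Hochberg adjustment from base R to made-up P-values. It assumes a plain BH correction and ignores DESeq2's independent filtering, so it is not a reproduction of what `DESeq2::results()` does internally.

```r
# Toy raw P-values (made up) for five genes
pvalue <- c(g1 = 0.0001, g2 = 0.002, g3 = 0.03, g4 = 0.2, g5 = 0.8)
alpha <- 0.01

# Benjamini-Hochberg adjusted P-values
padj <- p.adjust(pvalue, method = "BH")

# Genes kept as differentially expressed at the chosen threshold
deg_ids <- names(padj)[padj < alpha]
```

Here only `g1` and `g2` pass the threshold; `g3` has a raw P-value below 0.05 but an adjusted P-value well above `alpha`.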

Data frame with GO terms annotated to each gene of Chlamydomonas reinhardtii

Description

Data were obtained from Phytozome and processed so that each row contains only one GO term (long format).

Usage

data(go_chlamy)

Format

A 2-column data frame with columns gene (character, gene ID) and GO (character, name of GO term).

Examples

data(go_chlamy)

Perform overrepresentation analysis for a set of genes

Description

Perform overrepresentation analysis for a set of genes

Usage

ora(
  genes,
  annotation,
  column = NULL,
  background,
  correction = "BH",
  alpha = 0.05,
  min_setsize = 5,
  max_setsize = 500,
  bp_param = BiocParallel::SerialParam()
)

Arguments

`genes`	Character vector containing genes for overrepresentation analysis.
`annotation`	Annotation data frame with genes in the first column and functional annotation in the other columns. This data frame can be exported from Biomart or similar databases.
`column`	Column or columns of annotation to be used for enrichment. Columns can be given as names (character) or indices (numeric). To use more than one column, supply a character or numeric vector. Default: NULL, which uses all columns of annotation.
`background`	Character vector of genes to be used as background for the overrepresentation analysis.
`correction`	Multiple testing correction method. One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". Default is "BH".
`alpha`	Numeric indicating the adjusted P-value threshold for significance. Default: 0.05.
`min_setsize`	Numeric indicating the minimum gene set size to be considered. Gene sets correspond to levels of each variable in annotation. Default: 5.
`max_setsize`	Numeric indicating the maximum gene set size to be considered. Gene sets correspond to levels of each variable in annotation. Default: 500.
`bp_param`	BiocParallel back-end to be used. Default: BiocParallel::SerialParam()

Value

A data frame of overrepresentation results with the following variables:

term: Character, functional term ID/name.
genes: Numeric, intersection length between input genes and genes in a particular functional term.
all: Numeric, number of all genes in a particular functional term.
pval: Numeric, P-value for the hypergeometric test.
padj: Numeric, P-value adjusted for multiple comparisons using the method specified in parameter correction.
category: Character, name of the grouping variable (i.e., column name of annotation).

Examples

data(se_chlamy)
data(go_chlamy)
data(deg_list)

# Perform ORA for up-regulated genes in contrast F1_vs_P1
up_genes <- deg_list$F1_vs_P1
up_genes <- rownames(up_genes[up_genes$log2FoldChange > 0, ])

background <- rownames(se_chlamy)

ora(up_genes, go_chlamy, background = background)
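
The `pval` column is described as coming from a hypergeometric test. For a single functional term, such a test can be sketched in base R as follows. The gene IDs are made up, and the exact way `ora()` builds its counts (for example, how it restricts genes to the background) is an assumption here.

```r
# Toy inputs (made up)
background <- paste0("g", 1:100)         # all genes considered
term_genes <- paste0("g", 1:10)          # genes annotated to one term
input_genes <- paste0("g", c(1:5, 50:54)) # genes of interest

k <- length(intersect(input_genes, term_genes))  # input genes in the term
m <- length(term_genes)                          # term size ("all")
n <- length(background) - m                      # background genes not in term
N <- length(input_genes)                         # number of genes drawn

# P(X >= k): probability of an overlap at least as large by chance
pval <- phyper(k - 1, m, n, N, lower.tail = FALSE)
```

Note the `k - 1`: with `lower.tail = FALSE`, `phyper()` returns P(X > q), so subtracting one yields the probability of observing `k` or more overlapping genes.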

Perform a principal component analysis (PCA) and plot PCs

Description

Perform a principal component analysis (PCA) and plot PCs

Usage

pca_plot(
  se,
  PCs = c(1, 2),
  ntop = 500,
  color_by = NULL,
  shape_by = NULL,
  add_mean = FALSE,
  palette = NULL
)

Arguments

`se`	A `SummarizedExperiment` object with a count matrix and sample metadata.
`PCs`	Numeric vector indicating which principal components to show in the x-axis and y-axis, respectively. Default: `c(1,2)`.
`ntop`	Numeric indicating the number of top genes with the highest variances to use for the PCA. Default: 500.
`color_by`	Character with the name of the column in `colData(se)` to use to group samples by color. Default: NULL.
`shape_by`	Character with the name of the column in `colData(se)` to use to group samples by shape. Default: NULL.
`add_mean`	Logical indicating whether to add a diamond symbol with the mean value for each level of the variable indicated in color_by. Default: FALSE
`palette`	Character vector with colors to use for each level of the variable indicated in color_by. If NULL, a default color palette will be used.

Value

A ggplot object with a PCA plot showing two principal components (one per axis), along with the percentage of variance explained by each.

Examples

data(se_chlamy)
se <- add_midparent_expression(se_chlamy)
se$Ploidy[is.na(se$Ploidy)] <- "midparent"
se$Generation[is.na(se$Generation)] <- "midparent"
pca_plot(se, color_by = "Generation", shape_by = "Ploidy", add_mean = TRUE)
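
The roles of `ntop` and `PCs` can be illustrated with a base-R PCA on simulated data. This sketch assumes the expression values are already on a suitable scale (whether and how `pca_plot()` transforms counts beforehand is not covered here), and all object names are hypothetical.

```r
set.seed(1)
# Simulated expression matrix: 200 genes x 6 samples
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))

# Keep the ntop genes with the highest variance across samples
ntop <- 50
gene_vars <- apply(expr, 1, var)
top <- order(gene_vars, decreasing = TRUE)[seq_len(min(ntop, nrow(expr)))]

# PCA with samples as observations (rows) and genes as variables
pca <- prcomp(t(expr[top, ]))

# Percentage of variance explained by each principal component
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of the samples on the two requested components
PCs <- c(1, 2)
scores <- pca$x[, PCs]
```

The columns of `scores` would be the x- and y-axis coordinates, and `pct_var[PCs]` the percentages shown in the axis labels.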

Plot expression partitions

Description

Plot expression partitions

Usage

plot_expression_partitions(
  partition_table,
  group_by = "Category",
  palette = NULL,
  labels = c("P1", "F1", "P2")
)

Arguments

`partition_table`	A data frame with genes per expression partition as returned by `expression_partitioning()`.
`group_by`	Character indicating the name of the variable in partition_table to use to group genes. One of "Category" or "Class". Default: "Category".
`palette`	Character vector with color names to be used for each level of the variable specified in group_by. If group_by = "Category", this must be a vector of length 12. If group_by = "Class", this must be a vector of length 5. If NULL, a default color palette will be used.
`labels`	A character vector of length 3 indicating the labels to be given for parent 1, offspring, and parent 2. Default: `c("P1", "F1", "P2")`.

Value

A ggplot object with a plot showing genes in each expression partition.

Examples

data(deg_list)
partition_table <- expression_partitioning(deg_list)
plot_expression_partitions(partition_table)

Plot a triangle of comparisons of DEG sets among generations

Description

Plot a triangle of comparisons of DEG sets among generations

Usage

plot_expression_triangle(deg_counts, palette = NULL, box_labels = NULL)

Arguments

`deg_counts`	Data frame with number of differentially expressed genes per contrast as returned by `get_deg_counts`.
`palette`	Character vector of length 4 indicating the colors of the boxes for P1, P2, F1, and midparent, respectively. If NULL, a default color palette will be used.
`box_labels`	Character vector of length 4 indicating the labels of the boxes for P1, P2, F1, and midparent, respectively. Default: NULL, which will lead to labels "P1", "P2", "F1", and "Midparent", respectively.

Details

The expression triangle plot shows the number of differentially expressed genes (DEGs) for each contrast. Numbers in the center of the lines (in bold) indicate the total number of DEGs, while numbers near boxes indicate the number of up-regulated genes in each generation of the triangle.

Value

A ggplot object with an expression triangle.

Examples

data(deg_counts)
plot_expression_triangle(deg_counts)

Plot a barplot of gene frequencies per expression partition

Description

Plot a barplot of gene frequencies per expression partition

Usage

plot_partition_frequencies(
  partition_table,
  group_by = "Category",
  palette = NULL,
  labels = c("P1", "F1", "P2")
)

Arguments

`partition_table`	A data frame with genes per expression partition as returned by `expression_partitioning()`.
`group_by`	Character indicating the name of the variable in partition_table to use to group genes. One of "Category" or "Class". Default: "Category".
`palette`	Character vector with color names to be used for each level of the variable specified in group_by. If group_by = "Category", this must be a vector of length 12. If group_by = "Class", this must be a vector of length 5. If NULL, a default color palette will be used.
`labels`	A character vector of length 3 indicating the labels to be given for parent 1, offspring, and parent 2. Default: `c("P1", "F1", "P2")`.

Value

A ggplot object with a barplot showing gene frequencies per partition next to explanatory line plots depicting each partition.

Examples

data(deg_list)
partition_table <- expression_partitioning(deg_list)
plot_partition_frequencies(partition_table)
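
The frequencies shown in the barplot can also be tabulated directly from a partition table. The sketch below does so for the `Class` variable using a made-up table with the same column names as the output of `expression_partitioning()`; it is an illustration, not the code used by the plotting function.

```r
# Toy partition table (made up)
toy_partitions <- data.frame(
  Gene = paste0("gene", 1:8),
  Class = factor(
    c("UP", "UP", "DOWN", "ADD", "ADD", "ADD", "ELD_P1", "ELD_P2"),
    levels = c("UP", "DOWN", "ADD", "ELD_P1", "ELD_P2")
  )
)

# Absolute and relative frequencies of genes per class
freq <- as.data.frame(table(Class = toy_partitions$Class))
freq$Percent <- 100 * freq$Freq / sum(freq$Freq)
```

Keeping `Class` as a factor with all five levels ensures that empty classes still appear with a frequency of zero.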

Plot a heatmap of pairwise sample correlations with hierarchical clustering

Description

Plot a heatmap of pairwise sample correlations with hierarchical clustering

Usage

plot_samplecor(
  se,
  coldata_cols = NULL,
  rowdata_cols = NULL,
  ntop = 500,
  cor_method = "pearson",
  palette = "Blues",
  ...
)

Arguments

`se`	A `SummarizedExperiment` object with a count matrix and sample metadata in the colData slot. If a rowData slot is available, it can also be used for clustering rows.
`coldata_cols`	A vector (either numeric or character) indicating which columns should be extracted from colData(se).
`rowdata_cols`	A vector (either numeric or character) indicating which columns should be extracted from rowData(se).
`ntop`	Numeric indicating the number of top genes with the highest variances to use to calculate sample correlations. Default: 500.
`cor_method`	Character indicating the correlation method to use. One of "pearson" or "spearman". Default: "pearson".
`palette`	Character indicating the name of the color palette from the RColorBrewer package to use. Default: "Blues".
`...`	Additional arguments to be passed to `ComplexHeatmap::pheatmap()`. These arguments can be used to control heatmap aesthetics, such as show/hide row and column names, change font size, activate/deactivate hierarchical clustering, etc. For a complete list of the options, see `?ComplexHeatmap::pheatmap()`.

Value

A heatmap of hierarchically clustered pairwise sample correlations.

Examples

data(se_chlamy)
se <- add_midparent_expression(se_chlamy)
se$Ploidy[is.na(se$Ploidy)] <- "midparent"
se$Generation[is.na(se$Generation)] <- "midparent"
plot_samplecor(se, ntop = 500)
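
The quantities behind such a heatmap are a sample-by-sample correlation matrix and a hierarchical clustering of it. A minimal base-R sketch on simulated data is shown below. The distance `1 - correlation` and the default linkage of `hclust()` are assumptions, and may differ from what the heatmap function uses.

```r
set.seed(42)
# Simulated expression matrix: 100 genes x 5 samples
expr <- matrix(rnorm(100 * 5), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("S", 1:5)))

# Keep the ntop most variable genes
ntop <- 20
top <- order(apply(expr, 1, var), decreasing = TRUE)[seq_len(ntop)]

# Pairwise sample correlations (columns are samples)
cor_mat <- cor(expr[top, ], method = "pearson")

# Hierarchical clustering, treating 1 - correlation as a distance
hc <- hclust(as.dist(1 - cor_mat))
sample_order <- hc$labels[hc$order]
```

Passing `method = "spearman"` to `cor()` gives the rank-based alternative corresponding to `cor_method = "spearman"`.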

Expression data (in counts) for 3 Chlamydomonas lines (P1, P2, and F1)

Description

Two lines (referred to as parent 1 and parent 2) with different ploidy levels were crossed to generate an allopolyploid (F1).

Usage

data(se_chlamy)

Format

A SummarizedExperiment object with an assay (count) and colData.

Examples

data(se_chlamy)
