Title: | Comparative analysis of RNA-seq data for hybrids and their progenitors |
---|---|
Description: | HybridExpress can be used to perform comparative transcriptomics analysis of hybrids (or allopolyploids) relative to their progenitor species. The package features functions to perform exploratory analyses of sample grouping, identify differentially expressed genes in hybrids relative to their progenitors, classify genes in expression categories (N = 12) and classes (N = 5), and perform functional analyses. We also provide users with graphical functions for the seamless creation of publication-ready figures that are commonly used in the literature. |
Authors: | Fabricio Almeida-Silva [aut, cre], Lucas Prost-Boxoen [aut], Yves Van de Peer [aut] |
Maintainer: | Fabricio Almeida-Silva |
License: | GPL-3 |
Version: | 1.3.0 |
Built: | 2024-11-20 06:23:03 UTC |
Source: | https://github.com/bioc/HybridExpress |
Add midparent expression to SummarizedExperiment object
add_midparent_expression( se, coldata_column = "Generation", parent1 = "P1", parent2 = "P2", method = "mean", weights = c(1, 1) )
se |
A SummarizedExperiment object. |
coldata_column |
Character indicating the name of the column in colData(se) with generation information. Default: "Generation". |
parent1 |
Character indicating which level of the variable coldata_column represents parent 1. Default: "P1". |
parent2 |
Character indicating which level of the variable coldata_column represents parent 2. Default: "P2". |
method |
Character indicating the method to use to create midparent values. One of 'mean' (default), 'sum', or 'weightedmean'. |
weights |
Numeric vector of length 2 indicating the weights to give to parents 1 and 2 (respectively) if method = "weightedmean". Default: c(1, 1). |
A SummarizedExperiment object.
data(se_chlamy)
new_se <- add_midparent_expression(se_chlamy)
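The three method options can be illustrated on a plain count matrix. The following is a minimal base R sketch, not the package's implementation: the helper midparent_sketch() is hypothetical, and pairing the i-th replicate of parent 1 with the i-th replicate of parent 2 is an assumption made only for illustration.

```r
# Toy count matrix: 3 genes x 4 samples (two replicates per parent).
counts <- matrix(
    c(10, 20, 30, 12, 18, 33, 40, 10, 60, 44, 14, 57),
    nrow = 3,
    dimnames = list(
        paste0("gene", 1:3),
        c("P1_r1", "P1_r2", "P2_r1", "P2_r2")
    )
)
generation <- c("P1", "P1", "P2", "P2")

# Hypothetical helper (not part of HybridExpress). Assumption: replicates
# are paired by position; the package may pair samples differently.
midparent_sketch <- function(counts, generation,
                             parent1 = "P1", parent2 = "P2",
                             method = "mean", weights = c(1, 1)) {
    p1 <- counts[, generation == parent1, drop = FALSE]
    p2 <- counts[, generation == parent2, drop = FALSE]

    # Use as many pairs as the smaller parent group allows
    n <- min(ncol(p1), ncol(p2))
    p1 <- p1[, seq_len(n), drop = FALSE]
    p2 <- p2[, seq_len(n), drop = FALSE]

    mid <- switch(
        method,
        mean = (p1 + p2) / 2,
        sum = p1 + p2,
        weightedmean = (weights[1] * p1 + weights[2] * p2) / sum(weights),
        stop("method must be 'mean', 'sum', or 'weightedmean'")
    )
    colnames(mid) <- paste0("midparent", seq_len(n))
    mid
}

mid_mean <- midparent_sketch(counts, generation)
mid_weighted <- midparent_sketch(
    counts, generation, method = "weightedmean", weights = c(1, 2)
)
```

With weights = c(1, 2), parent 2 contributes twice as much as parent 1 to each midparent value, which may be useful when the parents differ in ploidy.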
Add size factors to normalize count data by library size or by biomass
add_size_factors(se, spikein = FALSE, spikein_pattern = "ERCC")
se |
A SummarizedExperiment object. |
spikein |
Logical indicating whether or not to normalize data using spike-ins. If FALSE, data will be normalized by library size. Default: FALSE. |
spikein_pattern |
Character with the pattern (regex) to use to identify spike-in features in the count matrix. Only valid if spikein = TRUE. Default: "ERCC". |
A SummarizedExperiment object as in se, but with an extra column in the colData slot named "sizeFactor". This column contains size factors that will be used by DESeq2 when performing differential expression analyses.
data(se_chlamy)
se_norm <- add_size_factors(se_chlamy)
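To show how size factors computed from all genes can differ from size factors computed from spike-ins only, here is a minimal base R sketch of the median-of-ratios approach. That add_size_factors() computes its factors exactly this way is an assumption; the helper median_ratio_size_factors() and the toy matrix are hypothetical.

```r
# Toy counts: three genes scale with sequencing depth, while the two
# spike-in features (matched by the "ERCC" pattern) stay constant.
counts <- rbind(
    geneA = c(100, 200, 400),
    geneB = c(50, 100, 200),
    geneC = c(20, 40, 80),
    "ERCC-00002" = c(10, 10, 10),
    "ERCC-00003" = c(30, 30, 30)
)
colnames(counts) <- c("S1", "S2", "S3")

# Hypothetical helper: median-of-ratios size factors, optionally
# restricted to a subset of control features (e.g., spike-ins).
median_ratio_size_factors <- function(counts, control_rows = NULL) {
    if (!is.null(control_rows)) {
        counts <- counts[control_rows, , drop = FALSE]
    }
    log_counts <- log(counts)
    log_geo_means <- rowMeans(log_counts)

    # Features with a zero count in any sample have a geometric mean of
    # zero (-Inf on the log scale) and are skipped
    keep <- is.finite(log_geo_means)
    apply(log_counts[keep, , drop = FALSE], 2, function(x) {
        exp(median(x - log_geo_means[keep]))
    })
}

# Library-size-like normalization: all features
sf_all <- median_ratio_size_factors(counts)

# Spike-in normalization: only features matching the pattern
is_spikein <- grepl("ERCC", rownames(counts))
sf_spikein <- median_ratio_size_factors(counts, control_rows = is_spikein)
```

In this toy example the all-feature size factors follow sequencing depth, whereas the spike-in size factors are all equal, so dividing by them would preserve the overall difference in expression between samples.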
This object was obtained with get_deg_counts() using the example data set deg_list.
data(deg_counts)
A data frame with the frequencies (absolute and relative) of up- and down-regulated genes in each contrast. Relative frequencies are calculated relative to the total number of genes in the count matrix used for differential expression analysis.
data(deg_counts)
This object was obtained with get_deg_list() using the example data set se_chlamy.
data(deg_list)
A list of data frames with gene-wise test statistics for differentially expressed genes for each contrast. Contrasts are "P2_vs_P1", "F1_vs_P1", "F1_vs_P2", and "F1_vs_midparent", where the ID before 'vs' represents the numerator, and the ID after 'vs' represents the denominator.
data(deg_list)
Partition genes in groups based on their expression patterns
expression_partitioning(deg_list)
deg_list |
A list of data frames with gene-wise test statistics for differentially expressed genes as returned by get_deg_list(). |
A data frame with the following variables:
Character, gene ID.
Category: Factor, expression group. Category names are numbers from 1 to 12.
Class: Factor, expression group class. One of "UP" (transgressive up-regulation), "DOWN" (transgressive down-regulation), "ADD" (additivity), "ELD_P1" (expression-level dominance toward parent 1), or "ELD_P2" (expression-level dominance toward parent 2).
data(deg_list)
exp_partitions <- expression_partitioning(deg_list)
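The logic behind the five classes can be sketched from the direction of the F1_vs_P1 and F1_vs_P2 contrasts alone. This is a simplified base R illustration, not the package's algorithm: the helper classify_class() and its -1/0/1 encoding are hypothetical, and the 12 categories (which also depend on the P2_vs_P1 contrast) are not reproduced here.

```r
# Hypothetical helper. Inputs encode each contrast per gene as:
#   1 = F1 significantly higher than the parent
#  -1 = F1 significantly lower than the parent
#   0 = no significant difference
classify_class <- function(f1_vs_p1, f1_vs_p2) {
    cls <- rep(NA_character_, length(f1_vs_p1))

    # F1 above (or below) both parents: transgressive expression
    cls[f1_vs_p1 == 1 & f1_vs_p2 == 1] <- "UP"
    cls[f1_vs_p1 == -1 & f1_vs_p2 == -1] <- "DOWN"

    # F1 between the two parents: additivity
    cls[f1_vs_p1 * f1_vs_p2 == -1] <- "ADD"

    # F1 matches one parent but differs from the other:
    # expression-level dominance toward the matching parent
    cls[f1_vs_p1 == 0 & f1_vs_p2 != 0] <- "ELD_P1"
    cls[f1_vs_p1 != 0 & f1_vs_p2 == 0] <- "ELD_P2"

    factor(cls, levels = c("UP", "DOWN", "ADD", "ELD_P1", "ELD_P2"))
}

# Six toy genes covering every outcome, including "no change" (NA)
f1_vs_p1 <- c(1, -1, 1, 0, -1, 0)
f1_vs_p2 <- c(1, -1, -1, 1, 0, 0)
classes <- classify_class(f1_vs_p1, f1_vs_p2)
```

Genes with no significant difference in either contrast are left unclassified (NA) in this sketch.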
Get a count table of differentially expressed genes per contrast
get_deg_counts(deg_list)
deg_list |
A list of data frames with gene-wise test statistics for differentially expressed genes as returned by get_deg_list(). |
A data frame with the following variables:
Character, contrast name.
Numeric, number of up-regulated genes.
Numeric, number of down-regulated genes.
Numeric, total number of differentially expressed genes.
Numeric, percentage of up-regulated genes.
Numeric, percentage of down-regulated genes.
Numeric, percentage of differentially expressed genes.
data(deg_list)
deg_counts <- get_deg_counts(deg_list)
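The counting step can be sketched in base R on a toy list shaped like deg_list: one data frame of significant genes per contrast, plus an ngenes attribute with the total number of genes tested. The helper count_degs() and the output column names are hypothetical and are not guaranteed to match those returned by get_deg_counts().

```r
# Toy input: each element holds only the significant genes of a contrast
toy_deg_list <- list(
    F1_vs_P1 = data.frame(
        log2FoldChange = c(1.2, -0.8, 2.1),
        row.names = c("g1", "g2", "g3")
    ),
    F1_vs_P2 = data.frame(
        log2FoldChange = c(-1.5, -2.0),
        row.names = c("g2", "g4")
    )
)
attr(toy_deg_list, "ngenes") <- 10

# Hypothetical helper: absolute and relative DEG frequencies per contrast
count_degs <- function(deg_list) {
    ngenes <- attr(deg_list, "ngenes")
    up <- vapply(deg_list, function(x) sum(x$log2FoldChange > 0), numeric(1))
    down <- vapply(deg_list, function(x) sum(x$log2FoldChange < 0), numeric(1))

    data.frame(
        contrast = names(deg_list),
        n_up = up,
        n_down = down,
        n_total = up + down,
        # Percentages are relative to all genes tested, not to DEGs only
        pct_up = 100 * up / ngenes,
        pct_down = 100 * down / ngenes,
        pct_total = 100 * (up + down) / ngenes,
        row.names = NULL
    )
}

toy_counts <- count_degs(toy_deg_list)
```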
Get a table of differential expression statistics with DESeq2
get_deg_list( se, coldata_column = "Generation", parent1 = "P1", parent2 = "P2", offspring = "F1", midparent = "midparent", lfcThreshold = 0, alpha = 0.01, ... )
se |
A SummarizedExperiment object. |
coldata_column |
Character indicating the name of the column in colData(se) with generation information. Default: "Generation". |
parent1 |
Character indicating which level of the variable coldata_column represents parent 1. Default: "P1". |
parent2 |
Character indicating which level of the variable coldata_column represents parent 2. Default: "P2". |
offspring |
Character indicating which level of the variable coldata_column represents the offspring (hybrid or allopolyploid). Default: "F1" |
midparent |
Character indicating which level of the variable coldata_column represents the midparent value. Default: "midparent", as returned by add_midparent_expression(). |
lfcThreshold |
Numeric indicating the log2 fold-change threshold to use to consider differentially expressed genes. Default: 0. |
alpha |
Numeric indicating the adjusted P-value threshold to use to consider differentially expressed genes. Default: 0.01. |
... |
Additional arguments to be passed to DESeq2::results(). |
A list of data frames with DESeq2's gene-wise test statistics for each contrast. Each data frame contains the same columns as the output of DESeq2::results(). Contrasts (list names) are:
P2_vs_P1: Parent 2 (numerator) versus parent 1 (denominator).
F1_vs_P1: Offspring (numerator) versus parent 1 (denominator).
F1_vs_P2: Offspring (numerator) versus parent 2 (denominator).
F1_vs_midparent: Offspring (numerator) versus midparent (denominator).
The data frame with gene-wise test statistics in each list element contains the following variables:
baseMean: Numeric, base mean.
log2FoldChange: Numeric, log2-transformed fold changes.
lfcSE: Numeric, standard error of the log2-transformed fold changes.
stat: Numeric, observed test statistic.
pvalue: Numeric, P-value.
padj: Numeric, P-value adjusted for multiple testing.
The list contains two additional attributes named ngenes (numeric, total number of genes), and plotdata, which is a 3-column data frame with variables "gene" (character, gene ID), "lFC_F1_vs_P1" (numeric, log2 fold change between F1 and P1), and "lFC_F1_vs_P2" (numeric, log2 fold change between F1 and P2).
data(se_chlamy)
se <- add_midparent_expression(se_chlamy)
se <- add_size_factors(se, spikein = TRUE)
deg_list <- get_deg_list(se)
Data were obtained from Phytozome and processed so that each row contains only one GO term (long format).
data(go_chlamy)
A 2-column data frame with columns gene (character, gene ID) and GO (character, name of GO term).
data(go_chlamy)
Perform overrepresentation analysis for a set of genes
ora( genes, annotation, column = NULL, background, correction = "BH", alpha = 0.05, min_setsize = 5, max_setsize = 500, bp_param = BiocParallel::SerialParam() )
genes |
Character vector containing genes for overrepresentation analysis. |
annotation |
Annotation data frame with genes in the first column and functional annotation in the other columns. This data frame can be exported from Biomart or similar databases. |
column |
Column or columns of annotation to be used for enrichment. Either column names (character) or column indices (numeric) can be used. To use more than one column, supply a character or numeric vector. Default: all columns from annotation. |
background |
Character vector of genes to be used as background for the overrepresentation analysis. |
correction |
Multiple testing correction method. One of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". Default is "BH". |
alpha |
Numeric indicating the adjusted P-value threshold for significance. Default: 0.05. |
min_setsize |
Numeric indicating the minimum gene set size to be considered. Gene sets correspond to levels of each variable in annotation. Default: 5. |
max_setsize |
Numeric indicating the maximum gene set size to be considered. Gene sets correspond to levels of each variable in annotation. Default: 500. |
bp_param |
BiocParallel back-end to be used. Default: BiocParallel::SerialParam() |
A data frame of overrepresentation results with the following variables:
Character, functional term ID/name.
Numeric, intersection length between input genes and genes in a particular functional term.
Numeric, number of all genes in a particular functional term.
Numeric, P-value for the hypergeometric test.
Numeric, P-value adjusted for multiple comparisons using the method specified in parameter correction.
Character, name of the grouping variable (i.e., column name of annotation).
data(se_chlamy)
data(go_chlamy)
data(deg_list)
# Perform ORA for up-regulated genes in contrast F1_vs_P1
up_genes <- deg_list$F1_vs_P1
up_genes <- rownames(up_genes[up_genes$log2FoldChange > 0, ])
background <- rownames(se_chlamy)
ora(up_genes, go_chlamy, background = background)
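The hypergeometric test reported in the results can be reproduced for toy data in base R. This sketch is an illustration under stated assumptions, not the internals of ora(): the helper ora_term() and the toy annotation are hypothetical, and restricting genes and gene sets to the background before testing is an assumption.

```r
# Hypothetical helper: P(X >= k) under the hypergeometric distribution,
# where k is the overlap between the input genes and a gene set
ora_term <- function(genes, term_genes, background) {
    genes <- intersect(genes, background)
    term_genes <- intersect(term_genes, background)
    k <- length(intersect(genes, term_genes))

    phyper(
        k - 1,                                       # P(X > k - 1) = P(X >= k)
        length(term_genes),                          # background genes in the set
        length(background) - length(term_genes),     # background genes not in it
        length(genes),                               # number of input genes
        lower.tail = FALSE
    )
}

# Toy data: 20 background genes, two gene sets in long format
background <- paste0("g", 1:20)
annotation <- data.frame(
    gene = paste0("g", 1:15),
    GO = rep(c("GO:A", "GO:B"), c(5, 10))
)
genes <- c("g1", "g2", "g3", "g11")

# One gene set per annotation level, filtered by set size
sets <- split(annotation$gene, annotation$GO)
sets <- sets[lengths(sets) >= 5 & lengths(sets) <= 500]

pvals <- vapply(sets, function(s) ora_term(genes, s, background), numeric(1))
padj <- p.adjust(pvals, method = "BH")
significant <- names(padj)[padj < 0.05]
```

Here three of the four input genes fall in GO:A, which gives a raw P-value of about 0.032. After Benjamini-Hochberg correction over two gene sets, the adjusted value is about 0.064, so no term passes alpha = 0.05.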
Perform a principal component analysis (PCA) and plot PCs
pca_plot( se, PCs = c(1, 2), ntop = 500, color_by = NULL, shape_by = NULL, add_mean = FALSE, palette = NULL )
se |
A SummarizedExperiment object. |
PCs |
Numeric vector indicating which principal components to show on the x-axis and y-axis, respectively. Default: c(1, 2). |
ntop |
Numeric indicating the number of top genes with the highest variances to use for the PCA. Default: 500. |
color_by |
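Character with the name of the column in colData(se) to use to color samples. Default: NULL. |
shape_by |
Character with the name of the column in colData(se) to use to set point shapes. Default: NULL. |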
add_mean |
Logical indicating whether to add a diamond symbol with the mean value for each level of the variable indicated in color_by. Default: FALSE |
palette |
Character vector with colors to use for each level of the variable indicated in color_by. If NULL, a default color palette will be used. |
A ggplot object with a PCA plot showing two principal components, one on each axis, along with their percentage of variance explained.
data(se_chlamy)
se <- add_midparent_expression(se_chlamy)
se$Ploidy[is.na(se$Ploidy)] <- "midparent"
se$Generation[is.na(se$Generation)] <- "midparent"
pca_plot(se, color_by = "Generation", shape_by = "Ploidy", add_mean = TRUE)
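The roles of ntop and PCs can be illustrated with a base R sketch on simulated counts. This is not how pca_plot() is implemented: the log2(x + 1) transformation is an assumption made for simplicity (the package may transform counts differently), and all object names are hypothetical.

```r
set.seed(1)

# Simulated counts: 200 genes x 6 samples
counts <- matrix(
    rpois(200 * 6, lambda = 50),
    nrow = 200,
    dimnames = list(paste0("gene", 1:200), paste0("S", 1:6))
)

# Assumption: a simple log transformation to stabilize variances
log_counts <- log2(counts + 1)

# Keep the `ntop` genes with the highest variance across samples
ntop <- 50
gene_var <- apply(log_counts, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, length(gene_var)))]

# PCA with samples as observations (rows) and genes as variables
pca <- prcomp(t(log_counts[top, ]))
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Coordinates and axis labels for the two requested components
pcs <- c(1, 2)
pca_df <- data.frame(
    sample = colnames(counts),
    x = pca$x[, pcs[1]],
    y = pca$x[, pcs[2]]
)
axis_labels <- sprintf("PC%d (%.1f%% variance)", pcs, pct_var[pcs])
```

The data frame pca_df holds one row per sample and could be passed to any plotting function, with axis_labels supplying the percentage of variance explained by each axis.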
Plot expression partitions
plot_expression_partitions( partition_table, group_by = "Category", palette = NULL, labels = c("P1", "F1", "P2") )
partition_table |
A data frame with genes per expression partition as returned by expression_partitioning(). |
group_by |
Character indicating the name of the variable in partition_table to use to group genes. One of "Category" or "Class". Default: "Category". |
palette |
Character vector with color names to be used for each level of the variable specified in group_by. If group_by = "Category", this must be a vector of length 12. If group_by = "Class", this must be a vector of length 5. If NULL, a default color palette will be used. |
labels |
A character vector of length 3 indicating the labels to be given for parent 1, offspring, and parent 2. Default: c("P1", "F1", "P2"). |
A ggplot object with a plot showing genes in each expression partition.
data(deg_list)
partition_table <- expression_partitioning(deg_list)
plot_expression_partitions(partition_table)
Plot a triangle of comparisons of DEG sets among generations
plot_expression_triangle(deg_counts, palette = NULL, box_labels = NULL)
deg_counts |
Data frame with number of differentially expressed genes per contrast as returned by get_deg_counts(). |
palette |
Character vector of length 4 indicating the colors of the boxes for P1, P2, F1, and midparent, respectively. If NULL, a default color palette will be used. |
box_labels |
Character vector of length 4 indicating the labels of the boxes for P1, P2, F1, and midparent, respectively. Default: NULL, which will lead to labels "P1", "P2", "F1", and "Midparent", respectively. |
The expression triangle plot shows the number of differentially expressed genes (DEGs) for each contrast. Numbers in the center of the lines (in bold) indicate the total number of DEGs, while numbers near boxes indicate the number of up-regulated genes in each generation of the triangle.
A ggplot object with an expression triangle.
data(deg_counts)
plot_expression_triangle(deg_counts)
Plot a barplot of gene frequencies per expression partition
plot_partition_frequencies( partition_table, group_by = "Category", palette = NULL, labels = c("P1", "F1", "P2") )
partition_table |
A data frame with genes per expression partition as returned by expression_partitioning(). |
group_by |
Character indicating the name of the variable in partition_table to use to group genes. One of "Category" or "Class". Default: "Category". |
palette |
Character vector with color names to be used for each level of the variable specified in group_by. If group_by = "Category", this must be a vector of length 12. If group_by = "Class", this must be a vector of length 5. If NULL, a default color palette will be used. |
labels |
A character vector of length 3 indicating the labels to be given for parent 1, offspring, and parent 2. Default: c("P1", "F1", "P2"). |
A ggplot object with a barplot showing gene frequencies per partition next to explanatory line plots depicting each partition.
data(deg_list)
partition_table <- expression_partitioning(deg_list)
plot_partition_frequencies(partition_table)
Plot a heatmap of pairwise sample correlations with hierarchical clustering
plot_samplecor( se, coldata_cols = NULL, rowdata_cols = NULL, ntop = 500, cor_method = "pearson", palette = "Blues", ... )
se |
A SummarizedExperiment object. |
coldata_cols |
A vector (either numeric or character) indicating which columns should be extracted from colData(se). |
rowdata_cols |
A vector (either numeric or character) indicating which columns should be extracted from rowData(se). |
ntop |
Numeric indicating the number of top genes with the highest variances to use for the correlation analysis. Default: 500. |
cor_method |
Character indicating the correlation method to use. One of "pearson" or "spearman". Default: "pearson". |
palette |
Character indicating the name of the color palette from the RColorBrewer package to use. Default: "Blues". |
... |
Additional arguments to be passed to the underlying heatmap function. |
A heatmap of hierarchically clustered pairwise sample correlations.
data(se_chlamy)
se <- add_midparent_expression(se_chlamy)
se$Ploidy[is.na(se$Ploidy)] <- "midparent"
se$Generation[is.na(se$Generation)] <- "midparent"
plot_samplecor(se, ntop = 500)
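The computation behind such a heatmap can be sketched in base R: select the most variable genes, correlate samples, and cluster them. This is an illustration only, not the internals of plot_samplecor(): the log2(x + 1) transformation and the use of 1 - correlation as a clustering distance are assumptions, and all object names are hypothetical.

```r
set.seed(42)

# Simulated counts: 300 genes x 6 samples
counts <- matrix(
    rpois(300 * 6, lambda = 100),
    nrow = 300,
    dimnames = list(paste0("gene", 1:300), paste0("S", 1:6))
)

# Assumption: a simple log transformation before computing correlations
log_counts <- log2(counts + 1)

# Keep the `ntop` most variable genes
ntop <- 100
gene_var <- apply(log_counts, 1, var)
top <- order(gene_var, decreasing = TRUE)[seq_len(min(ntop, length(gene_var)))]

# Pairwise sample correlations ("pearson" or "spearman")
cor_mat <- cor(log_counts[top, ], method = "pearson")

# Hierarchical clustering, treating 1 - correlation as a distance
hc <- hclust(as.dist(1 - cor_mat))

# Reorder rows and columns as they would appear in a clustered heatmap
cor_ordered <- cor_mat[hc$order, hc$order]
```

The matrix cor_ordered can then be drawn with any heatmap function, optionally annotated with sample metadata.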
Two lines (referred to as parent 1 and parent 2) with different ploidy levels were crossed to generate an allopolyploid (F1).
data(se_chlamy)
A SummarizedExperiment object with an assay (count) and colData.
data(se_chlamy)