| Title: | Feature Dominance-based R Package for Genomic Data |
|---|---|
| Description: | dominatR is an R package for quantifying and visualizing feature dominance in datasets. dominatR applies concepts drawn from physics and information theory, such as center of mass and Shannon entropy, to visualize features (e.g. genes) that are present within a specific context or condition. The package works with data.frames, matrices and SummarizedExperiment objects and can apply common genomic normalization methods. Its key feature is the generation of plots that highlight context-relevant feature dominance. |
| Authors: | Simon Lizarazo [aut, cre] (ORCID: <https://orcid.org/0009-0001-8974-6225>), Ethan Chen [aut], Rajendra K C [aut], Kevin Van Bortle [aut, cph] |
| Maintainer: | Simon Lizarazo |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-30 08:39:51 UTC |
| Source: | https://github.com/bioc/dominatR |
For each row of the numeric data, centmass() computes a 2D center of
mass with coordinates (comx, comy). The x_coord and
y_coord vectors specify the location for each column's "mass."
A ternary coordinate system is assumed by default, but the method generalizes to any setting where columns represent discrete "masses" at known (x, y) positions.
By default, x_coord = c(0, 1, 0.5) and
y_coord = c(0, 0, sqrt(3)/2), which correspond to the corners
of an equilateral triangle (often used in ternary plots).
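The following is a minimal sketch of the computation described above. It assumes that each row's center of mass is the usual mass-weighted mean of the column positions (comx = sum(w_j * x_j) / sum(w_j), and likewise for comy), with the row's values used as non-negative weights. The helper center_of_mass() is hypothetical. It is not dominatR's implementation and only illustrates the idea in base R.

```r
# Hypothetical helper (not part of dominatR): row-wise 2D center of mass,
# treating each row's values as non-negative masses at (x_coord, y_coord).
center_of_mass <- function(m,
                           x_coord = c(0, 1, 0.5),
                           y_coord = c(0, 0, sqrt(3) / 2)) {
  m <- as.matrix(m)
  stopifnot(ncol(m) == length(x_coord), ncol(m) == length(y_coord))
  total <- rowSums(m)                          # total mass per row
  data.frame(
    comx = as.vector(m %*% x_coord) / total,   # weighted mean of x positions
    comy = as.vector(m %*% y_coord) / total    # weighted mean of y positions
  )
}

# Toy counts: two rows dominated by one column, one perfectly balanced row
counts <- matrix(c(10, 0, 0,
                   0, 10, 0,
                   5, 5, 5), nrow = 3, byrow = TRUE)
center_of_mass(counts)
```

With the default triangle, a row dominated by one column lands on that column's corner. A balanced row lands at the centroid (0.5, sqrt(3)/6). Rows that sum to zero give NaN in this sketch; this section does not document how centmass() handles such rows.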
```r
centmass(x, x_coord = c(0, 1, 0.5), y_coord = c(0, 0, sqrt(3)/2), assay_name = NULL)
```
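| Argument | Description |
|---|---|
| x | A data.frame (with numeric columns) or a SummarizedExperiment. |
| x_coord | Numeric vector of length equal to the number of columns in x, giving the x position of each column's "mass". |
| y_coord | Numeric vector of length equal to the number of columns in x, giving the y position of each column's "mass". |
| assay_name | If x is a SummarizedExperiment, the name of the assay to use; if NULL, the first assay is used. |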
If x is a data.frame, returns a new data.frame with
columns comx and comy.
If x is a SummarizedExperiment, returns the same object but
with two new columns comx and comy in rowData(x).
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se = airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

# Let's subset for the first 3 columns for this example
se = se[, 1:3]

# -------------------------------
# 1) Using a data.frame
# -------------------------------
df = assay(se) |> as.data.frame()
df = centmass(df)
head(df)

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
se2 = centmass(se)

## X and Y coordinates are stored in rowData(se2)
head(rowData(se2))
```
Normalizes a count matrix (or a SummarizedExperiment assay) using the counts-per-million (CPM) method. Specifically:

- Each column is divided by its total count (library size) and multiplied by one million.
- If log_trans = TRUE, a log2(x + 1) transform is applied afterward.
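The CPM steps above can be sketched in base R as follows. The sketch assumes that columns are samples, as in the examples. cpm_sketch() is a hypothetical helper for illustration, not the package's cpm_normalization().

```r
# Hypothetical helper (not dominatR's cpm_normalization()):
# divide each column by its library size, scale to one million,
# and optionally apply log2(x + 1).
cpm_sketch <- function(counts, log_trans = FALSE) {
  counts <- as.matrix(counts)
  lib_size <- colSums(counts)                     # total counts per sample
  cpm <- sweep(counts, 2, lib_size, "/") * 1e6    # column-wise scaling
  if (log_trans) cpm <- log2(cpm + 1)             # pseudocount keeps zeros finite
  cpm
}

counts <- matrix(c(1, 3,
                   4, 1,
                   5, 6), nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
cpm_sketch(counts)
cpm_sketch(counts, log_trans = TRUE)
```

After the first step every column sums to one million, so samples sequenced to different depths become comparable. The +1 pseudocount keeps zero counts finite under log2.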
```r
cpm_normalization(x, log_trans = FALSE, assay_name = NULL, new_assay_name = NULL)
```
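| Argument | Description |
|---|---|
| x | A numeric matrix, data.frame, or SummarizedExperiment containing counts. |
| log_trans | Logical. If TRUE, a log2(x + 1) transform is applied after CPM normalization. |
| assay_name | If x is a SummarizedExperiment, the name of the assay to normalize; if NULL, the first assay is used. |
| new_assay_name | If x is a SummarizedExperiment, the name of a new assay in which to store the normalized values; if NULL, the input assay is overwritten. |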
If x is a matrix or data.frame, returns a
matrix of CPM-normalized (and optionally
log2-transformed) counts.
If x is a SummarizedExperiment, returns the same
SummarizedExperiment object with the specified assay replaced
or a new assay created containing the CPM-normalized data.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se = airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

# -------------------------------
# 1) Using a matrix
# -------------------------------
df = assay(se)

## Without log transformation
df1 = cpm_normalization(df, log_trans = FALSE)
df1[1:5, 1:5]

## With log transformation
df1 = cpm_normalization(df, log_trans = TRUE)
df1[1:5, 1:5]

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
# If no new_assay_name is provided, the existing assay is overwritten
se2 = cpm_normalization(se, log_trans = FALSE)
se2
head(assay(se2))

# If new_assay_name is provided, the normalized data are stored in a new assay
se2 = cpm_normalization(se, log_trans = FALSE, new_assay_name = 'cpm_counts')
se2
head(assay(se2, 'cpm_counts'))

# A specific assay can also be selected
new_matrix = matrix(data = sample(x = seq(1, 100000), size = nrow(se) * ncol(se), replace = TRUE),
                    nrow = nrow(se), ncol = ncol(se))
rownames(new_matrix) = rownames(se)
colnames(new_matrix) = colnames(se)

## Creating a new assay called new_counts
assay(se, 'new_counts') = new_matrix

se2 = cpm_normalization(se, new_assay_name = 'cpm_counts_new', assay_name = 'new_counts')
se2
head(assay(se2, 'cpm_counts_new'))
```
Compute Shannon Entropy on row-normalized data
```r
entropy(x, assay_name = NULL, new_assay_name = "Entropy")
```
| Argument | Description |
|---|---|
| x | A data.frame (with numeric columns) or a SummarizedExperiment (with an assay of numeric data). |
| assay_name | (SummarizedExperiment only) The name of the assay to transform and compute entropy on. If NULL, the first assay is used. |
| new_assay_name | (SummarizedExperiment only) Name of the new assay in which the row-wise proportions are stored. Defaults to 'Entropy'. |
If x is a data.frame: returns the same data.frame in which
numeric columns have been replaced by their row-wise proportions,
and an Entropy column is appended.
If x is a SummarizedExperiment: returns the same
SummarizedExperiment with a new assay (default name 'Entropy') containing the
row-wise proportions, and rowData(x)$Entropy is added.
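Here is a base-R sketch of the two quantities described above: the row-wise proportions p_j = x_j / sum(x) and the Shannon entropy H = -sum(p_j * log2(p_j)). The base-2 logarithm is an assumption, chosen because the plot_circle() documentation describes entropy after log base 2. row_entropy() is a hypothetical helper, not the package's entropy().

```r
# Hypothetical helper (not dominatR's entropy()): row-wise proportions and
# Shannon entropy in bits, treating 0 * log2(0) as 0.
row_entropy <- function(m) {
  m <- as.matrix(m)
  p <- m / rowSums(m)                       # each row now sums to 1
  terms <- ifelse(p > 0, p * log2(p), 0)    # drop 0 * log2(0) terms
  list(proportions = p, Entropy = -rowSums(terms))
}

m <- rbind(c(10, 0, 0, 0),   # all mass in one column
           c(5, 5, 5, 5),    # mass spread evenly
           c(2, 2, 0, 0))    # mass split between two columns
row_entropy(m)$Entropy
```

With n columns, H ranges from 0 (all of the row's mass in one column) to log2(n) (mass spread evenly), so four columns give a maximum of 2.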
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se = airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

# -------------------------------
# 1) Using a data.frame
# -------------------------------
df = assay(se) |> as.data.frame()
df = entropy(df)

## The function replaces the counts with row-wise proportions and
## appends a new column called Entropy
head(df)

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
## The function adds a new assay (by default called 'Entropy') with the
## row-wise proportions; this name can be changed with 'new_assay_name'.
## A new column called Entropy is also added to rowData.
se2 <- entropy(se, new_assay_name = 'Entropy')
se2

## If the experiment has multiple assays, choose which one to use
new_matrix = matrix(data = sample(x = seq(1, 100000), size = nrow(se) * ncol(se), replace = TRUE),
                    nrow = nrow(se), ncol = ncol(se))
rownames(new_matrix) = rownames(se)
colnames(new_matrix) = colnames(se)

## Creating a new assay called new_counts
assay(se, 'new_counts') = new_matrix

## Saving the results as 'Entropy_newmatrix' using the assay 'new_counts'
se2 = entropy(se, new_assay_name = 'Entropy_newmatrix', assay_name = 'new_counts')
se2
```
Scales each column of a matrix (or SummarizedExperiment assay) so that
the minimum value in that column is mapped to new_min and the
maximum value is mapped to new_max.
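The column-wise rescaling can be written as x' = new_min + (x - min) / (max - min) * (new_max - new_min). The sketch below implements that formula in base R. minmax_sketch() is hypothetical and may differ from minmax_normalization() in edge cases. For example, a constant column yields NaN here because max - min = 0.

```r
# Hypothetical helper (not dominatR's minmax_normalization()):
# linearly map each column's [min, max] onto [new_min, new_max].
minmax_sketch <- function(m, new_min = 0, new_max = 1) {
  m <- as.matrix(m)
  apply(m, 2, function(col) {
    rng <- range(col)                                   # column min and max
    new_min + (col - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min)
  })
}

m <- cbind(a = c(2, 4, 6), b = c(10, 0, 5))
res <- minmax_sketch(m, new_min = 5, new_max = 10)
apply(res, 2, range)   # every column now spans 5 to 10
```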
```r
minmax_normalization(x, new_min = 0, new_max = 1, assay_name = NULL, new_assay_name = NULL)
```
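| Argument | Description |
|---|---|
| x | A numeric matrix, data.frame, or SummarizedExperiment. |
| new_min | The lower bound of the new range (default 0). |
| new_max | The upper bound of the new range (default 1). |
| assay_name | If x is a SummarizedExperiment, the name of the assay to scale; if NULL, the first assay is used. |
| new_assay_name | If x is a SummarizedExperiment, the name of a new assay in which to store the scaled values; if NULL, the input assay is overwritten. |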
If x is a data.frame or matrix, returns a matrix of
column-wise scaled values (same dimensions as x).
If x is a SummarizedExperiment, returns the same
SummarizedExperiment object in which either the chosen assay is replaced by the
scaled values or, if new_assay_name is given, a new assay containing them is added.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se <- airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

# -------------------------------
# 1) Using a matrix
# -------------------------------
df <- assay(se)
df1 <- minmax_normalization(df)
apply(df1, 2, range)

## Using a new range
df1 <- minmax_normalization(df, new_min = 5, new_max = 10)
apply(df1, 2, range)

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
# If no new_assay_name is provided, the existing assay is overwritten
se2 <- minmax_normalization(se)
apply(assay(se2), 2, range)

# If new_assay_name is provided, the scaled values are stored in a new assay
se2 <- minmax_normalization(se, new_assay_name = 'minmax_counts')
apply(assay(se2, 'minmax_counts'), 2, range)

# A specific assay can also be selected
new_matrix <- matrix(data = sample(x = seq(1, 100000), size = nrow(se) * ncol(se), replace = TRUE),
                     nrow = nrow(se), ncol = ncol(se))
rownames(new_matrix) <- rownames(se)
colnames(new_matrix) <- colnames(se)

## Creating a new assay called new_counts
assay(se, 'new_counts') <- new_matrix

se2 <- minmax_normalization(se, new_assay_name = 'minmax_counts_new', assay_name = 'new_counts')
apply(assay(se2, 'minmax_counts_new'), 2, range)

## Using a different range
se2 <- minmax_normalization(se, new_assay_name = 'minmax_counts_new', assay_name = 'new_counts',
                            new_min = 10, new_max = 20)
apply(assay(se2, 'minmax_counts_new'), 2, range)
```
Produces a radial dominance plot in which each observation is located by:

- Angle (t): the variable with the greatest value, i.e. the dominant variable (ties broken at random).
- Radius (r): a monotone mapping of the row-wise Shannon entropy. Points with low entropy (one variable dominates) lie near the edge, and points with high entropy lie toward the centre.
The circle is partitioned into coloured slices; optional factor
information can colour/jitter points independently. Labels for each
slice may be drawn as curved text on the circle or shown in a legend.
```r
plot_circle(
  x, n, column_variable_factor = NULL, variables_highlight = NULL,
  entropyrange = c(0, Inf), magnituderange = c(0, Inf),
  background_alpha_polygon = 0.05, background_polygon = NULL,
  background_na_polygon = "whitesmoke", point_size = 1,
  point_fill_colors = NULL, point_fill_na_colors = "whitesmoke",
  point_line_colors = NULL, point_line_na_colors = "whitesmoke",
  straight_points = TRUE, line_col = "gray90", out_line = "black",
  label = "legend", text_label_curve_size = 3, assay_name = NULL,
  output_table = TRUE
)
```
| Argument | Description |
|---|---|
| x | A numeric data.frame or a SummarizedExperiment. |
| n | Integer. Number of numeric variables (columns) to use. |
| column_variable_factor | Character. Name of a column (or rowData column in a SummarizedExperiment) holding a categorical variable whose levels will colour the points. If NULL, points are coloured by their dominant variable. |
| variables_highlight | Character vector naming which variables should receive curved text labels when label = 'curve'. |
| entropyrange, magnituderange | Numeric length-2 vectors. Rows falling outside either interval are excluded from the plot and the returned data. |
| background_alpha_polygon | Alpha level (0–1) for the coloured background slices. |
| background_polygon | Character vector of slice fill colours, named by variable; defaults to … |
| background_na_polygon, point_fill_na_colors, point_line_na_colors | Colours used for slices and points that have no colour assigned (for example, variables or levels not named in background_polygon, point_fill_colors or point_line_colors). |
| point_size | Numeric; plotted point size. |
| point_fill_colors, point_line_colors | Optional colour vectors for point fill / outline, named by variable or factor level. |
| straight_points | Logical. If TRUE, points are plotted in a straight line. |
| line_col | Colour for the inner grid / slice borders. |
| out_line | Colour for the outermost circle. |
| label | Either 'legend' (slice labels shown in a legend) or 'curve' (labels drawn as curved text on the circle). |
| text_label_curve_size | Numeric font size for curved labels. |
| assay_name | (SummarizedExperiment only) Which assay to use. Defaults to the first assay. |
| output_table | Logical. If TRUE, also return the underlying data frame. |
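A linear map is used for the radius, decreasing from the outer circle at $H = 0$ to the centre at $H = \log_2 n$, where

$$H = -\sum_{j=1}^{n} p_j \log_2 p_j$$

is the Shannon entropy (log base 2) of the row after converting it to proportions $p_j$, so $0 \le H \le \log_2 n$.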
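To make the angle and radius concrete, the following hypothetical sketch computes per-row coordinates. It relies on two assumptions that this documentation does not confirm. First, the n slices are evenly spaced, with slice j centred at angle 2*pi*(j - 1)/n. Second, the linear radius map is normalised as r = 1 - H / log2(n), which places H = 0 on the unit circle and H = log2(n) at the centre. Ties for the dominant variable are broken at random with base R's max.col(). circle_coords() is not part of dominatR.

```r
# Hypothetical helper (not dominatR's plot_circle()): dominant variable,
# base-2 entropy, linear radius, and Cartesian position for each row.
circle_coords <- function(m) {
  m <- as.matrix(m)
  n <- ncol(m)
  p <- m / rowSums(m)                              # row-wise proportions
  H <- -rowSums(ifelse(p > 0, p * log2(p), 0))     # Shannon entropy, base 2
  dominant <- max.col(m, ties.method = "random")   # column with largest value
  theta <- 2 * pi * (dominant - 1) / n             # assumed slice-centre angle
  r <- 1 - H / log2(n)                             # assumed linear radius map
  data.frame(dominant = colnames(m)[dominant], entropy = H, radius = r,
             x = r * cos(theta), y = r * sin(theta))
}

m <- rbind(g1 = c(A = 100, B = 0, C = 0, D = 0),   # fully dominated by A
           g2 = c(A = 1, B = 1, C = 1, D = 1),     # maximal entropy
           g3 = c(A = 0, B = 8, C = 8, D = 0))     # tie between B and C
circle_coords(m)
```

The actual plot_circle() may place points differently within a slice (see straight_points) and may scale the radius differently.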
If output_table = TRUE, a list with:

- circle_plot: a ggplot object;
- data: the augmented data frame containing entropy, radius, (x, y) coordinates, dominant variable and optional factor.

Otherwise only the ggplot object is returned.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)
library(tidyverse)

data('airway')
se = airway

# Only use a random subset of 500 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(500, nrow(se)))
se <- se[idx, ]

## Normalize the data first using tpm_normalization
rowData(se)$gene_length = rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se = tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')

# -------------------------------
# 1) Using a data.frame
# -------------------------------
df <- assay(se, 'tpm_norm') |> as.data.frame()

## For simplicity let's rename the columns
colnames(df) <- paste('Column_', 1:8, sep = '')

# Default
plot_circle(x = df, n = 8, entropyrange = c(0, 3), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE)

# Filtering by entropy: with 8 variables the maximum entropy is log2(8) = 3
plot_circle(x = df, n = 8, entropyrange = c(2, 3), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE)
plot_circle(x = df, n = 8, entropyrange = c(0, 2), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE)

# Aesthetics modification: curved labels instead of a legend
plot_circle(x = df, n = 8, entropyrange = c(0, 2), magnituderange = c(0, Inf),
            label = 'curve', output_table = FALSE)

# It is possible to highlight only specific variables
plot_circle(x = df, n = 8, entropyrange = c(0, 2), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE,
            background_alpha_polygon = 0.2, background_na_polygon = 'transparent',
            background_polygon = c('Column_1' = 'indianred', 'Column_3' = 'lightblue', 'Column_5' = 'lightgreen'),
            point_fill_colors = c('Column_1' = 'darkred', 'Column_3' = 'darkblue', 'Column_5' = 'darkgreen'),
            point_line_colors = c('Column_1' = 'black', 'Column_3' = 'black', 'Column_5' = 'black'))

# Let's create a factor column in our df
df$factor <- sample(c('A', 'B', 'C', 'D'), size = nrow(df), replace = TRUE)

# Points can be coloured by this factor column using column_variable_factor
plot_circle(x = df, n = 8, column_variable_factor = 'factor',
            entropyrange = c(0, 2), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE,
            background_alpha_polygon = 0.2, background_na_polygon = 'transparent',
            background_polygon = c('Column_1' = 'indianred', 'Column_3' = 'lightblue', 'Column_5' = 'lightgreen'))

# Colors can be modified
plot_circle(x = df, n = 8, column_variable_factor = 'factor',
            entropyrange = c(0, 2), magnituderange = c(0, Inf),
            label = 'curve', output_table = FALSE,
            background_alpha_polygon = 0.02, background_na_polygon = 'transparent',
            point_fill_colors = c('A' = 'black', 'B' = 'gray', 'C' = 'white', 'D' = 'orange'),
            point_line_colors = c('A' = 'black', 'B' = 'gray', 'C' = 'white', 'D' = 'orange'))

# The size of the points can be modified too
plot_circle(x = df, n = 8, point_size = 2, column_variable_factor = 'factor',
            entropyrange = c(0, 2), magnituderange = c(0, Inf),
            label = 'curve', output_table = FALSE,
            background_alpha_polygon = 0.02, background_na_polygon = 'transparent',
            point_fill_colors = c('A' = 'black', 'B' = 'gray', 'C' = 'white', 'D' = 'orange'),
            point_line_colors = c('A' = 'black', 'B' = 'gray', 'C' = 'white', 'D' = 'orange'))

# To retrieve a data frame with the results used for plotting,
# set output_table = TRUE
plot <- plot_circle(x = df, n = 8, point_size = 2, column_variable_factor = 'factor',
                    entropyrange = c(0, 2), magnituderange = c(0, Inf),
                    label = 'curve', output_table = TRUE,
                    background_alpha_polygon = 0.02, background_na_polygon = 'transparent',
                    point_fill_colors = c('A' = 'black', 'B' = 'gray', 'C' = 'white', 'D' = 'orange'),
                    point_line_colors = c('A' = 'black', 'B' = 'gray', 'C' = 'white', 'D' = 'orange'))

# The first object is the plot
plot[[1]]

# The second is the data frame with information for each row, including
# Entropy and the variable that dominates that particular observation.
head(plot[[2]])

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
# Changing column names
colnames(se) <- paste('Column_', 1:8, sep = '')

# Default
plot_circle(x = se, n = 8, entropyrange = c(0, 3), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE, assay_name = 'tpm_norm')

# Keeping only low-entropy genes (dominated by one or a few columns)
plot_circle(x = se, n = 8, entropyrange = c(0, 1.5), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE, assay_name = 'tpm_norm')

# Keeping only high-entropy genes (spread across many columns)
plot_circle(x = se, n = 8, entropyrange = c(2, 3), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE, assay_name = 'tpm_norm')

# Using a character column from rowData
plot_circle(x = se, n = 8, column_variable_factor = 'gene_biotype',
            entropyrange = c(2, 3), magnituderange = c(0, Inf),
            label = 'legend', output_table = FALSE, assay_name = 'tpm_norm')

plot_circle(x = se, n = 8, column_variable_factor = 'gene_biotype', point_size = 3,
            entropyrange = c(0, 1.5), magnituderange = c(2, Inf),
            label = 'legend', output_table = FALSE, assay_name = 'tpm_norm')

# Highlighting only a class of interest
plot_circle(x = se, n = 8, column_variable_factor = 'gene_biotype', point_size = 3,
            entropyrange = c(0, 1.5), magnituderange = c(2, Inf),
            label = 'legend', output_table = FALSE, assay_name = 'tpm_norm',
            point_fill_colors = c('miRNA' = 'orange'),
            point_line_colors = c('miRNA' = 'orange'))

# To retrieve a data frame with the results used for plotting,
# set output_table = TRUE
plot <- plot_circle(x = se, n = 8, column_variable_factor = 'gene_biotype', point_size = 3,
                    entropyrange = c(0, 1.5), magnituderange = c(2, Inf),
                    label = 'legend', output_table = TRUE, assay_name = 'tpm_norm',
                    point_fill_colors = c('miRNA' = 'orange'),
                    point_line_colors = c('miRNA' = 'orange'))

# It returns a list.
# The first object is the plot
plot[[1]]

# The second is the data frame with information for each row, including
# Entropy and the variable that dominates that particular observation.
head(plot[[2]])
```
Dominance–Entropy Frequency Plot

Visualises how often each categorical level ('Factor') is dominant at a
given entropy score. The circle argument takes the list returned by
plot_circle() with output_table = TRUE; the function works on its second
element, the per-row data frame.

```r
plot_circle_frequency(n, circle, single = FALSE, legend = TRUE, numb_columns = 1, filter_class = NULL, point_size = 2)
```

| Argument | Description |
|---|---|
| n | Integer. Number of numeric variables used in plot_circle(). |
| circle | The list returned by plot_circle() with output_table = TRUE. |
| single | Logical. If TRUE, all factor levels are drawn in a single panel; if FALSE, the plot is faceted by factor level. |
| legend | Logical. Show a legend for the plot. |
| numb_columns | Number of facet columns when single = FALSE. |
| filter_class | Character vector of factor levels to keep; NULL keeps all levels. |
| point_size | Numeric. Size of the jittered points. |

A list with:

- plot_stat: a ggplot object;
- data: the aggregated frequency table.
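One way to tabulate "how often each level is dominant at a given entropy score" is to bin the entropy values and cross-tabulate them with the factor. The sketch below does this with base R's cut() and table(). The column names Factor and Entropy, the bin width, and entropy_frequency() itself are assumptions for illustration and may not match the structure of the data returned by plot_circle_frequency().

```r
# Hypothetical helper (not part of dominatR): count rows per factor level
# within entropy bins spanning 0 to log2(n).
entropy_frequency <- function(df, n, bin_width = 0.5) {
  breaks <- seq(0, log2(n) + bin_width, by = bin_width)
  bins <- cut(df$Entropy, breaks = breaks, include.lowest = TRUE)
  as.data.frame(table(Factor = df$Factor, Entropy_bin = bins))
}

df <- data.frame(Factor = c("A", "A", "B"), Entropy = c(0.1, 0.2, 2.9))
freq <- entropy_frequency(df, n = 8)
freq[freq$Freq > 0, ]   # non-empty level/bin combinations
```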
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se = airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

## Normalize the data first using tpm_normalization
rowData(se)$gene_length = rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se = tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')

## Creating a plot_circle list using the 'gene_biotype' column as factor
plot_test <- plot_circle(x = se, n = 8, column_variable_factor = 'gene_biotype',
                         entropyrange = c(0, Inf), magnituderange = c(0, Inf),
                         label = 'legend', output_table = TRUE, assay_name = 'tpm_norm')

## Using the plot_test object created above
## Default
plot <- plot_circle_frequency(n = 8, circle = plot_test, single = TRUE, legend = TRUE,
                              numb_columns = 1, filter_class = NULL, point_size = 2)
plot[[1]]

## Faceting by factor is possible, adjusting the number of columns
plot <- plot_circle_frequency(n = 8, circle = plot_test, single = FALSE, legend = TRUE,
                              numb_columns = 3, filter_class = NULL, point_size = 2)
plot[[1]]

## Subsetting by specific classes present in Factor
plot <- plot_circle_frequency(n = 8, circle = plot_test, single = FALSE, legend = TRUE,
                              numb_columns = 1,
                              filter_class = c('protein_coding', 'snoRNA', 'miRNA'),
                              point_size = 2)
plot[[1]]
```
point_size = 2) plot[[1]] library(SummarizedExperiment) library(airway) data('airway') se = airway # Only use a random subset of 1000 rows set.seed(123) idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se))) se <- se[idx, ] ## Normalize the data first using tpm_normalization rowData(se)$gene_length = rowData(se)$gene_seq_end - rowData(se)$gene_seq_start se = tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm') ## Creating a plot_circle list using the 'gene_biotype' column as factor plot_test <- plot_circle( x = se, n = 8, column_variable_factor = 'gene_biotype', entropyrange = c(0,Inf), magnituderange = c(0, Inf), label = 'legend', output_table = TRUE, assay_name = 'tpm_norm' ) ## Using the plot_test object created above ## Default plot <- plot_circle_frequency(n = 8, circle = plot_test, single = TRUE, legend = TRUE, numb_columns = 1, filter_class = NULL, point_size = 2) plot[[1]] ## Facetting by factor is possible, adjusting the number of columns plot <- plot_circle_frequency(n = 8, circle = plot_test, single = FALSE, legend = TRUE, numb_columns = 3, filter_class = NULL, point_size = 2) plot[[1]] ## Subsetting by a specific class present in Factor plot_circle_frequency(n = 8, circle = plot_test, single = FALSE, legend = TRUE, numb_columns = 1, filter_class = c('protein_coding', 'snoRNA', 'miRNA'), point_size = 2) plot[[1]]
Creates a rope-like visualization comparing two numeric columns (e.g., "a" vs. "b"), with optional color filtering based on maximum value range and entropy range.
The plot is useful for visualising “winner-takes-all” behaviour in two-way comparisons, e.g. gene expression in *A* and *B* conditions.
```r
plot_rope(
  x,
  column_name = NULL,
  push_text = 1,
  rope_width = 1,
  rope_color = "#CCCCCCCC",
  rope_border = TRUE,
  col = c("red", "blue"),
  col_bg = "whitesmoke",
  pch = c(21, 21),
  pch_bg = 19,
  cex = 1,
  entropyrange = c(0, Inf),
  maxvaluerange = c(0, Inf),
  plotAll = TRUE,
  label = TRUE,
  output_table = TRUE,
  assay_name = NULL
)
```
| Argument | Description |
|---|---|
| `x` | A data.frame or SummarizedExperiment containing the numeric columns to compare. |
| `column_name` | Character. Names of the two variables (columns) used for the analysis. By default it is `NULL`. |
| `push_text` | Numeric. Expands or contracts text label positions along the x-axis. |
| `rope_width` | Numeric. Thickness of the "rope" drawn in the center. |
| `rope_color` | Character. Fill color of the rope. |
| `rope_border` | Logical or a color. Whether, and in which color, to draw the rope border. |
| `col` | Character vector of length 2. Point colors, one per variable. Default `c("red", "blue")`. |
| `col_bg` | Background color, used when a row is filtered out by entropy or maximum value. |
| `pch` | Integer or vector specifying point types for the two main categories. |
| `pch_bg` | Integer specifying the point type for the background ("gray") points (if …). |
| `cex` | Numeric. Expansion factor for point size. |
| `entropyrange` | Numeric vector of length 2. Rows whose `entropy` falls outside this range are drawn in `col_bg`. |
| `maxvaluerange` | Numeric vector of length 2. Rows whose `maxvalue` falls outside this range are drawn in `col_bg`. |
| `plotAll` | Logical. If … |
| `label` | Logical. If … |
| `output_table` | Logical. If `TRUE`, returns the data frame used for the plot; if `FALSE`, invisibly returns `NULL`. |
| `assay_name` | (SummarizedExperiment only) Name of the assay containing the 2-column data. If not specified, the first assay is used. |
The function expects two numeric columns. If the input has more than
two columns, the names of the columns of interest can be specified with
the parameter column_name. If x is a
SummarizedExperiment, the function extracts the indicated assay and then
the columns of interest.
It also uses:
- centmass() for computing comx.
- entropy() for computing Shannon entropy, stored in the
entropy column. Between two variables, entropy ranges from 0
to 1.
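The 0-to-1 bound follows from measuring entropy in bits: two equal values give $\log_2 2 = 1$, and a row where only one value is non-zero gives 0. The sketch below computes row-wise Shannon entropy in base R. `row_entropy_bits()` is a hypothetical helper, not dominatR's `entropy()`; it assumes each row is first converted to proportions and treats $0 \cdot \log_2 0$ as 0, which may differ from how the package handles zeros.

```r
# Hypothetical helper (not part of dominatR): row-wise Shannon entropy in bits.
row_entropy_bits <- function(m) {
  m <- as.matrix(m)
  p <- m / rowSums(m)                         # row proportions
  terms <- ifelse(p > 0, -p * log2(p), 0)     # treat 0 * log2(0) as 0
  h <- rowSums(terms)
  h[rowSums(m) == 0] <- NA                    # entropy is undefined for all-zero rows
  h
}

two <- rbind(dominant = c(10, 0), shared = c(5, 5), skewed = c(9, 1))
round(row_entropy_bits(two), 3)   # dominant 0, shared 1, skewed 0.469

# With three variables the maximum rises to log2(3), about 1.585
row_entropy_bits(matrix(c(1, 1, 1), nrow = 1))
```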
The rope is drawn in the middle of the plot (x-axis from -1 to 1, y = 0)
with thickness rope_width. Points are jittered along the y-axis (comy)
so that they spread out within the rope.
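The geometry can be illustrated with a small sketch. It assumes the two columns sit at x = -1 and x = 1, so a row's comx is the weighted average $(-a + b)/(a + b)$. It also assumes the vertical jitter is uniform within the rope width. Neither assumption is confirmed by the documentation, and `rope_comx()` and `rope_jitter()` are hypothetical names. Under this assumption, all-zero rows land at x = 0, which matches the examples' note that central points correspond to genes with zero expression.

```r
# Hypothetical helper, not part of dominatR.
# Assumption: column 1 sits at x = -1 and column 2 at x = 1.
rope_comx <- function(a, b) {
  total <- a + b
  # Weighted average of the two positions; all-zero rows are placed at 0
  ifelse(total > 0, (b - a) / total, 0)
}

# Hypothetical vertical jitter, uniform within +/- rope_width / 2
rope_jitter <- function(n, rope_width = 1) {
  runif(n, min = -rope_width / 2, max = rope_width / 2)
}

a <- c(10, 0, 5, 0, 9)
b <- c(0, 10, 5, 0, 1)
set.seed(1)
coords <- data.frame(comx = rope_comx(a, b), comy = rope_jitter(length(a)))
coords
```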
If output_table=TRUE, returns a data frame with extra
columns (comx, comy, color, maxvalue,
entropy) used in the plot.
If output_table=FALSE, invisibly returns NULL.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se <- airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

## Normalize the data first using tpm_normalization
rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')

# -------------------------------
# 1) Using a data.frame
# -------------------------------
df <- as.data.frame(assay(se, 'tpm_norm'))

# Two columns of interest: 'SRR1039508' and 'SRR1039516'
cols <- c("SRR1039508", "SRR1039516")

# Default behaviour
plot_rope(df, column_name = cols, output_table = FALSE)

# Colors can be modified
plot_rope(df, column_name = cols, output_table = FALSE,
          col = c('darkgreen', 'darkred'))

# Emphasis can be applied to highly dominant variables by controlling the
# entropy range; values outside that range are colored whitesmoke.
plot_rope(df, column_name = cols, output_table = FALSE,
          col = c('darkgreen', 'darkred'), entropyrange = c(0, 0.1))

# Points in the center reflect genes with expression levels of 0.
# They can be excluded by adjusting the maxvalue range.
plot_rope(df, column_name = cols, output_table = FALSE,
          col = c('darkgreen', 'darkred'), entropyrange = c(0, 0.1),
          maxvaluerange = c(2, Inf))

# Different entropy ranges reveal different types of genes: values close to 0
# indicate dominance, values close to 1 indicate shared expression.
# Here, genes with a log2(TPM + 1) score between 4 and 8 are explored.
plot_rope(df, column_name = cols, output_table = FALSE,
          col = c('darkgreen', 'darkred'), entropyrange = c(0, 0.1),
          maxvaluerange = c(4, 8))
plot_rope(df, column_name = cols, output_table = FALSE,
          col = c('darkgreen', 'darkred'), entropyrange = c(0.1, 0.8),
          maxvaluerange = c(4, 8))
plot_rope(df, column_name = cols, output_table = FALSE,
          col = c('darkgreen', 'darkred'), entropyrange = c(0.8, 1),
          maxvaluerange = c(4, 8))

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
# assay_name selects the TPM assay; otherwise the first (raw counts) assay is used
plot_rope(se, column_name = cols, assay_name = 'tpm_norm', output_table = FALSE,
          col = c('lightgreen', 'indianred'), entropyrange = c(0, 0.1),
          maxvaluerange = c(4, 8))
plot_rope(se, column_name = cols, assay_name = 'tpm_norm', output_table = FALSE,
          col = c('lightgreen', 'indianred'), entropyrange = c(0.1, 0.8),
          maxvaluerange = c(4, 8))
plot_rope(se, column_name = cols, assay_name = 'tpm_norm', output_table = FALSE,
          col = c('lightgreen', 'indianred'), entropyrange = c(0.8, 1),
          maxvaluerange = c(4, 8))

### Obtaining the data.frame output for the analysis
object <- plot_rope(se, column_name = cols, assay_name = 'tpm_norm',
                    output_table = TRUE, col = c('lightgreen', 'indianred'),
                    entropyrange = c(0.8, 1), maxvaluerange = c(4, 8))
head(object)
```
Creates a triangular (ternary) scatter plot for **three** numeric variables. Each point is coloured by the variable with the largest value. Points can be filtered by (i) entropy score, which ranges from 0 to 1.585, and (ii) the maximum value across the three variables.
The plot is useful for visualising “winner-takes-all” behaviour in three-way comparisons, e.g. gene expression in *A*, *B*, *C* conditions.
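The colouring rule described above can be sketched in base R. `dominance_colors()` is a hypothetical helper, not the package's code. It assumes that the dominant variable is the column with the largest value (ties go to the first column), that entropy is computed in bits on row proportions, and that both range bounds are inclusive.

```r
# Hypothetical sketch (not dominatR's code): colour each row by its dominant
# column, falling back to a background colour when the row's entropy or
# maximum value lies outside the requested ranges.
dominance_colors <- function(m, col, background_col = "whitesmoke",
                             entropyrange = c(0, Inf),
                             maxvaluerange = c(0, Inf)) {
  m <- as.matrix(m)
  p <- m / rowSums(m)                               # row proportions
  h <- rowSums(ifelse(p > 0, -p * log2(p), 0))      # Shannon entropy (bits)
  maxv <- apply(m, 1, max)                          # maximum value per row
  winner <- col[max.col(m, ties.method = "first")]  # colour of dominant column
  keep <- h >= entropyrange[1] & h <= entropyrange[2] &
    maxv >= maxvaluerange[1] & maxv <= maxvaluerange[2]
  ifelse(!is.na(keep) & keep, winner, background_col)
}

m <- rbind(c(10, 0, 0), c(3, 3, 3), c(0, 0, 8))
dominance_colors(m, c("darkred", "darkgreen", "darkblue"),
                 entropyrange = c(0, 0.5))
```

With `entropyrange = c(0, 0.5)`, the evenly shared middle row (entropy $\log_2 3$) falls back to the background colour, while the two single-variable rows keep their dominant colour.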
```r
plot_triangle(
  x,
  column_name = NULL,
  entropyrange = c(0, Inf),
  maxvaluerange = c(0, Inf),
  col = c("darkred", "darkgreen", "darkblue"),
  background_col = "whitesmoke",
  output_table = TRUE,
  plotAll = TRUE,
  cex = 1,
  pch = 16,
  assay_name = NULL,
  label = TRUE,
  push_text = 1
)
```
| Argument | Description |
|---|---|
| `x` | A numeric data.frame or SummarizedExperiment. |
| `column_name` | Character. Names (or indices) of the three columns to visualise. If … |
| `entropyrange` | Numeric. Keep points whose entropy lies inside this interval. Default is `c(0, Inf)`. |
| `maxvaluerange` | Numeric. Keep points whose maximum value lies inside this interval. Default is `c(0, Inf)`. |
| `col` | Character. Colors for each variable. |
| `background_col` | Character. Color for observations outside `entropyrange` or `maxvaluerange`. |
| `output_table` | Logical. If `TRUE`, returns a data.frame with the plotting coordinates, colors, entropy and maximum scores. |
| `plotAll` | Logical. If `FALSE`, background points are not drawn. |
| `cex`, `pch` | Base-graphics point size / symbol. |
| `assay_name` | (SummarizedExperiment only) Which assay to use. Default: the first assay. |
| `label` | Logical. If … |
| `push_text` | Numeric. Expands or contracts text label positions. |
The function expects three numeric columns. If the input has more than
three columns, the names of the columns of interest can be specified with
the parameter column_name. If x is
a SummarizedExperiment, the function extracts the indicated assay and then
the columns of interest.
It also uses:
- centmass() for computing comx and comy.
- entropy() for computing Shannon entropy, stored in the
entropy column. Between three variables, entropy ranges from
0 to log2(3) ≈ 1.585.
The ternary vertices are fixed at $(0, 0)$, $(1, 0)$ and $(0.5, \sqrt{3}/2)$.
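With these vertices, a row's position is the weighted average of the vertex coordinates. The sketch below assumes the weights are the row's values divided by the row sum. That is the usual center-of-mass normalization, but it is an assumption about `centmass()`, not taken from its code. `ternary_com()` is a hypothetical name, and the vertex vectors are the `centmass()` defaults (`x_coord = c(0, 1, 0.5)`, `y_coord = c(0, 0, sqrt(3)/2)`).

```r
# Vertex coordinates: the centmass() defaults
x_coord <- c(0, 1, 0.5)
y_coord <- c(0, 0, sqrt(3) / 2)

# Hypothetical helper: center of mass of each row on the triangle
ternary_com <- function(m, x_coord, y_coord) {
  m <- as.matrix(m)
  w <- m / rowSums(m)                          # row-normalized weights
  data.frame(comx = as.vector(w %*% x_coord),  # weighted mean of x vertices
             comy = as.vector(w %*% y_coord))  # weighted mean of y vertices
}

# A row dominated by column 1 sits on vertex 1, one dominated by column 3
# on the top vertex, and an evenly shared row at the centroid
ternary_com(rbind(c(1, 0, 0), c(0, 0, 5), c(1, 1, 1)), x_coord, y_coord)
```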
If output_table = TRUE, a data.frame with the original three
columns plus:

- comx, comy: Cartesian coordinates in the triangle;
- color: final plotting colour;
- entropy: entropy score for each gene;
- max_counts: maximum score across the three variables.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se <- airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

## Normalize the data first using tpm_normalization
rowData(se)$gene_length <- rowData(se)$gene_seq_end - rowData(se)$gene_seq_start
se <- tpm_normalization(se, log_trans = TRUE, new_assay_name = 'tpm_norm')

# -------------------------------
# 1) Using a data.frame
# -------------------------------
df <- assay(se, 'tpm_norm') |> as.data.frame()

# Three columns of interest: 'SRR1039508', 'SRR1039516' and 'SRR1039512'
cols <- c("SRR1039508", "SRR1039516", "SRR1039512")
pal <- c('indianred', 'lightgreen', 'lightblue')

# Default behaviour
plot_triangle(x = df, column_name = cols, output_table = FALSE)

# Colors can be modified
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal)

# Emphasis can be applied to highly dominant variables by controlling the
# entropy range; values outside that range are colored whitesmoke.
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(0, 0.1))

# Points in the center reflect genes with expression levels of 0.
# They can be excluded by adjusting the maxvalue range.
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(0, 0.1), maxvaluerange = c(0.1, Inf))

# Different entropy ranges reveal different types of genes: values close to 0
# indicate dominance, values close to log2(3) (about 1.585) shared expression.
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(0, 0.4), maxvaluerange = c(0.1, Inf))
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(0.4, 1.3), maxvaluerange = c(0.1, Inf))
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(1.3, Inf), maxvaluerange = c(0.1, Inf))

# The same analysis can be run after filtering out lowly expressed genes
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(1.2, Inf), maxvaluerange = c(2, Inf))
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(1.2, Inf), maxvaluerange = c(5, Inf))
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(1.2, Inf), maxvaluerange = c(10, Inf))

# Background points can be removed
plot_triangle(x = df, column_name = cols, output_table = FALSE, col = pal,
              entropyrange = c(1.2, Inf), maxvaluerange = c(2, Inf),
              plotAll = FALSE)

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
dark <- c('darkred', 'darkgreen', 'darkblue')
plot_triangle(x = se, column_name = cols, output_table = FALSE, col = dark,
              entropyrange = c(0, 0.4), maxvaluerange = c(0.1, Inf),
              assay_name = 'tpm_norm')
plot_triangle(x = se, column_name = cols, output_table = FALSE, col = dark,
              entropyrange = c(0.4, 1.3), maxvaluerange = c(0.1, Inf),
              assay_name = 'tpm_norm')
plot_triangle(x = se, column_name = cols, output_table = FALSE, col = dark,
              entropyrange = c(1.3, Inf), maxvaluerange = c(0.1, Inf),
              assay_name = 'tpm_norm')

### Obtaining the data.frame output for the analysis
object <- plot_triangle(x = se, column_name = cols, output_table = TRUE,
                        col = dark, entropyrange = c(1.3, Inf),
                        maxvaluerange = c(0.1, Inf), assay_name = 'tpm_norm')
head(object)
```
Transform entropy scores into categorical entropy (Q) scores.

For each row $i$ and column $j$, $Q_{ij} = H_i - \log_2 p_{ij}$ if $p_{ij}$ is
positive, or Inf otherwise. Here $p_{ij}$ is the row-normalized value and $H_i$ is the row's Shannon entropy.
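A minimal base-R sketch of the Q transform. It assumes $Q_{ij} = H_i - \log_2 p_{ij}$ (infinite when $p_{ij} = 0$), where $p_{ij}$ is the row-normalized value and $H_i$ the row's Shannon entropy in bits. `q_entropy_sketch()` is hypothetical and recomputes $H_i$ from raw values. By contrast, `Qentropy()` works on data already processed by `entropy()`.

```r
# Hypothetical sketch, not dominatR's Qentropy().
q_entropy_sketch <- function(m) {
  m <- as.matrix(m)
  p <- m / rowSums(m)                           # row-normalized values p_ij
  h <- rowSums(ifelse(p > 0, -p * log2(p), 0))  # Shannon entropy H_i (bits)
  # h has one value per row, so it is recycled down each column;
  # log2(0) is -Inf, which turns zero entries into Inf automatically
  h - log2(p)
}

m <- matrix(c(8, 2, 0,
              1, 1, 1), nrow = 2, byrow = TRUE)
q_entropy_sketch(m)
```

Low Q marks the column where a row is most concentrated. Evenly shared rows get the same Q in every column ($2\log_2 3$ for three equal values).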
```r
Qentropy(x, assay_name = "Entropy", new_assay_name = "Qentropy")
```
| Argument | Description |
|---|---|
| `x` | A data.frame or a SummarizedExperiment, already processed by `entropy()`. |
| `assay_name` | (SummarizedExperiment only) Name of the assay whose row-normalized data will be replaced by Q-values. Default `"Entropy"`; if `NULL`, the first assay is used. |
| `new_assay_name` | Name of a *new* assay in which to store the Q-values. Default `"Qentropy"`. |
If x is a data.frame: returns the same data.frame with the numeric
columns replaced by Q-values and the Entropy column removed.
If x is a SummarizedExperiment: returns the same object with
the specified assay replaced by Q-values (or stored in a new assay
if new_assay_name is set) and rowData(x)$Entropy removed.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se <- airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

# -------------------------------
# 1) Using a data.frame
# -------------------------------
df <- assay(se) |> as.data.frame()

## Entropy needs to be calculated first
df <- entropy(df)

## Then apply the Qentropy function
df <- Qentropy(df)
head(df)

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
## Calculate entropy first
se2 <- entropy(se, new_assay_name = 'Entropy')

## Transform entropy into Q-entropy. assay_name must point to the
## entropy-transformed values; new_assay_name specifies the assay where the
## result is stored. By default the function reads the 'Entropy' assay and
## writes a new 'Qentropy' assay.
se2 <- Qentropy(se2, new_assay_name = 'Qentropy', assay_name = 'Entropy')
se2
```
Normalizes read counts by the quantile normalization method:
- Each sample (column) is sorted, and the values at each rank are averaged across columns.
- Each sample's values are replaced with the average for their respective rank.
- If log_trans = TRUE, a log2(QN + 1) transformation is applied.
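The steps above can be sketched in base R. `qn_sketch()` is hypothetical, not dominatR's `quantile_normalization()`. It breaks ties by order of appearance (`ties.method = "first"`), whereas the package may treat ties differently.

```r
# Hypothetical sketch, not dominatR's implementation.
qn_sketch <- function(m, log_trans = FALSE) {
  m <- as.matrix(m)
  rank_means <- rowMeans(apply(m, 2, sort))          # mean value at each rank
  ranks <- apply(m, 2, rank, ties.method = "first")  # rank of each entry in its column
  qn <- matrix(rank_means[ranks], nrow = nrow(m), dimnames = dimnames(m))
  if (log_trans) qn <- log2(qn + 1)
  qn
}

m <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4,
            dimnames = list(paste0("gene", 1:4), c("A", "B", "C")))
qn_sketch(m)
```

After normalization, every column contains the same set of values (the rank means 2, 3, 14/3 and 17/3), only in a different order.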
```r
quantile_normalization(
  x,
  log_trans = FALSE,
  assay_name = NULL,
  new_assay_name = NULL
)
```
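| Argument | Description |
|---|---|
| `x` | A numeric matrix, data.frame, or SummarizedExperiment. |
| `log_trans` | Logical. If `TRUE`, applies a log2(QN + 1) transformation. Default `FALSE`. |
| `assay_name` | If `x` is a SummarizedExperiment, the name of the assay to normalize. |
| `new_assay_name` | If `x` is a SummarizedExperiment, the name of the assay in which to store the result; if `NULL`, the input assay is overwritten. |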
If x is a SummarizedExperiment, the function will extract the
assay using assay_name, apply quantile normalization, and return a
new or updated assay. If x is a matrix or data.frame, normalization is
applied directly to the input matrix.
A numeric matrix of quantile-normalized (or log2(QN + 1)-transformed)
values if x is a data.frame or matrix. If x is a
SummarizedExperiment, returns the modified SummarizedExperiment with the
normalized data placed in the existing or new assay.
```r
library(dominatR)
library(SummarizedExperiment)
library(airway)

data('airway')
se <- airway

# Only use a random subset of 1000 rows
set.seed(123)
idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
se <- se[idx, ]

# -------------------------------
# 1) Using a matrix
# -------------------------------
df <- assay(se)

## Without log transformation
df_qn <- quantile_normalization(df, log_trans = FALSE)
df_qn[1:5, 1:5]

## With log transformation
df_qn_log <- quantile_normalization(df, log_trans = TRUE)
df_qn_log[1:5, 1:5]

# -------------------------------
# 2) Using a SummarizedExperiment
# -------------------------------
## Overwrite the existing assay
se2 <- quantile_normalization(se)
assay(se2)[1:5, 1:5]

## Store the result in a new assay
se3 <- quantile_normalization(se, new_assay_name = "quant_norm")
assay(se3, "quant_norm")[1:5, 1:5]

## Use a specific input assay (simulate a new one)
new_matrix <- matrix(
  data = sample(x = seq(1, 100000), size = nrow(se) * ncol(se), replace = TRUE),
  nrow = nrow(se), ncol = ncol(se)
)
rownames(new_matrix) <- rownames(se)
colnames(new_matrix) <- colnames(se)

## Create a new assay in the SummarizedExperiment
assay(se, "new_counts") <- new_matrix

## Normalize the new assay and store it under a new name
se4 <- quantile_normalization(se, assay_name = "new_counts",
                              new_assay_name = "quant_new")
assay(se4, "quant_new")[1:5, 1:5]
```
Normalizes read counts by the RPKM (Reads Per Kilobase per Million mapped reads) method:
- Normalize counts by library size (column sums), scaled to millions.
- Divide each gene's value by its length in kilobases.
- If log_trans = TRUE, apply log2(RPKM + 1).
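A minimal sketch of these steps on a plain matrix. It assumes `gene_length` is in base pairs, as in the examples, where it is computed as `gene_seq_end - gene_seq_start`. `rpkm_sketch()` is hypothetical and is not `rpkm_normalization()` itself.

```r
# Hypothetical sketch, not dominatR's implementation.
# Assumption: gene_length is given in base pairs.
rpkm_sketch <- function(counts, gene_length, log_trans = FALSE) {
  counts <- as.matrix(counts)
  # 1) Scale each column by its library size in millions
  per_million <- sweep(counts, 2, colSums(counts) / 1e6, "/")
  # 2) Divide each row by its gene length in kilobases (recycled down the rows)
  out <- per_million / (gene_length / 1e3)
  if (log_trans) out <- log2(out + 1)
  out
}

counts <- matrix(c(100, 400, 500,
                   200, 200, 600), nrow = 3)
gene_length <- c(1000, 2000, 500)
rpkm_sketch(counts, gene_length)
```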
```r
rpkm_normalization(
  x,
  gene_length = NULL,
  log_trans = FALSE,
  assay_name = NULL,
  new_assay_name = NULL
)
```
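| Argument | Description |
|---|---|
| `x` | A numeric matrix, data.frame, or SummarizedExperiment of read counts. |
| `gene_length` | A numeric vector of gene lengths (one per row), used only if `x` is a matrix or data.frame. |
| `log_trans` | Logical. If `TRUE`, applies log2(RPKM + 1). Default `FALSE`. |
| `assay_name` | If `x` is a SummarizedExperiment, the name of the assay to normalize. |
| `new_assay_name` | If `x` is a SummarizedExperiment, the name of the assay in which to store the result; if `NULL`, the input assay is overwritten. |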
If x is a SummarizedExperiment, the function looks for a
numeric column named "gene_length" in rowData(x). That column
must have length equal to the number of rows in the assay being normalized.
A numeric matrix of RPKM or log2(RPKM + 1) values if
x is a data.frame or matrix. If x is a SummarizedExperiment,
returns the modified SummarizedExperiment with the RPKM data placed in the
existing or new assay.
    library(SummarizedExperiment)
    library(airway)

    data('airway')
    se = airway

    # Only use a random subset of 1000 rows
    set.seed(123)
    idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
    se <- se[idx, ]

    ### Adding a column in rowData regarding the gene_length
    rowData(se)$gene_length = rowData(se)$gene_seq_end -
        rowData(se)$gene_seq_start

    # -------------------------------
    # 1) Using a matrix
    # -------------------------------

    gene_length = rowData(se)$gene_length
    df = assay(se)

    ## Without log transformation
    rpkm = rpkm_normalization(df, gene_length = gene_length)
    rpkm[1:5, 1:5]

    ## With log transformation (computed from the raw counts again)
    log_rpkm = rpkm_normalization(df, gene_length = gene_length,
                                  log_trans = TRUE)
    log_rpkm[1:5, 1:5]

    # -------------------------------
    # 2) Using a SummarizedExperiment
    # -------------------------------

    # If no new_assay_name is provided, the existing assay is overwritten
    se2 = rpkm_normalization(se, log_trans = FALSE)
    head(assay(se2))

    # If new_assay_name is given, the normalized values are stored in a new assay
    se2 = rpkm_normalization(se, log_trans = FALSE,
                             new_assay_name = 'rpkm_counts')
    head(assay(se2, 'rpkm_counts'))

    # Creating a new assay to test specific input
    new_matrix = matrix(data = sample(x = seq(1, 100000),
                                      size = nrow(se) * ncol(se),
                                      replace = TRUE),
                        nrow = nrow(se), ncol = ncol(se))
    rownames(new_matrix) = rownames(se)
    colnames(new_matrix) = colnames(se)
    assay(se, 'new_counts') = new_matrix

    se2 = rpkm_normalization(se, new_assay_name = 'rpkm_counts_new',
                             assay_name = 'new_counts')
    head(assay(se2, 'rpkm_counts_new'))
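For readers who want to sanity-check the RPKM example above, the sketch below computes the textbook RPKM definition in base R: each count is divided by the sample's total count in millions and by the gene length in kilobases. It assumes gene lengths in base pairs and assumes (but does not confirm) that rpkm_normalization() follows this standard definition; rpkm_sketch() is a hypothetical helper, not part of dominatR.

```r
# Hypothetical helper (NOT part of dominatR): textbook RPKM definition,
# assuming gene lengths are given in base pairs.
rpkm_sketch <- function(counts, gene_length, log_trans = FALSE) {
  counts <- as.matrix(counts)
  stopifnot(length(gene_length) == nrow(counts))
  # Library size of each sample (column), in millions of reads
  lib_size_m <- colSums(counts) / 1e6
  # Scale each column by its library size ...
  per_million <- sweep(counts, 2, lib_size_m, "/")
  # ... then each row by its gene length in kilobases
  # (a length-nrow vector recycles down the rows of the matrix)
  rpkm <- per_million / (gene_length / 1e3)
  # Optional log transform; assumed to mirror the log2(x + 1)
  # transform documented for tpm_normalization()
  if (log_trans) rpkm <- log2(rpkm + 1)
  rpkm
}

# Toy data: 3 genes x 2 samples
counts <- matrix(c(10, 20, 30,
                   40, 50, 60), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
len <- c(1000, 2000, 500)
rpkm_sketch(counts, len)
```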
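Normalizes read counts by the TPM (Transcripts Per Million) method. For gene $i$ with read count $c_{ij}$ in sample $j$ and gene length $L_i$:

$$\mathrm{TPM}_{ij} = \frac{c_{ij} / L_i}{\sum_{k} c_{kj} / L_k} \times 10^{6}$$

If log_trans = TRUE, log2(TPM + 1) is returned instead.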
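A minimal base-R sketch of the standard TPM calculation may help make the two steps concrete: first divide every count by its gene length, then rescale each sample (column) so it sums to one million. tpm_sketch() is a hypothetical helper written for illustration; it is not the dominatR source, and it assumes one gene length per row of the count matrix.

```r
# Hypothetical helper (NOT the dominatR source): standard TPM definition.
# Assumes `gene_length` holds one length per row of `counts`.
tpm_sketch <- function(counts, gene_length, log_trans = FALSE) {
  counts <- as.matrix(counts)
  stopifnot(length(gene_length) == nrow(counts))
  # Step 1: length-normalize each row (reads per base of gene)
  rate <- counts / gene_length
  # Step 2: rescale every column (sample) so it sums to one million
  tpm <- sweep(rate, 2, colSums(rate), "/") * 1e6
  # Optional log transform, log2(TPM + 1)
  if (log_trans) tpm <- log2(tpm + 1)
  tpm
}

# Toy data: 3 genes x 2 samples
counts <- matrix(c(10, 20, 30,
                   40, 50, 60), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
len <- c(1000, 2000, 500)
tpm <- tpm_sketch(counts, len)
colSums(tpm)   # each sample sums to 1e6
```

Because the gene-length scale cancels in the ratio, the result does not depend on whether lengths are given in bases or kilobases.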
    tpm_normalization(
      x,
      gene_length = NULL,
      log_trans = FALSE,
      assay_name = NULL,
      new_assay_name = NULL
    )
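| Argument | Description |
|---|---|
| x | A numeric data.frame, matrix, or SummarizedExperiment. |
| gene_length | A numeric vector of gene lengths (one per row), used only if x is a data.frame or matrix. |
| log_trans | Logical. If TRUE, returns log2(TPM + 1) instead of TPM. Defaults to FALSE. |
| assay_name | If x is a SummarizedExperiment, the name of the assay to normalize. |
| new_assay_name | If x is a SummarizedExperiment, the name of a new assay in which to store the normalized values. If NULL, the existing assay is overwritten. |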
If x is a SummarizedExperiment, this function looks for a
numeric column named "gene_length" in rowData(x). That column
must have length equal to the number of rows in the assay being normalized.
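Since a mismatched or invalid gene-length vector is an easy way for normalization to go wrong (a zero length, for example, would cause a division by zero), a pre-flight check can be useful. The sketch below is a hypothetical base-R helper, check_gene_length(), that is not part of dominatR; it only illustrates the kind of validation implied by the requirement above.

```r
# Hypothetical pre-flight check (NOT part of dominatR): confirm that a
# gene-length vector can be paired with a count matrix before normalizing.
check_gene_length <- function(counts, gene_length) {
  if (!is.numeric(gene_length))
    stop("gene_length must be numeric")
  if (length(gene_length) != nrow(counts))
    stop(sprintf("gene_length has %d values but counts has %d rows",
                 length(gene_length), nrow(counts)))
  if (anyNA(gene_length) || any(gene_length <= 0))
    stop("gene_length must be positive and free of NA values")
  invisible(TRUE)
}

counts <- matrix(1:6, nrow = 3)
check_gene_length(counts, c(1000, 2000, 500))   # passes silently
try(check_gene_length(counts, c(1000, 2000)))   # length mismatch: error
```

With a SummarizedExperiment, the same check could be run as check_gene_length(assay(se), rowData(se)$gene_length) before calling tpm_normalization(se).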
A numeric matrix of TPM or log2(TPM + 1) values if
x is a data.frame or matrix. If x is a SummarizedExperiment,
returns the modified SummarizedExperiment with the normalized values written
either to the existing assay (overwriting it) or to the new assay named by
new_assay_name.
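When log_trans = TRUE, the returned values can be mapped back to the TPM scale by inverting log2(x + 1), which is a convenient way to confirm that every sample still totals one million. The sketch below uses a small hand-built TPM matrix for illustration (the numbers are made up, not package output) and relies only on base R.

```r
# Illustration only: a hand-built TPM matrix (columns sum to 1e6) and its
# log2(TPM + 1) counterpart, as described for log_trans = TRUE.
tpm <- matrix(c(125000, 125000, 750000,
                500000, 250000, 250000), nrow = 3,
              dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
log_tpm <- log2(tpm + 1)

# Inverse of log2(x + 1)
recovered <- 2^log_tpm - 1

# TPM columns should each sum to one million (up to floating-point error)
all.equal(unname(colSums(recovered)), c(1e6, 1e6))
```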
    library(SummarizedExperiment)
    library(airway)

    data('airway')
    se = airway

    # Only use a random subset of 1000 rows
    set.seed(123)
    idx <- sample(seq_len(nrow(se)), size = min(1000, nrow(se)))
    se <- se[idx, ]

    ### Adding a column in rowData regarding the gene_length
    rowData(se)$gene_length = rowData(se)$gene_seq_end -
        rowData(se)$gene_seq_start

    # -------------------------------
    # 1) Using a matrix
    # -------------------------------

    gene_length = rowData(se)$gene_length
    df = assay(se)

    ## Without log transformation
    tpm = tpm_normalization(df, gene_length = gene_length)
    tpm[1:5, 1:5]

    ## With log transformation (computed from the raw counts again)
    log_tpm = tpm_normalization(df, gene_length = gene_length,
                                log_trans = TRUE)
    log_tpm[1:5, 1:5]

    # -------------------------------
    # 2) Using a SummarizedExperiment
    # -------------------------------

    # If no new_assay_name is provided, the existing assay is overwritten
    se2 = tpm_normalization(se, log_trans = FALSE)
    head(assay(se2))

    # If new_assay_name is given, the normalized values are stored in a new assay
    se2 = tpm_normalization(se, log_trans = FALSE,
                            new_assay_name = 'tpm_counts')
    head(assay(se2, 'tpm_counts'))

    # A specific assay can also be selected
    new_matrix = matrix(data = sample(x = seq(1, 100000),
                                      size = nrow(se) * ncol(se),
                                      replace = TRUE),
                        nrow = nrow(se), ncol = ncol(se))
    rownames(new_matrix) = rownames(se)
    colnames(new_matrix) = colnames(se)

    ## Creating a new assay called new_counts
    assay(se, 'new_counts') = new_matrix

    se2 = tpm_normalization(se, new_assay_name = 'tpm_counts_new',
                            assay_name = 'new_counts')
    se2
    head(assay(se2, 'tpm_counts_new'))
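Under the standard definitions, TPM is simply RPKM rescaled so that each sample sums to one million, since the library-size factor in RPKM is constant within a column and cancels. The base-R sketch below checks this on toy data; it assumes the textbook formulas (gene lengths in base pairs) and is not taken from the dominatR implementation.

```r
# Sketch (NOT the dominatR code): TPM equals RPKM rescaled column-wise,
# assuming the textbook definitions and gene lengths in base pairs.
counts <- matrix(c(10, 20, 30,
                   40, 50, 60), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
len_bp <- c(1000, 2000, 500)

# RPKM: reads per kilobase of gene per million mapped reads
rpkm <- sweep(counts / (len_bp / 1e3), 2, colSums(counts) / 1e6, "/")

# TPM computed directly: length-normalize, then scale columns to 1e6
rate <- counts / len_bp
tpm_direct <- sweep(rate, 2, colSums(rate), "/") * 1e6

# TPM obtained by rescaling each RPKM column to sum to 1e6
tpm_from_rpkm <- sweep(rpkm, 2, colSums(rpkm), "/") * 1e6

all.equal(tpm_direct, tpm_from_rpkm)
```

Because the rescaling factor is constant within a column, RPKM and TPM rank genes identically within a sample; they differ only in the per-sample scale, which is why all TPM columns share the same total.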