| Title: | Tsallis Entropy Analysis Toolbox |
|---|---|
| Description: | Quantifies and models isoform-usage complexity in RNA-seq data using Tsallis entropy, a scale-dependent diversity measure. By tuning the entropic index parameter (q), TSENAT examines transcriptome heterogeneity at different scales: rare variants (low q) or dominant isoforms (high q). It enables computing Tsallis entropy and Tsallis divergence from transcript-level estimates, comparing measures between conditions, testing for differences, and visualizing scale-dependent complexity via q-curves. |
| Authors: | Cristóbal Gallardo [aut, cre] (ORCID: <https://orcid.org/0000-0002-5752-2155>) |
| Maintainer: | Cristóbal Gallardo <[email protected]> |
| License: | GPL (>= 3) + file LICENSE |
| Version: | 0.99.17 |
| Built: | 2026-06-02 23:40:34 UTC |
| Source: | https://github.com/bioc/TSENAT |
Define generic functions for accessing components of TSENATAnalysis objects. These are the recommended way to extract results from analysis objects, following Bioconductor best practices.
These generic functions return different types depending on the specific method: - Accessor methods return data.frames or lists containing analysis results - See individual method documentation for specific return types
Convenience wrapper that combines . build_se() and
TSENATAnalysis()
into a single function call. This creates a complete analysis object
ready for
Tsallis entropy computation and downstream analysis.
build_analysis( readcounts = NULL, salmon_dir = NULL, tx2gene, assay_name = "counts", metadata = NULL, tpm = NULL, effective_length = NULL, config = list(), skip = FALSE, verbose = FALSE )build_analysis( readcounts = NULL, salmon_dir = NULL, tx2gene, assay_name = "counts", metadata = NULL, tpm = NULL, effective_length = NULL, config = list(), skip = FALSE, verbose = FALSE )
readcounts |
A matrix or data.frame of transcript-level read counts with
transcript IDs as row names and sample names as column names. Typically
output
from quantification tools (SALMON, kallisto, etc. ). Optional when
|
salmon_dir |
Optional character path to directory containing Salmon
quantification
output. Expected structure: |
tx2gene |
Either: - A path to a GFF3 or GFF3.gz file containing transcript-to-gene mapping - A path to a TSV file with columns 'Transcript' and 'Gene' - A data.frame with transcript-to-gene mapping |
assay_name |
Character. Name for the assay (default: 'counts'). |
metadata |
Optional data.frame with sample metadata. Should have sample
names as row names and metadata columns (e.g., sample_type, condition, etc.).
If NULL, will attempt to read from |
tpm |
REQUIRED matrix of transcript-level TPM values (Transcripts Per Million). Must be provided and will be stored in the SummarizedExperiment for use during diverse filtering. Same dimensions as readcounts required (rows = transcripts, columns = samples). Typically from SALMON quantification output. If not provided to build_analysis(), subsequent filter_analysis() calls will fail with an explicit error message. |
effective_length |
REQUIRED numeric vector of transcript effective lengths (e.g., from SALMON EffectiveLength column). Length must match nrow(readcounts). Typically obtained as the median effective length across all samples. If not provided to build_analysis(), the data will not be stored for later use in length-normalized calculations. |
config |
Optional list of configuration parameters to store in the
TSENATAnalysis object. Can also contain |
skip |
Logical. If TRUE, allow unmapped transcripts (transcripts not found in tx2gene mapping) and remove them from analysis. If FALSE (default), stop with an error when unmapped transcripts are detected. Useful for handling data with transcript IDs that don't match the annotation file provided. |
verbose |
Logical. If TRUE, print informative messages during execution (e.g., Salmon sample discovery, progress on data loading). Default: TRUE. |
When using salmon_dir, the function automatically:
Discovers all Salmon sample folders and quant.sf files
Reads transcript counts (NumReads), TPM, and effective_length
Extracts sample names from directory structure
Creates count matrix ready for analysis
The salmon_dir parameter provides a convenient alternative to manually
constructing the readcounts matrix,
especially useful in Galaxy workflows.
A TSENATAnalysis S4 object with:
@se |
The SummarizedExperiment containing transcript counts and metadata |
@config |
Analysis configuration (empty list or user-provided) |
@diversity_results |
Empty list (populated by calculate_diversity()) |
@divergence_results |
Empty list (populated by calculate_divergence()) |
@sait_results |
Empty list (populated by calculate_sait()) |
@jackknife_results |
Empty list (populated by jackknife functions) |
@plots |
Empty list (populated by plotting functions) |
@metadata |
Metadata with package version and creation timestamp |
CRITICAL: TPM and effective_length Requirements
Both tpm and effective_length MUST be provided to ensure correct
filtering and normalization in downstream analysis:
filter_analysis() requires TPM data (stored in metadata).
If TPM is missing, the function will fail with an explicit error message
that guides you to pass it to build_analysis().
calculate_diversity() uses effective_length for
length-normalized entropy calculations.
Following Bioconductor best practices (fail-fast principle), these are explicit parameters, not optional. They must be passed at object construction time:
analysis <- build_analysis( readcounts = readcounts, metadata = metadata_df, tx2gene = gff3_file, tpm = tpm, # REQUIRED from Salmon output effective_length = effective_length, # REQUIRED from Salmon output config = config )
This wrapper combines two steps into one:
Call .build_se() to create a SummarizedExperiment from transcript counts
Wrap the result in TSENATAnalysis() to create the analysis object
The returned object is ready for diversity analysis via calculate_diversity().
If you need to inspect or filter the SummarizedExperiment before creating the
TSENATAnalysis object, call .build_se() and TSENATAnalysis() separately.
TSENATAnalysis for the S4 class structure
calculate_diversity for computing Tsallis entropy
# Create example transcript count data set.seed(42) n_genes <- 10 n_isoforms_per_gene <- 3 n_isoforms <- n_genes * n_isoforms_per_gene n_samples <- 10 # Generate count matrix counts <- matrix(rpois(n_isoforms * n_samples, lambda = 20), nrow = n_isoforms, ncol = n_samples) rownames(counts) <- paste0('TX_', 1:n_isoforms) colnames(counts) <- paste0('Sample_', 1:n_samples) # Create tx2gene mapping tx2gene <- data.frame( Transcript = rownames(counts), Gene = rep(paste0('GENE_', 1:n_genes), each = n_isoforms_per_gene)) # Create sample metadata metadata <- data.frame( sample = colnames(counts), condition = rep(c('control', 'treatment'), each = 5), row.names = colnames(counts)) # Build analysis object - use NAMED parameters to avoid confusion # Method 1: With explicit tx2gene data.frame (most common) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis( readcounts = counts, tx2gene = tx2gene, metadata = metadata, config = config) # Verify the analysis object was created analysis print(dim(analysis)) # Method 2: From Salmon quantification folder # Requires directory structure like: # salmon_output/ # sample1/quant.sf # sample2/quant.sf # ... # # salmon_dir <- '/path/to/salmon/directory' # # First create sample metadata matching Salmon sample names # salmon_metadata <- data.frame( # condition = c('control', 'control', 'treatment', 'treatment'), # row.names = c('sample1', 'sample2', 'sample3', 'sample4') # ) # # analysis_salmon <- build_analysis( # salmon_dir = salmon_dir, # tx2gene = 'annotation.gff3.gz', # Auto-parsed from GFF3 # metadata = salmon_metadata # ) # # Method 3: Hybrid - Salmon counts with manual tx2gene # analysis_hybrid <- build_analysis( # salmon_dir = salmon_dir, # tx2gene = tx2gene, # data.frame instead of file # metadata = salmon_metadata # ) # # Method 4: Pass metadata via config (parameter resolution pattern) # cfg <- TSENAT_config() # cfg$metadata <- metadata # analysis_with_config <- build_analysis( # readcounts = counts, # tx2gene = tx2gene, # config = cfg # # Note: metadata argument omitted - will be read from config$metadata # ) # Advanced: Assigning metadata to assays after object creation # When adding metadata to SummarizedExperiment assays, always use the # S4Vectors namespace to ensure proper method dispatch: # # se <- getSE(analysis) # assay_with_ci <- SummarizedExperiment::assay(se, 'log2fc_ci') # S4Vectors::metadata(assay_with_ci)$lower <- ci_lower_bounds # S4Vectors::metadata(assay_with_ci)$upper <- ci_upper_bounds # # Note: Avoid using metadata(assay) without the namespace - this can # cause silent failures in S4 object metadata assignment.# Create example transcript count data set.seed(42) n_genes <- 10 n_isoforms_per_gene <- 3 n_isoforms <- n_genes * n_isoforms_per_gene n_samples <- 10 # Generate count matrix counts <- matrix(rpois(n_isoforms * n_samples, lambda = 20), nrow = n_isoforms, ncol = n_samples) rownames(counts) <- paste0('TX_', 1:n_isoforms) colnames(counts) <- paste0('Sample_', 1:n_samples) # Create tx2gene mapping tx2gene <- data.frame( Transcript = rownames(counts), Gene = rep(paste0('GENE_', 1:n_genes), each = n_isoforms_per_gene)) # Create sample metadata metadata <- data.frame( sample = colnames(counts), condition = rep(c('control', 'treatment'), each = 5), row.names = colnames(counts)) # Build analysis object - use NAMED parameters to avoid confusion # Method 1: With explicit tx2gene data.frame (most common) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis( readcounts = counts, tx2gene = tx2gene, metadata = metadata, config = config) # Verify the analysis object was created analysis print(dim(analysis)) # Method 2: From Salmon quantification folder # Requires directory structure like: # salmon_output/ # sample1/quant.sf # sample2/quant.sf # ... # # salmon_dir <- '/path/to/salmon/directory' # # First create sample metadata matching Salmon sample names # salmon_metadata <- data.frame( # condition = c('control', 'control', 'treatment', 'treatment'), # row.names = c('sample1', 'sample2', 'sample3', 'sample4') # ) # # analysis_salmon <- build_analysis( # salmon_dir = salmon_dir, # tx2gene = 'annotation.gff3.gz', # Auto-parsed from GFF3 # metadata = salmon_metadata # ) # # Method 3: Hybrid - Salmon counts with manual tx2gene # analysis_hybrid <- build_analysis( # salmon_dir = salmon_dir, # tx2gene = tx2gene, # data.frame instead of file # metadata = salmon_metadata # ) # # Method 4: Pass metadata via config (parameter resolution pattern) # cfg <- TSENAT_config() # cfg$metadata <- metadata # analysis_with_config <- build_analysis( # readcounts = counts, # tx2gene = tx2gene, # config = cfg # # Note: metadata argument omitted - will be read from config$metadata # ) # Advanced: Assigning metadata to assays after object creation # When adding metadata to SummarizedExperiment assays, always use the # S4Vectors namespace to ensure proper method dispatch: # # se <- getSE(analysis) # assay_with_ci <- SummarizedExperiment::assay(se, 'log2fc_ci') # S4Vectors::metadata(assay_with_ci)$lower <- ci_lower_bounds # S4Vectors::metadata(assay_with_ci)$upper <- ci_upper_bounds # # Note: Avoid using metadata(assay) without the namespace - this can # cause silent failures in S4 object metadata assignment.
Test statistical assumptions on diversity data in TSENATAnalysis
calculate_assumptions( analysis, q = NULL, checks = "rank", alpha = 0.05, format = "text", ... ) ## S4 method for signature 'TSENATAnalysis' calculate_assumptions( analysis, q = NULL, checks = "rank", alpha = 0.05, format = "text", ... )calculate_assumptions( analysis, q = NULL, checks = "rank", alpha = 0.05, format = "text", ... ) ## S4 method for signature 'TSENATAnalysis' calculate_assumptions( analysis, q = NULL, checks = "rank", alpha = 0.05, format = "text", ... )
analysis |
|
q |
|
checks |
|
alpha |
|
format |
|
... |
Additional arguments (output_file, verbose for file output). |
This wrapper calls .calculate_assumptions() on diversity data
extracted from the analysis object. Evaluates data stability and
consistency across dimensions. Results include:
Permutation test for independence and temporal/spatial structure
Spearman correlation consistency across rows
Kendall's W concordance and ICC across samples
**Data Extraction Priority:**
1. If q specified: uses diversity result for that q-value
2. If q NULL: uses first available diversity result
3. If no diversity results:
extracts from cached combined result (@metadata$diversity_combined)
Modified TSENATAnalysis object with assumption test results stored
in @metadata$rankbased_assumptions.
# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create small subset config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q_values = seq(0, 2, by = 0.05), paired = FALSE ) analysis <- build_analysis( readcounts = readcounts, metadata = metadata_df, tx2gene = gff3_dataset, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) analysis <- calculate_assumptions(analysis, q = 1.0) # Check results using rank_test accessor results_df <- results(analysis, type = 'rank_test')# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create small subset config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q_values = seq(0, 2, by = 0.05), paired = FALSE ) analysis <- build_analysis( readcounts = readcounts, metadata = metadata_df, tx2gene = gff3_dataset, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) analysis <- calculate_assumptions(analysis, q = 1.0) # Check results using rank_test accessor results_df <- results(analysis, type = 'rank_test')
Compares statistical results from two different methods (typically SAIT/GAM for continuous data and Scheirer-Ray-Hare rank tests) to assess agreement and identify genes detected by one method but not the other.
calculate_concordance(analysis_sait, analysis_rank = NULL, ...) ## S4 method for signature 'TSENATAnalysis' calculate_concordance( analysis_sait, analysis_rank = NULL, verbose = FALSE, output_file = NULL, ... )calculate_concordance(analysis_sait, analysis_rank = NULL, ...) ## S4 method for signature 'TSENATAnalysis' calculate_concordance( analysis_sait, analysis_rank = NULL, verbose = FALSE, output_file = NULL, ... )
analysis_sait |
|
analysis_rank |
|
... |
Additional arguments for future extensibility. |
verbose |
|
output_file |
|
Compares results from two different statistical methods (typically GAM for continuous and Scheirer-Ray-Hare for rank-based analysis) on the same data. Identifies: - Genes significant in both methods (high confidence) - Genes detected by one method only (potential false positives or method-specific signal) - Spearman correlation of p-values (overall agreement trends)
Modified TSENATAnalysis object with concordance results stored in:
@metadata$method_concordance:
Data frame comparing results from both methods
Spearman correlation between adjusted p-values
Genes with strong agreement
Contingency table of significant/non-significant calls
Method name used for SAIT/GAM analysis
Method name used for rank-based analysis
When concordance was computed
# Compare results from SAIT and rank-based testing # (Requires pre-computed analysis objects from calculate_sait and calculate_srh) # results_df <- results(calculate_concordance(analysis_sait, analysis_rank))# Compare results from SAIT and rank-based testing # (Requires pre-computed analysis objects from calculate_sait and calculate_srh) # results_df <- results(calculate_concordance(analysis_sait, analysis_rank))
Wrapper around .calculate_divergence() that manages TSENATAnalysis object. Calculates Tsallis divergence between experimental conditions for transcripts across multiple q-values to detect condition-specific transcript remodeling.
calculate_divergence( analysis, q = NULL, verbose = FALSE, nthreads = NULL, output_file = NULL, control_group = NULL, paired = FALSE, method = NULL, bootstrap = FALSE, nboot = NULL, progress = FALSE, ... )calculate_divergence( analysis, q = NULL, verbose = FALSE, nthreads = NULL, output_file = NULL, control_group = NULL, paired = FALSE, method = NULL, bootstrap = FALSE, nboot = NULL, progress = FALSE, ... )
analysis |
|
q |
|
verbose |
|
nthreads |
|
output_file |
|
control_group |
|
paired |
|
method |
|
bootstrap |
|
nboot |
|
progress |
|
... |
Additional arguments passed to the base divergence function. |
Key Features:
Multi-q analysis: Divergence computed across full q-spectrum simultaneously
Bootstrap confidence intervals: Quantify uncertainty in divergence estimates
Multiple testing correction: Hochberg, Benjamini-Yekutieli, or no correction
Paired designs: Supports paired/repeated measures via subject_col parameter
Effect size reporting: Log-fold-change and confidence intervals per gene
Flexible control group: Compare any condition vs. any other condition
**Mathematical Background:** Tsallis divergence D_q between two probability distributions:
D_q(P||Q) = (log_2(N) - entropy_q(P) + entropy_q(Q)) / (q - 1)
Measures how much transcript composition changes from control to condition. Values near 0: Similar isoform composition; Large positive values: Major change.
**Example Use Case:**
Control sample: All reads from dominant isoform (low entropy)
Tumor sample: Reads spread across multiple isoforms (high entropy)
Result: Large divergence indicating condition-specific isoform switching.
**Parameter Resolution:** Parameters are resolved in priority order: 1. Explicit arguments passed to function 2. Values from analysis@config (if present) 3. Function defaults
Requires diversity results from calculate_diversity() as prerequisite.
Modified TSENATAnalysis with divergence metrics in
@divergence_results (stored as list of data.frames or matrices).
# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create manageable subset config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) # Use 200+ genes to ensure diversity filtering doesn't remove all genes analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) # Compute diversity first (required for divergence) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) # Calculate divergence across q-values analysis <- calculate_divergence(analysis, q = c(0.5, 1.0, 1.5)) # Check divergence results using unified accessor head(results(analysis, type = 'divergence'))# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create manageable subset config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) # Use 200+ genes to ensure diversity filtering doesn't remove all genes analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) # Compute diversity first (required for divergence) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) # Calculate divergence across q-values analysis <- calculate_divergence(analysis, q = c(0.5, 1.0, 1.5)) # Check divergence results using unified accessor head(results(analysis, type = 'divergence'))
Wrapper around .calculate_diversity() that manages TSENATAnalysis object. Calculates Tsallis entropy (diversity) across multiple q-values for each gene to quantify isoform complexity and transcript heterogeneity.
calculate_diversity( analysis, q = NULL, norm = TRUE, norm_method = NULL, reference_group = NULL, verbose = NULL, show_messages = FALSE, what = NULL, nthreads = NULL, pseudocount = NULL, min_valid_frac = NULL, shrinkage = NULL, bootstrap = NULL, nboot = NULL, bootstrap_method = NULL, bootstrap_ci = NULL, bootstrap_include_diagnostics = NULL, output_file = NULL, ... )calculate_diversity( analysis, q = NULL, norm = TRUE, norm_method = NULL, reference_group = NULL, verbose = NULL, show_messages = FALSE, what = NULL, nthreads = NULL, pseudocount = NULL, min_valid_frac = NULL, shrinkage = NULL, bootstrap = NULL, nboot = NULL, bootstrap_method = NULL, bootstrap_ci = NULL, bootstrap_include_diagnostics = NULL, output_file = NULL, ... )
analysis |
|
q |
|
norm |
|
norm_method |
If NULL, reads from |
reference_group |
|
verbose |
|
show_messages |
|
what |
|
nthreads |
|
pseudocount |
|
min_valid_frac |
|
shrinkage |
|
bootstrap |
|
nboot |
|
bootstrap_method |
|
bootstrap_ci |
|
bootstrap_include_diagnostics |
|
output_file |
Example: output_file = 'analysis.rds' generates:
The spectrum file contains columns: q, central (median diversity), spread (IQR), count, and group (if grouping variable available). Default: NULL (no file output). |
... |
Additional arguments passed to underlying functions for extensibility. |
Key Features:
Multi-q analysis: Entropy computed for q = 0.01 to 2.00 (by default)
Normalized entropy: Scale to [0, 1] using theoretical maximum log(m)
Bootstrap confidence intervals: Quantify uncertainty in estimates
TPM normalization: Optional SALMON TPM-based weighting
Multiple normalization methods: Range, Z-score, log-odds-ratio, relative
Spectrum computation: Aggregate diversity statistics across all q-values
**Mathematical Background:** Tsallis entropy H_q for q-parameter:
H_q(X) = (1/(q-1)) * (1 - Sum p_i^q) [for q != 1] H_1(X) = -Sum p_i * log(p_i) [Shannon entropy, limit q→1]
where p_i = relative abundance of isoform i for a gene. Larger q emphasizes dominant isoforms; smaller q emphasizes rare ones.
**Example Interpretation:**
Gene with 1 isoform: H_q = 0 for all q (no diversity)
Gene with 2 equal isoforms: H_q ~ 0.5-1.0 depending on q
Gene with m equally abundant isoforms: H_q = 1.0 (maximum diversity)
This wrapper calls .calculate_diversity() once per q-value, storing
results as SummarizedExperiment objects. It extracts key parameters from
analysis@config with
priority resolution (explicit > @config > default).
**Diversity Spectrum Computation:**
By default, this function computes and saves a diversity spectrum (aggregated
statistics across all q-values and groups) when
output_file is provided.
The spectrum contains:
q: Diversity parameter value
central: Median (or mean) diversity across all genes
spread: IQR (or SD) around central value
count: Number of valid measurements
group: Condition group (if applicable)
This provides a quick summary of how diversity changes across q-values,
useful for q-curve visualization and statistical comparisons.
Spectrum is saved as: *_spectrum.tsv
**Parameter Priority Resolution:**
Priority 1 (explicit) > Priority 2 (@config$q) > Priority 3 (default: seq(0.01, 2, by=0.05)).
Accepts single or multiple q-values (for spectrum computation).
Priority: explicit > @config$nthreads > 1
Priority: explicit > @config$verbose > TRUE
Priority: explicit > @config$bootstrap > FALSE
Priority: explicit > @config$pseudocount > 0
Priority: explicit > @config$norm > TRUE
Priority:
explicit > @config$what > 'S' (Tsallis entropy)
**Audit Trail:** After execution, check:
analysis@config$last_diversity_run$parameters_used:
Actual parameters
(not original @config)
attr(diversity(analysis, q=X), 'computed_with'):
Per-q-value metadata
(timestamp, bootstrap setting, nthreads, etc.)
For additional details on diversity spectrum calculations and normalization methods, see the package vignettes.
Modified TSENATAnalysis object with diversity results stored
in @diversity_results, keyed by 'q_X. XX. . . ' format (e. g. ,
'q_1. 000').
When output_file is provided, also generates:
Primary file: Analysis object or table export
Spectrum file: *_spectrum.tsv containing
aggregated diversity statistics across q-values and groups
# Load vignette data and build analysis data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) # Filter to manageable size (use 200+ genes to survive diversity filtering) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) # Compute diversity and access results using unified accessor analysis <- calculate_diversity(analysis, q = c(0.5, 1.0), verbose = FALSE) head(results(analysis, type = 'diversity', q = 1.0))# Load vignette data and build analysis data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) # Filter to manageable size (use 200+ genes to survive diversity filtering) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) # Compute diversity and access results using unified accessor analysis <- calculate_diversity(analysis, q = c(0.5, 1.0), verbose = FALSE) head(results(analysis, type = 'diversity', q = 1.0))
S4 wrapper for . calculate_effect_sizes() that
extracts divergence and LM
results directly from a TSENATAnalysis object.
calculate_effect_sizes( analysis, significance_threshold = NULL, enrich_per_q_pattern = NULL, verbose = NULL, output_file = NULL, ... )calculate_effect_sizes( analysis, significance_threshold = NULL, enrich_per_q_pattern = NULL, verbose = NULL, output_file = NULL, ... )
analysis |
|
significance_threshold |
|
enrich_per_q_pattern |
|
verbose |
|
output_file |
|
... |
Additional arguments passed to the base function. |
**Workflow steps:**
Input analysis object and required results
Divergence SE and SAIT results from analysis slots
Divergence SE with gene names via tx2gene or direct mapping
Effect sizes using base .calculate_effect_sizes()
Results in metadata with function call tracking
**Parameter resolution priority** (explicit > metadata > default):
significance_threshold: Uses explicit arg,
else metadata(analysis)$significance_threshold,
else 0.05
enrich_per_q_pattern: Uses explicit arg,
else metadata(analysis)$enrich_per_q_pattern,
else TRUE
verbose: Uses explicit arg,
else metadata(analysis)$verbose, else TRUE
Results are accessed via: metadata(analysis)$effect_sizes_divergence
Modified TSENATAnalysis with effect size results stored via
metadata(analysis)$effect_sizes_divergence.
Returns the analysis object visibly
to support piping and method chaining.
calculate_divergence for divergence wrapper,
calculate_sait for SAIT interaction wrapper
# Setup: Create test analysis with divergence and SAIT interaction results data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters (best practice for reproducibility) ## Not run: config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q = seq(0, 2, by = 0.5) # Multiple q-values for SAIT (5 unique values) ) analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity(analysis) analysis <- calculate_divergence(analysis) analysis <- suppressWarnings(calculate_sait(analysis, method = 'gam')) # Compute effect sizes from divergence results analysis <- calculate_effect_sizes(analysis, significance_threshold = 0.05) # Access results using unified results accessor effect_size_results <- results(analysis, type = 'effect_sizes_divergence') # View structure of results str(effect_size_results, max.level = 1) ## End(Not run)# Setup: Create test analysis with divergence and SAIT interaction results data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters (best practice for reproducibility) ## Not run: config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q = seq(0, 2, by = 0.5) # Multiple q-values for SAIT (5 unique values) ) analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity(analysis) analysis <- calculate_divergence(analysis) analysis <- suppressWarnings(calculate_sait(analysis, method = 'gam')) # Compute effect sizes from divergence results analysis <- calculate_effect_sizes(analysis, significance_threshold = 0.05) # Access results using unified results accessor effect_size_results <- results(analysis, type = 'effect_sizes_divergence') # View structure of results str(effect_size_results, max.level = 1) ## End(Not run)
Jackknife resampling with confidence intervals
calculate_jeo( analysis, q = NULL, norm = NULL, log_base = NULL, top_n = NULL, verbose = NULL, nthreads = NULL, pseudocount = NULL, output_file = NULL, ... )calculate_jeo( analysis, q = NULL, norm = NULL, log_base = NULL, top_n = NULL, verbose = NULL, nthreads = NULL, pseudocount = NULL, output_file = NULL, ... )
analysis |
|
q |
|
norm |
|
log_base |
|
top_n |
|
verbose |
|
nthreads |
|
pseudocount |
|
output_file |
|
... |
Additional arguments passed to the base function. |
Requires diversity results to exist first. Will error if
calculate_diversity() has not been run.
**Parameter Priority Resolution:**
Priority: explicit > @config$nthreads > 1
Modified TSENATAnalysis with jackknife results in @jackknife_results.
# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # TPM and effective_length REQUIRED for filter_analysis() tpm <- matrix(runif(nrow(readcounts) * ncol(readcounts), 0.1, 10), nrow = nrow(readcounts), ncol = ncol(readcounts), dimnames = dimnames(readcounts)) effective_length <- matrix(100, nrow = nrow(readcounts), ncol = ncol(readcounts)) # Create config (metadata passed as explicit parameter to build_analysis) config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q = seq(0, 2, by = 0.05), paired = FALSE ) # Build analysis from vignette data - metadata as explicit parameter analysis <- build_analysis( readcounts = readcounts, metadata = metadata_df, tx2gene = gff3_dataset, config = config, tpm = tpm, effective_length = effective_length ) # Filter low-abundance genes (required for reliable jackknife estimates) analysis <- filter_analysis(analysis, stringency = 'severe') # Compute diversity first (required for jackknife) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) # Run jackknife estimation analysis <- calculate_jeo(analysis, q = c(0.5, 1.0, 1.5)) # Check jackknife results using unified accessor jackknife_res <- results(analysis, type = 'jackknife') if (!is.null(jackknife_res)) names(jackknife_res)# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # TPM and effective_length REQUIRED for filter_analysis() tpm <- matrix(runif(nrow(readcounts) * ncol(readcounts), 0.1, 10), nrow = nrow(readcounts), ncol = ncol(readcounts), dimnames = dimnames(readcounts)) effective_length <- matrix(100, nrow = nrow(readcounts), ncol = ncol(readcounts)) # Create config (metadata passed as explicit parameter to build_analysis) config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q = seq(0, 2, by = 0.05), paired = FALSE ) # Build analysis from vignette data - metadata as explicit parameter analysis <- build_analysis( readcounts = readcounts, metadata = metadata_df, tx2gene = gff3_dataset, config = config, tpm = tpm, effective_length = effective_length ) # Filter low-abundance genes (required for reliable jackknife estimates) analysis <- filter_analysis(analysis, stringency = 'severe') # Compute diversity first (required for jackknife) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) # Run jackknife estimation analysis <- calculate_jeo(analysis, q = c(0.5, 1.0, 1.5)) # Check jackknife results using unified accessor jackknife_res <- results(analysis, type = 'jackknife') if (!is.null(jackknife_res)) names(jackknife_res)
Wrapper around .calculate_jis() that manages TSENATAnalysis object. Identifies transcripts with significant isoform switching patterns using jackknife resampling across samples to detect influential isoforms.
calculate_jis( analysis, condition_col = NULL, subject_col = NULL, gene_col = NULL, isoform_col = NULL, q = c(0, 0.5, 1, 1.5, 2), norm = NULL, log_base = NULL, threshold = 90, nboot = 1000, pseudocount = NULL, sait_results = NULL, sait_p_threshold = 0.05, use_sait_fdr = TRUE, output_file = NULL, verbose = FALSE, ... )calculate_jis( analysis, condition_col = NULL, subject_col = NULL, gene_col = NULL, isoform_col = NULL, q = c(0, 0.5, 1, 1.5, 2), norm = NULL, log_base = NULL, threshold = 90, nboot = 1000, pseudocount = NULL, sait_results = NULL, sait_p_threshold = 0.05, use_sait_fdr = TRUE, output_file = NULL, verbose = FALSE, ... )
analysis |
|
condition_col |
|
subject_col |
|
gene_col |
|
isoform_col |
|
q |
|
norm |
|
log_base |
|
threshold |
|
nboot |
|
pseudocount |
|
sait_results |
|
sait_p_threshold |
|
use_sait_fdr |
|
output_file |
For .rds format, saves only the full analysis object. Default: NULL (no file output). |
verbose |
|
... |
Additional arguments for future extensibility. |
Key Features:
Jackknife resampling: Robust outlier detection across all samples
Delta-influence metric: Measures how much each isoform drives phenotype
Confidence intervals: Bootstrap-based uncertainty quantification
Multi-q analysis: Tests across full q-spectrum simultaneously
LM filtering: Optional restriction to genes with significant interactions
Paired designs: Supports repeated measures/longitudinal data
**Parameter Resolution from Config**
The following parameters are resolved using a three-level priority system:
User-provided argument (if not NULL)
Value from analysis@config (if key exists)
Function default value
Affected parameters:
q: Multi-q vector c(0, 0.5, 1, 1.5, 2) if not provided, or @config$q if available
nboot: Uses @config$nboot if available, else 1000
threshold: Uses @config$threshold if available, else 90
sait_p_threshold: Uses @config$sait_p_threshold if available, else 0.05
This allows setting defaults once in the config and reusing across multiple analyses.
**Mathematical Background:** Delta-influence measures how much removing each sample changes entropy:
Delta = H_q(leave-one-out) - H_q(original)
High |Delta| for specific isoforms indicates those isoforms drive differences. Identifies 'outlier samples' where isoforms contribute unusually much.
**Example Use Case:**
Sample shows high Delta for isoform X → X has outsized importance in that sample
Classifying transcripts as 'switching' if top percentile (e.g., 90th) Delta
Reveals condition-specific isoforms crucial for phenotype determination.
**Automatic Setup:**
This wrapper automatically:
1. Extracts SummarizedExperiment from @se slot
2. Detects condition_col, gene_col, isoform_col from colData/rowData or
@config
3. Calls .calculate_jis() with extracted parameters
**Parameter Auto-Detection:**
condition_col: Uses explicit parameter, then @config,
then 'sample_type'
gene_col: Uses explicit parameter, then looks for
'gene' or 'Gene'
isoform_col: Uses explicit parameter, then looks for
'transcript', 'isoform', or 'Isoform'
TSENATAnalysis object with
jackknife results stored in @jackknife_results
slot. Results are keyed by q-value (e.g., 'q_1.00'). For multi-q
analysis, multiple
calls will accumulate results in the slot.
The analysis object is returned visibly to support method chaining:
analysis <- calculate_jis(analysis, q = 0.5)
analysis <- calculate_jis(analysis, q = 1.0)
jackknife_isoform_switching for
the underlying implementation,
TSENATAnalysis for object structure.
data(readcounts) metadata_df <- read.table(system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t') gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_file, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 20, subset_n_samples = 8) analysis <- calculate_diversity(analysis, q = 1)data(readcounts) metadata_df <- read.table(system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t') gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_file, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 20, subset_n_samples = 8) analysis <- calculate_diversity(analysis, q = 1)
S4 wrapper for m_estimate that performs robust M-estimation
on diversity results stored in a TSENATAnalysis object and stores results
back into the object.
calculate_m_estimator( analysis, condition_col = NULL, loss_type = "huber", scale = NULL, max_iter = 50, tol = 1e-06, paired = NULL, pcorr = "BH", q_combine_method = "mean", influence_threshold = 0.75, scale_method = "mad", output_file = NULL, verbose = FALSE )calculate_m_estimator( analysis, condition_col = NULL, loss_type = "huber", scale = NULL, max_iter = 50, tol = 1e-06, paired = NULL, pcorr = "BH", q_combine_method = "mean", influence_threshold = 0.75, scale_method = "mad", output_file = NULL, verbose = FALSE )
analysis |
|
condition_col |
|
loss_type |
|
scale |
|
max_iter |
|
tol |
|
paired |
|
pcorr |
|
q_combine_method |
|
influence_threshold |
|
scale_method |
|
output_file |
|
verbose |
|
This wrapper extracts diversity results from analysis@diversity_results,
performs robust M-estimation on diversity values (entropy), and stores
results
in the analysis object metadata.
**M-Estimation:** Robust regression technique that down-weights outliers based on their residuals. Useful for detecting low-quality samples that show unusual diversity patterns.
**M-Estimation Results include:**
sample_influence: How much each sample affects the overall fit
robustness_weight: Down-weighting factor
entropy_mean: Average entropy for the sample
entropy_sd: Entropy variability within the sample
Status: QC Classification
**Parameter resolution priority** (explicit > @config > error):
samples: Uses explicit arg, else @config$condition_col,
else error. Note: despite parameter name 'samples', maps to condition
grouping column
**Data Requirements:**
Diversity results must be computed via calculate_diversity()
Sample grouping column required in colData (auto-detected from @config$condition_col or via 'samples' parameter)
Modified TSENATAnalysis object with M-estimation results stored in
analysis@metadata$m_estimate_results. Contains data frame with
influence scores, robustness weights, entropy statistics, and QC classifications.
Returns visibly to support method chaining and piping.
calculate_diversity for computing diversity
# Create test analysis and compute M-estimation data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Create config (metadata passed as explicit parameter to build_analysis) config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q_values = seq(0, 2, by = 0.05), paired = FALSE ) # Build analysis from vignette data analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) analysis <- calculate_m_estimator( analysis, condition_col = 'condition', loss_type = 'huber' )# Create test analysis and compute M-estimation data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Create config (metadata passed as explicit parameter to build_analysis) config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q_values = seq(0, 2, by = 0.05), paired = FALSE ) # Build analysis from vignette data analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) analysis <- calculate_m_estimator( analysis, condition_col = 'condition', loss_type = 'huber' )
Statistical interaction testing for entropy diversity using flexible scale-adaptive models (GAM, LMM, GEE, FPCA) with AR(1) correlation structure for repeated measures. Tests for significant q-by-condition interactions across entropic indices.
calculate_sait( analysis, fdr_threshold = NULL, formula = NULL, condition_col = NULL, method = "gam", paired = NULL, subject_col = NULL, nthreads = NULL, multicorr = NULL, corstr = NULL, pcorr = NULL, verbose = NULL, return_model_data = NULL, output_file = NULL, ... )calculate_sait( analysis, fdr_threshold = NULL, formula = NULL, condition_col = NULL, method = "gam", paired = NULL, subject_col = NULL, nthreads = NULL, multicorr = NULL, corstr = NULL, pcorr = NULL, verbose = NULL, return_model_data = NULL, output_file = NULL, ... )
analysis |
|
fdr_threshold |
|
formula |
|
condition_col |
|
method |
|
paired |
|
subject_col |
|
nthreads |
|
multicorr |
|
corstr |
|
pcorr |
|
verbose |
|
return_model_data |
|
output_file |
|
... |
Additional arguments passed to the base LM function, including: pvalue, min_obs, assay_name, bias_correction, regularization, storey, wy_randomizations, adaptive_knots, etc. |
Extracts diversity results from @diversity_results (prerequisite),
combines across q-values into single SummarizedExperiment,
then runs .calculate_sait().
**Parameter Priority Resolution:**
nthreads: Priority: explicit > @config > NULL
Parameters are resolved in priority order: 1. Explicit arguments passed to function 2. Values from analysis@config (if present) 3. Function defaults
Modified TSENATAnalysis with results in @sait_results$sait_interaction.
## Not run: # Create test analysis with appropriate sample structure for paired design # Note: requires lme4 package for LMM fitting; uses synthetic data set.seed(42) # Create transcript-level counts with biological signal # Note: Use adequate complexity (transcripts/genes, samples, expression) # to avoid filtering away all genes during diversity computation n_genes <- 50 n_transcripts_per_gene <- 30 n_transcripts <- n_genes * n_transcripts_per_gene n_samples <- 16 # 8 subjects x 2 conditions (paired design) # Generate counts with clear biological signal control_idx <- seq(1, n_samples, by = 2) treatment_idx <- seq(2, n_samples, by = 2) counts <- matrix(0, nrow = n_transcripts, ncol = n_samples) for (j in seq_len(n_samples)) { lambda <- if (j %in% control_idx) 100 else 180 counts[, j] <- rpois(n_transcripts, lambda = lambda) } counts <- pmax(counts, 50) # Ensure minimum expression rownames(counts) <- paste0('TX_', seq_len(n_transcripts)) colnames(counts) <- paste0('Sample_', seq_len(n_samples)) # Create rowData with gene mapping (tx2gene structure) rowdata <- data.frame( transcript_id = rownames(counts), gene_id = rep(paste0('GENE_', 1:n_genes), each = n_transcripts_per_gene), row.names = rownames(counts) ) # Create colData with paired design metadata coldata <- data.frame( sample_id = colnames(counts), condition = rep(c('control', 'treatment'), length.out = n_samples), subject = rep(paste0('Subject_', 1:8), length.out = n_samples), row.names = colnames(counts) ) # Build SummarizedExperiment se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = counts), rowData = S4Vectors::DataFrame(rowdata), colData = S4Vectors::DataFrame(coldata) ) # Add tx2gene metadata for gene-level aggregation S4Vectors::metadata(se)$tx2gene <- data.frame(Transcript = rowdata$transcript_id, Gene = rowdata$gene_id) # Initialize TSENATAnalysis analysis <- TSENATAnalysis(se = se, config = list()) # Compute diversity (prerequisite for SAIT interaction analysis) analysis <- calculate_diversity( analysis, q = c(0.5, 1.0, 1.5, 2.0, 2.5) ) # Calculate q x condition interactions using GAM analysis <- suppressWarnings(calculate_sait( analysis, condition_col = 'condition', method = 'gam' )) # View top interaction results using unified accessor (first 3 genes) res <- results(analysis, type = "sait") if (!is.null(res)) head(res, 3) ## End(Not run)## Not run: # Create test analysis with appropriate sample structure for paired design # Note: requires lme4 package for LMM fitting; uses synthetic data set.seed(42) # Create transcript-level counts with biological signal # Note: Use adequate complexity (transcripts/genes, samples, expression) # to avoid filtering away all genes during diversity computation n_genes <- 50 n_transcripts_per_gene <- 30 n_transcripts <- n_genes * n_transcripts_per_gene n_samples <- 16 # 8 subjects x 2 conditions (paired design) # Generate counts with clear biological signal control_idx <- seq(1, n_samples, by = 2) treatment_idx <- seq(2, n_samples, by = 2) counts <- matrix(0, nrow = n_transcripts, ncol = n_samples) for (j in seq_len(n_samples)) { lambda <- if (j %in% control_idx) 100 else 180 counts[, j] <- rpois(n_transcripts, lambda = lambda) } counts <- pmax(counts, 50) # Ensure minimum expression rownames(counts) <- paste0('TX_', seq_len(n_transcripts)) colnames(counts) <- paste0('Sample_', seq_len(n_samples)) # Create rowData with gene mapping (tx2gene structure) rowdata <- data.frame( transcript_id = rownames(counts), gene_id = rep(paste0('GENE_', 1:n_genes), each = n_transcripts_per_gene), row.names = rownames(counts) ) # Create colData with paired design metadata coldata <- data.frame( sample_id = colnames(counts), condition = rep(c('control', 'treatment'), length.out = n_samples), subject = rep(paste0('Subject_', 1:8), length.out = n_samples), row.names = colnames(counts) ) # Build SummarizedExperiment se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = counts), rowData = S4Vectors::DataFrame(rowdata), colData = S4Vectors::DataFrame(coldata) ) # Add tx2gene metadata for gene-level aggregation S4Vectors::metadata(se)$tx2gene <- data.frame(Transcript = rowdata$transcript_id, Gene = rowdata$gene_id) # Initialize TSENATAnalysis analysis <- TSENATAnalysis(se = se, config = list()) # Compute diversity (prerequisite for SAIT interaction analysis) analysis <- calculate_diversity( analysis, q = c(0.5, 1.0, 1.5, 2.0, 2.5) ) # Calculate q x condition interactions using GAM analysis <- suppressWarnings(calculate_sait( analysis, condition_col = 'condition', method = 'gam' )) # View top interaction results using unified accessor (first 3 genes) res <- results(analysis, type = "sait") if (!is.null(res)) head(res, 3) ## End(Not run)
Wrapper around [.calculate_srh()] that manages TSENATAnalysis object. Tests for genes with condition-specific q-dependent entropy patterns by testing whether the effect of q-values DIFFERS between experimental conditions. This detects disease-relevant or condition-specific isoform switching patterns.
calculate_srh( analysis, condition_col, output_file = NULL, paired = NULL, subject_col = NULL, multicorr = c("hochberg", "benjamini-yekutieli", "westfall-young", "none"), entropy_col = "diversity", q_col = "q", gene_col = "gene", wy_randomizations = 500, nperm_mode = c("standard", "conservative", "interactive"), nthreads = NULL, alpha = 0.05, p_threshold = 0.05, eta2_threshold_moderate = 0.01, eta2_threshold_strong = 0.1, min_nperm = 100, max_nperm = 10000, verbose = FALSE, ... )calculate_srh( analysis, condition_col, output_file = NULL, paired = NULL, subject_col = NULL, multicorr = c("hochberg", "benjamini-yekutieli", "westfall-young", "none"), entropy_col = "diversity", q_col = "q", gene_col = "gene", wy_randomizations = 500, nperm_mode = c("standard", "conservative", "interactive"), nthreads = NULL, alpha = 0.05, p_threshold = 0.05, eta2_threshold_moderate = 0.01, eta2_threshold_strong = 0.1, min_nperm = 100, max_nperm = 10000, verbose = FALSE, ... )
analysis |
|
condition_col |
|
output_file |
|
paired |
|
subject_col |
|
multicorr |
|
entropy_col |
|
q_col |
|
gene_col |
|
wy_randomizations |
|
nperm_mode |
|
nthreads |
|
alpha |
|
p_threshold |
|
eta2_threshold_moderate |
|
eta2_threshold_strong |
|
min_nperm |
|
max_nperm |
|
verbose |
|
... |
Additional arguments passed to the base |
## Key Features
- **Q Condition Interaction**: Tests if entropy patterns across q-values differ
by condition (main discovery goal)
- **Multi-q Analysis**: Combines diversity results for multiple q-values into
a single SummarizedExperiment for joint hypothesis testing
- **Rank-Based Statistics**: Scheirer-Ray-Hare test (two-way ANOVA on ranked data,
- **Scheirer-Ray-Hare Test**: Two-way non-parametric ANOVA on ranks
- **Multiple Testing Correction**: Hochberg, Benjamini-Yekutieli, or permutation
(Westfall-Young) procedures
- **AR(1) Correlation Handling**: Westfall-Young preserves q-value spatial
correlations (important for ordered q measurements)
- **Effect Sizes**: Eta-squared () for q condition interactions
## Statistical Hypotheses
Tests the null hypothesis: - H0 = Gene entropy q-effect does NOT differ between conditions (q-independent) - H1 = Gene entropy q-dependence is CONDITION-SPECIFIC (interaction exists)
A significant interaction indicates condition-specific patterns in how entropy varies across the q-value spectrum, revealing biological processes specific to that condition.
## Biological Example
Gene shows strong isoform switching (q-dependent entropy) in tumor cells but NOT in healthy cells -> Identified as disease-relevant q-dependent gene.
For condition-specific q-dependent genes: - **Condition A**: Strong entropy variation across q (q-dependent isoform usage) - **Condition B**: Flat entropy profile across q (uniform isoform usage) - **Interaction**: Condition-specific q-dependence pattern reveals disease-associated splicing regulation
Analyzes how gene interactions change across q-value spectrum using rank-based (Scheirer-Ray-Hare) or parametric (GAM) statistical tests.
**Parameter resolution priority** (explicit > @config > default/auto-detect):
condition_col: REQUIRED - must be explicitly provided
q: ALWAYS auto-detected from diversity_results (all q-values tested together)
paired: explicit arg > @config$paired > FALSE (default)
subject_col: explicit arg > @config$subject_col
multicorr:
explicit arg > @config$multicorr > 'hochberg'
nthreads: explicit arg > @config$nthreads > 1 (default)
test:
explicit arg > @config$test > 'auto' (auto-selection)
nperm_mode:
explicit arg > @config$nperm_mode > 'standard'
Modified TSENATAnalysis with interaction results in @sait_results.
# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Create config first (required when metadata is provided) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') # Build analysis from vignette data and create manageable subset analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis( analysis, min_samples = 1, subset_n_genes = 200 ) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) # Test Q\eqn{\times} Condition interaction (condition_col is REQUIRED) analysis <- calculate_srh( analysis, condition_col = 'condition', multicorr = 'hochberg' ) # View results using unified accessor rank_test_res <- results(analysis, type = 'rank_test') if (!is.null(rank_test_res)) head(rank_test_res)# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Create config first (required when metadata is provided) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') # Build analysis from vignette data and create manageable subset analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis( analysis, min_samples = 1, subset_n_genes = 200 ) analysis <- calculate_diversity(analysis, q = c(0.5, 1.0, 1.5)) # Test Q\eqn{\times} Condition interaction (condition_col is REQUIRED) analysis <- calculate_srh( analysis, condition_col = 'condition', multicorr = 'hochberg' ) # View results using unified accessor rank_test_res <- results(analysis, type = 'rank_test') if (!is.null(rank_test_res)) head(rank_test_res)
S4 wrapper for .filter_se() that filters low-abundance transcripts
directly within a TSENATAnalysis object. This maintains the consistent
S4 workflow pattern where functions accept and return analysis objects.
filter_analysis( analysis, min_tpm = 1, tpm_assay_name = NULL, min_samples = 5L, stringency = "medium", pair_col = NULL, min_tx_per_gene = 2L, min_isoform_abundance = NULL, assay_name = "counts", subset_n_genes = NULL, subset_genes = NULL, subset_n_samples = NULL, subset_samples = NULL, subset_select_by = c("variance", "mean", "random"), subset_seed = 42, subset_min_count = NULL, verbose = FALSE )filter_analysis( analysis, min_tpm = 1, tpm_assay_name = NULL, min_samples = 5L, stringency = "medium", pair_col = NULL, min_tx_per_gene = 2L, min_isoform_abundance = NULL, assay_name = "counts", subset_n_genes = NULL, subset_genes = NULL, subset_n_samples = NULL, subset_samples = NULL, subset_select_by = c("variance", "mean", "random"), subset_seed = 42, subset_min_count = NULL, verbose = FALSE )
analysis |
A |
min_tpm |
Numeric TPM threshold (default 1.0).
Keeps transcripts with
TPM >= |
tpm_assay_name |
Character; name of assay containing TPM data (default: NULL). If NULL, searches for TPM assay automatically. |
min_samples |
Numeric. Minimum number of samples in which a transcript
must be present (default: 5). Ignored if |
stringency |
Character. Filtering stringency level: 'soft' (permissive),
'medium' (balanced), or 'severe' (stringent). When specified,
auto-estimates:
|
pair_col |
Character; column name in colData containing pair IDs for paired designs. Default: NULL (auto-detect if needed). |
min_tx_per_gene |
Integer minimum number of transcripts per gene
required
(default 2L). Single-transcript genes are always kept. Ignored if
|
min_isoform_abundance |
Numeric in [0, 1]; minimum relative
abundance threshold
for isoforms within each gene. Implements Soneson et al. (2016) filtering.
Default behavior:
- If |
assay_name |
Character; name or index of the assay to use for filtering
(default: 'counts'). Deprecated: use |
subset_n_genes |
Integer; optional number of genes to retain after
filtering.
If provided, genes are selected based on |
subset_genes |
Character vector; optional specific genes to retain after filtering. Default: NULL. |
subset_n_samples |
Integer; optional number of samples to retain after filtering. If provided, samples are selected (balanced by condition if available). Default: NULL. |
subset_samples |
Character vector; optional specific samples to retain after filtering. Default: NULL. |
subset_select_by |
Character; gene selection method for
|
subset_seed |
Integer; random seed for reproducibility when
|
subset_min_count |
Numeric; optional minimum count threshold applied during subsetting. Default: NULL. |
verbose |
Logical. If TRUE, print filtering progress and summary statistics (default: FALSE). |
This wrapper applies .filter_se() to the SummarizedExperiment within
the TSENATAnalysis object, optionally followed by subsetting parameters.
The filtering and subsetting operations are applied in sequence:
1. Extracts the SE from analysis@se
2. Filters using .filter_se() with specified filtering parameters (default: 'medium' stringency)
3. If any subset parameters are provided, applies gene/sample selection
to select specific genes and/or samples
4. Stores the filtered/subsetted SE back in analysis@se
5. Returns the modified analysis object invisibly
**Default Filtering (stringency = 'medium'):** By default, filtering applies balanced stringency: requires transcripts in >= 50 isoform abundance of 5 noise reduction with preservation of isoform diversity for reliable entropy calculations.
**Important:** Filtering should be performed BEFORE computing diversity, divergence, or SAIT interaction results. If called after analysis results have been computed, those results will be based on unfiltered data and may not align with the filtered SE dimensions.
Invisibly returns the modified analysis object with filtered
SummarizedExperiment in the @se slot. The filtering operation
modifies the analysis object in-place while maintaining all other slots
(results, metadata, etc.).
build_analysis for creating a new analysis object
# Create test analysis and filter data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, stringency = 'medium')# Create test analysis and filter data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, stringency = 'medium')
Access or set the metadata list stored in a TSENATAnalysis object. The getter function retrieves all metadata or a specific key-value. The setter function replaces the entire metadata list.
## S4 method for signature 'TSENATAnalysis' metadata(x, key = NULL) ## S4 replacement method for signature 'TSENATAnalysis' metadata(x) <- value## S4 method for signature 'TSENATAnalysis' metadata(x, key = NULL) ## S4 replacement method for signature 'TSENATAnalysis' metadata(x) <- value
x |
A |
key |
Optional character string specifying a metadata key to retrieve |
value |
A list of metadata to assign |
Get or Set Metadata
metadataReturns the full metadata list, or a single value if key is specified
metadata<-Returns the modified TSENATAnalysis object
# Create a TSENATAnalysis object library(SummarizedExperiment) se <- SummarizedExperiment(assays = list(counts = matrix(1:100, nrow = 10))) analysis <- new('TSENATAnalysis', se = se, config = list()) # Get metadata (empty by default) metadata(analysis) # Set metadata metadata(analysis) <- list(processing_date = Sys.Date(), method = 'test') # Retrieve all metadata metadata(analysis) # Retrieve specific metadata key metadata(analysis, key = 'method')# Create a TSENATAnalysis object library(SummarizedExperiment) se <- SummarizedExperiment(assays = list(counts = matrix(1:100, nrow = 10))) analysis <- new('TSENATAnalysis', se = se, config = list()) # Get metadata (empty by default) metadata(analysis) # Set metadata metadata(analysis) <- list(processing_date = Sys.Date(), method = 'test') # Retrieve all metadata metadata(analysis) # Retrieve specific metadata key metadata(analysis, key = 'method')
Plot method concordance results from TSENATAnalysis
plot_concordance(analysis, verbose = FALSE) ## S4 method for signature 'TSENATAnalysis' plot_concordance(analysis, verbose = FALSE)plot_concordance(analysis, verbose = FALSE) ## S4 method for signature 'TSENATAnalysis' plot_concordance(analysis, verbose = FALSE)
analysis |
|
verbose |
|
Creates visualization of method concordance including: - Comparison of significance across two methods (with color-coded agreement) - P-value distribution histograms for both methods - Significance threshold lines at p < 0.05
Requires that calculate_concordance() has already been run
to populate @metadata$method_concordance.
A ggplot/cowplot object showing:
Scatter plot of -log10(p-values) comparing methods
Histogram of p-value distributions by method
# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create small subset config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) # Note: calculate_concordance requires additional LM and Scheirer-Ray-Hare rank test # results computed. For demo purposes, we show that # plot_concordance needs pre-computed concordance in @metadata# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create small subset config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) # Note: calculate_concordance requires additional LM and Scheirer-Ray-Hare rank test # results computed. For demo purposes, we show that # plot_concordance needs pre-computed concordance in @metadata
S4 wrapper that extracts effect size results from a TSENATAnalysis object and generates a histogram visualization of Tsallis divergence effect sizes across genes.
plot_divergence_distribution( analysis, threshold = 0.1, output_file = NULL, width = 12, height = 6, verbose = FALSE, ... )plot_divergence_distribution( analysis, threshold = 0.1, output_file = NULL, width = 12, height = 6, verbose = FALSE, ... )
analysis |
|
threshold |
|
output_file |
|
width |
|
height |
|
verbose |
|
... |
Additional arguments passed to the underlying plotting function. |
This wrapper extracts the interaction results (with effect size columns)
from analysis@metadata$effect_sizes_divergence$interaction_results
and passes them to the base .plot_divergence_distribution() function.
The function visualizes the distribution of effect sizes using the median q-value's divergence (typically around q=1.0, close to Shannon entropy). A red dashed line marks the information-theoretic significance threshold.
**Data Requirements:**
Effect sizes must be computed via calculate_effect_sizes()
@metadata$effect_sizes_divergence$interaction_results must
contain columns matching pattern effect_size_D_q*
Invisibly returns the file path if saved, otherwise the ggplot object. If the plot cannot be created (missing data, ggplot2 not available), returns NULL invisibly with an informative message.
calculate_effect_sizes for computing effect sizes.
# Plot 2: Distribution of effect sizes across genes data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', control = 'normal' ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- calculate_divergence( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- suppressWarnings(calculate_sait(analysis, method = 'lmm')) analysis <- calculate_effect_sizes(analysis) p_dist <- plot_divergence_distribution(analysis) # print(p_dist)# Plot 2: Distribution of effect sizes across genes data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', control = 'normal' ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- calculate_divergence( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- suppressWarnings(calculate_sait(analysis, method = 'lmm')) analysis <- calculate_effect_sizes(analysis) p_dist <- plot_divergence_distribution(analysis) # print(p_dist)
S4 wrapper that extracts divergence results from a TSENATAnalysis object and visualizes the average Tsallis divergence across all genes (or specified genes) as a function of q-value.
plot_divergence_spectrum( analysis, gene = NULL, n_genes = 4, ncol = 2, metric = c("median", "mean"), variability_metric = c("iqr", "sd"), use_pvalue_ranking = FALSE, output_file = NULL, width = 12, height = NULL, verbose = FALSE, ... )plot_divergence_spectrum( analysis, gene = NULL, n_genes = 4, ncol = 2, metric = c("median", "mean"), variability_metric = c("iqr", "sd"), use_pvalue_ranking = FALSE, output_file = NULL, width = 12, height = NULL, verbose = FALSE, ... )
analysis |
|
gene |
|
n_genes |
|
ncol |
|
metric |
|
variability_metric |
|
use_pvalue_ranking |
|
output_file |
|
width |
|
height |
|
verbose |
|
... |
Additional arguments passed to the underlying plotting function. |
This wrapper extracts the divergence SummarizedExperiment from
analysis@divergence_results and optionally the SAIT results from
analysis@sait_results$sait_interaction to pass to the base function.
**Data Requirements:**
Divergence must be computed via calculate_divergence()
@divergence_results$divergence_se or direct divergence SE
**Modes:**
Global mode (gene = NULL): Shows median/mean divergence across all genes with variability bands
Gene-specific mode (gene specified): Shows divergence spectrum for a single named gene
Top genes mode (gene = NULL, sait_res provided): Shows top n_genes by significance
Invisibly returns the file path if saved, otherwise the ggplot object. If the plot cannot be created (missing data, ggplot2 not available), returns NULL invisibly with an informative message.
calculate_divergence for computing divergence.
# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create small subset config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis( analysis, min_samples = 1, subset_n_genes = 200 ) analysis <- calculate_diversity( analysis, q = c(0.5, 1, 1.5), verbose = FALSE ) analysis <- calculate_divergence( analysis, q = c(0.5, 1, 1.5), verbose = FALSE ) p_global <- plot_divergence_spectrum(analysis) # print(p_global)# Load example data (matching TSENAT.Rmd workflow) data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Build analysis from vignette data and create small subset config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis( analysis, min_samples = 1, subset_n_genes = 200 ) analysis <- calculate_diversity( analysis, q = c(0.5, 1, 1.5), verbose = FALSE ) analysis <- calculate_divergence( analysis, q = c(0.5, 1, 1.5), verbose = FALSE ) p_global <- plot_divergence_spectrum(analysis) # print(p_global)
Visualize Tsallis entropy (S_q) as a function of the diversity parameter q across sample groups. Supports three modes: aggregate q-curves (default), gene-specific q-curves (when 'gene' provided), or bootstrap confidence interval bands.
plot_diversity_spectrum( se, assay_name = "diversity", condition_col = NULL, gene = NULL, sait_res = NULL, n_top = NULL, metric = "iqr", output_file = NULL, dev_width = NULL, dev_height = NULL )plot_diversity_spectrum( se, assay_name = "diversity", condition_col = NULL, gene = NULL, sait_res = NULL, n_top = NULL, metric = "iqr", output_file = NULL, dev_width = NULL, dev_height = NULL )
se |
A 'SummarizedExperiment' returned by '.calculate_diversity()' with diversity assay. If pre-computed bootstrap confidence intervals are available (ci_lower/ci_upper assays), they will be displayed automatically. Otherwise, falls back to IQR visualization. |
assay_name |
Character; name of the assay to plot (default: 'diversity'). |
condition_col |
Character or NULL; column name in colData indicating group/sample type. If NULL (default), reads from '@config$condition_col' when input is TSENATAnalysis, otherwise defaults to 'sample_type'. Only used in aggregate and CI modes. |
gene |
Character vector (optional); if provided, plot q-curves for specified gene(s). Overrides default aggregate behavior. When provided, uses median +/- SD for each gene. |
sait_res |
Data frame (optional); gene interaction test results with 'gene' column and p-value column. Accepts either: - Results from '.calculate_sait()' (has 'adj_p_interaction' or 'p_interaction' columns) - Results from '.calculate_srh()' (has 'adj_p_value' or 'p_value' columns from Scheirer-Ray-Hare rank tests) If provided (and 'gene' is NULL), plots top 'n_top' genes ranked by p-value. Useful for plotting significant genes from any interaction analysis. |
n_top |
Integer or NULL; number of top genes to select from 'sait_res' when 'gene' is NULL (default: NULL). When NULL, defaults to showing the single most significant gene (n_top=1), providing a conservative view of the strongest effect. Set to a numeric value to show that many top genes. |
metric |
Character; when bootstrap CIs are NOT available, specifies the spread metric to display. Options: 'iqr' (default, Interquartile Range - more robust) or 'sd' (Standard Deviation). This parameter is ignored when bootstrap confidence intervals are available. Default: 'iqr'. |
output_file |
|
dev_width |
Numeric or NULL; width in inches for the graphics device when displaying the plot interactively. If specified (along with dev_height), creates a new device with this width. Default: NULL (uses current device). |
dev_height |
Numeric or NULL; height in inches for the graphics device when displaying the plot interactively. If specified (along with dev_width), creates a new device with this height. Default: NULL (uses current device). |
**Aggregate mode (default, gene=NULL, sait_res=NULL)**: - Plots median Tsallis entropy +/- IQR across all genes for each group - Works with any SummarizedExperiment from .calculate_diversity() - Supports single or multiple q values and any number of groups - No CI data required for basic plots; bootstrap CIs optional
**Gene-specific mode (gene or sait_res provided)**: - Plots q-curve separately for each selected gene - Shows median entropy +/- SD (variance) for each gene across q-values and groups - When 'sait_res' provided: automatically ranks genes and selects top 'n_top' by p-value - Single gene: returns a ggplot object; multiple genes: returns a grid plot (2 rows x 2 columns) with shared legend - For multiple genes: legend appears once at the bottom of the grid to avoid repetition and save space - Useful for highlighting specific genes of interest or significant discoveries - Bootstrap mode not supported in gene-specific mode
**Automatic Bootstrap CI Detection**: - When computing divergence or diversity with bootstrap enabled, ci_lower and ci_upper assays are added to the SummarizedExperiment. - This function automatically detects these assays and displays bootstrap confidence interval bands instead of IQR. No additional parameter needed. - For confidence bands to appear, use 'calculate_diversity(..., bootstrap=TRUE, nboot=1000)' or appropriate divergence function with bootstrap enabled.
**Aggregate mode (gene=NULL, sait_res=NULL)**: - If bootstrap CI assays available: A ggplot object showing median entropy with bootstrap confidence interval bands. - If no CI assays: A ggplot object showing median entropy with IQR ribbons (automatic fallback).
**Gene-specific mode (gene or sait_res provided)**: - Single gene: A ggplot object showing median entropy +/- SD for that gene. - Multiple genes: A grid plot object arranged in 2 rows x 2 columns with a shared legend at the bottom. The legend appears once beneath the grid, avoiding repetition across subplots.
# Plot 7: Tsallis entropy q-curve (combined across all sample diversity) data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' # Create configuration (required when metadata is provided) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = seq(0, 2, by = 0.5), ) p <- plot_diversity_spectrum(analysis) # print(p)# Plot 7: Tsallis entropy q-curve (combined across all sample diversity) data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' # Create configuration (required when metadata is provided) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = seq(0, 2, by = 0.5), ) p <- plot_diversity_spectrum(analysis) # print(p)
Creates a side-by-side grid layout with a violin plot on the left and a density plot on the right, both showing Tsallis entropy distribution for the q value in the provided SummarizedExperiment (which should contain a single q value).
Creates a side-by-side grid layout with a violin plot on the left and a density plot on the right, both showing Tsallis entropy distribution for the q value in the provided SummarizedExperiment (which should contain a single q value).
plot_diversity_violin_density( se, assay_name = "diversity", title = NULL, output_file = NULL ) plot_diversity_violin_density( se, assay_name = "diversity", title = NULL, output_file = NULL )plot_diversity_violin_density( se, assay_name = "diversity", title = NULL, output_file = NULL ) plot_diversity_violin_density( se, assay_name = "diversity", title = NULL, output_file = NULL )
se |
A 'SummarizedExperiment' returned by 'calculate_diversity' containing entropy values at a single q value. |
assay_name |
Name of the assay to use (default: 'diversity'). |
title |
Optional base title. If NULL, auto-generated based on q value. |
output_file |
Character or NULL. Optional file path to save the plot as an image. If provided, the plot will be saved with appropriate dimensions. Default: NULL (no file output, only return object). |
A 'ggplot2' object showing a 1x2 grid with violin plot on the left and density plot on the right.
A 'ggplot2' object showing a 1x2 grid with violin plot on the left and density plot on the right.
# Plot 8: Violin and density plots of Tsallis entropy distribution data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = 1.0) p <- plot_diversity_violin_density(analysis) # if (!is.null(p)) print(p) # Plot 8: Violin and density plots of Tsallis entropy distribution data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' # Create configuration (required when metadata is provided) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = 1.0) p <- plot_diversity_violin_density(analysis) # print(p)# Plot 8: Violin and density plots of Tsallis entropy distribution data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = 1.0) p <- plot_diversity_violin_density(analysis) # if (!is.null(p)) print(p) # Plot 8: Violin and density plots of Tsallis entropy distribution data(readcounts) metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' # Create configuration (required when metadata is provided) config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200) analysis <- calculate_diversity(analysis, q = 1.0) p <- plot_diversity_violin_density(analysis) # print(p)
S4 wrapper for . plot_expression() that
extracts data directly from
a TSENATAnalysis object. Automatically retrieves the SummarizedExperiment and
SAIT results for visualizing transcript abundance across conditions.
plot_expression( analysis, gene = NULL, condition_col = NULL, top_n = 4, output_file = NULL, metric = c("median", "mean", "variance", "iqr"), use_tpm = TRUE, width = NULL, height = NULL, fontsize = 16, cellwidth = 0, cellheight = 0, layout_ncol = 2, verbose = FALSE, ... )plot_expression( analysis, gene = NULL, condition_col = NULL, top_n = 4, output_file = NULL, metric = c("median", "mean", "variance", "iqr"), use_tpm = TRUE, width = NULL, height = NULL, fontsize = 16, cellwidth = 0, cellheight = 0, layout_ncol = 2, verbose = FALSE, ... )
analysis |
|
gene |
|
condition_col |
|
top_n |
|
output_file |
|
metric |
|
use_tpm |
|
width |
|
height |
|
fontsize |
|
cellwidth |
|
cellheight |
|
layout_ncol |
|
verbose |
|
... |
Additional arguments passed to the base plotting function. |
This wrapper extracts the following from analysis:
From analysis@se containing transcript counts
From analysis@sait_results$sait_interaction for
gene selection
If no gene is specified, the function automatically selects the top gene from the SAIT results (lowest p-value). This simplifies visualization of genes with significant q x condition interaction effects.
Invisibly returns the output file path (if 'output_file' provided), or invisible(NULL) if rendering to active graphics device. Graphics are rendered to the active grid device for capture during vignette compilation.
TSENATAnalysis for object structure
# Plot 6: Top transcripts across groups data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q_values = seq(0, 2, by = 0.1) ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- suppressWarnings(calculate_sait( analysis, method = 'lmm', verbose = FALSE )) plot_file <- plot_expression(analysis, top_n = 2) # print(plot_file)# Plot 6: Top transcripts across groups data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q_values = seq(0, 2, by = 0.1) ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- suppressWarnings(calculate_sait( analysis, method = 'lmm', verbose = FALSE )) plot_file <- plot_expression(analysis, top_n = 2) # print(plot_file)
S4 wrapper for . plot_jis_delta() that
extracts results
directly from a TSENATAnalysis object. Automatically retrieves jackknife
switching
results from the analysis object slots.
plot_jis_delta( analysis, n_genes = 4, sait_results = NULL, verbose = FALSE, output_file = NULL, ... )plot_jis_delta( analysis, n_genes = 4, sait_results = NULL, verbose = FALSE, output_file = NULL, ... )
analysis |
|
n_genes |
|
sait_results |
|
verbose |
|
output_file |
|
... |
Additional arguments passed to the base function. |
This function extracts the following from analysis:
From analysis@jackknife_results, which
should
contain multi-q switching results keyed by q-value (e.g., 'q_1.00')
From analysis@sait_results$sait_interaction if not
explicitly provided, for ranking genes by significance
The wrapper automatically handles parameter extraction and provides a simplified interface compared to the base function.
A file path (character) to the saved heatmap PNG file, invisibly.
calculate_jis for computing switching results
# Plot 5: Multi-q delta influence (isoform switching) heatmaps data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q = seq(0, 2, by = 0.1) ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- calculate_divergence( analysis, q = seq(0.2, 2, by = 0.4) ) analysis <- suppressWarnings(calculate_sait(analysis, method = 'lmm')) analysis <- calculate_jis( analysis, q = seq(0.2, 2, by = 0.4), n_bootstrap = 20 ) heatmap_file <- plot_jis_delta(analysis, n_genes = 2)# Plot 5: Multi-q delta influence (isoform switching) heatmaps data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q = seq(0, 2, by = 0.1) ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity( analysis, q = seq(0.2, 2, by = 0.4), verbose = FALSE ) analysis <- calculate_divergence( analysis, q = seq(0.2, 2, by = 0.4) ) analysis <- suppressWarnings(calculate_sait(analysis, method = 'lmm')) analysis <- calculate_jis( analysis, q = seq(0.2, 2, by = 0.4), n_bootstrap = 20 ) heatmap_file <- plot_jis_delta(analysis, n_genes = 2)
S4 wrapper that accepts a TSENATAnalysis object and generates GAM q-curve plots
plot_sait( analysis, n_top = 6, genes = NULL, condition_col = NULL, sig_alpha = 0.05, assay_name = "diversity", output_file = NULL, width = 12, height = NULL, verbose = FALSE, ... )plot_sait( analysis, n_top = 6, genes = NULL, condition_col = NULL, sig_alpha = 0.05, assay_name = "diversity", output_file = NULL, width = 12, height = NULL, verbose = FALSE, ... )
analysis |
|
n_top |
|
genes |
|
condition_col |
|
sig_alpha |
|
assay_name |
|
output_file |
|
width |
|
height |
|
verbose |
|
... |
Additional arguments passed to the base function. |
This wrapper automatically:
1. Extracts SummarizedExperiment from @se slot
2. Extracts SAIT results from @sait_results$sait_interaction slot
3. Detects condition_col from @config or uses default
4. Calls .plot_sait() with extracted parameters
**Parameter Resolution (condition_col):**
Explicit condition_col parameter (highest priority)
@config$condition_col if available
@config$condition if available
Default: 'sample_type'
A single ggplot object with all selected genes arranged in a
grid layout. Can be saved with ggplot2::ggsave().
calculate_sait for
running LM analysis on TSENATAnalysis.
## Not run: # Plot 3: GAM q-curves for genes with q-by-condition interactions data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q = seq(0.2, 2, by = 0.4) # 5 unique q-values: 0.2, 0.6, 1.0, 1.4, 1.8 ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity(analysis, q = seq(0.2, 2, by = 0.4)) analysis <- suppressWarnings(calculate_sait(analysis, method = 'gam')) p_gam <- plot_sait(analysis, n_top = 2, sig_alpha = 0.15) # print(p_gam) ## End(Not run)## Not run: # Plot 3: GAM q-curves for genes with q-by-condition interactions data(readcounts) readcounts <- as.matrix(readcounts) mode(readcounts) <- 'numeric' metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_dataset <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') # Configure analysis parameters first config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', subject_col = 'paired_samples', paired = TRUE, control = 'normal', q = seq(0.2, 2, by = 0.4) # 5 unique q-values: 0.2, 0.6, 1.0, 1.4, 1.8 ) # Build analysis with configured parameters analysis <- build_analysis( readcounts = readcounts, tx2gene = gff3_dataset, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) analysis <- filter_analysis(analysis, stringency = 'severe') analysis <- calculate_diversity(analysis, q = seq(0.2, 2, by = 0.4)) analysis <- suppressWarnings(calculate_sait(analysis, method = 'gam')) p_gam <- plot_sait(analysis, n_top = 2, sig_alpha = 0.15) # print(p_gam) ## End(Not run)
A dataset containing transcript-level transcript-level read count abundances from salmon quantification. The dataset includes samples as columns (SRR14800475-SRR14800490) and transcript IDs as row names.
data(readcounts)data(readcounts)
A data frame with 3514 transcripts (rows) and 16 samples (columns). Row names are transcript IDs (ENST format) and column names are sample IDs. Values represent TPM-normalized read counts.
A matrix (data frame) containing transcript-level TPM-normalized read
counts from salmon quantification. Rows represent transcripts (ENST format
IDs) and columns represent samples. Aliases include readcounts
(main read count matrix), tpm (TPM values), and
effective_length (transcript lengths from salmon).
data(readcounts, package = 'TSENAT') dim(readcounts) head(readcounts[1:4, 1:3])data(readcounts, package = 'TSENAT') dim(readcounts) head(readcounts[1:4, 1:3])
Provides flexible access to diversity, divergence, and statistical test results with options for ranking, filtering, and format conversion.
results( analysis, type, q = NULL, rankBy = "none", n = NA, filterFDR = NULL, format = "text", n_genes = 4, q_values_table = c(0, 0.5, 1, 1.5, 2), top_n = NULL, sort_by = "adj_p_interaction", sample = NULL, plot = FALSE )results( analysis, type, q = NULL, rankBy = "none", n = NA, filterFDR = NULL, format = "text", n_genes = 4, q_values_table = c(0, 0.5, 1, 1.5, 2), top_n = NULL, sort_by = "adj_p_interaction", sample = NULL, plot = FALSE )
analysis |
|
type |
|
q |
|
rankBy |
|
n |
|
filterFDR |
|
format |
|
n_genes |
|
q_values_table |
|
top_n |
|
sort_by |
|
sample |
|
plot |
|
This function provides flexible access to all computed results with ranking, filtering, and format conversion. Compatible with DESeq2/edgeR design patterns for familiar result extraction workflows.
**Lazy Computation for switching_tables:**
When requesting type = 'switching_tables', the function automatically
computes and caches the tables if they don't exist yet but the prerequisites
do (LM and jackknife results). This eliminates the need for a separate
prepare_gene_switching_tables_s4() call - simply request the results
and they will be computed on-demand.
- For diversity with q=NULL: A named list of SummarizedExperiment objects, one per q-value - For diversity with q specified and format='text' or 'table': A data.frame table for display - For diversity with q specified and format='se': The SummarizedExperiment object directly (useful for SplicingFactory) - For divergence: A SummarizedExperiment (rows=genes, columns=q-values), data.frame, or other format depending on divergence computation method - For lm/jackknife: A data.frame or list based on type and format - For effect_sizes_divergence: A list containing effect size divergence results with components like interaction_results - For assumptions: A list containing rank-based assumption checks (exchangeability, monotonicity, consistency) and optional method-specific diagnostics (gam_metrics, gee_metrics, lmm_metrics, fpca_metrics) - For switching_tables with format='text' (default): A list with components:
$gene_headers Character vector of gene headers ('GeneName (ENSG00...)')
$comparison_tables List of data frames, one per gene, with columns: Transcript, q=0.00, q=0.50, ..., Direction Consistency
$q_metadata List with per-gene metadata: q_values_available and q_key_to_value mapping
- For switching_tables with format='raw': Original named list where names are gene headers and values are data frames with same column structure as format='text' Returns NULL if requested result type not computed or no results pass filtering.
# Load example data data(readcounts, package = 'TSENAT') # Create TSENATAnalysis from count matrix config <- TSENAT_config( q = 1.0, condition_col = 'group' ) se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = readcounts), colData = data.frame( group = rep(c('A', 'B'), length.out = ncol(readcounts)) ) ) analysis <- TSENATAnalysis(se = se, config = config) analysis <- calculate_diversity(analysis) # Get all diversity results (list of SummarizedExperiment objects, one per q) div_all <- results(analysis, type = 'diversity') # Get diversity for specific q-value (single SummarizedExperiment table for display) div_q1 <- results(analysis, type = 'diversity', q = 1.0) # Get diversity as SummarizedExperiment for downstream processing div_q1_se <- results(analysis, type = 'diversity', q = 1.0, format = 'se') # Get results ranked by p-value, top 20 genes # Using accessor function instead of @ slot access top_sait <- results(analysis, type = "sait", rankBy = 'pvalue', n = 20) # Get effect size results, top 6 genes by p-value (most significant first) top_effect_sizes <- results(analysis, type = 'effect_sizes_divergence', top_n = 6, sort_by = 'adj_p_interaction') # Get effect sizes sorted by mean divergence (largest effect sizes first) large_effects <- results(analysis, type = 'effect_sizes_divergence', top_n = 10, sort_by = 'Mean_Divergence') # Get switching tables - automatically computed if prerequisites exist # (no need to call prepare_gene_switching_tables_s4 separately) # Default format='text' returns structured list for vignette rendering switching <- results(analysis, type = 'switching_tables') # Get switching tables in raw format (named list of data frames) for direct manipulation switching_raw <- results(analysis, type = 'switching_tables', format = 'raw') # ======================================================================== # RETRIEVE CACHED PLOTS using the plot parameter # ======================================================================== # Get the diversity spectrum plot diversity_plot <- results(analysis, type = 'diversity', plot = TRUE) # Get the LM/regularized regression (GAM, LMM, GEE, FPCA) interaction plot sait_plot <- results(analysis, type = "sait", plot = TRUE) # Get the divergence distribution plot div_dist_plot <- results(analysis, type = 'divergence', plot = TRUE) # Get the influence/m-estimator plot influence_plot <- results(analysis, type = 'influence', plot = TRUE) # Available plot types correspond to analysis types: # - type = 'diversity': Returns Tsallis entropy q-spectrum visualization # - type = "sait": Returns regularized/penalized regression (GAM, LMM, GEE, FPCA) interaction plot # - type = 'divergence': Returns distribution of divergence metrics across genes # - type = 'influence': Returns m-estimator sample influence analysis # - type = 'rank_test': Returns Scheirer-Ray-Hare interaction visualization # - type = 'concordance': Returns method concordance comparison (LM vs rank test)# Load example data data(readcounts, package = 'TSENAT') # Create TSENATAnalysis from count matrix config <- TSENAT_config( q = 1.0, condition_col = 'group' ) se <- SummarizedExperiment::SummarizedExperiment( assays = list(counts = readcounts), colData = data.frame( group = rep(c('A', 'B'), length.out = ncol(readcounts)) ) ) analysis <- TSENATAnalysis(se = se, config = config) analysis <- calculate_diversity(analysis) # Get all diversity results (list of SummarizedExperiment objects, one per q) div_all <- results(analysis, type = 'diversity') # Get diversity for specific q-value (single SummarizedExperiment table for display) div_q1 <- results(analysis, type = 'diversity', q = 1.0) # Get diversity as SummarizedExperiment for downstream processing div_q1_se <- results(analysis, type = 'diversity', q = 1.0, format = 'se') # Get results ranked by p-value, top 20 genes # Using accessor function instead of @ slot access top_sait <- results(analysis, type = "sait", rankBy = 'pvalue', n = 20) # Get effect size results, top 6 genes by p-value (most significant first) top_effect_sizes <- results(analysis, type = 'effect_sizes_divergence', top_n = 6, sort_by = 'adj_p_interaction') # Get effect sizes sorted by mean divergence (largest effect sizes first) large_effects <- results(analysis, type = 'effect_sizes_divergence', top_n = 10, sort_by = 'Mean_Divergence') # Get switching tables - automatically computed if prerequisites exist # (no need to call prepare_gene_switching_tables_s4 separately) # Default format='text' returns structured list for vignette rendering switching <- results(analysis, type = 'switching_tables') # Get switching tables in raw format (named list of data frames) for direct manipulation switching_raw <- results(analysis, type = 'switching_tables', format = 'raw') # ======================================================================== # RETRIEVE CACHED PLOTS using the plot parameter # ======================================================================== # Get the diversity spectrum plot diversity_plot <- results(analysis, type = 'diversity', plot = TRUE) # Get the LM/regularized regression (GAM, LMM, GEE, FPCA) interaction plot sait_plot <- results(analysis, type = "sait", plot = TRUE) # Get the divergence distribution plot div_dist_plot <- results(analysis, type = 'divergence', plot = TRUE) # Get the influence/m-estimator plot influence_plot <- results(analysis, type = 'influence', plot = TRUE) # Available plot types correspond to analysis types: # - type = 'diversity': Returns Tsallis entropy q-spectrum visualization # - type = "sait": Returns regularized/penalized regression (GAM, LMM, GEE, FPCA) interaction plot # - type = 'divergence': Returns distribution of divergence metrics across genes # - type = 'influence': Returns m-estimator sample influence analysis # - type = 'rank_test': Returns Scheirer-Ray-Hare interaction visualization # - type = 'concordance': Returns method concordance comparison (LM vs rank test)
Extract SummarizedExperiment from TSENATAnalysis
se(object) ## S4 method for signature 'TSENATAnalysis' se(object)se(object) ## S4 method for signature 'TSENATAnalysis' se(object)
object |
|
Provides read-only access to the underlying SummarizedExperiment. To modify
the SummarizedExperiment, use the configuration methods or directly access
object@se.
SummarizedExperiment object
The SummarizedExperiment object stored in @se slot,
containing raw count data and sample metadata.
# Create a simple TSENATAnalysis object library(SummarizedExperiment) counts <- matrix(rpois(100, lambda = 10), nrow = 10, ncol = 10) colnames(counts) <- paste0('sample_', 1:10) rownames(counts) <- paste0('gene_', 1:10) se <- SummarizedExperiment(assays = list(counts = counts)) analysis <- new('TSENATAnalysis', se = se, config = list()) # Extract the SummarizedExperiment extracted_se <- se(analysis) dim(extracted_se)# Create a simple TSENATAnalysis object library(SummarizedExperiment) counts <- matrix(rpois(100, lambda = 10), nrow = 10, ncol = 10) colnames(counts) <- paste0('sample_', 1:10) rownames(counts) <- paste0('gene_', 1:10) se <- SummarizedExperiment(assays = list(counts = counts)) analysis <- new('TSENATAnalysis', se = se, config = list()) # Extract the SummarizedExperiment extracted_se <- se(analysis) dim(extracted_se)
Coordinates the full TSENAT workflow: diversity -> jackknife -> LM interactions -> divergence -> gene interactions -> rank-based tests -> concordance -> visualizations.
TSENAT( analysis, output_dir = "tsenat_outputs", save_output = TRUE, output_format = c("tsv", "csv", "txt", "rds"), verbose = TRUE )TSENAT( analysis, output_dir = "tsenat_outputs", save_output = TRUE, output_format = c("tsv", "csv", "txt", "rds"), verbose = TRUE )
analysis |
|
output_dir |
|
save_output |
|
output_format |
|
verbose |
|
Pipeline execution order (enforced, follows TSENAT.Rmd vignette):
filter_analysis() - Filter low-abundance transcripts
calculate_diversity() - Tsallis entropy per q-value
plot_diversity_spectrum() - Visualize q-spectrum
calculate_m_estimator() - Sample influence QC analysis
calculate_sait() - LM/SAIT interaction testing
plot_sait() - SAIT results visualization
calculate_jis() - Transcript switching detection
plot_jis_delta() - Multi-q influence heatmap (gene switching tables computed lazily via results())
plot_expression() - Top transcript visualization
calculate_divergence() - Pairwise divergence metrics
calculate_effect_sizes() - Effect size computation
plot_divergence_distribution() - Divergence distribution plot
plot_divergence_spectrum() - Divergence spectrum plot
calculate_assumptions() - Validate rank-based test assumptions
calculate_srh() - Scheirer-Ray-Hare rank-based interaction test
calculate_concordance() - Compare LM and rank test results
TSENATAnalysis object containing complete analysis results,
plots, and metadata.
data(readcounts, package = 'TSENAT') metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q = seq(0, 2, length.out = 10), generate_plots = FALSE ) analysis <- build_analysis( readcounts = as.matrix(readcounts), tx2gene = gff3_file, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) result <- TSENAT(analysis)data(readcounts, package = 'TSENAT') metadata_df <- read.table( system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t' ) gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config( sample_col = 'sample', condition_col = 'condition', q = seq(0, 2, length.out = 10), generate_plots = FALSE ) analysis <- build_analysis( readcounts = as.matrix(readcounts), tx2gene = gff3_file, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length ) result <- TSENAT(analysis)
Builds a configuration list for use with TSENAT().
Allows specifying analysis parameters once and reusing across multiple
analyses.
TSENAT_config( q = 1, condition_col = "condition", subject_col = NULL, sample_col = "sample", paired = FALSE, control = NULL, p_threshold = 0.05, fdr_threshold = 0.05, significance_threshold = 0.05, bootstrap = FALSE, nboot = 1000, bootstrap_method = c("percentile", "bca"), stringency = "medium", nthreads = 1, norm = TRUE, bootstrap_ci = 0.95, bootstrap_include_diagnostics = TRUE, min_valid_frac = 0.75, norm_method = NULL, pseudocount = 0, shrinkage = "none", sait_method = c("gam", "lmm", "fpca", "gee"), sait_pcorr = c("BH", "bonferroni", "hochberg", "holm"), jis_use_sait_fdr = TRUE, divergence_ci = 0.95, assumptions_checks = c("rank", "gam", "all"), ... )TSENAT_config( q = 1, condition_col = "condition", subject_col = NULL, sample_col = "sample", paired = FALSE, control = NULL, p_threshold = 0.05, fdr_threshold = 0.05, significance_threshold = 0.05, bootstrap = FALSE, nboot = 1000, bootstrap_method = c("percentile", "bca"), stringency = "medium", nthreads = 1, norm = TRUE, bootstrap_ci = 0.95, bootstrap_include_diagnostics = TRUE, min_valid_frac = 0.75, norm_method = NULL, pseudocount = 0, shrinkage = "none", sait_method = c("gam", "lmm", "fpca", "gee"), sait_pcorr = c("BH", "bonferroni", "hochberg", "holm"), jis_use_sait_fdr = TRUE, divergence_ci = 0.95, assumptions_checks = c("rank", "gam", "all"), ... )
q |
|
condition_col |
|
subject_col |
|
sample_col |
|
paired |
|
control |
|
p_threshold |
|
fdr_threshold |
|
significance_threshold |
|
bootstrap |
|
nboot |
|
bootstrap_method |
|
stringency |
|
nthreads |
|
norm |
|
bootstrap_ci |
|
bootstrap_include_diagnostics |
|
min_valid_frac |
|
norm_method |
|
pseudocount |
|
shrinkage |
|
sait_method |
|
sait_pcorr |
|
jis_use_sait_fdr |
|
divergence_ci |
|
assumptions_checks |
|
... |
Additional configuration parameters (stored as-is). |
Configuration is stored in the TSENATAnalysis@config slot and used by wrapper functions to configure analysis behavior. Note: Statistical tests (Wilcoxon, shuffle) work on a single q-value, so only one q-value is specified in config.
list with class TSENATConfig containing all
specified parameters.
# Default config with standard parameters (point estimates only) cfg <- TSENAT_config() # For Wilcoxon/shuffle tests (single q-value required in config) cfg <- TSENAT_config( q = 1.0, # Shannon entropy - for rank tests condition_col = 'treatment', control = 'untreated' ) # For Scheirer-Ray-Hare rank tests (multiple q-values) cfg <- TSENAT_config( q = seq(0, 2, by = 0.5), # Multiple q-values for spectrum or advanced testing condition_col = 'treatment', control = 'untreated' ) # With bootstrap CIs for uncertainty quantification (recommended) cfg <- TSENAT_config( bootstrap = TRUE, # Enable bootstrap confidence intervals bootstrap_method = 'bca', # Bias-corrected (better for skewed entropy) nboot = 1000, # 1000 resamples bootstrap_ci = 0.95 # 95% CI ) # Custom with paired analysis, strict filtering, and normalization cfg <- TSENAT_config( q = 1.0, # Shannon entropy condition_col = 'treatment', subject_col = 'subject_id', paired = TRUE, control = 'untreated', stringency = 'severe', # High-confidence transcripts only norm_method = 'zscore', # Cross-study standardization shrinkage = 'none', # Empirical estimates bootstrap = TRUE, bootstrap_method = 'bca', nboot = 5000, # Higher precision pseudocount = 0, # Disabled by default; set > 0 to add pseudocount significance_threshold = 0.01 # Stricter significance level )# Default config with standard parameters (point estimates only) cfg <- TSENAT_config() # For Wilcoxon/shuffle tests (single q-value required in config) cfg <- TSENAT_config( q = 1.0, # Shannon entropy - for rank tests condition_col = 'treatment', control = 'untreated' ) # For Scheirer-Ray-Hare rank tests (multiple q-values) cfg <- TSENAT_config( q = seq(0, 2, by = 0.5), # Multiple q-values for spectrum or advanced testing condition_col = 'treatment', control = 'untreated' ) # With bootstrap CIs for uncertainty quantification (recommended) cfg <- TSENAT_config( bootstrap = TRUE, # Enable bootstrap confidence intervals bootstrap_method = 'bca', # Bias-corrected (better for skewed entropy) nboot = 1000, # 1000 resamples bootstrap_ci = 0.95 # 95% CI ) # Custom with paired analysis, strict filtering, and normalization cfg <- TSENAT_config( q = 1.0, # Shannon entropy condition_col = 'treatment', subject_col = 'subject_id', paired = TRUE, control = 'untreated', stringency = 'severe', # High-confidence transcripts only norm_method = 'zscore', # Cross-study standardization shrinkage = 'none', # Empirical estimates bootstrap = TRUE, bootstrap_method = 'bca', nboot = 5000, # Higher precision pseudocount = 0, # Disabled by default; set > 0 to add pseudocount significance_threshold = 0.01 # Stricter significance level )
Creates a new TSENATAnalysis object with a SummarizedExperiment base and optional initial configuration.
TSENATAnalysis(se, config = list())TSENATAnalysis(se, config = list())
se |
|
config |
|
The constructor initializes all slots with empty lists except @se, which must be provided. The @metadata slot automatically records: - creation timestamp - TSENAT package version - initial function call
A new TSENATAnalysis object.
# Load real TSENAT data data(readcounts) metadata_df <- read.table(system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t') gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_file, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200)# Load real TSENAT data data(readcounts) metadata_df <- read.table(system.file('extdata', 'metadata.tsv', package = 'TSENAT'), header = TRUE, sep = '\t') gff3_file <- system.file('extdata', 'annotation.gff3.gz', package = 'TSENAT') config <- TSENAT_config(sample_col = 'sample', condition_col = 'condition') analysis <- build_analysis(readcounts = readcounts, tx2gene = gff3_file, metadata = metadata_df, config = config, tpm = tpm, effective_length = effective_length) analysis <- filter_analysis(analysis, min_samples = 1, subset_n_genes = 200)
Central container for unified analysis workflows in TSENAT.
## S4 method for signature 'TSENATAnalysis,ANY,ANY,ANY' x[i, j, drop = TRUE]## S4 method for signature 'TSENATAnalysis,ANY,ANY,ANY' x[i, j, drop = TRUE]
x |
A |
i |
Gene indices (numeric, logical, or character vector). Defaults to all genes. |
j |
Sample indices (numeric, logical, or character vector). Defaults to all samples. |
drop |
Ignored for TSENATAnalysis objects; included for S4 method signature compatibility. |
The TSENATAnalysis class encapsulates all components of a complete
TSENAT analysis: raw data, configuration metadata, and results from each
analytical step. This unified object ensures metadata is never lost through
the analysis pipeline and provides consistent accessor methods for result
retrieval.
Access results via the unified results(obj, type = ...) accessor method:
- type='diversity' for Tsallis entropy across q-values
- type = "sait" for regularized/penalized regression (GAM, LMM, GEE, FPCA) interaction results
- type='rank_test' for Scheirer-Ray-Hare rank-based test results
- type='divergence' for divergence metrics
- type='jackknife' for jackknife resampling results
Use metadata(obj) to access reproducibility metadata.
A new TSENATAnalysis object containing only the specified
genes and samples, with all associated results and metadata preserved.
seSummarizedExperiment. The base expression data object
(genes x samples) with assays and colData.
configlist. Configuration metadata specifying analysis
parameters that persist through the workflow (q-values, sample grouping
columns, etc.). Set once via TSENAT_config() and used by all
downstream wrapper functions.
diversity_resultslist. Named list of diversity calculation
results. Each name corresponds to a q-value (e.g., 'q_0.5', 'q_1.0').
Values are SummarizedExperiment objects or data.frames containing entropy
values for each gene at that q-value.
sait_resultslist. Complex results from scale-adaptive interaction models (GAM with smoothing, LMM with random effects, GEE with working correlations, FPCA for functional data)
(GAM, LMM, GEE, FPCA) and statistical testing. Top-level names identify analysis type:
sait_interactionRegularized regression (GAM/LMM/GEE/FPCA) model results (list with
$results data.frame, $models list, etc.)
rank_testScheirer-Ray-Hare rank-based test results
divergence_differenceDifferential divergence comparison
pairwise_resultslist. Pairwise group comparison results.
Results from running statistical tests for group comparisons.
rank_test_resultslist. Scheirer-Ray-Hare rank-based statistical
test results. Names correspond to q-values (e.g., 'q_0.5', 'q_1.0').
Computed by calculate_srh() as a non-parametric alternative
to linear mixed model testing.
jackknife_resultslist. Resampling-based confidence intervals.
Names correspond to q-values (e.g., 'q_0.5', 'q_1.0'). Values are
jackknife result objects containing resamples, CI bounds, and diagnostics.
divergence_resultslist. Divergence metric calculations.
Typically contains:
tsallis_divergenceSummarizedExperiment with divergence values
effect_sizesdata.frame with Cohen's d, etc.
plotslist. Cached visualization objects (ggplot). Names
identify plot type (e.g., 'q_curve', 'sait_interaction', 'influence').
Populated by TSENAT() if generate_plots=TRUE.
metadatalist. Reproducibility and tracking metadata.
Automatically maintained by wrapper functions. Includes:
created_atTimestamp of object creation
function_callsVector of wrapper functions called
function_timestampsTimestamps for each function call
package_versionTSENAT version at creation