---
title: "Appendix B: Non-Parametric Validation of Scale-Adaptive Interaction Test Results via GAMM and Rank-Based Methods"
author: "Cristobal Gallardo <gallardoalba@pm.me>"
date: "`r Sys.Date()`"
output:
  BiocStyle::html_document:
    toc: true
    toc_float: true
    toc_depth: 2
  rmarkdown::pdf_document:
    toc: true
    fig_width: 10
    fig_height: 6.5
    toc_depth: 2
    latex_engine: xelatex
header-includes:
  - \usepackage{float}
  - \usepackage{tabu}
bibliography: TSENAT.bib
suppress-bibliography: true
vignette: |
  %\VignetteIndexEntry{3. TSENAT Appendix B: Non-Parametric Validation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r global-setup-appendix-b, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.width = 10,
    fig.height = 6.5,
    message = FALSE,
    warning = FALSE,
    dev = "png",
    dpi = 300
)
suppressPackageStartupMessages(library(kableExtra))
```

# Introduction

The primary goal of this vignette is to validate that discoveries of scale-dependent entropic index × group interactions from Scale-Adaptive Interaction Tests (SAIT) generalize to non-parametric statistical frameworks with minimal assumptions.

## Two Complementary Validation Approaches

**Method 1: Generalized Additive Mixed Models (GAMM)** with ARIMA-Ordered Measurement Structure.

- Framework: Semi-parametric model combining flexible smooth functions (thin-plate splines) with mixed-effects structure; assumes additive model Y = f₁(q) + f₂(condition) + f₃(q, condition) + subject-intercept + ε where f terms are smooth rather than linear.
- Paired design structure: Uses random intercepts by subject (~1|subject) with AR(1) correlation structure to accommodate repeated entropy measurements across q-values within each subject. This hierarchical approach preserves paired sample structure while modeling autocorrelation in q-ordered measurements.
- Distributional assumption: Uses Gaussian residuals (standard for mixed-effects models); automatically detects bounded support [0,1] entropy but applies Gaussian family due to `mgcv::gamm()` limitations with extended families in paired designs.
- Treatment of entropic structure: Treats q-values as time-like ordered measurements within each subject, applies ARIMA(1,1,0) first-differencing to remove monotone trend and ensure stationarity; AR(1) correlation structure captured through nlme::corAR1() specification.
- Heteroscedasticity handling: Automatically detects variance heterogeneity (Breusch-Pagan test); ; however, heteroscedasticity-based variance weights are not applied in paired designs due to `mgcv::gamm()` limitations.
- Key advantage: Captures smooth nonlinear q × condition interactions while accommodating paired sample correlation structure through random intercepts and AR(1) correlation modeling; includes small-sample bias correction for n < 20. 

**Method 2: Rank-Based Scheirer-Ray-Hare Test (SRH)** with Hochberg Step-Up Correction.

- Framework: Pure non-parametric method; zero parametric or distributional assumptions.
- Distributional assumption: None; operates entirely on ranks (1, 2, ..., n), equivalent to two-way ANOVA on ordinal data.
- Treatment of q-ordering: Two-way SRH test on ranks, treating q-values as within-subject paired measurements; decomposes ranked data via ANOVA and computes F-statistic.
- Heteroscedasticity handling: Rank transformation inherently robust to heteroscedasticity (variance differences do not affect relative ordering); Hochberg step-up correction controls FWER across multiple tests.
- Key advantage: Maximally robust to outliers and any distributional violations; valid under any continuous distribution and correlation structure.


## Why Two Methods for One Question?

Statistical testing in transcriptomics faces a fundamental challenge: no single method is universally optimal. Different approaches trade-off power for robustness:

| Aspect | Parametric (GAMM) | Non-Parametric (Rank-Based) |
|--------|------------------|----------------------------|
| Assumptions | Normality, homoscedasticity | Ranks only; fully non-parametric |
| Power | Highest (if assumptions hold) | Good; reduced but robust to violations |
| Robustness | Moderate | Highest |
| Outlier sensitivity | Moderate potential for bias | Minimal; inherently resistant |

Validation strategy: Concordance between both methods confirms robustness across statistical frameworks.

***

# Setup

This section initializes the analysis environment by loading required packages, setting a random seed for reproducibility, and preparing the test dataset.

```{r load-packages, message=FALSE}
suppressPackageStartupMessages({
    library(TSENAT)
    library(ggplot2)
    library(SummarizedExperiment)
    library(dplyr)
    library(gridExtra)
})

set.seed(42)

# Load preprocessed dataset
data(readcounts)
readcounts <- as.matrix(readcounts)
mode(readcounts) <- "numeric"

# Load metadata
metadata_df <- read.table(
    system.file("extdata", "metadata.tsv", package = "TSENAT"),
    header = TRUE, sep = "\t"
)

gff3_dataset <- system.file("extdata", "annotation.gff3.gz", package = "TSENAT")

# Configure analysis parameters first (best practice: fail-fast validation)
config <- TSENAT_config(
  sample_col = "sample",
  condition_col = "condition",
  subject_col = "paired_samples",
  q = seq(0, 2, by = 0.05),
  nthreads = 3,
  paired = TRUE,
  control = "normal"
)

# Build analysis object: creates SummarizedExperiment and initializes TSENATAnalysis
analysis <- build_analysis(
    readcounts = readcounts,
    tx2gene = gff3_dataset,
    metadata = metadata_df,
    tpm = tpm,
    effective_length = effective_length,
    config = config
)

# Apply filtering for quality control
analysis <- filter_analysis(
    analysis,
    stringency = "medium"
)
```

## Computing Tsallis Entropy with Pseudocount Regularization

Pseudocounts are critical for statistical robustness when applying rank-based tests to sparse RNA-seq count data. RNA-seq experiments typically contain many zero or near-zero counts, which creates two problems for rank-based nonparametric methods:

1. **Ties in rankings:** Sparse counts produce many tied values (especially zeros), which reduces the discriminatory power of rank tests. Rank-based statistics depend on unique orderings; when many observations are identical, the test statistic contains less information, reducing statistical power.

2. **Library size artifacts:** Small count differences due to sequencing depth variations overshadow true biological differences. Normalized library sizes (accounting for sequencing depth) ensure that entropy estimates reflect true biological diversity rather than technical artifacts [@S195; @S069].

By adding small pseudocounts proportional to library size, we achieve two benefits:

- **Regularization** breaks ties and prevents zero-inflation bias that compromises rank-based inference.
- **Normalization** makes entropy estimates comparable across samples with different sequencing depths. This approach is standard in RNA-seq analysis and is particularly important for entropy-based diversity metrics, which require positive values for logarithmic transforms.

The `calculate_diversity()` function implements automatic pseudocount selection, scaling pseudocount magnitude and diversity estimates by median library size to ensure robust statistical inference across the range of sequencing depths in your dataset.

```{r compute-tsallis-entropy, message=FALSE, results='hide'}
# Compute diversity using S4 wrapper with bootstrap confidence intervals
analysis <- calculate_diversity(
  analysis, 
  norm = TRUE,
  pseudocount = "auto"
)
```

# Rank-Based Approach (SRH)

The SRH test is a two-way non-parametric ANOVA that operates on ranks rather than raw values. It is particularly well-suited for Tsallis entropy analysis for three fundamental reasons:

1. **Robustness to Distribution Violations**

Tsallis entropy values exhibit inherent distributional properties that often violate parametric assumptions:

- Non-normality: Entropy commonly exhibits bounded support (e.g., normalized entropy in [0,1]), producing skewed or bimodal distributions rather than normal distributions.
- Heteroscedasticity: Variance in entropy estimates varies systematically across q-values. Low q-values (emphasizing rare transcripts) produce volatile entropy estimates with high variance; high q-values (emphasizing abundant transcripts) produce stable estimates with lower variance.
- Outlier sensitivity: Rare isoforms and count variability can produce extreme entropy values that heavily influence parametric tests.

The SRH test avoids these issues by replacing raw entropy values with their ranks (1, 2, 3, ..., n). Ranks are **uniformly distributed and contain no outliers**, making the method valid for ANY continuous distribution—normal, skewed, bimodal, or otherwise. Critically, rank ordering is unaffected by normalization choice; whether entropy is normalized, log-transformed, or left raw, the test produces identical results [@S270; @S271].

2. **Paired Design for Ordered q-Value Structure**

Tsallis entropy has a fundamental sequential property: entropy curves are smooth functions of q. As q increases from 0 (rare-species-emphasizing) to ∞ (common-species-emphasizing), entropy values change systematically. This ordered structure is critical to interpreting q-dependent patterns and exhibits **AR(1) autocorrelation**: consecutive q-values produce correlated entropy estimates.

The SRH test handles this structure by treating measurements as paired/repeated within genes across ordered q-values, capturing the sequential nature while maintaining exchangeability of samples. Autocorrelation in the original entropy values does NOT affect rank ordering or the validity of inference—this property, known as the *exchangeability property* of rank-based tests, ensures that rank-based inference is valid under any correlation pattern [@S019; @C018]. Within-subject ranking preserves pairing structure while removing the influence of AR(1) dependence through rank transformation [@S030; @K003].

3. **Interaction Testing: Testing q × Condition Effects**

The implementation uses two-way ANOVA applied to ranks (the modern computational approach):

$$F_{q \times \text{condition}} = \frac{MS_{\text{interaction}}}{MS_{\text{residual}}}$$

where:

- Data are first ranked: $R = \text{rank}(\text{entropy})$ (within-subject ranks for paired designs)
- Two-way ANOVA decomposes ranked data: $R = \mu + \alpha_q + \beta_{\text{condition}} + \gamma_{q \times \text{condition}} + \epsilon$
- $MS_{\text{interaction}}$ = sum of squares for $q \times$ condition interaction / degrees of freedom
- $MS_{\text{residual}}$ = residual sum of squares / residual degrees of freedom
- $F$-statistic follows $F$-distribution under null hypothesis of no interaction

This directly tests the core biological question: "Does entropy q-dependence differ between groups?" The rank-transformed two-way ANOVA is mathematically equivalent to the original SRH test statistic but computationally more stable and easier to interpret. Modern extensions of rank-based testing demonstrate the validity of permutation procedures for testing interactions even under complex dependence structures [@S166].

## Assumption Validation

To ensure valid statistical inference, we verify key data assumptions across multiple dimensions.

```{r validate-srh-assumptions, message=FALSE, results='hide'}
# Validate statistical assumptions (including GAMM diagnostics)
analysis <- calculate_assumptions(
    analysis,
    checks = "all"  # Include core checks + GAMM diagnostics
)

# Get assumptions
assumptions_text <- results(analysis, type = "assumptions")
print(assumptions_text)
```

```{r display-assumption-results, message=FALSE, echo=FALSE}
# Extract assumption check results as structured list
check_results <- results(analysis, type = "assumptions", format = "list")

# Display assumptions table as formatted kable
if (!is.null(check_results) && !is.null(check_results$assumptions_table)) {
  assumptions_df <- check_results$assumptions_table
  
  knitr::kable(
    assumptions_df,
    caption = "**Supplementary Table 14 | Assumption Checks for Scheirer-Ray-Hare Test.** Evaluates homogeneity of variance and normality for rank-based interaction test validity."
  )
}
```


**Interpretation of Results:** The exchangeability test, the unique assumption of the SRH test, detects strong serial correlation (p ≈ 0), indicating that consecutive samples are more correlated than expected by chance. This violation of exchangeability is **not problematic** for SRH because rank-based inference satisfies the *exchangeability property*—a principle ensuring that rank statistics depend only on value ordering, not on correlations in the original data [@S270; @S271; @K003]. The three reasons outlined above (robustness to distribution violations, paired design for ordered structure, and interaction testing) together uniquely suit SRH for multi-q entropy validation without requiring strong distributional assumptions.

## SRH Test: Testing q x Condition Interactions

Now we will use the SRH test to evaluate whether entropy patterns across entropic indices (q-values) differ between normal and tumor samples.

```{r srh-rank-test-analysis, message=FALSE, results='hide'}
# IMPORTANT: Create a copy of analysis before running SRH
# This preserves the GAMM results in 'analysis' for later concordance comparison

# Run SRH test for q * condition INTERACTION on the copy
analysis <- calculate_srh(
    analysis,
    multicorr = "hochberg"
)

# Extract ALL results (no filtering) for summary statistics
srh_results_all <- results(analysis, type = "rank_test", rankBy = "pvalue")

# Extract top significant genes for display
srh_results <- results(analysis, type = "rank_test", rankBy = "pvalue", n = 20, filterFDR = 0.05)
print(head(srh_results, n = 10))
```

```{r srh-results-summary, message=FALSE, echo=FALSE}
# Create summary statistics table with additional metrics using ALL tested genes
# Handle NA and NaN values properly
mean_effect <- mean(srh_results_all$effect_size_eta2, na.rm = TRUE)
median_effect <- median(srh_results_all$effect_size_eta2, na.rm = TRUE)

rank_summary_stats <- data.frame(
  Metric = c(
    "Genes tested", 
    "Significant (p < 0.05)", 
    "Significant (adj_p < 0.05, FWER-controlled)", 
    "NAs",
    "Mean effect size ($\\eta^2$)",
    "Median effect size ($\\eta^2$)",
    "Strong effect genes ($\\eta^2$ > 10%)"
  ),
  Value = c(
    nrow(srh_results_all),
    sum(srh_results_all$p_value < 0.05, na.rm = TRUE),
    sum(srh_results_all$adj_p_value < 0.05, na.rm = TRUE),
    sum(is.na(srh_results_all$p_value)),
    if (is.na(mean_effect) || !is.finite(mean_effect)) "Not available" else sprintf("%.1f%%", mean_effect * 100),
    if (is.na(median_effect) || !is.finite(median_effect)) "Not available" else sprintf("%.1f%%", median_effect * 100),
    sum(srh_results_all$p_value < 0.05 & srh_results_all$effect_size_eta2 > 0.10, na.rm = TRUE)
  )
)

knitr::kable(rank_summary_stats, caption = "**Supplementary Table 15 | Scheirer-Ray-Hare Test Results Summary.** Rank-based nonparametric test of *q* × condition interactions. Compare with GAMM results (main vignette Table 3) to assess parametric vs. nonparametric method concordance.", label = "tab:srh-test-summary", align = c("l", "r"))

# Show top genes results
rank_sig <- srh_results_all[!is.na(srh_results_all$p_value), ]
if (nrow(rank_sig) > 0) {
  rank_sig <- rank_sig[order(rank_sig$adj_p_value), ]
  rank_sig <- rank_sig[1:min(10, nrow(rank_sig)), ]
  
  # Select columns: gene, p-values, f-statistic, effect size, interaction class
  cols_to_show <- intersect(c("gene", "p_value", "adj_p_value", "f_statistic", "effect_size_eta2", 
                              "interaction_class"), 
                            colnames(rank_sig))
  if (length(cols_to_show) > 0) {
    rank_display <- rank_sig[, cols_to_show, drop = FALSE]
    
    # Format p-values with scientific notation for very small values
    if ("p_value" %in% colnames(rank_display)) {
      rank_display$p_value <- sapply(rank_display$p_value, function(x) {
        if (is.na(x)) "NA" else if (x >= 0.9999) "1" else if (x < 0.0001) format(signif(x, 2), scientific=TRUE) else sprintf("%.6f", x)
      })
    }
    
    if ("adj_p_value" %in% colnames(rank_display)) {
      rank_display$adj_p_value <- sapply(rank_display$adj_p_value, function(x) {
        if (is.na(x)) "NA" else if (x >= 0.9999) "1" else if (x < 0.0001) format(signif(x, 2), scientific=TRUE) else sprintf("%.6f", x)
      })
    }
    
    # Replace test method name for clarity (removed - column no longer included)
    # if ("test_method" %in% colnames(rank_display)) {
    #   rank_display$test_method <- sapply(rank_display$test_method, function(x) {
    #     if (is.na(x)) "NA" else if (x == "srh_paired") "Scheirer-Ray-Hare (paired)" else x
    #   })
    # }
    
    # Format other numeric columns
    if ("f_statistic" %in% colnames(rank_display)) {
      rank_display$f_statistic <- sapply(rank_display$f_statistic, function(x) {
        if (is.na(x)) "NA" else sprintf("%.4f", x)
      })
    }
    
    if ("effect_size_eta2" %in% colnames(rank_display)) {
      rank_display$effect_size_eta2 <- sapply(rank_display$effect_size_eta2, function(x) {
        if (is.na(x)) "NA" else sprintf("%.4f", x)
      })
    }
    
    knitr::kable(rank_display, 
                 caption = "**Supplementary Table 16 | Top genes identified by Scheirer-Ray-Hare rank-based test.** Ranked by adjusted p-value; comparison with GAMM (main vignette Table 3) reveals method-specific sensitivities.",
                 col.names = c("Gene", "P-value", "Adj. P-value", "F-Statistic", 
                               "Effect Size (η²)", "Interaction Class")[1:ncol(rank_display)])
  }
}
```

Visualize q-curves for top genes:

```{r extended-fig-1-srh-q-curves, fig.width=12, fig.height=8, message=FALSE, out.width="98%", fig.pos="H", fig.cap="**Supplementary Figure 1 | SRH rank test results for scale-dependent isoform switching.** Q-curves for top genes identified by Scheirer-Ray-Hare rank-based test; compare visual patterns with GAMM results (main vignette Figure 2) to assess method agreement."}
# Plot top genes from SRH test (using all genes, not just significant ones)
# This ensures the plot displays n_top genes regardless of significance threshold
top_genes_plot <- plot_diversity_spectrum(analysis, sait_res = srh_results_all, n_top = 4)
print(top_genes_plot)
```

***

# Generalized Additive Mixed Model Approach (GAMM)

This section loads precomputed GAMM results generated in the main vignette ([TSENAT.Rmd](TSENAT.Rmd)).

```{r generalized-additive-model-analysis, message=FALSE, results='hide'}
# Load precomputed GAMM analysis object from RDS file
analysis_sait <- readRDS(
    system.file("extdata", "analysis_sait.rds", package = "TSENAT")
)

# Extract GAMM results for inspection using accessor function
sait_results <- results(analysis_sait, type = "sait", rankBy = "pvalue")
print(sait_results)

```

```{r gam-results-summary-table, message=FALSE, echo=FALSE}
# Create summary statistics table with additional metrics
# Handle NA and NaN values properly
mean_effect <- mean(sait_results$effect_size, na.rm = TRUE)
median_effect <- median(sait_results$effect_size, na.rm = TRUE)

summary_stats <- data.frame(
  Metric = c(
    "Genes tested", 
    "Significant (p < 0.05)", 
    "Concordant (p < 0.05 AND adj_p < 0.05)",
    "NAs",
    "Mean effect size",
    "Median effect size",
    "Strong effect genes (effect_size > 30%)",
    "Model convergence rate"
  ),
  Value = c(
    nrow(sait_results),
    sum(sait_results$p_interaction < 0.05, na.rm = TRUE),
    sum(sait_results$p_interaction < 0.05 & sait_results$adj_p_interaction < 0.05, na.rm = TRUE),
    sum(is.na(sait_results$p_interaction)),
    if (is.na(mean_effect) || !is.finite(mean_effect)) "Not available" else sprintf("%.1f%%", mean_effect * 100),
    if (is.na(median_effect) || !is.finite(median_effect)) "Not available" else sprintf("%.1f%%", median_effect * 100),
    sum(sait_results$p_interaction < 0.05 & sait_results$effect_size > 0.30, na.rm = TRUE),
    sprintf("%.1f%%", mean(sait_results$model_converged, na.rm = TRUE) * 100)
  )
)

knitr::kable(summary_stats, caption = "**Supplementary Table 17 | GAMM Results Summary.** Overview of generalized additive mixed model statistics for q × condition interaction tests across full q-spectrum with paired sample structure.", label = "tab:gam-summary-stats", align = c("l", "r"))

# Show top genes with additional information
if ("p_interaction" %in% colnames(sait_results)) {
  gam_sig <- sait_results[!is.na(sait_results$p_interaction), ]
  
  if (nrow(gam_sig) > 0) {
    gam_sig <- gam_sig[order(gam_sig$p_interaction), ]
    gam_sig <- gam_sig[1:min(10, nrow(gam_sig)), ]
    
    # Select columns for display: gene, p-values, effect size, test statistic, df
    cols_to_show <- intersect(c("gene", "p_interaction", "adj_p_interaction", "effect_size", 
                                "test_statistic", "df_residual"), 
                              colnames(gam_sig))
    if (length(cols_to_show) > 0) {
      gam_display <- gam_sig[, cols_to_show, drop = FALSE]
      
      # Format numeric columns (but NOT p-value columns - preserve full precision for later formatting)
      numeric_cols <- sapply(gam_display, is.numeric)
      # Only round non-p-value columns to 3 decimal places
      pval_cols <- c("p_interaction", "adj_p_interaction")
      cols_to_round <- setdiff(names(numeric_cols[numeric_cols]), pval_cols)
      if (length(cols_to_round) > 0) {
        gam_display[, cols_to_round] <- round(gam_display[, cols_to_round], 3)
      }
      
      # Format p-value columns with scientific notation for very small values
      if ("p_interaction" %in% colnames(gam_display)) {
        gam_display$p_interaction <- sapply(gam_display$p_interaction, function(x) {
          if (is.na(x)) "NA" else if (x < 0.001) sprintf("%.2e", x) else sprintf("%.6f", x)
        })
      }
      
      if ("adj_p_interaction" %in% colnames(gam_display)) {
        gam_display$adj_p_interaction <- sapply(gam_display$adj_p_interaction, function(x) {
          if (is.na(x)) "NA" else if (x < 0.001) sprintf("%.2e", x) else sprintf("%.6f", x)
        })
      }
      
      # Format effect size as percentage
      if ("effect_size" %in% colnames(gam_display)) {
        gam_display$effect_size <- sapply(gam_display$effect_size, function(x) {
          if (is.na(x)) "NA" else sprintf("%.1f%%", x * 100)
        })
      }
      
      # Format test statistic
      if ("test_statistic" %in% colnames(gam_display)) {
        gam_display$test_statistic <- sapply(gam_display$test_statistic, function(x) {
          if (is.na(x)) "NA" else sprintf("%.2f", x)
        })
      }
      
      # Format df as integer
      if ("df_residual" %in% colnames(gam_display)) {
        gam_display$df_residual <- sapply(gam_display$df_residual, function(x) {
          if (is.na(x)) "NA" else as.integer(x)
        })
      }
      
      # Rename columns for better readability
      colnames(gam_display) <- c("Gene", "p (interaction)", "Adjusted p", "Effect size", "Test statistic", "df")
      
      knitr::kable(gam_display, 
                   caption = "**Supplementary Table 18 | Top 10 Genes by GAMM p-value (with Effect Size and Test Statistics).** Ranked by adjusted p-value; compare with SRH results (Supplementary Table 16) for method agreement assessment.",
                   label = "tab:gam-top-genes",
                   align = c("l", "r", "r", "r", "r", "r"))
    }
  }
} else {
  cat("WARNING: p_interaction column not found in results\n")
  cat("Available columns:", paste(colnames(sait_results), collapse=", "), "\n")
}
```

***

# Method Concordance Analysis

A critical validation step is to compare whether the SRH rank-based test and GAM produce consistent results. High concordance between methods (i.e., genes significant in both approaches) provides strong evidence that discoveries are robust to methodological choice. Discordant genes—those significant in only one method—warrant closer inspection: they may represent genuine biological signals revealed by one method's specific advantages, or artifacts of that method's assumptions. This section quantifies agreement between approaches and identifies high-confidence genes significant across both statistical frameworks.

```{r compare-srh-gam-methods, message=FALSE}
# Compute concordance analysis using new two-object API
analysis <- calculate_concordance(
    analysis_sait = analysis_sait,
    analysis_rank = analysis,
    verbose = TRUE
)

# Extract concordance results with list format for component display
concordance_results <- results(analysis, type = "concordance", format = "list")
```

```{r display-concordance-metrics, message=FALSE, echo=FALSE, results='asis'}
# Display concordance summary - results() with format="list" returns structured components
if (!is.null(concordance_results) && is.list(concordance_results)) {
  # Display summary table
  if (!is.null(concordance_results$summary_table) && nrow(concordance_results$summary_table) > 0) {
    knitr::kable(concordance_results$summary_table, 
                 caption = "**Supplementary Table 19 | Global Concordance Metrics: GAMM vs Scheirer-Ray-Hare Methods.** Quantifies agreement between parametric (GAMM) and nonparametric (SRH) approaches for detecting q × condition interactions.",
                 align = c("l", "r"))
  }
  
  # Display agreement distribution
  if (!is.null(concordance_results$agreement_dist) && nrow(concordance_results$agreement_dist) > 0) {
    cat("\n")
    knitr::kable(concordance_results$agreement_dist, 
                 caption = "**Supplementary Table 20 | Agreement Distribution: Concordance by Gene Category.** Distribution of genes across categories: both methods significant, GAM only, SRH only, and neither method.",
                 align = c("l", "r", "r"))
  }
  
  # Display high-confidence genes (concordant in both methods)
  if (!is.null(concordance_results$high_conf_table) && nrow(concordance_results$high_conf_table) > 0) {
    cat("\n")
    n_hc <- nrow(concordance_results$high_conf_table)
    knitr::kable(concordance_results$high_conf_table, 
                 caption = paste0("**Supplementary Table 21 | Robust Entropic Order Index Interactions: High-Confidence Genes Detected by Both Methods (n=", n_hc, ", ranked by statistical significance).** Cross-method validation: genes significant in both GAM and SRH provide strongest evidence for q × condition effects."),
                 align = c("l", "r", "r", "r", "r"))
  }
} else {
  cat("Concordance results not available\n")
}
```

## Statistical Power vs. Robustness: Understanding Method Discordance

The striking discordance between GAMM and SRH results (75.0% significant in GAMM only, 0.0% in rank test only, 2.6% both methods significant) reflects a fundamental trade-off in statistical methodology: **parametric methods maximize power when assumptions hold, while nonparametric methods sacrifice power for robustness to assumption violations**.

GAMM's superior power derives from two factors:

1. Distributional assumptions: GAMM assumes approximately normal residuals and homogeneous variance, which are reasonable after appropriate transformation for entropy data. Under these assumptions, parametric methods are theoretically optimal, achieving maximum power for a given type I error rate. The rank-based SRH test is assumption-free but necessarily discards quantitative information by converting measurements to ranks, which reduces statistical power when the underlying data are approximately normal.

2. Model flexibility with penalty: GAMM uses thin-plate splines with smoothness penalties that simultaneously fit nonlinear patterns while controlling degrees of freedom. This flexibility allows GAMM to detect subtle entropic index × group interactions across all q-values. The SRH test, by contrast, operates on rank patterns only, which is inherently less sensitive to continuous relationships across the ordering (q-values).

This complementary approach—combining high-power parametric tests with robust nonparametric alternatives—provides confidence that discoveries are methodologically sound rather than artifacts of statistical assumptions.

## Visualization: Method Comparison

Visual comparison of concordance patterns provides intuitive assessment of which genes align between methods and which are method-specific. The following plots display significance calls, effect sizes, and agreement metrics in a format that facilitates interpretation of both robust discoveries and method-specific findings.

```{r visualize-method-concordance, fig.width=10, fig.height=8, out.width="98%", fig.pos="H", fig.cap="**Supplementary Figure 2 | Method concordance visualization comparing SRH rank-based and GAMM approaches for detecting q × condition interactions.** Venn diagram or scatter plot showing overlap in genes identified as significant by each method; reveals parametric-nonparametric agreement patterns."}
# Create comparison plot using S4 wrapper
# Concordance analysis results are stored in analysis metadata after calculate_concordance()
p <- plot_concordance(analysis, verbose = TRUE)
print(p)
```

***

# Session Information

```{r session-info-appendix-b}
sessionInfo()
```