| Title: | Differential Expression Analysis with Long Read RNA-Seq Data |
|---|---|
| Description: | Provides hurdle negative binomial models for differential expression analysis with long-read RNA-Seq data. |
| Authors: | Ziyang Liu [aut, cre] (ORCID: <https://orcid.org/0009-0004-2098-434X>), Hongxu Ding [aut, fnd] |
| Maintainer: | Ziyang Liu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.6 |
| Built: | 2026-05-30 09:35:22 UTC |
| Source: | https://github.com/bioc/LRDE |
The LRDE package provides statistical methods for differential expression analysis of long-read RNA sequencing (RNA-Seq) data using a hurdle negative binomial generalized linear model (hurdle-NB GLM).
It implements procedures for:
- Estimation of sample-specific size factors for normalization
- Modeling zero inflation via group-specific expression probabilities
- Gene-wise (tag-wise) dispersion estimation
- Statistical testing for differential expression
These methods are designed to address key challenges in long-read RNA-Seq data, including limited sample sizes and excess zero counts (dropout events).
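To make the model concrete, here is a minimal sketch of a hurdle negative binomial probability mass function in base R. It assumes the standard hurdle formulation (a point mass at zero plus a zero-truncated negative binomial for positive counts); the helper name `dhurdle_nb` and its arguments are hypothetical and are not part of LRDE, whose internal parameterization may differ.

```r
# Hypothetical helper (not an LRDE function): hurdle negative binomial PMF.
# p0   : probability of observing a zero count
# mu   : mean of the untruncated negative binomial
# disp : dispersion, so that size = 1 / disp in dnbinom()
dhurdle_nb <- function(k, p0, mu, disp) {
  size <- 1 / disp
  # Probability that the untruncated NB yields a positive count
  p_pos <- 1 - dnbinom(0, size = size, mu = mu)
  ifelse(k == 0,
         p0,
         (1 - p0) * dnbinom(k, size = size, mu = mu) / p_pos)
}

# The probabilities over a wide range of counts should sum to about 1
probs <- dhurdle_nb(0:2000, p0 = 0.4, mu = 5, disp = 0.2)
sum(probs)
```

Under this formulation the zero probability is modeled separately from the count distribution, which is why it can be estimated first and then held fixed.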
The main functions in this package include:
- prepareDGE: Prepare count data for analysis. Converts supported input types (matrix, data.frame, DGEList, DESeqDataSet, and SummarizedExperiment) into a standardized format.
- sizeFactorsEst: Estimate sample-specific size factors for normalization.
- tagwiseEst: Estimate gene-specific (tag-wise) dispersion parameters for a hurdle negative binomial model using prior information from bin-level estimates.
- hurdle_LRT: Perform gene-wise likelihood ratio tests (LRT) for differential expression.
- hurdle_Wald_Test: Perform gene-wise Wald tests for differential expression.
Typical workflow:
1. Prepare data using prepareDGE
2. Normalize counts with sizeFactorsEst
3. Estimate tag-wise dispersions using tagwiseEst
4. Perform differential expression testing with hurdle_LRT or hurdle_Wald_Test
Ziyang Liu [email protected]
See also: prepareDGE, sizeFactorsEst, tagwiseEst, hurdle_LRT, hurdle_Wald_Test
    # Load the package
    library(LRDE)

    # Simulate count data
    set.seed(123)
    mat <- matrix(rnbinom(300, size = 5, mu = 5), nrow = 50)
    grp <- factor(c("A", "A", "A", "B", "B", "B"))

    # Prepare data
    y <- prepareDGE(mat, grp)

    # Normalize counts
    y <- sizeFactorsEst(y)

    # Estimate dispersions
    y <- tagwiseEst(y)

    # Differential expression testing
    y <- hurdle_Wald_Test(y)
    y <- hurdle_LRT(y)

    # Access results
    head(y$lrt_stats)
    head(y$p.values)
Performs gene-wise likelihood ratio tests (LRT) for differential expression using a hurdle negative binomial model with fixed zero probabilities and tag-wise dispersions.
    hurdle_LRT(y)
| y | A list-like object returned from |
|---|---|
For each gene:
- The null model assumes a single shared mean across groups.
- The alternative model estimates group-specific means.
- Zero probabilities and dispersions are fixed from prior estimates.
- When one group has all zero counts, a one-sided Z-test is applied instead.
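The comparison of a shared-mean null against a group-specific alternative can be sketched in base R as follows. This is an illustration under stated assumptions, not the LRDE implementation: it assumes the standard hurdle formulation, in which fixed zero probabilities contribute the same term to both models and therefore cancel, so only positive counts enter the statistic. It ignores size factors, does not handle the all-zero-group case (for which the package applies a Z-test), and all function names are hypothetical.

```r
# Log-likelihood of positive counts under a zero-truncated NB,
# parameterized by log(mu) with a fixed dispersion.
ll_trunc_nb <- function(log_mu, y_pos, disp) {
  size <- 1 / disp
  mu <- exp(log_mu)
  sum(dnbinom(y_pos, size = size, mu = mu, log = TRUE) -
        log1p(-dnbinom(0, size = size, mu = mu)))
}

# Maximized log-likelihood over log(mu) on a bounded interval
max_ll <- function(y_pos, disp) {
  optimize(ll_trunc_nb, interval = c(-5, 12),
           y_pos = y_pos, disp = disp, maximum = TRUE)$objective
}

# Hypothetical LRT for one gene: shared mean versus group-specific means
lrt_sketch <- function(counts, group, disp) {
  pos <- counts > 0
  by_group <- split(counts[pos], group[pos], drop = TRUE)
  ll_null <- max_ll(counts[pos], disp)
  ll_alt <- sum(vapply(by_group, max_ll, numeric(1), disp = disp))
  stat <- max(0, 2 * (ll_alt - ll_null))
  df <- length(by_group) - 1
  c(stat = stat, p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

counts <- c(3, 5, 0, 4, 20, 25, 18, 0)
group <- factor(rep(c("A", "B"), each = 4))
res <- lrt_sketch(counts, group, disp = 0.2)
res
```

The statistic is referred to a chi-squared distribution with degrees of freedom equal to the number of groups minus one, which is 1 for a two-group comparison.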
The input object y with two additional elements:
- Numeric vector of LRT statistics for each gene.
- Numeric vector of corresponding p-values.
    set.seed(123)
    mat <- matrix(rnbinom(30, size = 5, mu = 5), nrow = 5)
    grp <- c("A", "A", "A", "B", "B", "B")
    y <- prepareDGE(mat, grp)
    y <- sizeFactorsEst(y)
    y <- tagwiseEst(y)
    y <- hurdle_LRT(y)
Performs gene-wise Wald tests for differential expression using a hurdle negative binomial model with fixed zero probabilities and tag-wise dispersions.
    hurdle_Wald_Test(y)
| y | A list-like object returned from |
|---|---|
For each gene:
- Zero probabilities and dispersions are fixed from prior estimates.
- The model estimates group-specific mean parameters.
- When one group has all zero counts, a one-sided Z-test is applied instead.
- Otherwise, a standard two-sided Wald test is applied on the log-difference of group means.
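A two-sided Wald test on the log-difference of group means can be illustrated with the simplified base R sketch below. It deliberately uses a plain negative binomial model with a known dispersion, ignoring the hurdle component and size factors, and uses the delta-method approximation Var(log(mean)) ≈ (1/mu + disp)/n. The function name `wald_sketch` is hypothetical, and the package's actual standard errors may be computed differently.

```r
# Hypothetical two-group Wald test for one gene (plain NB, fixed dispersion)
wald_sketch <- function(counts, group, disp) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  m <- tapply(counts, group, mean)    # group means (NB MLE when size is known)
  n <- tapply(counts, group, length)  # group sample sizes
  # Delta method: variance of the log of each group mean
  v <- (1 / m + disp) / n
  z <- (log(m[2]) - log(m[1])) / sqrt(sum(v))
  c(stat = unname(z), p.value = unname(2 * pnorm(-abs(z))))
}

counts <- c(3, 5, 6, 4, 20, 25, 18, 22)
group <- rep(c("A", "B"), each = 4)
res <- wald_sketch(counts, group, disp = 0.2)
res
```

A positive statistic here indicates a higher mean in the second group level than in the first.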
The input object y with two additional elements:
- Numeric vector of Wald statistics for each gene.
- Numeric vector of corresponding p-values.
    set.seed(123)
    mat <- matrix(rnbinom(30, size = 5, mu = 5), nrow = 5)
    grp <- c("A", "A", "A", "B", "B", "B")
    y <- prepareDGE(mat, grp)
    y <- sizeFactorsEst(y)
    y <- tagwiseEst(y)
    y <- hurdle_Wald_Test(y)
Converts various supported input types to a standardized list format for downstream
differential expression analysis. Supports matrix, data.frame, DGEList,
DESeqDataSet, and SummarizedExperiment objects.
    prepareDGE(data, group)
| data | A numeric matrix, data.frame, or supported object containing counts. |
|---|---|
| group | A vector of group labels for the columns/samples of |
This function performs input validation.
- Checks for non-negative numeric values and absence of NA.
- Ensures group labels match the number of samples.
- Automatically assigns column names if missing.
- Returns a list suitable for use with hurdle model-based DE functions.
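The checks listed above could look roughly like the following base R sketch. It is not the code of prepareDGE: the function name `validate_counts`, the element names `counts` and `samples`, the default column names, and the initial size factor of 1 are all assumptions made for illustration.

```r
# Hypothetical validation helper mirroring the checks described above
validate_counts <- function(data, group) {
  counts <- as.matrix(data)
  if (!is.numeric(counts)) stop("counts must be numeric")
  if (anyNA(counts)) stop("counts must not contain NA")
  if (any(counts < 0)) stop("counts must be non-negative")
  if (length(group) != ncol(counts)) {
    stop("length(group) must equal the number of samples")
  }
  # Assign default column names when none are present (naming is an assumption)
  if (is.null(colnames(counts))) {
    colnames(counts) <- paste0("Sample", seq_len(ncol(counts)))
  }
  storage.mode(counts) <- "integer"
  samples <- data.frame(
    group = factor(group),
    lib.size = colSums(counts),
    size.factor = 1,  # placeholder until size factors are estimated
    row.names = colnames(counts)
  )
  list(counts = counts, samples = samples)
}

out <- validate_counts(matrix(c(1, 0, 3, 4, 5, 6), nrow = 2), c("A", "A", "B"))
out$samples
```

Failing early on negative values, missing values, or mismatched group labels keeps later estimation steps from producing hard-to-trace errors.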
A list with two elements:
- An integer matrix of counts.
- A data.frame containing sample-level metadata: group, lib.size, and size.factor.
    # Example with a matrix
    set.seed(123)
    mat <- matrix(rnbinom(30, size = 5, mu = 5), nrow = 5)
    grp <- c("A", "A", "A", "B", "B", "B")
    y <- prepareDGE(mat, grp)

    # Example with a SummarizedExperiment
    if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
      se <- SummarizedExperiment::SummarizedExperiment(assays = list(counts = mat))
      y_se <- prepareDGE(se, grp)
      y_se
    }
Computes sample-specific size factors for long-read RNA-Seq data, used to normalize counts for differential expression analysis.
    sizeFactorsEst(y, type = c("poscounts", "ratio"), locfunc = stats::median)
| y | A count matrix ( |
|---|---|
| type | Character string specifying the method for estimating size factors: Default is |
| locfunc | Function to summarize log-ratios across genes. Defaults to |
This function implements two methods for size factor estimation:
- poscounts: Computes a geometric mean of positive counts per gene, then calculates ratios for each sample. Normalizes so that the geometric mean of size factors equals 1.
- ratio: Uses the mean of log-counts per gene across samples to compute ratios.

The function automatically normalizes counts using the estimated size factors and stores gene-level normalized means in baseMean.
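The two estimators described above can be sketched in base R as follows. This takes the description literally and is not the LRDE source: the function name `size_factors_sketch` is hypothetical, and details such as how genes with zeros are excluded or how the positive-count geometric mean is averaged are assumptions that may differ in the package.

```r
# Hypothetical size factor estimator following the two descriptions above
size_factors_sketch <- function(counts, type = c("poscounts", "ratio"),
                                locfunc = stats::median) {
  type <- match.arg(type)
  if (type == "poscounts") {
    # Log geometric mean over the positive counts of each gene
    log_geo <- apply(counts, 1, function(x) {
      if (all(x == 0)) -Inf else mean(log(x[x > 0]))
    })
  } else {
    # Mean of log-counts; genes containing any zero become -Inf and are skipped
    log_geo <- rowMeans(log(counts))
  }
  sf <- apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo) & cnts > 0
    exp(locfunc(log(cnts[keep]) - log_geo[keep]))
  })
  # Rescale so that the geometric mean of the size factors equals 1
  if (type == "poscounts") sf <- sf / exp(mean(log(sf)))
  sf
}

counts <- cbind(S1 = c(10, 20, 30, 0), S2 = c(20, 40, 60, 8))
sf <- size_factors_sketch(counts, type = "poscounts")
# Gene-level mean of normalized counts, analogous to baseMean
base_mean <- rowMeans(sweep(counts, 2, sf, "/"))
sf
```

In this toy example the second sample has twice the depth of the first on the shared genes, so its size factor comes out twice as large.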
A list (same structure as prepareDGE() output) with:
- Original count matrix (integer).
- Data frame with sample information, updated size.factor.
- Normalized mean of counts per gene.
    # Using a count matrix
    set.seed(123)
    mat <- matrix(rnbinom(30, size = 5, mu = 5), nrow = 5)
    grp <- c("A", "A", "A", "B", "B", "B")
    y <- prepareDGE(mat, grp)
    y <- sizeFactorsEst(y, type = "poscounts")
Estimate gene-specific (tag-wise) dispersion parameters for a hurdle negative binomial model using prior information derived from bin-level estimates.
    tagwiseEst(y)
| y | A list object created by |
|---|---|
This function performs the following steps:
1. Retrieves bin-level prior estimates of zero probabilities and log-dispersion for each gene via priorEst.
2. Fixes the zero probabilities and optimizes only the mean parameters and dispersion for each gene individually.
3. Uses the internal function nll_hurdle_fixed_P to compute the negative log-likelihood with fixed zero probabilities.

The resulting tagwise.disp will be used for downstream differential expression analysis.
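The idea of holding zero probabilities fixed while optimizing means and dispersion can be sketched for a single gene in base R. This is a simplified stand-in, not the internal nll_hurdle_fixed_P: the function `nll_fixed_p0` is hypothetical, the zero probabilities are taken from observed zero fractions rather than bin-level priors, and size factors and any prior shrinkage of the dispersion are omitted.

```r
# Hypothetical negative log-likelihood for one gene with two groups.
# par = c(log mean of group 1, log mean of group 2, log dispersion);
# p0 holds the fixed zero probability of each group.
nll_fixed_p0 <- function(par, y, grp, p0) {
  idx <- as.integer(grp)
  mu <- exp(par[1:2])[idx]   # group-specific means
  size <- 1 / exp(par[3])    # dispersion shared across groups
  p0_i <- p0[idx]
  pos <- y > 0
  ll_zero <- sum(log(p0_i[!pos]))
  ll_pos <- sum(log1p(-p0_i[pos]) +
                  dnbinom(y[pos], size = size, mu = mu[pos], log = TRUE) -
                  log1p(-dnbinom(0, size = size, mu = mu[pos])))
  -(ll_zero + ll_pos)
}

# Simulated gene: NB counts with extra zeros added at random
set.seed(1)
grp <- factor(rep(c("A", "B"), each = 100))
y <- rnbinom(200, size = 4, mu = rep(c(5, 20), each = 100))
y[sample(200, 50)] <- 0

p0 <- as.vector(tapply(y == 0, grp, mean))  # fixed, not optimized
start <- unname(c(log(tapply(y[y > 0], grp[y > 0], mean)), log(0.5)))
fit <- optim(start, nll_fixed_p0, y = y, grp = grp, p0 = p0)
est <- c(mu = exp(fit$par[1:2]), disp = exp(fit$par[3]))
est
```

Working on the log scale keeps the means and the dispersion positive without needing box constraints in the optimizer.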
The input y object augmented with:
- Numeric vector of estimated gene-wise dispersions.
- Numeric matrix of fixed zero probabilities for each gene and group.
    set.seed(123)
    mat <- matrix(rnbinom(30, size = 5, mu = 5), nrow = 5)
    grp <- c("A", "A", "A", "B", "B", "B")
    y <- prepareDGE(mat, grp)
    y <- sizeFactorsEst(y)
    y <- tagwiseEst(y)
    head(y$tagwise.disp)
    head(y$zero_prob_matrix)