| Title: | L1-regularization based methods for detection of differential splicing |
|---|---|
| Description: | Statistical methods for detection of differential splicing (differential exon usage) in RNA-seq and exon microarray data, using L1-regularization (lasso) to improve power. |
| Authors: | Lukas M. Weber [aut, cre] |
| Maintainer: | Lukas M. Weber <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.39.0 |
| Built: | 2026-05-30 06:58:40 UTC |
| Source: | https://github.com/bioc/regsplice |
Create a model design matrix for a single gene.
createDesignMatrix(condition, n_exons)createDesignMatrix(condition, n_exons)
condition |
Experimental conditions for each sample (character or numeric vector, or factor). |
n_exons |
Number of exons in the gene (integer). |
Creates a model design matrix for a single gene in the format required by the
regsplice model fitting functions. Required inputs are the experimental
conditions (groups) for each sample, and the number of exons in the gene.
The design matrix includes main effect terms for each exon and each sample, and interaction terms between the exons and conditions.
Note that the design matrix does not include main effect terms for the conditions, since these are absorbed into the main effect terms for the samples. In addition, the design matrix does not include an intercept column, since it is simpler to let the model fitting functions add an intercept term later.
The model fitting functions in subsequent steps call this function once for each gene.
Returns a model design matrix for the gene, in the format required by the
regsplice model fitting functions.
fitRegMultiple fitNullMultiple
fitFullMultiple LRTests
condition <- rep(c(0, 1), each = 3) n_exons <- 10 X <- createDesignMatrix(condition, n_exons)condition <- rep(c(0, 1), each = 3) n_exons <- 10 X <- createDesignMatrix(condition, n_exons)
Filter low-count exons from RNA-seq read count data.
filterLowCounts(rs_data, filter_min_per_exon = 6, filter_min_per_sample = 3)filterLowCounts(rs_data, filter_min_per_exon = 6, filter_min_per_sample = 3)
rs_data |
|
filter_min_per_exon |
Filtering parameter: minimum number of reads per exon bin, summed across all biological samples. Default is 6. |
filter_min_per_sample |
Filtering parameter: minimum number of reads per biological sample; i.e. for each exon bin, at least one sample must have this number of reads. Default is 3. |
Filters low-count exon bins from RNA-seq read count data. Any remaining single-exon genes (after filtering) are also removed (since differential splicing requires multiple exon bins).
Input data is assumed to be in the form of a RegspliceData object. See
RegspliceData for details.
The arguments filter_min_per_exon and filter_min_per_sample control the
amount of filtering. Exon bins that meet the filtering conditions are kept. Default
values for the arguments are provided; however, these should be adjusted depending on
the total number of samples and the number of samples per condition.
After filtering low-count exon bins, any remaining genes containing only a single exon bin are also removed (since differential splicing requires multiple exon bins).
Filtering should be skipped when using exon microarray data. (When using the
regsplice wrapper function, filtering can be disabled with the argument
filter = FALSE).
Previous step: Filter zero-count exon bins with filterZeros.
Next step: Calculate normalization factors with runNormalization.
Returns a RegspliceData object.
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data)
Filter exons with zero RNA-seq read counts in all biological samples.
filterZeros(rs_data)filterZeros(rs_data)
rs_data |
|
Removes exon bins with zero RNA-seq read counts in all biological samples. Any remaining single-exon genes (after filtering) are also removed (since differential splicing requires multiple exon bins).
Input data is assumed to be in the form of a RegspliceData object. See
RegspliceData for details.
After filtering zero-count exon bins, any remaining genes containing only a single exon bin are also removed (since differential splicing requires multiple exon bins).
Filtering should be skipped when using exon microarray data. (When using the
regsplice wrapper function, filtering can be disabled with the argument
filter = FALSE).
Previous step: Create RegspliceData object with RegspliceData
constructor function.
Next step: Filter low-count exon bins with filterLowCounts.
Returns a RegspliceData object.
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data)
Model fitting functions for regsplice package.
fitRegMultiple( rs_results, rs_data, alpha = 1, lambda_choice = c("lambda.min", "lambda.1se"), seed = NULL, ... ) fitNullMultiple(rs_results, rs_data, seed = NULL, ...) fitFullMultiple(rs_results, rs_data, seed = NULL, ...)fitRegMultiple( rs_results, rs_data, alpha = 1, lambda_choice = c("lambda.min", "lambda.1se"), seed = NULL, ... ) fitNullMultiple(rs_results, rs_data, seed = NULL, ...) fitFullMultiple(rs_results, rs_data, seed = NULL, ...)
rs_results |
|
rs_data |
|
alpha |
Elastic net parameter |
lambda_choice |
Parameter to select which optimal |
seed |
Random seed (integer). Default is NULL. Provide an integer value to set the random seed for reproducible results. |
... |
Other arguments to pass to |
There are three model fitting functions:
fitRegMultiple fits regularized (lasso) models containing an optimal subset of
exon:condition interaction terms for each gene. The model fitting procedure penalizes
the interaction terms only, so that the main effect terms for exons and samples are
always included. This ensures that the null model is nested, allowing likelihood ratio
tests to be calculated.
fitNullMultiple fits the null models, which do not contain any interaction
terms.
fitFullMultiple fits full models, which contain all exon:condition interaction
terms for each gene.
See createDesignMatrix for more details about the terms in each model.
The fitting functions fit models for all genes in the data set.
A random seed can be provided with the seed argument, to generate reproducible
results.
If the rs_data object does not contain a weights matrix, all exon bins are
weighted equally.
Previous step: Initialize RegspliceResults object with
initializeResults.
Next step: Calculate likelihood ratio tests with LRTests.
Returns a RegspliceResults object containing deviance and
degrees of freedom of the fitted models. See RegspliceResults
for details.
createDesignMatrix RegspliceResults
initializeResults LRTests
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data) rs_results <- initializeResults(rs_data) rs_results <- fitRegMultiple(rs_results, rs_data) rs_results <- fitNullMultiple(rs_results, rs_data) rs_results <- fitFullMultiple(rs_results, rs_data)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data) rs_results <- initializeResults(rs_data) rs_results <- fitRegMultiple(rs_results, rs_data) rs_results <- fitNullMultiple(rs_results, rs_data) rs_results <- fitFullMultiple(rs_results, rs_data)
Initialize a RegspliceResults object, which will contain the results of the
regsplice analysis.
initializeResults(rs_data)initializeResults(rs_data)
rs_data |
|
Creates a RegspliceResults object containing gene names only.
This object will subsequently be populated using the functions
fitRegMultiple, fitNullMultiple,
fitFullMultiple, and LRTests.
Previous step: Calculate limma-voom transformation and weights with
runVoom.
Next step: Fit models with fitRegMultiple,
fitNullMultiple, and fitFullMultiple.
Returns a RegspliceResults object containing gene IDs
only.
RegspliceData RegspliceResults
fitRegMultiple fitNullMultiple
fitFullMultiple LRTests summaryTable
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data) rs_results <- initializeResults(rs_data)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data) rs_results <- initializeResults(rs_data)
Calculate likelihood ratio tests between fitted models and null models.
LRTests(rs_results, when_null_selected = c("ones", "full", "NA"))LRTests(rs_results, when_null_selected = c("ones", "full", "NA"))
rs_results |
|
when_null_selected |
Which option to use for genes where the lasso model selects
zero interaction terms, i.e. identical to the null model. Options are |
The regularized (lasso) fitted models contain an optimal subset of exon:condition interaction terms for each gene, and the "full" fitted models contain all exon:condition interaction terms. The null models contain zero interaction terms, so they are nested within the fitted models.
The likelihood ratio (LR) tests compare the fitted models against the nested null models.
If the regularized (lasso) model contains at least one exon:condition interaction
term, the LR test compares the lasso model against the null model. However, if the
lasso model contains zero interaction terms, then the lasso and null models are
identical, so the LR test cannot be calculated. The when_null_selected argument
lets the user choose what to do in these cases: either set p-values equal to 1
(when_null_selected = "ones"); or calculate a LR test using the "full" model
containing all exon:condition interaction terms (when_null_selected = "full"),
which reduces power due to the larger number of terms, but allows the evidence for
differential exon usage among these genes to be distinguished. You can also return
NAs for these genes (when_null_selected = "NA").
The default option is when_null_selected = "ones". This simply calls all these
genes non-significant, which in most cases is sufficient since we are more interested
in genes with strong evidence for differential exon usage. However, if it is important
to rank the low-evidence genes in your data set, use the when_null_selected =
"full" option.
If when_null_selected = "ones" or when_null_selected = "NA", the "full"
fitted models are not required.
Previous step: Fit models with fitRegMultiple,
fitNullMultiple, and fitFullMultiple.
Next step: Generate summary table of results with summaryTable.
Returns a RegspliceResults object containing results of
the LR tests. The results consist of the following entries for each gene:
p_vals: raw p-values
p_adj: multiple testing adjusted p-values (Benjamini-Hochberg false discovery rates, FDR)
LR_stats: likelihood ratio test statistics
df_tests: degrees of freedom of likelihood ratio tests
RegspliceResults initializeResults
fitRegMultiple fitNullMultiple
fitFullMultiple summaryTable
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data) rs_results <- initializeResults(rs_data) rs_results <- fitRegMultiple(rs_results, rs_data) rs_results <- fitNullMultiple(rs_results, rs_data) rs_results <- fitFullMultiple(rs_results, rs_data) rs_results <- LRTests(rs_results)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data) rs_results <- initializeResults(rs_data) rs_results <- fitRegMultiple(rs_results, rs_data) rs_results <- fitNullMultiple(rs_results, rs_data) rs_results <- fitFullMultiple(rs_results, rs_data) rs_results <- LRTests(rs_results)
Wrapper function to run a regsplice analysis with a single command.
regsplice( rs_data, filter_zeros = TRUE, filter_low_counts = TRUE, filter_min_per_exon = 6, filter_min_per_sample = 3, normalize = TRUE, norm_method = "TMM", voom = TRUE, alpha = 1, lambda_choice = c("lambda.min", "lambda.1se"), when_null_selected = c("ones", "full", "NA"), seed = NULL, ... )regsplice( rs_data, filter_zeros = TRUE, filter_low_counts = TRUE, filter_min_per_exon = 6, filter_min_per_sample = 3, normalize = TRUE, norm_method = "TMM", voom = TRUE, alpha = 1, lambda_choice = c("lambda.min", "lambda.1se"), when_null_selected = c("ones", "full", "NA"), seed = NULL, ... )
rs_data |
|
filter_zeros |
Whether to filter zero-count exon bins, using
|
filter_low_counts |
Whether to filter low-count exon bins, using
|
filter_min_per_exon |
Filtering parameter for low-count exon bins: minimum number
of reads per exon bin, summed across all biological samples. Default is 6. See
|
filter_min_per_sample |
Filtering parameter for low-count exon bins: minimum
number of reads per biological sample; i.e. for each exon bin, at least one sample
must have this number of reads. Default is 3. See |
normalize |
Whether to calculate normalization factors, using
|
norm_method |
Normalization method to use. Options are |
voom |
Whether to calculate |
alpha |
Elastic net parameter |
lambda_choice |
Parameter to select which optimal |
when_null_selected |
Which option to use for genes where the lasso model selects
zero interaction terms, i.e. identical to the null model. Options are |
seed |
Random seed (integer). Default is NULL. Provide an integer value to set the random seed for reproducible results. |
... |
Other arguments to pass to |
This wrapper function runs the regsplice analysis pipeline with a single
command.
The required input format is a RegspliceData object, which is created with the
RegspliceData constructor function.
The wrapper function calls each of the individual functions in the analysis pipeline in sequence. You can also run the individual functions directly, which provides additional flexibility and insight into the statistical methodology. See the vignette for a description of the individual functions and an example workflow.
After running the analysis pipeline, a summary table of the results can be displayed
with summaryTable.
Note that when using exon microarray data, the filtering, normalization, and
voom steps should be disabled with the respective arguments.
See RegspliceData for details on constructing the input data
object; filterZeros and filterLowCounts for details about
filtering; runNormalization and runVoom for details about
calculation of normalization factors and voom transformation and weights;
createDesignMatrix for details about the model design matrices;
fitRegMultiple, fitNullMultiple, or
fitFullMultiple for details about the model fitting functions; and
LRTests for details about the likelihood ratio tests.
Returns a RegspliceResults object containing fitted model
results and likelihood ratio (LR) test results. The LR test results consist of the
following entries for each gene:
p_vals: raw p-values
p_adj: multiple testing adjusted p-values (Benjamini-Hochberg false discovery rates, FDR)
LR_stats: likelihood ratio test statistics
df_tests: degrees of freedom of likelihood ratio tests
RegspliceData RegspliceResults
initializeResults filterZeros
filterLowCounts runNormalization runVoom
createDesignMatrix fitRegMultiple
fitNullMultiple fitFullMultiple LRTests
summaryTable
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_results <- regsplice(rs_data) summaryTable(rs_results)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_results <- regsplice(rs_data) summaryTable(rs_results)
RegspliceData objects contain data in the format required by functions in the
regsplice analysis pipeline.
RegspliceData(counts, gene_IDs = NULL, n_exons = NULL, condition = NULL) ## S4 method for signature 'RegspliceData' assays(x, withDimnames, ..., value) countsData(x) ## S4 method for signature 'RegspliceData' countsData(x) weightsData(x) ## S4 method for signature 'RegspliceData' weightsData(x) ## S4 method for signature 'RegspliceData' rowData(x) ## S4 method for signature 'RegspliceData' colData(x, ..., value) ## S4 method for signature 'RegspliceData,ANY,ANY,ANY' x[i, j]RegspliceData(counts, gene_IDs = NULL, n_exons = NULL, condition = NULL) ## S4 method for signature 'RegspliceData' assays(x, withDimnames, ..., value) countsData(x) ## S4 method for signature 'RegspliceData' countsData(x) weightsData(x) ## S4 method for signature 'RegspliceData' weightsData(x) ## S4 method for signature 'RegspliceData' rowData(x) ## S4 method for signature 'RegspliceData' colData(x, ..., value) ## S4 method for signature 'RegspliceData,ANY,ANY,ANY' x[i, j]
counts |
RNA-seq read counts or exon microarray intensities (matrix or data
frame). Rows are exons, and columns are biological samples. Alternatively,
|
gene_IDs |
Vector of gene IDs (character vector). Length is equal to the number of genes. |
n_exons |
Vector of exon lengths (numeric vector of integers), i.e. the number of exon bins per gene. Length is equal to the number of genes. |
condition |
Experimental condition for each biological sample (character or numeric vector, or factor). |
x |
|
withDimnames |
See |
... |
Additional arguments for replacement with |
value |
Value for replacement with |
i |
Gene names (character vector) or row numbers (numeric vector) for subsetting genes or exons. Note that when subsetting whole genes, gene names (character vector) should be provided instead of row numbers, to avoid possible errors due to selecting incorrect row numbers. Row numbers may be provided to subset individual exons. |
j |
Column numbers (numeric vector) for subsetting biological samples. |
The RegspliceData format is based on the
SummarizedExperiment container. Initially, objects
contain raw data along with meta-data for rows (genes and exons) and columns
(biological samples). During subsequent steps in the regsplice analysis
pipeline, the data values are modified, and additional data and meta-data are added to
the object. Final results are stored in a RegspliceResults
object.
RegspliceData objects are created with the constructor function
RegspliceData().
Required inputs for the constructor function are counts (matrix or data frame
of RNA-seq read counts or exon microarray intensities), gene_IDs (vector of
gene IDs), n_exons (vector of exon lengths, i.e. number of exon bins per gene),
and condition (vector of experimental conditions for each biological sample).
Alternatively, the inputs can be provided as a SummarizedExperiment object,
which will be parsed to extract each of these components. This may be useful when
running regsplice as part of a pipeline together with other packages.
See the vignette for an example showing how to construct gene_IDs and
n_exons from a column of gene:exon IDs.
Exon microarray intensities should be log2-transformed, which is usually done during
pre-processing of microarray data. (RNA-seq counts will be transformed automatically
during the regsplice analysis pipeline; see runVoom.)
After creating a RegspliceData object, the wrapper function
regsplice can be used to run the analysis pipeline with a single
command. Alternatively, you can run the individual functions for each step in the
pipeline, beginning with filterZeros (see vignette for details).
Returns a RegspliceData object.
countsMatrix of RNA-seq read counts or exon microarray intensities. Rows are exons, and columns are biological samples.
weights(Optional) Matrix of observation-level weights. Rows are exons, and
columns are biological samples. Created by the runVoom function.
rowDataDataFrame of row meta-data. This should contain two columns:
gene_IDs and exon_IDs, which are created by the RegspliceData
constructor function.
colDataDataFrame of column meta-data. This contains the experimental
condition and (optionally) normalization factors for each biological sample.
Normalization factors are created by the runVoom function.
countsData(): Accesses the counts data matrix.
weightsData(): Accesses the (optional) weights data matrix.
rowData(): Accesses the DataFrame of row meta-data. This should
contain two columns: gene_IDs and exon_IDs.
colData(): Accesses the DataFrame of column meta-data. This
contains the experimental condition and (optionally) normalization factors for each
biological sample.
Subsetting of RegspliceData objects is performed with square brackets,
x[i, j], where x is the name of the object. The subsetting operations
are designed to keep data and meta-data in sync.
For subsetting by rows, there are two possibilities:
Subsetting genes: To subset whole genes, provide a character vector of gene
names to the argument i. The returned object will contain all rows
corresponding to these genes. Row numbers should not be used when subsetting whole
genes, since this risks potential errors due to selecting incorrect rows.
Subsetting exons: To subset individual exons, provide the corresponding row
numbers to the argument i.
For subsetting by columns (biological samples), provide the corresponding column
numbers to the argument j.
# --------- # Example 1 # --------- counts <- matrix(sample(100:200, 14 * 6, replace = TRUE), nrow = 14, ncol = 6) gene_IDs <- paste0("gene", 1:5) n_exons <- c(3, 2, 3, 1, 5) condition <- rep(c(0, 1), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data countsData(rs_data) rowData(rs_data) colData(rs_data) rs_data[1, ] rs_data[1, 1:3] rs_data["gene1", ] rs_data["gene1", 1:3] # -------------------- # Example 2 (Vignette) # -------------------- file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data head(countsData(rs_data)) rowData(rs_data) colData(rs_data) rs_data[1, ] rs_data[1, 1:3] rs_data["ENSG00000000003", ] rs_data["ENSG00000000003", 1:3]# --------- # Example 1 # --------- counts <- matrix(sample(100:200, 14 * 6, replace = TRUE), nrow = 14, ncol = 6) gene_IDs <- paste0("gene", 1:5) n_exons <- c(3, 2, 3, 1, 5) condition <- rep(c(0, 1), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data countsData(rs_data) rowData(rs_data) colData(rs_data) rs_data[1, ] rs_data[1, 1:3] rs_data["gene1", ] rs_data["gene1", 1:3] # -------------------- # Example 2 (Vignette) # -------------------- file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data head(countsData(rs_data)) rowData(rs_data) colData(rs_data) rs_data[1, ] rs_data[1, 1:3] rs_data["ENSG00000000003", ] rs_data["ENSG00000000003", 1:3]
RegspliceResults objects contain the results of a regsplice analysis.
RegspliceResults(gene_IDs) gene_IDs(x) ## S4 method for signature 'RegspliceResults' gene_IDs(x) p_vals(x) ## S4 method for signature 'RegspliceResults' p_vals(x) p_adj(x) ## S4 method for signature 'RegspliceResults' p_adj(x) LR_stats(x) ## S4 method for signature 'RegspliceResults' LR_stats(x) df_tests(x) ## S4 method for signature 'RegspliceResults' df_tests(x)RegspliceResults(gene_IDs) gene_IDs(x) ## S4 method for signature 'RegspliceResults' gene_IDs(x) p_vals(x) ## S4 method for signature 'RegspliceResults' p_vals(x) p_adj(x) ## S4 method for signature 'RegspliceResults' p_adj(x) LR_stats(x) ## S4 method for signature 'RegspliceResults' LR_stats(x) df_tests(x) ## S4 method for signature 'RegspliceResults' df_tests(x)
gene_IDs |
Gene identifiers or names (character vector). |
x |
|
RegspliceResults objects are created with the constructor function
RegspliceResults(), which requires the gene IDs as an argument.
Once created, RegspliceResults objects are then populated using the functions
fitRegMultiple, fitNullMultiple,
fitFullMultiple, and LRTests.
The function summaryTable can be used to display a summary table of the
results.
Returns an empty RegspliceResults object.
gene_IDsGene identifiers or names (character vector).
fit_reg_devDeviance of fitted regularized (lasso) models from
fitRegMultiple.
fit_reg_dfDegrees of freedom of fitted regularized (lasso) models from
fitRegMultiple.
fit_null_devDeviance of fitted null models from
fitNullMultiple.
fit_null_dfDegrees of freedom of fitted null models from
fitNullMultiple.
fit_full_devDeviance of fitted full models from
fitFullMultiple.
fit_full_dfDegrees of freedom of fitted full models from
fitFullMultiple.
p_valsRaw p-values (numeric vector).
p_adjMultiple testing adjusted p-values (Benjamini-Hochberg false discovery rates, FDR).
LR_statsLikelihood ratio test statistics.
df_testsDegrees of freedom of likelihood ratio tests.
gene_IDs(): Accesses gene identifiers or names.
p_vals(): Accesses raw p-values.
p_adj(): Accesses multiple testing adjusted p-values (Benjamini-Hochberg false discovery rates, FDR).
LR_stats(): Accesses likelihood ratio test statistics.
df_tests(): Accesses degrees of freedom of likelihood ratio tests.
fitRegMultiple fitNullMultiple
fitFullMultiple LRTests summaryTable
# initialize RegspliceResults object gene_IDs <- paste0("gene", 1:5) RegspliceResults(gene_IDs)# initialize RegspliceResults object gene_IDs <- paste0("gene", 1:5) RegspliceResults(gene_IDs)
Calculate normalization factors to scale library sizes, using the TMM (trimmed mean of
M-values) method implemented in edgeR.
runNormalization(rs_data, norm_method = "TMM")runNormalization(rs_data, norm_method = "TMM")
rs_data |
|
norm_method |
Normalization method to use. Options are |
Normalization factors are used to scale the raw library sizes (total read counts per
sample). We use the TMM (trimmed mean of M-values) normalization method (Robinson and
Oshlack, 2010), as implemented in the edgeR package.
For more details, see the documentation for calcNormFactors in
the edgeR package.
This step should be performed after filtering with filterZeros and
filterLowCounts. The normalization factors are then used by
limma-voom in the next step (runVoom).
The normalization factors are stored in a new column named norm_factors in the
column meta-data (colData slot) of the RegspliceData
object. The colData can be accessed with the accessor function
colData().
Normalization should be skipped when using exon microarray data. (When using the
regsplice wrapper function, normalization can be disabled with the
argument normalize = FALSE).
Previous step: Filter low-count exon bins with filterLowCounts.
Next step: Calculate limma-voom transformation and weights with
runVoom.
Returns a RegspliceData object. Normalization factors are
stored in the column norm_factors in the column meta-data (colData
slot), which can be accessed with the colData() accessor function.
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data)
Use limma-voom to transform counts and calculate exon-level weights.
runVoom(rs_data)runVoom(rs_data)
rs_data |
|
Raw counts do not fulfill the statistical assumptions required for linear modeling.
The limma-voom methodology transforms counts to log2-counts per million
(logCPM), and calculates exon-level weights based on the observed mean-variance
relationship. Linear modeling methods can then be applied.
For more details, see the documentation for voom in the
limma package.
Note that voom assumes that exon bins (rows) with zero or low counts have
already been removed, so this step should be done after filtering with
filterZeros and filterLowCounts.
Normalization factors can be provided in a column named norm_factors in the
column meta-data (colData slot) of the RegspliceData
object. These will be used by voom to calculate normalized library sizes. If
normalization factors are not provided, voom will use non-normalized library
sizes (columnwise total counts) instead.
The experimental conditions or group labels for each biological sample are assumed to
be in a column named condition in the column meta-data (colData slot) of
the RegspliceData object. This column is created when the object
is initialized with the RegspliceData() constructor function.
The transformed counts are stored in the updated counts matrix, which can be
accessed with the countsData accessor function. The weights are stored
in a new data matrix labeled weights, which can be accessed with the
weightsData accessor function. In addition, the normalized library sizes
(if available) are stored in a new column named lib_sizes in the column
meta-data (colData slot).
If you are using exon microarray data, this step should be skipped, since exon microarray intensities are already on a continuous scale.
Previous step: Calculate normalization factors with runNormalization.
Next step: Initialize RegspliceResults object with the
constructor function RegspliceResults().
Returns a RegspliceData object. Transformed counts are
stored in the counts matrix, and weights are stored in a new weights
data matrix. The data matrices can be accessed with the accessor functions
countsData and weightsData.
runNormalization fitRegMultiple
fitNullMultiple fitFullMultiple
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_data <- filterZeros(rs_data) rs_data <- filterLowCounts(rs_data) rs_data <- runNormalization(rs_data) rs_data <- runVoom(rs_data)
Display summary table of results from a regsplice analysis.
summaryTable( rs_results, n = 20, threshold = 0.05, rank_by = c("FDR", "p-value", "none") )summaryTable( rs_results, n = 20, threshold = 0.05, rank_by = c("FDR", "p-value", "none") )
rs_results |
|
n |
Number of genes to display in summary table. Default is 20. If the total
number of significant genes up to the significance threshold is less than |
threshold |
Significance threshold (for either FDR or raw p-values, depending on
choice of argument |
rank_by |
Whether to rank genes by false discovery rate (FDR), raw p-values, or
no ranking. Choices are |
The results of a regsplice analysis consist of a set of multiple testing
adjusted p-values (Benjamini-Hochberg false discovery rates, FDR) quantifying the
statistical evidence for differential exon usage (DEU) for each gene. Typically, the
adjusted p-values are used to rank the genes in the data set according to their
evidence for DEU, and an appropriate significance threshold (e.g. FDR < 0.05) can be
used to generate a list of genes with statistically significant evidence for DEU.
The main regsplice functions return results in the form of a
RegspliceResults object, which contains slots for gene names,
fitted model results, raw p-values, multiple testing adjusted p-values
(Benjamini-Hochberg FDR), likelihood ratio (LR) test statistics, and degrees of
freedom of the LR tests. See RegspliceResults and the main
regsplice wrapper function regsplice for details.
This function generates a summary table of the results. The results are displayed as a
data frame of the top n most highly significant genes, ranked according to
either FDR or raw p-values, up to a specified significance threshold (e.g. FDR <
0.05).
The argument rank_by controls whether to rank by FDR or raw p-values. The
default is to rank by FDR.
To display results for all genes up to the significance threshold, set the argument
n = Inf. To display results for all genes in the data set, set both n =
Inf and threshold = 1.
Previous step: Run regsplice pipeline with the regsplice wrapper
function (or individual functions up to LRTests).
Returns a data frame containing results for the top n most highly
significant genes, up to the specified significance threshold for the FDR or raw
p-values.
file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_results <- regsplice(rs_data) summaryTable(rs_results) summaryTable(rs_results, n = Inf, threshold = 1)file_counts <- system.file("extdata/vignette_counts.txt", package = "regsplice") data <- read.table(file_counts, header = TRUE, sep = "\t", stringsAsFactors = FALSE) head(data) counts <- data[, 2:7] tbl_exons <- table(sapply(strsplit(data$exon, ":"), function(s) s[[1]])) gene_IDs <- names(tbl_exons) n_exons <- unname(tbl_exons) condition <- rep(c("untreated", "treated"), each = 3) rs_data <- RegspliceData(counts, gene_IDs, n_exons, condition) rs_results <- regsplice(rs_data) summaryTable(rs_results) summaryTable(rs_results, n = Inf, threshold = 1)