Title: | PAIRADISE: Paired analysis of differential isoform expression |
---|---|
Description: | This package implements the PAIRADISE procedure for detecting differential isoform expression between matched replicates in paired RNA-Seq data. |
Authors: | Levon Demirdjian, Ying Nian Wu, Yi Xing |
Maintainer: | Qiang Hu <[email protected]>, Levon Demirdjian <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.23.0 |
Built: | 2024-10-31 06:09:06 UTC |
Source: | https://github.com/bioc/PAIRADISE |
PDseDataSet counts
counts(object)
object |
A PDseDataSet object |
A counts matrix
Primary function of the PAIRADISE package. Analyzes matched pairs for differences in isoform expression. Uses parallel processing to speed up computation.
pairadise(
  pdat,
  nIter = 100,
  tol = 10^(-2),
  pseudocount = 0,
  seed = 12321,
  equal.variance = FALSE,
  numCluster = 2,
  BPPARAM = MulticoreParam(numCluster)
)
pdat |
A PDseDataSet object |
nIter |
Positive integer. Specifies the maximum number of iterations of the optimization algorithm allowed. Default is nIter = 100 |
tol |
Positive number. Specifies the tolerance level for terminating the optimization algorithm, defined as the difference in log-likelihood ratios between iterations. Default is tol = 10^(-2) |
pseudocount |
Nonnegative number. Specifies a pseudocount added to each count at the beginning of the analysis. Default is pseudocount = 0 |
seed |
Integer used to set the random seed. Default is seed = 12321 |
equal.variance |
Are the group variances assumed equal? Default value is FALSE. |
numCluster |
Number of clusters to use for parallel computing. Default is numCluster = 2 |
BPPARAM |
Parallel back-end parameters from the BiocParallel package. Default is MulticoreParam(numCluster) |
This is the primary function of the PAIRADISE package that implements the PAIRADISE algorithm.
A PDseDataSet object containing the outputs of the PAIRADISE algorithm.
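The interplay between nIter and tol can be illustrated with a minimal base R sketch. It is not the PAIRADISE optimizer: the objective and gradient functions below are hypothetical stand-ins for a log-likelihood, and the stopping rule compares objective values directly rather than log-likelihood ratios. It only shows how an iteration cap and a tolerance threshold typically work together.

```r
# Toy objective standing in for a log-likelihood (hypothetical, not the PAIRADISE model).
objective <- function(x) -(x - 3)^2
gradient  <- function(x) -2 * (x - 3)

nIter <- 100      # maximum number of iterations allowed
tol   <- 10^(-2)  # stop once the objective changes by less than this

x <- 0
old_value  <- objective(x)
converged  <- FALSE
iterations <- 0

for (i in seq_len(nIter)) {
  x <- x + 0.25 * gradient(x)   # one gradient-ascent step
  new_value  <- objective(x)
  iterations <- i
  if (abs(new_value - old_value) < tol) {
    converged <- TRUE           # change fell below the tolerance
    break
  }
  old_value <- new_value
}
```

If the loop ends with converged still FALSE, the iteration cap was reached first; raising nIter or loosening tol are the usual remedies in that situation.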
#############################
## Example: Simulated data ##
#############################
set.seed(12345)
data("sample_dataset")
pdat <- PDseDataSetFromMat(sample_dataset)
pdat <- pairadise(pdat, numCluster = 4)
results(pdat)
We introduce PAIRADISE (PAIred Replicate analysis of Allelic DIfferential Splicing Events), a method for detecting allele-specific alternative splicing (ASAS) from RNA-seq data. PAIRADISE uses a statistical model that aggregates ASAS signals across multiple individuals in a population. It formulates ASAS detection as a statistical problem for identifying differential alternative splicing from RNA-seq data with paired replicates. The PAIRADISE statistical model is applicable to many forms of allele-specific isoform variation (e.g. RNA editing), and can be used as a generic statistical model for RNA-seq studies involving paired replicates.
'PDseDataSet' is a subclass of 'SummarizedExperiment'. It can be used to store inclusion and skipping splicing counts for samples in a paired design.
PDseDataSet(counts, design, lengths)
counts |
A three-dimensional array of splicing event counts, with the inclusion and skipping counts for each sample stacked along the third dimension. |
design |
The paired-design data.frame, with a sample column for sample IDs and a group column for the design factor. |
lengths |
A data.frame with two columns, iLen and sLen, giving the effective lengths of the inclusion and skipping isoforms. |
A PDseDataSet object
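The layout of the counts argument can be sketched in base R without extra packages. The example below assumes the events-by-samples-by-count-type arrangement implied by the constructor arguments; the event counts and sample names are made up for illustration, and the final check on the design table is a hypothetical sanity test, not something the package is known to perform.

```r
# Two events (rows) measured in four samples (columns); values are illustrative.
icount <- matrix(c(10, 20, 30, 40,
                   11, 21, 31, 41), nrow = 2, byrow = TRUE)  # inclusion counts
scount <- matrix(c(5, 6, 7, 8,
                   9, 8, 7, 6), nrow = 2, byrow = TRUE)      # skipping counts

# Stack the two matrices along a third dimension: [event, sample, count type].
acount <- array(c(icount, scount), dim = c(nrow(icount), ncol(icount), 2))

# Paired design: each sample ID appears once in each group.
design <- data.frame(sample = rep(c("s1", "s2"), 2),
                     group  = rep(c("T", "N"), each = 2))

# Hypothetical sanity check: every sample has exactly one row per group.
pairs_ok <- all(table(design$sample, design$group) == 1)
```

Here acount[, , 1] recovers the inclusion matrix and acount[, , 2] the skipping matrix, which matches what stacking with abind::abind(..., along = 3) would produce apart from dimnames.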
icount <- matrix(1:4, 1)
scount <- matrix(5:8, 1)
acount <- abind::abind(icount, scount, along = 3)
design <- data.frame(sample = rep(c("s1", "s2"), 2),
                     group = rep(c("T", "N"), each = 2))
lens <- data.frame(sLen = 1L, iLen = 2L)
PDseDataSet(acount, design, lens)
The Mat format should have 7 columns, arranged as follows:
Column 1 contains the ID of the alternative splicing events.
Column 2 contains counts of isoform 1 corresponding to the first group.
Column 3 contains counts of isoform 2 corresponding to the first group.
Column 4 contains counts of isoform 1 corresponding to the second group.
Column 5 contains counts of isoform 2 corresponding to the second group.
Column 6 contains the effective length of isoform 1.
Column 7 contains the effective length of isoform 2.
Replicates in columns 2-5 should be separated by commas, e.g. "1623,432,6" for three replicates, and the replicate order should be consistent across columns to ensure pairs are matched correctly.
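A small base R sketch of a data frame in this layout may help when preparing input by hand. The column names and all values below are hypothetical placeholders; only the column order and the comma-separated replicate convention come from the description above. The helper split_reps is likewise an illustrative function, not part of the package.

```r
# Hypothetical Mat-format table with two events and three paired replicates.
mat <- data.frame(
  ExonID = c("event1", "event2"),
  I1     = c("1623,432,6", "10,20,30"),   # isoform 1 counts, first group
  S1     = c("12,3,1", "5,6,7"),          # isoform 2 counts, first group
  I2     = c("900,400,8", "11,19,33"),    # isoform 1 counts, second group
  S2     = c("20,9,2", "4,8,6"),          # isoform 2 counts, second group
  I_len  = c(180L, 180L),                 # effective length of isoform 1
  S_len  = c(90L, 90L),                   # effective length of isoform 2
  stringsAsFactors = FALSE
)

# Split a comma-separated count column into one numeric vector per event.
split_reps <- function(x) lapply(strsplit(x, ",", fixed = TRUE), as.numeric)

# Number of replicates per event (rows) in each of columns 2-5 (columns).
n_reps <- sapply(mat[2:5], function(col) lengths(split_reps(col)))

# Each event should have the same number of replicates in all four columns.
consistent <- apply(n_reps, 1, function(r) length(unique(r)) == 1)
```

A check like consistent catches events whose replicate lists have different lengths, but it cannot detect replicates listed in a different order between columns; that ordering has to be guaranteed when the table is built.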
PDseDataSetFromMat(dat)
dat |
A data frame in Mat format. |
A PDseDataSet object
data("sample_dataset")
pdat <- PDseDataSetFromMat(sample_dataset)
Extract results for pairadise analysis
results(pdat, p.adj = "BH", sig.level = 0.01, details = FALSE)
pdat |
A PDseDataSet object from pairadise analysis |
p.adj |
The p-value adjustment method. Default is p.adj = "BH" |
sig.level |
The significance cutoff for reported results. Default is sig.level = 0.01 |
details |
Whether to list detailed results. Default is details = FALSE |
The function returns a results DataFrame.
testStats |
Vector of test statistics for paired analysis. |
p.value |
Vector of p-values for each exon/event. |
p.adj |
Vector of adjusted p-values. |
If details is TRUE, more detailed parameter estimates for the constrained and unconstrained models are also returned.
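The relationship between p.value, p.adj, and sig.level can be illustrated with base R. This sketch assumes the adjustment behaves like stats::p.adjust with method "BH" and that significance means an adjusted p-value below the cutoff; both are assumptions about results(), and the p-values are made up.

```r
# Hypothetical raw p-values for five events.
p.value <- c(0.0004, 0.012, 0.03, 0.2, 0.75)

# Benjamini-Hochberg adjustment, the method named by the default p.adj = "BH".
p.adj <- p.adjust(p.value, method = "BH")

# Keep events whose adjusted p-value falls below the significance cutoff.
sig.level   <- 0.01
significant <- p.adj < sig.level

res <- data.frame(p.value = p.value, p.adj = p.adj, significant = significant)
```

With these inputs only the first event passes the 0.01 cutoff after adjustment, even though the second raw p-value is close to it, which is the usual effect of controlling the false discovery rate across many events.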
data("sample_dataset")
pdat <- PDseDataSetFromMat(sample_dataset)
pdat <- pairadise(pdat)
results(pdat)
The CEU dataset was generated by analyzing the allele-specific alternative splicing events in the GEUVADIS CEU data. Allele-specific reads were mapped onto alternative splicing events using rPGA (version 2.0.0). The allele-specific BAM files mapped onto the two haplotypes were then merged to detect alternative splicing events using rMATS (version 3.2.5).
The LUSC dataset was generated by analyzing the tumor versus adjacent control samples from TCGA LUSC RNA-seq data.
data(sample_dataset)
data(sample_dataset_CEU)
data(sample_dataset_LUSC)
The dataset has 7 columns, arranged as follows:
Column 1 contains the ID of the alternative splicing events.
Column 2 contains counts of isoform 1 corresponding to the first group.
Column 3 contains counts of isoform 2 corresponding to the first group.
Column 4 contains counts of isoform 1 corresponding to the second group.
Column 5 contains counts of isoform 2 corresponding to the second group.
Column 6 contains the effective length of isoform 1.
Column 7 contains the effective length of isoform 2.
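One common use of the effective lengths in columns 6 and 7 is to length-normalize the isoform counts before comparing groups. The sketch below computes a length-normalized isoform 1 proportion of the kind used in rMATS-style analyses and the paired difference between groups. It is an illustration under that assumption, not necessarily the quantity PAIRADISE computes internally; the function name and all values are hypothetical.

```r
# Length-normalized proportion of isoform 1 (hypothetical helper).
inclusion_level <- function(I, S, iLen, sLen) {
  (I / iLen) / (I / iLen + S / sLen)
}

# Made-up counts for one event with two paired replicates.
I1 <- c(100, 50); S1 <- c(50, 50)   # first group: isoform 1 and isoform 2 counts
I2 <- c(40, 30);  S2 <- c(60, 45)   # second group, same replicate order
iLen <- 2; sLen <- 1                # effective lengths of isoform 1 and isoform 2

psi1 <- inclusion_level(I1, S1, iLen, sLen)
psi2 <- inclusion_level(I2, S2, iLen, sLen)

# Paired difference: replicate k in group 1 is compared with replicate k in group 2.
delta <- psi1 - psi2
```

Because the difference is taken replicate by replicate, the comparison is only meaningful when the replicate order is identical in every count column.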