We introduce PAIRADISE (PAIred Replicate analysis of Allelic DIfferential Splicing Events), a method for detecting allele-specific alternative splicing (ASAS) from RNA-seq data. PAIRADISE uses a statistical model that aggregates ASAS signals across multiple individuals in a population. It formulates ASAS detection as a statistical problem for identifying differential alternative splicing from RNA-seq data with paired replicates. The PAIRADISE statistical model is applicable to many forms of allele-specific isoform variation (e.g. RNA editing), and can be used as a generic statistical model for RNA-seq studies involving paired replicates. More details can be found: https://github.com/Xinglab/PAIRADISE
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("PAIRADISE")
The development version is also available to download from Github.
A PDseDataSet
class is defined to store the splicing
count data, and contains inclusion and skipping counts for each sample.
A design dataframe
is required to describe the paired
sample information. An integer dataframe
of inclusion and
skipping lengths are also required. The PDseDataSet
extends
the SummarizedExperiment
class. All functions from
SummarizedExperiment
are inherited.
Here is an example to construct a PDseDataSet
with 2
pairs of samples.
library(abind)
icount <- matrix(1:4, 1)
scount <- matrix(5:8, 1)
acount <- abind(icount, scount, along = 3)
acount
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 1 2 3 4
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 5 6 7 8
design <- data.frame(sample = rep(c("s1", "s2"), 2),
group = rep(c("T", "N"), each = 2))
lens <- data.frame(sLen=1L, iLen=2L)
PDseDataSet(acount, design, lens)
#> class: PDseDataSet
#> dim: 1 4
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(2): sLen iLen
#> colnames: NULL
#> colData names(2): sample group
A count matrix can be imported as a PDseDataSet
directly.
data("sample_dataset")
sample_dataset
#> ExonID I1 S1 I2 S2 I_len S_len
#> 1 Exon 1 624,661,209 564,450,167 549,468,103 1261,767,325 180 90
#> 2 Exon 2 963,1139,388 1104,1100,330 1196,938,439 317,374,93 180 90
#> 3 Exon 3 15,17000,20,100 2,12,1,1 1,6,7,10 274,NA,320,5650 3 1
#> 4 Exon 4 3,5,9 13,27,4 5,9,9 11,29,3 3 1
PAIRADISE also includes two small sample datasets from Geuvadis and TCGA:
The “sample_dataset_CEU” dataset was generated by analyzing the allele-specific alternative splicing events in the GEUVADIS CEU data. Allele-specific reads were mapped onto alternative splicing events using rPGA (version 2.0.0). Then the allele-specific bam files mapped onto the two haplotypes are merged together to detect alternative splicing events using rMATS (version 3.2.5). The second LUSC dataset was generated by analyzing the tumor versus adjacent control samples from TCGA LUSC RNA-seq data.
Each row of the data corresponds to a different alternative splicing event. The data should have 7 columns. The order of the 7 columns in the input data frame to PAIRADISE follows the convention of the output of the rMATS software, arranged as follows:
To import the data to a PDseDataSet
object.
pairadise
methodThe pairadise
function implements the PAIRADISE
statistical model for the PDseDataSet
object. Multiple
processors can be used via the BiocParallel
package. The
function returns a PDseDataSet
object with statistical
estimates in its rowData
. Here is how to run the model with
2 threads.
A function results
can be used to calculate p values and
filter the significant results. For example, results significant at an
FDR of 0.01 can be obtained as follows.
res <- results(pairadise_output, p.adj = "BH", sig.level = 0.01)
res
#> DataFrame with 3 rows and 3 columns
#> testStats p.value p.adj
#> <numeric> <numeric> <numeric>
#> Exon 1 9.54941 0.002000136 0.00274910
#> Exon 2 9.49366 0.002061827 0.00274910
#> Exon 3 12.29846 0.000453331 0.00181333
With details=TRUE
, more detailed statistical estimates
can be returned.
res <- results(pairadise_output, details = TRUE)
colnames(res)
#> [1] "testStats" "p.value" "mu.u" "s1.u" "s2.u" "s.u"
#> [7] "delta" "mu.c" "s1.c" "s2.c" "s.c" "totalIter"
#> [13] "latent" "p.adj"
res$latent[3,]
#> $latent
#> [,1] [,2] [,3]
#> psi1.u 0.857840921 0.899015595 0.9405878997
#> psi2.u 0.001442136 0.006073380 0.0006441722
#> alpha.u 2.281881387 2.292264054 2.2888561292
#> psi1.c 0.824598001 0.878199864 0.9348674453
#> psi2.c 0.001352191 0.007339606 0.0005979513
#> alpha.c 1.552849901 1.975950491 2.6566262109
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] abind_1.4-8 PAIRADISE_1.23.0 nloptr_2.1.1 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-1 jsonlite_1.8.9
#> [3] crayon_1.5.3 compiler_4.4.1
#> [5] BiocManager_1.30.25 SummarizedExperiment_1.36.0
#> [7] Biobase_2.67.0 GenomicRanges_1.59.0
#> [9] parallel_4.4.1 jquerylib_0.1.4
#> [11] IRanges_2.41.0 BiocParallel_1.41.0
#> [13] yaml_2.3.10 fastmap_1.2.0
#> [15] lattice_0.22-6 R6_2.5.1
#> [17] XVector_0.46.0 S4Arrays_1.6.0
#> [19] generics_0.1.3 GenomeInfoDb_1.43.0
#> [21] knitr_1.48 BiocGenerics_0.53.1
#> [23] DelayedArray_0.33.1 MatrixGenerics_1.19.0
#> [25] maketools_1.3.1 GenomeInfoDbData_1.2.13
#> [27] bslib_0.8.0 rlang_1.1.4
#> [29] cachem_1.1.0 xfun_0.48
#> [31] sass_0.4.9 sys_3.4.3
#> [33] SparseArray_1.6.0 cli_3.6.3
#> [35] zlibbioc_1.52.0 grid_4.4.1
#> [37] digest_0.6.37 lifecycle_1.0.4
#> [39] S4Vectors_0.44.0 evaluate_1.0.1
#> [41] codetools_0.2-20 buildtools_1.0.0
#> [43] stats4_4.4.1 rmarkdown_2.28
#> [45] httr_1.4.7 matrixStats_1.4.1
#> [47] tools_4.4.1 htmltools_0.5.8.1
#> [49] UCSC.utils_1.2.0