Title: | interactive analysis and visualization of alternative splicing in R |
---|---|
Description: | The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis, by processing alignment BAM files to quantify read counts across splice junctions, IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization. |
Authors: | Alex Chit Hei Wong [aut, cre, cph], Ulf Schmitz [ctb], William Ritchie [cph] |
Maintainer: | Alex Chit Hei Wong <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.9.0 |
Built: | 2024-11-18 04:43:55 UTC |
Source: | https://github.com/bioc/SpliceWiz |
SpliceWiz is a computationally efficient and user friendly workflow that analyses aligned short-read RNA sequencing for differential intron retention and alternative splicing.
SpliceWiz uses isoform-specific alignments to quantify percent-spliced-in ratios (i.e. ratio of the "included" isoform, as a proportion of "included" and "excluded" isoforms). For intron retention (IR), the abundance of the intron-retaining transcript (included isoform) is quantified using the trimmed-mean depth of intron coverage with reads, whereas the spliced transcript (excluded isoform) is measured as the splicing of the intron as well as that of overlapping introns (since splicing of any overlapping intron implies the intron of interest is not retained). For other forms of alternative splicing, junction reads (reads aligned across splice junctions) are used to quantify included and excluded isoforms.
SpliceWiz processes BAM files (aligned RNA sequencing) using
ompBAM::ompBAM-package. ompBAM
is a C++ library that allows R packages (via the Rcpp framework) to
efficiently read BAM files using OpenMP-based multi-threading. SpliceWiz
processes BAM files via the processBAM function, using a splicing and
intron reference built from any given genome / gene annotation resource
using the buildRef function. processBAM generates two outputs per
BAM file: a txt.gz
file which is a gzip-compressed text file with multiple
tables, containing information including junction read counts and intron
retention metrics. This output is very similar to that of
IRFinder, as the analysis
steps of SpliceWiz's BAM processing was built on an improved version of
IRFinder's source code (version 1.3.1). Additionally, processBAM outputs
a COV file, which is a binary bgzf-compressed file that contains
strand-specific coverage data.
Once individual files have been analysed, SpliceWiz compiles a dataset using these individual outputs, using collateData. This function unifies junctions detected across the dataset, and generates included / excluded counts of all putative IR events and annotated alternative splicing events (ASEs). This dataset is exported as a collection of files including an H5 database. The data is later imported into the R session using the makeSE function, as a NxtSE object.
The NxtSE object is a specialized SummarizedExperiment object tailored for use in SpliceWiz. Annotation of rows provide information about ASEs via rowData, while columns allows users to provide annotations via colData.
SpliceWiz offers several novel filters via the ASEFilter class. See ASEFilter for details.
Once the NxtSE is annotated and filtered, differential analysis is performed, using limma, DoubleExpSeq (DES), edgeR and DESeq2 wrappers. These wrappers model isoform counts as log-normal (limma), beta-binomial (DES) and negative-binomial (edgeR and DESeq2) distributions. See ASE-methods for details.
Finally, SpliceWiz provides visualisation tools to illustrate alternative splicing using coverage plots, including a novel method to normalise RNA-seq coverage grouped by experimental condition. This approach accounts for variations introduced by sequenced library size and gene expression. SpliceWiz efficiently computes and visualises means and variations in per-nucleotide coverage depth across alternate exons in genomic loci.
The main functions are:
Build-Reference-methods - Prepares genome and gene annotation references from FASTA and GTF files and synthesizes the SpliceWiz reference for processing BAM files, collating the NxtSE object.
STAR-methods - (Optional) Provides wrapper functions to build the STAR genome reference and alignment of short-read FASTQ raw sequencing files. This functionality is only available on systems with STAR installed.
processBAM - OpenMP/C++ based algorithm to analyse single or multiple BAM files.
collateData - Collates an experiment based on multiple IRFinder outputs for individual samples, into one unified H5-based data structure.
makeSE - Constructs a NxtSE (H5-based SummarizedExperiment) object, specialised to house measurements of retained introns and junction counts of alternative splice events.
applyFilters - Use default or custom filters to remove alternative splicing or IR events pertaining to low-abundance genes and transcripts.
ASE-methods - one-step method to perform differential alternate splice event (ASE) analysis on a NxtSE object using limma or DESeq2.
make_plot_data: Functions that compile individual and group-mean percent spliced in (PSI) values of IR and alternative splice events; useful to produce scatter plots or heatmaps.
Coverage: methods that retrieve coverage data from COV files.
getCoverageData / getPlotObject / plotView: Functions for plotting SpliceWiz's novel coverage plots.
See the SpliceWiz Quick-Start for worked examples on how to use SpliceWiz SpliceWiz Cookbook for real-life usage examples
Alex Wong
Wong ACH, Wong JJ-L, Rasko JEJ, Schmitz U. SpliceWiz: interactive analysis and visualization of alternative splicing in R. Briefings in Bioinformatics, Volume 25, Issue 1, January 2024, bbad468. https://doi.org/10.1093/bib/bbad468
Useful links:
These functions allow users to fit custom GLMs included/excluded counts using edgeR for differential Alternative Splice Events (ASEs)
fitASE_edgeR( se, strModelFormula, strASEFormula, useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) fitASE_edgeR_custom( se, model_IncExc, model_ASE, useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) testASE_edgeR( se, fit, coef_IncExc = ncol(fit[["model_IncExc"]]), contrast_IncExc = NULL, coef_ASE = ncol(fit[["model_ASE"]]), contrast_ASE = NULL ) addPSI_edgeR(results, se, condition, conditionList)
fitASE_edgeR( se, strModelFormula, strASEFormula, useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) fitASE_edgeR_custom( se, model_IncExc, model_ASE, useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) testASE_edgeR( se, fit, coef_IncExc = ncol(fit[["model_IncExc"]]), contrast_IncExc = NULL, coef_ASE = ncol(fit[["model_ASE"]]), contrast_ASE = NULL ) addPSI_edgeR(results, se, condition, conditionList)
se |
The NxtSE object created by |
strModelFormula |
A string specifying the model formula to fit isoform
counts to assess differential expression in isolation. Should take the form
of |
strASEFormula |
A string specifying the model formula to fit PSIs
(isoform ratios). The variate of interest should be specified as an
interactiion term with |
useQL |
(default |
IRmode |
(default |
filter_antiover , filter_antinear
|
Whether to remove novel IR events that overlap over or near anti-sense genes. Default will exclude antiover but not antinear introns. These are ignored if strand-specific RNA-seq protocols are used. |
model_IncExc |
A model matrix in which to model differential expression
of isoform counts in isolation. The number of rows must equal that of the
number of samples in |
model_ASE |
A model matrix in which to model differential PSIs.
The number of rows must be twice that of the number of samples in |
fit |
The output returned by the |
coef_IncExc , coef_ASE
|
model coefficients to be dropped for LRT test
between full and reduced models. Directly parsed onto |
contrast_IncExc , contrast_ASE
|
numeric vector specifying one or more #' contrasts of the linear model coefficients to be tested.
Directly parsed onto |
results |
The return value of |
condition |
The name of the column containing the condition values in
|
conditionList |
A list (or vector) of condition values of which to calculate mean PSIs |
edgeR accounts appropriately for zero-counts which are often problematic as PSI approaches zero or one, leading to false positives. The following functions allow users to define model formulas to test relative expressions of included / excluded counts (to assess whether isoforms are differentially regulated, in isolation), as well as together as an interaction (the latter provides results of differential ASE analysis)
See the examples section for a brief explanation of how to use these functions.
See also ASE-methods for further explanations of results output.
fitASE_edgeR
and fitASE_edgeR_custom
returns a named list containing
the following:
IncExc
and ASE
are DGEGLM
objects containing the fitted models for
isoform counts and PSIs, respectively
model_IncExc
and model_ASE
are model matrices of the above fitted
models.
testASE_edgeR()
returns a data.table containing the following:
EventName: The name of the ASE event. This identifies each ASE in downstream functions including makeMeanPSI, makeMatrix, and plotCoverage
EventType: The type of event. See details section above.
EventRegion: The genomic coordinates the event occupies. This spans the most upstream and most downstream splice junction involved in the ASE, and is use to guide the plotCoverage function.
flags: Indicates which isoforms are NMD substrates and/or which are formed by novel splicing only.
edgeR specific output equivalent to statistics returned by
edgeR::topTags()
:
logFC, logCPM, F, PValue, FDR: log fold change, log counts per million, F statistic, p value and (Benjamini Hochberg) adjusted p values of the differential PSIs for the contrasts or coefficients tested.
inc/exc_(...): edgeR statistics corresponding to differential expression testing for raw included / excluded counts in isolation (not of the PSIs).
addPSI_edgeR()
appends the following columns to the above output
AvgPSI_X: the average percent spliced in / percent IR levels for condition X. Note this is a geometric mean, based on the arithmetic mean of logit PSI values.
deltaPSI: The difference in PSI between the mean values of the two conditions.
abs_deltaPSI: The absolute value of difference in PSI between the mean values of the two conditions.
fitASE_edgeR()
: Use edgeR to fit counts and ASE models with a
given design formula
fitASE_edgeR_custom()
: Use edgeR to fit counts and ASE models with a
given design formula
testASE_edgeR()
: Use edgeR to return differential ASE results. coef
and contrast are parsed onto edgeR's glmQLFTest function
addPSI_edgeR()
: Adds average and delta PSIs of conditions of
interest onto results produced by testASE_edgeR(). Note this is done
automatically for other methods described in ASE-methods
.
Lun A, Smyth G (2017). 'No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data' Stat Appl Genet Mol Biol, 017 Apr 25;16(2):83-93. https://doi.org/10.1515/sagmb-2017-0010
# Load the NxtSE object and set up the annotations # - see ?makeSE on example code of generating this NxtSE object se <- SpliceWiz_example_NxtSE() colData(se)$treatment <- rep(c("A", "B"), each = 3) colData(se)$replicate <- rep(c("P","Q","R"), 2) require("edgeR") fit <- fitASE_edgeR( se, strModelFormula = "~0 + replicate + treatment", strASEFormula = "~0 + replicate + treatment + treatment:ASE" ) # Get coefficient terms of Included / Excluded counts isolated model colnames(fit$model_IncExc) # [1] "replicateP" "replicateQ" "replicateR" "treatmentB" # Get coefficient terms of PSI model colnames(fit$model_ASE) # [1] "replicateP" "replicateQ" "replicateR" "treatmentB" # [5] "treatmentA:ASEIncluded" "treatmentB:ASEIncluded" # Contrast between treatment "B" against treatment "A" res <- testASE_edgeR(se, fit, contrast_IncExc = c(0,0,0,1), contrast_ASE = c(0,0,0,0,-1,1) ) ### # Add mean PSI values to results: res_withPSI <- addPSI_edgeR(res, se, "treatment", c("B", "A")) ### Using custom model matrices to model counts # - the equivalent analysis can be performed as follows: # Sample annotations for isoform count expressions colData <- as.data.frame(colData(se)) # Sample annotations for isoform count PSI analysis colData_ASE <- rbind(colData, colData) colData_ASE$ASE <- rep(c("Included", "Excluded"), each = nrow(colData)) rownames(colData_ASE) <- c( paste0(rownames(colData), ".Included"), paste0(rownames(colData), ".Excluded") ) model_IncExc <- model.matrix( ~0 + replicate + treatment, data = colData ) model_ASE <- model.matrix( ~0 + replicate + treatment + treatment:ASE, data = colData_ASE ) fit <- fitASE_edgeR_custom(se, model_IncExc, model_ASE) res_customModel <- testASE_edgeR(se, fit, contrast_IncExc = c(0,0,0,1), contrast_ASE = c(0,0,0,0,-1,1) ) # Check this produces identical results: identical(res_customModel, res) ### Time series examples using edgeR and splines # - similar to section 4.8 in the edgeR vignette colData(se)$timepoint <- rep(c(1,2,3), each = 2) colData(se)$batch <- rep(c("1", "2"), 3) # First, we set up a polynomial spline with 2 degrees of freedom: Time <- poly(colData(se)$timepoint, df = 2) # Next, we define the batch factor: Batch <- factor(colData(se)$batch) # Finally, we construct the same factors for ASE analysis. Note that # each factor must be repeated twice Time_ASE <- rbind(Time, Time) Batch_ASE <- c(Batch, Batch) ASE <- factor( rep(c("Included", "Excluded"), each = nrow(colData(se))) ) # Now, we set up the model matrices for isoform and PSI count modelling model_IncExc <- model.matrix(~0 + Batch + Time) model_ASE <- model.matrix(~0 + Batch_ASE + Time_ASE + Time_ASE:ASE) fit <- fitASE_edgeR_custom(se, model_IncExc, model_ASE) # Note the coefficients of interest in the constructed models: colnames(model_IncExc) # [1] "Batch1" "Batch2" "Time1" "Time2" colnames(model_ASE) # [1] "Batch_ASE1" "Batch_ASE2" "Time_ASE1" "Time_ASE2" # [5] "Time_ASE1:ASEIncluded" "Time_ASE2:ASEIncluded" # We are interested in a model in which `Time` is excluded, thus: res <- testASE_edgeR(se, fit, coef_IncExc = 3:4, coef_ASE = 5:6 ) # Finally, add PSI values for each time point: res_withPSI <- addPSI_edgeR(res, se, "timepoint", c(1, 2, 3))
# Load the NxtSE object and set up the annotations # - see ?makeSE on example code of generating this NxtSE object se <- SpliceWiz_example_NxtSE() colData(se)$treatment <- rep(c("A", "B"), each = 3) colData(se)$replicate <- rep(c("P","Q","R"), 2) require("edgeR") fit <- fitASE_edgeR( se, strModelFormula = "~0 + replicate + treatment", strASEFormula = "~0 + replicate + treatment + treatment:ASE" ) # Get coefficient terms of Included / Excluded counts isolated model colnames(fit$model_IncExc) # [1] "replicateP" "replicateQ" "replicateR" "treatmentB" # Get coefficient terms of PSI model colnames(fit$model_ASE) # [1] "replicateP" "replicateQ" "replicateR" "treatmentB" # [5] "treatmentA:ASEIncluded" "treatmentB:ASEIncluded" # Contrast between treatment "B" against treatment "A" res <- testASE_edgeR(se, fit, contrast_IncExc = c(0,0,0,1), contrast_ASE = c(0,0,0,0,-1,1) ) ### # Add mean PSI values to results: res_withPSI <- addPSI_edgeR(res, se, "treatment", c("B", "A")) ### Using custom model matrices to model counts # - the equivalent analysis can be performed as follows: # Sample annotations for isoform count expressions colData <- as.data.frame(colData(se)) # Sample annotations for isoform count PSI analysis colData_ASE <- rbind(colData, colData) colData_ASE$ASE <- rep(c("Included", "Excluded"), each = nrow(colData)) rownames(colData_ASE) <- c( paste0(rownames(colData), ".Included"), paste0(rownames(colData), ".Excluded") ) model_IncExc <- model.matrix( ~0 + replicate + treatment, data = colData ) model_ASE <- model.matrix( ~0 + replicate + treatment + treatment:ASE, data = colData_ASE ) fit <- fitASE_edgeR_custom(se, model_IncExc, model_ASE) res_customModel <- testASE_edgeR(se, fit, contrast_IncExc = c(0,0,0,1), contrast_ASE = c(0,0,0,0,-1,1) ) # Check this produces identical results: identical(res_customModel, res) ### Time series examples using edgeR and splines # - similar to section 4.8 in the edgeR vignette colData(se)$timepoint <- rep(c(1,2,3), each = 2) colData(se)$batch <- rep(c("1", "2"), 3) # First, we set up a polynomial spline with 2 degrees of freedom: Time <- poly(colData(se)$timepoint, df = 2) # Next, we define the batch factor: Batch <- factor(colData(se)$batch) # Finally, we construct the same factors for ASE analysis. Note that # each factor must be repeated twice Time_ASE <- rbind(Time, Time) Batch_ASE <- c(Batch, Batch) ASE <- factor( rep(c("Included", "Excluded"), each = nrow(colData(se))) ) # Now, we set up the model matrices for isoform and PSI count modelling model_IncExc <- model.matrix(~0 + Batch + Time) model_ASE <- model.matrix(~0 + Batch_ASE + Time_ASE + Time_ASE:ASE) fit <- fitASE_edgeR_custom(se, model_IncExc, model_ASE) # Note the coefficients of interest in the constructed models: colnames(model_IncExc) # [1] "Batch1" "Batch2" "Time1" "Time2" colnames(model_ASE) # [1] "Batch_ASE1" "Batch_ASE2" "Time_ASE1" "Time_ASE2" # [5] "Time_ASE1:ASEIncluded" "Time_ASE2:ASEIncluded" # We are interested in a model in which `Time` is excluded, thus: res <- testASE_edgeR(se, fit, coef_IncExc = 3:4, coef_ASE = 5:6 ) # Finally, add PSI values for each time point: res_withPSI <- addPSI_edgeR(res, se, "timepoint", c(1, 2, 3))
Use Limma, DESeq2, DoubleExpSeq, and edgeR wrapper functions to test for differential Alternative Splice Events (ASEs)
ASE_limma( se, test_factor, test_nom, test_denom, batch1 = "", batch2 = "", IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_edgeR( se, test_factor, test_nom, test_denom, batch1 = "", batch2 = "", useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_limma_timeseries( se, test_factor, batch1 = "", batch2 = "", degrees_of_freedom = 1, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_edgeR_timeseries( se, test_factor, batch1 = "", batch2 = "", degrees_of_freedom = 1, useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_DESeq( se, test_factor, test_nom, test_denom, batch1 = "", batch2 = "", n_threads = 1, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_DoubleExpSeq( se, test_factor, test_nom, test_denom, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE )
ASE_limma( se, test_factor, test_nom, test_denom, batch1 = "", batch2 = "", IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_edgeR( se, test_factor, test_nom, test_denom, batch1 = "", batch2 = "", useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_limma_timeseries( se, test_factor, batch1 = "", batch2 = "", degrees_of_freedom = 1, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_edgeR_timeseries( se, test_factor, batch1 = "", batch2 = "", degrees_of_freedom = 1, useQL = TRUE, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_DESeq( se, test_factor, test_nom, test_denom, batch1 = "", batch2 = "", n_threads = 1, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE ) ASE_DoubleExpSeq( se, test_factor, test_nom, test_denom, IRmode = c("all", "annotated", "annotated_binary"), filter_antiover = TRUE, filter_antinear = FALSE )
se |
The NxtSE object created by |
test_factor |
The column name in the sample annotation |
test_nom |
The nominator condition to test for differential ASE. Usually the "treatment" condition |
test_denom |
The denominator condition to test against for differential ASE. Usually the "control" condition |
batch1 , batch2
|
(Optional, limma and DESeq2 only) One or two condition types containing batch information to account for. |
IRmode |
(default |
filter_antiover , filter_antinear
|
Whether to remove novel IR events that overlap over or near anti-sense genes. Default will exclude antiover but not antinear introns. These are ignored if strand-specific RNA-seq protocols are used. |
useQL |
(default |
degrees_of_freedom |
(default |
n_threads |
(DESeq2 only) How many threads to use for DESeq2 based analysis. |
Using limma, SpliceWiz models included and excluded counts as log-normal distributed, whereas using DESeq2, SpliceWiz models included and excluded counts as negative binomial distributed with dispersion shrinkage according to their mean count expressions. For limma and DESeq2, differential ASE are considered as the "interaction" between included and excluded splice counts for each sample. See this vignette for an explanation of how this is done.
SpliceWiz's limma wrapper implements an additional filter where ASEs with an average cpm values of either Included or Excluded counts are less than 1. DESeq2 has its own method for handling outliers, which seems to work well for handling situations where PSI ~= 0 or PSI ~= 1.
Time series are supported by SpliceWiz to a limited extent.
Time series analysis can be performed via limma or DESeq2.
For limma time-series analysis, use ASE_limma_timeseries()
, specifying
the test_factor
as the column of numeric values containing
time series data. For DESeq, time series differential analysis can be
activated using the ASE_DESeq()
function, again specifying test_factor
as the column containing time series data (and leaving test_nom
and test_denom
parameters blank). See examples below.
edgeR models counts using a negative binomial model. It accounts appropriately for zero-counts which are often problematic as PSI approaches zero or one, leading to false positives. The edgeR-based option produces differential ASEs that are less biased towards low counts. Our preliminary analysis shows it to be more accurate than limma or DoubleExpSeq based methods.
For time series analysis using edgeR, ASE_edgeR_timeseries()
can be
used interchangeably with its counterpart limma-based function. For
complex models, please see ASE-GLM-edgeR to build your own GLM models.
Using DoubleExpSeq, included and excluded counts are modeled using the generalized beta prime distribution, using empirical Bayes shrinkage to estimate dispersion.
EventType are as follow:
IR
= intron retention (IR-ratio) - all introns are considered
MXE
= mutually exclusive exons
SE
= skipped exons
AFE
= alternate first exon
ALE
= alternate last exon
A5SS
= alternate 5'-splice site
A3SS
= alternate 3'-splice site
RI
= (known / annotated) intron retention (PSI).
NB: SpliceWiz measures intron retention events using two different approaches, the choice of which is left to the user - see ASE-methods:
IR (intron retention) events: considers all introns to be potentially
retained. Given in most scenarios there may be uncertainty as to which of the
many mutually-overlapping introns are spliced to produce the major isoform,
SpliceWiz adopts the IRFinder approach by using the IR-ratio. The "included"
isoform is the relative abundance of the IR-transcript, as approximated by
the trimmed-mean depth of coverage across the intron (excluding outliers
including exons of other transcripts, intronic elements such as snoRNAs,
etc). The "excluded isoform" includes all spliced transcripts that
contain an overlapping intron, as estimated via SpliceWiz's SpliceOver
and
IRFinder's SpliceMax
methods - see collateData.
RI (annotated retained introns) considers only annotated retained introns, i.e., those annotated within the given reference. These are quantified using PSI, considering the included (IR-transcript) and excluded (splicing of the exact intron) as binary alternatives.
SpliceWiz considers "included" counts as those that represent abundance of the "included" isoform, whereas "excluded" counts represent the abundance of the "excluded" isoform. To allow comparison between modalities, SpliceWiz applies a convention whereby the "included" transcript is one where its splice junctions are by definition shorter than those of "excluded" transcripts. Specifically, this means the included / excluded isoforms are as follows:
EventType | Included | Excluded |
IR or RI | Intron Retention | Spliced Intron |
MXE | Upstream exon inclusion | Downstream exon inclusion |
SE | Exon inclusion | Exon skipping |
AFE | Downstream exon usage | Upstream exon usage |
ALE | Upstream exon usage | Downstream exon usage |
A5SS | Downstream 5'-SS | Upstream 5'-SS |
A3SS | Upstream 3'-SS | Downstream 3'-SS |
For all methods, a data.table containing the following:
EventName: The name of the ASE event. This identifies each ASE in downstream functions including makeMeanPSI, makeMatrix, and plotCoverage
EventType: The type of event. See details section above.
EventRegion: The genomic coordinates the event occupies. This spans the most upstream and most downstream splice junction involved in the ASE, and is use to guide the plotCoverage function.
flags: Indicates which isoforms are NMD substrates and/or which are formed by novel splicing only.
AvgPSI_nom, Avg_PSI_denom: the average percent spliced in / percent
IR levels for the two conditions being contrasted. nom
and denom
in
column names are replaced with the condition names. Note this is a
geometric mean, based on the arithmetic mean of logit PSI values.
deltaPSI: The difference in PSI between the mean values of the two conditions.
abs_deltaPSI: The absolute value of difference in PSI between the mean values of the two conditions.
limma specific output
logFC, AveExpr, t, P.Value, adj.P.Val, B: limma topTable columns of differential ASE. See limma::topTable for details.
inc/exc_(logFC, AveExpr, t, P.Value, adj.P.Val, B): limma results for differential testing for raw included / excluded counts only
edgeR specific output equivalent to statistics returned by edgeR::topTags:
logFC, logCPM, F, PValue, FDR: log fold change, log counts per million, F statistic, p value and (Benjamini Hochberg) adjusted p values.
inc/exc_(...): edgeR statistics corresponding to differential expression testing for raw included / excluded counts in isolation
DESeq2 specific output
baseMean, log2FoldChange, lfcSE, stat, pvalue, padj: DESeq2 results columns for differential ASE; see DESeq2::results for details.
inc/exc_(baseMean, log2FoldChange, lfcSE, stat, pvalue, padj): DESeq2 results for differential testing for raw included / excluded counts only
DoubleExp specific output
MLE_nom, MLE_denom: Maximum likelihood expectation of PSI values for the
denom
in column names are replaced with the condition names
MLE_LFC: Log2-fold change of the MLE
P.Value, adj.P.Val: Nominal and BH-adjusted P values
n_eff: Number of effective samples (i.e. non-zero or non-unity PSI)
mDepth: Mean Depth of splice coverage in each of the two groups.
Dispersion_Reduced, Dispersion_Full: Dispersion values for reduced and full models. See DoubleExpSeq::DBGLM1 for details.
ASE_limma()
: Use limma to perform differential ASE analysis of
a filtered NxtSE object
ASE_edgeR()
: Use edgeR to perform differential ASE analysis of
a filtered NxtSE object
ASE_limma_timeseries()
: Use limma to perform differential ASE analysis of
a filtered NxtSE object (time series)
ASE_edgeR_timeseries()
: Use edgeR to perform differential time series
of a filtered NxtSE object
ASE_DESeq()
: Use DESeq2 to perform differential ASE analysis of
a filtered NxtSE object
ASE_DoubleExpSeq()
: Use DoubleExpSeq to perform differential ASE analysis
of a filtered NxtSE object (uses double exponential beta-binomial model)
to estimate group dispersions, followed by LRT
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). 'limma powers differential expression analyses for RNA-sequencing and microarray studies.' Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007
Love MI, Huber W, Anders S (2014). 'Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.' Genome Biology, 15, 550. https://doi.org/10.1186/s13059-014-0550-8
Ruddy S, Johnson M, Purdom E (2016). 'Shrinkage of dispersion parameters in the binomial family, with application to differential exon skipping.' Ann. Appl. Stat. 10(2): 690-725. https://doi.org/10.1214/15-AOAS871
Gilis J, Vitting-Seerup K, Van den Berge K, Clement L (2021). 'Scalable analysis of differential transcript usage for bulk and single-cell RNA-sequencing applications.' F1000Research 2021, 10:374. https://doi.org/10.12688/f1000research.51749.1
Lun A, Smyth G (2017). 'No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data' Stat Appl Genet Mol Biol, 017 Apr 25;16(2):83-93. https://doi.org/10.1515/sagmb-2017-0010
# Load the NxtSE object and set up the annotations # - see ?makeSE on example code of generating this NxtSE object se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) colData(se)$treatment <- rep(c("A", "B"), each = 3) colData(se)$replicate <- rep(c("P","Q","R"), 2) # Limma analysis (counts modeled using log-normal distribution) require("limma") res_limma <- ASE_limma(se, "treatment", "A", "B") # edgeR analysis (counts modeled using negative binomial distribution) # - QL: whether quasi-likelihood method was used require("edgeR") res_edgeR <- ASE_edgeR(se, "treatment", "A", "B", useQL = FALSE) res_edgeR_QL <- ASE_edgeR(se, "treatment", "A", "B", useQL = TRUE) # DoubleExpSeq analysis (counts modeled using beta binomial distribution) require("DoubleExpSeq") res_DES <- ASE_DoubleExpSeq(se, "treatment", "A", "B") # DESeq2 analysis (counts modeled using negative binomial distribution) require("DESeq2") res_DESeq <- ASE_DESeq(se, "treatment", "A", "B") # Time series examples colData(se)$timepoint <- rep(c(1,2,3), each = 2) colData(se)$batch <- rep(c("1", "2"), 3) res_limma_timeseries <- ASE_limma_timeseries(se, "timepoint") res_edgeR_timeseries <- ASE_edgeR_timeseries(se, "timepoint") res_DESeq_timeseries <- ASE_DESeq(se, "timepoint")
# Load the NxtSE object and set up the annotations # - see ?makeSE on example code of generating this NxtSE object se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) colData(se)$treatment <- rep(c("A", "B"), each = 3) colData(se)$replicate <- rep(c("P","Q","R"), 2) # Limma analysis (counts modeled using log-normal distribution) require("limma") res_limma <- ASE_limma(se, "treatment", "A", "B") # edgeR analysis (counts modeled using negative binomial distribution) # - QL: whether quasi-likelihood method was used require("edgeR") res_edgeR <- ASE_edgeR(se, "treatment", "A", "B", useQL = FALSE) res_edgeR_QL <- ASE_edgeR(se, "treatment", "A", "B", useQL = TRUE) # DoubleExpSeq analysis (counts modeled using beta binomial distribution) require("DoubleExpSeq") res_DES <- ASE_DoubleExpSeq(se, "treatment", "A", "B") # DESeq2 analysis (counts modeled using negative binomial distribution) require("DESeq2") res_DESeq <- ASE_DESeq(se, "treatment", "A", "B") # Time series examples colData(se)$timepoint <- rep(c(1,2,3), each = 2) colData(se)$batch <- rep(c("1", "2"), 3) res_limma_timeseries <- ASE_limma_timeseries(se, "timepoint") res_edgeR_timeseries <- ASE_edgeR_timeseries(se, "timepoint") res_DESeq_timeseries <- ASE_DESeq(se, "timepoint")
SpliceWiz implements a number of novel filters designed to exclude alternative splicing events (ASEs) that yield low-confidence estimates.
ASEFilter( filterClass = c("Data", "Annotation"), filterType = c("Depth", "Participation", "Consistency", "Modality", "Protein_Coding", "NMD", "TSL", "Terminus", "ExclusiveMXE", "StrictAltSS"), pcTRUE = 100, minimum = 20, maximum = 1, minDepth = 5, condition = "", minCond = -1, EventTypes = c("IR", "MXE", "SE", "A3SS", "A5SS", "AFE", "ALE", "RI") )
ASEFilter( filterClass = c("Data", "Annotation"), filterType = c("Depth", "Participation", "Consistency", "Modality", "Protein_Coding", "NMD", "TSL", "Terminus", "ExclusiveMXE", "StrictAltSS"), pcTRUE = 100, minimum = 20, maximum = 1, minDepth = 5, condition = "", minCond = -1, EventTypes = c("IR", "MXE", "SE", "A3SS", "A5SS", "AFE", "ALE", "RI") )
filterClass |
Must be either |
filterType |
Must be a valid |
pcTRUE |
If conditions are set, what percentage of all samples in each of the condition must satisfy the filter for the event to pass the filter check. Must be between 0 and 100 (default 100) |
minimum |
Filter-dependent argument. See details |
maximum |
Filter-dependent argument. See details |
minDepth |
Filter-dependent argument. See details |
condition |
(default "") If set, must match the name of an experimental
condition in the NxtSE object to be filtered,
i.e. a column name in |
minCond |
(default -1) If condition is set, how many minimum number of
conditions must pass the filter criteria. For example,
if condition = "Batch", and batches are "A", "B", or "C", setting
|
EventTypes |
What types of events are considered for filtering. Must be
one or more of |
Annotation Filters
Modality: Filters for specific modalities of ASEs. All events
belonging to the specified EventTypes
are removed.
No additional parameters required.
Protein_Coding: Filters for alternative splicing or IR events involving protein-coding transcripts. No additional parameters required.
NMD: Filters for events in which one isoform is a predicted NMD substrate.
TSL: filters for events in which both
isoforms have a TSL level below or equal to minimum
Terminus: In alternate first exons, the splice junction must not be shared with another transcript for which it is not its first intron. For alternative last exons, the splice junction must not be shared with another transcript for which it is not its last intron
ExclusiveMXE: For MXE events, the two alternate casette exons must not overlap in their genomic regions
StrictAltSS: For A5SS / A3SS events, the two alternate splice sites must not be interupted by detected introns
Data Filters
Depth: Filters IR or alternative splicing events of transcripts
that are "expressed" with adequate Depth
as calculated by the
sum of all splicing and IR reads spanning the event. Events with
Depth
below minimum
are filtered out
Participation: Participation means different things to IR
and alternative splicing.
For IR, Participation refers to the percentage of the measured intron
covered with reads. Only introns of samples with a depth of intron
coverage (intron depth) above
minDepth
are assessed, where introns with coverage percentage
below minimum
are filtered out.
For non-IR ASEs, Participation refers to the percentage of
all splicing events observed across the genomic region
(SpliceOver metric) that is
compatible with either the included or excluded event. This prevents
SpliceWiz from doing differential analysis between two minor isoforms.
Instead of IntronDepth, in AS events SpliceWiz considers events where
the SpliceOver metric exceed minDepth
.
Then, events with a SpliceOver metric below minimum
are excluded.
We recommend testing IR events for > 70% coverage and AS
events for > 40% coverage as given in the default filters which can be
accessed using getDefaultFilters
Consistency: Skipped exons (SE) and mutually exclusive exons
(MXE) comprise reads aligned to two contiguous splice junctions.
Most algorithms take the average counts from both junctions. This
will inadvertently include transcripts that share one but not both
splice events. To check that this is not happening, we require both
splice junctions to have comparable counts.
This filter checks whether reads from each splice junction comprises
a reasonable proportion of the sum of these reads.
Events are excluded if either of the upstream or downstream
event is lower than total splicing events by a log-2 magnitude
above maximum
. For example, if
maximum = 2
, we require both upstream and downstream
events to represent at least 1/(2^2) = 1/4 of the sum of upstream
and downstream event. If maximum = 3
, then each junction must be at
least 1/8 of total, etc.
This is considered for each isoform of each event, and is NOT tested
when total (upstream+downstream) counts belonging to each isoform is
below minDepth
.
IR-events are also checked. For IR events, the upstream and downstream
exon-intron spanning reads must comprise a reasonable proportion of total
exon-intron spanning reads.
We highly recommend using the default filters, which can be acquired using getDefaultFilters
An ASEFilter object with the specified parameters
ASEFilter()
: Constructs a ASEFilter object
# Create a ASEFilter that filters for protein-coding ASE f1 <- ASEFilter(filterClass = "Annotation", filterType = "Protein_Coding") # Create a ASEFilter that filters for Depth >= 20 in IR events f2 <- ASEFilter( filterClass = "Data", filterType = "Depth", minimum = 20, EventTypes = c("IR", "RI") ) # Create a ASEFilter that filters for Participation > 60% in splice events # that must be satisfied in at least 2 categories of condition "Genotype" f3 <- ASEFilter( filterClass = "Data", filterType = "Participation", minimum = 60, EventTypes = c("MXE", "SE", "AFE", "ALE", "A3SS", "A5SS"), condition = "Genotype", minCond = 2 ) # Create a ASEFilter that filters for Depth > 10 in all events # that must be satisfied in at least 50% of each gender f4 <- ASEFilter( filterClass = "Data", filterType = "Depth", minimum = 10, condition = "gender", pcTRUE = 50 ) # Get a description of what these filters do: f1 f2 f3 f4
# Create a ASEFilter that filters for protein-coding ASE f1 <- ASEFilter(filterClass = "Annotation", filterType = "Protein_Coding") # Create a ASEFilter that filters for Depth >= 20 in IR events f2 <- ASEFilter( filterClass = "Data", filterType = "Depth", minimum = 20, EventTypes = c("IR", "RI") ) # Create a ASEFilter that filters for Participation > 60% in splice events # that must be satisfied in at least 2 categories of condition "Genotype" f3 <- ASEFilter( filterClass = "Data", filterType = "Participation", minimum = 60, EventTypes = c("MXE", "SE", "AFE", "ALE", "A3SS", "A5SS"), condition = "Genotype", minCond = 2 ) # Create a ASEFilter that filters for Depth > 10 in all events # that must be satisfied in at least 50% of each gender f4 <- ASEFilter( filterClass = "Data", filterType = "Depth", minimum = 10, condition = "gender", pcTRUE = 50 ) # Get a description of what these filters do: f1 f2 f3 f4
These function builds the reference required by the SpliceWiz engine, as well as alternative splicing annotation data for SpliceWiz. See examples below for guides to making the SpliceWiz reference.
getResources( reference_path = "./Reference", fasta = "", gtf = "", overwrite = FALSE, force_download = FALSE, verbose = TRUE ) buildRef( reference_path = "./Reference", fasta = "", gtf = "", overwrite = FALSE, force_download = FALSE, chromosome_aliases = NULL, genome_type = "", nonPolyARef = "", MappabilityRef = "", BlacklistRef = "", ontologySpecies = "", useExtendedTranscripts = TRUE, lowMemoryMode = TRUE, verbose = TRUE ) buildFullRef( reference_path = "./Reference", fasta = "", gtf = "", use_STAR_mappability = FALSE, overwrite = FALSE, force_download = FALSE, chromosome_aliases = NULL, genome_type = "", nonPolyARef = "", MappabilityRef = "", BlacklistRef = "", ontologySpecies = "", useExtendedTranscripts = TRUE, verbose = TRUE, n_threads = 4, ... ) getNonPolyARef(genome_type) getAvailableGO(localHub = FALSE, ah = AnnotationHub(localHub = localHub))
getResources( reference_path = "./Reference", fasta = "", gtf = "", overwrite = FALSE, force_download = FALSE, verbose = TRUE ) buildRef( reference_path = "./Reference", fasta = "", gtf = "", overwrite = FALSE, force_download = FALSE, chromosome_aliases = NULL, genome_type = "", nonPolyARef = "", MappabilityRef = "", BlacklistRef = "", ontologySpecies = "", useExtendedTranscripts = TRUE, lowMemoryMode = TRUE, verbose = TRUE ) buildFullRef( reference_path = "./Reference", fasta = "", gtf = "", use_STAR_mappability = FALSE, overwrite = FALSE, force_download = FALSE, chromosome_aliases = NULL, genome_type = "", nonPolyARef = "", MappabilityRef = "", BlacklistRef = "", ontologySpecies = "", useExtendedTranscripts = TRUE, verbose = TRUE, n_threads = 4, ... ) getNonPolyARef(genome_type) getAvailableGO(localHub = FALSE, ah = AnnotationHub(localHub = localHub))
reference_path |
(REQUIRED) The directory path to store the generated reference files |
fasta |
The file path or web link to the user-supplied genome
FASTA file. Alternatively, the name of the AnnotationHub record containing
the genome resource. May be omitted if |
gtf |
The file path or web link to the user-supplied transcript
GTF file (or gzipped GTF file). Alternatively, the name of the
AnnotationHub record containing the transcript GTF file. May be omitted if
|
overwrite |
(default |
force_download |
(default |
verbose |
(default |
chromosome_aliases |
(Highly optional) A 2-column data frame containing chromosome name conversions. If this is set, allows processBAM to parse BAM alignments to a genome whose chromosomes are named differently to the reference genome. The most common scenario is where Ensembl genome typically use chromosomes "1", "2", ..., "X", "Y", whereas UCSC/Gencode genome use "chr1", "chr2", ..., "chrX", "chrY". See example below. Refer to https://github.com/dpryan79/ChromosomeMappings for a list of chromosome alias resources. |
genome_type |
Allows |
nonPolyARef |
(Optional) A BED file of regions defining known
non-polyadenylated transcripts. This file is used for QC analysis
to measure Poly-A enrichment quality of samples. An RDS file (openable
using |
MappabilityRef |
(Optional) A BED file of low mappability regions due to
repeat elements in the genome. If omitted, the file generated by
|
BlacklistRef |
A BED file of regions to be otherwise excluded from IR
analysis. If omitted, a blacklist is not used (this is the default).
An RDS file (openable using |
ontologySpecies |
(default |
useExtendedTranscripts |
(default |
lowMemoryMode |
(default |
use_STAR_mappability |
(default FALSE) In |
n_threads |
The number of threads used to generate the STAR reference and mappability calculations. Multi-threading is not used for SpliceWiz reference generation (but multiple cores are utilised in data-table and fst file processing automatically, where available). See STAR-methods |
... |
For |
localHub |
(default |
ah |
For |
getResources()
processes the files, downloads resources from
web links or from AnnotationHub()
, and saves a local copy in the "resource"
subdirectory within the given reference_path
. Resources are retrieved via
either:
User-supplied FASTA and GTF file. This can be a file path, or a web link
(e.g. 'http://', 'https://' or 'ftp://'). Use fasta
and gtf
to specify the files or web paths to use.
AnnotationHub genome and gene annotation (Ensembl): supply the names of
the genome sequence and gene annotations to fasta
and gtf
.
buildRef()
will first run getResources()
if resources are
not yet saved locally (i.e. getResources()
is not already run).
Then, it creates the SpliceWiz references. Typical run-times are
5 to 10 minutes for human and mouse genomes (after resources are downloaded).
NB: the parameters fasta
and gtf
can be omitted in buildRef()
if
getResources()
is already run.
buildFullRef()
builds the STAR aligner reference alongside the SpliceWiz
reference. The STAR reference will be located in the STAR
subdirectory
of the specified reference path. If use_STAR_mappability
is set to TRUE
this function will empirically compute regions of low mappability. This
function requires STAR
to be installed on the system (which only runs on
linux-based systems).
getNonPolyARef()
returns the path of the non-polyA reference file for the
human and mouse genomes.
Typical usage involves running buildRef()
for human and mouse genomes
and specifying the genome_type
to use the default MappabilityRef
and
nonPolyARef
files for the specified genome. For non-human non-mouse
genomes, use one of the following alternatives:
Create the SpliceWiz reference without using Mappability Exclusion regions.
To do this, simply run buildRef()
and omit MappabilityRef
. This is
acceptable assuming the introns assessed are short and do not contain
intronic repeats
Calculating Mappability Exclusion regions using the STAR aligner,
and building the SpliceWiz reference. This can be done using the
buildFullRef()
function, on systems where STAR
is installed
Instead of using the STAR aligner, any genome splice-aware aligner could be
used. See Mappability-methods for
an example workflow using the Rsubread aligner. After producing the
MappabilityExclusion.bed.gz
file (in the Mappability
subfolder), run
buildRef()
using this file (or simply leave it blank).
BED files are tab-separated text files containing 3 unnamed columns
specifying chromosome, start and end coordinates. To view an example BED
file, open the file specified in the path returned by
getNonPolyARef("hg38")
If MappabilityRef
, nonPolyARef
and BlacklistRef
are left blank, the
following will be used (by priority):
The previously used Mappability, non-polyA and/or Blacklist file resource from a previous run, if available,
The resource implied by the genome_type
parameter, if specified,
No resource is used.
To rebuild a SpliceWiz reference using existing resources
This is typically run when updating an old resource to a new SpliceWiz
version. Simply run buildRef(), specifying the existing reference directory,
leave the fasta
and gtf
parameters blank, and set overwrite = TRUE
.
SpliceWiz will use the previously-used resources to re-create the reference.
See examples below for common use cases.
For getResources
: creates the following local resources:
reference_path/resource/genome.2bit
: Local copy of the genome sequences
as a TwoBitFile.
reference_path/resource/transcripts.gtf.gz
: Local copy of the gene
annotation as a gzip-compressed file.
For buildRef()
and buildFullRef()
: creates a SpliceWiz reference
which is written to the given directory specified by reference_path
.
Files created includes:
reference_path/settings.Rds
: An RDS file containing parameters used
to generate the SpliceWiz reference
reference_path/SpliceWiz.ref.gz
: A gzipped text file containing collated
SpliceWiz reference files. This file is used by processBAM
reference_path/fst/
: Contains fst files for subsequent easy access to
SpliceWiz generated references
reference_path/cov_data.Rds
: An RDS file containing data required to
visualise genome / transcript tracks.
buildFullRef()
also creates a STAR
reference located in the STAR
subdirectory inside the designated reference_path
For getNonPolyARef()
: Returns the file path to the BED file for
the nonPolyA loci for the specified genome.
For getAvailableGO()
: Returns a vector containing names of species with
supported gene ontology annotations.
getResources()
: Processes / downloads a copy of the
genome and gene annotations and stores this in the "resource" subdirectory
of the given reference path
buildRef()
: First calls getResources()
(if required). Afterwards creates the SpliceWiz reference in the
given reference path
buildFullRef()
: One-step function that fetches resources,
creates a STAR reference (including mappability calculations), then
creates the SpliceWiz reference
getNonPolyARef()
: Returns the path to the BED file
containing coordinates of known non-polyadenylated transcripts for genomes
hg38
, hg19
, mm10
and mm9
,
getAvailableGO()
: Returns available species on Bioconductor's
AnnotationHub. Currently, only Bioconductor's OrgDb/Ensembl gene ontology
annotations are supported.
Mappability-methods for methods to calculate low mappability regions
STAR-methods for a list of STAR wrapper functions
AnnotationHub
https://github.com/alexchwong/SpliceWizResources for RDS files of
Mappability Exclusion GRanges objects (for hg38, hg19, mm10 and mm9)
that can be use as input files for MappabilityRef
in buildRef()
.
These resources are intended for SpliceWiz users on older Bioconductor
versions (3.13 or earlier)
# Quick runnable example: generate a reference using SpliceWiz's example genome example_ref <- file.path(tempdir(), "Reference") getResources( reference_path = example_ref, fasta = chrZ_genome(), gtf = chrZ_gtf() ) buildRef( reference_path = example_ref ) # NB: the above is equivalent to: example_ref <- file.path(tempdir(), "Reference") buildRef( reference_path = example_ref, fasta = chrZ_genome(), gtf = chrZ_gtf() ) # Get the path to the Non-PolyA BED file for hg19 getNonPolyARef("hg19") # View available species for AnnotationHub's Ensembl/orgDB-based GO resources availSpecies <- getAvailableGO() # Build example reference with `Homo sapiens` Ens/orgDB gene ontology ont_ref <- file.path(tempdir(), "Reference_withGO") buildRef( reference_path = ont_ref, fasta = chrZ_genome(), gtf = chrZ_gtf(), ontologySpecies = "Homo sapiens" ) ## Not run: ### Long examples ### # Generate a SpliceWiz reference from user supplied FASTA and GTF files for a # hg38-based genome: buildRef( reference_path = "./Reference_user", fasta = "genome.fa", gtf = "transcripts.gtf", genome_type = "hg38" ) # NB: Setting `genome_type = hg38`, will automatically use default # nonPolyARef and MappabilityRef for `hg38` # Reference generation from Ensembl's FTP links: FTP <- "ftp://ftp.ensembl.org/pub/release-94/" buildRef( reference_path = "./Reference_FTP", fasta = paste0(FTP, "fasta/homo_sapiens/dna/", "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"), gtf = paste0(FTP, "gtf/homo_sapiens/", "Homo_sapiens.GRCh38.94.chr.gtf.gz"), genome_type = "hg38" ) # Get AnnotationHub record names for Ensembl release-94: # First, search for the relevant AnnotationHub record names: ah <- AnnotationHub::AnnotationHub() AnnotationHub::query(ah, c("Homo Sapiens", "release-94")) buildRef( reference_path = "./Reference_AH", fasta = "AH65745", gtf = "AH64631", genome_type = "hg38" ) # Build a SpliceWiz reference, setting chromosome aliases to allow # this reference to process BAM files aligned to UCSC-style genomes: chrom.df <- GenomeInfoDb::genomeStyles()$Homo_sapiens buildRef( reference_path = "./Reference_UCSC", fasta = "AH65745", gtf = "AH64631", genome_type = "hg38", chromosome_aliases = chrom.df[, c("Ensembl", "UCSC")] ) # One-step generation of SpliceWiz and STAR references, using 4 threads. # NB1: requires a linux-based system with STAR installed. # NB2: A STAR reference genome will be generated in the `STAR` subfolder # inside the given `reference_path`. # NB3: A custom Mappability Exclusion file will be calculated using STAR # and will be used to generate the SpliceWiz reference. buildFullRef( reference_path = "./Reference_with_STAR", fasta = "genome.fa", gtf = "transcripts.gtf", genome_type = "hg38", use_STAR_mappability = TRUE, n_threads = 4 ) # NB: the above is equivalent to running the following in sequence: getResources( reference_path = "./Reference_with_STAR", fasta = "genome.fa", gtf = "transcripts.gtf" ) STAR_buildRef( reference_path = reference_path, also_generate_mappability = TRUE, n_threads = 4 ) buildRef( reference_path = "./Reference_with_STAR", genome_type = "" ) ## End(Not run)
# Quick runnable example: generate a reference using SpliceWiz's example genome example_ref <- file.path(tempdir(), "Reference") getResources( reference_path = example_ref, fasta = chrZ_genome(), gtf = chrZ_gtf() ) buildRef( reference_path = example_ref ) # NB: the above is equivalent to: example_ref <- file.path(tempdir(), "Reference") buildRef( reference_path = example_ref, fasta = chrZ_genome(), gtf = chrZ_gtf() ) # Get the path to the Non-PolyA BED file for hg19 getNonPolyARef("hg19") # View available species for AnnotationHub's Ensembl/orgDB-based GO resources availSpecies <- getAvailableGO() # Build example reference with `Homo sapiens` Ens/orgDB gene ontology ont_ref <- file.path(tempdir(), "Reference_withGO") buildRef( reference_path = ont_ref, fasta = chrZ_genome(), gtf = chrZ_gtf(), ontologySpecies = "Homo sapiens" ) ## Not run: ### Long examples ### # Generate a SpliceWiz reference from user supplied FASTA and GTF files for a # hg38-based genome: buildRef( reference_path = "./Reference_user", fasta = "genome.fa", gtf = "transcripts.gtf", genome_type = "hg38" ) # NB: Setting `genome_type = hg38`, will automatically use default # nonPolyARef and MappabilityRef for `hg38` # Reference generation from Ensembl's FTP links: FTP <- "ftp://ftp.ensembl.org/pub/release-94/" buildRef( reference_path = "./Reference_FTP", fasta = paste0(FTP, "fasta/homo_sapiens/dna/", "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"), gtf = paste0(FTP, "gtf/homo_sapiens/", "Homo_sapiens.GRCh38.94.chr.gtf.gz"), genome_type = "hg38" ) # Get AnnotationHub record names for Ensembl release-94: # First, search for the relevant AnnotationHub record names: ah <- AnnotationHub::AnnotationHub() AnnotationHub::query(ah, c("Homo Sapiens", "release-94")) buildRef( reference_path = "./Reference_AH", fasta = "AH65745", gtf = "AH64631", genome_type = "hg38" ) # Build a SpliceWiz reference, setting chromosome aliases to allow # this reference to process BAM files aligned to UCSC-style genomes: chrom.df <- GenomeInfoDb::genomeStyles()$Homo_sapiens buildRef( reference_path = "./Reference_UCSC", fasta = "AH65745", gtf = "AH64631", genome_type = "hg38", chromosome_aliases = chrom.df[, c("Ensembl", "UCSC")] ) # One-step generation of SpliceWiz and STAR references, using 4 threads. # NB1: requires a linux-based system with STAR installed. # NB2: A STAR reference genome will be generated in the `STAR` subfolder # inside the given `reference_path`. # NB3: A custom Mappability Exclusion file will be calculated using STAR # and will be used to generate the SpliceWiz reference. buildFullRef( reference_path = "./Reference_with_STAR", fasta = "genome.fa", gtf = "transcripts.gtf", genome_type = "hg38", use_STAR_mappability = TRUE, n_threads = 4 ) # NB: the above is equivalent to running the following in sequence: getResources( reference_path = "./Reference_with_STAR", fasta = "genome.fa", gtf = "transcripts.gtf" ) STAR_buildRef( reference_path = reference_path, also_generate_mappability = TRUE, n_threads = 4 ) buildRef( reference_path = "./Reference_with_STAR", genome_type = "" ) ## End(Not run)
collateData()
creates a dataset from a collection of processBAM
output files belonging to an experiment.
collateData( Experiment, reference_path, output_path, IRMode = c("SpliceOver", "SpliceMax"), packageCOVfiles = FALSE, novelSplicing = FALSE, forceStrandAgnostic = FALSE, novelSplicing_minSamples = 3, novelSplicing_countThreshold = 10, novelSplicing_minSamplesAboveThreshold = 1, novelSplicing_requireOneAnnotatedSJ = TRUE, novelSplicing_useTJ = TRUE, overwrite = FALSE, n_threads = 1, lowMemoryMode = TRUE )
collateData( Experiment, reference_path, output_path, IRMode = c("SpliceOver", "SpliceMax"), packageCOVfiles = FALSE, novelSplicing = FALSE, forceStrandAgnostic = FALSE, novelSplicing_minSamples = 3, novelSplicing_countThreshold = 10, novelSplicing_minSamplesAboveThreshold = 1, novelSplicing_requireOneAnnotatedSJ = TRUE, novelSplicing_useTJ = TRUE, overwrite = FALSE, n_threads = 1, lowMemoryMode = TRUE )
Experiment |
(Required) A 2 or 3 column data frame, ideally generated by
findSpliceWizOutput or findSamples.
The first column designate the sample names, and the 2nd column
contains the path to the processBAM output file (of type
|
reference_path |
(Required) The path to the reference generated by Build-Reference-methods |
output_path |
(Required) The path to contain the output files for the collated dataset |
IRMode |
(default |
packageCOVfiles |
(default |
novelSplicing |
(default FALSE) Whether collateData will use
novel junction reads detected in samples to infer novel splice variants.
All tandem split reads (those bridging two consecutive splice junctions)
are used, as well as novel split reads that satisfy abundance criteria
(see |
forceStrandAgnostic |
(default |
novelSplicing_minSamples |
(default 3) Novel junctions are included in building of novel reference if number samples with non-zero counts exceeds this number. |
novelSplicing_countThreshold |
(default 10) Threshold of split-reads across
novel junctions; used in conjunction with
|
novelSplicing_minSamplesAboveThreshold |
(default 1) Novel junctions are included in building of novel reference if novel junction reads are above a pre-defined threshold exceeds this number |
novelSplicing_requireOneAnnotatedSJ |
(default |
novelSplicing_useTJ |
(default |
overwrite |
(default |
n_threads |
(default |
lowMemoryMode |
(default |
In Windows, collateData runs using only 1 thread, as BiocParallel's MulticoreParam is not supported.
It is assumed that all sample processBAM outputs were generated using the same reference.
The combination of junction counts and IR quantification from processBAM is used to calculate percentage spliced in (PSI) of alternative splice events, and intron retention ratios (IR-ratio) of retained introns. Also, QC information is collated. Data is organised in a H5file and FST files for memory and processor efficient downstream access using makeSE.
The original IRFinder algorithm, see the following
wiki,
uses SpliceMax
to estimate abundance of spliced transcripts.
This calculates the number of mapped splice events
that share the boundary coordinate of either the left or right flanking
exon SpliceLeft,SpliceRight
, estimating splice abundance as the larger
of the two values.
SpliceWiz proposes a new algorithm, SpliceOver
,
to account for the possibility that the major isoform shares neither
boundary, but arises from either of the flanking exon clusters. Exon
clusters are contiguous regions covered by exons from any transcript
(except those designated as retained_intron
or
sense_intronic
), and are separated by
obligate intronic regions (genomic regions that are introns for all
transcripts). For introns that are internal to a single exon cluster
(i.e. akin to "known-exon" introns from IRFinder), SpliceOver
uses GenomicRanges::findOverlaps to sum all splice reads that overlap
the same genomic region as the intron of interest.
Detection of novel ASEs: When novelSplicing
is set to TRUE
,
novel junctions (split reads across unannotated junctions from samples
of the dataset being collated) are used in conjunction with the reference
to compile a list of novel ASEs. To avoid being overwhelmed by a large
number of false positive novel junctions (often due to mis-alignments),
a simple filtering strategy is used. This involves including novel
junctions only if it occurs in a minimum number of samples (default 3),
or if the number of split reads of a novel junction is above a pre-defined
threshold (default 10) in a certain number of samples (default 1). These
parameters can be set using novelSplicing_minSamples
,
novelSplicing_countThreshold
and novelSplicing_minSamplesAboveThreshold
respectively.
collateData()
writes to the directory given by output_path
.
This output directory is portable (i.e. it can be moved to a different
location after running collateData()
before running makeSE), but
individual files within the output folder should not be moved.
Also, the processBAM and collateData output folders should be copied to
the same destination and their relative paths preserved. Otherwise, the
locations of the "COV" files will not be recorded in the collated data and
will have to be re-assigned using covfile(se)<-
. See makeSE
buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output") ) expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output") ) # Enable novel splicing: collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output"), novelSplicing = TRUE )
buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output") ) expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output") ) # Enable novel splicing: collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output"), novelSplicing = TRUE )
This function takes a string vector of genomic coordinates and converts it into a GRanges object.
coord2GR(coordinates)
coord2GR(coordinates)
coordinates |
A string vector of one or more genomic coordinates to be converted |
Genomic coordinates can take one of the following syntax:
seqnames:start
seqnames:start-end
seqnames:start-end/strand
The following examples are considered valid genomic coordinates:
"chr1:21535"
"chr3:10550-10730"
"X:51231-51330/-"
"chrM:2134-5232/+"
A GRanges object that corresponds to the given coordinates
se <- SpliceWiz_example_NxtSE() coordinates <- rowData(se)$EventRegion gr <- coord2GR(coordinates)
se <- SpliceWiz_example_NxtSE() coordinates <- rowData(se)$EventRegion gr <- coord2GR(coordinates)
This object is generated using getCoverageData or getGenomeData methods, and is used as input for generating coverage plots.
## S4 method for signature 'covDataObject' showEvents(object) getCoverageData( se, Event, Gene, seqname, start, end, coordinates, strand = c("*", "+", "-"), zoom_factor = 0.2, bases_flanking = 100, tracks, condition, ... ) getGenomeData( reference_path, Gene, seqname, start, end, coordinates, zoom_factor = 0.2, bases_flanking = 100, ... ) plotAnnoTrack( object, Event, view_start, view_end, reverseGenomeCoords = FALSE, condensed = FALSE, selected_transcripts = "", plot_key_isoforms = FALSE, usePlotly = FALSE, ... )
## S4 method for signature 'covDataObject' showEvents(object) getCoverageData( se, Event, Gene, seqname, start, end, coordinates, strand = c("*", "+", "-"), zoom_factor = 0.2, bases_flanking = 100, tracks, condition, ... ) getGenomeData( reference_path, Gene, seqname, start, end, coordinates, zoom_factor = 0.2, bases_flanking = 100, ... ) plotAnnoTrack( object, Event, view_start, view_end, reverseGenomeCoords = FALSE, condensed = FALSE, selected_transcripts = "", plot_key_isoforms = FALSE, usePlotly = FALSE, ... )
object |
For |
se |
A NxtSE object, created by makeSE.
COV files must be linked to the NxtSE object. To do this, see the example
in makeSE. Required by |
Event |
The |
Gene |
Whether to use the range for the given |
seqname , start , end
|
The chromosome (string) and genomic |
coordinates |
A string specifying genomic coordinates can be given
instead of |
strand |
Whether to show coverage of both strands "*" (default), or from the "+" or "-" strand only. |
zoom_factor |
Zoom out from event. Each level of zoom zooms out by a
factor of 3. E.g. for a query region of chr1:10000-11000, if a
|
bases_flanking |
(Default = |
tracks |
The names of individual samples,
or the names of the different conditions to be plotted. For the latter, set
|
condition |
To display normalised coverage per condition, set this to
the condition category. If omitted, |
... |
Ignored / not used |
reference_path |
The path of the reference generated by
Build-Reference-methods. Required by |
view_start , view_end
|
Start and end coordinates of plotting function. Note that plot coordinates may be different from retrieval coordinates and is useful for zooming in. |
reverseGenomeCoords |
Whether the genomic axis should be reversed to make it more convenient to plot reverse-stranded genes |
condensed |
(default 'FALSE) Whether the genomic track should be condensed to plot whole genes, rather than transcripts. Preferred if multiple genes are plotted on a zoomed-out plot |
selected_transcripts |
(default |
plot_key_isoforms |
(default |
usePlotly |
(default |
For getCoverageData(): A covDataObject containing required data used to generate downstream For plotAnnoTrack(): A ggplot or plotly object
showEvents(covDataObject)
: Returns the EventNames for which events can
be normalized using the given covDataObject
getCoverageData()
: Get coverage / genome data for plotting
coverage plots
getGenomeData()
: Get coverage / genome data for plotting
coverage plots
plotAnnoTrack()
: Directly plots the annotation from a
covDataObject.
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) dataObj <- getCoverageData( se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se) ) # Show `EventName`s of supported splicing events # contained within covDataObject showEvents(dataObj) # A limited covDataObject containing only the reference can be generated # from the SpliceWiz reference buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) genomeObj <- getGenomeData( reference_path = file.path(tempdir(), "Reference"), Gene = "SRSF3" ) # Plot reference track directly from the covDataObject # NB: Event plotting is not supported for reference-derived `covDataObject`s plotAnnoTrack(genomeObj) plotAnnoTrack(dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3")
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) dataObj <- getCoverageData( se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se) ) # Show `EventName`s of supported splicing events # contained within covDataObject showEvents(dataObj) # A limited covDataObject containing only the reference can be generated # from the SpliceWiz reference buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) genomeObj <- getGenomeData( reference_path = file.path(tempdir(), "Reference"), Gene = "SRSF3" ) # Plot reference track directly from the covDataObject # NB: Event plotting is not supported for reference-derived `covDataObject`s plotAnnoTrack(genomeObj) plotAnnoTrack(dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3")
This function returns an RLE / RLEList or data.frame
containing coverage data from the given COV file
COV files are generated by SpliceWiz's processBAM and BAM2COV functions.
It records alignment coverage for each nucleotide in the given BAM file.
It stores this data in "COV" format, which is an indexed BGZF-compressed
format specialised for the storage of unstranded and stranded alignment
coverage in RNA sequencing.
Unlike BigWig files, COV files store coverage for both positive and negative
strands.
These functions retrieves coverage data from the specified COV file. They
are computationally efficient as they utilise random-access to rapidly
search for the requested data from the COV file.
getCoverage(file, seqname = "", start = 0, end = 0, strand = c("*", "+", "-")) getCoverage_DF( file, seqname = "", start = 0, end = 0, strand = c("*", "+", "-") ) getCoverageRegions( file, regions, strandMode = c("unstranded", "forward", "reverse") ) getCoverageBins( file, region, bins = 2000, strandMode = c("unstranded", "forward", "reverse"), bin_size )
getCoverage(file, seqname = "", start = 0, end = 0, strand = c("*", "+", "-")) getCoverage_DF( file, seqname = "", start = 0, end = 0, strand = c("*", "+", "-") ) getCoverageRegions( file, regions, strandMode = c("unstranded", "forward", "reverse") ) getCoverageBins( file, region, bins = 2000, strandMode = c("unstranded", "forward", "reverse"), bin_size )
file |
(Required) The file name of the COV file |
seqname |
(Required for |
start , end
|
1-based genomic coordinates. If |
strand |
Either "*", "+", or "-" |
regions |
A GRanges object for a set of regions to obtain mean / total coverage from the given COV file. |
strandMode |
The stranded-ness of the RNA-seq experiment. "unstranded" means that an unstranded protocol was used. Stranded protocols can be either "forward", where the first read is the same strand as the expressed transcript, or "reverse" where the second strand is the same strand as the expressed transcript. |
region |
In |
bins |
In |
bin_size |
In |
For getCoverage
: If seqname is left as "", returns an RLEList of the
whole BAM file, with each RLE in the list containing coverage data for
one chromosome. Otherwise, returns an RLE containing coverage data for
the requested genomic region
For getCoverage_DF
: Returns a two-column data frame, with the first column
coordinate
denoting genomic coordinate, and the second column value
containing the coverage depth for each coordinate nucleotide.
For getCoverageRegions
: Returns a GRanges object with an extra metacolumn:
cov_mean
, which gives the mean coverage of each of the given ranges.
For getCoverageBins
: Returns a GRanges object which spans the given
region
, divided by the number of bins
or by width as given by
bin_size
. Mean coverage in each bin is calculated (returned by the
cov_mean
metadata column). This function is useful for retrieving
coverage of a large region for visualisation, especially when the
size of the region vastly exceeds the width of the figure.
getCoverage()
: Retrieves alignment coverage as an RLE or RLElist
getCoverage_DF()
: Retrieves alignment coverage as a data.frame
getCoverageRegions()
: Retrieves total and mean coverage of a GRanges object
from a COV file
getCoverageBins()
: Retrieves coverage of a single region from a COV file,
binned by the given number of bins or bin_size
se <- SpliceWiz_example_NxtSE() cov_file <- covfile(se)[1] # Retrieve Coverage as RLE cov <- getCoverage(cov_file, seqname = "chrZ", start = 10000, end = 20000, strand = "*" ) # Retrieve Coverage as data.frame cov.df <- getCoverage_DF(cov_file, seqname = "chrZ", start = 10000, end = 20000, strand = "*" ) # Retrieve mean coverage of 100-nt window regions as defined # in a GRanges object: gr <- GenomicRanges::GRanges( seqnames = "chrZ", ranges = IRanges::IRanges( start = seq(1, 99901, by = 100), end = seq(100, 100000, by = 100) ), strand = "-" ) gr.unstranded <- getCoverageRegions(cov_file, regions = gr, strandMode = "unstranded" ) gr.stranded <- getCoverageRegions(cov_file, regions = gr, strandMode = "reverse" ) # Retrieve binned coverage of a large region gr.fetch <- getCoverageBins( cov_file, region = GenomicRanges::GRanges(seqnames = "chrZ", ranges = IRanges::IRanges(start = 100, end = 100000), strand = "*" ), bins = 2000 ) # Plot coverage using ggplot: require(ggplot2) ggplot(cov.df, aes(x = coordinate, y = value)) + geom_line() + theme_white ggplot(as.data.frame(gr.unstranded), aes(x = (start + end) / 2, y = cov_mean)) + geom_line() + theme_white ggplot(as.data.frame(gr.fetch), aes(x = (start + end)/2, y = cov_mean)) + geom_line() + theme_white # Export COV data as BigWig cov_whole <- getCoverage(cov_file) bw_file <- file.path(tempdir(), "sample.bw") rtracklayer::export(cov_whole, bw_file, "bw")
se <- SpliceWiz_example_NxtSE() cov_file <- covfile(se)[1] # Retrieve Coverage as RLE cov <- getCoverage(cov_file, seqname = "chrZ", start = 10000, end = 20000, strand = "*" ) # Retrieve Coverage as data.frame cov.df <- getCoverage_DF(cov_file, seqname = "chrZ", start = 10000, end = 20000, strand = "*" ) # Retrieve mean coverage of 100-nt window regions as defined # in a GRanges object: gr <- GenomicRanges::GRanges( seqnames = "chrZ", ranges = IRanges::IRanges( start = seq(1, 99901, by = 100), end = seq(100, 100000, by = 100) ), strand = "-" ) gr.unstranded <- getCoverageRegions(cov_file, regions = gr, strandMode = "unstranded" ) gr.stranded <- getCoverageRegions(cov_file, regions = gr, strandMode = "reverse" ) # Retrieve binned coverage of a large region gr.fetch <- getCoverageBins( cov_file, region = GenomicRanges::GRanges(seqnames = "chrZ", ranges = IRanges::IRanges(start = 100, end = 100000), strand = "*" ), bins = 2000 ) # Plot coverage using ggplot: require(ggplot2) ggplot(cov.df, aes(x = coordinate, y = value)) + geom_line() + theme_white ggplot(as.data.frame(gr.unstranded), aes(x = (start + end) / 2, y = cov_mean)) + geom_line() + theme_white ggplot(as.data.frame(gr.fetch), aes(x = (start + end)/2, y = cov_mean)) + geom_line() + theme_white # Export COV data as BigWig cov_whole <- getCoverage(cov_file) bw_file <- file.path(tempdir(), "sample.bw") rtracklayer::export(cov_whole, bw_file, "bw")
A covPlotly
object is created when plotView is called using
a covPlotObject
as input. It stores metadata alongside the plotly object,
which allows it to be drawn at various resolutions. Smaller resolutions lead
to faster draws at expense of more jagged plots.
## S4 method for signature 'covPlotly' getExonRanges(object) ## S4 method for signature 'covPlotly' setResolution(object, resolution) ## S4 method for signature 'covPlotly' showExons(object)
## S4 method for signature 'covPlotly' getExonRanges(object) ## S4 method for signature 'covPlotly' setResolution(object, resolution) ## S4 method for signature 'covPlotly' showExons(object)
object |
A covPlotly object |
resolution |
How many horizontal pixels of resolution should be shown
in the final plotly object. Set to |
For show()
: A plotly object synthesised by plotView()
For getExonRanges()
: A named GRanges object containing exon ranges
For showExons()
: A named GRanges object containing exon ranges, and
additionally "shows" the plotly coverage plot with annotation replaced
by named exons
For setResolution()
Returns the covPlotly
object with addition of
resolution set by the corresponding parameter. When show()
is called,
the plotly object with the new coverage resolution will be displayed.
getExonRanges(covPlotly)
: Returns a named GRanges object containing exon
ranges, without showing the associated plotly object
setResolution(covPlotly)
: Returns a covPlotly object after setting
the output resolution of the plotly-based coverage plots.
showExons(covPlotly)
: Returns a named GRanges object containing exon
ranges, and shows the plotly object with the annotation track showing the
named exons
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) # Retrieve coverage data for all samples for the gene "SRSF3" (and surrounds) dataObj <- getCoverageData( se, Gene = "SRSF3", tracks = colnames(se) ) plotObj_samples <- getPlotObject( dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3" ) if(interactive()) { # Create covPlotly object by setting `usePlotly = TRUE` p <- plotView(plotObj_samples, usePlotly = TRUE) # Display plotly plot show(p) # Set resolution to 2000; display new plot p <- setResolution(p, resolution = 2000) show(p) # Display exon annotation along with generated plot; # - also returns GRanges object gr <- showExons(p) }
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) # Retrieve coverage data for all samples for the gene "SRSF3" (and surrounds) dataObj <- getCoverageData( se, Gene = "SRSF3", tracks = colnames(se) ) plotObj_samples <- getPlotObject( dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3" ) if(interactive()) { # Create covPlotly object by setting `usePlotly = TRUE` p <- plotView(plotObj_samples, usePlotly = TRUE) # Display plotly plot show(p) # Set resolution to 2000; display new plot p <- setResolution(p, resolution = 2000) show(p) # Display exon annotation along with generated plot; # - also returns GRanges object gr <- showExons(p) }
Here, we implement fast and versatile ggplot and plotly based coverage and sashimi plots. Users can plot with unlimited number of individual, individual-normalized, or group-normalized tracks. Also implemented is user-defined group-comparison differential plots (including t-test plots). Additionally, users can generate ggplots subsetted by exon groups. See details below.
getPlotObject(object, Event, strand = c("*", "+", "-"), tracks, condition, ...) ## S4 method for signature 'covPlotObject' tracks(object) ## S4 method for signature 'covPlotObject' condition(object) plotView( object, view_start, view_end, oldP = covPlotly(), centerByEvent = FALSE, EventZoomFactor = 0.2, EventBasesFlanking = 100, resolution = 5000, trackList = list(), diff_stat = c("t-test", "none"), diffList = list(), reverseGenomeCoords = FALSE, ribbon_mode = c("sd", "sem", "ci", "none"), normalizeCoverage = FALSE, plotAnnotations = TRUE, plotAnnoSubTrack = TRUE, showExonRanges = FALSE, verticalLayout = c(4, 1, 1, 2), horizontalLayout = c(), filterByTranscripts = "", filterByEventTranscripts = FALSE, filterByExpressedTranscripts = TRUE, condenseTranscripts = FALSE, plotJunctions = TRUE, plotJuncPSI = FALSE, junctionThreshold = 0.01, plotRanges = GRanges(), rangesBasesFlanking = 100, usePlotly = FALSE, ... )
getPlotObject(object, Event, strand = c("*", "+", "-"), tracks, condition, ...) ## S4 method for signature 'covPlotObject' tracks(object) ## S4 method for signature 'covPlotObject' condition(object) plotView( object, view_start, view_end, oldP = covPlotly(), centerByEvent = FALSE, EventZoomFactor = 0.2, EventBasesFlanking = 100, resolution = 5000, trackList = list(), diff_stat = c("t-test", "none"), diffList = list(), reverseGenomeCoords = FALSE, ribbon_mode = c("sd", "sem", "ci", "none"), normalizeCoverage = FALSE, plotAnnotations = TRUE, plotAnnoSubTrack = TRUE, showExonRanges = FALSE, verticalLayout = c(4, 1, 1, 2), horizontalLayout = c(), filterByTranscripts = "", filterByEventTranscripts = FALSE, filterByExpressedTranscripts = TRUE, condenseTranscripts = FALSE, plotJunctions = TRUE, plotJuncPSI = FALSE, junctionThreshold = 0.01, plotRanges = GRanges(), rangesBasesFlanking = 100, usePlotly = FALSE, ... )
object |
For |
Event |
The EventName of the alternative splicing event which will be highlighted and used for normalization |
strand |
The strand for coverage / junction plotting. Options are |
tracks |
Sample names or condition categories |
condition |
For condition-based group plots, the name of the condition. |
... |
Ignored / not used |
view_start , view_end
|
The start and end coordinates for plotting |
oldP |
(Optional) If plotting the same tracks and track-widths,
supplying the old |
centerByEvent |
(default |
EventZoomFactor |
If |
EventBasesFlanking |
(default |
resolution |
The number of horizontal "pixels" or data-points to plot.
This is calculated per sub-plot. Smaller numbers lead to lower resolution
but faster plots. Default is |
trackList |
A list, with each element being a vector of 1 or more track names or indices to plot. If a vector is supplied it will be coerced to a list |
diff_stat |
(default "t-test") Which statistical method to perform differential comparisons. |
diffList |
A list, with each element being a vector of size 2, containing names or indices of which tracks to contrast. |
reverseGenomeCoords |
If |
ribbon_mode |
The statistic to represent variance. Options are "sd" - standard deviation, "sem" - standard error of the mean, "ci" - 95% confidence interval, or "none" |
normalizeCoverage |
If |
plotAnnotations |
Whether the main annotation track should be plotted |
plotAnnoSubTrack |
If plotting by exon ranges (using |
showExonRanges |
(only applies if |
verticalLayout |
A vector (of length 4) containing relative heights of
the following elements: (1) main block of coverage tracks, (2) differential
track, (3) annotation sub-track, and (4) main annotation track. Default
|
horizontalLayout |
A vector containing relative widths of coverage
tracks. Only used alongside |
filterByTranscripts |
(default |
filterByEventTranscripts |
(default |
filterByExpressedTranscripts |
(default |
condenseTranscripts |
Whether to plot by genes |
plotJunctions |
Whether to plot junction counts as numbered arcs. Plots
normalized junctions if |
plotJuncPSI |
If plotting group coverage plots, whether to plot
mean +/- sd of normalized junction counts |
junctionThreshold |
(default |
plotRanges |
A GRanges object containing one or more exon ranges to
plot. If given, |
rangesBasesFlanking |
(default |
usePlotly |
If |
The typical pipeline for plotting versatile coverage plots is as follows:
A covDataObject
is generated by calling getCoverageData()
using an
input NxtSE
object. This step retrieves coverage, junction counts and
normalization data for the relevant genomic region being queried. A new
covDataObject
is necessary when querying a new genomic region.
A covPlotObject
is generated by calling getPlotObject()
using an
input covDataObject
. This step retrieves alternative splicing event
specific data, such as normalized coverages, or group combined coverages.
A new covPlotObject
is required when changing condition
, Event
,
strand
, or when querying using a different set of tracks
.
Plots can be generated by calling plotView()
using a covPlotObject
.
Interactive plotly plots can be generated by setting usePlotly = TRUE
,
otherwise, static plots are generated. For interactive plots, a covPlotly
object is returned, which contains raw data which is downsampled by
pixel resolution prior to plotting for performance reasons. A new
covPlotly
is required unless one only wishes to downsample the resolution
see setResolution for covPlotly
objects.
Tracks are now versatile (unlimited). Samples are retrieved by individual
sample names at getCoverageData()
. If condition
is set in
getPlotObject()
, track names are defined by their condition categorical
names; otherwise, tracks are named by individual samples when retrieved using
getPlotObject()
.
When calling plotView()
, trackList
by default displays all tracks as
ordered in the covPlotObject
. Users can supply a vector containing either
the track names (or numbers, as ordered in the covPlotObject
).
Alternatively, multiple traces can be stacked in a single track by
using a list, e.g. trackList = list(A = c(1,2,3), B = c(4,5,6))
.
For differential comparisons, diffList
takes a list of pairs of samples.
For example, if trackList = list("A", "B")
, then setting
diffList = list(c("A", "B"))
will compare groups "A" and "B". This is only
activated by setting diff_stat
to anything other than none
. For now,
only t-test
is supported.
plotView()
supports plotting by exon ranges, for which only static plots
are currently supported. The workflow for generating such a plot is as
follows:
A GRanges object is returned by the plotView()
function and setting
showExonRanges = TRUE
. plotView()
will simultaneously show an
annotation plot of exons labelled by their "exon names", which is the
transcript name appended with "-E" followed by the exon number.
If plotView()
is called and usePlotly = TRUE
is set, a covPlotly
object is returned. Calling showExons()
on this object will display
a plotly plot showing exon names, and returning a GRanges object of exon
ranges.
Exon ranges can be supplied to the plotView()
function by setting
the plotRanges
parameter as a GRanges object. This will generate a static
plot showing coverage plots segmented by exons.
For getPlotObject()
: A covPlotObject
object containing Event-based
data to create coverage plots using plotView()
.
For plotView()
:
If usePlotly = TRUE
, returns a covPlotly
object containing plotly-based
interactive plot
If usePlotly = FALSE
, returns a patchwork-assembled static plot, unless
showExonRanges = TRUE
in which it shows the plot and returns a named
GRanges object containing exon ranges.
getPlotObject()
: Generates a covPlotObject object from a
covDataObject. Allows users to change parameters such as viewing window,
conditions, tracks, and other parameters, for customizing plot parameters
tracks(covPlotObject)
: Returns the tracks contained in the
covPlotObject object
condition(covPlotObject)
: Returns the condition value set in the
covPlotObject object
plotView()
: Creates a coverage plot using the stored
data in the covPlotObject
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) # Retrieve coverage data for all samples for the gene "SRSF3" (and surrounds) dataObj <- getCoverageData( se, Gene = "SRSF3", tracks = colnames(se) ) # Retrieves raw / normalized coverage / junction data for the # specified SRSF3 skipped exon event: plotObj_samples <- getPlotObject( dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3" ) # Retrieves data for samples grouped by the specified condition plotObj_group <- getPlotObject( dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", condition = "treatment", tracks = c("A", "B") ) # Display tracks and conditions of covPlotObject tracks(plotObj_group) condition(plotObj_group) # Show static ggplots plotView(plotObj_samples) plotView(plotObj_group, centerByEvent = TRUE) # Plot junctions using PSI estimates based on individual junction SpliceOver # metrics plotView(plotObj_group, centerByEvent = TRUE, plotJuncPSI = TRUE) # Show normalized coverages, individual samples stacked in grouped tracks plotView( plotObj_samples, normalizeCoverage = TRUE, trackList = list(A = c(1,2,3), B = c(4,5,6)) ) # Show stacked group comparisons with t-test plotView( plotObj_group, trackList = list(c(1,2)), diffList = list(c("A", "B")), diff_stat = "t-test" ) # Show interactive plotly: if(interactive()) { p <- plotView(plotObj_samples, usePlotly = TRUE) show(p) } # Show exons with coverage plot # static: gr <- plotView(plotObj_samples, showExonRanges = TRUE) # interactive: if(interactive()) { p <- plotView(plotObj_samples, usePlotly = TRUE) gr <- showExons(p) } # Plot coverage by exons p <- plotView(plotObj_samples, plotRanges = gr[c("SRSF3-203-E3", "SRSF3-203-E4", "SRSF3-203-E5")], horizontalLayout = c(1,1,1) )
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) # Retrieve coverage data for all samples for the gene "SRSF3" (and surrounds) dataObj <- getCoverageData( se, Gene = "SRSF3", tracks = colnames(se) ) # Retrieves raw / normalized coverage / junction data for the # specified SRSF3 skipped exon event: plotObj_samples <- getPlotObject( dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3" ) # Retrieves data for samples grouped by the specified condition plotObj_group <- getPlotObject( dataObj, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", condition = "treatment", tracks = c("A", "B") ) # Display tracks and conditions of covPlotObject tracks(plotObj_group) condition(plotObj_group) # Show static ggplots plotView(plotObj_samples) plotView(plotObj_group, centerByEvent = TRUE) # Plot junctions using PSI estimates based on individual junction SpliceOver # metrics plotView(plotObj_group, centerByEvent = TRUE, plotJuncPSI = TRUE) # Show normalized coverages, individual samples stacked in grouped tracks plotView( plotObj_samples, normalizeCoverage = TRUE, trackList = list(A = c(1,2,3), B = c(4,5,6)) ) # Show stacked group comparisons with t-test plotView( plotObj_group, trackList = list(c(1,2)), diffList = list(c("A", "B")), diff_stat = "t-test" ) # Show interactive plotly: if(interactive()) { p <- plotView(plotObj_samples, usePlotly = TRUE) show(p) } # Show exons with coverage plot # static: gr <- plotView(plotObj_samples, showExonRanges = TRUE) # interactive: if(interactive()) { p <- plotView(plotObj_samples, usePlotly = TRUE) gr <- showExons(p) } # Plot coverage by exons p <- plotView(plotObj_samples, plotRanges = gr[c("SRSF3-203-E3", "SRSF3-203-E4", "SRSF3-203-E5")], horizontalLayout = c(1,1,1) )
SpliceWiz_example_bams()
is a wrapper function to obtain and make a local
copy
of 6 example files provided by the NxtIRFdata companion package to
demonstrate the use of SpliceWiz. See NxtIRFdata::example_bams for
a description of the provided BAM files. SpliceWiz_example_NxtSE()
retrieves a ready-made functioning
NxtSE object. The steps to reproduce this object is shown
in the example code in makeSE
SpliceWiz_example_bams() SpliceWiz_example_NxtSE(novelSplicing = FALSE)
SpliceWiz_example_bams() SpliceWiz_example_NxtSE(novelSplicing = FALSE)
novelSplicing |
Whether to import an example NxtSE with novel splice event discovery. |
In SpliceWiz_example_bams()
: returns a 2-column data frame containing
sample names and BAM paths of the example dataset.
In SpliceWiz_example_NxtSE()
: returns a NxtSE object.
SpliceWiz_example_bams()
: Returns a 2-column data frame, containing
sample names and sample paths (in tempdir()) of example BAM files
SpliceWiz_example_NxtSE()
: Returns a (in-memory / realized) NxtSE
object that was pre-generated using the SpliceWiz example reference and
example BAM files
Generation of the mappability files was performed using SpliceWiz using a method analogous to that described in:
Middleton R, Gao D, Thomas A, Singh B, Au A, Wong JJ, Bomane A, Cosson B, Eyras E, Rasko JE, Ritchie W. IRFinder: assessing the impact of intron retention on mammalian gene expression. Genome Biol. 2017 Mar 15;18(1):51. doi:10.1186/s13059-017-1184-4
# returns a data frame with the first column as sample names, and the # second column as BAM paths SpliceWiz_example_bams() # Returns a NxtSE object created by the example bams aligned to the # mock NxtSE reference se <- SpliceWiz_example_NxtSE()
# returns a data frame with the first column as sample names, and the # second column as BAM paths SpliceWiz_example_bams() # Returns a NxtSE object created by the example bams aligned to the # mock NxtSE reference se <- SpliceWiz_example_NxtSE()
Often, files e.g. raw sequencing FASTQ files, alignment BAM files,
or processBAM output files, are stored in a single folder under some
directory structure.
They can be grouped by being in common directory or having common names.
Often, their sample names can be gleaned by these common names or the names
of the folders in which they are contained.
This function (recursively) finds all files and
extracts sample names assuming either the files are named by sample names
(level = 0
), or that their names can be derived from the
parent folder (level = 1
). Higher level
also work (e.g. level = 2
)
mean the parent folder of the parent folder of the file is named by sample
names. See details section below.
findSamples(sample_path, suffix = ".txt.gz", level = 0) findFASTQ( sample_path, paired = TRUE, fastq_suffix = c(".fastq", ".fq", ".fastq.gz", ".fq.gz"), level = 0 ) findBAMS(sample_path, level = 0) findSpliceWizOutput(sample_path, level = 0)
findSamples(sample_path, suffix = ".txt.gz", level = 0) findFASTQ( sample_path, paired = TRUE, fastq_suffix = c(".fastq", ".fq", ".fastq.gz", ".fq.gz"), level = 0 ) findBAMS(sample_path, level = 0) findSpliceWizOutput(sample_path, level = 0)
sample_path |
The path in which to recursively search for files
that match the given |
suffix |
A vector of or or more strings that specifies the file suffix (e.g. '.bam' denotes BAM files, whereas ".txt.gz" denotes gzipped txt files). |
level |
Whether sample names can be found in the file names themselves (level = 0), or their parent directory (level = 1). Potentially parent of parent directory (level = 2). Support max level <= 3 (for sanity). |
paired |
Whether to expect single FASTQ files (of the format "sample.fastq"), or paired files (of the format "sample_1.fastq", "sample_2.fastq") |
fastq_suffix |
The name of the FASTQ suffix. Options are: ".fastq", ".fastq.gz", ".fq", or ".fq.gz" |
Paired FASTQ files are assumed to be named using the suffix _1
and _2
after their common names; e.g. sample_1.fastq
, sample_2.fastq
. Alternate
FASTQ suffixes for findFASTQ()
include ".fq", ".fastq.gz", and ".fq.gz".
In BAM files, often the parent directory denotes their sample names. In this
case, use level = 1
to automatically annotate the sample names using
findBAMS()
.
processBAM outputs two files per BAM processed. These are named by the
given sample names. The text output is named "sample1.txt.gz", and the COV
file is named "sample1.cov", where sample1
is the name of the sample. These
files can be organised / tabulated using the function findSpliceWizOutput
.
The generic function findSamples
will organise the processBAM text output
files but exclude the COV files. Use the latter as the Experiment
in
collateData if one decides to collate an experiment without linked COV
files, for portability reasons.
A multi-column data frame with the first column containing
the sample name, and subsequent columns being the file paths with suffix
as determined by suffix
.
findSamples()
: Finds all files with the given suffix pattern.
Annotates sample names based on file or parent folder names.
findFASTQ()
: Use findSamples() to return all FASTQ files
in a given folder
findBAMS()
: Use findSamples() to return all BAM files in a
given folder
findSpliceWizOutput()
: Use findSamples() to return all processBAM output
files in a given folder, including COV files
# Retrieve all BAM files in a given folder, named by sample names bam_path <- tempdir() example_bams(path = bam_path) df.bams <- findSamples(sample_path = bam_path, suffix = ".bam", level = 0) # equivalent to: df.bams <- findBAMS(bam_path, level = 0) # Retrieve all processBAM() output files in a given folder, # named by sample names expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) ## Not run: # Find FASTQ files in a directory, named by sample names # where files are in the form: # - "./sample_folder/sample1.fastq" # - "./sample_folder/sample2.fastq" findFASTQ("./sample_folder", paired = FALSE, fastq_suffix = ".fastq") # Find paired gzipped FASTQ files in a directory, named by parent directory # where files are in the form: # - "./sample_folder/sample1/raw_1.fq.gz" # - "./sample_folder/sample1/raw_2.fq.gz" # - "./sample_folder/sample2/raw_1.fq.gz" # - "./sample_folder/sample2/raw_2.fq.gz" findFASTQ("./sample_folder", paired = TRUE, fastq_suffix = ".fq.gz") ## End(Not run)
# Retrieve all BAM files in a given folder, named by sample names bam_path <- tempdir() example_bams(path = bam_path) df.bams <- findSamples(sample_path = bam_path, suffix = ".bam", level = 0) # equivalent to: df.bams <- findBAMS(bam_path, level = 0) # Retrieve all processBAM() output files in a given folder, # named by sample names expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) ## Not run: # Find FASTQ files in a directory, named by sample names # where files are in the form: # - "./sample_folder/sample1.fastq" # - "./sample_folder/sample2.fastq" findFASTQ("./sample_folder", paired = FALSE, fastq_suffix = ".fastq") # Find paired gzipped FASTQ files in a directory, named by parent directory # where files are in the form: # - "./sample_folder/sample1/raw_1.fq.gz" # - "./sample_folder/sample1/raw_2.fq.gz" # - "./sample_folder/sample2/raw_1.fq.gz" # - "./sample_folder/sample2/raw_2.fq.gz" findFASTQ("./sample_folder", paired = TRUE, fastq_suffix = ".fq.gz") ## End(Not run)
Genes containing differential alternative splicing events (ASEs) may be enriched in key functional pathways. This can be identified using a simple over-representation analysis. Biologists can identify key pathways of interest in order to focus on studying ASEs belonging to genes of functional interest.
goASE( enrichedEventNames, universeEventNames = NULL, se, ontologyType = c("BP", "MF", "CC"), pAdjustMethod = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr", "none"), ontologyRef = NULL, ... ) goGenes( enrichedGenes, universeGenes = NULL, ontologyRef, ontologyType = c("BP", "MF", "CC"), pAdjustMethod = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr", "none"), ... ) extract_gene_ids_for_GO(enrichedEventNames, universeEventNames = NULL, se) subset_EventNames_by_GO(EventNames, go_id, se) plotGO( res_go = NULL, plot_x = c("log10FDR", "foldEnrichment", "nGenes"), plot_size = c("nGenes", "foldEnrichment", "log10FDR"), plot_color = c("foldEnrichment", "nGenes", "log10FDR"), filter_n_terms = 20, filter_padj = 1, filter_pvalue = 1, trim_go_term = 50 )
goASE( enrichedEventNames, universeEventNames = NULL, se, ontologyType = c("BP", "MF", "CC"), pAdjustMethod = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr", "none"), ontologyRef = NULL, ... ) goGenes( enrichedGenes, universeGenes = NULL, ontologyRef, ontologyType = c("BP", "MF", "CC"), pAdjustMethod = c("BH", "holm", "hochberg", "hommel", "bonferroni", "BY", "fdr", "none"), ... ) extract_gene_ids_for_GO(enrichedEventNames, universeEventNames = NULL, se) subset_EventNames_by_GO(EventNames, go_id, se) plotGO( res_go = NULL, plot_x = c("log10FDR", "foldEnrichment", "nGenes"), plot_size = c("nGenes", "foldEnrichment", "log10FDR"), plot_color = c("foldEnrichment", "nGenes", "log10FDR"), filter_n_terms = 20, filter_padj = 1, filter_pvalue = 1, trim_go_term = 50 )
enrichedEventNames |
A vector of |
universeEventNames |
A vector of |
se |
The NxtSE object containing the GO reference and the |
ontologyType |
One of either |
pAdjustMethod |
The method for p-value adjustment for FDR.
See |
ontologyRef |
A valid gene ontology reference. This can be generated
either using |
... |
Additional arguments to be passed to |
enrichedGenes |
A vector of |
universeGenes |
(default |
EventNames , go_id
|
In |
res_go |
For |
plot_x , plot_size , plot_color
|
What parameters should be plotted on
the x-axis, bubble-size, or bubble-color? Valid options are
|
filter_n_terms |
(default |
filter_padj , filter_pvalue
|
Whether given GO results should be filtered by adjusted p value (FDR) or nominal p value, respectively, prior to plot |
trim_go_term |
(default |
Users can perform GO analysis using either the GO annotation compiled via
building the SpliceWiz reference using buildRef()
, or via a
custom-supplied gene ontology annotation. This is done by
supplying their own GO annotations as an argument to ontologyRef
. This
should be coerceable to a data.frame
containing the following columns:
gene_id
Gene ID's matching that used by the SpliceWiz reference
go_id
Gene ontology ID terms, of the form GO:XXXXXXX
For goASE()
and goGenes()
, a data table containing the following:
go_id: Gene ontology ID
go_term: Gene ontology term
pval: Raw p values
padj: Adjusted p values
overlap: Number of enriched genes (of enriched ASEs)
size: Number of background genes (of background ASEs)
overlapGenes: A list of gene_id
's from genes of enriched ASEs
expected: The number of overlap genes expected by random
For extract_gene_ids_for_GO()
, a list containing the following:
genes: A vector of enriched gene_id
s
universe: A vector of background gene_id
s
For subset_EventNames_by_GO()
, a vector of all ASE EventName
s belonging
to the given gene ontology go_id
goASE()
: Performs over-representation gene ontology
analysis using a given list of enriched / background ASEs
goGenes()
: Performs GO analysis given the set of
enriched and (optionally) the background (universe) genes.
extract_gene_ids_for_GO()
: Produces a list containing enriched and
universe gene_id
s of given enriched and background ASE EventName
s
subset_EventNames_by_GO()
: Returns a list of ASEs enriched in a given
gene ontology category
plotGO()
: Produces a lollipop plot based on the
given gene ontology results object
Build-Reference-methods on how to generate gene ontology annotations
# Generate example reference with `Homo sapiens` gene ontology ref_path <- file.path(tempdir(), "Reference_withGO") buildRef( reference_path = ref_path, fasta = chrZ_genome(), gtf = chrZ_gtf(), ontologySpecies = "Homo sapiens" ) # Perform GO analysis using first 1000 genes ontology <- viewGO(ref_path) allGenes <- sort(unique(ontology$gene_id)) exampleGeneID <- allGenes[1:1000] exampleBkgdID <- allGenes go_df <- goGenes( enrichedGenes = exampleGeneID, universeGenes = exampleBkgdID, ontologyRef = ontology ) # Plots the top 12 GO terms plotGO(go_df, filter_n_terms = 12) # Below example code of how to use output of differential ASEs for GO analysis # It will not work with the example dataset because the reference must be # either human / mouse, or a valid `ontologySpecies` given to buildRef() # We hope the example code is simple enough to understand for users to adapt # to their own workflows. ## Not run: se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) colData(se)$treatment <- rep(c("A", "B"), each = 3) require("limma") res_limma <- ASE_limma(se, "treatment", "A", "B") # Perform gene ontology analysis of the first 10 differential ASEs go_df <- goASE( enrichedEventNames = res_limma$EventName[1:10], universeEventNames = res_limma$EventName, se = se ) # Return a list of all ASEs belonging to the top enriched category GOsubset_EventName <- subset_EventNames_by_GO( EventNames = res_limma$EventName, go_id = go_df$go_id[1], se = se ) # Return a list of all ASEs belonging to the top enriched category. # - typically used if one wishes to export `gene_id` for use in other gene # ontology tools gene_id_list <- extract_gene_ids_for_GO( enrichedEventNames = res_limma$EventName[1:10], universeEventNames = res_limma$EventName, se = se ) ## End(Not run)
# Generate example reference with `Homo sapiens` gene ontology ref_path <- file.path(tempdir(), "Reference_withGO") buildRef( reference_path = ref_path, fasta = chrZ_genome(), gtf = chrZ_gtf(), ontologySpecies = "Homo sapiens" ) # Perform GO analysis using first 1000 genes ontology <- viewGO(ref_path) allGenes <- sort(unique(ontology$gene_id)) exampleGeneID <- allGenes[1:1000] exampleBkgdID <- allGenes go_df <- goGenes( enrichedGenes = exampleGeneID, universeGenes = exampleBkgdID, ontologyRef = ontology ) # Plots the top 12 GO terms plotGO(go_df, filter_n_terms = 12) # Below example code of how to use output of differential ASEs for GO analysis # It will not work with the example dataset because the reference must be # either human / mouse, or a valid `ontologySpecies` given to buildRef() # We hope the example code is simple enough to understand for users to adapt # to their own workflows. ## Not run: se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) colData(se)$treatment <- rep(c("A", "B"), each = 3) require("limma") res_limma <- ASE_limma(se, "treatment", "A", "B") # Perform gene ontology analysis of the first 10 differential ASEs go_df <- goASE( enrichedEventNames = res_limma$EventName[1:10], universeEventNames = res_limma$EventName, se = se ) # Return a list of all ASEs belonging to the top enriched category GOsubset_EventName <- subset_EventNames_by_GO( EventNames = res_limma$EventName, go_id = go_df$go_id[1], se = se ) # Return a list of all ASEs belonging to the top enriched category. # - typically used if one wishes to export `gene_id` for use in other gene # ontology tools gene_id_list <- extract_gene_ids_for_GO( enrichedEventNames = res_limma$EventName[1:10], universeEventNames = res_limma$EventName, se = se ) ## End(Not run)
This function launches the SpliceWiz interactive app using Shiny Dashboard
This is (by default) a dialog window within the RStudio application with
the resolution specified by the res
parameter. Alternatively, setting
mode = "browser"
will launch a resizable browser window (using the default
internet browser). The demo mode can be launched by setting demo = TRUE
.
See the SpliceWiz Quick-Start for a guide
to using the SpliceWiz GUI.
spliceWiz( mode = c("dialog", "browser"), res = c("1080p", "720p", "960p", "1440p"), demo = FALSE )
spliceWiz( mode = c("dialog", "browser"), res = c("1080p", "720p", "960p", "1440p"), demo = FALSE )
mode |
(default |
res |
(default "1080p") Sets width and height of the app to pre-defined dimensions. Possible options are "720p, "960p", "1080p", "1440p", which specifies the height of the app. All are displayed in aspect ratio 16x9 |
demo |
(default FALSE) If set to |
Runs an interactive shinydashboard SpliceWiz app with the specified mode.
spliceWiz()
: Launches the SpliceWiz GUI
if(interactive()) { # Launches interactive ShinyDashboard SpliceWiz app as fixed-size dialog box # 1080p = 1920 x 1080 pixels spliceWiz(mode = "dialog", res = "1080p") # Launches interactive ShinyDashboard SpliceWiz app as browser window spliceWiz(mode = "browser") }
if(interactive()) { # Launches interactive ShinyDashboard SpliceWiz app as fixed-size dialog box # 1080p = 1920 x 1080 pixels spliceWiz(mode = "dialog", res = "1080p") # Launches interactive ShinyDashboard SpliceWiz app as browser window spliceWiz(mode = "browser") }
This function takes the path of a possible COV file and checks whether its format complies with that of the COV format defined by this package.
isCOV(coverage_files)
isCOV(coverage_files)
coverage_files |
A vector containing the file names of files to be checked |
COV files are BGZF-compressed files. The first 4 bytes of the file must always be 'COV\1', distinguishing it from BAM or other files in BGZF format. This function checks whether the given file complies with this.
TRUE
if all files are valid COV files. FALSE
otherwise
se <- SpliceWiz_example_NxtSE() cov_files <- covfile(se) isCOV(cov_files) # returns true if these are true COV files
se <- SpliceWiz_example_NxtSE() cov_files <- covfile(se) isCOV(cov_files) # returns true if these are true COV files
makeMatrix()
constructs a matrix of PSI values of the given alternative
splicing events (ASEs).makeMeanPSI()
constructs a table of "average" PSI values, with samples
grouped by a number of given conditions (e.g. "group A" and "group B") of a
given condition category (e.g. condition "treatment").
See details below.
makeMatrix( se, event_list, sample_list = colnames(se), method = c("PSI", "logit", "Z-score"), depth_threshold = 10, logit_max = 5, na.percent.max = 0.1 ) makeMeanPSI( se, event_list = rownames(se), condition, conditionList, depth_threshold = 10, logit_max = 10 )
makeMatrix( se, event_list, sample_list = colnames(se), method = c("PSI", "logit", "Z-score"), depth_threshold = 10, logit_max = 5, na.percent.max = 0.1 ) makeMeanPSI( se, event_list = rownames(se), condition, conditionList, depth_threshold = 10, logit_max = 10 )
se |
|
event_list |
A character vector containing the names of ASE events
(as given by the |
sample_list |
(default = |
method |
In |
depth_threshold |
(default = 10) Samples with the number of reads supporting either included or excluded isoforms below this values are excluded |
logit_max |
PSI values close to 0 or 1 are rounded up/down
to |
na.percent.max |
(default = 0.1) The maximum proportion of values in
the given dataset that were transformed to |
condition |
The name of the column containing the condition values in
|
conditionList |
A list (or vector) of condition values of which to calculate mean PSIs |
Note that this function takes the geometric mean of PSI, by first converting all values to logit(PSI), taking the average logit(PSI) values of each condition, and then converting back to PSI using inverse logit.
Samples with low splicing coverage (either due to insufficient sequencing
depth or low gene expression) are excluded from calculation of mean PSIs.
The threshold can be set using depth_threshold
. Excluding these samples is
appropriate because the uncertainty of PSI is high when the total included /
excluded count is low. Note that events where all samples in a condition is
excluded will return a value of NaN
.
Using logit-transformed PSI values is appropriate because PSI values are
bound to the (0,1) interval, and are often thought to be beta-distributed.
The link function often used with beta-distributed models is the logit
function, which is defined as logit(x) = function(x) log(x / (1 - x))
,
and is equivalent to stats::qlogis. Its inverse is equivalent to
stats::plogis.
Users wishing to calculate arithmetic means of PSI are advised to use makeMatrix, followed by rowMeans on subsetted sample columns.
For makeMatrix
: A matrix of PSI (or alternate) values, with
columns as samples and rows as ASE events.
For makeMeanPSI
: A 3 column data frame, with the first column containing
event_list
list of ASE events, and the last 2 columns containing the
average PSI values of the nominator and denominator conditions.
makeMatrix()
: constructs a matrix of PSI values of the given
alternative splicing events (ASEs)
makeMeanPSI()
: constructs a table of "average" PSI values
se <- SpliceWiz_example_NxtSE() colData(se)$treatment <- rep(c("A", "B"), each = 3) event_list <- rowData(se)$EventName mat <- makeMatrix(se, event_list[1:10]) diag_values <- makeMeanPSI(se, event_list, condition = "treatment", conditionList = list("A", "B") )
se <- SpliceWiz_example_NxtSE() colData(se)$treatment <- rep(c("A", "B"), each = 3) event_list <- rowData(se)$EventName mat <- makeMatrix(se, event_list[1:10]) diag_values <- makeMeanPSI(se, event_list, condition = "treatment", conditionList = list("A", "B") )
Creates a NxtSE object from the data (that was collated using collateData). This object is used for downstream differential analysis of IR and alternative splicing events using ASE-methods, data generation for visualization of scatter plots and heatmaps via make_plot_data methods, and coverage visualisation using plotCoverage
makeSE( collate_path, colData, RemoveOverlapping = TRUE, realize = FALSE, verbose = TRUE )
makeSE( collate_path, colData, RemoveOverlapping = TRUE, realize = FALSE, verbose = TRUE )
collate_path |
(Required) The output path of collateData pointing to the collated data |
colData |
(Optional) A data frame containing the sample annotation
information. The first column must contain the sample names.
Omit |
RemoveOverlapping |
(default = |
realize |
(default = |
verbose |
(default = |
makeSE
retrieves the data collated by
collateData, and initialises a NxtSE object. It references
the required on-disk assay data using DelayedArrays, thereby utilising
'on-disk' memory to conserve memory usage.
For extremely large datasets, loading the entire data into memory may consume
too much memory. In such cases, make a subset of the NxtSE
object (e.g. subset by samples) before loading the data into memory (RAM)
using realize_NxtSE. Alternatively supply a data frame to the colData
parameter of the makeSE()
function. Only samples listed in the first column
of the colData
data frame will be imported into the NxtSE
object.
It should be noted that downstream applications of SpliceWiz, including ASE-methods, plotCoverage, are much faster if the NxtSE is realized. It is recommended to realize the NxtSE object before extensive usage.
If COV files assigned via collateData have been moved relative to the
collate_path
, the created NxtSE object will not be linked to
any COV files and plotCoverage cannot be used. To reassign these
files, a vector of file paths corresponding to all the COV files of the data
set can be assigned using covfile(se) <- vector_of_cov_files
. See
the example below for details.
If RemoveOverlapping = TRUE
, makeSE
will remove introns that overlap
other introns with higher junction read counts in the dataset. This means
that SpliceWiz will assess a set of non-overlapping introns which belong
to likely major isoforms, ensuring that overlapping
IR events are not 'double-counted'.
NB: Since version 1.3.4, SpliceWiz has improved the algorithm of generating
the set of non-overlapping introns (prior versions appear to generate
sets of introns that still overlap). To use the prior algorithm for
compatibility with prior analysis, set RemoveOverlapping = FALSE
.
A NxtSE object containing the compiled data in
DelayedArrays (or as matrices if realize = TRUE
), pointing to the assay
data contained in the given collate_path
# The following code can be used to reproduce the NxtSE object # that can be fetched with SpliceWiz_example_NxtSE() buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output") ) expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output") ) se <- makeSE(collate_path = file.path(tempdir(), "Collated_output")) # "Realize" NxtSE object to load all H5 assays into memory: se <- realize_NxtSE(se) # If COV files have been removed since the last call to collateData() # reassign them to the NxtSE object, for example: covfile_path <- system.file("extdata", package = "SpliceWiz") covfile_df <- findSamples(covfile_path, ".cov") covfile(se) <- covfile_df$path
# The following code can be used to reproduce the NxtSE object # that can be fetched with SpliceWiz_example_NxtSE() buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output") ) expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output") ) se <- makeSE(collate_path = file.path(tempdir(), "Collated_output")) # "Realize" NxtSE object to load all H5 assays into memory: se <- realize_NxtSE(se) # If COV files have been removed since the last call to collateData() # reassign them to the NxtSE object, for example: covfile_path <- system.file("extdata", package = "SpliceWiz") covfile_df <- findSamples(covfile_path, ".cov") covfile(se) <- covfile_df$path
These functions empirically calculate low-mappability (Mappability Exclusion) regions using the given reference. A splice-aware alignment software capable of aligning reads to the genome is required. See details and examples below.
generateSyntheticReads( reference_path, read_len = 70, read_stride = 10, error_pos = 35, verbose = TRUE, alt_fasta_file ) calculateMappability( reference_path, aligned_bam = file.path(reference_path, "Mappability", "Aligned.out.bam"), threshold = 4, n_threads = 1 )
generateSyntheticReads( reference_path, read_len = 70, read_stride = 10, error_pos = 35, verbose = TRUE, alt_fasta_file ) calculateMappability( reference_path, aligned_bam = file.path(reference_path, "Mappability", "Aligned.out.bam"), threshold = 4, n_threads = 1 )
reference_path |
The directory of the reference prepared by getResources |
read_len |
The nucleotide length of the synthetic reads |
read_stride |
The nucleotide distance between adjacent synthetic reads |
error_pos |
The position of the procedurally-generated nucleotide error from the start of each synthetic reads |
verbose |
Whether additional status messages are shown |
alt_fasta_file |
(Optional) The path to the user-supplied genome fasta
file, if different to that found inside the |
aligned_bam |
The BAM file of alignment of the synthetic reads generated
by |
threshold |
Genomic regions with this alignment read depth (or below) in the aligned synthetic read BAM are defined as low mappability regions. |
n_threads |
The number of threads used to calculate mappability exclusion regions from aligned bam file of synthetic reads. |
Creating a Mappability Exclusion BED file is a three-step process.
First, using generateSyntheticReads()
, synthetic reads are systematically
generated using the given genome contained within reference_path
, prepared
via getResources. Alternatively, use alt_fasta_file
to set the genome
sequence if this is different to that prepared by getResources
or if
getResources
is not yet run.
Second, an aligner such as STAR (preferably the same aligner used for the subsequent RNA-seq experiment) is required to align these reads to the source genome. Poorly mapped regions of the genome will be reflected by regions of low coverage depth.
Finally, the BAM file containing the aligned reads is analysed
using calculateMappability()
, to identify
low-mappability regions to compile the Mappability Exclusion BED file.
It is recommended to leave all parameters to their default settings. Regular
users should only specify reference_path
, aligned_bam
and n_threads
,
as required.
NB: STAR_mappability runs all 3 steps required, using the STAR
aligner.
This only works in systems where STAR
is installed.
NB2: buildFullRef builds the STAR reference, then calculates mappability. It then uses the calculated mappability regions to build the SpliceWiz reference.
NB3: In systems where STAR
is not available, consider using HISAT2 or
Rsubread. A working example using Rsubread is shown below.
For generateSyntheticReads
: writes Reads.fa
to the Mappability
subdirectory inside the given reference_path
.
For calculateMappability
: writes a gzipped BED file
named MappabilityExclusion.bed.gz
to the Mappability
subdirectory
inside reference_path
.
This BED file is automatically used by buildRef if its
MappabilityRef
parameter is not specified.
generateSyntheticReads()
: Generates synthetic reads from a
genome FASTA file, for mappability calculations.
calculateMappability()
: Generate a BED file defining
low mappability regions, using reads generated by
generateSyntheticReads()
, aligned to the genome.
# (1a) Creates genome resource files ref_path <- file.path(tempdir(), "refWithMapExcl") getResources( reference_path = ref_path, fasta = chrZ_genome(), gtf = chrZ_gtf() ) # (1b) Systematically generate reads based on the example genome: generateSyntheticReads( reference_path = ref_path ) ## Not run: # (2) Align the generated reads using Rsubread: # (2a) Build the Rsubread genome index: subreadIndexPath <- file.path(ref_path, "Rsubread") if(!dir.exists(subreadIndexPath)) dir.create(subreadIndexPath) Rsubread::buildindex( basename = file.path(subreadIndexPath, "reference_index"), reference = chrZ_genome() ) # (2b) Align the synthetic reads using Rsubread::subjunc() Rsubread::subjunc( index = file.path(subreadIndexPath, "reference_index"), readfile1 = file.path(ref_path, "Mappability", "Reads.fa"), output_file = file.path(ref_path, "Mappability", "AlignedReads.bam"), useAnnotation = TRUE, annot.ext = chrZ_gtf(), isGTF = TRUE ) # (3) Analyse the aligned reads in the BAM file for low-mappability regions: calculateMappability( reference_path = ref_path, aligned_bam = file.path(ref_path, "Mappability", "AlignedReads.bam") ) # (4) Build the example reference using the calculated Mappability Exclusions buildRef(ref_path) # NB the default is to search for the BED file generated by # `calculateMappability()` in the given reference_path ## End(Not run)
# (1a) Creates genome resource files ref_path <- file.path(tempdir(), "refWithMapExcl") getResources( reference_path = ref_path, fasta = chrZ_genome(), gtf = chrZ_gtf() ) # (1b) Systematically generate reads based on the example genome: generateSyntheticReads( reference_path = ref_path ) ## Not run: # (2) Align the generated reads using Rsubread: # (2a) Build the Rsubread genome index: subreadIndexPath <- file.path(ref_path, "Rsubread") if(!dir.exists(subreadIndexPath)) dir.create(subreadIndexPath) Rsubread::buildindex( basename = file.path(subreadIndexPath, "reference_index"), reference = chrZ_genome() ) # (2b) Align the synthetic reads using Rsubread::subjunc() Rsubread::subjunc( index = file.path(subreadIndexPath, "reference_index"), readfile1 = file.path(ref_path, "Mappability", "Reads.fa"), output_file = file.path(ref_path, "Mappability", "AlignedReads.bam"), useAnnotation = TRUE, annot.ext = chrZ_gtf(), isGTF = TRUE ) # (3) Analyse the aligned reads in the BAM file for low-mappability regions: calculateMappability( reference_path = ref_path, aligned_bam = file.path(ref_path, "Mappability", "AlignedReads.bam") ) # (4) Build the example reference using the calculated Mappability Exclusions buildRef(ref_path) # NB the default is to search for the BED file generated by # `calculateMappability()` in the given reference_path ## End(Not run)
The NxtSE class inherits from the SummarizedExperiment class and is constructed using makeSE. NxtSE extends SummarizedExperiment by housing additional assays pertaining to IR and splice junction counts.
NxtSE(...) ## S4 method for signature 'NxtSE' up_inc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' down_inc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' up_exc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' down_exc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' covfile(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' sampleQC(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' sourcePath(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' row_gr(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' ref(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_PSI(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_counts(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_counts_uns(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_gr(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' update_NxtSE(x, ...) ## S4 method for signature 'NxtSE' realize_NxtSE(x, includeJunctions = FALSE, withDimnames = TRUE, ...) ## S4 replacement method for signature 'NxtSE' up_inc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' down_inc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' up_exc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' down_exc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' covfile(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' sampleQC(x, withDimnames = TRUE) <- value ## S4 method for signature 'NxtSE,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 replacement method for signature 'NxtSE,ANY,ANY,NxtSE' x[i, j, ...] <- value ## S4 method for signature 'NxtSE' cbind(..., deparse.level = 1) ## S4 method for signature 'NxtSE' rbind(..., deparse.level = 1)
NxtSE(...) ## S4 method for signature 'NxtSE' up_inc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' down_inc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' up_exc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' down_exc(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' covfile(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' sampleQC(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' sourcePath(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' row_gr(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' ref(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_PSI(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_counts(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_counts_uns(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' junc_gr(x, withDimnames = TRUE, ...) ## S4 method for signature 'NxtSE' update_NxtSE(x, ...) ## S4 method for signature 'NxtSE' realize_NxtSE(x, includeJunctions = FALSE, withDimnames = TRUE, ...) ## S4 replacement method for signature 'NxtSE' up_inc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' down_inc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' up_exc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' down_exc(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' covfile(x, withDimnames = TRUE) <- value ## S4 replacement method for signature 'NxtSE' sampleQC(x, withDimnames = TRUE) <- value ## S4 method for signature 'NxtSE,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 replacement method for signature 'NxtSE,ANY,ANY,NxtSE' x[i, j, ...] <- value ## S4 method for signature 'NxtSE' cbind(..., deparse.level = 1) ## S4 method for signature 'NxtSE' rbind(..., deparse.level = 1)
... |
In NxtSE(), additional arguments to be passed onto SummarizedExperiment() |
x |
A NxtSE object |
withDimnames |
(default TRUE) Whether exported assays should be supplied with row and column names of the NxtSE object. See SummarizedExperiment |
includeJunctions |
When realizing a NxtSE object, include whether junction counts and PSIs should be realized into memory. Not recommended for general use, as they are only used for coverage plots. |
value |
The value to replace. Must be a matrix for the up_inc<-, down_inc<-, up_exc<- and down_exc<- replacers, and a character vector for covfile<- |
i , j
|
Row and column subscripts to subset a NxtSE object. |
drop |
A logical(1), ignored by these methods. |
deparse.level |
See base::cbind for a description of this argument. |
See Functions section (below) for details
NxtSE()
: Constructor function for NxtSE; akin to
SummarizedExperiment(...)
up_inc(NxtSE)
: Gets upstream included events (SE/MXE), or
upstream exon-intron spanning reads (IR)
down_inc(NxtSE)
: Gets downstream included events (SE/MXE), or
downstream exon-intron spanning reads (IR)
up_exc(NxtSE)
: Gets upstream excluded events (MXE only)
down_exc(NxtSE)
: Gets downstream excluded events (MXE only)
covfile(NxtSE)
: Gets a named vector with
the paths to the corresponding COV files
sampleQC(NxtSE)
: Gets a data frame with the QC parameters
of the samples
sourcePath(NxtSE)
: Retrieves the directory path containing the source
data for this NxtSE object.
row_gr(NxtSE)
: Retrieves a GRanges object representing the genomic
spans of the ASEs (EventRegion as GRanges)
ref(NxtSE)
: Retrieves a list of annotation data associated
with this NxtSE object; primarily used in plotCoverage()
junc_PSI(NxtSE)
: Getter for junction PSI DelayedMatrix;
primarily used in plotCoverage()
junc_counts(NxtSE)
: Getter for junction counts DelayedMatrix;
primarily used in plotCoverage()
junc_counts_uns(NxtSE)
: Getter for (unstranded) junction counts
DelayedMatrix; primarily used in plotCoverage()
junc_gr(NxtSE)
: Getter for junction GenomicRanges coordinates;
primarily used in plotCoverage()
update_NxtSE(NxtSE)
: Updates NxtSE object to the latest version.
realize_NxtSE(NxtSE)
: Converts all DelayedMatrix assays as matrices
(i.e. performs all delayed calculation and loads resulting object
to RAM)
up_inc(NxtSE) <- value
: Sets upstream included events (SE/MXE), or
upstream exon-intron spanning reads (IR)
down_inc(NxtSE) <- value
: Sets downstream included events (SE/MXE), or
downstream exon-intron spanning reads (IR)
up_exc(NxtSE) <- value
: Sets upstream excluded events (MXE only)
down_exc(NxtSE) <- value
: Sets downstream excluded events (MXE only)
covfile(NxtSE) <- value
: Sets the paths to the corresponding COV files
sampleQC(NxtSE) <- value
: Sets the values in the data frame containing
sample QC
x[i
: Subsets a NxtSE object
`[`(x = NxtSE, i = ANY, j = ANY) <- value
: Sets a subsetted NxtSE object
cbind(NxtSE)
: Combines two NxtSE objects (by samples - columns)
rbind(NxtSE)
: Combines two NxtSE objects (by AS/IR events - rows)
# Run the full pipeline to generate a NxtSE object: buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output") ) expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output") ) se <- makeSE(collate_path = file.path(tempdir(), "Collated_output")) # Coerce NxtSE -> SummarizedExperiment se_raw <- as(se, "SummarizedExperiment") # Coerce SummarizedExperiment -> NxtSE se_NxtSE <- as(se_raw, "NxtSE") identical(se, se_NxtSE) # Returns TRUE # Update NxtSE object to the latest version # - useful if an NxtSE object made with old SpliceWiz version # - was stored as an RDS obejct se <- update_NxtSE(se) # Get directory path of NxtSE (i.e., collate_path) sourcePath(se) # Get Main Assay Counts assay(se, "Included") # Junction (or IR depth) counts for included isoform assay(se, "Excluded") # Junction (or IR depth) counts for excluded isoform # Get Auxiliary Counts (for filter use only) assay(se, "Coverage") # Participation ratio (intron coverage for IR/RI) assay(se, "minDepth") # SpliceOver junction counts (Intron Depths for IR/RI) assay(se, "Depth") # Sum of intron depth and SpliceOver (used for # coverage normalization factor # Get Junction reads of SE / MXE and spans-reads of IR events up_inc(se) # Upstream included junction counts (IR/MXE/SE/RI) down_inc(se) # Downstream included junction counts (IR/MXE/SE/RI) up_exc(se) # Upstream excluded junction counts (MXE only) down_exc(se) # Downstream excluded junction counts (MXE only) # Get Junction counts junc_counts(se) # stranded (if RNA-seq is auto-detected as stranded) junc_counts_uns(se) # unstranded (sum of junction reads from both strand) junc_PSI(se) # PSI of junction (as proportion of SpliceOver metric) # Get Junction GRanges object junc_gr(se) # Get EventRegion as GRanges object row_gr(se) # Get list of available coverage files covfile(se) # Get sample QC information sampleQC(se) # Get resource data (used internally for plotCoverage()) cov_data <- ref(se) names(cov_data) # Subset functions se_by_samples <- se[,1:3] se_by_events <- se[1:10,] se_by_rowData <- subset(se, EventType == "IR") # Cbind (bind event_identical NxtSE by samples) se_by_samples_1 <- se[,1:3] se_by_samples_2 <- se[,4:6] se_cbind <- cbind(se_by_samples_1, se_by_samples_2) identical(se, se_cbind) # should return TRUE # Rbind (bind sample_identical NxtSE by events) se_IR <- subset(se, EventType == "IR") se_SE <- subset(se, EventType == "SE") se_IRSE <- rbind(se_IR, se_SE) identical(se_IRSE, subset(se, EventType %in% c("IR", "SE"))) # TRUE # Convert HDF5-based NxtSE to in-memory se # makeSE() creates a HDF5-based NxtSE object where all assay data is stored # as an h5 file instead of in-memory. All operations are performed as # delayed operations as per DelayedArray package. # To realize the NxtSE object as an in-memory object, use: se_real <- realize_NxtSE(se) identical(se, se_real) # should return FALSE # To check the difference, run: class(up_inc(se)) class(up_inc(se_real))
# Run the full pipeline to generate a NxtSE object: buildRef( reference_path = file.path(tempdir(), "Reference"), fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output") ) expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output")) collateData(expr, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "Collated_output") ) se <- makeSE(collate_path = file.path(tempdir(), "Collated_output")) # Coerce NxtSE -> SummarizedExperiment se_raw <- as(se, "SummarizedExperiment") # Coerce SummarizedExperiment -> NxtSE se_NxtSE <- as(se_raw, "NxtSE") identical(se, se_NxtSE) # Returns TRUE # Update NxtSE object to the latest version # - useful if an NxtSE object made with old SpliceWiz version # - was stored as an RDS obejct se <- update_NxtSE(se) # Get directory path of NxtSE (i.e., collate_path) sourcePath(se) # Get Main Assay Counts assay(se, "Included") # Junction (or IR depth) counts for included isoform assay(se, "Excluded") # Junction (or IR depth) counts for excluded isoform # Get Auxiliary Counts (for filter use only) assay(se, "Coverage") # Participation ratio (intron coverage for IR/RI) assay(se, "minDepth") # SpliceOver junction counts (Intron Depths for IR/RI) assay(se, "Depth") # Sum of intron depth and SpliceOver (used for # coverage normalization factor # Get Junction reads of SE / MXE and spans-reads of IR events up_inc(se) # Upstream included junction counts (IR/MXE/SE/RI) down_inc(se) # Downstream included junction counts (IR/MXE/SE/RI) up_exc(se) # Upstream excluded junction counts (MXE only) down_exc(se) # Downstream excluded junction counts (MXE only) # Get Junction counts junc_counts(se) # stranded (if RNA-seq is auto-detected as stranded) junc_counts_uns(se) # unstranded (sum of junction reads from both strand) junc_PSI(se) # PSI of junction (as proportion of SpliceOver metric) # Get Junction GRanges object junc_gr(se) # Get EventRegion as GRanges object row_gr(se) # Get list of available coverage files covfile(se) # Get sample QC information sampleQC(se) # Get resource data (used internally for plotCoverage()) cov_data <- ref(se) names(cov_data) # Subset functions se_by_samples <- se[,1:3] se_by_events <- se[1:10,] se_by_rowData <- subset(se, EventType == "IR") # Cbind (bind event_identical NxtSE by samples) se_by_samples_1 <- se[,1:3] se_by_samples_2 <- se[,4:6] se_cbind <- cbind(se_by_samples_1, se_by_samples_2) identical(se, se_cbind) # should return TRUE # Rbind (bind sample_identical NxtSE by events) se_IR <- subset(se, EventType == "IR") se_SE <- subset(se, EventType == "SE") se_IRSE <- rbind(se_IR, se_SE) identical(se_IRSE, subset(se, EventType %in% c("IR", "SE"))) # TRUE # Convert HDF5-based NxtSE to in-memory se # makeSE() creates a HDF5-based NxtSE object where all assay data is stored # as an h5 file instead of in-memory. All operations are performed as # delayed operations as per DelayedArray package. # To realize the NxtSE object as an in-memory object, use: se_real <- realize_NxtSE(se) identical(se, se_real) # should return FALSE # To check the difference, run: class(up_inc(se)) class(up_inc(se_real))
Generate plotly / ggplot RNA-seq genome and coverage plots from command line. Note that these are legacy functions. More expansive functionality is available using getCoverageData / getPlotObject / plotView functions.
plotCoverage( se, Event, Gene, seqname, start, end, coordinates, strand = c("*", "+", "-"), zoom_factor = 0.2, bases_flanking = 100, tracks, track_names = tracks, condition, ribbon_mode = c("sd", "ci", "sem", "none"), selected_transcripts = "", reverseGenomeCoords = FALSE, plotJunctions = FALSE, junctionThreshold = 0.01, plot_key_isoforms = FALSE, condense_tracks = FALSE, stack_tracks = FALSE, t_test = FALSE, norm_event, usePlotly = FALSE ) plotGenome( se, reference_path, Event, Gene, seqname, start, end, coordinates, zoom_factor = 0.2, bases_flanking = 100, reverseGenomeCoords = FALSE, condense_tracks = FALSE, selected_transcripts = "", plot_key_isoforms = FALSE, usePlotly = FALSE )
plotCoverage( se, Event, Gene, seqname, start, end, coordinates, strand = c("*", "+", "-"), zoom_factor = 0.2, bases_flanking = 100, tracks, track_names = tracks, condition, ribbon_mode = c("sd", "ci", "sem", "none"), selected_transcripts = "", reverseGenomeCoords = FALSE, plotJunctions = FALSE, junctionThreshold = 0.01, plot_key_isoforms = FALSE, condense_tracks = FALSE, stack_tracks = FALSE, t_test = FALSE, norm_event, usePlotly = FALSE ) plotGenome( se, reference_path, Event, Gene, seqname, start, end, coordinates, zoom_factor = 0.2, bases_flanking = 100, reverseGenomeCoords = FALSE, condense_tracks = FALSE, selected_transcripts = "", plot_key_isoforms = FALSE, usePlotly = FALSE )
se |
A NxtSE object, created by makeSE.
COV files must be linked to the NxtSE object. To do this, see the example
in makeSE. Required by |
Event |
The |
Gene |
Whether to use the range for the given |
seqname , start , end
|
The chromosome (string) and genomic |
coordinates |
A string specifying genomic coordinates can be given
instead of |
strand |
Whether to show coverage of both strands "*" (default), or from the "+" or "-" strand only. |
zoom_factor |
Zoom out from event. Each level of zoom zooms out by a
factor of 3. E.g. for a query region of chr1:10000-11000, if a
|
bases_flanking |
(Default = |
tracks |
The names of individual samples,
or the names of the different conditions to be plotted. For the latter, set
|
track_names |
The names of the tracks to be displayed. If omitted, the
track_names will default to the input in |
condition |
To display normalised coverage per condition, set this to
the condition category. If omitted, |
ribbon_mode |
(default |
selected_transcripts |
(Optional) A vector containing transcript ID or transcript names of transcripts to be displayed on the gene annotation track. Useful to remove minor isoforms that are not relevant to the samples being displayed. |
reverseGenomeCoords |
(default |
plotJunctions |
(default |
junctionThreshold |
(default |
plot_key_isoforms |
(default |
condense_tracks |
(default |
stack_tracks |
(default |
t_test |
(default |
norm_event |
Whether to normalise by an event different to that given
in "Event". The difference between this and Event is that the genomic
coordinates can be centered around a different |
usePlotly |
If |
reference_path |
The path of the reference generated by
Build-Reference-methods. Required by |
In RNA sequencing, alignments to spliced transcripts will "skip" over genomic regions of introns. This can be illustrated in a plot using a horizontal genomic axis, with the vertical axis representing the number of alignments covering each nucleotide. As a result, the coverage "hills" represent the expression of exons, and "valleys" to introns.
Different alternatively-spliced isoforms thus produce different coverage patterns. The change in the coverage across an alternate exon relative to its constitutively-included flanking exons, for example, represents its alternative inclusion or skipping. Similarly, elevation of intron valleys represent increased intron retention.
With multiple replicates per sample, coverage is dependent on library size and gene expression. To compare alternative splicing ratios, normalisation of the coverage of the alternate exon (or alternatively retained intron) relative to their constitutive flanking exons, is required. There is no established method for this normalisation, and can be confounded in situations where flanking elements are themselves alternatively spliced.
SpliceWiz performs this coverage normalisation using the same method as its
estimate of spliced / intronic transcript abundance using the SpliceOver
method (see details section in collateData). This normalisation can be
applied to correct for library size and gene expression differences between
samples of the same experimental condition. After normalisation, mean and
variance of coverage can be computed as ratios relative to total transcript
abundance. This method can visualise alternatively included genomic regions
including casette exons, alternate splice site usage, and intron retention.
plotCoverage
generates plots showing depth of alignments to
the genomic axis. Plots can be generated for individual samples or samples
grouped by experimental conditions. In the latter, mean and 95% confidence
intervals are shown.
plotGenome
generates genome transcript tracks only. Protein-coding
regions are denoted by thick rectangles, whereas non-protein coding
transcripts or untranslated regions are denoted with thin rectangles.
Introns are denoted as lines.
For plotCoverage
and plotGenome
:
If usePlotly = FALSE
returns a patchwork-assembled static plot
If usePlotly = TRUE
returns a covPlotly object,
which generates a plotly interactive plot when shown using show()
plotCoverage()
: Legacy function - works by internally calling
getCoverageData(), getPlotObject(), then plotView()
plotGenome()
: Legacy function - works by internally calling
getGenomeData(), followed by plotAnnoTrack()
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) # Verify that the COV files are linked to the NxtSE object: covfile(se) # Plot the genome track only, with specified gene: plotGenome(se, Gene = "SRSF3") # View the genome track, specifying a genomic region via coordinates: plotGenome(se, coordinates = "chrZ:10000-20000") # Return a list of ggplot and plotly objects, also plotting junction counts plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se)[1:4], plotJunctions = TRUE ) # Plot the same, but as a plotly interactive plot if(interactive()) { p <- plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se)[1:4], plotJunctions = TRUE, usePlotly = TRUE ) show(p) } # Plot by condition "treatment", including provisional PSIs plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = c("A", "B"), condition = "treatment", plotJunctions = TRUE ) # As above, but stack all traces into the same track # - NB: plotJunctions is disabled when `stack_tracks = TRUE` plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = c("A", "B"), condition = "treatment", stack_tracks = TRUE ) # Plot the above, but unstancked, and with t-test track # - NB: plotJunctions is disabled when `stack_tracks = TRUE` plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = c("A", "B"), condition = "treatment", t_test = TRUE ) # Select only transcripts involved in the selected alternative splicing event plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se)[1:4], plot_key_isoforms = TRUE )
se <- SpliceWiz_example_NxtSE(novelSplicing = TRUE) # Assign annotation of the experimental conditions colData(se)$treatment <- rep(c("A", "B"), each = 3) # Verify that the COV files are linked to the NxtSE object: covfile(se) # Plot the genome track only, with specified gene: plotGenome(se, Gene = "SRSF3") # View the genome track, specifying a genomic region via coordinates: plotGenome(se, coordinates = "chrZ:10000-20000") # Return a list of ggplot and plotly objects, also plotting junction counts plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se)[1:4], plotJunctions = TRUE ) # Plot the same, but as a plotly interactive plot if(interactive()) { p <- plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se)[1:4], plotJunctions = TRUE, usePlotly = TRUE ) show(p) } # Plot by condition "treatment", including provisional PSIs plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = c("A", "B"), condition = "treatment", plotJunctions = TRUE ) # As above, but stack all traces into the same track # - NB: plotJunctions is disabled when `stack_tracks = TRUE` plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = c("A", "B"), condition = "treatment", stack_tracks = TRUE ) # Plot the above, but unstancked, and with t-test track # - NB: plotJunctions is disabled when `stack_tracks = TRUE` plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = c("A", "B"), condition = "treatment", t_test = TRUE ) # Select only transcripts involved in the selected alternative splicing event plotCoverage( se = se, Event = "SE:SRSF3-203-exon4;SRSF3-202-int3", tracks = colnames(se)[1:4], plot_key_isoforms = TRUE )
These function calls the SpliceWiz C++ routine on one or more BAM files.
The routine is an improved version over the original IRFinder, with
OpenMP-based multi-threading and the production of compact "COV" files to
record alignment coverage. A SpliceWiz reference built using
Build-Reference-methods is required.
After processBAM()
is run, users should call
collateData to collate individual outputs into an experiment / dataset.
BAM2COV creates COV files from BAM files without running processBAM()
.
See details for performance info.
BAM2COV( bamfiles = "./Unsorted.bam", sample_names = "sample1", output_path = "./cov_folder", n_threads = 1, useOpenMP = TRUE, overwrite = FALSE, verbose = FALSE, multiRead = FALSE ) processBAM( bamfiles = "./Unsorted.bam", sample_names = "sample1", reference_path = "./Reference", output_path = "./SpliceWiz_Output", n_threads = 1, useOpenMP = TRUE, overwrite = FALSE, run_featureCounts = FALSE, verbose = FALSE, skipCOVfiles = FALSE, multiRead = FALSE )
BAM2COV( bamfiles = "./Unsorted.bam", sample_names = "sample1", output_path = "./cov_folder", n_threads = 1, useOpenMP = TRUE, overwrite = FALSE, verbose = FALSE, multiRead = FALSE ) processBAM( bamfiles = "./Unsorted.bam", sample_names = "sample1", reference_path = "./Reference", output_path = "./SpliceWiz_Output", n_threads = 1, useOpenMP = TRUE, overwrite = FALSE, run_featureCounts = FALSE, verbose = FALSE, skipCOVfiles = FALSE, multiRead = FALSE )
bamfiles |
A vector containing file paths of 1 or more BAM files |
sample_names |
The sample names of the given BAM files. Must
be a vector of the same length as |
output_path |
The output directory of this function |
n_threads |
(default |
useOpenMP |
(default |
overwrite |
(default |
verbose |
(default |
multiRead |
(default |
reference_path |
The directory containing the SpliceWiz reference |
run_featureCounts |
(default |
skipCOVfiles |
(default |
Typical run-times for a 100-million paired-end alignment BAM file takes 10
minutes using a single core. Using 8 threads, the runtime is approximately
2-5 minutes, depending on your system's file input / output speeds.
Approximately 10 Gb of RAM is used when OpenMP is used. If OpenMP
is not used (see below), this memory usage is multiplied across the number
of processor threads (i.e. 40 Gb if n_threads = 4
).
OpenMP is natively available to Linux / Windows compilers, and OpenMP will
be used if useOpenMP
is set to TRUE
, using multiple threads to process
each BAM file. On Macs, if OpenMP is not available at compilation,
BiocParallel will be used, processing BAM files simultaneously,
with one BAM file per thread.
Output will be saved to output_path
. Output files
will be named using the given sample_names
.
For processBAM()
:
sample.txt.gz: The main output file containing the quantitation
of IR and splice junctions, as well as QC information
sample.cov: Contains coverage information in compressed binary. See getCoverage
main.FC.Rds: A single file containing gene counts for the whole dataset
(only if run_featureCounts == TRUE
)
For BAM2COV()
:
sample.cov: Contains coverage information in compressed binary. See getCoverage
BAM2COV()
: Converts BAM files to COV files without running
processBAM()
processBAM()
: Processes BAM files. Requires a
SpliceWiz reference generated by buildRef()
Build-Reference-methods collateData isCOV
# Run BAM2COV, which only produces COV files but does not run `processBAM()`: bams <- SpliceWiz_example_bams() BAM2COV(bams$path, bams$sample, output_path = file.path(tempdir(), "SpliceWiz_Output"), n_threads = 2, overwrite = TRUE ) # Run processBAM(), which produces: # - text output of intron coverage and spliced read counts # - COV files which record read coverages example_ref <- file.path(tempdir(), "Reference") buildRef( reference_path = example_ref, fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output"), n_threads = 2 )
# Run BAM2COV, which only produces COV files but does not run `processBAM()`: bams <- SpliceWiz_example_bams() BAM2COV(bams$path, bams$sample, output_path = file.path(tempdir(), "SpliceWiz_Output"), n_threads = 2, overwrite = TRUE ) # Run processBAM(), which produces: # - text output of intron coverage and spliced read counts # - COV files which record read coverages example_ref <- file.path(tempdir(), "Reference") buildRef( reference_path = example_ref, fasta = chrZ_genome(), gtf = chrZ_gtf() ) bams <- SpliceWiz_example_bams() processBAM(bams$path, bams$sample, reference_path = file.path(tempdir(), "Reference"), output_path = file.path(tempdir(), "SpliceWiz_Output"), n_threads = 2 )
These function implements filtering of alternative splicing events, based on customisable criteria. See ASEFilter for details on how to construct SpliceWiz filters
getDefaultFilters() applyFilters(se, filters = getDefaultFilters()) runFilter(se, filterObj)
getDefaultFilters() applyFilters(se, filters = getDefaultFilters()) runFilter(se, filterObj)
se |
the NxtSE object to filter |
filters |
A vector or list of one or more ASEFilter objects. If left blank, the SpliceWiz default filters will be used. |
filterObj |
A single ASEFilter object. |
We highly recommend using the default filters, which are as follows:
(1) Depth filter of 20,
(2) Participation filter requiring 70% coverage in IR events.
(3) Participation filter requiring 40% coverage in MXE, SE, A5SS and A3SS events (i.e. Included + Excluded isoforms must cover at least 40% of all junction events across the given region)
(4) Consistency filter requring log difference of 2 (for skipped exon and mutually exclusive exon events, each junction must comprise at least 1/(2^2) = 1/4 of all reads associated with each isoform). For retained introns, the exon-intron overhangs must not differ by 1/4
(5) Terminus filter: In alternate first exons, the splice junction must not be shared with another transcript for which it is not its first intron. For alternative last exons, the splice junction must not be shared with another transcript for which it is not its last intron
(6) ExclusiveMXE filter: For MXE events, the two alternate casette exons must not overlap in their genomic regions
(7) StrictAltSS filter: For A5SS / A3SS events, the two alternate splice sites must not be interrupted by an intron
In all data-based filters, we require at least 80% samples (pcTRUE = 80
)
to pass this filters from the entire dataset (minCond = -1
).
Threshold depths for Participation filters:
For IR/RI, Participation filter is only applied for IR events
for which the intron depth is above a certain threshold (set by minDepth
).
This avoids the filters running on samples for which there is no IR.
For non-IR ASEs, Participation is only run on events with
splice depth (SpliceOver metric) higher than minDepth
. This avoids filters
running on events with low total participation (i.e., (Inc+Exc)/SpliceOver)
Threshold depths for Consistency filters:
Consistency filters are only applied for events where the sum of
upstream and downstream junction counts surpass a given threshold minDepth
.
This is applied on both included and excluded counts (the latter only
applies to MXE). This avoids consistency filters running on events with
insufficient junction counts (leading to high variance between up/downstream
values).
For an explanation of the various parameters mentioned here, see ASEFilter
For runFilter
and applyFilters
: a vector of type logical
,
representing the rows of NxtSE that should be kept.
For getDefaultFilters
: returns a list of default recommended filters
that should be parsed into applyFilters
.
getDefaultFilters()
: Returns a vector of recommended default
SpliceWiz filters
applyFilters()
: Run a vector or list of ASEFilter objects
on a NxtSE object
runFilter()
: Run a single filter on a NxtSE object
ASEFilter for details describing how to create and assign settings to ASEFilter objects.
# see ?makeSE on example code of how this object was generated se <- SpliceWiz_example_NxtSE() # Get the list of SpliceWiz recommended filters filters <- getDefaultFilters() # View a description of what these filters do: filters # Filter the NxtSE using the first default filter ("Depth") se.depthfilter <- se[runFilter(se, filters[[1]]), ] # Filter the NxtSE using all four default filters se.defaultFiltered <- se[applyFilters(se, getDefaultFilters()), ]
# see ?makeSE on example code of how this object was generated se <- SpliceWiz_example_NxtSE() # Get the list of SpliceWiz recommended filters filters <- getDefaultFilters() # View a description of what these filters do: filters # Filter the NxtSE using the first default filter ("Depth") se.depthfilter <- se[runFilter(se, filters[[1]]), ] # Filter the NxtSE using all four default filters se.defaultFiltered <- se[applyFilters(se, getDefaultFilters()), ]
SpliceWiz uses the computationally efficient packages fst
and data.table
to compute file and data operations, respectively. Both packages make use
of parallelisation. If excessive number of threads are allocated, it may
impact the running of other operations on your system. Use this function to
manually allocate the desired number of threads
setSWthreads(threads = 0)
setSWthreads(threads = 0)
threads |
(default |
Nothing.
setSWthreads(0)
setSWthreads(0)
These STAR helper / wrapper functions allow users to (1) create a STAR genome reference (with or without GTF), (2) align one or more RNA-seq samples, and (3) calculate regions of low mappability. STAR references can be created using one-step (genome and GTF), or two-step (genome first, then on-the-fly with injected GTF) approaches.
STAR_version() STAR_buildRef( reference_path, STAR_ref_path = file.path(reference_path, "STAR"), n_threads = 4, overwrite = FALSE, sjdbOverhang = 100, sparsity = 1, also_generate_mappability = FALSE, map_depth_threshold = 4, additional_args = NULL, ... ) STAR_alignExperiment( Experiment, STAR_ref_path, BAM_output_path, n_threads = 4, overwrite = FALSE, two_pass = FALSE, trim_adaptor = "AGATCGGAAG", additional_args = NULL ) STAR_alignReads( fastq_1 = c("./sample_1.fastq"), fastq_2 = NULL, STAR_ref_path, BAM_output_path, n_threads = 4, overwrite = FALSE, two_pass = FALSE, trim_adaptor = "AGATCGGAAG", memory_mode = "NoSharedMemory", additional_args = NULL ) STAR_buildGenome( reference_path, STAR_ref_path = file.path(reference_path, "STAR"), n_threads = 4, overwrite = FALSE, sparsity = 1, also_generate_mappability = FALSE, map_depth_threshold = 4, additional_args = NULL, ... ) STAR_loadGenomeGTF( reference_path, STAR_ref_path, STARgenome_output = file.path(tempdir(), "STAR"), n_threads = 4, overwrite = FALSE, sjdbOverhang = 100, extraFASTA = "", additional_args = NULL ) STAR_mappability( reference_path, STAR_ref_path = file.path(reference_path, "STAR"), map_depth_threshold = 4, n_threads = 4, ... )
STAR_version() STAR_buildRef( reference_path, STAR_ref_path = file.path(reference_path, "STAR"), n_threads = 4, overwrite = FALSE, sjdbOverhang = 100, sparsity = 1, also_generate_mappability = FALSE, map_depth_threshold = 4, additional_args = NULL, ... ) STAR_alignExperiment( Experiment, STAR_ref_path, BAM_output_path, n_threads = 4, overwrite = FALSE, two_pass = FALSE, trim_adaptor = "AGATCGGAAG", additional_args = NULL ) STAR_alignReads( fastq_1 = c("./sample_1.fastq"), fastq_2 = NULL, STAR_ref_path, BAM_output_path, n_threads = 4, overwrite = FALSE, two_pass = FALSE, trim_adaptor = "AGATCGGAAG", memory_mode = "NoSharedMemory", additional_args = NULL ) STAR_buildGenome( reference_path, STAR_ref_path = file.path(reference_path, "STAR"), n_threads = 4, overwrite = FALSE, sparsity = 1, also_generate_mappability = FALSE, map_depth_threshold = 4, additional_args = NULL, ... ) STAR_loadGenomeGTF( reference_path, STAR_ref_path, STARgenome_output = file.path(tempdir(), "STAR"), n_threads = 4, overwrite = FALSE, sjdbOverhang = 100, extraFASTA = "", additional_args = NULL ) STAR_mappability( reference_path, STAR_ref_path = file.path(reference_path, "STAR"), map_depth_threshold = 4, n_threads = 4, ... )
reference_path |
The path to the reference.
getResources must first be run using this path
as its |
STAR_ref_path |
(Default - the "STAR" subdirectory under
|
n_threads |
The number of threads to run the STAR aligner. |
overwrite |
(default |
sjdbOverhang |
(Default = 100) A STAR setting indicating the length of the donor / acceptor sequence on each side of the junctions. Ideally equal to (mate_length - 1). See the STAR aligner manual for details. |
sparsity |
(default |
also_generate_mappability |
Whether |
map_depth_threshold |
(Default |
additional_args |
A character vector of additional arguments to be parsed into STAR. See examples below. |
... |
Additional arguments to be parsed into
|
Experiment |
A two or three-column data frame with the columns denoting sample names, forward-FASTQ and reverse-FASTQ files. This can be conveniently generated using findFASTQ |
BAM_output_path |
The path under which STAR outputs the aligned BAM
files. In |
two_pass |
Whether to use two-pass mapping. In
|
trim_adaptor |
The sequence of the Illumina adaptor to trim via STAR's
|
fastq_1 , fastq_2
|
In STAR_alignReads: character vectors giving the
path(s) of one or more FASTQ (or FASTA) files to be aligned.
If single reads are to be aligned, omit |
memory_mode |
The parameter to be parsed to |
STARgenome_output |
The output path of the created on-the-fly genome |
extraFASTA |
(default |
Pre-requisites
STAR_buildRef()
and STAR_buildGenome()
require prepared genome
and gene annotation reference retrieved using getResources, which is run
internally by buildRef
STAR_loadGenomeGTF()
requires the above, and additionally a STAR genome
created using STAR_buildGenome()
STAR_alignExperiment()
, STAR_alignReads()
, and STAR_mappability()
:
requires a STAR
genome, which can be built using STAR_buildRef()
or
STAR_buildGenome()
followed by STAR_loadGenomeGTF()
Function Description
For STAR_buildRef
: this function will create a STAR
genome reference
using the same genome FASTA and gene annotation GTF used to create
the SpliceWiz reference. Optionally, it will run STAR_mappability
if also_generate_mappability
is set to TRUE
For STAR_alignExperiment
: aligns a set of FASTQ or paired FASTQ files
using the given
STAR
genome using the STAR
aligner.
A data.frame specifying sample names and corresponding FASTQ files are
required
For STAR_alignReads
: aligns a single or pair of FASTQ files to the given
STAR
genome using the STAR
aligner.
For STAR_buildGenome
: Creates a STAR genome reference, using ONLY the
FASTA file used to create the SpliceWiz reference. This allows users to
create a single STAR reference for use with multiple transcriptome (GTF)
references (on different occasions). Optionally, it will run
STAR_mappability
if also_generate_mappability
is set to TRUE
For STAR_loadGenomeGTF
: Creates an "on-the-fly" STAR genome, injecting GTF
from the given SpliceWiz reference_path
, setting sjdbOverhang
setting,
and (optionally) any spike-ins via the extraFASTA
parameter.
This allows users to create a single STAR reference for use with multiple
transcriptome (GTF) references, with different sjdbOverhang settings,
and/or spike-ins (on different occasions or for different projects).
For STAR_mappability
: this function will first
will run generateSyntheticReads, then use the given STAR
genome to
align the synthetic reads using STAR
. The aligned BAM file will then be
processed using calculateMappability to calculate the
lowly-mappable genomic regions,
producing the MappabilityExclusion.bed.gz
output file.
For STAR_version()
: The STAR version
For STAR_buildRef()
: None
For STAR_alignExperiment()
: None
For STAR_alignReads()
: None
For STAR_buildGenome()
: None
For STAR_loadGenomeGTF()
:
The path of the on-the-fly STAR genome, typically in the subdirectory
"_STARgenome" within the given STARgenome_output
directory
For STAR_mappability()
: None
STAR_version()
: Checks whether STAR is installed, and its version
STAR_buildRef()
: Creates a STAR genome reference, using both FASTA
and GTF files used to create the SpliceWiz reference
STAR_alignExperiment()
: Aligns multiple sets of FASTQ files, belonging to
multiple samples
STAR_alignReads()
: Aligns a single sample (with single or paired FASTQ
or FASTA files)
STAR_buildGenome()
: Creates a STAR genome reference, using ONLY the
FASTA file used to create the SpliceWiz reference
STAR_loadGenomeGTF()
: Creates an "on-the-fly" STAR genome, injecting GTF
from the given SpliceWiz reference_path
, setting sjdbOverhang
setting,
and (optionally) any spike-ins as extraFASTA
STAR_mappability()
: Calculates lowly-mappable genomic regions using STAR
Build-Reference-methods findSamples Mappability-methods
The latest STAR documentation
# 0) Check that STAR is installed and compatible with SpliceWiz STAR_version() ## Not run: # The below workflow illustrates # 1) Getting the reference resource # 2) Building the STAR Reference, including Mappability Exclusion calculation # 3) Building the SpliceWiz Reference, using the Mappability Exclusion file # 4) Aligning (a) one or (b) multiple raw sequencing samples. # 1) Reference generation from Ensembl's FTP links FTP <- "ftp://ftp.ensembl.org/pub/release-94/" getResources( reference_path = "Reference_FTP", fasta = paste0(FTP, "fasta/homo_sapiens/dna/", "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"), gtf = paste0(FTP, "gtf/homo_sapiens/", "Homo_sapiens.GRCh38.94.chr.gtf.gz") ) # 2) Generates STAR genome within the SpliceWiz reference. Also generates # mappability exclusion gzipped BED file inside the "Mappability/" sub-folder STAR_buildRef( reference_path = "Reference_FTP", STAR_ref_path = file.path("Reference_FTP", "STAR"), n_threads = 8, also_generate_mappability = TRUE ) # 2a) Generates STAR genome of the example SpliceWiz genome. # This demonstrates using custom STAR parameters, as the example # SpliceWiz genome is ~100k in length, # so --genomeSAindexNbases needs to be # adjusted to be min(14, log2(GenomeLength)/2 - 1) getResources( reference_path = "Reference_chrZ", fasta = chrZ_genome(), gtf = chrZ_gtf() ) STAR_buildRef( reference_path = "Reference_chrZ", STAR_ref_path = file.path("Reference_chrZ", "STAR"), n_threads = 8, additional_args = c("--genomeSAindexNbases", "7"), also_generate_mappability = TRUE ) # 3) Build SpliceWiz reference using the newly-generated # Mappability exclusions #' NB: also specifies to use the hg38 nonPolyA resource buildRef(reference_path = "Reference_FTP", genome_type = "hg38") # 4a) Align a single sample using the STAR reference STAR_alignReads( fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq", STAR_ref_path = file.path("Reference_FTP", "STAR"), BAM_output_path = "./bams/sample1", n_threads = 8 ) # 4b) Align multiple samples, using two-pass alignment Experiment <- data.frame( sample = c("sample_A", "sample_B"), forward = file.path("raw_data", c("sample_A", "sample_B"), c("sample_A_1.fastq", "sample_B_1.fastq")), reverse = file.path("raw_data", c("sample_A", "sample_B"), c("sample_A_2.fastq", "sample_B_2.fastq")) ) STAR_alignExperiment( Experiment = Experiment, STAR_ref_path = file.path("Reference_FTP", "STAR"), BAM_output_path = "./bams", n_threads = 8, two_pass = TRUE ) # - Building a STAR genome (only) reference, and injecting GTF as a # subsequent step # # This is useful for users who want to create a single STAR genome, for # experimentation with different GTF files. # It is important to note that the chromosome names of the genome (FASTA) # file and the GTF file needs to be identical. Thus, Ensembl and Gencode # GTF files should not be mixed (unless the chromosome GTF names have # been fixed) # - also set sparsity = 2 to build human genome so that it will fit in # 16 Gb RAM. NB: this step's RAM usage can be set using the # `--limitGenomeGenerateRAM` parameter STAR_buildGenome( reference_path = "Reference_FTP", STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"), n_threads = 8, sparsity = 2, additional_args = c("--limitGenomeGenerateRAM", "16000000000") ) # - Injecting a GTF into a genome-only STAR reference # # This creates an on-the-fly STAR genome, using a GTF file # (derived from a SpliceWiz reference) into a new location. # This allows a single STAR reference to use multiple GTFs # on different occasions. STAR_new_ref <- STAR_loadGenomeGTF( reference_path = "Reference_FTP", STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"), STARgenome_output = file.path(tempdir(), "STAR"), n_threads = 4, sjdbOverhang = 100 ) # This new reference can then be used to align your experiment: STAR_alignExperiment( Experiment = Experiment, STAR_ref_path = STAR_new_ref, BAM_output_path = "./bams", n_threads = 8, two_pass = TRUE ) # Typically, one should `clean up` the on-the-fly STAR reference (as it is # large!). If it is in a temporary directory, it will be cleaned up # when the current R session ends; otherwise this needs to be done manually: unlink(file.path(tempdir(), "STAR"), recursive = TRUE) ## End(Not run)
# 0) Check that STAR is installed and compatible with SpliceWiz STAR_version() ## Not run: # The below workflow illustrates # 1) Getting the reference resource # 2) Building the STAR Reference, including Mappability Exclusion calculation # 3) Building the SpliceWiz Reference, using the Mappability Exclusion file # 4) Aligning (a) one or (b) multiple raw sequencing samples. # 1) Reference generation from Ensembl's FTP links FTP <- "ftp://ftp.ensembl.org/pub/release-94/" getResources( reference_path = "Reference_FTP", fasta = paste0(FTP, "fasta/homo_sapiens/dna/", "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"), gtf = paste0(FTP, "gtf/homo_sapiens/", "Homo_sapiens.GRCh38.94.chr.gtf.gz") ) # 2) Generates STAR genome within the SpliceWiz reference. Also generates # mappability exclusion gzipped BED file inside the "Mappability/" sub-folder STAR_buildRef( reference_path = "Reference_FTP", STAR_ref_path = file.path("Reference_FTP", "STAR"), n_threads = 8, also_generate_mappability = TRUE ) # 2a) Generates STAR genome of the example SpliceWiz genome. # This demonstrates using custom STAR parameters, as the example # SpliceWiz genome is ~100k in length, # so --genomeSAindexNbases needs to be # adjusted to be min(14, log2(GenomeLength)/2 - 1) getResources( reference_path = "Reference_chrZ", fasta = chrZ_genome(), gtf = chrZ_gtf() ) STAR_buildRef( reference_path = "Reference_chrZ", STAR_ref_path = file.path("Reference_chrZ", "STAR"), n_threads = 8, additional_args = c("--genomeSAindexNbases", "7"), also_generate_mappability = TRUE ) # 3) Build SpliceWiz reference using the newly-generated # Mappability exclusions #' NB: also specifies to use the hg38 nonPolyA resource buildRef(reference_path = "Reference_FTP", genome_type = "hg38") # 4a) Align a single sample using the STAR reference STAR_alignReads( fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq", STAR_ref_path = file.path("Reference_FTP", "STAR"), BAM_output_path = "./bams/sample1", n_threads = 8 ) # 4b) Align multiple samples, using two-pass alignment Experiment <- data.frame( sample = c("sample_A", "sample_B"), forward = file.path("raw_data", c("sample_A", "sample_B"), c("sample_A_1.fastq", "sample_B_1.fastq")), reverse = file.path("raw_data", c("sample_A", "sample_B"), c("sample_A_2.fastq", "sample_B_2.fastq")) ) STAR_alignExperiment( Experiment = Experiment, STAR_ref_path = file.path("Reference_FTP", "STAR"), BAM_output_path = "./bams", n_threads = 8, two_pass = TRUE ) # - Building a STAR genome (only) reference, and injecting GTF as a # subsequent step # # This is useful for users who want to create a single STAR genome, for # experimentation with different GTF files. # It is important to note that the chromosome names of the genome (FASTA) # file and the GTF file needs to be identical. Thus, Ensembl and Gencode # GTF files should not be mixed (unless the chromosome GTF names have # been fixed) # - also set sparsity = 2 to build human genome so that it will fit in # 16 Gb RAM. NB: this step's RAM usage can be set using the # `--limitGenomeGenerateRAM` parameter STAR_buildGenome( reference_path = "Reference_FTP", STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"), n_threads = 8, sparsity = 2, additional_args = c("--limitGenomeGenerateRAM", "16000000000") ) # - Injecting a GTF into a genome-only STAR reference # # This creates an on-the-fly STAR genome, using a GTF file # (derived from a SpliceWiz reference) into a new location. # This allows a single STAR reference to use multiple GTFs # on different occasions. STAR_new_ref <- STAR_loadGenomeGTF( reference_path = "Reference_FTP", STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"), STARgenome_output = file.path(tempdir(), "STAR"), n_threads = 4, sjdbOverhang = 100 ) # This new reference can then be used to align your experiment: STAR_alignExperiment( Experiment = Experiment, STAR_ref_path = STAR_new_ref, BAM_output_path = "./bams", n_threads = 8, two_pass = TRUE ) # Typically, one should `clean up` the on-the-fly STAR reference (as it is # large!). If it is in a temporary directory, it will be cleaned up # when the current R session ends; otherwise this needs to be done manually: unlink(file.path(tempdir(), "STAR"), recursive = TRUE) ## End(Not run)
A ggplot theme object for white background figures +/- a legend
theme_white theme_white_legend theme_white_legend_plot_track
theme_white theme_white_legend theme_white_legend_plot_track
An object of class theme
(inherits from gg
) of length 10.
An object of class theme
(inherits from gg
) of length 9.
An object of class theme
(inherits from gg
) of length 10.
theme_white
: White theme without figure legend
theme_white_legend
: White theme but with a figure legend (if applicable)
theme_white_legend_plot_track
: White theme with figure legend but without horizontal
grid lines. Used internally in PlotGenome
library(ggplot2) df <- data.frame( gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30)) ggplot(df, aes(gp, y)) + geom_point() + theme_white
library(ggplot2) df <- data.frame( gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30)) ggplot(df, aes(gp, y)) + geom_point() + theme_white
These functions allow users to construct tables containing SpliceWiz's reference of alternate splicing events, intron retention events, and other relevant data
viewASE(reference_path) viewIR(reference_path, directional = TRUE) viewIntrons(reference_path) viewIR_NMD(reference_path) viewExons(reference_path) viewGenes(reference_path) viewGO(reference_path) viewProteins(reference_path) viewTranscripts(reference_path)
viewASE(reference_path) viewIR(reference_path, directional = TRUE) viewIntrons(reference_path) viewIR_NMD(reference_path) viewExons(reference_path) viewGenes(reference_path) viewGO(reference_path) viewProteins(reference_path) viewTranscripts(reference_path)
reference_path |
The directory containing the SpliceWiz reference |
directional |
(default |
A data frame containing the relevant info. See details
viewASE()
: Outputs summary of alternative splicing
events constructed by SpliceWiz
viewIR()
: Outputs summary of assessed IRFinde-like
IR events, constructed by SpliceWiz
viewIntrons()
: Outputs summary of all introns from
the annotation, constructed by SpliceWiz
viewIR_NMD()
: Outputs information for every intron -
whether retention of the intron will convert the transcript to an NMD
substrate
viewExons()
: Outputs information for every exon
from the annotation.
viewGenes()
: Outputs information for every gene
from the annotation.
viewGO()
: Outputs information for every gene
from the annotation.
viewProteins()
: Outputs information for every
protein-coding exon from the annotation.
viewTranscripts()
: Outputs information for every
transcript from the annotation.
ref_path <- file.path(tempdir(), "Reference_withGO") buildRef( reference_path = ref_path, fasta = chrZ_genome(), gtf = chrZ_gtf(), ontologySpecies = "Homo sapiens" ) df <- viewASE(ref_path) df <- viewIR(ref_path, directional = TRUE) df <- viewIntrons(ref_path) df <- viewIR_NMD(ref_path) df <- viewExons(ref_path) df <- viewGenes(ref_path) df <- viewProteins(ref_path) df <- viewTranscripts(ref_path) df <- viewGO(ref_path)
ref_path <- file.path(tempdir(), "Reference_withGO") buildRef( reference_path = ref_path, fasta = chrZ_genome(), gtf = chrZ_gtf(), ontologySpecies = "Homo sapiens" ) df <- viewASE(ref_path) df <- viewIR(ref_path, directional = TRUE) df <- viewIntrons(ref_path) df <- viewIR_NMD(ref_path) df <- viewExons(ref_path) df <- viewGenes(ref_path) df <- viewProteins(ref_path) df <- viewTranscripts(ref_path) df <- viewGO(ref_path)