Package 'eisaR' reference manual

Title:	Exon-Intron Split Analysis (EISA) in R
Description:	Exon-intron split analysis (EISA) uses ordinary RNA-seq data to measure changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. For details see Gaidatzis et al., Nat Biotechnol 2015. doi: 10.1038/nbt.3269. eisaR implements the major steps of EISA in R.
Authors:	Michael Stadler [aut, cre], Dimos Gaidatzis [aut], Lukas Burger [aut], Charlotte Soneson [aut]
Maintainer:	Michael Stadler <[email protected]>
License:	GPL-3
Version:	1.19.0
Built:	2025-03-08 03:18:48 UTC
Source:	https://github.com/bioc/eisaR

Export GRangesList to GTF

Description

Export the features in a GRangesList generated by getFeatureRanges to a GTF file. The function will represent each row of each of the entries as an "exon", each individual entry as a "transcript", and aggregate all features belonging to the same gene as a "gene" entry in the GTF file.

Usage

exportToGtf(grl, filepath)
exportToGtf(grl, filepath)

Arguments

`grl`	`GRangesList` object, typically generated by `getFeatureRanges`
`filepath`	Path to output GTF file

Value

Does not return anything, generates a GTF file

Author(s)

Charlotte Soneson

Examples

  ## Get feature ranges
  grl <- getFeatureRanges(
    gtf = system.file("extdata/small_example.gtf", package = "eisaR"),
    featureType = c("spliced", "intron"),
    intronType = "separate",
    flankLength = 5L,
    joinOverlappingIntrons = FALSE,
    verbose = TRUE
  )
  
  ## Export GTF
  exportToGtf(grl = grl, filepath = file.path(tempdir(), "exported.gtf"))

## Get feature ranges
  grl <- getFeatureRanges(
    gtf = system.file("extdata/small_example.gtf", package = "eisaR"),
    featureType = c("spliced", "intron"),
    intronType = "separate",
    flankLength = 5L,
    joinOverlappingIntrons = FALSE,
    verbose = TRUE
  )
  
  ## Export GTF
  exportToGtf(grl = grl, filepath = file.path(tempdir(), "exported.gtf"))

Generate a GRangesList object with feature ranges

Description

Generate a GRangesList object with genomic ranges for (any combination of) spliced transcripts, unspliced transcripts and introns.

Usage

getFeatureRanges(
  gtf,
  featureType = c("spliced", "intron"),
  intronType = "separate",
  flankLength = 90L,
  joinOverlappingIntrons = FALSE,
  collapseIntronsByGene = FALSE,
  keepIntronInFeature = FALSE,
  verbose = TRUE
)
getFeatureRanges(
  gtf,
  featureType = c("spliced", "intron"),
  intronType = "separate",
  flankLength = 90L,
  joinOverlappingIntrons = FALSE,
  collapseIntronsByGene = FALSE,
  keepIntronInFeature = FALSE,
  verbose = TRUE
)

Arguments

`gtf`	Path to gtf file.
`featureType`	Character vector indicating the type(s) of features to extract, any subset of `c("spliced", "intron", "unspliced")`.
`intronType`	Character vector indicating how to define the introns (only used if "intron" is part of `featureType`). Has to be either "separate" (introns are defined for each transcript separately) or "collapse" (transcripts of the same gene are first collapsed before introns are defined as any non-exonic part of the gene locus).
`flankLength`	Integer scalar indicating the length of the flanking sequence added to each side of each extracted intron (only used if "intron" is included in `featureType`).
`joinOverlappingIntrons`	Logical scalar indicating whether two introns that overlap (after adding the flanking sequences) should be joined into one feature.
`collapseIntronsByGene`	Logical scalar indicating whether to collapse overlapping intron ranges by gene after extraction.
`keepIntronInFeature`	Logical scalar indicating whether introns (after adding the flank length) should be restricted to stay within the boundaries of the feature they were generated from. For backwards compatibility, the default is `FALSE`.
`verbose`	Logical scalar, whether to print out progress messages.

Value

Returns a GRangesList object where each element represents one extracted feature. The metadata of this object contains two data.frames mapping corresponding identifiers between the different feature types, as well as a list of all features for each type.

Author(s)

Charlotte Soneson

Examples

  ## Get feature ranges
  grl <- getFeatureRanges(
    gtf = system.file("extdata/small_example.gtf", package = "eisaR"),
    featureType = c("spliced", "intron"),
    intronType = "separate",
    flankLength = 5L,
    joinOverlappingIntrons = FALSE,
    collapseIntronsByGene = FALSE,
    verbose = TRUE
  )
  
  ## GRangesList
  grl
  
  ## Corresponding transcript/gene IDs
  S4Vectors::metadata(grl)$corrtx
  S4Vectors::metadata(grl)$corrgene
  
  ## List of features of different types
  S4Vectors::metadata(grl)$featurelist
  
  ## Get feature sequences
  if (requireNamespace("BSgenome", quietly = TRUE) &&
      requireNamespace("GenomicFeatures", quietly = TRUE)) {
    library(BSgenome)
    genome <- Biostrings::readDNAStringSet(
      system.file("extdata/small_example_genome.fa", package = "eisaR"))
    seqs <- GenomicFeatures::extractTranscriptSeqs(x = genome, 
                                                   transcripts = grl)
    seqs
  }
 
## Get feature ranges
  grl <- getFeatureRanges(
    gtf = system.file("extdata/small_example.gtf", package = "eisaR"),
    featureType = c("spliced", "intron"),
    intronType = "separate",
    flankLength = 5L,
    joinOverlappingIntrons = FALSE,
    collapseIntronsByGene = FALSE,
    verbose = TRUE
  )
  
  ## GRangesList
  grl
  
  ## Corresponding transcript/gene IDs
  S4Vectors::metadata(grl)$corrtx
  S4Vectors::metadata(grl)$corrgene
  
  ## List of features of different types
  S4Vectors::metadata(grl)$featurelist
  
  ## Get feature sequences
  if (requireNamespace("BSgenome", quietly = TRUE) &&
      requireNamespace("GenomicFeatures", quietly = TRUE)) {
    library(BSgenome)
    genome <- Biostrings::readDNAStringSet(
      system.file("extdata/small_example_genome.fa", package = "eisaR"))
    seqs <- GenomicFeatures::extractTranscriptSeqs(x = genome, 
                                                   transcripts = grl)
    seqs
  }

Get exonic/gene body regions from a transcript database.

Description

From a transcript database package (TxDb), extract exonic and gene body ranges for use with EISA. These regions can be used to quantify RNA-seq alignments in exons and gene bodies, respectively. Intronic counts can then be obtained from the difference between gene bodies and exonic region counts.

Usage

getRegionsFromTxDb(txdb, exonExt = 10L, strandedData = TRUE)
getRegionsFromTxDb(txdb, exonExt = 10L, strandedData = TRUE)

Arguments

`txdb`	a `TxDb` or an `EnsDb` object with the transcript annotations.
`exonExt`	`numeric` (default = 10L). Exonic ranges will be extended on either side by this many nucleotides, in order to avoid "bleed-over" of exonic alignments into adjacent intronic regions.
`strandedData`	`logical(1)`. If `TRUE`, the RNA-seq data is assumed to be strand-specific, and therefore only overlapping genes that are on the same strand will be filtered out. If `FALSE`, also genes overlapping on opposite strands will be filtered out.

Details

The exonic regions are generated as follows:

extract exons by gene from the txdb
extend each exon by exonExt
combine overlapping exons within each gene
create gene body ranges from the most extreme exonic coordinates
filter out genes that have only a single exon (no intron), have exons on more than a single chromosome or on both strands, or that overlap other genes

Value

a list with elements "exons" and "genebodies", containing named GenomicRanges objects with ranges for exons and gene bodies, respectively.

Author(s)

Michael Stadler

Examples

if (requireNamespace("AnnotationDbi", quietly = TRUE)) {
    txdb <- AnnotationDbi::loadDb(system.file("extdata", "hg19sub.sqlite", package = "eisaR"))
    regL <- getRegionsFromTxDb(txdb)
    lengths(regL)
}

if (requireNamespace("AnnotationDbi", quietly = TRUE)) {
    txdb <- AnnotationDbi::loadDb(system.file("extdata", "hg19sub.sqlite", package = "eisaR"))
    regL <- getRegionsFromTxDb(txdb)
    lengths(regL)
}

Generate a transcript-to-gene mapping from a GRangesList

Description

Generate a data.frame mapping transcript IDs to gene IDs, based on a GRangesList object generated by getFeatureRanges.

Usage

getTx2Gene(grl, filepath = NULL)
getTx2Gene(grl, filepath = NULL)

Arguments

`grl`	`GRangesList` object, typically generated by `getFeatureRanges`
`filepath`	Either `NULL` or the path to a file where the transcript-to-gene mapping will be written

Value

Invisibly returns a data.frame with the transcript-to-gene mapping.

Author(s)

Charlotte Soneson

Examples

  ## Get feature ranges
  grl <- getFeatureRanges(
    gtf = system.file("extdata/small_example.gtf", package = "eisaR"),
    featureType = c("spliced", "intron"),
    intronType = "separate",
    flankLength = 5L,
    joinOverlappingIntrons = FALSE,
    verbose = TRUE
  )
  
  ## Get transcript-to-gene mapping
  t2g <- getTx2Gene(grl = grl)
  t2g
   
## Get feature ranges
  grl <- getFeatureRanges(
    gtf = system.file("extdata/small_example.gtf", package = "eisaR"),
    featureType = c("spliced", "intron"),
    intronType = "separate",
    flankLength = 5L,
    joinOverlappingIntrons = FALSE,
    verbose = TRUE
  )
  
  ## Get transcript-to-gene mapping
  t2g <- getTx2Gene(grl = grl)
  t2g

Visualize the results from an exon-intron split analysis.

Description

plotEISA takes the return value from runEISA and generates a scatterplot of intronic versus exonic changes.

Usage

plotEISA(
  x,
  contrast = c("ExIn", "none"),
  minLfc = NULL,
  maxFDR = 0.05,
  genecolors = c("#E41A1C", "#497AB3", "#222222"),
  ...
)
plotEISA(
  x,
  contrast = c("ExIn", "none"),
  minLfc = NULL,
  maxFDR = 0.05,
  genecolors = c("#E41A1C", "#497AB3", "#222222"),
  ...
)

Arguments

`x`	`list` with EISA results, typically the return value from `runEISA`
`contrast`	one of `"ExIn"` or `"none"`. If `"ExIn"` (the default), genes that significantly differ between exonic and intronic changes are highlighted. `"none"` turns off gene highlighting.
`minLfc`	`NULL` or `numeric(1)` with the minimal absolute log2 fold change to color a gene. If `NULL` (the default), no fold changes are not used to select genes for highlighting.
`maxFDR`	`numeric(1)` with maximal false discovery rate for gene highlighting.
`genecolors`	Vector of length three specifying the colors to use for genes that are significantly up, down or unchanged.
`...`	further arguments past to `plot()`. Parameters that will be set automatically unless given in the arguments are: pch : plot symbol (default: `"."`) cex : plot symbol expansion factor (default: `2`) col : plot symbol color (default: according to `contrast` and `genecolors`) xlab/ylab : axis labels

Value

NULL (invisibly)

Author(s)

Michael Stadler

Examples

  # see the help for runEISA() for a full example

# see the help for runEISA() for a full example

Run Exon-Intron Split Analysis.

Description

Starting from count tables with exonic and intronic counts for two conditions, perform all the steps in EISA (normalize, identify quantifyable genes, calculate contrasts and their significance).

Usage

runEISA(
  cntEx,
  cntIn,
  cond,
  method = NULL,
  modelSamples = TRUE,
  geneSelection = c("filterByExpr", "none", "Gaidatzis2015"),
  statFramework = c("QLF", "LRT"),
  legacyQLF = FALSE,
  effects = c("predFC", "Gaidatzis2015"),
  pscnt = 2,
  sizeFactor = c("exon", "intron", "individual"),
  recalcNormFactAfterFilt = TRUE,
  recalcLibSizeAfterFilt = FALSE,
  ...
)
runEISA(
  cntEx,
  cntIn,
  cond,
  method = NULL,
  modelSamples = TRUE,
  geneSelection = c("filterByExpr", "none", "Gaidatzis2015"),
  statFramework = c("QLF", "LRT"),
  legacyQLF = FALSE,
  effects = c("predFC", "Gaidatzis2015"),
  pscnt = 2,
  sizeFactor = c("exon", "intron", "individual"),
  recalcNormFactAfterFilt = TRUE,
  recalcLibSizeAfterFilt = FALSE,
  ...
)

Arguments

`cntEx`	Gene by sample `matrix` with exonic counts, OR a `SummarizedExperiment` with two assays named `exon` and `intron`, containing exonic and intronic counts, respectively. If `cntEx` is a `SummarizedExperiment`, `cntIn` will be disregarded.
`cntIn`	Gene by sample `matrix` with intronic counts. Must have the same structure as `cntEx` (same number and order of rows and columns) if `cntEx` is a matrix. Will be disregarded if `cntEx` is a `SummarizedExperiment`.
`cond`	`numeric`, `character` or `factor` with two levels that groups the samples (columns of `cntEx` and `cntIn`) into two conditions. The contrast will be defined as secondLevel - firstLevel.
`method`	One of `NULL` (the default) or `"Gaidatzis2015"`. If `"Gaidatzis2015"`, gene filtering, statistical analysis and calculation of contrasts is performed as described in Gaidatzis et al. 2015, and the statistical analysis is based on `glmFit` and `glmLRT`. This is done by setting the arguments `modelSamples`, `geneSelection`, `effects`, `pscnt`, `statFramework`, `sizeFactor`, `recalcNormFactAfterFilt` and `recalcLibSizeAfterFilt` to appropriate values (see details), overriding the defaults or any value passed to these arguments. If `NULL`, the default values of the arguments will be used instead (recommended).
`modelSamples`	Whether to include a sample identifier in the design matrix of the statistical model. If `TRUE`, potential sample effects that affect both exonic and intronic counts of that sample will be taken into account, which could result in higher sensitivity (default: `TRUE`).
`geneSelection`	Controls how to select quantifyable genes. One of the following: `"filterByExpr"`: (default) First, counts are normalized using `calcNormFactors`, treating intronic and exonic counts as individual samples. Then, `filterByExpr` is used with default parameters to select quantifyable genes. `"none"`: This will use all the genes provided in the count tables, assuming that an appropriate selection of quantifyable genes has already been done. `"Gaidatzis2015"`: First, intronic and exonic counts are linearly scaled to the mean library size (estimated as the sum of all intronic or exonic counts, respectively). Then, quantifyable genes are selected as the genes with counts `x` that fulfill `log2(x + 8) > 5` in both exons and introns.
`statFramework`	Selects the framework within `edgeR` that is used for the statistical analysis. One of: `"QLF"`: (default) Quasi-likelihood F-test using `glmQLFit` and `glmQLFTest`. This framework is highly recommended as it gives stricter error rate control by accounting for the uncertainty in dispersion estimation. `"LRT"`: Likelihood ratio test using `glmFit` and `glmLRT` .
`legacyQLF`	Whether to use the 'legacy' version of `glmQLFit`. See `glmQLFit` for more details. If `FALSE`, the new method introduced in `edgeR` 4.0.0 is used.
`effects`	How the effects (contrasts or log2 fold-changes) are calculated. One of: `"predFC"`: (default) Fold-changes are calculated using the fitted model with `predFC` with `prior.count = pscnt`. Please note that if a sample factor is included in the model (`modelSamples=TRUE`), effects cannot be obtained from that model. In that case, effects are obtained from a simpler model without sample effects. `"Gaidatzis2015"`: Fold-changes are calculated using the formula `log2((x + pscnt)/(y + pscnt))`. If `pscnt` is not set to 8, `runEISA` will warn that this deviates from the method used in Gaidatzis et al., 2015.
`pscnt`	`numeric(1)` with pseudocount to add to read counts (default: 2). For `method = "Gaidatzis2015"`, it is set to 8. It is added to scaled read counts used in `geneSelection = "Gaidatzis2015"` and `effects = "Gaidatzis2015"`, or else used in `cpm(..., prior.count = pscnt)` and `predFC(..., prior.count = pscnt)`.
`sizeFactor`	How the size factors are calculated in the analysis. If 'exon' (default), the exon-derived size factors are used also for the columns corresponding to intronic counts. If 'intron', the intron-derived size factors are used also for the columns corresponding to exonic counts. If 'individual', column-wise size factors are calculated.
`recalcNormFactAfterFilt`	Logical, indicating whether normalization factors should be recalculated after filtering out lowly expressed genes.
`recalcLibSizeAfterFilt`	Logical, indicating whether library sizes should be recalculated after filtering out lowly expressed genes.
`...`	additional arguments passed to the `DGEList` constructor, such as `lib.size` or `genes`.

Details

Setting method = "Gaidatzis2015" has precedence over other argument values and corresponds to setting: modelSamples = FALSE, geneSelection = "Gaidatzis2015", statFramework = "LRT", effects = "Gaidatzis2015", pscnt = 8, sizeFactor = "individual", recalcNormFactAfterFilt = TRUE, recalcLibSizeAfterFilt = FALSE.

Value

a list with elements

fracIn: fraction intronic counts in each sample
contrastName: contrast name
contrasts: contrast matrix for quantifyable genes, with average log2 fold-changes in exons (Dex), in introns (Din), and average difference between log2 fold-changes in exons and introns (Dex.Din)
DGEList: DGEList object used in model fitting
tab.ExIn: statisical results for differential changes between exonic and intronic contrast, an indication for post-transcriptional regulation.
contr.ExIn: contrast vector used for testing the difference between exonic and intronic contrast (results in tab.ExIn)
designMatrix: design matrix used for testing the difference between exonic and intronic contrast (results in tab.ExIn)
params: a list with parameter values used to run EISA

Author(s)

Michael Stadler

References

Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Dimos Gaidatzis, Lukas Burger, Maria Florescu and Michael B. Stadler Nature Biotechnology, 2015 Jul;33(7):722-9. doi: 10.1038/nbt.3269.

Examples

cntEx <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_exonic.rds",
                             package = "eisaR"))[,-1]
cntIn <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_intronic.rds",
                             package = "eisaR"))[,-1]
cond <- factor(c("ES","ES","TN","TN"))
res <- runEISA(cntEx, cntIn, cond)
plotEISA(res)

cntEx <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_exonic.rds",
                             package = "eisaR"))[,-1]
cntIn <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_intronic.rds",
                             package = "eisaR"))[,-1]
cond <- factor(c("ES","ES","TN","TN"))
res <- runEISA(cntEx, cntIn, cond)
plotEISA(res)

Package 'eisaR'

Help Index

Export GRangesList to GTF

Description

Usage

Arguments

Value

Author(s)

Examples

Generate a GRangesList object with feature ranges

Description

Usage

Arguments

Value

Author(s)

Examples

Get exonic/gene body regions from a transcript database.

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Generate a transcript-to-gene mapping from a GRangesList

Description

Usage

Arguments

Value

Author(s)

Examples

Visualize the results from an exon-intron split analysis.

Description

Usage

Arguments

Value

Author(s)

Examples

Run Exon-Intron Split Analysis.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples