Title: | Exon-Intron Split Analysis (EISA) in R |
---|---|
Description: | Exon-intron split analysis (EISA) uses ordinary RNA-seq data to measure changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. For details see Gaidatzis et al., Nat Biotechnol 2015. doi: 10.1038/nbt.3269. eisaR implements the major steps of EISA in R. |
Authors: | Michael Stadler [aut, cre], Dimos Gaidatzis [aut], Lukas Burger [aut], Charlotte Soneson [aut] |
Maintainer: | Michael Stadler <[email protected]> |
License: | GPL-3 |
Version: | 1.19.0 |
Built: | 2024-11-19 03:43:30 UTC |
Source: | https://github.com/bioc/eisaR |
Export the features in a GRangesList generated by getFeatureRanges
to a GTF file. The function will represent each row of each of the
entries as an "exon", each individual entry as a "transcript", and
aggregate all features belonging to the same gene as a "gene" entry in
the GTF file.
exportToGtf(grl, filepath)
exportToGtf(grl, filepath)
grl |
|
filepath |
Path to output GTF file |
Does not return anything, generates a GTF file
Charlotte Soneson
## Get feature ranges grl <- getFeatureRanges( gtf = system.file("extdata/small_example.gtf", package = "eisaR"), featureType = c("spliced", "intron"), intronType = "separate", flankLength = 5L, joinOverlappingIntrons = FALSE, verbose = TRUE ) ## Export GTF exportToGtf(grl = grl, filepath = file.path(tempdir(), "exported.gtf"))
## Get feature ranges grl <- getFeatureRanges( gtf = system.file("extdata/small_example.gtf", package = "eisaR"), featureType = c("spliced", "intron"), intronType = "separate", flankLength = 5L, joinOverlappingIntrons = FALSE, verbose = TRUE ) ## Export GTF exportToGtf(grl = grl, filepath = file.path(tempdir(), "exported.gtf"))
Generate a GRangesList object with genomic ranges for (any combination of) spliced transcripts, unspliced transcripts and introns.
getFeatureRanges( gtf, featureType = c("spliced", "intron"), intronType = "separate", flankLength = 90L, joinOverlappingIntrons = FALSE, collapseIntronsByGene = FALSE, keepIntronInFeature = FALSE, verbose = TRUE )
getFeatureRanges( gtf, featureType = c("spliced", "intron"), intronType = "separate", flankLength = 90L, joinOverlappingIntrons = FALSE, collapseIntronsByGene = FALSE, keepIntronInFeature = FALSE, verbose = TRUE )
gtf |
Path to gtf file. |
featureType |
Character vector indicating the type(s) of features to
extract, any subset of |
intronType |
Character vector indicating how to define the introns
(only used if "intron" is part of |
flankLength |
Integer scalar indicating the length of the flanking
sequence added to each side of each extracted intron (only used if
"intron" is included in |
joinOverlappingIntrons |
Logical scalar indicating whether two introns that overlap (after adding the flanking sequences) should be joined into one feature. |
collapseIntronsByGene |
Logical scalar indicating whether to collapse overlapping intron ranges by gene after extraction. |
keepIntronInFeature |
Logical scalar indicating whether introns
(after adding the flank length) should be restricted to stay within the
boundaries of the feature they were generated from. For backwards
compatibility, the default is |
verbose |
Logical scalar, whether to print out progress messages. |
Returns a GRangesList
object where each element represents
one extracted feature. The metadata of this object contains two
data.frame
s mapping corresponding identifiers between the
different feature types, as well as a list of all features for each type.
Charlotte Soneson
## Get feature ranges grl <- getFeatureRanges( gtf = system.file("extdata/small_example.gtf", package = "eisaR"), featureType = c("spliced", "intron"), intronType = "separate", flankLength = 5L, joinOverlappingIntrons = FALSE, collapseIntronsByGene = FALSE, verbose = TRUE ) ## GRangesList grl ## Corresponding transcript/gene IDs S4Vectors::metadata(grl)$corrtx S4Vectors::metadata(grl)$corrgene ## List of features of different types S4Vectors::metadata(grl)$featurelist ## Get feature sequences if (requireNamespace("BSgenome", quietly = TRUE) && requireNamespace("GenomicFeatures", quietly = TRUE)) { library(BSgenome) genome <- Biostrings::readDNAStringSet( system.file("extdata/small_example_genome.fa", package = "eisaR")) seqs <- GenomicFeatures::extractTranscriptSeqs(x = genome, transcripts = grl) seqs }
## Get feature ranges grl <- getFeatureRanges( gtf = system.file("extdata/small_example.gtf", package = "eisaR"), featureType = c("spliced", "intron"), intronType = "separate", flankLength = 5L, joinOverlappingIntrons = FALSE, collapseIntronsByGene = FALSE, verbose = TRUE ) ## GRangesList grl ## Corresponding transcript/gene IDs S4Vectors::metadata(grl)$corrtx S4Vectors::metadata(grl)$corrgene ## List of features of different types S4Vectors::metadata(grl)$featurelist ## Get feature sequences if (requireNamespace("BSgenome", quietly = TRUE) && requireNamespace("GenomicFeatures", quietly = TRUE)) { library(BSgenome) genome <- Biostrings::readDNAStringSet( system.file("extdata/small_example_genome.fa", package = "eisaR")) seqs <- GenomicFeatures::extractTranscriptSeqs(x = genome, transcripts = grl) seqs }
From a transcript database package (TxDb
),
extract exonic and gene body ranges for use with EISA. These regions can
be used to quantify RNA-seq alignments in exons and gene bodies, respectively.
Intronic counts can then be obtained from the difference between gene bodies
and exonic region counts.
getRegionsFromTxDb(txdb, exonExt = 10L, strandedData = TRUE)
getRegionsFromTxDb(txdb, exonExt = 10L, strandedData = TRUE)
txdb |
a |
exonExt |
|
strandedData |
|
The exonic regions are generated as follows:
extract exons by gene from the txdb
extend each exon by exonExt
combine overlapping exons within each gene
create gene body ranges from the most extreme exonic coordinates
filter out genes that have only a single exon (no intron), have exons on more than a single chromosome or on both strands, or that overlap other genes
a list
with elements "exons" and "genebodies", containing
named GenomicRanges
objects with ranges for exons and gene bodies,
respectively.
Michael Stadler
TxDb
for details on
TxDb
objects and the txdbmaker
package for how to create them,
e.g. from .gtf
files.
if (requireNamespace("AnnotationDbi", quietly = TRUE)) { txdb <- AnnotationDbi::loadDb(system.file("extdata", "hg19sub.sqlite", package = "eisaR")) regL <- getRegionsFromTxDb(txdb) lengths(regL) }
if (requireNamespace("AnnotationDbi", quietly = TRUE)) { txdb <- AnnotationDbi::loadDb(system.file("extdata", "hg19sub.sqlite", package = "eisaR")) regL <- getRegionsFromTxDb(txdb) lengths(regL) }
Generate a data.frame
mapping transcript IDs to gene IDs, based on
a GRangesList object generated by getFeatureRanges
.
getTx2Gene(grl, filepath = NULL)
getTx2Gene(grl, filepath = NULL)
grl |
|
filepath |
Either |
Invisibly returns a data.frame
with the transcript-to-gene
mapping.
Charlotte Soneson
## Get feature ranges grl <- getFeatureRanges( gtf = system.file("extdata/small_example.gtf", package = "eisaR"), featureType = c("spliced", "intron"), intronType = "separate", flankLength = 5L, joinOverlappingIntrons = FALSE, verbose = TRUE ) ## Get transcript-to-gene mapping t2g <- getTx2Gene(grl = grl) t2g
## Get feature ranges grl <- getFeatureRanges( gtf = system.file("extdata/small_example.gtf", package = "eisaR"), featureType = c("spliced", "intron"), intronType = "separate", flankLength = 5L, joinOverlappingIntrons = FALSE, verbose = TRUE ) ## Get transcript-to-gene mapping t2g <- getTx2Gene(grl = grl) t2g
plotEISA
takes the return value from runEISA
and generates a scatterplot of intronic versus exonic changes.
plotEISA( x, contrast = c("ExIn", "none"), minLfc = NULL, maxFDR = 0.05, genecolors = c("#E41A1C", "#497AB3", "#222222"), ... )
plotEISA( x, contrast = c("ExIn", "none"), minLfc = NULL, maxFDR = 0.05, genecolors = c("#E41A1C", "#497AB3", "#222222"), ... )
x |
|
contrast |
one of |
minLfc |
|
maxFDR |
|
genecolors |
Vector of length three specifying the colors to use for genes that are significantly up, down or unchanged. |
... |
further arguments past to
|
NULL
(invisibly)
Michael Stadler
# see the help for runEISA() for a full example
# see the help for runEISA() for a full example
Starting from count tables with exonic and intronic counts for two conditions, perform all the steps in EISA (normalize, identify quantifyable genes, calculate contrasts and their significance).
runEISA( cntEx, cntIn, cond, method = NULL, modelSamples = TRUE, geneSelection = c("filterByExpr", "none", "Gaidatzis2015"), statFramework = c("QLF", "LRT"), legacyQLF = FALSE, effects = c("predFC", "Gaidatzis2015"), pscnt = 2, sizeFactor = c("exon", "intron", "individual"), recalcNormFactAfterFilt = TRUE, recalcLibSizeAfterFilt = FALSE, ... )
runEISA( cntEx, cntIn, cond, method = NULL, modelSamples = TRUE, geneSelection = c("filterByExpr", "none", "Gaidatzis2015"), statFramework = c("QLF", "LRT"), legacyQLF = FALSE, effects = c("predFC", "Gaidatzis2015"), pscnt = 2, sizeFactor = c("exon", "intron", "individual"), recalcNormFactAfterFilt = TRUE, recalcLibSizeAfterFilt = FALSE, ... )
cntEx |
Gene by sample |
cntIn |
Gene by sample |
cond |
|
method |
One of |
modelSamples |
Whether to include a sample identifier in the design matrix
of the statistical model. If |
geneSelection |
Controls how to select quantifyable genes. One of the following:
|
statFramework |
Selects the framework within
. |
legacyQLF |
Whether to use the 'legacy' version of
|
effects |
How the effects (contrasts or log2 fold-changes) are calculated. One of:
|
pscnt |
|
sizeFactor |
How the size factors are calculated in the analysis. If 'exon' (default), the exon-derived size factors are used also for the columns corresponding to intronic counts. If 'intron', the intron-derived size factors are used also for the columns corresponding to exonic counts. If 'individual', column-wise size factors are calculated. |
recalcNormFactAfterFilt |
Logical, indicating whether normalization factors should be recalculated after filtering out lowly expressed genes. |
recalcLibSizeAfterFilt |
Logical, indicating whether library sizes should be recalculated after filtering out lowly expressed genes. |
... |
additional arguments passed to the |
Setting method = "Gaidatzis2015"
has precedence over other
argument values and corresponds to setting:
modelSamples = FALSE, geneSelection = "Gaidatzis2015",
statFramework = "LRT", effects = "Gaidatzis2015", pscnt = 8,
sizeFactor = "individual", recalcNormFactAfterFilt = TRUE,
recalcLibSizeAfterFilt = FALSE
.
a list
with elements
fraction intronic counts in each sample
contrast name
contrast matrix for quantifyable genes, with average log2
fold-changes in exons (Dex
), in introns (Din
), and average
difference between log2 fold-changes in exons and introns (Dex.Din
)
DGEList
object used in model fitting
statisical results for differential changes between exonic and intronic contrast, an indication for post-transcriptional regulation.
contrast vector used for testing the difference between
exonic and intronic contrast (results in tab.ExIn
)
design matrix used for testing the difference between
exonic and intronic contrast (results in tab.ExIn
)
a list
with parameter values used to run EISA
Michael Stadler
Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Dimos Gaidatzis, Lukas Burger, Maria Florescu and Michael B. Stadler Nature Biotechnology, 2015 Jul;33(7):722-9. doi: 10.1038/nbt.3269.
DGEList
for DGEList
object construction,
calcNormFactors
for normalization,
filterByExpr
for gene selection,
glmFit
and glmQLFit
for statistical
analysis.
cntEx <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_exonic.rds", package = "eisaR"))[,-1] cntIn <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_intronic.rds", package = "eisaR"))[,-1] cond <- factor(c("ES","ES","TN","TN")) res <- runEISA(cntEx, cntIn, cond) plotEISA(res)
cntEx <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_exonic.rds", package = "eisaR"))[,-1] cntIn <- readRDS(system.file("extdata", "Fig3abc_GSE33252_rawcounts_intronic.rds", package = "eisaR"))[,-1] cond <- factor(c("ES","ES","TN","TN")) res <- runEISA(cntEx, cntIn, cond) plotEISA(res)