Title: | Estimate Promoter Activity from RNA-Seq data |
---|---|
Description: | Most human genes have multiple promoters that control the expression of different isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide number of cell types and diseases. proActiv is an R package that enables the analysis of promoters from RNA-seq data. proActiv uses aligned reads as input, and generates counts and normalized promoter activity estimates for each annotated promoter. In particular, proActiv accepts junction files from TopHat2 or STAR or BAM files as inputs. These estimates can then be used to identify which promoter is active, which promoter is inactive, and which promoters change their activity across conditions. proActiv also allows visualization of promoter activity across conditions. |
Authors: | Deniz Demircioglu [aut] , Jonathan Göke [aut], Joseph Lee [cre] |
Maintainer: | Joseph Lee <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.17.0 |
Built: | 2024-12-14 03:52:49 UTC |
Source: | https://github.com/bioc/proActiv |
Visualizes promoter activity and gene expression with boxplots
boxplotPromoters( result, geneId, geneName = NULL, filterInternal = TRUE, col = NULL )
boxplotPromoters( result, geneId, geneName = NULL, filterInternal = TRUE, col = NULL )
result |
A SummarizedExperiment object return by proActiv, with assays giving promoter counts and activity with gene expression stored as metadata. rowData contains promoter metadata and absolute promoter activity summarized across conditions. Condition must be provided. |
geneId |
A character vector. A single gene id. This identifier must correspond to the identifier in the promoter annotation. |
geneName |
A character vector. Common gene name to be displayed on plot. Optional. Defaults to NULL. |
filterInternal |
A boolean. Determines if internal promoters should be removed from the plot. Defaults to TRUE. |
col |
A character vector of colours to be used for plotting. |
A list of length 3. Each entry is a plot corresponding to absolute promoter activity, relative promoter activity and gene expression.
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) plots <- boxplotPromoters(result, "ENSG00000076864.19")
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) plots <- boxplotPromoters(result, "ENSG00000076864.19")
Calculate the total number of junction reads overlapping with the introns of each promoter for the input junction file
calculateJunctionReadCounts( promoterCoordinates, intronRanges, file = "", fileType = "", genome = "" )
calculateJunctionReadCounts( promoterCoordinates, intronRanges, file = "", fileType = "", genome = "" )
promoterCoordinates |
A GRanges object containing promoter coordinates and reduced exon coordinates by gene |
intronRanges |
A Granges object containing the annotated unique intron ranges. These ranges will be used for counting the reads |
file |
character path for the input junction bed or bam file |
fileType |
character type of the junction bed file. Either 'tophat', 'star' or 'bam' |
genome |
character genome version |
The total number of junction reads overlapping with each promoter for the input annotated intron ranges
Calculate the promoter read counts using junction read counts approach for all the input junction files
calculatePromoterReadCounts( promoterAnnotation, files = NULL, fileLabels = NULL, fileType = NULL, genome = NULL, numberOfCores = 1 )
calculatePromoterReadCounts( promoterAnnotation, files = NULL, fileLabels = NULL, fileType = NULL, genome = NULL, numberOfCores = 1 )
promoterAnnotation |
A PromoterAnnotation object containing the reduced exon ranges, annotated intron ranges, promoter coordinates and the promoter id mapping |
files |
A character vector. The list of junction or BAM files for which the junction read counts will be calculated |
fileLabels |
A character vector. The labels of junction or BAM files for which the junction read counts will be calculated. These labels will be used as column names for the output data.frame object |
fileType |
A character. Type of the junction bed or bam file, either 'tophat', 'star' or 'bam |
genome |
A character. Genome version used. Must be specified if input is a BAM file. Defaults to NULL |
numberOfCores |
A numeric value. The number of cores to be used for counting junction reads. Defaults to 1 (no parallelization). This parameter will be used as an argument to BiocParallel::bplapply |
A data.frame object. The number of junction reads per promoter (rows) for each sample (cols)
Prepare the absolute promoter activity table including the promoter and gene ids
getAbsolutePromoterActivity( junctionReadCounts, promoterAnnotation, log2 = TRUE, pseudocount = 1 )
getAbsolutePromoterActivity( junctionReadCounts, promoterAnnotation, log2 = TRUE, pseudocount = 1 )
junctionReadCounts |
Matrix of junction read counts (rows: promoters, cols: samples) |
promoterAnnotation |
A PromoterAnnotation object containing the intron ranges, promoter coordinates and the promoter id mapping |
log2 |
Logical indicating whether log2 read counts should be used (default: TRUE) or not |
pseudocount |
Number to be used for log2 as pseudocount if log2 is TRUE |
data.frame of absolute promoter activity with promoter and gene ids
Identifies alternative promoters.
getAlternativePromoters( result, referenceCondition, minAbs = 0.25, minRel = 0.05, maxPval = 0.05, promoterFC = 2, geneFC = 1.5 )
getAlternativePromoters( result, referenceCondition, minAbs = 0.25, minRel = 0.05, maxPval = 0.05, promoterFC = 2, geneFC = 1.5 )
result |
A SummarizedExperiment object with assays giving promoter counts, activity and gene expression (output from proActiv). rowData contains promoter metadata and absolute promoter activity summarized across conditions. Condition must be provided. |
referenceCondition |
A character vector. The reference condition to be compared. Samples corresponding to all other conditions will be compared to this samples in this current condition. |
minAbs |
A numeric value. Minimum value for promoter to be active in absolute terms. Defaults to 0.25. |
minRel |
A numeric value. Minimum value for promoter to be active in relative terms. Defaults to 0.05. |
maxPval |
A numeric value. Adjusted p-value threshold for detecting alternative promoters. Defaults to 0.05. |
promoterFC |
A numeric value. Minimum fold change for a promoter in the current condition compared to all other conditions. Promoters must have at least this magnitude of fold change for alternative usage. |
geneFC |
A numeric value. Maximum fold change for gene expression. To identify alternative promoter usage independent of changes in gene expression, limit the gene expression fold change. |
A list of length 2. Each entry is a dataframe summarizing up-regulated and down-regulated promoters and their corresponding genes, if any.
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) alternativePromoters <- getAlternativePromoters(result, "A549")
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) alternativePromoters <- getAlternativePromoters(result, "A549")
Prepare the gene expression table including the gene ids
getGeneExpression(absolutePromoterActivity)
getGeneExpression(absolutePromoterActivity)
absolutePromoterActivity |
data.frame of absolute promoter activity with promoter and gene ids |
data.frame of gene expression with gene ids#'
Prepare the relative promoter activity table including the promoter and gene ids
getRelativePromoterActivity(absolutePromoterActivity, geneExpression)
getRelativePromoterActivity(absolutePromoterActivity, geneExpression)
absolutePromoterActivity |
data.frame of absolute promoter activity with promoter and gene ids |
geneExpression |
data.frame of gene expression with gene ids |
data.frame of relative promoter activity with promoter and gene ids
Integrate multiple proActiv runs
integrateProactiv(res1, res2, ..., promoterAnnotation, renormalize = TRUE)
integrateProactiv(res1, res2, ..., promoterAnnotation, renormalize = TRUE)
res1 |
A summarizedExperiment object returned by proActiv |
res2 |
A summarizedExperiment object returned by proActiv |
... |
Additional summarizedExperiment objects returned by proActiv |
promoterAnnotation |
Promoter annotation object used to create proActiv runs |
renormalize |
Whether to renormalize counts after merging. Defaults to TRUE |
A SummarizedExperiment object with assays giving promoter counts and activity with gene expression. rowData contains promoter metadata and absolute promoter activity summarized across conditions (if condition is provided)
f1 <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'A549') f2 <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'HepG2') promoterAnnotation <- promoterAnnotation.gencode.v34.subset res1 <- proActiv(files = f1, promoterAnnotation = promoterAnnotation, condition = rep('A549',3)) res2 <- proActiv(files = f2, promoterAnnotation = promoterAnnotation, condition = rep('HepG2',3)) res <- integrateProactiv(res1, res2, promoterAnnotation = promoterAnnotation)
f1 <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'A549') f2 <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'HepG2') promoterAnnotation <- promoterAnnotation.gencode.v34.subset res1 <- proActiv(files = f1, promoterAnnotation = promoterAnnotation, condition = rep('A549',3)) res2 <- proActiv(files = f2, promoterAnnotation = promoterAnnotation, condition = rep('HepG2',3)) res <- integrateProactiv(res1, res2, promoterAnnotation = promoterAnnotation)
Normalize promoter read counts using DESeq2
normalizePromoterReadCounts(promoterReadCounts)
normalizePromoterReadCounts(promoterReadCounts)
promoterReadCounts |
A data.frame object. The number of junction reads per promoter (rows) for each sample (cols) |
A data.frame object. The normalized number of junction reads per promoter (rows) for each sample (cols) using DESeq2 counts function. Requires 'DESeq2' package to be installed
Visualizes heatmap of features for samples
plotHeatmap( result, by = "absolutePromoterActivity", features = NULL, cex.legend = 0.75, cex.row = NULL, cex.col = NULL, row.margin = 5, col.margin = 12, col = NULL, breaks = NULL, palette = "bluered" )
plotHeatmap( result, by = "absolutePromoterActivity", features = NULL, cex.legend = 0.75, cex.row = NULL, cex.col = NULL, row.margin = 5, col.margin = 12, col = NULL, breaks = NULL, palette = "bluered" )
result |
A SummarizedExperiment object return by proActiv, with assays giving promoter counts and activity with gene expression stored as metadata. rowData contains promoter metadata and absolute promoter activity summarized across conditions. Condition must be provided. |
by |
A character vector. The assay to visualize the heatmap for. One of promoterCounts, normalizedPromoterCounts, absolutePromoterActivity and geneExpression (unambiguous substrings can be supplied). Defaults to absolutePromoterActivity. |
features |
Features to visualize. Either a list of promoterIds or geneIds. The features must correspond to the assay is used, i.e., if promoter assays are used, features must be promoterIds, while if gene expression assay is used, features must be geneIds. Defaults to NULL (visualizes all features of the assay). |
cex.legend |
A numeric value. Legend size. |
cex.row |
A numeric value. Row label size. |
cex.col |
A numeric value. Column label size. |
row.margin |
A numeric value. Row margins. |
col.margin |
A numeric value. Column margins. |
col |
A vector of colours. Length should correspond to number of experimental conditions. Defaults to NULL. |
breaks |
A numeric vector. Breaks for heatmap plotting. |
palette |
A character vector. One of bluered, redblue, redgreen, greenred. Defaults to bluered. |
Displays heatmap.
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) result <- result[complete.cases(assays(result)[[1]]),] plotHeatmap(result)
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) result <- result[complete.cases(assays(result)[[1]]),] plotHeatmap(result)
Performs principal component analysis
plotPCA( result, by = "absolutePromoterActivity", main = NULL, col = NULL, alpha = 0.75, cex.size = 2 )
plotPCA( result, by = "absolutePromoterActivity", main = NULL, col = NULL, alpha = 0.75, cex.size = 2 )
result |
A SummarizedExperiment object return by proActiv, with assays giving promoter counts and activity with gene expression stored as metadata. rowData contains promoter metadata and absolute promoter activity summarized across conditions. Condition must be provided. |
by |
A character vector. The assay to perform principal component analysis by. One of promoterCounts, normalizedPromoterCounts, absolutePromoterActivity and geneExpression (unambiguous substrings can be supplied). Defaults to absolutePromoterActivity. |
main |
A character vector. Plot title (optional). Defaults to NULL. |
col |
A vector of colours. If NULL, uses standard ggplot colours. Defaults to NULL. |
alpha |
A numeric value in between 0 and 1. Determines point transparency. |
cex.size |
A numeric value. Determines point size. |
PCA plot.
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) result <- result[complete.cases(assays(result)[[1]]),] plotPCA(result)
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1) result <- result[complete.cases(assays(result)[[1]]),] plotPCA(result)
Prepares promoter annotation from a gtf or txdb
preparePromoterAnnotation(txdb, file, species)
preparePromoterAnnotation(txdb, file, species)
txdb |
A txdb object. The txdb of the annotation version for which promoters will be identified. Either 'txdb' or 'file' argument must be specified, but not both. |
file |
A character object. The file path to a gtf/gff or txdb of the annotation version for which promoters will be identified. Either 'txdb' or 'file' argument must be specified, but not both. |
species |
A character object. The genus and species of the organism to be used in keepStandardChromosomes(). Supported species can be seen with names(genomeStyles()). |
A PromoterAnnotation object. The annotated intron ranges, promoter coordinates and the promoter id mapping are attributes of the promoter annotation data.
txdbPath <- system.file('extdata/vignette/annotations/', 'gencode.v34.annotation.subset.sqlite', package = 'proActiv') txdb <- AnnotationDbi::loadDb(txdbPath) promoterAnnotation <- preparePromoterAnnotation(txdb = txdb, species = 'Homo_sapiens')
txdbPath <- system.file('extdata/vignette/annotations/', 'gencode.v34.annotation.subset.sqlite', package = 'proActiv') txdb <- AnnotationDbi::loadDb(txdbPath) promoterAnnotation <- preparePromoterAnnotation(txdb = txdb, species = 'Homo_sapiens')
Estimates promoter counts and activity in a single command
proActiv( files, promoterAnnotation, fileLabels = NULL, condition = NULL, genome = NULL, ncores = 1 )
proActiv( files, promoterAnnotation, fileLabels = NULL, condition = NULL, genome = NULL, ncores = 1 )
files |
A character vector. The list of input files for which the junction read counts will be calculated |
promoterAnnotation |
A PromoterAnnotation object containing the intron ranges, promoter coordinates and the promoter id mapping |
fileLabels |
A character vector. The labels of input files for which the junction read counts will be calculated. These labels will be used as column names for each output data.frame object. If not provided, filenames will be used as labels. Defaults to NULL |
condition |
A character vector. The condition to which each sample belong to. Must correspond to the order of the files. If supplied, results are summarized by condition. Defaults to NULL |
genome |
A character. Genome version. Must be specified if input file type is a BAM file. Defaults to NULL |
ncores |
A numeric value. The number of cores to be used for counting junction reads. Defaults to 1 (no parallelization). This parameter will be used as an argument to BiocParallel::bplapply |
A SummarizedExperiment object with assays giving promoter counts and activity with gene expression. rowData contains promoter metadata and absolute promoter activity summarized across conditions (if condition is provided)
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1)
files <- list.files(system.file('extdata/vignette/junctions', package = 'proActiv'), full.names = TRUE, pattern = 'replicate5') promoterAnnotation <- promoterAnnotation.gencode.v34.subset result <- proActiv(files = files, promoterAnnotation = promoterAnnotation, condition = rep(c('A549', 'HepG2'), each=1), fileLabels = NULL, ncores = 1)
S4 class for promoter annotation data for a specific annotation version
PromoterAnnotation( intronRanges = GenomicRanges::GRanges(), promoterIdMapping = data.frame(), promoterCoordinates = GenomicRanges::GRanges() ) intronRanges(x) ## S4 method for signature 'PromoterAnnotation' intronRanges(x) promoterIdMapping(x) ## S4 method for signature 'PromoterAnnotation' promoterIdMapping(x) promoterCoordinates(x) ## S4 method for signature 'PromoterAnnotation' promoterCoordinates(x) intronRanges(x) <- value ## S4 replacement method for signature 'PromoterAnnotation' intronRanges(x) <- value promoterIdMapping(x) <- value ## S4 replacement method for signature 'PromoterAnnotation' promoterIdMapping(x) <- value promoterCoordinates(x) <- value ## S4 replacement method for signature 'PromoterAnnotation' promoterCoordinates(x) <- value
PromoterAnnotation( intronRanges = GenomicRanges::GRanges(), promoterIdMapping = data.frame(), promoterCoordinates = GenomicRanges::GRanges() ) intronRanges(x) ## S4 method for signature 'PromoterAnnotation' intronRanges(x) promoterIdMapping(x) ## S4 method for signature 'PromoterAnnotation' promoterIdMapping(x) promoterCoordinates(x) ## S4 method for signature 'PromoterAnnotation' promoterCoordinates(x) intronRanges(x) <- value ## S4 replacement method for signature 'PromoterAnnotation' intronRanges(x) <- value promoterIdMapping(x) <- value ## S4 replacement method for signature 'PromoterAnnotation' promoterIdMapping(x) <- value promoterCoordinates(x) <- value ## S4 replacement method for signature 'PromoterAnnotation' promoterCoordinates(x) <- value
intronRanges |
A GRanges object containing annotated intron ranges |
promoterIdMapping |
A data.frame containing mapping between transcript, TSS, promoter and gene ids |
promoterCoordinates |
A GRanges object containing promoter coordinates |
x |
A PromoterAnnotation object |
value |
intronRanges, promoterIdMapping or promoterCoordinates to be assigned |
A promoter annotation object with three slots: intronRanges, promoterIdMapping and promoter Coordinates
intronRanges
: Getter for intronRanges
intronRanges,PromoterAnnotation-method
: Getter for intronRanges
promoterIdMapping
: Getter for promoterIdMapping
promoterIdMapping,PromoterAnnotation-method
: Getter for promoterIdMapping
promoterCoordinates
: Getter for promoterCoordinates
promoterCoordinates,PromoterAnnotation-method
: Getter for promoterCoordinates
intronRanges<-
: Setter for intronRanges
intronRanges<-,PromoterAnnotation-method
: Setter for intronRanges
promoterIdMapping<-
: Setter for promoterIdMapping
promoterIdMapping<-,PromoterAnnotation-method
: Setter for promoterIdMapping
promoterCoordinates<-
: Setter for promoterCoordinates
promoterCoordinates<-,PromoterAnnotation-method
: Setter for promoterCoordinates
intronRanges
A GRanges object. The intron ranges annotated with the promoter information.
promoterIdMapping
A data.frame object. The id mapping between transcript ids, names, TSS ids, promoter ids and gene ids.
promoterCoordinates
A GRanges object. Promoter coordinates (TSS) with gene id and internal promoter state
promoterAnnotation <- PromoterAnnotation() intronRanges(promoterAnnotation) <- intronRanges( promoterAnnotation.gencode.v34.subset) promoterIdMapping(promoterAnnotation) <- promoterIdMapping( promoterAnnotation.gencode.v34.subset) promoterCoordinates(promoterAnnotation) <- promoterCoordinates( promoterAnnotation.gencode.v34.subset)
promoterAnnotation <- PromoterAnnotation() intronRanges(promoterAnnotation) <- intronRanges( promoterAnnotation.gencode.v34.subset) promoterIdMapping(promoterAnnotation) <- promoterIdMapping( promoterAnnotation.gencode.v34.subset) promoterCoordinates(promoterAnnotation) <- promoterCoordinates( promoterAnnotation.gencode.v34.subset)
Promoter annotation for Gencode.v34 (chr1:10,000,000 - 30,000,000)
promoterAnnotation.gencode.v34.subset
promoterAnnotation.gencode.v34.subset
A PromoterAnnotation (S4 Class) object containing all promoter annotation objects for Gencode.v34 chr1:10,000,000-30,000,000. The object has 3 slots:
A GRanges object of 4,523 ranges corresponding to introns, annotated with the associated transcript.
The id mapping between transcript names, promoter ids and gene ids for Gencode v34.
A GRanges object of 1,380 ranges showing the tss coordinate for each promoter of Gencode v34 chr1:10,000,000-30,000,000, annotated with the associated gene id, coordinate of the 3' end of the first reduced exon, and intron id.