Package 'mpra' reference manual

Title:	Analyze massively parallel reporter assays
Description:	Tools for data management, count preprocessing, and differential analysis in massively parallel report assays (MPRA).
Authors:	Leslie Myint [cre, aut], Kasper D. Hansen [aut]
Maintainer:	Leslie Myint <[email protected]>
License:	Artistic-2.0
Version:	1.29.0
Built:	2025-03-29 06:17:53 UTC
Source:	https://github.com/bioc/mpra

Analyze massively parallel reporter assays

Description

Tools for data management, count preprocessing, and differential analysis in massively parallel report assays (MPRA).

Details

This package provides tools for the analysis of MPRA data. The primary purpose is to enable powerful differential analysis of activity measures, but the package can also be used to generate precision weights useful in regression analyses of activity scores on sequence features. The main workhorse is the mpralm function which draws on the previously proposed voom framework for RNA-seq analysis in the limma package.

Author(s)

Leslie Myint [cre, aut], Kasper D. Hansen [aut]

Maintainer: Leslie Myint <[email protected]>

References

Myint, Leslie, Dimitrios G. Avramopoulos, Loyal A. Goff, and Kasper D. Hansen. Linear models enable powerful differential activity analysis in massively parallel reporter assays. BMC Genomics 2019, 209. doi:10.1186/s12864-019-5556-x.

Law, Charity W., Yunshun Chen, Wei Shi, and Gordon K. Smyth. Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts. Genome Biology 2014, 15:R29. doi:10.1186/gb-2014-15-2-r29.

Smyth, Gordon K., Joelle Michaud, and Hamish S. Scott. Use of within-Array Replicate Spots for Assessing Differential Expression in Microarray Experiments. Bioinformatics 2005, 21 (9): 2067-75. doi:10.1093/bioinformatics/bti270.

Examples

data(mpraSetAggExample)
design <- data.frame(intcpt = 1,
                     episomal = grepl("MT", colnames(mpraSetAggExample)))
mpralm_fit <- mpralm(object = mpraSetAggExample, design = design,
                     aggregate = "none", normalize = TRUE,
                     model_type = "indep_groups", plot = FALSE)
toptab <- topTable(mpralm_fit, coef = 2, number = Inf)
head(toptab)
data(mpraSetAggExample)
design <- data.frame(intcpt = 1,
                     episomal = grepl("MT", colnames(mpraSetAggExample)))
mpralm_fit <- mpralm(object = mpraSetAggExample, design = design,
                     aggregate = "none", normalize = TRUE,
                     model_type = "indep_groups", plot = FALSE)
toptab <- topTable(mpralm_fit, coef = 2, number = Inf)
head(toptab)

Compute activity measure (log-ratio) for each element.

Description

Compute the log ratio of RNA counts to DNA counts using different methods. For "mean", uses the average of barcode-specific log ratios. For "sum", sums RNA and DNA counts over barcodes before forming the log ratio.

Usage

compute_logratio(object, aggregate = c("mean", "sum", "none"))
compute_logratio(object, aggregate = c("mean", "sum", "none"))

Arguments

`object`	An object of class `MPRASet`.
`aggregate`	Aggregation method over barcodes: `"mean"` to use the average of barcode-specific log ratios, `"sum"` to use the log ratio of summed RNA and DNA counts, `"none"` to perform no aggregation (counts have already been summarized over barcodes).

Value

A matrix with the same dimension as object, containing element- and sample-specific log ratios.

Examples

  data(mpraSetAggExample)
  logr <- compute_logratio(mpraSetAggExample, aggregate = "sum")
data(mpraSetAggExample)
  logr <- compute_logratio(mpraSetAggExample, aggregate = "sum")

Get precision weights from the copy number-variance relationship.

Description

Estimates the variability of the supplied log-ratios across samples as a function of copy number (DNA count levels).

Usage

get_precision_weights(logr, design, log_dna, span = 0.4, plot = TRUE, ...)
get_precision_weights(logr, design, log_dna, span = 0.4, plot = TRUE, ...)

Arguments

`logr`	Matrix of outcome measures: log2 ratio of RNA counts to DNA counts.
`design`	Design matrix specifying comparisons of interest.
`log_dna`	Matrix of log2 aggregated DNA counts of the same dimension as `logr`.
`span`	The smoothing span for `lowess` in estimating the copy number-variance relationship. Default: 0.4.
`plot`	If `TRUE`, plot the copy number-variance relationship.
`...`	Further arguments to be passed to `lmFit` for obtaining residual standard deviations used in estimating the copy number-variance relationship.

Details

Residual standard deviations are computed using the supplied outcomes and design matrix. The square root of the the residual standard deviations are modeled as a function of the average log2 aggregated DNA counts to estimate the copy number-variance relationship.

Value

A matrix of precision weights of the same dimension as logr and log_dna.

References

Examples

data(mpraSetAggExample)
design <- data.frame(intcpt = 1,
                     episomal = grepl("MT", colnames(mpraSetAggExample)))
logr <- compute_logratio(mpraSetAggExample, aggregate = "none")
log_dna <- log2(getDNA(mpraSetAggExample, aggregate = FALSE) + 1)
w <- get_precision_weights(logr = logr, design = design,
                           log_dna = log_dna, plot = FALSE)
data(mpraSetAggExample)
design <- data.frame(intcpt = 1,
                     episomal = grepl("MT", colnames(mpraSetAggExample)))
logr <- compute_logratio(mpraSetAggExample, aggregate = "none")
log_dna <- log2(getDNA(mpraSetAggExample, aggregate = FALSE) + 1)
w <- get_precision_weights(logr = logr, design = design,
                           log_dna = log_dna, plot = FALSE)

Linear models for differential analysis of MPRA data

Description

Fits weighted linear models to test for differential activity in MPRA data.

Usage

mpralm(object, design, aggregate = c("mean", "sum", "none"),
       normalize = TRUE, normalizeSize = 10e6,
       block = NULL, model_type = c("indep_groups", "corr_groups"),
       plot = TRUE, endomorphic = FALSE, ...)
mpralm(object, design, aggregate = c("mean", "sum", "none"),
       normalize = TRUE, normalizeSize = 10e6,
       block = NULL, model_type = c("indep_groups", "corr_groups"),
       plot = TRUE, endomorphic = FALSE, ...)

Arguments

`object`	An object of class `MPRASet`.
`design`	Design matrix specifying comparisons of interest. The number of rows of this matrix should equal the number of columns in `object`. The number of columns in this design matrix has no constraints and should correspond to the experimental design.
`aggregate`	Aggregation method over barcodes: `"mean"` to use the average of barcode-specific log ratios, `"sum"` to use the log ratio of summed RNA and DNA counts, `"none"` to perform no aggregation (counts have already been summarized over barcodes).
`normalize`	If `TRUE`, perform total count normalization before model fitting.
`normalizeSize`	If normalizing, the target library size (default is 10e6).
`block`	A vector giving the sample designations of the columns of `object`. The default, `NULL`, indicates that all columns are separate samples.
`model_type`	Indicates whether an unpaired model fit (`"indep_groups"`) or a paired mixed-model fit ((`"corr_groups"`)) should be used.
`plot`	If `TRUE`, plot the mean-variance relationship.
`endomorphic`	If `TRUE`, return the same class as the input, i.e. an object of class `MPRASet`.
`...`	Further arguments to be passed to `lmFit` for obtaining residual standard deviations used in estimating the mean-variance relationship.

Details

Using method_type = "corr_groups" use the duplicateCorrelation function from the limma package to estimate the intra-replicate correlation of log-ratio values.

Value

An object of class MArrayLM resulting from the eBayes function.

If endomorphic = TRUE, then an MPRASet is returned, with the output of topTable added to the rowData, and the MArrayLM results added as an attribute "MArrayLM".

References

Examples

data(mpraSetAggExample)
design <- data.frame(intcpt = 1,
                     episomal = grepl("MT", colnames(mpraSetAggExample)))
mpralm_fit <- mpralm(object = mpraSetAggExample, design = design,
                     aggregate = "none", normalize = TRUE, 
                     model_type = "indep_groups", plot = FALSE)
toptab <- topTable(mpralm_fit, coef = 2, number = Inf)
head(toptab)
data(mpraSetAggExample)
design <- data.frame(intcpt = 1,
                     episomal = grepl("MT", colnames(mpraSetAggExample)))
mpralm_fit <- mpralm(object = mpraSetAggExample, design = design,
                     aggregate = "none", normalize = TRUE, 
                     model_type = "indep_groups", plot = FALSE)
toptab <- topTable(mpralm_fit, coef = 2, number = Inf)
head(toptab)

Class `"MPRASet"`

Description

A container for data from massively parallel reporter assays (MPRA). Builds on the SummarizedExperiment class.

Usage

## Constructor
MPRASet(DNA = new("matrix"), RNA = new("matrix"),
        barcode = new("character"), eid = new("character"),
        eseq = new("character"), ...)

## Accessors
getRNA(object, aggregate = FALSE)
getDNA(object, aggregate = FALSE)
getBarcode(object)
getEid(object)
getEseq(object)
## Constructor
MPRASet(DNA = new("matrix"), RNA = new("matrix"),
        barcode = new("character"), eid = new("character"),
        eseq = new("character"), ...)

## Accessors
getRNA(object, aggregate = FALSE)
getDNA(object, aggregate = FALSE)
getBarcode(object)
getEid(object)
getEseq(object)

Arguments

`object`	A `MPRASet` object.
`aggregate`	A `logical` indicating if data should be aggregated to the element level (by summing across barcodes).
`DNA`	A matrix of DNA counts where rows correspond to elements or individual barcodes and columns to samples of conditions being compared.
`RNA`	A matrix of RNA counts where rows correspond to elements or individual barcodes and columns to samples of conditions being compared.
`barcode`	If barcodes are supplied, a `character` vector of length equal to the number of rows in `DNA` and `RNA` containing the barcode sequences or identifiers. `NULL` otherwise.
`eid`	A `character` vector of length equal to the number of rows in `DNA` and `RNA` containing the enhancer identifiers corresponding to each row.
`eseq`	If supplied, a `character` vector of length equal to the number of rows in `DNA` and `RNA` containing the enhancer sequences corresponding to the regulatory elements in each row. `NULL` otherwise.
`...`	Further arguments to be passed to `SummarizedExperiment`.

Value

The constrcutor function MPRASet returns an object of class "MPRASet".

Slots

Slots are as described in a SummarizedExperiment. Of particular interest are colData which describes the phenotype data. The assay slot holds the assay data, with specific assay names RNA and DNA (accessed by getRNA and getDNA). Element and barcode data are accessible in the rowData slot. We have chosen to store barcode and element as character to be flexible, although they are often DNA sequences (and could therefore be considered DNAStringSet (from package Biostrings)).

Extends

Class "SummarizedExperiment", directly.

Accessors

getDNA:: Gets the DNA channel data.
getRNA:: Gets the RNA channel data.
getBarcode:: Gets the barcode, if present.
getEid:: Gets the element ID
getEseq:: Gets the element sequence, if present.

Examples

showClass("MPRASet")
showClass("MPRASet")

Example data for the mpra package.

Description

Example data for the MPRA package. mpraSetExample and mpraSetAggExample come from a study by Inoue et al that compares episomal and lentiviral MPRA. The former contains data at the barcode level and the latter contains data aggregated over barcodes. mpraSetAllelicExample come from a study by Tewhey et al that looks at regulatory activity of allelic versions of thousands of SNPs to follow up on prior eQTL results.

Usage

data("mpraSetExample")
data("mpraSetAggExample")
data("mpraSetAllelicExample")
data("mpraSetExample")
data("mpraSetAggExample")
data("mpraSetAllelicExample")

Format

An MPRASet.

Details

mpraSetExample contains barcode level information for the study by Inoue et al. mpraSetAggExample contains count information from mpraSetExample where the counts have been summed over barcodes for each element. mpraSetAllelicExample contains count information for the Tewhey et al study. The counts have been summed over barcodes for each element.

Source

A script for creating the three datasets is supplied in the scripts folder of the package. The data are taken from the GEO submission associated with the paper (see references), specifically GSE83894 and GSE75661.

References

Inoue, Fumitaka, Martin Kircher, Beth Martin, Gregory M. Cooper, Daniela M. Witten, Michael T. McManus, Nadav Ahituv, and Jay Shendure. A Systematic Comparison Reveals Substantial Differences in Chromosomal versus Episomal Encoding of Enhancer Activity. Genome Research 2017, 27(1):38-52. doi:10.1101/gr.212092.116.

Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, Andersen KG, Mikkelsen TS, Lander ES, Schaffner SF, Sabeti PC. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 2016, 165:1519-1529. doi:10.1016/j.cell.2016.04.027.

Examples

data(mpraSetAggExample)
data(mpraSetAggExample)

Total count normalization of DNA and RNA counts

Description

Total count normalization of DNA and RNA counts.

Usage

normalize_counts(object, normalizeSize = 10e6, block = NULL)
normalize_counts(object, normalizeSize = 10e6, block = NULL)

Arguments

`object`	An object of class `MPRASet`.
`normalizeSize`	If normalizing, the target library size (default is 10e6).
`block`	A vector giving the sample designations of the columns of `object`. The default, `NULL`, indicates that all columns are separate samples.

Details

block is a vector that is used when the columns of the MPRAset object are paired. This often is the case when comparing allelic versions of an element. In this case, the first $s$ columns of object give the counts for the reference allele in $s$ samples. The second $s$ columns give the counts for the alternative allele measured in the same $s$ samples. With 3 samples, block would be c(1,2,3,1,2,3). All columns are scaled to have 10 million counts.

Value

An object of class MPRASet with the total count-normalized DNA and RNA counts.

Examples

data(mpraSetAggExample)
mpraSetAggExample <- normalize_counts(mpraSetAggExample)
data(mpraSetAggExample)
mpraSetAggExample <- normalize_counts(mpraSetAggExample)

Package 'mpra'

Help Index

Analyze massively parallel reporter assays

Description

Details

Author(s)

References

Examples

Compute activity measure (log-ratio) for each element.

Description

Usage

Arguments

Value

Examples

Get precision weights from the copy number-variance relationship.

Description

Usage

Arguments

Details

Value

References

Examples

Linear models for differential analysis of MPRA data

Description

Usage

Arguments

Details

Value

References

Examples

Class "MPRASet"

Description

Usage

Arguments

Value

Slots

Extends

Accessors

See Also

Examples

Example data for the mpra package.

Description

Usage

Format

Details

Source

References

Examples

Total count normalization of DNA and RNA counts

Description

Usage

Arguments

Details

Value

Examples

Class `"MPRASet"`