Title: | Analyze massively parallel reporter assays |
---|---|
Description: | Tools for data management, count preprocessing, and differential analysis in massively parallel report assays (MPRA). |
Authors: | Leslie Myint [cre, aut], Kasper D. Hansen [aut] |
Maintainer: | Leslie Myint <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.29.0 |
Built: | 2024-10-30 08:55:37 UTC |
Source: | https://github.com/bioc/mpra |
Tools for data management, count preprocessing, and differential analysis in massively parallel report assays (MPRA).
This package provides tools for the analysis of MPRA data. The primary
purpose is to enable powerful differential analysis of activity
measures, but the package can also be used to generate precision
weights useful in regression analyses of activity scores on sequence
features. The main workhorse is the mpralm
function which draws
on the previously proposed voom
framework for RNA-seq analysis
in the limma
package.
Leslie Myint [cre, aut], Kasper D. Hansen [aut]
Maintainer: Leslie Myint <[email protected]>
Myint, Leslie, Dimitrios G. Avramopoulos, Loyal A. Goff, and Kasper D. Hansen. Linear models enable powerful differential activity analysis in massively parallel reporter assays. BMC Genomics 2019, 209. doi:10.1186/s12864-019-5556-x.
Law, Charity W., Yunshun Chen, Wei Shi, and Gordon K. Smyth. Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts. Genome Biology 2014, 15:R29. doi:10.1186/gb-2014-15-2-r29.
Smyth, Gordon K., Joelle Michaud, and Hamish S. Scott. Use of within-Array Replicate Spots for Assessing Differential Expression in Microarray Experiments. Bioinformatics 2005, 21 (9): 2067-75. doi:10.1093/bioinformatics/bti270.
data(mpraSetAggExample) design <- data.frame(intcpt = 1, episomal = grepl("MT", colnames(mpraSetAggExample))) mpralm_fit <- mpralm(object = mpraSetAggExample, design = design, aggregate = "none", normalize = TRUE, model_type = "indep_groups", plot = FALSE) toptab <- topTable(mpralm_fit, coef = 2, number = Inf) head(toptab)
data(mpraSetAggExample) design <- data.frame(intcpt = 1, episomal = grepl("MT", colnames(mpraSetAggExample))) mpralm_fit <- mpralm(object = mpraSetAggExample, design = design, aggregate = "none", normalize = TRUE, model_type = "indep_groups", plot = FALSE) toptab <- topTable(mpralm_fit, coef = 2, number = Inf) head(toptab)
Compute the log ratio of RNA counts to DNA counts using different methods.
For "mean"
, uses the average of barcode-specific log ratios.
For "sum"
, sums RNA and DNA counts over barcodes before forming the log ratio.
compute_logratio(object, aggregate = c("mean", "sum", "none"))
compute_logratio(object, aggregate = c("mean", "sum", "none"))
object |
An object of class |
aggregate |
Aggregation method over barcodes: |
A matrix
with the same dimension as object
, containing element-
and sample-specific log ratios.
data(mpraSetAggExample) logr <- compute_logratio(mpraSetAggExample, aggregate = "sum")
data(mpraSetAggExample) logr <- compute_logratio(mpraSetAggExample, aggregate = "sum")
Estimates the variability of the supplied log-ratios across samples as a function of copy number (DNA count levels).
get_precision_weights(logr, design, log_dna, span = 0.4, plot = TRUE, ...)
get_precision_weights(logr, design, log_dna, span = 0.4, plot = TRUE, ...)
logr |
Matrix of outcome measures: log2 ratio of RNA counts to DNA counts. |
design |
Design matrix specifying comparisons of interest. |
log_dna |
Matrix of log2 aggregated DNA counts of the same dimension as |
span |
The smoothing span for |
plot |
If |
... |
Further arguments to be passed to |
Residual standard deviations are computed using the supplied outcomes and design matrix. The square root of the the residual standard deviations are modeled as a function of the average log2 aggregated DNA counts to estimate the copy number-variance relationship.
A matrix of precision weights of the same dimension as logr
and log_dna
.
Law, Charity W., Yunshun Chen, Wei Shi, and Gordon K. Smyth. Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts. Genome Biology 2014, 15:R29. doi:10.1186/gb-2014-15-2-r29.
data(mpraSetAggExample) design <- data.frame(intcpt = 1, episomal = grepl("MT", colnames(mpraSetAggExample))) logr <- compute_logratio(mpraSetAggExample, aggregate = "none") log_dna <- log2(getDNA(mpraSetAggExample, aggregate = FALSE) + 1) w <- get_precision_weights(logr = logr, design = design, log_dna = log_dna, plot = FALSE)
data(mpraSetAggExample) design <- data.frame(intcpt = 1, episomal = grepl("MT", colnames(mpraSetAggExample))) logr <- compute_logratio(mpraSetAggExample, aggregate = "none") log_dna <- log2(getDNA(mpraSetAggExample, aggregate = FALSE) + 1) w <- get_precision_weights(logr = logr, design = design, log_dna = log_dna, plot = FALSE)
Fits weighted linear models to test for differential activity in MPRA data.
mpralm(object, design, aggregate = c("mean", "sum", "none"), normalize = TRUE, normalizeSize = 10e6, block = NULL, model_type = c("indep_groups", "corr_groups"), plot = TRUE, endomorphic = FALSE, ...)
mpralm(object, design, aggregate = c("mean", "sum", "none"), normalize = TRUE, normalizeSize = 10e6, block = NULL, model_type = c("indep_groups", "corr_groups"), plot = TRUE, endomorphic = FALSE, ...)
object |
An object of class |
design |
Design matrix specifying comparisons of interest. The
number of rows of this matrix should equal the number of columns
in |
aggregate |
Aggregation method over barcodes: |
normalize |
If |
normalizeSize |
If normalizing, the target library size (default is 10e6). |
block |
A vector giving the sample designations of the columns of
|
model_type |
Indicates whether an unpaired model fit
( |
plot |
If |
endomorphic |
If |
... |
Further arguments to be passed to |
Using method_type = "corr_groups"
use the
duplicateCorrelation
function from the limma
package to
estimate the intra-replicate correlation of log-ratio values.
An object of class MArrayLM
resulting from the eBayes
function.
If endomorphic = TRUE
, then an MPRASet
is returned,
with the output of topTable
added to the rowData
,
and the MArrayLM
results added as an attribute
"MArrayLM"
.
Myint, Leslie, Dimitrios G. Avramopoulos, Loyal A. Goff, and Kasper D. Hansen. Linear models enable powerful differential activity analysis in massively parallel reporter assays. BMC Genomics 2019, 209. doi:10.1186/s12864-019-5556-x.
Law, Charity W., Yunshun Chen, Wei Shi, and Gordon K. Smyth. Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-Seq Read Counts. Genome Biology 2014, 15:R29. doi:10.1186/gb-2014-15-2-r29.
Smyth, Gordon K., Joelle Michaud, and Hamish S. Scott. Use of within-Array Replicate Spots for Assessing Differential Expression in Microarray Experiments. Bioinformatics 2005, 21 (9): 2067-75. doi:10.1093/bioinformatics/bti270.
data(mpraSetAggExample) design <- data.frame(intcpt = 1, episomal = grepl("MT", colnames(mpraSetAggExample))) mpralm_fit <- mpralm(object = mpraSetAggExample, design = design, aggregate = "none", normalize = TRUE, model_type = "indep_groups", plot = FALSE) toptab <- topTable(mpralm_fit, coef = 2, number = Inf) head(toptab)
data(mpraSetAggExample) design <- data.frame(intcpt = 1, episomal = grepl("MT", colnames(mpraSetAggExample))) mpralm_fit <- mpralm(object = mpraSetAggExample, design = design, aggregate = "none", normalize = TRUE, model_type = "indep_groups", plot = FALSE) toptab <- topTable(mpralm_fit, coef = 2, number = Inf) head(toptab)
"MPRASet"
A container for data from massively parallel reporter assays (MPRA). Builds on the SummarizedExperiment class.
## Constructor MPRASet(DNA = new("matrix"), RNA = new("matrix"), barcode = new("character"), eid = new("character"), eseq = new("character"), ...) ## Accessors getRNA(object, aggregate = FALSE) getDNA(object, aggregate = FALSE) getBarcode(object) getEid(object) getEseq(object)
## Constructor MPRASet(DNA = new("matrix"), RNA = new("matrix"), barcode = new("character"), eid = new("character"), eseq = new("character"), ...) ## Accessors getRNA(object, aggregate = FALSE) getDNA(object, aggregate = FALSE) getBarcode(object) getEid(object) getEseq(object)
object |
A |
aggregate |
A |
DNA |
A matrix of DNA counts where rows correspond to elements or individual barcodes and columns to samples of conditions being compared. |
RNA |
A matrix of RNA counts where rows correspond to elements or individual barcodes and columns to samples of conditions being compared. |
barcode |
If barcodes are supplied, a |
eid |
A |
eseq |
If supplied, a |
... |
Further arguments to be passed to
|
The constrcutor function MPRASet
returns an object of class
"MPRASet"
.
Slots are as described in a SummarizedExperiment
. Of
particular interest are colData
which describes the phenotype
data. The assay
slot holds the assay data, with specific assay
names RNA
and DNA
(accessed by getRNA
and
getDNA
). Element and barcode data are accessible in the
rowData
slot. We have chosen to store barcode and element as
character
to be flexible, although they are often DNA sequences
(and could therefore be considered DNAStringSet
(from package
Biostrings)).
Class "SummarizedExperiment"
, directly.
getDNA
:Gets the DNA channel data.
getRNA
:Gets the RNA channel data.
getBarcode
:Gets the barcode, if present.
getEid
:Gets the element ID
getEseq
:Gets the element sequence, if present.
SummarizedExperiment
for the basic class that is
used as a building block.
showClass("MPRASet")
showClass("MPRASet")
Example data for the MPRA package. mpraSetExample
and
mpraSetAggExample
come from a study by Inoue et al
that compares episomal and lentiviral MPRA. The former contains
data at the barcode level and the latter contains data
aggregated over barcodes. mpraSetAllelicExample
come from
a study by Tewhey et al that looks at regulatory activity of
allelic versions of thousands of SNPs to follow up on prior
eQTL results.
data("mpraSetExample") data("mpraSetAggExample") data("mpraSetAllelicExample")
data("mpraSetExample") data("mpraSetAggExample") data("mpraSetAllelicExample")
An MPRASet
.
mpraSetExample
contains barcode level information for the
study by Inoue et al.
mpraSetAggExample
contains count information from
mpraSetExample
where the counts have been summed over
barcodes for each element.
mpraSetAllelicExample
contains count information for the
Tewhey et al study. The counts have been summed over barcodes
for each element.
A script for creating the three datasets is supplied in the
scripts
folder of the package. The data are taken from the GEO
submission associated with the paper (see references), specifically
GSE83894 and GSE75661.
Inoue, Fumitaka, Martin Kircher, Beth Martin, Gregory M. Cooper, Daniela M. Witten, Michael T. McManus, Nadav Ahituv, and Jay Shendure. A Systematic Comparison Reveals Substantial Differences in Chromosomal versus Episomal Encoding of Enhancer Activity. Genome Research 2017, 27(1):38-52. doi:10.1101/gr.212092.116.
Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, Andersen KG, Mikkelsen TS, Lander ES, Schaffner SF, Sabeti PC. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 2016, 165:1519-1529. doi:10.1016/j.cell.2016.04.027.
data(mpraSetAggExample)
data(mpraSetAggExample)
Total count normalization of DNA and RNA counts.
normalize_counts(object, normalizeSize = 10e6, block = NULL)
normalize_counts(object, normalizeSize = 10e6, block = NULL)
object |
An object of class |
normalizeSize |
If normalizing, the target library size (default is 10e6). |
block |
A vector giving the sample designations of the columns of
|
block
is a vector that is used when the columns of the
MPRAset
object are paired. This often is the case when comparing
allelic versions of an element. In this case, the first $s$ columns of
object
give the counts for the reference allele in $s$
samples. The second $s$ columns give the counts for the alternative
allele measured in the same $s$ samples. With 3 samples, block
would be c(1,2,3,1,2,3)
. All columns are scaled to have 10
million counts.
An object of class MPRASet
with the total count-normalized DNA
and RNA counts.
data(mpraSetAggExample) mpraSetAggExample <- normalize_counts(mpraSetAggExample)
data(mpraSetAggExample) mpraSetAggExample <- normalize_counts(mpraSetAggExample)