Title: | A container (S4 class) for matrix-like assays |
---|---|
Description: | The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples. |
Authors: | Martin Morgan [aut], Valerie Obenchain [aut], Jim Hester [aut], Hervé Pagès [aut, cre] |
Maintainer: | Hervé Pagès <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.37.0 |
Built: | 2024-12-21 05:58:26 UTC |
Source: | https://github.com/bioc/SummarizedExperiment |
The Assays virtual class and its methods provide a formal abstraction of the assays slot of SummarizedExperiment objects.
SimpleAssays and ShallowSimpleListAssays are concrete subclasses of Assays with the former being currently the default implementation of Assays objects. Other implementations (e.g. disk-based) could easily be added.
Note that these classes are not meant to be used directly by the end user and the material in this man page is aimed at package developers.
Assays objects have list-like semantics, with elements having matrix- or array-like semantics (e.g., dim, dimnames).
The Assays API consists of:
(a) The Assays() constructor function.
(b) Lossless back and forth coercion from/to SimpleList. The coercion method from SimpleList doesn't need to (and should not) validate the returned object.
(c) length, names, `names<-`, getListElement, setListElement, dim, [, `[<-`, rbind, cbind.
An Assays concrete subclass needs to implement (b) (required) plus, optionally, any of the methods in (c).
IMPORTANT:
Nobody in the Assays hierarchy is allowed to inherit from SimpleList because of the conflicting semantics of [.
Methods that return a modified Assays object (a.k.a. endomorphisms), that is, [ as well as the replacement methods names<-, setListElement, and [<-, must respect the copy-on-change contract.
With objects that don't make use of references internally, the developer doesn't need to take any special action for that because it's automatically taken care of by R itself. However, for objects that do make use of references internally (e.g. environments, external pointers, pointer to a file on disk, etc.), the developer needs to be careful to implement endomorphisms with copy-on-change semantics. This can easily be achieved (and is what the default methods for Assays objects do) by performing a full (deep) copy of the object before modifying it instead of trying to modify it in-place. However, note that this full (deep) copy can be very expensive and is actually not necessary in order to achieve copy-on-change semantics: it's enough (and often preferable for performance reasons) to copy only the parts of the object that need to be modified.
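The contract can be illustrated without any Assays machinery. The following base R sketch uses a hypothetical environment-backed container; the names `new_store`, `set_names_broken`, and `set_names_safe` are made up for this illustration and are not part of the package. It only aims to show why a reference-based object needs an explicit copy before modification, and that copying just the modified part is enough.

```r
## Hypothetical environment-backed container (NOT part of SummarizedExperiment).
new_store <- function(data) {
    env <- new.env(parent = emptyenv())
    env$data <- data
    structure(list(env = env), class = "store")
}

## Broken setter: modifies the shared environment in place, so every
## object that points to this environment sees the change.
set_names_broken <- function(x, value) {
    names(x$env$data) <- value
    x
}

## Safe setter: creates a fresh environment, then modifies only the copy.
set_names_safe <- function(x, value) {
    new_env <- new.env(parent = emptyenv())
    new_env$data <- x$env$data     # regular R value: copied when modified
    names(new_env$data) <- value
    x$env <- new_env
    x
}

s1 <- new_store(list(1:3, 4:6))

s2 <- set_names_safe(s1, c("A1", "A2"))
names(s1$env$data)   # NULL: the original is untouched

s3 <- set_names_broken(s1, c("B1", "B2"))
names(s1$env$data)   # "B1" "B2": the original was modified too
```

The safe setter does not deep-copy anything by hand: it replaces the one reference that would otherwise be shared and leaves the rest to R's usual value semantics.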
Assays currently has 3 implementations, which are formalized by the concrete subclasses SimpleAssays, ShallowSimpleListAssays, and AssaysInEnv. SimpleAssays is the default (prior to SummarizedExperiment 1.15.4, ShallowSimpleListAssays was the default). AssaysInEnv is a broken alternative to ShallowSimpleListAssays that does NOT respect the copy-on-change contract. It is only provided for illustration purposes (see source file Assays-class.R for the details).
A little more detail about ShallowSimpleListAssays: a small reference class hierarchy (not exported from the GenomicRanges name space) defines a reference class ShallowData with a single field data of type ANY, and a derived class ShallowSimpleListAssays that specializes the type of data as SimpleList, and contains=c("ShallowData", "Assays"). The assays slot of a SummarizedExperiment object contains an instance of ShallowSimpleListAssays.
Martin Morgan and Hervé Pagès
SummarizedExperiment objects.
SimpleList objects in the S4Vectors package.
## ---------------------------------------------------------------------
## DIRECT MANIPULATION OF Assays OBJECTS
## ---------------------------------------------------------------------
m1 <- matrix(runif(24), ncol=3)
m2 <- matrix(runif(24), ncol=3)
a <- Assays(SimpleList(m1, m2))
a

as(a, "SimpleList")

length(a)
getListElement(a, 2)
dim(a)

b <- a[-4, 2]
b
length(b)
getListElement(b, 2)
dim(b)

names(a)
names(a) <- c("a1", "a2")
names(a)
getListElement(a, "a2")

rbind(a, a)
cbind(a, a)

## ---------------------------------------------------------------------
## COPY-ON-CHANGE CONTRACT
## ---------------------------------------------------------------------
## ShallowSimpleListAssays objects have copy-on-change semantics but not
## AssaysInEnv objects. For example:
ssla <- as(SimpleList(m1, m2), "ShallowSimpleListAssays")
aie <- as(SimpleList(m1, m2), "AssaysInEnv")

## No names on 'ssla' and 'aie':
names(ssla)
names(aie)

ssla2 <- ssla
aie2 <- aie
names(ssla2) <- names(aie2) <- c("A1", "A2")

names(ssla)  # still NULL (as expected)

names(aie)   # changed! (because the names<-,AssaysInEnv method is not
             # implemented in a way that respects the copy-on-change
             # contract)
This man page documents the coverage method for RangedSummarizedExperiment objects.
## S4 method for signature 'RangedSummarizedExperiment'
coverage(x, shift=0L, width=NULL, weight=1L,
         method=c("auto", "sort", "hash"))
x | A RangedSummarizedExperiment object. |
shift, width, weight, method | See |
This method operates on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.
More precisely, on RangedSummarizedExperiment object x, coverage(x, ...) is equivalent to coverage(rowRanges(x), ...).
See ?coverage in the GenomicRanges package for the details of how coverage operates on a GenomicRanges or GRangesList object.
See ?coverage in the GenomicRanges package.
RangedSummarizedExperiment objects.
The coverage man page in the GenomicRanges package where the coverage methods for GenomicRanges and GRangesList objects are documented.
nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)),
                     seqlengths=c(chr1=1800, chr2=1300))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)

cvg <- coverage(rse)
cvg
stopifnot(identical(cvg, coverage(rowRanges(rse))))
This man page documents the findOverlaps methods for RangedSummarizedExperiment objects.
RangedSummarizedExperiment objects also support countOverlaps, overlapsAny, and subsetByOverlaps thanks to the default methods defined in the IRanges package and to the findOverlaps methods defined in this package and documented below.
## S4 method for signature 'RangedSummarizedExperiment,Vector'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)

## S4 method for signature 'Vector,RangedSummarizedExperiment'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)
query, subject | One of these two arguments must be a RangedSummarizedExperiment object. |
maxgap, minoverlap, type | See |
select, ignore.strand | See |
These methods operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.
More precisely, if any of the above functions is passed a RangedSummarizedExperiment object through the query and/or subject argument, then it behaves as if rowRanges(query) and/or rowRanges(subject) had been passed instead.
See ?findOverlaps in the GenomicRanges package for the details of how findOverlaps and family operate on GenomicRanges and GRangesList objects.
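The derived functions mentioned in the description (countOverlaps, overlapsAny, subsetByOverlaps) can be sketched the same way. The toy data and the region of interest `roi` below are made up for illustration; the sketch only checks that passing the container gives the same answer as passing its row ranges.

```r
library(SummarizedExperiment)

## Toy data (made up for this sketch): 20 evenly spaced ranges.
counts <- matrix(runif(20 * 6, 1, 1e4), 20)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(seq(1L, by=50L, length.out=20), width=100),
                     strand="+")
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges)

## An arbitrary region of interest.
roi <- GRanges("chr2", IRanges(300, 600), strand="+")

n_hits  <- countOverlaps(rse, roi)     # one count per row of 'rse'
has_hit <- overlapsAny(rse, roi)       # one logical per row of 'rse'
sub     <- subsetByOverlaps(rse, roi)  # rows of 'rse' that overlap 'roi'

stopifnot(all(n_hits == countOverlaps(rowRanges(rse), roi)))
stopifnot(all(has_hit == overlapsAny(rowRanges(rse), roi)))
stopifnot(nrow(sub) == sum(has_hit))
```

Note that subsetByOverlaps returns a RangedSummarizedExperiment, so the assay rows stay aligned with the retained ranges.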
See ?findOverlaps in the GenomicRanges package.
RangedSummarizedExperiment objects.
The findOverlaps man page in the GenomicRanges package where the findOverlaps family of methods for GenomicRanges and GRangesList objects is documented.
nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 100)

hits <- findOverlaps(rse0, rse1)
hits
stopifnot(identical(hits, findOverlaps(rowRanges(rse0), rowRanges(rse1))))
stopifnot(identical(hits, findOverlaps(rse0, rowRanges(rse1))))
stopifnot(identical(hits, findOverlaps(rowRanges(rse0), rse1)))
This man page documents the inter range transformations that are supported on RangedSummarizedExperiment objects.
## S4 method for signature 'RangedSummarizedExperiment'
isDisjoint(x, ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment'
disjointBins(x, ignore.strand=FALSE)
x | A RangedSummarizedExperiment object. |
ignore.strand | See |
These transformations operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.
More precisely, any of the above functions performs the following transformation on RangedSummarizedExperiment object x:
f(rowRanges(x), ...)
where f is the name of the function and ... any additional arguments passed to it.
See ?isDisjoint in the GenomicRanges package for the details of how these transformations operate on a GenomicRanges or GRangesList object.
See ?isDisjoint in the GenomicRanges package.
RangedSummarizedExperiment objects.
The isDisjoint man page in the GenomicRanges package where inter range transformations of a GenomicRanges or GRangesList object are documented.
nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 99*start(rse0))

isDisjoint(rse0)  # FALSE
isDisjoint(rse1)  # TRUE

bins0 <- disjointBins(rse0)
bins0
stopifnot(identical(bins0, disjointBins(rowRanges(rse0))))

bins1 <- disjointBins(rse1)
bins1
stopifnot(all(bins1 == bins1[1]))
This man page documents the intra range transformations that are supported on RangedSummarizedExperiment objects.
## S4 method for signature 'RangedSummarizedExperiment'
shift(x, shift=0L, use.names=TRUE)

## S4 method for signature 'RangedSummarizedExperiment'
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE)

## S4 method for signature 'RangedSummarizedExperiment'
resize(x, width, fix="start", use.names=TRUE, ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment'
flank(x, width, start=TRUE, both=FALSE, use.names=TRUE,
      ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment'
promoters(x, upstream=2000, downstream=200)

## S4 method for signature 'RangedSummarizedExperiment'
terminators(x, upstream=2000, downstream=200)

## S4 method for signature 'RangedSummarizedExperiment'
restrict(x, start=NA, end=NA, keep.all.ranges=FALSE, use.names=TRUE)

## S4 method for signature 'RangedSummarizedExperiment'
trim(x, use.names=TRUE)
x | A RangedSummarizedExperiment object. |
shift, use.names, start, end, width, fix, ignore.strand, both, upstream, downstream, keep.all.ranges | See |
These transformations operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.
More precisely, any of the above functions performs the following transformation on RangedSummarizedExperiment object x:
rowRanges(x) <- f(rowRanges(x), ...)
where f is the name of the function and ... any additional arguments passed to it.
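As a minimal sketch of what this equivalence means in practice, the code below applies shift both ways on toy data (made up for this illustration) and checks that the two results carry the same row ranges, while the assay data is left as it was.

```r
library(SummarizedExperiment)

## Toy data (made up for this sketch).
counts <- matrix(seq_len(20 * 6), 20)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(seq(1L, by=50L, length.out=20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges)

## Calling the method on the container ...
rse1 <- shift(rse0, 10L)

## ... versus transforming the row ranges by hand.
rse2 <- rse0
rowRanges(rse2) <- shift(rowRanges(rse2), 10L)

stopifnot(identical(rowRanges(rse1), rowRanges(rse2)))

## Only the ranges change; the assay data is carried over untouched.
stopifnot(identical(assay(rse1), assay(rse0)))
stopifnot(identical(dim(rse1), dim(rse0)))
```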
See ?GenomicRanges::shift in the GenomicRanges package for the details of how these transformations operate on a GenomicRanges or GRangesList object.
RangedSummarizedExperiment objects.
The shift man page in the GenomicRanges package where intra range transformations of a GenomicRanges or GRangesList object are documented.
nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)

rse1 <- shift(rse0, 1)
stopifnot(identical(
  rowRanges(rse1),
  shift(rowRanges(rse0), 1)
))

se2 <- narrow(rse0, start=10, end=-15)
stopifnot(identical(
  rowRanges(se2),
  narrow(rowRanges(rse0), start=10, end=-15)
))

se3 <- resize(rse0, width=75)
stopifnot(identical(
  rowRanges(se3),
  resize(rowRanges(rse0), width=75)
))

se4 <- flank(rse0, width=20)
stopifnot(identical(
  rowRanges(se4),
  flank(rowRanges(rse0), width=20)
))

se5 <- promoters(rse0, upstream=85, downstream=50)
stopifnot(identical(
  rowRanges(se5),
  promoters(rowRanges(rse0), upstream=85, downstream=50)
))

se6 <- terminators(rse0, upstream=85, downstream=50)
stopifnot(identical(
  rowRanges(se6),
  terminators(rowRanges(rse0), upstream=85, downstream=50)
))

se7 <- restrict(rse0, start=200, end=700, keep.all.ranges=TRUE)
stopifnot(identical(
  rowRanges(se7),
  restrict(rowRanges(rse0), start=200, end=700, keep.all.ranges=TRUE)
))
makeSummarizedExperimentFromDataFrame uses data.frame or DataFrame column names to create a GRanges object for the rowRanges of the resulting SummarizedExperiment object. It requires that non-range data columns be coercible into a numeric matrix for the SummarizedExperiment constructor. All columns that are not part of the row ranges attribute are assumed to be experiment data; thus, keeping metadata columns is not supported. Note that this function only returns SummarizedExperiment objects with a single assay.
If metadata columns are to be kept, one can first construct the row ranges attribute by using the makeGRangesFromDataFrame function and subsequently creating the SummarizedExperiment.
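A minimal sketch of that two-step route is shown below. The data frame, the `gene_type` metadata column, and the way columns are split between ranges and assay are all assumptions made for this illustration; `keep.extra.columns=TRUE` is the makeGRangesFromDataFrame argument that retains non-range columns as metadata columns.

```r
library(SummarizedExperiment)

## Toy input (made up): 'gene_type' is a metadata column we want to keep.
df <- data.frame(chr="chr2", start=11:15, end=12:16,
                 strand=c("+", "-", "+", "*", "."),
                 gene_type=c("a", "b", "a", "c", "b"),
                 expr0=3:7, expr1=8:12,
                 row.names=paste0("GENE", letters[5:1]))

## Step 1: build the row ranges, keeping 'gene_type' as a metadata column.
range_part <- df[ , c("chr", "start", "end", "strand", "gene_type")]
rr <- makeGRangesFromDataFrame(range_part, keep.extra.columns=TRUE)

## Step 2: build the assay from the remaining (numeric) columns.
expr <- as.matrix(df[ , c("expr0", "expr1")])

## Step 3: assemble the container.
rse <- SummarizedExperiment(assays=SimpleList(expr=expr), rowRanges=rr)

mcols(rowRanges(rse))$gene_type
```

Splitting the columns explicitly keeps the decision about what counts as experiment data in the caller's hands, which is the part the single-call function cannot infer.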
makeSummarizedExperimentFromDataFrame(df,
                                      ...,
                                      seqinfo = NULL,
                                      starts.in.df.are.0based = FALSE)
df | A data.frame or DataFrame object. If not, then the function first tries to turn |
... | Additional arguments passed on to makeGRangesFromDataFrame |
seqinfo | Either |
starts.in.df.are.0based | |
A RangedSummarizedExperiment object with rowRanges and a single assay
M. Ramos
## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

# Note that rownames of the data.frame are also rownames of the result
df <- data.frame(chr="chr2", start = 11:15, end = 12:16,
                 strand = c("+", "-", "+", "*", "."), expr0 = 3:7,
                 expr1 = 8:12, expr2 = 12:16,
                 row.names = paste0("GENE", letters[5:1]))
df

exRSE <- makeSummarizedExperimentFromDataFrame(df)

exRSE

assay(exRSE)

rowRanges(exRSE)
Coercion between RangedSummarizedExperiment and ExpressionSet is supported in both directions.
For going from ExpressionSet to RangedSummarizedExperiment, the makeSummarizedExperimentFromExpressionSet function is also provided to let the user control how to map features to ranges.
makeSummarizedExperimentFromExpressionSet(from,
                                          mapFun=naiveRangeMapper,
                                          ...)

## range mapping functions
naiveRangeMapper(from)
probeRangeMapper(from)
geneRangeMapper(txDbPackage, key = "ENTREZID")
from | An ExpressionSet object. |
mapFun | A function which takes an ExpressionSet object and returns a GRanges or GRangesList object which corresponds to the genomic ranges used in the ExpressionSet. The rownames of the returned GRanges are used to match the featureNames of the ExpressionSet. The |
... | Additional arguments passed to |
txDbPackage | A character string with the Transcript Database to use for the mapping. |
key | A character string with the Gene key to use for the mapping. |
makeSummarizedExperimentFromExpressionSet takes an ExpressionSet object as input and a range mapping function that maps the features to ranges. It then returns a RangedSummarizedExperiment object that corresponds to the input.
The range mapping functions return a GRanges object, with the rownames corresponding to the featureNames of the ExpressionSet object.
Jim Hester, [email protected]
RangedSummarizedExperiment objects.
ExpressionSet objects in the Biobase package.
TxDb objects in the GenomicFeatures package.
## ---------------------------------------------------------------------
## GOING FROM ExpressionSet TO SummarizedExperiment
## ---------------------------------------------------------------------

data(sample.ExpressionSet, package="Biobase")

# naive coercion
makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet)
as(sample.ExpressionSet, "RangedSummarizedExperiment")
as(sample.ExpressionSet, "SummarizedExperiment")

# using probe range mapper
makeSummarizedExperimentFromExpressionSet(sample.ExpressionSet, probeRangeMapper)

# using the gene range mapper
se <- makeSummarizedExperimentFromExpressionSet(
    sample.ExpressionSet,
    geneRangeMapper("TxDb.Hsapiens.UCSC.hg19.knownGene")
)
se
rowData(se)  # duplicate row names

## ---------------------------------------------------------------------
## GOING FROM SummarizedExperiment TO ExpressionSet
## ---------------------------------------------------------------------

example(RangedSummarizedExperiment)  # to create 'rse'
rse
as(rse, "ExpressionSet")
makeSummarizedExperimentFromLoom represents a '.loom' file as a SummarizedExperiment. The '/matrix' and '/layers' are represented as HDF5Array objects; row and column attributes are parsed to DataFrame. Optionally, row or column attributes can be specified as row and column names.
makeSummarizedExperimentFromLoom(file,
                                 rownames_attr = NULL,
                                 colnames_attr = NULL)
file | The path (as a single character string) to the HDF5 file where the dataset is located. |
rownames_attr | The name of the row attribute to be used as row names. |
colnames_attr | The name of the column attribute to be used as column names. |
A SummarizedExperiment object with row and column data and one or more assays.
Martin Morgan
http://loompy.org/loompy-docs/format/index.html for a specification of the .loom format.
## ---------------------------------------------------------------------
## BASIC EXAMPLE
## ---------------------------------------------------------------------

file <- system.file(
    package="SummarizedExperiment", "extdata", "example.loom"
)
se <- makeSummarizedExperimentFromLoom(file)
se
assay(se)
metadata(se)
This man page documents the nearest methods and family (i.e. precede, follow, distance, and distanceToNearest methods) for RangedSummarizedExperiment objects.
## S4 method for signature 'RangedSummarizedExperiment,ANY'
precede(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
precede(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
follow(x, subject, select=c("arbitrary", "all"),
       ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
follow(x, subject, select=c("arbitrary", "all"),
       ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
nearest(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
nearest(x, subject, select=c("arbitrary", "all"),
        ignore.strand=FALSE)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
distance(x, y, ignore.strand=FALSE, ...)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
distance(x, y, ignore.strand=FALSE, ...)

## S4 method for signature 'RangedSummarizedExperiment,ANY'
distanceToNearest(x, subject, ignore.strand=FALSE, ...)
## S4 method for signature 'ANY,RangedSummarizedExperiment'
distanceToNearest(x, subject, ignore.strand=FALSE, ...)
x, subject | One of these two arguments must be a RangedSummarizedExperiment object. |
select, ignore.strand | See |
y | For the |
... | Additional arguments for methods. |
These methods operate on the rowRanges component of the RangedSummarizedExperiment object, which can be a GenomicRanges or GRangesList object.
More precisely, if any of the above functions is passed a RangedSummarizedExperiment object through the x, subject, and/or y argument, then it behaves as if rowRanges(x), rowRanges(subject), and/or rowRanges(y) had been passed instead.
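The same delegation can be sketched for distance and distanceToNearest. The toy data below is made up for this illustration, and the sketch only checks that the container and its row ranges give the same result.

```r
library(SummarizedExperiment)

## Toy data (made up for this sketch): evenly spaced, same-strand ranges.
counts <- matrix(seq_len(20 * 6), 20)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(seq(1L, by=150L, length.out=20), width=100),
                     strand="+")
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges)
rse1 <- shift(rse0, 120)

## Element-wise distance between the i-th range of 'rse0' and of 'rse1'.
d <- distance(rse0, rse1)
stopifnot(identical(d, distance(rowRanges(rse0), rowRanges(rse1))))

## Nearest neighbor in 'rse1' for each range of 'rse0', with its distance.
dtn <- distanceToNearest(rse0, rse1)
stopifnot(identical(dtn,
                    distanceToNearest(rowRanges(rse0), rowRanges(rse1))))
```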
See ?nearest in the GenomicRanges package for the details of how nearest and family operate on GenomicRanges and GRangesList objects.
See ?nearest in the GenomicRanges package.
RangedSummarizedExperiment objects.
The nearest man page in the GenomicRanges package where the nearest family of methods for GenomicRanges and GRangesList objects is documented.
nrows <- 20; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(5, 15)),
                     IRanges(sample(1000L, 20), width=100),
                     strand=Rle(c("+", "-"), c(12, 8)))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                             rowRanges=rowRanges, colData=colData)
rse1 <- shift(rse0, 100)

res <- nearest(rse0, rse1)
res
stopifnot(identical(res, nearest(rowRanges(rse0), rowRanges(rse1))))
stopifnot(identical(res, nearest(rse0, rowRanges(rse1))))
stopifnot(identical(res, nearest(rowRanges(rse0), rse1)))

res <- nearest(rse0)  # missing subject
res
stopifnot(identical(res, nearest(rowRanges(rse0))))

hits <- nearest(rse0, rse1, select="all")
hits
stopifnot(identical(
  hits,
  nearest(rowRanges(rse0), rowRanges(rse1), select="all")
))
stopifnot(identical(
  hits,
  nearest(rse0, rowRanges(rse1), select="all")
))
stopifnot(identical(
  hits,
  nearest(rowRanges(rse0), rse1, select="all")
))
The RangedSummarizedExperiment class is a matrix-like container where rows represent ranges of interest (as a GRanges or GRangesList object) and columns represent samples (with sample data summarized as a DataFrame). A RangedSummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.
RangedSummarizedExperiment is a subclass of SummarizedExperiment and,
as such, all the methods documented in ?SummarizedExperiment
also work on a RangedSummarizedExperiment object or any
SummarizedExperiment derivative. The methods documented below
are additional methods that are specific to RangedSummarizedExperiment
objects.
## Constructor

# See ?SummarizedExperiment for the constructor function.

## Accessors

rowRanges(x, ...)
rowRanges(x, ...) <- value

## Subsetting

## S4 method for signature 'RangedSummarizedExperiment'
subset(x, subset, select, ...)

## rowRanges access
## see 'GRanges compatibility', below
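x | A RangedSummarizedExperiment object or derivative. The rowRanges setter also accepts a SummarizedExperiment object, which it turns into a RangedSummarizedExperiment object. |
... | Further arguments to be passed to or from other methods. |
value | A GRanges or GRangesList object. |
subset | An expression which, when evaluated in the context of rowRanges(x), is a logical vector indicating the rows of x to keep. |
select | An expression which, when evaluated in the context of colData(x), is a logical vector indicating the columns of x to keep. |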
The rows of a RangedSummarizedExperiment object represent ranges
(in genomic coordinates) of interest. The ranges of interest are
described by a GRanges or a GRangesList object, accessible
using the rowRanges function, described below. The GRanges
and GRangesList classes contain sequence (e.g., chromosome) name,
genomic coordinates, and strand information. Each range can be
annotated with additional data; this data might be used to describe
the range or to summarize results (e.g., statistics of differential
abundance) relevant to the range. Rows may or may not have row names;
they often will not.
RangedSummarizedExperiment instances are constructed using the SummarizedExperiment() function documented in ?SummarizedExperiment.
In the code snippets below, x is a RangedSummarizedExperiment object or derivative (e.g. a SingleCellExperiment object).
rowRanges(x), rowRanges(x) <- value: Get or set the row data. value is a GenomicRanges object. Row names of value must be NULL or consistent with the existing row names of x.
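As a rough illustration of the rowRanges getter and setter described above, the sketch below attaches ranges to a plain SummarizedExperiment object. The toy matrix, the coordinates, and the row names r1 to r4 are made up for this example; the promotion to RangedSummarizedExperiment follows the behaviour shown in the examples of this page.

```r
library(SummarizedExperiment)

# Toy assay with row names (values and names are illustrative only).
counts <- matrix(1:8, nrow=4,
                 dimnames=list(paste0("r", 1:4), c("A", "B")))
se <- SummarizedExperiment(assays=list(counts=counts))

# Ranges whose names are consistent with the existing row names of 'se'.
gr <- GRanges("chr1", IRanges(start=c(1, 101, 201, 301), width=50))
names(gr) <- rownames(se)

# Setter: 'se' becomes a RangedSummarizedExperiment object.
rowRanges(se) <- gr
class(se)

# Getter: returns the GRanges object that was just set.
rowRanges(se)
```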
Many GRanges and GRangesList operations are supported on RangedSummarizedExperiment objects, using rowRanges.
Supported operations include: pcompare, duplicated, end, end<-, granges, is.unsorted, match, mcols, mcols<-, order, ranges, ranges<-, rank, seqinfo, seqinfo<-, seqnames, sort, start, start<-, strand, strand<-, width, width<-.
See also ?shift, ?isDisjoint, ?coverage, ?findOverlaps, and ?nearest for more GRanges compatibility methods.
Not all GRanges operations are supported, because they do not make sense for RangedSummarizedExperiment objects (e.g., length, name, as.data.frame, c, splitAsList), involve non-trivial combination or splitting of rows (e.g., disjoin, gaps, reduce, unique), or have not yet been implemented (Ops, map, window, window<-).
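A small sketch of a few of the supported operations, assuming a three-row object built from made-up ranges on a single chromosome. It is only meant to show that the calls are applied to the object itself and that sort reorders the assay rows together with the ranges.

```r
library(SummarizedExperiment)

# Illustrative ranges, deliberately given in decreasing start order.
gr <- GRanges("chr1", IRanges(start=c(300, 200, 100), width=10), strand="+")
rse <- SummarizedExperiment(assays=list(counts=matrix(1:6, nrow=3)),
                            rowRanges=gr)

# Accessors operate on rowRanges(rse).
seqnames(rse)
start(rse)
width(rse)

# sort() reorders the rows of the whole object by genomic position.
rse_sorted <- sort(rse)
start(rse_sorted)
assay(rse_sorted)
```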
In the code snippets below, x is a RangedSummarizedExperiment object or derivative (e.g. a SingleCellExperiment object).
subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowRanges(x) (including ‘seqnames’, ‘start’, ‘end’, ‘width’, ‘strand’, and names(rowData(x))) and/or select referring to column names of colData(x).
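The following sketch shows subset with both a row expression and a column expression. The four ranges, the feature_id column, and the Treatment labels are invented for illustration.

```r
library(SummarizedExperiment)

gr <- GRanges(rep(c("chr1", "chr2"), c(2, 2)),
              IRanges(start=c(10, 500, 20, 900), width=100),
              strand=c("+", "-", "+", "-"),
              feature_id=sprintf("ID%02d", 1:4))
colData <- DataFrame(Treatment=c("ChIP", "Input", "ChIP"),
                     row.names=c("A", "B", "C"))
rse <- SummarizedExperiment(assays=list(counts=matrix(1:12, nrow=4)),
                            rowRanges=gr, colData=colData)

# 'subset' is evaluated against rowRanges(rse), 'select' against colData(rse).
sub <- subset(rse, seqnames == "chr1" & start < 100, Treatment == "ChIP")
dim(sub)
rowRanges(sub)
colData(sub)
```

Only the first range is on chr1 with a start below 100, and only samples A and C are ChIP samples, so the result has one row and two columns.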
RangedSummarizedExperiment is implemented as an S4 class, and can be
extended in the usual way, using contains="RangedSummarizedExperiment"
in the new class definition.
See the SingleCellExperiment class defined in the SingleCellExperiment package for an example of such extension.
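A minimal sketch of such an extension is given below. The class name MyRSE and its note slot are hypothetical and exist only for this illustration; a real derived class would typically also define a constructor function and a validity method.

```r
library(SummarizedExperiment)

# "MyRSE" and its "note" slot are hypothetical names.
setClass("MyRSE",
         contains="RangedSummarizedExperiment",
         slots=c(note="character"))

rse <- SummarizedExperiment(
    assays=list(counts=matrix(1:4, nrow=2)),
    rowRanges=GRanges("chr1", IRanges(c(1, 11), width=5))
)

# Initialize the derived object from an existing parent-class instance.
x <- new("MyRSE", rse, note="demo")

# Methods defined for the parent class keep working on the derivative.
is(x, "RangedSummarizedExperiment")
dim(x)
rowRanges(x)
```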
Martin Morgan, [email protected]
SummarizedExperiment for the parent class of RangedSummarizedExperiment and the RangedSummarizedExperiment/SummarizedExperiment constructor function.
shift, isDisjoint, coverage, findOverlaps, and nearest for more GRanges compatibility methods.
GRanges objects in the GenomicRanges package.
The SingleCellExperiment defined in the SingleCellExperiment package, a subclass of RangedSummarizedExperiment specifically designed to represent single-cell sequencing data.
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE),
                     feature_id=sprintf("ID%03d", 1:200))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)
rse
dim(rse)
dimnames(rse)
assayNames(rse)
head(assay(rse))
assays(rse) <- endoapply(assays(rse), asinh)
head(assay(rse))

rowRanges(rse)
rowData(rse)  # same as 'mcols(rowRanges(rse))'
colData(rse)

rse[ , rse$Treatment == "ChIP"]

## cbind() combines objects with the same ranges but different samples:
rse1 <- rse
rse2 <- rse1[ , 1:3]
colnames(rse2) <- letters[1:ncol(rse2)]
cmb1 <- cbind(rse1, rse2)
dim(cmb1)
dimnames(cmb1)

## rbind() combines objects with the same samples but different ranges:
rse1 <- rse
rse2 <- rse1[1:50, ]
rownames(rse2) <- letters[1:nrow(rse2)]
cmb2 <- rbind(rse1, rse2)
dim(cmb2)
dimnames(cmb2)

## Coercion to/from SummarizedExperiment:
se0 <- as(rse, "SummarizedExperiment")
se0

as(se0, "RangedSummarizedExperiment")

## Setting rowRanges on a SummarizedExperiment object turns it into a
## RangedSummarizedExperiment object:
se <- se0
rowRanges(se) <- rowRanges
se  # RangedSummarizedExperiment

## Sanity checks:
stopifnot(identical(assays(se0), assays(rse)))
stopifnot(identical(dim(se0), dim(rse)))
stopifnot(identical(dimnames(se0), dimnames(rse)))
stopifnot(identical(rowData(se0), rowData(rse)))
stopifnot(identical(colData(se0), colData(rse)))
The SummarizedExperiment class is a matrix-like container where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent samples (with sample data summarized as a DataFrame). A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.
Note that SummarizedExperiment is the parent of the RangedSummarizedExperiment class which means that all the methods documented below also work on a RangedSummarizedExperiment object.
## Constructor for RangedSummarizedExperiment/SummarizedExperiment objects
SummarizedExperiment(assays=SimpleList(),
                     rowData=NULL, rowRanges=NULL,
                     colData=DataFrame(),
                     metadata=list(),
                     checkDimnames=TRUE)

## Accessors

assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, withDimnames=TRUE, ...)
assays(x, withDimnames=TRUE, ...) <- value
assay(x, i, withDimnames=TRUE, ...)
assay(x, i, withDimnames=TRUE, ...) <- value
rowData(x, use.names=TRUE, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
#dim(x)
#dimnames(x)
#dimnames(x) <- value

## Quick colData access

## S4 method for signature 'SummarizedExperiment'
x$name
## S4 replacement method for signature 'SummarizedExperiment'
x$name <- value
## S4 method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]]
## S4 replacement method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]] <- value

## Subsetting

## S4 method for signature 'SummarizedExperiment'
x[i, j, ..., drop=TRUE]
## S4 replacement method for signature 'SummarizedExperiment,ANY,ANY,SummarizedExperiment'
x[i, j] <- value
## S4 method for signature 'SummarizedExperiment'
subset(x, subset, select, ...)

## Combining

## S4 method for signature 'SummarizedExperiment'
rbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
cbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
combineRows(x, ..., delayed=TRUE, fill=NA, use.names=TRUE)
## S4 method for signature 'SummarizedExperiment'
combineCols(x, ..., delayed=TRUE, fill=NA, use.names=TRUE)

## On-disk realization
## S4 method for signature 'SummarizedExperiment'
realize(x, BACKEND=getAutoRealizationBackend())
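assays | A list or SimpleList of matrix-like elements. All elements must have the same dimensions. |
rowData | NULL (the default) or a DataFrame object describing the rows. |
rowRanges | NULL (the default) or a GRanges or GRangesList object describing the ranges of interest. |
colData | An optional DataFrame describing the samples. Row names on colData, if present, become the column names of the constructed object. |
metadata | An optional list of arbitrary content describing the overall experiment. |
checkDimnames | By default the rownames and colnames of the supplied assay(s) are checked for consistency with those of the SummarizedExperiment or RangedSummarizedExperiment object to construct. More precisely, the rownames and colnames of each assay must be NULL or identical to those of the object to construct. Use checkDimnames=FALSE to skip this check. |
x | A SummarizedExperiment object or derivative. |
... | For assay, arguments in ... are forwarded to assays. For rbind, cbind, combineRows, and combineCols, the SummarizedExperiment objects (or derivatives) to be combined. For other accessors, ignored. |
value | An object of a class specified in the S4 method signature or as outlined in ‘Details’. |
i, j | For assay and assay<-, i is the index or name of the assay to get or set (default: the first assay). For [ and [<-, i and j are subscripts (numeric, logical, character, or missing) that select rows and columns of x. For [[ and [[<-, i selects a column of colData(x). |
name | A symbol representing the name of a column of colData. |
withDimnames | A logical(1) indicating whether the dimnames of the SummarizedExperiment object should be applied to the extracted assays. See "Top-level dimnames vs assay-level dimnames" section in the SummarizedExperiment vignette for more information about this and some examples. Note that assays(x, withDimnames=FALSE) <- assays(x, withDimnames=FALSE) is guaranteed to always work and be a no-op. This is not the case if withDimnames=TRUE is used or if withDimnames is not specified. |
use.names | For rowData, whether the row names of x are propagated to the returned DataFrame object. For combineRows and combineCols, see the descriptions of these functions below. |
drop | A logical(1), ignored by these methods. |
deparse.level | See ?base::cbind for a description of this argument. |
subset | An expression which, when evaluated in the context of rowData(x), is a logical vector indicating the rows of x to keep. |
select | An expression which, when evaluated in the context of colData(x), is a logical vector indicating the columns of x to keep. |
delayed, fill | See the descriptions of combineRows and combineCols below. |
BACKEND | NULL or a single string specifying the name of the realization backend (e.g. "HDF5Array"). Defaults to the backend returned by getAutoRealizationBackend(). |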
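The effect of withDimnames can be sketched as follows, assuming a toy two-by-two object with made-up names. The last step relies on the setter storing an assay with mismatched dimnames as-is when withDimnames=FALSE is used, which is my understanding of recent versions of the package rather than something stated in the table above.

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(assays=list(counts=matrix(1:4, nrow=2)))
dimnames(se) <- list(c("g1", "g2"), c("s1", "s2"))

# Default getter: the top-level dimnames are applied to the returned assay.
dimnames(assay(se))

# withDimnames=FALSE: the assay is returned as stored (no dimnames here).
dimnames(assay(se, withDimnames=FALSE))

# This round trip is documented as a no-op that always works.
se2 <- se
assays(se2, withDimnames=FALSE) <- assays(se2, withDimnames=FALSE)

# An assay carrying its own, different row names (assumed behaviour:
# accepted and stored unchanged because withDimnames=FALSE is used).
m <- matrix(5:8, nrow=2, dimnames=list(c("a", "b"), NULL))
assay(se, "other", withDimnames=FALSE) <- m
rownames(assay(se, "other"))                      # top-level names
rownames(assay(se, "other", withDimnames=FALSE))  # assay-level names
```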
The SummarizedExperiment class is meant for numeric and other data types derived from a sequencing experiment. The structure is rectangular like a matrix, but with additional annotations on the rows and columns, and with the possibility to manage several assays simultaneously as long as they have the same dimensions.
The rows of a SummarizedExperiment object represent features of interest. Information about these features is stored in a DataFrame object, accessible using the function rowData. The DataFrame must have as many rows as there are rows in the SummarizedExperiment object, with each row of the DataFrame providing information on the feature in the corresponding row of the SummarizedExperiment object. Columns of the DataFrame represent different attributes of the features of interest, e.g., gene or transcript IDs, etc.
Each column of a SummarizedExperiment object represents a sample. Information about the samples is stored in a DataFrame, accessible using the function colData, described below. The DataFrame must have as many rows as there are columns in the SummarizedExperiment object, with each row of the DataFrame providing information on the sample in the corresponding column of the SummarizedExperiment object. Columns of the DataFrame represent different sample attributes, e.g., tissue of origin, etc. Columns of the DataFrame can themselves be annotated (via the mcols function). Column names typically provide a short identifier unique to each sample.
A SummarizedExperiment object can also contain information about the overall experiment, for instance the lab in which it was conducted, the publications with which it is associated, etc. This information is stored as a list object, accessible using the metadata function. The form of the data associated with the experiment is left to the discretion of the user.
The SummarizedExperiment container is appropriate for matrix-like data. The data are accessed using the assays function, described below. This returns a SimpleList object. Each element of the list must itself be a matrix (of any mode) and must have dimensions that are the same as the dimensions of the SummarizedExperiment in which they are stored. Row and column names of each matrix must either be NULL or match those of the SummarizedExperiment during construction. It is convenient for the elements of the SimpleList of assays to be named.
In the code snippets below, x is a SummarizedExperiment object or derivative (e.g. a RangedSummarizedExperiment object).
assays(x), assays(x) <- value: Get or set the assays. value is a list or SimpleList, each element of which is a matrix with the same dimensions as x.
assay(x, i), assay(x, i) <- value: A convenient alternative (to assays(x)[[i]], assays(x)[[i]] <- value) to get or set the ith (default first) assay element. value must be a matrix of the same dimension as x, and with dimension names NULL or consistent with those of x.
assayNames(x), assayNames(x) <- value: Get or set the names of assay() elements.
rowData(x, use.names=TRUE), rowData(x) <- value: Get or set the row data. value is a DataFrame object.
colData(x), colData(x) <- value: Get or set the column data. value is a DataFrame object. Row names of value must be NULL or consistent with the existing column names of x.
metadata(x), metadata(x) <- value: Get or set the experiment data. value is a list with arbitrary content.
dim(x): Get the dimensions (features of interest x samples) of the SummarizedExperiment object.
dimnames(x), dimnames(x) <- value: Get or set the dimension names. value is usually a list of length 2, containing elements that are either NULL or vectors of appropriate length for the corresponding dimension. value can be NULL, which removes dimension names. This method implies that rownames, rownames<-, colnames, and colnames<- are all available.
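The setters listed above can be sketched together on a small object. All values below (the gene_id column, the lab name, the g and s names, the logcounts assay) are illustrative assumptions and not part of the package.

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(
    assays=list(counts=matrix(1:6, nrow=3)),
    colData=DataFrame(Treatment=c("ChIP", "Input"))
)

# Add a second assay by name, then inspect the assay names.
assay(se, "logcounts") <- log2(assay(se, "counts") + 1)
assayNames(se)

# Row annotations, experiment-level metadata, and dimension names.
rowData(se) <- DataFrame(gene_id=c("g1", "g2", "g3"))
metadata(se) <- list(lab="example lab")
dimnames(se) <- list(c("g1", "g2", "g3"), c("s1", "s2"))

dim(se)
rownames(se)
colnames(se)
rowData(se)
colData(se)
```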
In the code snippets below, x is a SummarizedExperiment object or derivative (e.g. a RangedSummarizedExperiment object).
x[i,j], x[i,j] <- value: Create or replace a subset of x. i, j can be numeric, logical, character, or missing. value must be a SummarizedExperiment object with dimensions, dimension names, and assay elements consistent with the subset x[i,j] being replaced.
subset(x, subset, select): Create a subset of x using an expression subset referring to columns of rowData(x) and/or select referring to column names of colData(x).
Additional subsetting accessors provide convenient access to colData columns:
x$name, x$name <- value: Access or replace column name of colData(x).
x[[i, ...]], x[[i, ...]] <- value: Access or replace column i of colData(x).
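A short sketch of the subsetting operations above, on a toy object whose names and values are invented for illustration. The replacement value is derived from the very subset it replaces, so its dimensions and dimension names are consistent by construction.

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(
    assays=list(counts=matrix(1:12, nrow=4,
                              dimnames=list(paste0("g", 1:4),
                                            c("A", "B", "C")))),
    colData=DataFrame(Treatment=c("ChIP", "Input", "ChIP"),
                      row.names=c("A", "B", "C"))
)

# Character subscript on rows, logical subscript on columns.
sub <- se[c("g1", "g3"), se$Treatment == "ChIP"]
dim(sub)

# '$' and '[[' read and write columns of colData(se).
se$Batch <- c(1L, 1L, 2L)
se[["Batch"]]

# Replace the first two rows with a modified copy of themselves.
patch <- se[1:2, ]
assay(patch) <- assay(patch) * 0L
se[1:2, ] <- patch
assay(se)
```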
In the code snippets below, x, y and ... are SummarizedExperiment objects (or derivatives) to be combined.
rbind(...): rbind combines objects with the same samples but different features of interest (rows in assays).
The colnames in rowData(SummarizedExperiment) must match or an error is thrown.
Duplicate columns of colData(SummarizedExperiment) must contain the same data.
Data in assays are combined by name matching; if all assay names are NULL matching is by position. A mixture of names and NULL throws an error.
metadata from all objects are combined into a list with no name checking.
cbind(...): cbind combines objects with the same features of interest but different samples (columns in assays).
The colnames in colData(SummarizedExperiment) must match or an error is thrown.
Duplicate columns of rowData(SummarizedExperiment) must contain the same data.
Data in assays are combined by name matching; if all assay names are NULL matching is by position. A mixture of names and NULL throws an error.
metadata from all objects are combined into a list with no name checking.
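To illustrate the name matching of assays described above, the sketch below binds two toy objects whose assays are listed in different orders. The assay names a and b and all values are made up.

```r
library(SummarizedExperiment)

# Same assay names, listed in a different order in each object.
se1 <- SummarizedExperiment(assays=list(a=matrix(1:4, nrow=2),
                                        b=matrix(5:8, nrow=2)))
se2 <- SummarizedExperiment(assays=list(b=matrix(9:12, nrow=2),
                                        a=matrix(13:16, nrow=2)))

cmb <- rbind(se1, se2)
dim(cmb)

# Rows of assay "a" come from se1's "a" followed by se2's "a",
# regardless of the position of "a" in each input.
assay(cmb, "a")
```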
combineRows(x, ..., use.names=TRUE, delayed=TRUE, fill=NA): combineRows acts like a more flexible rbind, returning a SummarizedExperiment with features equal to the concatenation of features across all input objects. Unlike rbind, it permits differences in the number and identity of the columns, differences in the available rowData fields, and even differences in the available assays among the objects being combined.
If use.names=TRUE, each input object must have non-NULL, non-duplicated column names. These names do not have to be the same, or even shared, across the input objects. The column names of the returned SummarizedExperiment will be a union of the column names across all input objects. If a column is not present in an input, the corresponding assay and colData entries will be filled with fill and NAs, respectively, in the combined SummarizedExperiment.
If use.names=FALSE, all objects must have the same number of columns. The column names of the returned object are set to colnames(x). Any differences in the column names between input objects are ignored.
Data in assays are combined by matching the names of the assays. If one input object does not contain a named assay present in other input objects, the corresponding assay entries in the returned object will be set to fill. If all assay names are NULL, matching is done by position. A mixture of named and unnamed assays will throw an error.
If delayed=TRUE, assay matrices are wrapped in DelayedArrays to avoid any extra memory allocation during the matrix rbinding. Otherwise, the matrices are combined as-is; note that this may still return DelayedMatrix objects if the inputs were also DelayedMatrix objects.
If any input is a RangedSummarizedExperiment, the returned object will also be a RangedSummarizedExperiment. The rowRanges of the returned object is set to the concatenation of the rowRanges of all inputs. If any input is a SummarizedExperiment, the returned rowRanges is converted into a GRangesList and the entries corresponding to the rows of the SummarizedExperiment are set to zero-length GRanges. If all inputs are SummarizedExperiment objects, a SummarizedExperiment is also returned.
rowData are combined using combineRows for DataFrame objects. It is not necessary for all input objects to have the same fields in their rowData; missing fields are filled with NAs for the corresponding rows in the returned object.
metadata from all objects are combined into a list with no name checking.
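A minimal sketch of combineRows on two toy objects that share only one sample. The names and values are invented; delayed=FALSE is used here, and the result is passed through as.matrix before indexing so the checks do not depend on the class of the returned assay.

```r
library(SummarizedExperiment)

se1 <- SummarizedExperiment(assays=list(
    counts=matrix(1:4, nrow=2, dimnames=list(c("g1", "g2"), c("A", "B")))))
se2 <- SummarizedExperiment(assays=list(
    counts=matrix(5:8, nrow=2, dimnames=list(c("g3", "g4"), c("B", "C")))))

# Rows are concatenated; columns are the union of A, B and C.
cmb <- combineRows(se1, se2, delayed=FALSE)
dim(cmb)
colnames(cmb)

# Samples missing from an input are filled with 'fill' (NA by default).
m <- as.matrix(assay(cmb))
m
```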
combineCols(x, ..., use.names=TRUE, delayed=TRUE, fill=NA): combineCols acts like a more flexible cbind, returning a SummarizedExperiment with columns equal to the concatenation of columns across all input objects. Unlike cbind, it permits differences in the number and identity of the rows, differences in the available colData fields, and even differences in the available assays among the objects being combined.
If use.names=TRUE, each input object must have non-NULL, non-duplicated row names. These names do not have to be the same, or even shared, across the input objects. The row names of the returned SummarizedExperiment will be a union of the row names across all input objects. If a row is not present in an input, the corresponding assay and rowData entries will be filled with fill and NAs, respectively, in the combined SummarizedExperiment.
If use.names=FALSE, all objects must have the same number of rows. The row names of the returned object are set to rownames(x). Any differences in the row names between input objects are ignored.
Data in assays are combined by matching the names of the assays. If one input object does not contain a named assay present in other input objects, the corresponding assay entries in the returned object will be set to fill. If all assay names are NULL, matching is done by position. A mixture of named and unnamed assays will throw an error.
If delayed=TRUE, assay matrices are wrapped in DelayedArrays to avoid any extra memory allocation during the matrix cbinding. Otherwise, the matrices are combined as-is; note that this may still return DelayedMatrix objects if the inputs were also DelayedMatrix objects.
If any input is a RangedSummarizedExperiment, the returned object will also be a RangedSummarizedExperiment. The rowRanges of the returned object is set to a merge of the rowRanges of all inputs, where the coordinates for each row are taken from the input object that contains that row. Any conflicting ranges for shared rows will raise a warning and all rowRanges information from the offending RangedSummarizedExperiment will be ignored. If any input is a SummarizedExperiment, the returned rowRanges is converted into a GRangesList and the entries corresponding to the unique rows of the SummarizedExperiment are set to zero-length GRanges. If all inputs are SummarizedExperiment objects, a SummarizedExperiment is also returned.
colData are combined using combineRows for DataFrame objects. It is not necessary for all input objects to have the same fields in their colData; missing fields are filled with NAs for the corresponding columns in the returned object.
metadata from all objects are combined into a list with no name checking.
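The column-wise counterpart can be sketched the same way, again with made-up names and values and with delayed=FALSE. Here the two inputs share only the feature g2.

```r
library(SummarizedExperiment)

se1 <- SummarizedExperiment(assays=list(
    counts=matrix(1:4, nrow=2, dimnames=list(c("g1", "g2"), c("A", "B")))))
se2 <- SummarizedExperiment(assays=list(
    counts=matrix(5:8, nrow=2, dimnames=list(c("g2", "g3"), c("C", "D")))))

# Columns are concatenated; rows are the union of g1, g2 and g3.
cmb <- combineCols(se1, se2, delayed=FALSE)
dim(cmb)
rownames(cmb)

# Features missing from an input are filled with 'fill' (NA by default).
m <- as.matrix(assay(cmb))
m
```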
This section contains advanced material meant for package developers.
SummarizedExperiment is implemented as an S4 class, and can be extended in the usual way, using contains="SummarizedExperiment" in the new class definition.
In addition, the representation of the assays slot of SummarizedExperiment is as a virtual class Assays. This allows derived classes (contains="Assays") to implement alternative requirements for the assays, e.g., backed by file-based storage like NetCDF or the ff package, while re-using the existing SummarizedExperiment class without modification.
See Assays for more information.
Martin Morgan; combineRows and combineCols by Aaron Lun
RangedSummarizedExperiment objects.
DataFrame, SimpleList, and Annotated objects in the S4Vectors package.
saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment in the HDF5Array package for saving/loading an HDF5-based SummarizedExperiment object to/from disk.
The realize generic function in the DelayedArray package for more information about on-disk realization of objects carrying delayed operations.
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0
dim(se0)
dimnames(se0)
assayNames(se0)
head(assay(se0))
assays(se0) <- endoapply(assays(se0), asinh)
head(assay(se0))

rowData(se0)
colData(se0)

se0[, se0$Treatment == "ChIP"]
subset(se0, select = Treatment == "ChIP")

## rbind() combines objects with the same samples but different
## features of interest:
se1 <- se0
se2 <- se1[1:50,]
rownames(se2) <- letters[seq_len(nrow(se2))]
cmb2 <- rbind(se1, se2)
dim(cmb2)
dimnames(cmb2)

## cbind() combines objects with the same features of interest
## but different samples:
se1 <- se0
se2 <- se1[,1:3]
colnames(se2) <- letters[seq_len(ncol(se2))]
cmb1 <- cbind(se1, se2)
dim(cmb1)
dimnames(cmb1)

## ---------------------------------------------------------------------
## ON-DISK REALIZATION
## ---------------------------------------------------------------------
library(DelayedArray)
setAutoRealizationBackend("HDF5Array")
cmb3 <- realize(cmb2)
assay(cmb3, withDimnames=FALSE)  # an HDF5Matrix object