Title: | Gene set enrichment data structures and methods |
---|---|
Description: | This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA). |
Authors: | Martin Morgan [aut], Seth Falcon [aut], Robert Gentleman [aut], Paul Villafuerte [ctb] ('GSEABase' vignette translation from Sweave to Rmarkdown / HTML), Bioconductor Package Maintainer [cre] |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.69.0 |
Built: | 2024-10-30 07:23:00 UTC |
Source: | https://github.com/bioc/GSEABase |
This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA). The GeneSet class provides a common data structure for representing gene sets. The GeneColorSet class allows genes in a set to be associated with phenotypes. The GeneSetCollection class facilitates grouping together a list of related gene sets. The GeneIdentifierType class hierarchy reflects how genes are represented (e.g., Entrez versus symbol) in the gene set. mapIdentifiers provides a way to convert identifiers in a set from one type to another. The CollectionType class hierarchy reflects how the gene set was made, and can order genes into distinct sets or collections.
Written by Martin Morgan, Seth Falcon, Robert Gentleman. Maintainer: Biocore Team c/o BioC user list <[email protected]>
See also: GeneSet, GeneColorSet, GeneSetCollection.
example(GeneSet)
These functions construct collection types. Collection types can be used in manipulating (e.g., selecting) sets, and can contain information specific to particular sets (e.g., 'category' and 'subcategory' classifications of 'BroadCollection').
```r
NullCollection(...)
ComputedCollection(...)
ExpressionSetCollection(...)
ChrCollection(ids, ...)
ChrlocCollection(ids, ...)
KEGGCollection(ids, ...)
MapCollection(ids, ...)
OMIMCollection(ids, ...)
PMIDCollection(ids, ...)
PfamCollection(ids, ...)
PrositeCollection(ids, ...)
GOCollection(ids=character(0), evidenceCode="ANY", ontology="ANY", ..., err=FALSE)
OBOCollection(ids, evidenceCode="ANY", ontology="ANY", ...)
BroadCollection(category, subCategory=NA, ...)
```
| Argument | Description |
|---|---|
| category | (Required) Broad category, one of "c1" (positional), "c2" (curated), "c3" (motif), "c4" (computational), "c5" (GO), "c6" (Oncogenic Pathway Activation Modules), "c7" (Immunologic Signatures), "c8" (Cell Type Signatures), "h" (Hallmark). |
| subCategory | (Optional) Sub-category; no controlled vocabulary. |
| ids | (Optional) Character vector of identifiers (e.g., GO, KEGG, or PMID terms). |
| evidenceCode | (Optional) Character vector of GO evidence codes to be included, or "ANY" (any identifier; the default). Evidence is a property of particular genes rather than of the ontology, so evidenceCode is a convenient way of specifying how users of a GOCollection might restrict derived objects (as is done when creating a gene set from an expression set). |
| ontology | (Optional) Character vector of GO ontology terms to be included, or "ANY" (any identifier; the default). Unlike evidence code, ontology membership is enforced when GOCollection gene sets are constructed. |
| err | (Optional) Logical scalar indicating whether non-existent GO terms signal an error (TRUE); the default is FALSE. |
| ... | Additional arguments, usually none but see specific |
An object of the same class as the function name, initialized as appropriate for the collection.
Martin Morgan <[email protected]>
```r
library(GSEABase)
NullCollection()
## NullCollection when no collection type specified
collectionType(GeneSet())
collectionType(GeneSet(collectionType=GOCollection()))
## fl could be a url
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
gs1 <- getBroadSets(fl)[[1]]
collectionType(gs1)   # BroadCollection
## new BroadCollection, with different category
bc <- BroadCollection(category="c2")
## change collectionType of gs2
gs2 <- gs1
collectionType(gs2) <- NullCollection()
## OBOCollection
fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase")
getOBOCollection(fl, evidenceCode="TAS")   # returns OBOCollection
OBOCollection(c("GO:0008967", "GO:0015119", "GO:0030372",
                "GO:0002732", "GO:0048090"))
```
These classes provide a way to tag the origin of a `GeneSet`. Collection types can be used in manipulating (e.g., selecting) sets, and can contain information specific to particular sets (e.g., `category` and `subcategory` classifications of `BroadCollection`).
The following classes can tag gene sets; GO, KEGG, Chr, Chrloc, OMIM, and PMID collections can be derived from chip or organism ‘annotation’ packages.

- `NullCollection`: No formal collection information available.
- `BroadCollection`: Derived from, or destined to be, Broad XML. Usually created by `getBroadSets` and written by `toBroadXML`.
- `ComputedCollection`: A computationally created collection, e.g., by performing logic operations on gene sets.
- `ExpressionSetCollection`: Derived from an `ExpressionSet`. Usually created during a call to `GeneSet` or `GeneColorSet`.
- `GOCollection`: Collection derived using Gene Ontology (GO) terms.
- `OBOCollection`: Collection derived from `GOCollection`, specifically from files described by the OBO file format. See `OBOCollection`.
- `KEGGCollection`: Collection derived using KEGG terms.
- `ChrCollection`: Collection derived using chromosome locations.
- `ChrlocCollection`: Collection derived using chromosome starting positions.
- `MapCollection`: Collection derived from cytogenetic bands.
- `OMIMCollection`: Collection derived from identifiers in Online Mendelian Inheritance in Man (OMIM).
- `PMIDCollection`: Collection derived from PMID identifiers.
- `PfamCollection`: Collection derived from Pfam identifiers.
- `PrositeCollection`: Collection derived from Prosite identifiers.
Objects are instantiated with calls to CollectionType
constructors, with slot names as possible arguments.
`CollectionType` classes (`NullCollection`, `ComputedCollection`, `ExpressionSetCollection`) have the slot:

- `type`: Object of class "ScalarCharacter" containing the character string representation of this `CollectionType`.

`CollectionIdType` classes (KEGG, OMIM, PMID, Chr, Chrloc, Map, GO) extend `CollectionType` and have the additional slot:

- `ids`: Object of class "character" containing a vector of character string representations of corresponding identifiers, e.g., ‘KEGG’ or ‘GO’ terms.
`GOCollection` extends `CollectionIdType` and has the additional slots:

- `evidenceCode`: Object of class "character" containing GO evidence codes used to construct the gene set.
- `ontology`: Object of class "character", a vector of GO ontology terms used to filter GO terms in the GO collection.

The values of `evidenceCode` are:

- EXP: Inferred from Experiment
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IEP: Inferred from Expression Pattern
- ISS: Inferred from Sequence or Structural Similarity
- ISO: Inferred from Sequence Orthology
- ISA: Inferred from Sequence Alignment
- ISM: Inferred from Sequence Model
- IGC: Inferred from Genomic Context
- RCA: Inferred from Reviewed Computational Analysis
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
- IC: Inferred by Curator
- ND: No biological data available
- IEA: Inferred from Electronic Annotation
`OBOCollection` extends `GOCollection`; see `OBOCollection`.

`BroadCollection` has slots:

- `category`: Object of class "ScalarCharacter" containing terms from the Broad list of categories, or NA.
- `subCategory`: Object of class "ScalarCharacter" containing Broad sub-categories, or NA.
`CollectionType` classes have methods:

- `collectionType<-`, signature(object = "GeneSet", value = "CollectionType"): replace the `CollectionType`.
- `collectionType`, signature(object = "CollectionType"): retrieve the collection type.
- `&`, `|`, signature(e1 = "CollectionType", e2 = "CollectionType"): return `e1` when `class(e1)` and `class(e2)` are the same, or a `ComputedCollection` when they differ.
- `show`, signature(object = "CollectionType"): display the collection type.
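A minimal sketch of the logical-operation rule above, assuming GSEABase is installed and attached. It combines two collection types of the same class and two of different classes; the choice of `NullCollection` and `ExpressionSetCollection` is arbitrary and only for illustration.

```r
library(GSEABase)

# Same class on both sides: the rule above says e1 is returned
same <- NullCollection() & NullCollection()

# Different classes: the rule above says a ComputedCollection is returned
mixed <- NullCollection() & ExpressionSetCollection()

is(same, "NullCollection")
is(mixed, "ComputedCollection")
```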
`CollectionIdType` classes inherit `CollectionType` methods, and have in addition:

- `ids`, signature(object = "CollectionIdType"): retrieve the identifiers of the collection type.
- `[`, signature(object = "CollectionIdType", i = "missing", j = "missing", ..., ids = ids(object)): return a subset of `object` containing only ids in `ids`.
- `&`, `|`, signature(e1 = "CollectionIdType", e2 = "CollectionIdType"): always return a `ComputedCollection`.
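The `CollectionIdType` behavior can be sketched with `KEGGCollection`, one of the `CollectionIdType` classes. The KEGG pathway identifiers below are placeholders chosen for illustration, and the sketch assumes the constructor stores them without validation.

```r
library(GSEABase)

# Placeholder KEGG pathway ids, for illustration only
k1 <- KEGGCollection(c("00010", "00020"))
k2 <- KEGGCollection("00030")

# ids() retrieves the identifiers stored in the collection
ids(k1)

# Logic on two CollectionIdType objects always gives a ComputedCollection
combined <- k1 | k2
is(combined, "ComputedCollection")
```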
`GOCollection` inherits `CollectionIdType` methods, and has in addition:

- `evidenceCode`: retrieve the evidence codes of the GO collection.
- `ontology`: retrieve the ontology terms of the GO collection.
- `[`, signature(object = "CollectionIdType", i = "missing", j = "missing", ..., evidenceCode = evidenceCode(object), ontology = ontology(object)): return a subset of `object` containing only evidence and ontology codes in `evidenceCode` and `ontology`. This method passes arguments `...` to `[,CollectionIdType` methods.
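A sketch of the `GOCollection` accessors, assuming GSEABase is installed. No GO ids are supplied, on the assumption that an empty collection can be built without consulting GO.db; the evidence codes and ontology value are illustrative choices from the lists above.

```r
library(GSEABase)

# Empty collection (no GO ids); assumed not to need a GO.db lookup.
# "IDA"/"TAS" and "BP" are illustrative values.
gc <- GOCollection(evidenceCode = c("IDA", "TAS"), ontology = "BP")

evidenceCode(gc)   # evidence codes stored in the collection
ontology(gc)       # ontology filter stored in the collection
length(ids(gc))    # no GO ids were supplied
```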
`BroadCollection` has methods:

- `bcCategory`: retrieve the category of the Broad collection.
- `bcSubCategory`: retrieve the sub-category of the Broad collection.
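A minimal sketch of the `BroadCollection` accessors. The category "c2" comes from the documented list; the sub-category "CGP" is an illustrative value, since sub-categories have no controlled vocabulary.

```r
library(GSEABase)

# "c2" is a documented category; "CGP" is an illustrative sub-category
bc <- BroadCollection(category = "c2", subCategory = "CGP")

bcCategory(bc)      # retrieve the category
bcSubCategory(bc)   # retrieve the sub-category
```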
Martin Morgan <[email protected]>
See also: `CollectionType` constructors; `getBroadSets` for importing collections from the Broad (and other sources).
```r
library(GSEABase)
names(getClass("CollectionType")@subclasses)
## Create a CollectionType and ask for its type
collectionType(ExpressionSetCollection())
## Read two GeneSets from a Broad XML file into a list, verify that
## they are both BroadCollections. Category / subcategory information
## is unique to Broad collections.
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
sets <- getBroadSets(fl)
sapply(sets, collectionType)
## ExpressionSets are tagged with ExpressionSetCollection; there is no
## 'category' information.
data(sample.ExpressionSet)
gs <- GeneSet(sample.ExpressionSet[100:109],
              setName="sample.GeneSet", setIdentifier="123")
collectionType(gs)
## GOCollections are created by reference to GO terms and evidenceCodes
GOCollection("GO:0005488")
## Not run:
## requires an annotation package with a GO map, e.g., org.Hs.eg.db;
## the GOCollection method of GeneSet needs an explicit geneIdType
GeneSet(GOCollection(c("GO:0005488", "GO:0019825"), evidenceCode="IDA"),
        geneIdType=EntrezIdentifier("org.Hs.eg.db"))
## End(Not run)
```
The `details` generic and its methods supplement `show`, providing more detail on object contents. Defined methods include:

- signature(object = "GeneSet")
- signature(object = "GeneColorSet")

These methods display information about `setIdentifier`, `description`, `organism`, `pubMedIds`, `urls`, `contributor`, `setVersion`, and `creationDate`.
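A short sketch contrasting the compact `show` display with `details`, assuming GSEABase is installed. The gene symbols, set name, and organism are illustrative values passed as ordinary slot arguments.

```r
library(GSEABase)

# Illustrative gene set; setName and organism fill slots by name
gs <- GeneSet(c("TP53", "MDM2"), setName = "toy", organism = "Homo sapiens")

gs           # compact display via show()
details(gs)  # longer display of the set's metadata
```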
`GeneColorSet` is a generic for constructing gene color sets (i.e., gene sets with "coloring" to indicate how features of genes and phenotypes are associated).

Available methods are the same as those for `GeneSet`, but a `GeneColorSet` requires an additional `phenotype` argument to identify the phenotype that is being colored. See the documentation for `GeneColorSet` for examples.

An additional method is:

- signature(type = "GeneSet", phenotype = "character"): constructs a 'color' gene set from an uncolored gene set.
A `GeneColorSet` extends `GeneSet` to allow genes to be 'colored'. Coloring means that, for a particular phenotype, each gene has a color (e.g., expression levels "up", "down", or "unchanged") and a phenotypic consequence (e.g., the phenotype is "enhanced" or "reduced"). All operations on a `GeneSet` can be applied to a `GeneColorSet`; coloring can also be accessed.
Construct a `GeneColorSet` with a `GeneColorSet` method. These methods are identical to those for `GeneSet`, except that they require an additional `phenotype` argument to specify the phenotype to which the genetic and phenotypic coloring applies. A `GeneColorSet` can be constructed from a `GeneSet` with `GeneColorSet(<GeneSet>, phenotype="<phenotype>")`.
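A minimal sketch of building a `GeneColorSet` from an existing `GeneSet`, as described above. The gene names "A", "B", "C" and the phenotype "growth" are made up for illustration.

```r
library(GSEABase)

# Uncolored GeneSet with hypothetical gene names
gs <- GeneSet(c("A", "B", "C"), setName = "toy")

# Supplying a phenotype turns it into a GeneColorSet
gcs <- GeneColorSet(gs, phenotype = "growth")

phenotype(gcs)        # the phenotype being colored
is(gcs, "GeneSet")    # a GeneColorSet is still a GeneSet
geneIds(gcs)          # identifiers carried over from gs
```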
A `GeneColorSet` inherits all slots from `GeneSet`, and gains the following slots:

- `phenotype`: Object of class "ScalarCharacter" describing the phenotype for which this gene set is colored.
- `geneColor`: Object of class "factor" describing the coloring of each gene in the set. The lengths of `geneColor` and `geneIds` must be equal.
- `phenotypeColor`: Object of class "factor" describing the phenotypic coloring of each gene in the set. The lengths of `phenotypeColor` and `geneIds` must be equal.

Extends class "GeneSet", directly.
Methods unique to `GeneColorSet` include:

- `coloring`, signature(object = "GeneColorSet"): retrieve coloring as a `data.frame`. The row names of the data frame are the gene names; the columns are `geneColor` and `phenotypeColor`.
- `coloring<-`, signature(object = "GeneColorSet", value = "data.frame"): use a data frame to assign coloring information. The `data.frame` must have the same number of rows as the `GeneColorSet` has genes (though see the examples below for flexible ways to alter the coloring of a subset of genes). Row names of the `data.frame` correspond to gene names. The data frame has two columns, named `geneColor` and `phenotypeColor`, which must be of class `factor`. A typical use of `coloring<-` is to simultaneously extract, subset, and reassign the current coloring, e.g., `coloring(<GeneColorSet>)[1:5,"geneColor"] <- "up"`; see the examples below.
- `geneColor<-`, signature(object = "GeneColorSet", value = "factor"): assign gene colors.
- `geneColor`, signature(object = "GeneColorSet"): retrieve gene colors as a `factor`.
- `phenotypeColor<-`, signature(object = "GeneColorSet", value = "factor"): assign phenotype colors.
- `phenotypeColor`, signature(object = "GeneColorSet"): retrieve phenotype colors as a `factor`.
- `phenotype<-`, signature(object = "GeneColorSet", value = "character"): assign the phenotype from a single-element character vector.
- `phenotype`, signature(object = "GeneColorSet"): retrieve the phenotype as a single-element `character`.
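The accessors above can be combined as in this sketch, which assigns colors with `geneColor<-` and `phenotypeColor<-` and then edits one entry through `coloring<-`. Gene names, colors, and the phenotype are hypothetical values.

```r
library(GSEABase)

# Hypothetical genes and phenotype
gcs <- GeneColorSet(GeneSet(c("A", "B", "C")), phenotype = "growth")

# One factor entry per gene
geneColor(gcs) <- factor(c("up", "down", "up"))
phenotypeColor(gcs) <- factor(c("enhanced", "reduced", "enhanced"))

# data.frame with gene names as row names
coloring(gcs)

# Extract, subset and reassign in one step; "up" is an existing level
coloring(gcs)["B", "geneColor"] <- "up"
geneColor(gcs)
```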
`GeneColorSet` inherits all methods from class `GeneSet`. Methods with different behavior include:

- `[`, signature(x = "GeneSet", i = "character"), signature(x = "GeneSet", i = "numeric"): subset the gene set by index (`i="numeric"`) or gene value (`i="character"`). Genes are re-ordered as required; `geneColor` and `phenotypeColor` are subset as appropriate.
- `[[`, signature(x = "GeneSet"): select a single gene from the gene set, returning a named character vector of `gene`, `geneColor`, `phenotypeColor`. Exact matches only.
- `$`, signature(x = "GeneSet"): select a single gene from the gene set, returning a named character vector of `gene`, `geneColor`, `phenotypeColor`. Provides partial matching into the list of genes.
- signature(x = "GeneColorSet", to = "*", from = "*"): checks that gene and phenotype colors are consistent for mapped identifiers, e.g., that two `AnnotationIdentifier`s mapping to the same `SymbolIdentifier` are colored the same.
Logical (set) operations `&`, `|`, and `setdiff` warn if the `phenotype`, `geneColor`, or `phenotypeColor` differs between sets; this implies coercion of factor levels, and the consequences should be carefully considered.
Martin Morgan <[email protected]>
```r
library(GSEABase)
## Create a GeneColorSet from an ExpressionSet
data(sample.ExpressionSet)
gcs1 <- GeneColorSet(sample.ExpressionSet[100:109], phenotype="imaginary")
gcs1
## or with color...
gcs2 <- GeneColorSet(sample.ExpressionSet[100:109],
                     phenotype="imaginary",
                     geneColor=factor(
                       rep(c("up", "down", "unchanged"), length.out=10)),
                     phenotypeColor=factor(
                       rep(c("enhanced", "reduced"), length.out=10)))
coloring(gcs2)
## recode geneColor of genes 1 and 4
coloring(gcs2)[c(1,4),"geneColor"] <- "down"
coloring(gcs2)
## reset, this time by gene name
coloring(gcs2)[c("31339_at", "31342_at"),"geneColor"] <- c("up", "up")
## usual 'factor' errors and warning apply:
coloring(gcs2)[c("31339_at", "31342_at"),"geneColor"] <- c("UP", "up")
gcs2[["31342_at"]]
try(gcs2[["31342_"]])   # no partial matching
gcs2$"31342"            # 1 partial match ok
```
Gene identifier classes and functions are used to indicate what the list of genes in a gene set represents (e.g., Entrez gene identifiers are tagged with `EntrezIdentifier()`, Bioconductor annotations with `AnnotationIdentifier()`).
```r
NullIdentifier(annotation, ...)
EnzymeIdentifier(annotation, ...)
ENSEMBLIdentifier(annotation, ...)
GenenameIdentifier(annotation, ...)
RefseqIdentifier(annotation, ...)
SymbolIdentifier(annotation, ...)
UniprotIdentifier(annotation, ...)
EntrezIdentifier(annotation, ...)
AnnotationIdentifier(annotation, ...)
AnnoOrEntrezIdentifier(annotation, ...)
```
| Argument | Description |
|---|---|
| annotation | An optional character string identifying the Bioconductor package from which the annotations are drawn, e.g., ‘hgu95av2’, ‘org.Hs.eg.db’; or a ‘src_organism’ object, e.g., ‘Organism.dplyr::src_organism(TxDb.Hsapiens.UCSC.hg38.knownGene)’. |
| ... | Additional arguments, usually none. |
For all but `AnnoOrEntrezIdentifier`, an object of the same class as the function name, initialized as appropriate for the identifier. For `AnnoOrEntrezIdentifier`, either an `AnnotationIdentifier` or an `EntrezIdentifier`, depending on the argument. This requires that the corresponding chip or organism package be loaded, and hence installed on the user's system.
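A sketch of the identifier constructors, assuming that `EntrezIdentifier("org.Hs.eg.db")` only records the package name and does not require org.Hs.eg.db to be installed. The gene symbols are illustrative.

```r
library(GSEABase)

# Assumption: the package name is recorded, not loaded
ei <- EntrezIdentifier("org.Hs.eg.db")
annotation(ei)                 # the recorded annotation package
is(ei, "GeneIdentifierType")   # constructors return GeneIdentifierType subclasses

# Tag illustrative gene symbols with SymbolIdentifier at construction
gs <- GeneSet(c("TP53", "MDM2"), geneIdType = SymbolIdentifier())
geneIdType(gs)
```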
Martin Morgan <[email protected]>
See `GeneIdentifierType`-class for a description of the classes and methods using these objects.
```r
library(GSEABase)
NullIdentifier()
data(sample.ExpressionSet)
gs1 <- GeneSet(sample.ExpressionSet[100:109],
               setName="sample1", setIdentifier="100")
geneIdType(gs1)   # AnnotationIdentifier
geneIds <- featureNames(sample.ExpressionSet)[100:109]
gs2 <- GeneSet(geneIds=geneIds, setName="sample1", setIdentifier="101")
geneIdType(gs2)   # NullIdentifier, since no info about genes provided
## Convert...
ai <- AnnotationIdentifier(annotation(sample.ExpressionSet))
geneIdType(gs2) <- ai
geneIdType(gs2)
## ...or provide more explicit construction
gs3 <- GeneSet(geneIds=geneIds, type=ai,
               setName="sample1", setIdentifier="102")
uprotIds <- c("Q9Y6Q1", "A6NJZ7", "Q9BXI6", "Q15035", "A1X283", "P55957")
gs4 <- GeneSet(uprotIds, geneIdType=UniprotIdentifier())
geneIdType(gs4)   # UniprotIdentifier
```
This class provides a way to tag the meaning of gene symbols in a `GeneSet`. For instance, a `GeneSet` with gene names derived from a Bioconductor annotation package (e.g., via an `ExpressionSet`) initially has a `GeneIdentifierType` of `AnnotationIdentifier`.
The following classes are available, and derive from tables in ‘annotation’ packages:

- `NullIdentifier`: No formal information about what gene identifiers represent.
- `AnnotationIdentifier`: Gene identifiers are Affymetrix chip-specific probe identifiers, as represented in Bioconductor annotation packages.
- `EntrezIdentifier`: ‘Entrez’ identifiers.
- `EnzymeIdentifier`: ‘EC’ identifiers.
- `ENSEMBLIdentifier`: ‘ENSEMBL’ identifiers.
- `GenenameIdentifier`: Curated and ad hoc descriptive gene names.
- `RefseqIdentifier`: ‘RefSeq’ identifiers.
- `SymbolIdentifier`: ‘Symbol’ identifiers.
- `UniprotIdentifier`: ‘Uniprot’ identifiers.
- `GeneIdentifierType`: A virtual class; no objects may be created from it. All classes listed above are subclasses of `GeneIdentifierType`.
All `GeneIdentifierType` classes have the following slots:

- `type`: Object of class "ScalarCharacter" containing the character string representation of this `GeneIdentifierType`.
- `annotation`: Object of class "ScalarCharacter" containing the name of the annotation package from which the identifiers (probe identifiers) are derived.
`GeneIdentifierType` classes are used in:

- `GeneSet`, signature(type = "GeneIdentifierType"): create a new `GeneSet` using identifiers of `GeneIdentifierType`.
- `GeneColorSet`, signature(type = "GeneIdentifierType"): create a new `GeneColorSet` using identifiers of `GeneIdentifierType`.
- `annotation`, signature(object = "GeneIdentifierType"): extract the name of the annotation package as a character string.
- `annotation<-`, signature(object = "GeneIdentifierType", value = "character"): assign the name of the annotation package as a character string.
- `geneIdType`, signature(object = "GeneIdentifierType"): return a character string representation of the type of this `object`.
- `geneIdType<-`, signature(object = "GeneSet", verbose=FALSE, value = "GeneIdentifierType"): change the `GeneIdentifierType` of `object` to `value`, attempting to convert symbols in the process. This method calls `mapIdentifiers(what=object, to=value, from=geneIdType(what), verbose=verbose)`; see `mapIdentifiers`.
- `show`, signature(object = "GeneIdentifierType"): display this object.
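A sketch of `annotation<-` and `geneIdType<-`, assuming GSEABase is installed. It relies on the assumption, consistent with the `GeneSet` examples, that changing a `NullIdentifier` set to `SymbolIdentifier` relabels the identifiers without needing an annotation package. Gene symbols are illustrative.

```r
library(GSEABase)

# Read and set the annotation package name on an identifier
ai <- AnnotationIdentifier()
annotation(ai) <- "hgu95av2"
annotation(ai)

# Assumption: from NullIdentifier, geneIdType<- only relabels the ids
gs <- GeneSet(c("TP53", "MDM2"))
is(geneIdType(gs), "NullIdentifier")
geneIdType(gs) <- SymbolIdentifier()
geneIdType(gs)
geneIds(gs)   # identifiers are expected to be unchanged
```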
Martin Morgan <[email protected]>
The example below lists `GeneIdentifierType` classes defined in this package; see the help pages of these classes for specific information.
```r
library(GSEABase)
names(getClass("GeneIdentifierType")@subclasses)
# create an AnnotationIdentifier, and ask its type
geneIdType(AnnotationIdentifier(annotation="hgu95av2"))
# Construct a GeneSet from an ExpressionSet, using the 'annotation'
# field of ExpressionSet to recognize the genes as AnnotationIdentifier
data(sample.ExpressionSet)
gs <- GeneSet(sample.ExpressionSet[100:109],
              setName="sample.GeneSet", setIdentifier="123")
geneIdType(gs)   # AnnotationIdentifier
## Read a Broad set from the system (or a url), and discover their
## GeneIdentifierType
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
bsets <- getBroadSets(fl)
sapply(bsets, geneIdType)
## try to combine gene sets with different set types
try(gs & bsets[[1]])
## Not run:
## Use the annotation package associated with the original
## ExpressionSet to map to EntrezIdentifier() ...
geneIdType(gs) <- EntrezIdentifier()
## ...and try again
gs & bsets[[1]]
## Another way to change annotation to Entrez (or other) ids
probeIds <- featureNames(sample.ExpressionSet)[100:109]
geneIds <- getEG(probeIds, "hgu95av2")
GeneSet(EntrezIdentifier(), setName="sample.GeneSet2",
        setIdentifier="101", geneIds=geneIds)
## End(Not run)
## Create a new identifier
setClass("FooIdentifier",
         contains="GeneIdentifierType",
         prototype=prototype(
           type=new("ScalarCharacter", "Foo")))
## Create a constructor (optional)
FooIdentifier <- function() new("FooIdentifier")
geneIdType(FooIdentifier())
## tidy up
removeClass("FooIdentifier")
```
Use `GeneSet` to construct gene sets from an `ExpressionSet`, a character vector, or other objects.
```r
GeneSet(type, ..., setIdentifier=.uniqueIdentifier())
```
| Argument | Description |
|---|---|
| type | An argument determining how the gene set will be created, as described in the Methods section. |
| setIdentifier | A (unique) identifier for the gene set; by default generated by `.uniqueIdentifier()`. |
| ... | Additional arguments for gene set construction. Methods have required arguments, as outlined below; additional arguments correspond to slot names. |
- signature(type = "missing", ..., setIdentifier=.uniqueIdentifier()): construct an empty gene set.
- signature(type = "character", ..., setIdentifier=.uniqueIdentifier()): construct a gene set using the identifiers in `type`.
- signature(type = "GeneIdentifierType", ..., setIdentifier=.uniqueIdentifier()): construct an empty gene set. The gene set has `geneIdType` created from the `GeneIdentifierType` of `type`.
- signature(type = "ExpressionSet", ..., setIdentifier=.uniqueIdentifier()): construct a gene set from an `ExpressionSet`. `geneIdType` is set to `AnnotationIdentifier`; the annotation field and annotation package of the `ExpressionSet` are consulted to determine `organism`, if possible. Short and long descriptions are taken from the title and abstract of the `ExpressionSet` `experimentData`; PubMed ids, urls, and contributor are also derived from `experimentData`.
- signature(type = "GOCollection", ..., geneIdType, setIdentifier=.uniqueIdentifier()): use the genes contained in `type` to create a `GeneSet`. The required argument `geneIdType` must include a package for which an appropriate map (to GO) exists, e.g., `EntrezIdentifier('org.Hs.eg.db')`.
- signature(type = "BroadCollection", ..., urls = character(0), setIdentifier=.uniqueIdentifier()): read XML following the Broad Institute schema and located at `urls` to create a gene set. The url can be a local file or internet connection, but must contain just a single gene set. See `getBroadSets` for details.
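A sketch of the `character` and `GeneIdentifierType` methods, passing extra slot values (`setName`, `shortDescription`, `organism`) through `...` as described above. The symbols and descriptive text are illustrative, not a curated gene set.

```r
library(GSEABase)

# Illustrative symbols and metadata; extra arguments fill slots by name
gs <- GeneSet(c("TP53", "MDM2", "CDKN1A"),
              geneIdType = SymbolIdentifier(),
              setName = "p53_toy",
              shortDescription = "Toy p53-related set",
              organism = "Homo sapiens")

setName(gs)
organism(gs)
geneIds(gs)

# Empty gene set whose geneIdType comes from the GeneIdentifierType method
empty <- GeneSet(EntrezIdentifier())
length(geneIds(empty))
```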
See also: `GeneSet-class`, `GeneColorSet-class`.
```r
library(GSEABase)
## Empty gene set
GeneSet()
## Gene set from ExpressionSet
data(sample.ExpressionSet)
gs1 <- GeneSet(sample.ExpressionSet[100:109])
## GeneSet from Broad XML; 'fl' could be a url
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
gs2 <- getBroadSets(fl)[[1]]   # getBroadSets returns a list of two gene sets
## GeneSet from list of gene identifiers
geneIds <- geneIds(gs2)   # any character vector would do
gs3 <- GeneSet(geneIds)
## unspecified set type, so...
is(geneIdType(gs3), "NullIdentifier") == TRUE
## update set type to match encoding of identifiers
geneIdType(gs2)
geneIdType(gs3) <- SymbolIdentifier()
## other ways of accomplishing the same
gs4 <- GeneSet(geneIds, geneIdType=SymbolIdentifier())
gs5 <- GeneSet(SymbolIdentifier(), geneIds=geneIds)
```
A `GeneSet` contains a set of gene identifiers. Each gene set has a `geneIdType`, indicating how the gene identifiers should be interpreted (e.g., as Entrez identifiers), and a `collectionType`, indicating the origin of the gene set (perhaps including additional information about the set, as in the `BroadCollection` type).

Conversion between identifiers, subsetting, and logical (set) operations can be performed. Relationships between genes and phenotype in a `GeneSet` can be summarized using coloring to create a `GeneColorSet`. A `GeneSet` can be exported to XML with `toBroadXML`.
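The logical (set) operations mentioned above can be sketched with two toy gene sets; both use the default `NullIdentifier`, so they share a `geneIdType`. Gene names are made up, and results are checked with `setequal` because this sketch does not rely on a particular ordering of the returned identifiers.

```r
library(GSEABase)

# Two toy sets with hypothetical gene names and the same geneIdType
a <- GeneSet(c("A", "B", "C"), setName = "a")
b <- GeneSet(c("B", "C", "D"), setName = "b")

geneIds(a & b)           # intersection
geneIds(a | b)           # union
geneIds(setdiff(a, b))   # in a but not in b
geneIds(a | "E")         # add identifiers from a character vector
```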
Construct a `GeneSet` with a `GeneSet` method (e.g., from a character vector of gene names, or an `ExpressionSet`), or from gene sets stored as XML (locally or on the internet; see `getBroadSets`).
- `setName`: Object of class "ScalarCharacter" containing a short name (a single word is best) to identify the set.
- `setIdentifier`: Object of class "ScalarCharacter" containing a (unique) identifier for the set.
- `geneIdType`: Object of class "GeneIdentifierType" containing information about how the gene identifiers are encoded. See `GeneIdentifierType` and related classes.
- `geneIds`: Object of class "character" containing the gene identifiers.
- `collectionType`: Object of class "CollectionType" containing information about how the `geneIds` were collected, including perhaps additional information unique to the collection methodology. See `CollectionType` and related classes.
- `shortDescription`: Object of class "ScalarCharacter" representing a short (one-line) description of the gene set.
- `longDescription`: Object of class "ScalarCharacter" providing a longer description (e.g., like an abstract) of the gene set.
- `organism`: Object of class "ScalarCharacter" representing the organism the gene set is derived from.
- `pubMedIds`: Object of class "character" containing PubMed ids related to the gene set.
- `urls`: Object of class "character" containing urls used to construct or manipulate the gene set.
- `contributor`: Object of class "character" identifying who created the gene set.
- `version`: Object of class "Versions", a version number, manually curated (i.e., by the `contributor`) to provide a consistent way of tracking a gene set.
- `creationDate`: Object of class "character" containing the character string representation of the date on which the gene set was created.
Gene set construction: see `GeneSet` methods and `getBroadSets` for convenient construction.
Slot access (e.g., setName
) and retrieve
(e.g., setName<-
) :
signature(object = "GeneSet", value = "CollectionType")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(x = "GeneSet", y = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
signature(object = "GeneSet", verbose=FALSE, value = "character"), signature(object = "GeneSet", verbose=FALSE, value = "GeneIdentifierType"): These methods attempt to coerce geneIds from the current type to the type named by value. Successful coercion requires an appropriate method for mapIdentifiers.
signature(object = "GeneSet")
signature(object = "GeneSet", value = "Versions")
signature(object = "GeneSet")
signature(object = "GeneSet", value = "character")
signature(object = "GeneSet")
Logical and subsetting operations:
signature(x = "GeneSet", y = "GeneSet"): ...
signature(e1 = "GeneSet", e2 = "GeneSet"): calculate the logical ‘or’ (union) of two gene sets. The sets must contain elements of the same geneIdType.
signature(e1 = "GeneSet", e2 = "character"), signature(e1 = "character", e2 = "GeneSet"): calculate the logical ‘or’ (union) of a gene set and a character vector, i.e., add the geneIds named in the character vector to the gene set.
signature(x = "GeneSet", y = "GeneSet"):
signature(e1 = "GeneSet", e2 = "GeneSet"): calculate the logical ‘and’ (intersection) of two gene sets.
signature(e1 = "GeneSet", e2 = "character"), signature(e1 = "character", e2 = "GeneSet"): calculate the logical ‘and’ (intersection) of a gene set and a character vector, creating a new gene set containing only those genes named in the character vector.
signature(x = "GeneSet", y = "GeneSet"), signature(x = "GeneSet", y = "character"), signature(x = "character", y = "GeneSet"): calculate the logical set difference between two gene sets, or between a gene set and a character vector.
signature(x = "GeneSet", i="character"), signature(x = "GeneSet", i="numeric"): subset the gene set by index (i="numeric") or value (i="character"). Genes are re-ordered as required.
signature(x = "ExpressionSet", i = "GeneSet"): subset the expression set, using genes in the gene set to select features. Genes in the gene set are coerced to the appropriate annotation type if necessary (by consulting the annotation slot of the expression set, and using geneIdType<-).
signature(x = "GeneSet"): select a single gene from the gene set.
signature(x = "GeneSet"): select a single gene from the gene set, allowing partial matching.
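The set semantics described above can be checked with ordinary character vectors. This base-R sketch is not GSEABase code: the vectors stand in for geneIds() of two overlapping sets shaped like gs5 and gs6 in the examples (ten ids each, five shared), and the ids g1 to g15 are hypothetical.

```r
## Base-R sketch; character vectors stand in for geneIds() of two sets.
a <- paste0("g", 1:10)   # hypothetical ids, like gs5
b <- paste0("g", 6:15)   # hypothetical ids, like gs6; shares g6..g10

u <- union(a, b)         # like gs5 | gs6   (15 ids)
i <- intersect(a, b)     # like gs5 & gs6   (5 ids)
d <- setdiff(a, b)       # like setdiff(gs5, gs6)

## The identity from the examples: the three disjoint pieces rebuild
## the union, in the same order.
rebuilt <- union(union(setdiff(a, b), intersect(a, b)), setdiff(b, a))
identical(rebuilt, u)

## Subsetting by position versus by value; by value, ids come back in
## the order requested.
by_index <- a[c(3, 1)]
by_value <- a[match(c("g7", "g2"), a)]

## Set names combine as in the example comment '(subset1 & subset2)'.
label <- sprintf("(%s & %s)", "subset1", "subset2")
```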
Useful additional methods include:
signature(type = "GeneSet"): create a 'color' gene set from a GeneSet, containing information about phenotype. This method has a required argument phenotype, a character string describing the phenotype for which color is available. See GeneColorSet.
These methods convert genes from one GeneIdentifierType to another; use the code in the examples to list the available methods. See mapIdentifiers and specific methods in GeneIdentifierType for additional detail.
Summarize shared membership in genes across gene sets. See incidence-methods.
Export to 'GMT' format file. See toGmt.
signature(object = "GeneSet"): display a short summary of the gene set.
signature(object = "GeneSet"): display additional information about the gene set. See details.
signature(.Object = "GeneSet"): Used internally during gene set construction.
Martin Morgan <[email protected]>
GeneColorSet
CollectionType
GeneIdentifierType
## Empty gene set
GeneSet()
## Gene set from ExpressionSet
data(sample.ExpressionSet)
gs1 <- GeneSet(sample.ExpressionSet[100:109])
## GeneSet from Broad XML; 'fl' could be a url
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
gs2 <- getBroadSets(fl)[[1]] # actually, a list of two gene sets
## GeneSet from list of geneIds
geneIds <- geneIds(gs2) # any character vector would do
gs3 <- GeneSet(geneIds=geneIds)
## unspecified set type, so...
is(geneIdType(gs3), "NullIdentifier") == TRUE
## update set type to match encoding of identifiers
geneIdType(gs2)
geneIdType(gs3) <- SymbolIdentifier()
## Convert between set types; this consults the 'annotation'
## information encoded in the 'AnnotationIdentifier' set type and the
## corresponding annotation package.
## Not run:
gs4 <- gs1
geneIdType(gs4) <- EntrezIdentifier()
## End(Not run)
## logical (set) operations
gs5 <- GeneSet(sample.ExpressionSet[100:109], setName="subset1")
gs6 <- GeneSet(sample.ExpressionSet[105:114], setName="subset2")
## intersection: 5 'genes'; note the set name '(subset1 & subset2)'
gs5 & gs6
## union: 15 'genes'; note the set name
gs5 | gs6
## an identity
gs7 <- gs5 | gs6
gs8 <- setdiff(gs5, gs6) | (gs5 & gs6) | setdiff(gs6, gs5)
identical(geneIds(gs7), geneIds(gs8))
identical(gs7, gs8) == FALSE # gs7 and gs8 setNames differ
## output
tmp <- tempfile()
toBroadXML(gs2, tmp)
noquote(readLines(tmp))
## must be BroadCollection() collectionType
try(toBroadXML(gs1))
gs9 <- gs1
collectionType(gs9) <- BroadCollection()
toBroadXML(gs9, tmp)
unlink(tmp)
toBroadXML(gs9) # no connection --> character vector
## list of geneIds --> vector of Broad GENESET XML
gs10 <- getBroadSets(fl) # two sets
entries <- sapply(gs10, function(x) toBroadXML(x))
## list mapIdentifiers available for GeneSet
showMethods("mapIdentifiers", classes="GeneSet", inherit=FALSE)
A GeneSetCollection is a collection of related GeneSets. The collection can mix and match different types of gene sets. Members of the collection are referred to by the setName of each gene set.
Construct a GeneSetCollection with a GeneSetCollection method, e.g., from a list of gene sets or with several gene sets provided as arguments to the constructor. See examples below.
.Data: Object of class "list", containing the gene sets.
Class "list", from data part.
Class "vector", by class "list", distance 2.
Class "AssayData", by class "list", distance 2.
Gene set collection construction:
See GeneSetCollection methods and getBroadSets for convenient construction methods.
Collection access (operations on lists, such as length
, ,
lapply
also work on GeneSetCollection
).
signature(object = "GeneSetCollection"): return a list, with each member a character vector of gene identifiers from the gene set collection.
signature(object="GeneSetCollection", value="list"): assign character vectors in value to corresponding geneIds of object.
signature(x = "GeneSetCollection"): return the setName of each gene set in the collection.
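Because a GeneSetCollection extends list, much of its behaviour can be pictured as a named list whose names are the set names. A base-R sketch, with plain character vectors standing in for GeneSet objects (this is not GSEABase code):

```r
## Base-R sketch: a named list standing in for a GeneSetCollection.
sets <- list(set1 = c("g1", "g2", "g3"),
             set2 = c("g3", "g4"))

stopifnot(!anyDuplicated(names(sets)))  # set names must be unique

names(sets)       # analogue of names(gsc): the set names
lengths(sets)     # list operations work member by member

## analogue of 'geneIds(gsc) <- value': replace the members positionally,
## keeping the set names
sets[] <- list(c("g1", "g9"), "g4")
```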
Logical and subsetting operations:
signature(x = "GeneSetCollection", y = "ANY"), signature(x = "ANY", y = "GeneSetCollection"): ...
signature(e1 = "GeneSetCollection", e2 = "ANY"), signature(e1 = "GeneSet", e2 = "GeneSetCollection"), signature(e1 = "character", e2 = "GeneSetCollection"), signature(e1 = "ANY", e2 = "GeneSetCollection"): calculate the logical ‘or’ (union) of all gene identifiers in an object over all members of the gene set collection.
signature(x = "GeneSetCollection", y = "ANY"), signature(x = "ANY", y = "GeneSetCollection"): ...
signature(e1 = "GeneSetCollection", e2 = "ANY"), signature(e1 = "character", e2 = "GeneSetCollection"), signature(e1 = "GeneSet", e2 = "GeneSetCollection"), signature(e1 = "ANY", e2 = "GeneSetCollection"): calculate the logical ‘and’ (intersection) of all gene identifiers in a gene set or character vector, over all members of the gene set collection.
signature(x = "GeneSetCollection", y = "ANY"): calculate the logical set difference between all gene sets in a collection and the gene identifiers of a gene set or character vector. A setdiff method must be available for x="GeneSet" and the type of y.
signature(x = "GeneSetCollection", i = "ANY", j = "ANY", value = "ANY"), signature(x = "GeneSetCollection", i = "ANY", j = "ANY", value = "GeneSet"), signature(x = "GeneSetCollection", i = "character", j = "ANY", value = "GeneSet"): assign new sets to existing set members. To add entirely new sets, use a GeneSetCollection constructor.
signature(x = "GeneSetCollection", i = "logical"), signature(x = "GeneSetCollection", i = "numeric"), signature(x = "GeneSetCollection", i = "character"): create a GeneSetCollection consisting of a subset of the current collection. All indices i must already be present in the collection.
signature(x = "GeneSetCollection", i = "character"): Select a single gene set from the collection. Methods for i="numeric" are inherited from list.
signature(x = "GeneSetCollection", i = "ANY", j = "ANY", value = "ANY"), signature(x = "GeneSetCollection", i = "numeric", j = "ANY", value = "GeneSet"), signature(x = "GeneSetCollection", i = "character", j = "ANY", value = "GeneSet"): Replace a gene set in the collection with another. value = "ANY" serves to stop invalid assignments.
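The collection-level operators apply a set operation to every member. A base-R sketch of that behaviour, with plain character vectors in place of GeneSet objects and a hypothetical helper pick() imitating the rule that requested names must exist and must not repeat:

```r
## Base-R sketch; character vectors stand in for GeneSet members.
sets <- list(set1 = c("g1", "g2", "g3"),
             set2 = c("g3", "g4", "g5"))
query <- c("g3", "g5", "g9")

or_each   <- lapply(sets, union, query)      # like  gsc | query
and_each  <- lapply(sets, intersect, query)  # like  gsc & query
diff_each <- lapply(sets, setdiff, query)    # like  setdiff(gsc, query)

## Hypothetical helper: single-bracket subsetting that keeps a list and
## rejects unknown or repeated names.
pick <- function(x, i) {
    stopifnot(all(i %in% names(x)), !anyDuplicated(i))
    x[i]
}
pick(sets, "set2")
try(pick(sets, c("set1", "set1")))   # error, like try(gsc[c("set1", "set1")])
```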
Additional useful methods:
Objects created in previous versions of GSEABase may be incompatible with current object definitions. Usually this is signalled by an error suggesting that a slot is missing, and a recommendation to use updateObject. Use updateObject to update a GeneSetCollection and all contained GeneSets to their current definition.
Convert genes from one GeneIdentifierType to another. See mapIdentifiers and specific methods in GeneIdentifierType for additional detail.
Summarize shared membership in genes across gene sets. See incidence-methods.
Export to 'GMT' format file. See toGmt.
signature(object="GeneSetCollection"): provide a compact representation of object.
Martin Morgan <[email protected]>
gs1 <- GeneSet(setName="set1", setIdentifier="101")
gs2 <- GeneSet(setName="set2", setIdentifier="102")
## construct from individual elements...
gsc <- GeneSetCollection(gs1, gs2)
## or from a list
gsc <- GeneSetCollection(list(gs1, gs2))
## 'names' are the setNames
names(gsc)
## a collection of a single gene set
gsc["set1"]
## a gene set
gsc[["set1"]]
## set names must be unique
try(GeneSetCollection(gs1, gs1))
try(gsc[c("set1", "set1")])
Use GeneSetCollection to construct a collection of gene sets from GeneSet arguments, or a list of GeneSets.
GeneSetCollection(object, ..., idType, setType)
object |
An argument determining how the gene set collection will be created, as described in the methods section. |
... |
Additional arguments for gene set collection construction, as described below. |
idType |
An argument of class
|
setType |
An argument of class
|
signature(object = "GeneSet", idType="missing", setType="missing")
Construct a gene set collection from one or more GeneSet arguments.
signature(object = "list", idType="missing", setType="missing")
Construct a gene set collection from a list of GeneSets.
signature(object="missing", idType="AnnotationIdentifier", setType="CollectionType")
signature(object="missing", idType="AnnotationIdentifier", setType="CollectionIdType")
Construct a gene set collection of CollectionType entities (e.g., pathways for KEGGCollection, protein families for PfamCollection) implied by the map found in annotation(idType). If setType is a CollectionIdType and length(ids(setType))>0, the gene set collection is filtered to contain only those sets implied by the ids.
signature(object="character", idType="AnnotationIdentifier", setType="CollectionType")
signature(object="character", idType="AnnotationIdentifier", setType="CollectionIdType")
signature(object="character", idType="AnnotationIdentifier", setType="GOCollection")
Construct a gene set collection of CollectionType entities (e.g., pathways for KEGGCollection, protein families for PfamCollection) implied by the map found in annotation(idType). Use only those identifiers in object. If setType is a CollectionIdType and length(ids(setType))>0, the gene set collection is filtered to contain only those sets implied by the ids.
signature(object="character", idType="AnnotationIdentifier", setType="PfamCollection")
Construct a gene set collection by mapping all values in object to PfamIds found in the PFAM map implied by idType.
signature(object="character", idType="AnnotationIdentifier", setType="PrositeCollection")
Construct a gene set collection by mapping all values in object to ipi_ids found in the PFAM map implied by idType.
signature(object="character", idType="AnnotationIdentifier", setType="ChrlocCollection")
Construct a gene set collection by mapping all values in object to chromosome, strand, and position information found in the map implied by idType.
signature(object="ExpressionSet", idType="missing", setType="CollectionType")
signature(object="ExpressionSet", idType="missing", setType="CollectionIdType")
Construct a gene set collection using the annotation and featureNames of object to identify elements for CollectionType gene sets (e.g., pathways for KEGGCollection, protein families for PfamCollection) implied by object. The gene set collection contains only those AnnotationIdentifiers found in featureNames(object); if setType is a CollectionIdType and length(ids(setType))>0, the gene set collection is further filtered to contain only those sets implied by the ids.
signature(object="ExpressionSet", idType="missing", setType="GOCollection")
Construct a gene set collection using the annotation and featureNames of object to identify GO pathways implied by object. The map between featureNames and GO pathway identifiers is derived from the GO2PROBE table of the annotation package of object. The gene set collection contains only those AnnotationIdentifiers found in featureNames(object). The evidenceCode of GOCollection can be used to restrict the pathways selected to those with matching evidence codes.
signature(object="ExpressionSet", idType="missing", setType="PfamCollection")
Construct a gene set collection by mapping all values in featureNames(object) to PfamIds found in the PFAM map implied by idType=AnnotationIdentifier(annotation(object)).
signature(object="ExpressionSet", idType="missing", setType="PrositeCollection")
Construct a gene set collection by mapping all values in featureNames(object) to ipi_id found in the PFAM map implied by idType=AnnotationIdentifier(annotation(object)).
signature(object="ExpressionSet", idType="missing", setType="ChrlocCollection")
Construct a gene set collection by mapping all values in featureNames(object) to chromosome, strand, and position information found in the CHRLOC map implied by idType=AnnotationIdentifier(annotation(object)).
signature(object="missing", idType="AnnotationIdentifier", setType="GOCollection")
signature(object="GOAllFrame", idType="missing", setType="GOCollection")
Construct a gene set collection containing all GO pathways referenced in the GOAllFrame provided. Each gene set contains only those identifiers found in the GOAllFrame. The ontology of each GOAllFrame GO ID is included in the gene set of that GO ID.
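Several of the constructors above invert an annotation map (feature to pathway, protein family, or GO term) into one gene set per category, then optionally filter by ids(setType). A base-R sketch of that inversion, with a hypothetical map and a hypothetical helper build_sets(); the actual methods read the map from the annotation package instead:

```r
## Hypothetical feature -> category map (not from a real annotation package).
feature2cat <- list(p1 = c("04110", "04115"),
                    p2 = "04110",
                    p3 = "00010",
                    p4 = c("00010", "04115"))

build_sets <- function(map, features, ids = character(0)) {
    map <- map[intersect(names(map), features)]        # keep given features only
    pairs <- data.frame(feature  = rep(names(map), lengths(map)),
                        category = unlist(map, use.names = FALSE),
                        stringsAsFactors = FALSE)
    sets <- split(pairs$feature, pairs$category)       # category -> features
    if (length(ids) > 0)                               # like length(ids(setType)) > 0
        sets <- sets[intersect(names(sets), ids)]
    sets
}

all_sets  <- build_sets(feature2cat, c("p1", "p2", "p4"))
some_sets <- build_sets(feature2cat, c("p1", "p2", "p4"), ids = "04110")
```

Feature p3 is dropped because it is not among the supplied features, mirroring how the ExpressionSet methods keep only identifiers found in featureNames(object).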
GeneSetCollection-class
gs1 <- GeneSet(setName="set1", setIdentifier="101")
gs2 <- GeneSet(setName="set2", setIdentifier="102")
## construct from individual elements...
gsc <- GeneSetCollection(gs1, gs2)
## or from a list
gsc <- GeneSetCollection(list(gs1, gs2))
## set names must be unique
try(GeneSetCollection(gs1, gs1))
data(sample.ExpressionSet)
gsc <- GeneSetCollection(sample.ExpressionSet[200:250],
                         setType = GOCollection())
## Not run:
## from KEGG identifiers, for example
library(KEGG.db)
lst <- head(as.list(KEGGEXTID2PATHID))
gsc <- GeneSetCollection(mapply(function(geneIds, keggId) {
    GeneSet(geneIds, geneIdType=EntrezIdentifier(),
            collectionType=KEGGCollection(keggId),
            setName=keggId)
}, lst, names(lst)))
## End(Not run)
getOBOCollection parses a URI (file or internet location) encoded following the OBO specification defined by the Gene Ontology consortium.
getOBOCollection(uri, evidenceCode="ANY", ...)
uri |
A file name or URL containing gene sets encoded following the OBO specification. |
evidenceCode |
A character vector of evidence codes. |
... |
Further arguments passed to the
|
getOBOCollection returns an OBOCollection of gene sets. The gene set is constructed by parsing the file for id tags in TERM stanzas. The parser does not currently support all features of OBO, e.g., the ability to import additional files.
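To give a rough idea of what parsing for id tags in TERM stanzas involves, here is a base-R sketch on inline toy OBO text. The helper term_ids() and the TOY ids are hypothetical; like the package parser described above, the sketch ignores imports and most other OBO features.

```r
## Toy OBO text; the ids and names are made up for illustration.
obo_text <- c("format-version: 1.2",
              "subsetdef: toy_slim \"toy subset\"",
              "",
              "[Term]",
              "id: TOY:0001",
              "name: toy term one",
              "",
              "[Typedef]",
              "id: part_of",
              "",
              "[Term]",
              "id: TOY:0002",
              "name: toy term two")

term_ids <- function(lines) {
    lines <- trimws(lines)
    is_header <- grepl("^\\[.*\\]$", lines)
    ## stanza header in force for each line ("" before the first header)
    stanza <- c("", lines[is_header])[cumsum(is_header) + 1L]
    keep <- stanza == "[Term]" & startsWith(lines, "id:")
    trimws(sub("^id:", "", lines[keep]))
}

term_ids(obo_text)   # ids from [Term] stanzas only; 'part_of' is skipped
```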
Martin Morgan <[email protected]>
## 'fl' could also be a URI
fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase")
getOBOCollection(fl) # OBOCollection
## Not run:
## Download from the internet
fl <- "http://www.geneontology.org/GO_slims/goslim_plant.obo"
getOBOCollection(fl, evidenceCode="TAS")
## End(Not run)
These methods summarize the gene ontology terms implied by the idSrc argument into the GO terms implied by the slimCollection argument. The summary takes identifiers in idSrc and determines all GO terms that apply to the identifiers. This full list of GO terms is then classified for membership in each term in the slimCollection.
The resulting object is a data frame containing the terms of slimCollection as row labels, counts and frequencies of identifiers classified to each term, and an abbreviated term description.
An identifier in idSrc can expand to several GO terms, and the GO terms in slimCollection can imply an overlapping hierarchy of terms. Thus the resulting summary can easily contain more counts than there are identifiers in idSrc.
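The over-counting noted above is easy to reproduce on toy data. In this base-R sketch the GO terms T1 to T3, the slim terms S1 and S2, and their ancestor lists are all hypothetical; goSlim itself consults the GO hierarchy for the chosen ontology, and its exact column computations are not reproduced here.

```r
## Toy data (hypothetical): each GO term listed with itself and its ancestors.
ancestors <- list(T1 = c("T1", "S1"),
                  T2 = c("T2", "S1", "S2"),
                  T3 = c("T3", "S2"))
id2terms <- list(geneA = c("T1", "T2"),   # one identifier, two GO terms
                 geneB = "T3")
slim <- c("S1", "S2")

terms <- unique(unlist(id2terms, use.names = FALSE))   # unique GO terms implied
expanded <- lapply(terms, function(term) ancestors[[term]])
## for each slim term, count the expanded terms that fall at or below it
Count <- vapply(slim, function(s)
    sum(vapply(expanded, function(a) s %in% a, logical(1))), integer(1))
Percent <- 100 * Count / sum(Count)
data.frame(Count, Percent, row.names = slim)
```

Here two identifiers produce four counts in total, because T2 lies below both slim terms.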
goSlim(idSrc, slimCollection, ontology, ..., verbose=FALSE)
idSrc |
An argument determining the source of GO terms to be
mapped to slim terms. The source might be a |
slimCollection |
An argument containing the GO slim terms. |
ontology |
A character string naming the ontology to be consulted when identifying slim term hierarchies. One of ‘MF’ (molecular function), ‘BP’ (biological process), ‘CC’ (cellular component). |
... |
Additional arguments passed to specific methods. |
verbose |
Logical influencing whether messages (primarily missing GO terms arising during creation of the slim hierarchy) are reported. |
Classify idSrc GO terms into slimCollection categories. The hierarchy of terms included for each term is from the ontology (MF, BP, or CC) specified by ontology. verbose informs about, e.g., GO terms that are not found.
Determine the (unique) GO terms implied by feature names in idSrc (using the annotation map identified in annotation(idSrc)).
myIds <- c("GO:0016564", "GO:0003677", "GO:0004345", "GO:0008265",
           "GO:0003841", "GO:0030151", "GO:0006355", "GO:0009664",
           "GO:0006412", "GO:0015979", "GO:0006457", "GO:0005618",
           "GO:0005622", "GO:0005840", "GO:0015935", "GO:0000311")
myCollection <- GOCollection(myIds)
fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase")
slim <- getOBOCollection(fl)
goSlim(myCollection, slim, "MF")
data(sample.ExpressionSet)
goSlim(sample.ExpressionSet, slim, "MF", evidenceCode="TAS")
getBroadSets parses one or more XML files for gene sets. The file can reside locally or at a URL. The format followed is that defined by the Broad (below). toBroadXML creates Broad XML from BroadCollection gene sets.
toGmt converts GeneSetCollection objects to a character vector representing the gene set collection in GMT format. getGmt reads a GMT file or other character vector into a GeneSetCollection.
getBroadSets(uri, ..., membersId=c("MEMBERS_SYMBOLIZED", "MEMBERS_EZID"))
toBroadXML(geneSet, con, ...)
asBroadUri(name, base="http://www.broad.mit.edu/gsea/msigdb/cards")
getGmt(con, geneIdType=NullIdentifier(), collectionType=NullCollection(), sep="\t", ...)
toGmt(x, con, ...)
uri |
A file name or URL containing gene sets encoded following the Broad specification. For Broad sets, the uri can point to an MSIGDB. |
geneSet |
A |
x |
A |
con |
A (optional, in the case of |
name |
A character vector of Broad gene set names, e.g.,
|
base |
Base uri for finding Broad gene sets. |
geneIdType |
A constructor for the type of identifier the members
of the gene sets represent. See |
collectionType |
A constructor for the type of collection for the
gene sets. See |
sep |
The character string separating members of each gene set in the GMT file. |
... |
Further arguments passed to the underlying XML parser,
particularly |
membersId |
XML field name from which |
getBroadSets returns a GeneSetCollection of gene sets.
toBroadXML returns the XML of a single GeneSet as a character vector or, if con is provided, writes the XML to a file.
asBroadUri can be used to create URIs of Broad files (for use by getBroadSets).
getGmt returns a GeneSetCollection of gene sets.
toGmt returns character vectors where each line represents a gene set. If con is provided, the result is written to the specified connection.
Actual Broad XML files differ from the DTD (e.g., an implied ',' separator between genes in a set); we parse to and from files as they actually exist.
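The usual GMT layout puts one gene set per line: the set name, a description, then the member ids, separated by sep (a tab by default). A base-R round trip under that assumption, with hypothetical set names and descriptions; toGmt and getGmt are the package functions that perform this for GeneSetCollection objects.

```r
## Hypothetical sets and descriptions.
sets <- list(setA = c("TP53", "MDM2"), setB = c("EGFR", "KRAS", "BRAF"))
descr <- c(setA = "toy set A", setB = "toy set B")

## one tab-separated line per set: name, description, members...
gmt <- vapply(names(sets), function(nm)
    paste(c(nm, descr[[nm]], sets[[nm]]), collapse = "\t"),
    character(1), USE.NAMES = FALSE)

fl <- tempfile(fileext = ".gmt")
writeLines(gmt, fl)

## read back: split each line on tabs; fields 3..n are the members
fields <- strsplit(readLines(fl), "\t", fixed = TRUE)
back <- setNames(lapply(fields, `[`, -(1:2)), vapply(fields, `[`, "", 1L))
unlink(fl)
```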
Martin Morgan <[email protected]>
http://www.broad.mit.edu/gsea/
## 'fl' could also be a URI
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
gss <- getBroadSets(fl) # GeneSetCollection of 2 sets
names(gss)
gss[[1]]
## Not run:
## Download 'msigdb_v2.5.xml' or 'c3.all.v2.5.symbols.gmt' from the
## Broad, http://www.broad.mit.edu/gsea/downloads.jsp#msigdb, then
gsc <- getBroadSets("/path/to/msigdb_v2.5.xml")
types <- sapply(gsc, function(elt) bcCategory(collectionType(elt)))
c3gsc1 <- gsc[types == "c3"]
c3gsc2 <- getGmt("/path/to/c3.all.v2.5.symbols.gmt",
                 collectionType=BroadCollection(category="c3"),
                 geneIdType=SymbolIdentifier())
## End(Not run)
fl <- tempfile()
toBroadXML(gss[[1]], con=fl)
noquote(readLines(fl))
unlink(fl)
## Not run:
toBroadXML(gss[[1]]) # character vector
## End(Not run)
fl <- tempfile()
toGmt(gss, fl)
getGmt(fl)
unlink(fl)
An incidence matrix summarizes shared membership of gene identifiers across (pairs of) gene sets.
The return value is a matrix with rows representing gene sets and columns genes.
signature(x="GeneSet", ...)
signature(x="GeneColorSet", ...)
All additional arguments ... are of the same class as x. The incidence matrix contains elements 0 (genes not present) or 1 (genes present).
signature(x = "GeneSetCollection", ...)
Additional arguments ... can be of class GeneSetCollection or GeneSet. The incidence matrix contains elements 0 (genes not present) or 1 (genes present).
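An incidence matrix of this kind can be built directly in base R, which also shows how shared membership between pairs of sets follows from it. A sketch with hypothetical sets; incidence() is the package function that does this for GeneSet and GeneSetCollection objects.

```r
## Hypothetical gene sets.
sets <- list(set1 = c("g1", "g2", "g3"),
             set2 = c("g2", "g4"))
genes <- unique(unlist(sets, use.names = FALSE))

## rows = gene sets, columns = genes, 1 = present, 0 = absent
imat <- t(vapply(sets, function(s) as.integer(genes %in% s),
                 integer(length(genes))))
colnames(imat) <- genes
imat

## genes shared by each pair of sets (diagonal = set sizes)
shared <- tcrossprod(imat)
```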
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
gss <- getBroadSets(fl) # GeneSetCollection of 2 sets
## From one or more GeneSetCollections...
imat <- incidence(gss)
dim(imat)
imat[,c(1:3,ncol(imat)-3+1:3)]
## .. or GeneSets
imat1 <- incidence(gss[[1]], gss[[2]], gss[[1]])
imat1[,1:5]
These methods convert the gene identifiers of a gene set from one type to another, e.g., from EntrezIdentifier to AnnotationIdentifier. Methods can be called directly by the user; geneIdType<- provides similar functionality. verbose=TRUE produces warning messages when maps between identifier types are not 1:1, or a map has to be constructed on the fly (this situation does not apply when using the DBI-based annotation packages).
The following methods are defined on what="GeneSet":
This method warns of attempts to map from and to the same type, or generates an error if no suitable mapIdentifiers methods are available.
This method will re-dispatch to a method with signature signature(what=what, to=to, from=geneIdType(what)), and is present so that a user can call mapIdentifiers without providing an explicit from argument.
This maps a gene set from gene identifiers represented by the NullIdentifier type (i.e., no type associated with the genes) to gene identifiers represented by any class derived from GeneIdentifierType.
This maps a gene set from gene identifiers represented by any GeneIdentifierType type to one represented by the NullIdentifier (i.e., no type associated with the genes).
Maps identifiers found in what to the type described by to, using the map (key-value pairs) found in from.
Maps identifiers found in what to the type described by to, using the map (key-value pairs) found in from.
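The key-value mapping described above, including a check that the map is 1:1 of the kind verbose=TRUE reports on, can be sketched in base R. The probe-like and Entrez-like ids and the helper map_ids() are hypothetical; real maps come from annotation packages, and the package's warning text differs.

```r
## Hypothetical map: p1 and p2 share a target, p3 has two targets.
map <- data.frame(from = c("p1", "p2", "p3", "p3"),
                  to   = c("101", "101", "202", "303"),
                  stringsAsFactors = FALSE)

map_ids <- function(ids, map, verbose = FALSE) {
    hits <- map[map$from %in% ids, , drop = FALSE]
    ## warn when the relevant part of the map is not 1:1
    if (verbose && (anyDuplicated(hits$from) > 0 || anyDuplicated(hits$to) > 0))
        warning("map between identifier types is not 1:1")
    unique(hits$to)
}

map_ids(c("p1", "p2"), map)   # many-to-one
map_ids("p3", map)            # one-to-many
```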
The following methods are defined for what="GeneColorSet". These methods map gene and phenotype colors appropriately, and fail if the colorings of gene identifiers involved in many-to-one mappings conflict.
This method will re-dispatch to a method with signature signature(what=what, to=to, from=geneIdType(what)), and is present so that a user can call mapIdentifiers without providing an explicit from argument.
This maps a gene set from gene identifiers represented by the NullIdentifier type (i.e., no type associated with the genes) to gene identifiers represented by any class derived from GeneIdentifierType.
This maps a gene set from gene identifiers represented by any GeneIdentifierType type to one represented by the NullIdentifier (i.e., no type associated with the genes).
This method is not implemented, and exists to stop incorrect application of the GeneSet method.
This method is not implemented, and exists to stop incorrect application of the GeneSet method.
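The GeneColorSet conflict rule (the colors of several source ids that map to one target must agree) can be sketched in base R. The helper map_colors() and its toy inputs are hypothetical and do not reflect GeneColorSet internals.

```r
## Hypothetical helper: 'color' is named by source id, 'to_id' maps
## source ids to target ids.
map_colors <- function(color, to_id) {
    by_target <- split(unname(color), unname(to_id[names(color)]))
    conflict <- vapply(by_target, function(cl) length(unique(cl)) > 1L,
                       logical(1))
    if (any(conflict))
        stop("conflicting colors for: ",
             paste(names(by_target)[conflict], collapse = ", "))
    vapply(by_target, `[`, character(1), 1L)   # one color per target id
}

to_id <- c(p1 = "101", p2 = "101", p3 = "202")           # p1 and p2 -> 101
map_colors(c(p1 = "up", p2 = "up", p3 = "down"), to_id)  # colors agree
try(map_colors(c(p1 = "up", p2 = "down"), to_id))        # conflict: error
```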
A method exists for what="GeneSetCollection":
Map each gene set in what to gene identifier type to, using methods described above.
OBOCollection extends the GOCollection class, and is usually constructed from a file formatted following the OBO file format. See CollectionType for general use of collections with gene sets.
Objects are instantiated with calls to OBOCollection or getOBOCollection.
OBOCollection extends GOCollection and has the following additional slots (these slots are NOT meant to be manipulated directly by the user):
.stanza: A data.frame representing the stanzas present in an OBO file. Row names of the data frame are unique stanza identifiers. The value column contains the stanza name (e.g., ‘Term’, i.e., the stanza name associated with a GO identifier).
.subset: A data.frame representing (optional) subsets defined in the collection. Subsets are defined in the header of an OBO file with a subsetdef tag. Row names of the data frame are the subsetdef names; the value column contains the subset definition.
.kv: A data.frame representing key-value pairs in the OBO source file. The row names of the data frame correspond to lines in the OBO file. The stanza_id column indexes the row of .stanza describing the stanza in which the key-value pair occurred. The remaining columns (key, value) contain the parsed key and value.
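This base-R sketch shows how the three slots relate. It builds toy tables with the documented columns and row names, then follows stanza_id from .kv back to .stanza. The table contents (term ids, names, descriptions, line numbers) are invented for illustration. The sketch also assumes that stanza_id holds the row name of the matching .stanza row. Real tables are produced by getOBOCollection(), and the actual slots should be left alone.

```r
# Toy stand-in for .stanza: row names are unique stanza identifiers,
# 'value' holds the stanza name
stanza_tbl <- data.frame(
    value = c("Term", "Term"),
    row.names = c("GO:0008150", "GO:0009987"),
    stringsAsFactors = FALSE)

# Toy stand-in for .subset: row names are subsetdef names,
# 'value' holds the (invented) subset definition
subset_tbl <- data.frame(
    value = c("Plant GO slim", "GOA and proteome slim"),
    row.names = c("goslim_plant", "goslim_goa"),
    stringsAsFactors = FALSE)

# Toy stand-in for .kv: row names play the role of OBO file line numbers;
# 'stanza_id' is assumed here to hold a row name of stanza_tbl
kv_tbl <- data.frame(
    stanza_id = c("GO:0008150", "GO:0008150", "GO:0009987", "GO:0009987"),
    key       = c("name", "subset", "name", "is_a"),
    value     = c("biological_process", "goslim_plant",
                  "cellular process", "GO:0008150"),
    row.names = c("101", "103", "110", "113"),
    stringsAsFactors = FALSE)

# Follow stanza_id back to the stanza name for every key-value pair
kv_tbl$stanza_name <- stanza_tbl[kv_tbl$stanza_id, "value"]

# Stanza ids tagged with the 'goslim_plant' subset
plant_ids <- kv_tbl$stanza_id[kv_tbl$key == "subset" &
                              kv_tbl$value == "goslim_plant"]

# is_a relations as a child -> parent edge list
is_a <- kv_tbl[kv_tbl$key == "is_a", c("stanza_id", "value")]
```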
OBOCollection has the following methods, in addition to those inherited from GOCollection.
These methods list and select subsets of OBOCollection:
signature(object="OBOCollection", display="named"): return a character vector of subsets present in object. Valid values for display are ‘named’ (a named character vector, with names equal to the names of the subsets and values the descriptions), ‘full’ (a character vector of name and description, with each pair formatted into a single entry as “name (description)”), ‘key’ (subset names), or ‘value’ (subset descriptions).
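The four display values differ only in how the subset names and descriptions are packaged. This base-R sketch reproduces the documented shapes from a toy table laid out like the .subset slot. show_subsets() is a hypothetical helper written for this illustration, not part of GSEABase, and the descriptions are invented.

```r
# Hypothetical helper mirroring the four documented 'display' modes
show_subsets <- function(subset_tbl,
                         display = c("named", "full", "key", "value")) {
    display <- match.arg(display)
    key <- rownames(subset_tbl)    # subset names
    value <- subset_tbl$value      # subset descriptions
    switch(display,
           named = setNames(value, key),
           full  = sprintf("%s (%s)", key, value),
           key   = key,
           value = value)
}

# Toy table shaped like the .subset slot
subset_tbl <- data.frame(
    value = c("Plant GO slim", "GOA and proteome slim"),
    row.names = c("goslim_plant", "goslim_goa"),
    stringsAsFactors = FALSE)

show_subsets(subset_tbl)          # named character vector
show_subsets(subset_tbl, "full")  # one "name (description)" entry per subset
```

On a real collection, the corresponding calls are subsets(obo) and subsets(obo, display="full").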
signature(object="OBOCollection", i="character", j="missing", ...): return an OBOCollection by selecting just those subsets whose name matches the string(s) in i. This method calls the [,GOCollection method so, e.g., evidenceCode can be used to restrict which evidence codes the collection will identify.
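Here is a short sketch of subset selection, assuming GSEABase is installed. It uses the OBO file shipped with the package, which is the same file used in the examples. The sketch compares the number of ids before and after selecting goslim_plant. It also shows the evidence code recorded when evidenceCode is supplied.

```r
library(GSEABase)

fl <- system.file("extdata", "goslim_plant.obo", package = "GSEABase")
obo <- getOBOCollection(fl)

# Keep only the terms in the subset named "goslim_plant"
plant <- obo["goslim_plant"]

# Same selection, also restricting evidence codes (handled by [,GOCollection)
plant_tas <- obo["goslim_plant", evidenceCode = "TAS"]

length(ids(obo))
length(ids(plant))
evidenceCode(plant_tas)
```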
These methods coerce to and from OBOCollection:
signature(object="OBOCollection", "graphNEL"): create a directed graph with nodes generated from ids(object) and edges from is_a relations of object.
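Here is a sketch of inspecting the coerced graph, assuming GSEABase and its graph dependency are installed. numNodes(), numEdges(), nodes() and edgemode() are accessors from the graph package. The goslim_goa subset is the one used in the examples.

```r
library(GSEABase)
library(graph)  # graphNEL accessors: nodes(), numNodes(), numEdges(), edgemode()

fl <- system.file("extdata", "goslim_plant.obo", package = "GSEABase")
obo <- getOBOCollection(fl)
goa <- obo["goslim_goa"]

# Nodes come from ids(goa); edges from the is_a relations
g <- as(goa, "graphNEL")

numNodes(g)
numEdges(g)
edgemode(g)
```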
signature(object="graphNEL", "OBOCollection"): create an OBOCollection with ids from the graph nodes, and edges from inNodes(object).
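The reverse coercion can be sketched as a round trip under the same assumptions. A subset is converted to a graphNEL and back, and the ids of the rebuilt collection are compared with the graph's nodes.

```r
library(GSEABase)
library(graph)  # nodes(), numNodes()

fl <- system.file("extdata", "goslim_plant.obo", package = "GSEABase")
g <- as(getOBOCollection(fl)["goslim_goa"], "graphNEL")

# Rebuild a collection: ids from the graph nodes, relations from inNodes(g)
obo2 <- as(g, "OBOCollection")

obo2
length(ids(obo2)) == numNodes(g)
```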
Martin Morgan
See http://www.geneontology.org for details of the OBO format.
See also: OBOCollection constructor; CollectionType classes.
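library(GSEABase)
fl <- system.file("extdata", "goslim_plant.obo", package="GSEABase")
obo <- getOBOCollection(fl)                  # parse the OBO file
obo
subsets(obo)                                 # named vector of subset names and descriptions
obo["goslim_plant", evidenceCode="TAS"]      # select a subset, restricting evidence codes
g <- as(obo["goslim_goa"], "graphNEL")       # coerce a subset to a directed graph
if (interactive() && require("Rgraphviz")) {
    plot(g)                                  # draw the graph when Rgraphviz is available
}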