Title: | Category Analysis |
---|---|
Description: | A collection of tools for performing category (gene set enrichment) analysis. |
Authors: | Robert Gentleman [aut], Seth Falcon [ctb], Deepayan Sarkar [ctb], Robert Castelo [ctb], Bioconductor Package Maintainer [cre] |
Maintainer: | Bioconductor Package Maintainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.73.0 |
Built: | 2024-12-29 04:08:50 UTC |
Source: | https://github.com/bioc/Category |
For each category, apply the function FUN to the set of values of stats belonging to that category.
applyByCategory(stats, Amat, FUN = mean, ...)
stats |
Numeric vector with test statistics of interest. |
Amat |
A logical or numeric matrix: the adjacency matrix of the bipartite genes - category graph. Its rows correspond to the categories, columns to the genes, and |
FUN |
A function to apply to the subsets |
... |
Extra parameters passed to |
For GO categories, the function cateGOry might be useful for the construction of Amat.
The return value is a list or vector of length equal to the number of categories. Each element corresponds to the values obtained by applying FUN to the subset of values in stats according to the category defined for that row.
R. Gentleman, contributions from W. Huber
set.seed(0xabcd)
st = rnorm(20)
names(st) = paste("gene", 1:20)
a = matrix(sample(c(FALSE, TRUE), 60, replace=TRUE), nrow=3,
  dimnames = list(paste("category", LETTERS[1:3]), names(st)))
applyByCategory(st, a, median)
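The per-category computation described for applyByCategory can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name apply_by_category_sketch and the toy data are hypothetical.

```r
# Hypothetical helper: apply FUN to the stats of each category's member genes.
apply_by_category_sketch <- function(stats, Amat, FUN = mean, ...) {
  stopifnot(is.matrix(Amat), ncol(Amat) == length(stats))
  member <- Amat != 0   # treat non-zero numeric entries as membership flags
  res <- lapply(seq_len(nrow(member)),
                function(i) FUN(stats[member[i, ]], ...))
  names(res) <- rownames(Amat)
  # Simplify to a vector only when every category yields a single value.
  if (all(lengths(res) == 1L)) unlist(res) else res
}

# Toy data (assumed for illustration): 4 genes, 2 categories.
toy_stats <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 10)
toy_Amat <- rbind(A = c(TRUE, TRUE, FALSE, FALSE),
                  B = c(FALSE, FALSE, TRUE, TRUE))
colnames(toy_Amat) <- names(toy_stats)

toy_means  <- apply_by_category_sketch(toy_stats, toy_Amat, mean)
toy_ranges <- apply_by_category_sketch(toy_stats, toy_Amat, range)
```

With a summary function such as mean the sketch returns a named vector; with a function that returns several values per category, such as range, it returns a list.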
The function constructs a category membership matrix, such as used by applyByCategory, from a list of gene identifiers and their annotated GO categories. For each of the GO categories stated in categ, all less specific terms (ancestors) are also included. One therefore needs only the most specific set of GO term mappings, which can be obtained from Bioconductor annotation packages or via biomaRt. The ancestor relationships are obtained from the GO.db package.
cateGOry(x, categ, sparse=FALSE)
x |
Character vector with (arbitrary) gene identifiers. They will be used for the column names of the resulting matrix. |
categ |
A character vector of the same length as |
sparse |
Logical. If |
The function requires the GO.db package.
For subsequent analyses, it is often useful to remove categories that have only a small number of members. Use the normal matrix subsetting syntax for this, see example.
If a GO category in categ is not found in the GO annotation package, a warning will be generated, and no ancestors for that GO category are added (but that category itself will be part of the returned adjacency matrix).
The adjacency matrix of the bipartite category membership graph, rows are categories and columns genes.
Wolfgang Huber
g = cateGOry(c("CG2671", "CG2671", "CG2950"),
  c("GO:0090079", "GO:0001738", "GO:0003676"), sparse=TRUE)
g
rowSums(g) ## number of genes in each category
## Filter out categories with fewer than minMemb or more than maxMemb members.
## This is toy data; in real applications, a choice of minMemb higher
## than 2 will be more appropriate.
filter = function(x, minMemb = 2, maxMemb = 35) ((x>=minMemb) & (x<=maxMemb))
g[filter(rowSums(g)),,drop=FALSE ]
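The ancestor expansion performed by cateGOry can be illustrated without GO.db. The following is a rough sketch under stated assumptions: the ontology toy_ancestors, the term and gene names, and the helper build_membership_sketch are all hypothetical, and the real function obtains ancestors from the GO annotation data instead.

```r
# Hypothetical toy ontology: each term maps to all of its ancestors.
toy_ancestors <- list(t3 = c("t2", "t1"), t2 = "t1", t1 = character(0))

# Hypothetical helper: rows are categories, columns are genes.
build_membership_sketch <- function(x, categ, ancestors) {
  stopifnot(length(x) == length(categ))
  # Expand each annotated term to itself plus its ancestors.
  # A term missing from 'ancestors' yields NULL, so only the term itself is kept.
  expanded <- lapply(categ, function(term) unique(c(term, ancestors[[term]])))
  genes <- unique(x)
  terms <- sort(unique(unlist(expanded)))
  m <- matrix(FALSE, nrow = length(terms), ncol = length(genes),
              dimnames = list(terms, genes))
  for (i in seq_along(x)) m[expanded[[i]], x[i]] <- TRUE
  m
}

toy_m <- build_membership_sketch(c("gA", "gB"), c("t3", "t2"), toy_ancestors)
toy_sizes <- rowSums(toy_m)                       # genes per category
toy_kept <- toy_m[toy_sizes >= 2, , drop = FALSE] # drop small categories
```

The last line uses ordinary matrix subsetting to remove categories with few members, which is the filtering step recommended above.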
The functions or variables listed here are no longer part of the Category package.
condGeneIdUniverse() isConditional() geneGoHyperGeoTest() geneKeggHyperGeoTest() cb_parse_band_hsa() chrBandInciMat()
Return a list mapping category ids to the Entrez Gene ids annotated at the category id. Only those category ids that have at least one annotation in the set of Entrez Gene ids specified by the geneIds slot of p are included.
categoryToEntrezBuilder(p)
p |
A subclass of |
End users should not call this directly. This method gets called from hyperGTest. To add support for a new category, a new method for this generic must be defined. Its signature should match a subclass of HyperGParams-class appropriate for the new category.
A list mapping category ids to Entrez Gene identifiers.
S. Falcon
For each chromosome band identifier in chrVect, cb_contingency builds and performs a test on a 2 x k contingency table for the genes from selids found in the child bands of the given chrVect element.
cb_sigBands extracts the chromosome band identifiers that were in a contingency table that tested significant given the specified p-value cutoff.
cb_children returns the child bands of a given band in the chromosome band graph. The argument must have length equal to one.
cb_contingency(selids, chrVect, chrGraph, testFun = chisq.test, min.expected = 5L, min.k = 1L)
cb_sigBands(b, p.value = 0.01)
cb_children(n, chrGraph)
selids |
A vector of the selected gene identifiers (usually Entrez IDs). |
chrVect |
A character vector of chromosome band identifiers |
chrGraph |
A |
testFun |
The function to use for testing the 2 x k contingency tables. The default is |
min.expected |
A numeric value specifying the minimum expected count for columns to be included in the contingency table. The expected count is |
min.k |
An integer giving the minimum number of chromosome bands that must be present in a contingency table in order to proceed with testing. |
b |
A list as returned by |
p.value |
A p-value cutoff to use in selecting significant contingency tables. |
n |
A length one character vector specifying a chromosome band annotation. Bands not found in |
cb_sigBands assumes that the p-value associated with a result of testFun can be accessed as testFun(t)$p.value. We should improve this to be a method call which can then be specialized based on the class of the object returned by testFun.
cb_contingency returns a list with an element for each test performed. This will most often be shorter than length(chrVect) due to skipped tests based on min.expected and min.k. Each element of the returned list is itself a list with components:
table |
A 2 x k contingency table |
result |
The output of |
cb_sigBands returns a character vector of chromosome band identifiers that are in one of the contingency tables that had a p-value less than the cutoff specified by p.value.
Seth Falcon
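The convention that a test result exposes its p-value as testFun(t)$p.value can be seen with the default chisq.test on a hand-made 2 x k table. This is only a sketch: the counts and band labels below are invented for illustration and are not produced by cb_contingency.

```r
# Hypothetical 2 x 3 table: selected vs. non-selected genes in three child bands.
toy_tab <- matrix(c(12,  5,  9,
                    88, 95, 91),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("selected", "not_selected"),
                                  c("bandA", "bandB", "bandC")))

toy_test <- chisq.test(toy_tab)

# Smallest expected count; a threshold like min.expected = 5 would be compared to this.
toy_min_expected <- min(toy_test$expected)

# The p-value is read from the 'p.value' component of the test result.
toy_p <- toy_test$p.value
toy_significant <- toy_p < 0.01
```

Any replacement for testFun would need to return an object with a p.value component for this access pattern to work.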
This function parses chromosome band annotations as found in the <chip>MAP map of Bioconductor annotation data packages. The return value is a vector of parent bands up to the relevant chromosome.
cb_parse_band_Hs(x)
x |
A chromosome band annotation given as a string. |
The former function cb_parse_band_hsa is now deprecated.
A character vector giving the path to the relevant chromosome.
Seth Falcon
cb_parse_band_Hs("12q32.12")
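The idea of a path of parent bands up to the chromosome can be illustrated with a small base R parser. This is a sketch under stated assumptions, not the package's parser: the helper band_path_sketch and its regular expression are hypothetical, and the exact vector returned by cb_parse_band_Hs may differ in content or order.

```r
# Hypothetical helper: list a human band and its ancestors, chromosome first.
band_path_sketch <- function(x) {
  m <- regmatches(x, regexec("^([0-9]+|X|Y)([pq])([0-9.]*)$", x))[[1]]
  if (length(m) == 0) stop("unrecognised band: ", x)
  chr <- m[2]; arm <- m[3]; sub <- m[4]
  acc <- paste0(chr, arm)
  path <- c(chr, acc)                  # chromosome, then chromosome arm
  for (d in strsplit(sub, "")[[1]]) {  # extend one character at a time
    acc <- paste0(acc, d)
    if (d != ".") path <- c(path, acc) # a trailing "." is not a band name
  }
  path
}

toy_path <- band_path_sketch("12q32.12")
```

Each element of toy_path is a prefix of the next, which mirrors the parent-to-child nesting of band names.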
This function parses chromosome band annotations as found in the <chip>MAP map of Bioconductor annotation data packages. The return value is a vector of parent bands up to the relevant chromosome.
cb_parse_band_Mm(x)
x |
A chromosome band annotation given as a string. |
A character vector giving the path to the relevant chromosome.
Seth Falcon & Nolwenn Le Meur
cb_parse_band_Mm("10 B3")
cb_test is a flexible tool for discovering interesting chromosome bands relative to a selected gene list. The function supports local and global tests which can be carried out in a top down or bottom up fashion on the tree of chromosome bands.
cb_test(selids, chrtree, level, dir = c("up", "down"), type = c("local", "global"), next.pval = 0.05, cond.pval = 0.05, conditional = FALSE)
selids |
A vector of gene IDs. The IDs should match those used to annotate the |
chrtree |
A |
level |
An integer giving the level of the chromosome band tree at which testing should begin. The level is conceptualized as the set of nodes with a given path length to the root (organism) node of the chromosome band tree. So level 1 is the chromosome and level 2 is the chromosome arms. You can get a better sense by calling |
dir |
A string giving the direction in which the chromosome band tree will be traversed when carrying out the tests. A bottom up traversal, from leaves to root, is specified by |
type |
A string giving the type of test to perform. The current choices are |
next.pval |
The p-value cutoff used to determine whether the parents or children of a node should be tested. After testing a given level of the tree, the decision of whether or not to continue testing the children (or parents) of the already tested nodes is made by comparing the p-value result for a given node with this cutoff; relatives of nodes with values strictly greater than the cutoff are skipped. |
cond.pval |
The p-value cutoff used to determine whether a node is significant during a conditional test. See |
conditional |
A logical value. Can only be used when |
A list with an element for each level of the tree that was tested. Note that the first element will correspond to the level given by level and that subsequent elements will be the next or previous depending on dir.
Each level element is itself a list consisting of a result list for each node or set of nodes tested. These inner-most lists will have, at least, the following components:
nodes |
A character vector of the nodes involved in the test. |
p.value |
The p-value for the test |
observed |
The contingency table |
method |
A brief description of the test method |
Seth Falcon
This class represents chromosome band annotation data for a given experiment. The class is responsible for storing the mapping of band to set of gene IDs located within that band as well as for representing the tree structured relationship among the bands.
Objects should be created using NewChrBandTree or ChrBandTreeFromGraph.
toParentGraph: Object of class "graph" representing the tree of chromosome bands. Edges in this directed graph go from child to parent.
toChildGraph: Object of class "graph". This is the same as toParentGraph, but with the edge directions reversed. This is not an ideal implementation due to the duplication of data, but it provides quick access to parents or children of a given node.
root: Object of class "character" giving the name of the root node. The convention is to use "ORGANISM:<organism>".
level2nodes: Object of class "list" providing a mapping of levels in the tree to the set of nodes at that level. Level X is defined as the set of nodes with a path length of X from the root node.
Return a vector of gene IDs representing the gene universe for this ChrBandTree.
Return a list with an element for each element of the character vector n. Each element is a character vector of node names of the children of the named element.
Return a vector of gene IDs for a single band.
Return a list of vectors of gene IDs when given more than one band. The "l" prefix is for list.
Return the parents of the specified bands. See childrenOf for a description of the structure of the return value.
Return an integer vector identifying the levels of the tree.
Return the nodes in the tree that are at the level specified by level. The level argument can be either numeric or character, but should match a level returned by treeLevels.
Not all known chromosome bands will be represented in a given instance. The set of bands that will be present is determined by the available annotation data and the specified gene universe. The annotation source maps genes to their most specific band. Such bands and all bands on the path to the root will be represented in the resulting tree.
Currently there is only support for human and mouse data.
S. Falcon
library("hgu95av2.db")
set.seed(0xfeee)
univ = NULL ## use all Entrez Gene IDs on the chip (not recommended)
ct = NewChrBandTree("hgu95av2.db", univ)
length(allGeneIds(ct))
exampleLevels(ct)
geneIds(ct, "10p11")
lgeneIds(ct, "10p11")
lgeneIds(ct, c("10p11", "Yq11.22"))
pp = parentOf(ct, c("10p11", "Yq11.22"))
childrenOf(ct, unlist(pp))
treeLevels(ct)
level2nodes(ct, 0)
level2nodes(ct, 0L)
level2nodes(ct, "0")
level2nodes(ct, 1)
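The notion of a level as the set of nodes at a given path length from the root can be sketched without any annotation package. Everything below is an assumption made for illustration: the toy tree, the root name "ORGANISM:Toy" and the helper depth_sketch are hypothetical and do not reflect how ChrBandTree stores its data.

```r
# Hypothetical toy band tree stored as a child -> parent lookup.
toy_root <- "ORGANISM:Toy"
toy_parent <- c("1" = toy_root, "1p" = "1", "1q" = "1", "1p3" = "1p")

# Path length from a node to the root, following parent links.
depth_sketch <- function(node) {
  d <- 0L
  while (node != toy_root) {
    node <- toy_parent[[node]]
    d <- d + 1L
  }
  d
}

toy_nodes <- c(toy_root, names(toy_parent))
toy_depths <- vapply(toy_nodes, depth_sketch, integer(1))

# Group nodes by depth; list names are the levels as character strings.
toy_level2nodes <- split(toy_nodes, toy_depths)
```

Because the list names are strings, a level can be looked up with a character key such as toy_level2nodes[["2"]], which is analogous to level2nodes accepting either a numeric or a character level.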
This class encapsulates parameters needed for Hypergeometric testing of over or under representation of chromosome bands among a selected gene list using hyperGTest.
Objects can be created by calls of the form new("ChrMapHyperGParams", ...).
chrGraph: Object of class "graph". The nodes are the chromosome bands and the edges describe the tree structure of the bands. Each node has a "geneIds" node attribute (see nodeData) which contains a vector of gene IDs annotated at the given band.
conditional: Object of class "logical", indicating whether the test performed should be a conditional test.
geneIds: Object of class "ANY": A vector of gene identifiers. Numeric and character vectors are probably the only things that make sense. These are the gene ids for the selected gene set.
universeGeneIds: Object of class "ANY": A vector of gene ids in the same format as geneIds defining a subset of the gene ids on the chip that will be used as the universe for the hypergeometric calculation. If this is NULL or has length zero, then all gene ids on the chip will be used.
annotation: A string giving the name of the annotation data package for the chip used to generate the data.
categorySubsetIds: Object of class "ANY": If the test method supports it, can be used to specify a subset of category ids to include in the test instead of all possible category ids.
categoryName: A string describing the category. Usually set automatically by subclasses. For example "GO".
pvalueCutoff: The p-value to use as a cutoff for significance for testing methods that require it. This value will also be passed on to the result instance and used for display and counting of significant results. The default is 0.01.
testDirection: A string indicating whether the test should be for overrepresentation ("over") or underrepresentation ("under").
datPkg: Object of class "DatPkg" used to assist with dispatch based on type of annotation data available.
Class "HyperGParams", directly.
No methods defined with class "ChrMapHyperGParams" in the signature.
Seth Falcon
showClass("ChrMapHyperGParams")
This class represents the results of a Hypergeometric test for over-representation of genes in a selected gene list in the chromosome band annotation. The hyperGTest function returns an instance of ChrMapHyperGResult when given a parameter object of class ChrMapHyperGParams. For details on accessing the results, see HyperGResult-accessors.
Objects can be created by calls of the form new("ChrMapHyperGResult", ...).
pvalue.order: Object of class "integer" that gives the order of the p-values.
conditional: Object of class "logical" is a flag indicating whether or not this result is from a conditional analysis.
chrGraph: Object of class "graph". The nodes are the chromosome bands with edges representing the tree structure of the bands. Each node has a "geneIds" attribute that gives the gene IDs annotated at that band.
annotation: A string giving the name of the chip annotation data package used.
geneIds: Object of class "ANY": the input vector of gene identifiers intersected with the universe of gene identifiers used in the computation. The class of this slot is specified as "ANY" because gene IDs may be integer or character vectors depending on the annotation package.
testName: A string identifying the testing method used.
pvalueCutoff: Numeric value used as a p-value cutoff. Used by the show method to count the number of significant terms.
testDirection: Object of class "character" indicating whether the test was for over-representation ("over") or under-representation ("under").
Class "HyperGResultBase", directly.
Seth Falcon
showClass("ChrMapHyperGResult")
## For details on accessing the results:
## help("HyperGResult-accessors")
This class encapsulates parameters needed for testing systematic variations in some gene-level statistic by chromosome bands using linearMTest.
Objects can be created by calls of the form new("ChrMapLinearMParams", ...).
graph: Object of class "graph". The nodes are the chromosome bands and the edges describe the tree structure of the bands. Each node has a "geneIds" node attribute (see nodeData) which contains a vector of gene IDs annotated at the given band.
conditional: Object of class "logical", indicating whether the test performed should be a conditional test.
gsc: The GeneSetCollection object grouping the gene ids into sets.
geneStats: Named vector of class "numeric", giving the gene-level statistics to be used in the tests.
universeGeneIds: Object of class "ANY": A vector of gene ids defining a subset of the gene ids on the chip that will be used as the universe for the hypergeometric calculation. If this is NULL or has length zero, then all gene ids on the chip will be used.
annotation: A string giving the name of the annotation data package for the chip used to generate the data.
datPkg: Object of class "DatPkg" used to assist with dispatch based on type of annotation data available.
categorySubsetIds: Object of class "ANY": If the test method supports it, can be used to specify a subset of category ids to include in the test instead of all possible category ids.
categoryName: A string describing the category. Usually set automatically by subclasses. For example "GO".
pvalueCutoff: The p-value to use as a cutoff for significance for testing methods that require it. This value will also be passed on to the result instance and used for display and counting of significant results. The default is 0.01.
minSize: An integer giving a minimum size for a gene set for it to be tested. The default is 5.
testDirection: A string indicating whether the test should test for systematic increase ("up") or decrease ("down") in the geneStats values within a gene set relative to the remaining genes.
Class "LinearMParams", directly.
Deepayan Sarkar
showClass("ChrMapLinearMParams")
This class represents the results of a linear model-based test for systematic changes in a per-gene statistic by chromosome band annotation. The linearMTest function returns an instance of ChrMapLinearMResult when given a parameter object of class ChrMapLinearMParams. Most slots can be queried using accessors.
Objects can be created by calls of the form new("ChrMapLinearMResult", ...), but are more commonly created by calling linearMTest.
pvalues: Object of class "numeric", with the p-values for each term.
pvalue.order: Object of class "integer", the order vector (increasing) for the p-values.
effectSize: Object of class "numeric", with the effect size for each term.
annotation: Object of class "character"
geneIds: Object of class "ANY"
testName: Object of class "character"
pvalueCutoff: Object of class "numeric"
minSize: Object of class "integer"
testDirection: Object of class "character"
conditional: Object of class "logical"
graph: Object of class "graph"
gsc: Object of class "GeneSetCollection"
Class "LinearMResult", directly.
Class "LinearMResultBase", by class "LinearMResult", distance 2.
None
Deepayan Sarkar, Michael Lawrence
linearMTest, ChrMapLinearMParams, LinearMResult, LinearMResultBase
showClass("ChrMapLinearMResult")
DatPkg is a VIRTUAL class for representing annotation data packages.
AffyDatPkg is a subclass of DatPkg used to represent standard annotation data packages that follow the format of Affymetrix expression array annotation.
YeastDatPkg is a subclass of DatPkg used to represent the annotation data packages for yeast. The yeast chip packages are based on sgd and are internally different from the AffyDatPkg conforming packages.
ArabidopsisDatPkg is a subclass of DatPkg used to represent the annotation packages for Arabidopsis. These packages are internally slightly different from the AffyDatPkg conforming packages.
Org.XX.egDatPkg is a subclass of DatPkg used to represent the org.*.eg.db organism-level Entrez Gene based annotation data packages.
OBOCollectionDatPkg is a subclass of DatPkg used to represent the OBO based annotation data packages.
GeneSetCollectionDatPkg is a subclass of DatPkg used to represent annotations in the form of GeneSetCollection objects which are not based on any annotation packages but are instead derived from custom (user supplied) annotations.
These methods have been extended to accommodate uninstalled annotation objects, primarily those available from the AnnotationHub package. See below for an example.
A virtual Class: No objects may be created from it.
Given the name of an annotation data package, DatPkgFactory can be used to create an appropriate DatPkg subclass.
A string giving the name of the annotation data package.
The underlying AnnotationDbi database object.
Boolean. Distinguishes between conventional installed annotation packages, and those from AnnotationHub.
See showMethods(classes="DatPkg").
The set of methods ID2EntrezID maps between the standard IDs for an organism or chip and Entrez IDs, typically to give a way to get the GO terms. Different organisms, such as S. cerevisiae and A. thaliana, have their own internal IDs, so we need specialized methods for them.
Seth Falcon
DatPkgFactory("hgu95av2")
## Not run:
DatPkgFactory("org.Sc.sgd")
DatPkgFactory("org.Hs.eg.db")
DatPkgFactory("ag")
library(AnnotationHub)
hub <- AnnotationHub()
## get an OrgDb for Atlantic salmon
query(hub, c("salmo salar","orgdb"))
salmodb <- hub[["AH58003"]]
DatPkgFactory(salmodb)
## End(Not run)
This function extracts estimated effect sizes from the results of a linear model-based gene-set / category enrichment test.
effectSize(r)
r |
The results of the test |
A numeric vector.
Deepayan Sarkar
LinearMResult
The "levels" of a chromosome band tree represented by a ChrBandTree object are the sets of nodes with a given path length to the root node. This function displays the available levels along with an example node from each level.
exampleLevels(g)
g |
A |
A list with an element for each level. The names of the list are the levels. Each element is an example of a node from the given level.
S. Falcon
For a given incidence matrix, Amat, compute some per category statistics.
findAMstats(Amat, tstats)
Amat |
An incidence matrix, with categories as the rows and probes as the columns. |
tstats |
A vector of per probe test statistics (should be the same length as |
Simple summary statistics are computed, such as the row sums and the vector of per category sums of the test statistics, tstats.
A list with components,
eDE |
per category sums of the test statistics |
lens |
row sums of |
R. Gentleman
ts = rnorm(100)
Am = matrix(sample(c(0,1), 1000, replace=TRUE), ncol=100)
findAMstats(Am, ts)
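The two summaries described for findAMstats, the per category sums of the test statistics and the row sums of the incidence matrix, amount to one matrix product and one rowSums call. The helper below is a hypothetical base R sketch of that arithmetic, not the package's code, and the tiny inputs are invented so the result can be checked by hand.

```r
# Hypothetical helper mirroring the documented components 'eDE' and 'lens'.
find_am_stats_sketch <- function(Amat, tstats) {
  stopifnot(ncol(Amat) == length(tstats))
  list(eDE  = as.vector(Amat %*% tstats),  # per category sums of tstats
       lens = rowSums(Amat))               # number of probes per category
}

toy_Am <- rbind(c(1, 0, 1),
                c(0, 1, 1))
toy_ts <- c(2, 3, 5)
toy_res <- find_am_stats_sketch(toy_Am, toy_ts)
```

Here the first category contains probes 1 and 3, so its sum is 2 + 5; the second contains probes 2 and 3, giving 3 + 5.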
Given a KEGG pathway ID this function returns the character name of the pathway.
getPathNames(iPW, organism = "hsa")
iPW |
A vector of KEGG pathway IDs. |
organism |
A single character vector of the organism identifier, e.g., "hsa" |
This function simply does a look up in KEGGPATHID2NAME and returns a list of the pathway names.
A possible extension would be to make it work with the cMAP library as well.
A list of pathway names.
R. Gentleman
nms = "00031" getPathNames(nms)
A parameter class for representing all parameters needed for running the hyperGTest method with one of the GO ontologies (BP, CC, MF) as the category.
Objects can be created by calls of the form new("GOHyperGParams", ...).
ontology: A string specifying the GO ontology to use. Must be one of "BP", "CC", or "MF".
conditional: A logical indicating whether the calculation should condition on the GO structure.
geneIds: Object of class "ANY": A vector of gene identifiers. Numeric and character vectors are probably the only things that make sense. These are the gene ids for the selected gene set.
universeGeneIds: Object of class "ANY": A vector of gene ids in the same format as geneIds defining a subset of the gene ids on the chip that will be used as the universe for the hypergeometric calculation. If this is NULL or has length zero, then all gene ids on the chip will be used.
annotation: A string giving the name of the annotation data package for the chip used to generate the data.
categorySubsetIds: Object of class "ANY": If the test method supports it, can be used to specify a subset of category ids to include in the test instead of all possible category ids.
categoryName: A string describing the category. Usually set automatically by subclasses. For example "GO".
datPkg: Holds a DatPkg object which is of a particular type that in turn varies with the kind of annotation package this is.
pvalueCutoff: A numeric value between zero and one used as a p-value cutoff for p-values generated by the Hypergeometric test. When the test being performed is non-conditional, this is only used as a default value for printing and summarizing the results. For a conditional analysis, the cutoff is used during the computation to perform the conditioning: child terms with a p-value less than pvalueCutoff are conditioned out of the test for their parent term.
orCutoff: A numeric value used as an odds-ratio cutoff for odds ratios generated by the conditional Hypergeometric test. For such a test, it works like the pvalueCutoff but applied on the odds ratio. It has no effect when conditional=FALSE.
minSizeCutoff: A numeric value used as a cutoff for minimum size of the gene sets being tested with the conditional Hypergeometric test. For such a test, it works like the pvalueCutoff but applied on the gene set size. It has no effect when conditional=FALSE.
maxSizeCutoff: A numeric value used as a cutoff for maximum size of the gene sets being tested with the conditional Hypergeometric test. For such a test, it works like the pvalueCutoff but applied on the gene set size. It has no effect when conditional=FALSE.
testDirection: A string which can be either "over" or "under". This determines whether the test performed detects over or under represented GO terms.
Class "HyperGParams", directly.
hyperGTest(p): Perform hypergeometric tests to assess overrepresentation of category ids in the gene set. See the documentation for the generic function for details. This method must be called with a proper subclass of HyperGParams.
ontology(p), ontology(p) <- value: Accessors for the GO ontology. When setting, value should be one of "BP", "CC", or "MF".
conditional(p), conditional(p) <- value: Accessors for the conditional flag. When setting, value must be TRUE or FALSE.
S. Falcon
HyperGResult-class
GOHyperGParams-class
hyperGTest
Helps to create a parameter object representing all parameters needed for running the hyperGTest method. If a GOHyperGParams object is being made, one of the GO ontologies (BP, CC, MF) is used as the category. This function will construct the parameter object from a GeneSetCollection object and, if necessary, will also check that the object is based on a GO2ALL mapping.
GSEAGOHyperGParams(name, geneSetCollection, geneIds, universeGeneIds, ontology, pvalueCutoff, conditional, testDirection, ...)
GSEAKEGGHyperGParams(name, geneSetCollection, geneIds, universeGeneIds, pvalueCutoff, testDirection, ...)
name |
String specifying name of the GeneSetCollection. |
geneSetCollection |
A GeneSetCollection Object. If a GOHyperGParams object is sought, then this GeneSetCollection should be based on a GO2ALLFrame object and so the idType of that GeneSetCollection should be GOAllFrameIdentifier. If a KEGGHyperGParams object is sought then a GeneSetCollection based on a KEGGFrame object should be used and the idType will be a KEGGFrameIdentifier. |
geneIds |
Object of class |
universeGeneIds |
Object of class |
ontology |
A string specifying the GO ontology to use. Must be one of "BP", "CC", or "MF". (used with GO only) |
pvalueCutoff |
A numeric value between zero and one used as a p-value cutoff for p-values generated by the Hypergeometric test. When the test being performed is non-conditional, this is only used as a default value for printing and summarizing the results. For a conditional analysis, the cutoff is used during the computation to perform the conditioning: child terms with a p-value less than pvalueCutoff are conditioned out of the test for their parent term. |
conditional |
A logical indicating whether the calculation should condition on the GO structure. (GO only) |
testDirection |
A string which can be either "over" or "under". This determines whether the test performed detects over or under represented GO terms. |
... |
optional arguments to configure the GOHyperGParams object. |
M. Carlson
HyperGResult-class
GOHyperGParams-class
hyperGTest
This function performs GSEA computations and returns p-values for each gene set based on repeated permutation of the phenotype labels.
gseattperm(eset, fac, mat, nperm)
eset |
An |
fac |
A |
mat |
A 0/1 incidence matrix with each row representing a gene set and each column representing a gene. A 1 indicates membership of a gene in a gene set. |
nperm |
Number of permutations to test to build the reference distribution. |
The t-statistic is used (via rowttests) to test for a difference in means between the phenotypes determined by fac within each gene set (given as a row of mat).
A reference distribution for these statistics is established by permuting fac and repeating the test nperm times.
A matrix with the same number of rows as mat and two columns, "Lower" and "Upper". The "Lower" ("Upper") column gives the probability of seeing a t-statistic smaller (larger) than the observed.
Seth Falcon
## This example uses a random sample of probesets and a randomly
## generated category matrix. The results, therefore, are not
## meaningful, but the code demonstrates how to use gseattperm without
## requiring any expensive computations.
## Obtain an ExpressionSet with two types of samples (mol.biol)
haveALL <- require("ALL")
if (haveALL) {
  data(ALL)
  set.seed(0xabcd)
  rndIdx <- sample(1:nrow(ALL), 500)
  Bcell <- grep("^B", as.character(ALL$BT))
  typeNames <- c("NEG", "BCR/ABL")
  bcrAblOrNegIdx <- which(as.character(ALL$mol.biol) %in% typeNames)
  s <- ALL[rndIdx, intersect(Bcell, bcrAblOrNegIdx)]
  s$mol.biol <- factor(s$mol.biol)
  ## Generate a random category matrix (one draw per matrix cell)
  nCats <- 100
  set.seed(0xdcba)
  rndCatMat <- matrix(sample(c(0L, 1L), nCats * nrow(s), replace=TRUE),
    nrow=nCats, ncol=nrow(s),
    dimnames=list(paste("c", 1:nCats, sep=""), featureNames(s)))
  ## Demonstrate use of gseattperm
  N <- 10
  pvals <- gseattperm(s, s$mol.biol, rndCatMat, N)
  pvals[1:5, ]
}
This function performs a hypergeometric test for over- or under-representation of significant ‘genes’ amongst those assayed in a universe of genes. It provides an interface based on character vectors identifying the members of gene sets and the gene universe.
hyperg(assayed, significant, universe, representation = c("over", "under"), ...)
assayed |
A vector of assayed genes (or other
identifiers). |
significant |
A vector of assayed genes that were
differentially expressed. If |
universe |
A character vector defining the universe of genes. |
representation |
Either “over” or “under”, to indicate testing for over- or under-representation, respectively, of differentially expressed genes. |
... |
Additional arguments, unused. |
When invoked with a character vector of assayed
genes, a named
numeric vector providing the input values, P-value, odds ratio, and
expected number of significantly expressed genes.
When invoked with a list of character vectors of assayed
genes,
a data frame with columns of input values, P-value, odds ratio, and
expected number of significantly expressed genes.
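The quantities listed above (input counts, P-value, odds ratio, expected count) can be computed directly from character vectors with phyper. The following is a minimal sketch, not the hyperg implementation: toyHyperg and its argument names are hypothetical, and here the second argument is the full list of selected genes rather than a per-set subset.

```r
## Hypothetical helper: over-representation of `selected` genes in `geneSet`
toyHyperg <- function(geneSet, selected, universe) {
  universe <- unique(universe)
  geneSet  <- intersect(geneSet, universe)
  selected <- intersect(selected, universe)
  N <- length(universe)                        # universe size
  m <- length(geneSet)                         # set size
  n <- length(selected)                        # number of selected genes
  k <- length(intersect(geneSet, selected))    # selected genes in the set
  c(setSize   = m,
    selected  = n,
    overlap   = k,
    expected  = n * m / N,
    oddsRatio = (k * (N - m - n + k)) / ((m - k) * (n - k)),
    pOver     = phyper(k - 1, m, N - m, n, lower.tail = FALSE))
}

universe <- paste0("g", 1:100)
sets <- list(A = paste0("g", 1:20), B = paste0("g", 41:70))
selected <- c(paste0("g", 1:8), paste0("g", 21:32))

## One row per gene set, mirroring the data frame form described above
result <- as.data.frame(t(vapply(sets, toyHyperg, numeric(6),
                                 selected = selected, universe = universe)))
result
```

The odds ratio is the sample cross-product ratio of the 2 x 2 table; it is undefined (division by zero) when every set member or every selected gene falls in the overlap.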
Martin Morgan, with contributions from Paul Shannon.
hyperGTest
for convenience functions using Bioconductor
annotation resources such as GO.db.
set.seed(123)

## artificial sets -- affy probes grouped by protein family
library(hgu95av2.db)
map <- select(hgu95av2.db, keys(hgu95av2.db), "PFAM")
sets <- Filter(function(x) length(x) >= 10, split(map$PROBEID, map$PFAM))

universe <- unlist(sets, use.names=FALSE)
siggenes <- sample(universe, length(universe) / 20) ## simulate
sigsets <- Map(function(x, y) x[x %in% y], sets, MoreArgs=list(y=siggenes))

result <- hyperg(sets, sigsets, universe)
head(result)
An abstract (VIRTUAL) parameter class for representing all parameters
needed by a method specializing the hyperGTest
generic. You should only use subclasses of this class directly.
Objects of this class cannot be instantiated directly.
geneIds
:Object of class "ANY"
: A vector of
gene identifiers. Numeric and character vectors are probably the
only things that make sense. These are the gene ids for the
selected gene set.
universeGeneIds
:Object of class "ANY"
: A
vector of gene ids in the same format as geneIds
defining a
subset of the gene ids on the chip that will be used as the
universe for the hypergeometric calculation. If this is
NULL
or has length zero, then all gene ids on the chip will
be used.
annotation
:Object of class
"ANY"
. Functionally, this is either a string giving the name of the
annotation data package for the chip used to generate the data, or
the name of an annotation object downloaded using AnnotationHub.
categorySubsetIds
:Object of class "ANY"
:
If the test method supports it, can be used to specify a subset of
category ids to include in the test instead of all possible
category ids.
categoryName
:A string describing the category. Usually set automatically by subclasses. For example "GO".
pvalueCutoff
:The p-value to use as a cutoff for significance for testing methods that require it. This value will also be passed on to the result instance and used for display and counting of significant results. The default is 0.01.
testDirection
:A string indicating whether the test
should be for overrepresentation ("over"
) or
underrepresentation ("under"
).
datPkg
:Holds a DatPkg object which is of a particular type that in turn varies with the kind of annotation package this is.
signature(p =
"HyperGParams")
: Perform hypergeometric tests to
assess overrepresentation of category ids in the gene set. See the
documentation for the generic function for details. This method
must be called with a proper subclass of
HyperGParams
.
geneIds(object)
, geneIds(object) <- value
Accessors for the gene identifiers that will be used as the selected gene list.
Accessor for annotation. If you want to change the annotation for an existing instance, use the replacement form.
ontology(object)
Accessor for GO ontology.
organism(object)
Accessor for the organism character
string used as an identifier in DatPkg
.
pvalueCutoff(r)
, pvalueCutoff(r) <-
value
Accessor for the p-value cutoff. When setting,
value
should be a numeric value between zero and one.
testDirection
Accessor for the test direction. When setting,
value
must be either "over" or "under".
universeGeneIds(r)
Accessor for the vector of gene identifiers that defines the universe.
S. Falcon
HyperGResult-class
GOHyperGParams-class
KEGGHyperGParams-class
hyperGTest
This manual page documents generic functions for extracting data
from the result object returned from a call to hyperGTest
.
The result object will be a subclass of HyperGResultBase
.
Methods apply to all result object classes unless otherwise noted.
pvalues(r)
oddsRatios(r)
expectedCounts(r)
geneCounts(r)
universeCounts(r)
universeMappedCount(r)
geneMappedCount(r)
geneIds(object, ...)
geneIdUniverse(r, cond = TRUE)
geneIdsByCategory(r, catids = NULL)
sigCategories(r, p)

## R CMD check doesn't like these
## annotation(r)
## description(r)

testName(r)
pvalueCutoff(r)
testDirection(r)
chrGraph(r)
r , object
|
An instance of a subclass of
|
catids |
A character vector of category identifiers. |
p |
Numeric p-value used as a cutoff for selecting a subset of the result. |
cond |
A logical value indicating whether to return conditional
results for a conditional test. The default is |
... |
Additional arguments that may be used by specializing methods. |
organism
:returns a "character" vector describing the organism for which the results were calculated.
geneCounts
:returns an "integer" vector: for each category term tested, the number of genes from the gene set that are annotated at the term.
pvalues
:returns a "numeric" vector: the ordered p-values for each category term tested.
universeCounts
:returns an "integer" vector: for each category term tested, the number of genes from the gene universe that are annotated at the term.
universeMappedCount
:returns an "integer" vector of length one giving the size of the gene universe set.
expectedCounts
:returns a "numeric" vector giving the expected number of genes in the selected gene list to be found at each tested category term. These values may surprise you if you forget that your gene list and gene universe might have had to undergo further filtering to ensure that each gene has been labeled by at least one GO term.
oddsRatios
:returns a "numeric" vector giving the odds ratio for each category term tested.
annotation
:returns the name of the annotation data package used.
geneIds
:returns the input vector of gene identifiers intersected with the universe of gene identifiers used in the computation.
geneIdUniverse
:returns a list named by the tested categories. Each element of the list is a vector of gene identifiers (from the gene universe) annotated at the corresponding category term.
geneIdsByCategory
:returns a list similar to geneIdUniverse, but each vector of gene IDs is intersected with the list of selected gene IDs from geneIds. The result is the selected gene IDs annotated at each category.
sigCategories
:returns a character vector of category identifiers with a significant p-value. If argument p is missing, then the cutoff obtained from pvalueCutoff(r) will be used.
geneMappedCount
:returns the size of the selected gene set used in the computation. This is simply length(geneIds(obj)).
pvalueCutoff
:accessor for the pvalueCutoff slot.
testDirection
:accessor for the testDirection slot. Contains a string indicating whether the test was for "over" or "under" representation of the categories.
description
:returns a character string description of the test result.
testName
:returns a string describing the testing method used.
summary
:returns a data.frame summarizing the test result. Optional arguments pvalue and categorySize allow specification of maximum p-value and minimum category size, respectively. The data frame contains the columns GOID, Pvalue, OddsRatio, ExpCount, Count, and Size. ExpCount is the expected count, Count is how many instances of that term were actually observed in your gene list, and Size is the number that could have been found in your gene list if every instance had turned up. Values like ExpCount and Size are affected by what is included in the gene universe as well as by whether or not it was a conditional test.
htmlReport
:writes
an HTML version of the table produced by the summary
method. The first argument should be a HyperGResult
instance (or subclass). The path of a file to write the report to
can be specified using the file
argument. The default is
file=""
which will cause the report to be printed to the
screen. If you wish to create a single report comprising multiple
results you can set append=TRUE
. The default is
FALSE
(overwrite pre-existing report file). You can
specify a string to use as an identifier for each table by
providing a value for the label
argument. The number of
digits displayed in numerical columns can be controlled using
digits
(defaults to 3). The summary
method is
called on the HyperGResult
instance to generate a data
frame that is transformed to HTML. You can pass additional
arguments to the summary
method which is used to generate
the data frame that is transformed to HTML by specifying a named
list using summary.args
.
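The relationship between the category gene lists, the per-category selected genes, and the Count, Size and ExpCount columns can be illustrated without any annotation package. This is a sketch on made-up identifiers; the variable names are hypothetical, and computing ExpCount as the hypergeometric mean (Size times the fraction of the mapped universe that was selected) is an assumption about what the summary reports.

```r
## Hypothetical category -> gene ID mapping, restricted to the universe
geneIdUniverse <- list(cat1 = c("1", "2", "3", "4"),
                       cat2 = c("3", "4", "5", "6", "7", "8"))
selected <- c("2", "3", "9")   # "9" carries no annotation at all

## Universe and selection after dropping identifiers without annotation
mappedUniverse <- unique(unlist(geneIdUniverse, use.names = FALSE))
mappedSelected <- intersect(selected, mappedUniverse)

## Selected gene IDs annotated at each category
byCategory <- lapply(geneIdUniverse, intersect, mappedSelected)

tab <- data.frame(
  Count    = lengths(byCategory),
  Size     = lengths(geneIdUniverse),
  ExpCount = lengths(geneIdUniverse) * length(mappedSelected) /
             length(mappedUniverse)
)
tab
```

Gene "9" is dropped before counting, so the expected counts are based on two selected genes out of eight mapped universe genes rather than three out of nine. This is the filtering effect the expectedCounts description warns about.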
Seth Falcon
hyperGTest
HyperGResult-class
HyperGParams-class
GOHyperGParams-class
KEGGHyperGParams-class
## Note that more in-depth examples can be found in the GOstats
## vignette (Hypergeometric tests using GOstats).
library("hgu95av2.db")
library("annotate")

## Retrieve 300 probeids that have PFAM ids
probids <- keys(hgu95av2.db,keytype="PROBEID",column="PFAM")[1:300]

## get unique Entrez Gene IDs
geneids <- select(hgu95av2.db, probids, 'ENTREZID', 'PROBEID')
geneids <- unique(geneids[['ENTREZID']])

## Now do the same for the universe
univ <- keys(hgu95av2.db,keytype="PROBEID",column="PFAM")
univ <- select(hgu95av2.db, univ, 'ENTREZID', 'PROBEID')
univ <- unique(univ[['ENTREZID']])

p <- new("PFAMHyperGParams", geneIds=geneids, universeGeneIds=univ,
         annotation="hgu95av2")

## this takes a while...
if(interactive()){
    hypt <- hyperGTest(p)
    summary(hypt)
    htmlReport(hypt, file="temp.html", summary.args=list("htmlLinks"=TRUE))
}
This class represents the results of a test for over-representation of
categories among genes in a selected gene set based upon the
Hypergeometric distribution. The hyperGTest
generic function returns an instance of the
HyperGResult
class. For details on accessing
the results, see HyperGResult-accessors.
Objects can be created by calls of the form new("HyperGResult", ...)
.
pvalues
:"numeric"
vector: the ordered
p-values for each category term tested.
catToGeneId
:Object of class "list"
. The
names of the list are category IDs. Each element is a vector
of gene IDs annotated at the given category ID and in the
specified gene universe.
annotation
:A string giving the name of the chip annotation data package used.
geneIds
:Object of class "ANY"
: the input
vector of gene identifiers intersected with the universe of gene
identifiers used in the computation. The class of this slot is
specified as "ANY"
because gene IDs may be integer or
character vectors depending on the annotation package.
testName
:A string identifying the testing method used.
pvalueCutoff
:Numeric value used as a p-value
cutoff. Used by the show
method to count number of
significant terms.
testDirection
:A string indicating whether the
test should be for overrepresentation ("over"
) or
underrepresentation ("under"
).
oddsRatios
:a "numeric"
vector giving
the odds ratio for each category term tested.
expectedCounts
:a "numeric"
vector
giving the expected number of genes in the selected gene list to
be found at each tested category term.
Class "HyperGResultBase"
, directly.
Seth Falcon
HyperGResultBase-class
GOHyperGResult-class
HyperGResult-accessors
This VIRTUAL class represents common elements of the return values
of generic functions like hyperGTest
. All subclasses are
intended to implement the accessor functions documented at
HyperGResult-accessors.
A virtual Class: No objects may be created from it.
annotation
:Object of class "character"
giving the name of the annotation data package used.
geneIds
:Object of class "ANY"
(usually
a character vector, but sometimes an integer vector).
The input vector of gene identifiers intersected with the
universe of gene identifiers used in the computation.
testName
:Object of class "character"
identifying the testing method used.
pvalueCutoff
:Numeric value used by the
testing method as a p-value cutoff. Not all testing
methods use this. Also used by the show
method to
count number of significant terms.
testDirection
:A string indicating whether the test
performed was for overrepresentation ("over"
) or
underrepresentation ("under"
).
Seth Falcon
HyperGResult-class
GOHyperGResult-class
HyperGResult-accessors
Given a subclass of HyperGParams
, compute Hypergeometric
p-values for over or under-representation of each term in the
specified category among the specified gene set.
hyperGTest(p)
p |
An instance of a subclass of
|
The gene identifiers in the geneIds
slot of p
define the
selected set of genes. The universe of gene ids is determined by the
chip annotation found in the annotation
slot of p
. Both
the selected genes and the universe are reduced by removing
identifiers that do not have any annotations in the specified
category.
For each term in the specified category that has at least one
annotation in the selected gene set, we determine how many of its
annotations are in the universe set and how many are in the selected
set. With these counts we perform a Hypergeometric test using
phyper
. This is equivalent to using Fisher's exact test.
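The stated equivalence between the phyper tail probability and a one-sided Fisher's exact test can be checked on a single 2 x 2 table. The counts below are made up for illustration.

```r
N <- 1000   # genes in the universe
m <- 40     # universe genes annotated at the term
n <- 50     # selected genes
k <- 7      # selected genes annotated at the term

## Over-representation: P(X >= k) under the hypergeometric distribution
pHyper <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

## The same counts arranged as a 2 x 2 contingency table
tab <- matrix(c(k, m - k, n - k, N - m - n + k), nrow = 2,
              dimnames = list(selected = c("yes", "no"),
                              atTerm   = c("yes", "no")))
pFisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(pHyper, pFisher)
```

For under-representation the corresponding pair is phyper(k, m, N - m, n) and alternative = "less".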
It is important that the correct chip annotation data package be identified as it determines the universe of gene identifiers and is often used to determine the mapping between the category term and the gene identifiers.
For S. cerevisiae if the annotation
slot of p
is set to
"org.Sc.sgd", then comparisons and statistics are computed using common
names and are with respect to all genes annotated in the S. cerevisiae
genome, not with respect to any microarray chip. This will *not* be
the right thing to do if you are working with a yeast microarray.
A HyperGResult
instance.
In most cases, the provided method with signature matching any
subclass of HyperGParams
is all that will be needed. This
method follows a template pattern. To add support for a new FOO
category type, a developer would need to create a
FooHyperGParams
subclass and then define two methods
specialized to the new subclass that get called from inside
hyperGTest
: universeBuilder
and
categoryToEntrezBuilder
.
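The template pattern described above can be sketched with a few lines of S4 code. Everything in this block is a stand-in: the classes ToyParams and FooToyParams, the generics toyUniverseBuilder and toyCategoryToGeneBuilder, and the driver toyTest are hypothetical names that mimic the structure, not the Category package's own classes or signatures.

```r
library(methods)

## Virtual base class plus one concrete subclass for a made-up "FOO" category
setClass("ToyParams", representation("VIRTUAL", geneIds = "character"))
setClass("FooToyParams", contains = "ToyParams",
         representation(fooMap = "list"))

## The two hooks a new category type has to supply
setGeneric("toyUniverseBuilder",
           function(p) standardGeneric("toyUniverseBuilder"))
setGeneric("toyCategoryToGeneBuilder",
           function(p) standardGeneric("toyCategoryToGeneBuilder"))

setMethod("toyUniverseBuilder", "FooToyParams",
          function(p) unique(unlist(p@fooMap, use.names = FALSE)))
setMethod("toyCategoryToGeneBuilder", "FooToyParams",
          function(p) p@fooMap)

## The shared driver only talks to the hooks, never to the subclass directly
toyTest <- function(p) {
  univ <- toyUniverseBuilder(p)
  cat2gene <- toyCategoryToGeneBuilder(p)
  sel <- intersect(p@geneIds, univ)
  vapply(cat2gene, function(ids) {
    phyper(length(intersect(ids, sel)) - 1, length(ids),
           length(univ) - length(ids), length(sel), lower.tail = FALSE)
  }, numeric(1))
}

p <- new("FooToyParams", geneIds = c("a", "b", "x"),
         fooMap = list(F1 = c("a", "b", "c"), F2 = c("d", "e", "f", "g")))
res <- toyTest(p)
res
```

Supporting another category type in this sketch means adding one subclass and two methods; toyTest itself stays untouched.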
S. Falcon
HyperGResult-class
HyperGParams-class
GOHyperGParams-class
KEGGHyperGParams-class
Parameter classes for representing all parameters needed for
running the hyperGTest
method with KEGG or PFAM as the
category.
Objects can be created by calls of the form
new("KEGGHyperGParams", ...)
or
new("PFAMHyperGParams", ...)
.
geneIds
:Object of class "ANY"
: A vector of
gene identifiers. Numeric and character vectors are probably the
only things that make sense. These are the gene ids for the
selected gene set.
universeGeneIds
:Object of class "ANY"
: A
vector of gene ids in the same format as geneIds
defining a
subset of the gene ids on the chip that will be used as the
universe for the hypergeometric calculation. If this is
NULL
or has length zero, then all gene ids on the chip will
be used.
annotation
:A string giving the name of the annotation data package for the chip used to generate the data.
categorySubsetIds
:Object of class "ANY"
:
If the test method supports it, can be used to specify a subset of
category ids to include in the test instead of all possible
category ids.
categoryName
:A string describing the category. This will be automatically set to "KEGG" or "PFAM" via the class's prototype.
pvalueCutoff
:The p-value to use as a cutoff for significance for testing methods that require it. This value will also be passed on to the result instance and used for display and counting of significant results. The default is 0.01.
testDirection
:A string indicating whether the
test should be for overrepresentation ("over"
) or
underrepresentation ("under"
).
Class "HyperGParams"
, directly.
signature(p =
"HyperGParams")
: Perform hypergeometric tests to
assess overrepresentation of category ids in the gene set. See
the documentation for the generic function for details. This
method must be called with a proper subclass of
HyperGParams
.
S. Falcon
HyperGResult-class
GOHyperGParams-class
hyperGTest
A parameter class for representing all parameters
needed by a method specializing the linearMTest
generic.
Objects can be created by calls of the form
new("LinearMParams", ...)
.
geneStats
:Named vector of class "numeric"
,
giving the gene-level statistics to be used in the tests. The
names should correspond to the gene identifiers in gsc
.
universeGeneIds
:Object of class "ANY"
: A
vector of gene ids defining a subset of the gene ids on the chip
that will be used as the universe for the hypergeometric
calculation. If this is NULL
or has length zero, then all
gene ids on the chip will be used. Currently this parameter is
ignored by the base linearMTest
method.
annotation
:A string giving the name of the annotation data package for the chip used to generate the data.
datPkg
:Object of class "DatPkg"
used to assist
with dispatch based on type of annotation data
available. Currently this parameter is ignored by the
base linearMTest
method.
categorySubsetIds
:Object of class "ANY"
:
If the test method supports it, can be used to specify a subset of
category ids to include in the test instead of all possible
category ids. Currently this parameter is ignored by the
base linearMTest
method.
categoryName
:A string describing the category.
Usually set automatically by subclasses. For example
"ChrMap"
.
pvalueCutoff
:The p-value to use as a cutoff for significance for testing methods that require it. This value will also be passed on to the result instance and used for display and counting of significant results. The default is 0.01.
minSize
:An integer giving a minimum size for a gene set for it to be tested. The default is 5.
testDirection
:A string indicating whether the test
should test for systematic increase ("up"
) or decrease
("down"
) in the geneStats
values within a gene set
relative to the remaining genes.
graph
:The graph
object indicating the hierarchical relationship among terms of the
ontology or other grouping.
conditional
:A logical
indicating whether
conditional tests should be performed. This tests whether a term
is still significant even when including its sub-terms in the model.
gsc
:The
GeneSetCollection
object grouping the gene ids into sets.
These are accessor methods for the various parameter slots:
signature(object = "LinearMParams", value = "character")
: ...
signature(object = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(object = "LinearMParams")
: ...
signature(object = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(object = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
signature(r = "LinearMParams")
: ...
Deepayan Sarkar, Michael Lawrence
See linearMTest
for
examples. ChrMapLinearMParams
is a specialization
of this class for chromosome maps.
This class represents the results of a test for systematic change in
some gene-level statistic by gene sets. The linearMTest
generic function returns an instance of the LinearMResult
class.
Objects can be created by calls of the form new("LinearMResult",
...)
, but are more commonly created using a call to
linearMTest
.
pvalues
:Object of class "numeric"
, with the
p-values for each term.
pvalue.order
:Object of class "integer"
, the
order vector (increasing) for the p-values.
effectSize
:Object of class "numeric"
, with
the effect size for each term.
annotation
:Object of class "character"
geneIds
:Object of class "ANY"
testName
:Object of class "character"
pvalueCutoff
:Object of class "numeric"
minSize
:Object of class "integer"
testDirection
:Object of class "character"
conditional
:Object of class "logical"
graph
:Object of class "graph"
gsc
:Object of class "GeneSetCollection"
Class "LinearMResultBase"
, directly.
signature(r = "LinearMResult")
: ...
signature(r = "LinearMResult")
: ...
signature(r = "LinearMResult")
: returns
a data.frame
with a row for each gene set tested and the
following columns: ID
, Pvalue
, Effect
size,
Size
(number of members), Conditional
(whether the
test used the conditional test), and TestDirection
(for up
or down).
Deepayan Sarkar, Michael Lawrence
showClass("LinearMResult")
This VIRTUAL class represents common elements of the return values of
generic functions like linearMTest
. These elements are
essentially those that are passed through from the input
parameters. See LinearMResult for a concrete result
class with the basic outputs.
A virtual Class: No objects may be created from it.
All of these slots correspond to slots in the LinearMParams class.
annotation
:Object of class "character"
geneIds
:Object of class "ANY"
testName
:Object of class "character"
pvalueCutoff
:Object of class "numeric"
minSize
:Object of class "integer"
testDirection
:Object of class "character"
conditional
:Object of class "logical"
graph
:Object of class "graph"
gsc
:Object of class "GeneSetCollection"
signature(object = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(object = "LinearMResultBase")
: ...
signature(object = "LinearMResultBase")
: ...
signature(object = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(object = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(object = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
:
...
signature(object = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
signature(r = "LinearMResultBase")
: ...
Deepayan Sarkar, Michael Lawrence
LinearMResult
,
LinearMParams
,
linearMTest
Given a subclass of LinearMParams
, compute p-values for
detecting systematic up or downregulation of the specified gene set in
the specified category.
linearMTest(p)
p |
An instance of a subclass of |
The per-gene statistics in the geneStats
slot of p
give
a measure of up or down regulation of the individual genes in the
universe.
A LinearMResult
instance.
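One plausible reading of a test for systematic up or down regulation is a linear model of the per-gene statistics on a set-membership indicator. The sketch below follows that reading on simulated data; toyLinearTest is a hypothetical helper, and nothing here is taken from the linearMTest source.

```r
set.seed(42)
geneStats <- setNames(rnorm(200), paste0("g", 1:200))
geneSets <- list(shifted = paste0("g", 1:30),
                 null    = paste0("g", 101:130))
## Simulate a set whose members are systematically up
geneStats[geneSets$shifted] <- geneStats[geneSets$shifted] + 1.5

## Hypothetical helper: regress the statistics on a 0/1 membership indicator
toyLinearTest <- function(stats, ids, direction = c("up", "down")) {
  direction <- match.arg(direction)
  inSet <- as.numeric(names(stats) %in% ids)
  co <- summary(lm(stats ~ inSet))$coefficients["inSet", ]
  ## One-sided p-value from the t-statistic of the membership coefficient
  p <- pt(co[["t value"]], df = length(stats) - 2,
          lower.tail = (direction == "down"))
  c(effectSize = co[["Estimate"]], pvalue = p)
}

res <- t(sapply(geneSets, toyLinearTest, stats = geneStats))
res
```

The effect size here is the difference in mean statistic between set members and all remaining genes, which matches the description of testDirection as a comparison against the remaining genes.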
D. Sarkar
LinearMResult-class
LinearMParams-class
This function returns a graph
object representing the nested
structure of chromosome bands (also known as cytogenetic bands).
The nodes of the graph are band identifiers. Each node has a
geneIds
node attribute that lists the gene IDs that are
annotated at the band (the gene IDs will be Entrez IDs in most
cases).
makeChrBandGraph(chip, univ = NULL)
chip |
A string giving the annotation source. For example, |
univ |
A vector of gene IDs (these should be Entrez IDs for
most annotation sources). The annotations
attached to the graph will be limited to those specified by |
This function parses the data stored in the
<chip>MAP
map from the appropriate annotation data package.
Although cytogenetic bands are observed in all organisms, currently,
only human and mouse band nomenclatures are supported.
A graph-class
instance. The graph will be a
tree and the root node is labeled for the organism.
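To make the nested structure concrete, the sketch below derives the chain of enclosing bands from a single human-style band label. The parsing rule (chromosome, arm, then one additional digit per level) and the helper name bandAncestors are assumptions made for illustration; the package's own parser may differ in detail.

```r
## Hypothetical helper: path from chromosome down to the given band
bandAncestors <- function(band) {
  m <- regmatches(band, regexec("^([0-9]+|X|Y)([pq])([0-9.]*)$", band))[[1]]
  if (length(m) == 0) stop("unrecognised band label: ", band)
  chr <- m[2]
  arm <- paste0(chr, m[3])
  digits <- strsplit(gsub(".", "", m[4], fixed = TRUE), "")[[1]]
  sub <- vapply(seq_along(digits), function(i) {
    d <- paste(digits[seq_len(i)], collapse = "")
    ## Re-insert the decimal point after the second digit
    if (i > 2) d <- paste0(substr(d, 1, 2), ".", substr(d, 3, nchar(d)))
    paste0(arm, d)
  }, character(1))
  c(chr, arm, sub)
}

path <- bandAncestors("12q24.33")
path

## Parent -> child edges of the tree along this path
edges <- cbind(parent = head(path, -1), child = tail(path, -1))
edges
```

Collecting the unique edges over all annotated bands, plus one edge from an organism root to each chromosome, yields a tree of the kind described in the value section.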
Seth Falcon
chrGraph <- makeChrBandGraph("hgu95av2.db")
chrGraph
Using EBarrays to detect differential expression requires the construction of a set of contrasts. This little helper function computes these contrasts for a two level factor.
makeEBcontr(f1, hival)
f1 |
The factor that will define the contrasts. |
hival |
The |
There is not much more to add; see EBarrays for details. This is used in the Category package to let users compute the posterior probability of differential expression, and hence to compute expected numbers of differentially expressed genes, per category.
An object of class “ebarraysPatterns”.
R. Gentleman
if( require("EBarrays") ) {
    myfac = factor(rep(c("A", "B"), c(12, 24)))
    makeEBcontr(myfac, "B")
}
This function is not intended for end-users, but may be useful for developers extending the Hypergeometric testing capabilities provided by the Category package.
makeValidParams
is intended to validate a parameter object
instance (e.g. HyperGParams or subclass). The idea is that unlike
validObject
, methods for this generic attempt to fix invalid
instances when possible, issuing a warning when they do so, and
only give an error if the object cannot be fixed.
makeValidParams(object)
object |
A parameter object. Consult |
The value must have the same class as the object
argument.
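The repair-with-warning behaviour can be sketched on a plain list standing in for a parameter object. The function toyMakeValid and the two checks it performs are hypothetical; they only illustrate the contract of warning on fixable problems, raising an error on unfixable ones, and returning an object of the same class.

```r
## Hypothetical validator: repairs what it can, fails on what it cannot
toyMakeValid <- function(params) {
  cutoff <- params$pvalueCutoff
  if (!is.numeric(cutoff) || length(cutoff) != 1 || is.na(cutoff) ||
      cutoff < 0 || cutoff > 1)
    stop("pvalueCutoff must be a single number between 0 and 1")
  if (anyDuplicated(params$geneIds) > 0) {
    warning("removing duplicated gene IDs")
    params$geneIds <- unique(params$geneIds)
  }
  params
}

p <- list(geneIds = c("10", "20", "20"), pvalueCutoff = 0.01)

## Fixable problem: a warning is signalled and a repaired object returned
fixed <- withCallingHandlers(
  toyMakeValid(p),
  warning = function(w) {
    message("repaired: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  })

## Unfixable problem: an error is raised instead
bad <- tryCatch(toyMakeValid(list(geneIds = "10", pvalueCutoff = 2)),
                error = function(e) conditionMessage(e))
```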
Seth Falcon
These functions return a mapping of chromosome bands to genes.
makeChrBandGSC
returns a
GeneSetCollection
object,
with a GeneSet
for each band. The other functions return a 0/1
incidence matrix with a row for each chromosome band and a column for
each gene. Only those chromosome bands with at least one gene
annotation will be included.
MAPAmat(chip, univ = NULL, minCount = 0)
makeChrBandInciMat(chrGraph)
makeChrBandGSC(chrGraph)
chip |
A string giving the annotation source. For example,
|
univ |
A vector of gene IDs (these should be Entrez IDs for
most annotation sources). The annotations will be limited to
those in the set specified by |
chrGraph |
A |
minCount |
Bands with less than |
For makeChrBandGSC
, a GeneSetCollection
object with
a GeneSet
for each band.
For the other functions, a 0/1 incidence matrix with chromosome bands
as rows and gene IDs as columns. A 1
in m[i, j]
indicates that the chromosome band rownames(m)[i]
contains the
gene ID colnames(m)[j]
.
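The layout of such an incidence matrix can be reproduced from a named list of band-to-gene annotations. This is a base R sketch with made-up identifiers; toyInciMat is a hypothetical helper, and treating minCount as a lower bound on the number of genes per band is an assumption.

```r
## Made-up band -> gene ID annotations
band2genes <- list("1p36" = c("10", "20", "30"),
                   "1q21" = c("20", "40"),
                   "2p11" = "50")

## Hypothetical helper building a 0/1 matrix: bands as rows, genes as columns
toyInciMat <- function(sets, minCount = 0) {
  sets <- sets[lengths(sets) >= minCount]    # drop sparsely annotated bands
  genes <- sort(unique(unlist(sets, use.names = FALSE)))
  m <- matrix(0L, nrow = length(sets), ncol = length(genes),
              dimnames = list(names(sets), genes))
  for (b in names(sets)) m[b, sets[[b]]] <- 1L
  m
}

m <- toyInciMat(band2genes, minCount = 2)
m
```

Gene "20" appears in two rows, which is expected: a column may contain several 1s when a gene is annotated at more than one band, and "2p11" is dropped because it has fewer than two genes.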
Seth Falcon, Michael Lawrence
makeChrBandGraph
,
cateGOry
,
probes2MAP
have_hgu95av2.db <- suppressWarnings(require("hgu95av2.db"))
if (have_hgu95av2.db)
    mam <- MAPAmat("hgu95av2.db")
NewChrBandTree
and ChrBandTreeFromGraph
provide
constructors for the ChrBandTree
class.
NewChrBandTree(chip, univ)
ChrBandTreeFromGraph(g)
chip |
The name of an annotation data package |
univ |
A vector of gene identifiers that defines the universe of
genes. Usually, this will be a vector of Entrez Gene IDs. If
|
g |
A |
A new ChrBandTree
instance.
S. Falcon
A parameter class for representing all parameters needed for running
the hyperGTest
method with an ontology adhering to the OBO
Foundry (see http://www.obofoundry.org) as the category.
Objects can be created by calls of the form OBOHyperGParams(...)
,
where ...
correspond to slots defined below.
conditional
:A logical indicating whether the calculation should condition on the ontology structure.
geneIds
:Object of class "ANY"
: A vector of
gene identifiers. Numeric and character vectors are probably the
only things that make sense. These are the gene ids for the
selected gene set.
universeGeneIds
:Object of class "ANY"
: A
vector of gene ids in the same format as geneIds
defining a
subset of the gene ids on the chip that will be used as the
universe for the hypergeometric calculation. If this is
NULL
or has length zero, then all gene ids on the chip will
be used.
annotation
:A string giving the name of the annotation data package for the chip used to generate the data.
categorySubsetIds
:Object of class "ANY"
:
If the test method supports it, can be used to specify a subset of
category ids to include in the test instead of all possible
category ids.
categoryName
:A string describing the category. Usually set automatically by subclasses. For example "GO".
datPkg
:Holds a DatPkg object which is of a particular type that in turn varies with the kind of annotation package this is.
pvalueCutoff
:A numeric value between zero and one used as a p-value cutoff for p-values generated by the Hypergeometric test. When the test being performed is non-conditional, this is only used as a default value for printing and summarizing the results. For a conditional analysis, the cutoff is used during the computation to perform the conditioning: child terms with a p-value less than pvalueCutoff are conditioned out of the test for their parent term.
orCutoff
:A numeric value used as an odds-ratio
cutoff for odds ratios generated by the conditional
Hypergeometric test. For such a test, it works like the
pvalueCutoff
but applied on the odds ratio. It has no
effect when conditional=FALSE
.
minSizeCutoff
:A numeric value used as a cutoff for
minimum size of the gene sets being tested with the conditional
Hypergeometric test. For such a test, it works like the
pvalueCutoff
but applied to the gene set size. It has no
effect when conditional=FALSE
.
maxSizeCutoff
:A numeric value used as a cutoff for
maximum size of the gene sets being tested with the conditional
Hypergeometric test. For such a test, it works like the
pvalueCutoff
but applied to the gene set size. It has no
effect when conditional=FALSE
.
testDirection
:A string which can be either "over" or "under". This determines whether the test performed detects over or under represented GO terms.
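The role of testDirection can be illustrated with the textbook hypergeometric tail probabilities in base R. This is only a sketch with made-up counts (N, K, n and k are illustrative names, not slots of the class), and the package's own code path may differ in detail.

```r
# Made-up counts, purely for illustration (not taken from any real chip)
N <- 1000  # gene ids in the universe
K <- 40    # universe gene ids annotated to the category
n <- 100   # selected gene ids (all within the universe)
k <- 10    # selected gene ids annotated to the category

# "over": probability of seeing k or more annotated genes among the selected
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# "under": probability of seeing k or fewer annotated genes among the selected
p_under <- phyper(k, K, N - K, n, lower.tail = TRUE)

print(c(over = p_over, under = p_under))
```

Note the `k - 1` in the upper tail: `phyper` with `lower.tail = FALSE` returns P(X > q), so subtracting one includes the observed count itself.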
Class "HyperGParams", directly.
hyperGTest(p)
Perform hypergeometric tests to assess overrepresentation of category ids in the gene set. See the documentation for the generic function for details. This method must be called with a proper subclass of HyperGParams.
conditional(p), conditional(p) <- value
Accessors for the conditional flag. When setting, value must be TRUE or FALSE.
R. Castelo
This function maps probe identifiers to MAP positions using the appropriate Bioconductor meta-data package.
probes2MAP(pids, data = "hgu133plus2")
pids |
A vector of probe IDs for the chip in use. |
data |
The name of the chip, as a character string. |
Probes are mapped to regions; no checking for duplicate Entrez gene IDs is done.
A vector, the same length as pids, with the MAP locations.
R. Gentleman
set.seed(123)
library("hgu95av2.db")
v1 = sample(names(as.list(hgu95av2MAP)), 100)
pp = probes2MAP(v1, "hgu95av2.db")
Given a set of probe identifiers from a microarray, this function looks up all KEGG pathways that each probe is documented to be involved in.
probes2Path(pids, data = "hgu133plus2")
pids |
A vector of probe identifiers. |
data |
The character name of the chip. |
This is a simple lookup in the appropriate chip PATH data environment.
A list of pathway vectors, with one element for each value of pids that is mapped to at least one pathway.
R. Gentleman
library("hgu95av2.db")
x = c("1001_at", "1000_at")
probes2Path(x, "hgu95av2.db")
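The shape of the return value can be sketched without an annotation package by using a toy environment in place of the chip PATH map. Everything here is hypothetical: the probe names, the pathway ids and the helper `lookup_paths` are invented for illustration and are not part of the package.

```r
# Toy stand-in for a chip PATH environment; ids and pathways are invented
toy_path <- new.env()
assign("probe_a", c("00010", "00020"), envir = toy_path)
assign("probe_b", NA_character_, envir = toy_path)  # annotated, but no pathway

# Hypothetical helper: look up each probe and keep only those with a pathway
lookup_paths <- function(pids, env) {
  hits <- mget(pids, envir = env, ifnotfound = NA)
  # A single NA means "no pathway known" (or probe not found at all)
  no_path <- vapply(hits, function(v) length(v) == 1 && is.na(v), logical(1))
  hits[!no_path]
}

res <- lookup_paths(c("probe_a", "probe_b", "probe_c"), toy_path)
str(res)
```

Probes without any pathway are dropped, so the result can be shorter than the input vector.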
The data matrix, x, with two-level factor, fac, is used to compute t-tests. The values of fac are permuted B times and the complete set of t-tests is performed for each permutation.
ttperm(x, fac, B = 100, tsO = TRUE)
x |
A data matrix. The number of columns should be the same as the length of fac. |
fac |
A factor with two levels. |
B |
An integer specifying the number of permutations. |
tsO |
A logical indicating whether to compute only the t-test statistic for each permutation. If |
Not much more to say. Probably there is a generic function somewhere, but I could not find it.
A list. The first element is named obs and contains the true, observed values of the t-statistic. The second element is named ans and contains a list of length B holding the results for the different permutations.
R. Gentleman
x = matrix(rnorm(100), ncol=10)
y = factor(rep(c("A","B"), c(5,5)))
ttperm(x, y, 10)
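The permutation scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `perm_tstats` is hypothetical, an equal-variance two-sample t-test is assumed, and only the statistic is kept for each permutation.

```r
# Hypothetical helper; not part of the Category package
perm_tstats <- function(x, fac, B = 100) {
  stopifnot(is.matrix(x), nlevels(fac) == 2, ncol(x) == length(fac))
  lev <- levels(fac)
  # Row-wise two-sample t-statistics (equal variance assumed) for labels f
  row_t <- function(f) {
    apply(x, 1, function(r) {
      unname(t.test(r[f == lev[1]], r[f == lev[2]], var.equal = TRUE)$statistic)
    })
  }
  obs <- row_t(fac)                                        # observed labels
  ans <- lapply(seq_len(B), function(i) row_t(sample(fac)))  # permuted labels
  list(obs = obs, ans = ans)
}

set.seed(1)
x <- matrix(rnorm(100), ncol = 10)
y <- factor(rep(c("A", "B"), c(5, 5)))
res <- perm_tstats(x, y, B = 10)

# Empirical two-sided p-value per row: share of permutations at least as extreme
perm_mat <- do.call(cbind, res$ans)
p_emp <- rowMeans(abs(perm_mat) >= abs(res$obs))
```

The returned list mirrors the obs/ans layout described in the Value section, which makes it straightforward to derive empirical p-values as in the last two lines.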
Return all gene ids that are annotated at one or more terms in the category. If the universeGeneIds slot of p has length greater than zero, then the intersection of the gene ids specified in that slot and the normal return value is given.
universeBuilder(p)
p |
A subclass of HyperGParams-class. |
End users should not call this directly. This method gets called from hyperGTest. To add support for a new category, a new method for this generic must be defined. Its signature should match a subclass of HyperGParams-class appropriate for the new category.
A vector of gene identifiers.
S. Falcon
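The restriction rule described for universeBuilder can be sketched with plain vectors. The helper `restrict_universe` and the ids below are hypothetical and only illustrate the intersection logic; a real method would obtain the annotated ids from the annotation data package.

```r
# Hypothetical helper illustrating the intersection rule; not the package's code
restrict_universe <- function(annotated_ids, universe_ids = NULL) {
  # An empty or NULL user-supplied universe means "use all annotated ids"
  if (length(universe_ids) == 0) {
    return(annotated_ids)
  }
  intersect(annotated_ids, universe_ids)
}

annotated <- c("10", "20", "30", "40")  # invented ids annotated to >= 1 term
u_all <- restrict_universe(annotated)
u_sub <- restrict_universe(annotated, c("20", "40", "99"))
print(u_sub)
```

Ids supplied by the user but lacking any annotation (here "99") do not enter the universe.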