Title: | Model-based gene set analysis |
---|---|
Description: | Model-based Gene Set Analysis (MGSA) is a Bayesian modeling approach for gene set enrichment. The package mgsa implements MGSA and tools to use MGSA together with the Gene Ontology. |
Authors: | Sebastian Bauer, Julien Gagneur |
Maintainer: | Sebastian Bauer |
License: | Artistic-2.0 |
Version: | 1.55.0 |
Built: | 2024-11-29 06:40:44 UTC |
Source: | https://github.com/bioc/mgsa |
S. Bauer, J. Gagneur and P. N. Robinson. GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic acids research, 2010.
Posterior estimates of the parameter alpha for each MCMC run.
alphaMcmcPost(x)
## S4 method for signature 'MgsaMcmcResults'
alphaMcmcPost(x)
x |
an MgsaMcmcResults object |
matrix: posterior estimates of the parameter alpha for each MCMC run.
Realization values, posterior estimate and standard error for the parameter alpha.
alphaPost(x)
## S4 method for signature 'MgsaResults'
alphaPost(x)
x |
an MgsaResults object |
data.frame: realization values, posterior estimate and standard error for the parameter alpha.
Posterior estimates of the parameter beta for each MCMC run.
betaMcmcPost(x)
## S4 method for signature 'MgsaMcmcResults'
betaMcmcPost(x)
x |
an MgsaMcmcResults object |
matrix: posterior estimates of the parameter beta for each MCMC run.
Realization values, posterior estimate and standard error for the parameter beta.
betaPost(x)
## S4 method for signature 'MgsaResults'
betaPost(x)
x |
an MgsaResults object |
data.frame: realization values, posterior estimate and standard error for the parameter beta.
This function takes a 1:1 mapping of go.ids to items and returns a full MgsaGoSets instance. The structure of GO is gathered from GO.db. It is sufficient to specify just the directly asserted mapping (or annotation), i.e., the most specific terms. The true path rule is taken into account: if an item is annotated to a term, it is also annotated to all more general terms (in other words, the transitive closure is calculated).
createMgsaGoSets(go.ids, items)
go.ids |
a character vector of GO ids (GO:00001234) |
items |
a vector of identifiers that are annotated to the term in the corresponding position of the go.ids vector. |
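The true path rule can be illustrated with a small base R sketch. It does not use GO.db or createMgsaGoSets; the toy ontology, the term names and the helper `ancestors` are all hypothetical and exist only to show how directly asserted annotations propagate to more general terms.

```r
# Toy ontology: each term lists its direct parents (hypothetical term names).
parents <- list(
  root   = character(0),
  termA  = "root",
  termB  = "root",
  termA1 = "termA",
  termAB = c("termA", "termB")
)

# All ancestors of a term, including the term itself.
ancestors <- function(term) {
  seen <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      queue <- c(queue, parents[[current]])
    }
  }
  seen
}

# Directly asserted 1:1 mapping, in the same shape as go.ids / items.
go.ids <- c("termA1", "termAB", "termB")
items  <- c("gene1",  "gene2",  "gene3")

# Propagate every annotation to the term and all of its ancestors.
pairs <- do.call(rbind, lapply(seq_along(go.ids), function(i) {
  data.frame(term = ancestors(go.ids[i]), item = items[i],
             stringsAsFactors = FALSE)
}))
closed_sets <- lapply(split(pairs$item, pairs$term),
                      function(x) sort(unique(x)))
closed_sets
```

In this toy case the root term ends up containing all three items, although none of them was annotated to it directly.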
Item annotations of an MgsaSets.
itemAnnotations(sets, items)
## S4 method for signature 'MgsaSets,missing'
itemAnnotations(sets, items)
## S4 method for signature 'MgsaSets,character'
itemAnnotations(sets, items)
sets |
an instance of class MgsaSets |
items |
|
a data.frame: the item annotations.
Returns the indices corresponding to the items
itemIndices(sets, items)
## S4 method for signature 'MgsaSets,character'
itemIndices(sets, items)
## S4 method for signature 'MgsaSets,numeric'
itemIndices(sets, items)
sets |
an instance of class MgsaSets |
items |
|
an integer: the item indices.
Length (number of sets) of MgsaSets.
## S4 method for signature 'MgsaSets'
length(x)
x |
an instance of class MgsaSets |
integer vector.
Estimate marginal posterior of the MGSA problem with an MCMC sampling algorithm.
mgsa(o, sets, population = NULL,
     p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)

## S4 method for signature 'integer,list'
mgsa(o, sets, population = NULL,
     p = seq(1, min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'numeric,list'
mgsa(o, sets, population = NULL,
     p = seq(1, min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'character,list'
mgsa(o, sets, population = NULL,
     p = seq(1, min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'logical,list'
mgsa(o, sets, population = NULL,
     p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)

## S4 method for signature 'character,MgsaSets'
mgsa(o, sets, population = NULL,
     p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)
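o |
The observations. It can be an integer, numeric, character or logical vector. See the details below. |
sets |
The sets. It can be an MgsaSets or a list of item vectors. See the details below. |
population |
The total population. Optional. A character or integer vector; defaults to NULL. See the details below. |
p |
Grid of values for the parameter p. Values represent probabilities of term activity and therefore must be in [0,1]. |
... |
Optional arguments that are passed to the methods. Supported parameters include steps and restarts (see the details below). |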
The function can handle items (such as genes) encoded as character or integer. For convenience, numeric items can also be provided, but these values should essentially be integers. The type of items in the observations o, the sets and the optional population should be consistent.
In the case of character items, o and population should be of type character, and sets can be either an MgsaSets or a list of character vectors.
In the case of integer items, o should be of type integer, numeric (but essentially with integer values) or logical, and the entries in sets as well as the population should be integer. When o is logical, it is first coerced to integer with a call to which.
Observations outside the population are not taken into account. If population is NULL, it is defined as the union of all sets.
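The handling of logical observations and of a missing population described above can be mimicked in base R. This is only an illustrative sketch of the documented behaviour, not the package's internal code; the variable names are assumptions.

```r
# Logical observation over items 1..4: items 1 and 2 are observed.
o_logical <- c(TRUE, TRUE, FALSE, FALSE)
o_integer <- which(o_logical)   # integer indices of the TRUE entries

sets <- list(set1 = 1:3, set2 = 2:4)

# With population = NULL, the population is the union of all sets.
population <- sort(unique(unlist(sets, use.names = FALSE)))

# Observations outside the population are not taken into account.
o_used <- intersect(o_integer, population)
```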
The default grid for p is such that between 1 and 20 sets are active in expectation.
The lower limit is capped at 0.1 and the upper limit at 0.3, independently of the total number of sets, so that complex solutions are penalized.
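To make the default grid concrete, the expression from the generic's usage line can be evaluated for different numbers of sets. The wrapper `default_p_grid` is a hypothetical helper introduced here for illustration; only the `seq(...)` expression comes from the usage above.

```r
# Default grid for p as written in the generic's usage, for n_sets sets.
default_p_grid <- function(n_sets) {
  seq(min(0.1, 1 / n_sets), min(0.3, 20 / n_sets), length.out = 10)
}

grid_small <- default_p_grid(2)      # few sets: the caps 0.1 and 0.3 apply
grid_large <- default_p_grid(1000)   # many sets: 1/n and 20/n apply

# Expected number of active sets at the grid end points for 1000 sets.
range(grid_large) * 1000
```

With many sets the grid spans 1/n to 20/n, which corresponds to between 1 and 20 active sets in expectation; with few sets the caps take over.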
Marginal posteriors of activity of each set are estimated using an MCMC sampler as described in Bauer et al., 2010.
Because convergence of an MCMC sampler is difficult to assess, it is recommended to run it several times (using restarts).
If variations between runs are too large (see MgsaResults), the number of steps (steps) of each MCMC run should be increased.
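One possible way to look at the variation between runs is sketched below. It assumes that the mgsa package is installed and that restarts can be passed through the ... argument as described above; the value 3 and the use of the per-set standard deviation as a measure of spread are arbitrary choices made for this illustration.

```r
# Runs only when the mgsa package is installed.
if (requireNamespace("mgsa", quietly = TRUE)) {
  library(mgsa)
  sets <- list(set1 = LETTERS[1:3], set2 = LETTERS[2:4])
  fit <- mgsa(c("A", "B"), sets, restarts = 3)

  runs <- setsMcmcPost(fit)        # per-run posterior estimates of the sets
  spread <- apply(runs, 1, sd)     # disagreement between runs, per set
  print(spread)
}
```

If the spread is large relative to the estimates, the advice above is to increase steps and run again.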
An MgsaMcmcResults object.
Bauer S., Gagneur J. and Robinson P. GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic Acids Research (2010) http://nar.oxfordjournals.org/content/38/11/3523.full
## observing items A and B, with sets {A,B,C} and {B,C,D}
mgsa(c("A", "B"), list(set1 = LETTERS[1:3], set2 = LETTERS[2:4]))
## same case with integer representation of the items and logical observation
mgsa(c(TRUE,TRUE,FALSE,FALSE), list(set1 = 1:3, set2 = 2:4))
## a small example with gene ontology sets and plot
data(example)
fit = mgsa(example_o, example_go)
## Not run:
plot(fit)
## End(Not run)
This class represents gene ontology annotations.
For now, it is identical to the parent class MgsaSets.
Instances of this class are used to hold the additional information that was provided by running (possibly multiple times) an MCMC algorithm.
nsamples
how many samples collected per MCMC run
steps
how many steps per MCMC run
restarts
how many MCMC runs
alphaMcmcPost
posterior estimates for each MCMC run of the parameter alpha
betaMcmcPost
posterior estimates for each MCMC run of the parameter beta
pMcmcPost
posterior estimates for each MCMC run of the parameter p
setsMcmcPost
posterior estimates for each MCMC run of the sets marginal posterior probabilities
The columns of the matrices alphaMcmcPost, betaMcmcPost, pMcmcPost and setsMcmcPost store the posterior estimates for each individual MCMC run.
The row order matches that of the slots alphaPost, betaPost, pPost and setsResults respectively.
Accessor methods exist for each slot.
The results of an MGSA analysis.
populationSize
The number of items in the population.
studySetSizeInPopulation
The number of items both in the study set and in the population.
alphaPost
A data.frame with columns value, estimate and std.error.
betaPost
A data.frame with columns value, estimate and std.error.
pPost
A data.frame with columns value, estimate and std.error.
setsResults
A data.frame with columns inPopulation, inStudySet, estimate and std.error.
The columns of the slots alphaPost, betaPost and pPost contain a realization value, its posterior estimate and its standard error for the parameters alpha, beta and p respectively.
The columns of the slot setsResults contain the number of items of the set in the population, the number of items of the set in the study set, the estimate of its marginal posterior probability and its standard error.
The rownames are the names of the sets if available.
Because an MgsaResults is the outcome of an MGSA analysis (see mgsa), accessors but no replacement methods exist for each slot.
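Since setsResults is a data.frame, ordinary data frame operations apply to it. The sketch below uses a made-up data frame with the documented column names (the numbers are invented and are not output of mgsa) to show how sets might be ranked by their marginal posterior probability; the 0.5 cut-off is an arbitrary illustration.

```r
# Made-up values in the shape of the setsResults slot (not real output).
sets_results <- data.frame(
  inPopulation = c(120L, 45L, 300L),
  inStudySet   = c(30L, 2L, 12L),
  estimate     = c(0.92, 0.03, 0.41),
  std.error    = c(0.01, 0.005, 0.08),
  row.names    = c("setA", "setB", "setC")
)

# Rank sets by marginal posterior probability, highest first.
ranked <- sets_results[order(sets_results$estimate, decreasing = TRUE), ]

# Keep the names of sets above the chosen cut-off.
active <- rownames(ranked)[ranked$estimate > 0.5]
```

With a real fit, the same operations would presumably be applied to the result of setsResults(fit).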
This class describes sets, items and their annotations.
Internally, the method mgsa indexes all elements of the sets before fitting the model.
When mgsa must be run on several observations with the same gene sets, computations can be sped up by performing this indexing once.
This can be achieved by building an MgsaSets.
In order to ensure consistency of the indexing, no replace method for any slot is provided. Accessors are available.
The data frames setAnnotations and itemAnnotations allow annotations to be stored. No constraint is imposed on the number and names of their columns.
sets
A list whose elements are vectors of item indices.
itemName2ItemIndex
The mapping of item names to index.
numberOfItems
The number of items.
setAnnotations
Annotations of the sets. The rownames are set names.
itemAnnotations
Annotations of the items. The rownames are item names.
new("MgsaSets", sets=list(set1=c("a", "b"), set2=c("b", "c")))
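The kind of indexing that an MgsaSets holds can be pictured with a few lines of base R. This is a conceptual sketch only and is not the package's internal implementation; the variable names merely echo the slot names sets, itemName2ItemIndex and numberOfItems.

```r
sets <- list(set1 = c("a", "b"), set2 = c("b", "c"))

# Map each distinct item name to an integer index (cf. itemName2ItemIndex).
item_names <- unique(unlist(sets, use.names = FALSE))
item_index <- setNames(seq_along(item_names), item_names)

# Rewrite every set as a vector of item indices (cf. the sets slot).
indexed_sets <- lapply(sets, function(s) unname(item_index[s]))

# Number of distinct items (cf. numberOfItems).
number_of_items <- length(item_names)
```

Doing this once and reusing the result is what avoids repeating the work for every call on the same gene sets.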
How many samples are collected per MCMC run.
nsamples(x)
## S4 method for signature 'MgsaMcmcResults'
nsamples(x)
x |
an MgsaMcmcResults object |
integer: how many samples are collected per MCMC run.
Plot method for MgsaResults objects
## S4 method for signature 'MgsaResults'
plot(x, y, ...)
x |
an MgsaResults object |
y |
unused |
... |
unused |
Posterior estimates of the parameter p for each MCMC run.
pMcmcPost(x)
## S4 method for signature 'MgsaMcmcResults'
pMcmcPost(x)
x |
an MgsaMcmcResults object |
matrix: posterior estimates of the parameter p for each MCMC run.
The size of the population on which the analysis was run.
populationSize(x)
## S4 method for signature 'MgsaResults'
populationSize(x)
x |
an MgsaResults object |
integer: the size of the population.
Realization values, posterior estimate and standard error for the parameter p.
pPost(x)
## S4 method for signature 'MgsaResults'
pPost(x)
x |
an MgsaResults object |
data.frame: realization values, posterior estimate and standard error for the parameter p.
Creates an MgsaGoSets using gene ontology annotations provided by a file in GAF 1.0 or 2.0 format.
readGAF(filename, evidence=NULL, aspect=c("P", "F", "C"))
filename |
The name of the Gene Ontology annotation file. It must be in the GAF 1.0 or 2.0 format. It may be gzip-compressed. |
evidence |
|
aspect |
|
The function extracts from the annotation file all direct gene annotations and infers from the Gene Ontology all the indirect annotations (due to term relationships).
This is done using the package GO.db, which provides the ontology as a database, and RSQLite for querying the database.
An MgsaGoSets object.
The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics, 2000.
The GAF file format: http://www.geneontology.org/GO.format.annotation.shtml
GO evidence codes: http://www.geneontology.org/GO.evidence.shtml
## parsing provided example file (yeast)
gofile = system.file("example_files/gene_association_head.sgd", package="mgsa")
readGAF(gofile)
## only annotations inferred from experiment or a direct assay
readGAF(gofile, evidence=c("EXP", "IDA"))
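For readers unfamiliar with the GAF layout, the following base R sketch shows what filtering on evidence codes amounts to. It is not how readGAF is implemented; the three records are made up for illustration, and only the column positions (object id in column 2, GO id in column 5, evidence code in column 7, aspect in column 9) follow the GAF format.

```r
# Three made-up GAF-style records (tab-separated; not from a real file).
gaf_lines <- c(
  "DB\tG1\tsym1\t\tGO:0000001\tREF:1\tIDA\t\tP",
  "DB\tG2\tsym2\t\tGO:0000002\tREF:2\tIEA\t\tF",
  "DB\tG3\tsym3\t\tGO:0000001\tREF:3\tEXP\t\tC"
)
gaf <- read.delim(text = gaf_lines, header = FALSE, sep = "\t",
                  quote = "", comment.char = "!",
                  colClasses = "character")

# Column 2: object id, 5: GO id, 7: evidence code, 9: aspect (P, F or C).
annotations <- data.frame(item = gaf[[2]], go.id = gaf[[5]],
                          evidence = gaf[[7]], aspect = gaf[[9]],
                          stringsAsFactors = FALSE)

# Keep only annotations inferred from experiment or a direct assay.
kept <- annotations[annotations$evidence %in% c("EXP", "IDA"), ]
```

The resulting item and go.id columns have the shape of the 1:1 mapping that createMgsaGoSets expects.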
how many MCMC runs.
restarts(x)
## S4 method for signature 'MgsaMcmcResults'
restarts(x)
x |
an MgsaMcmcResults object |
integer: how many MCMC runs.
Set annotations of an MgsaSets.
setAnnotations(sets, names)
## S4 method for signature 'MgsaSets,missing'
setAnnotations(sets, names)
## S4 method for signature 'MgsaSets,character'
setAnnotations(sets, names)
sets |
an instance of class MgsaSets |
names |
|
a data.frame: the set annotations.
Posterior estimates of the set marginal probabilities for each MCMC run.
setsMcmcPost(x)
## S4 method for signature 'MgsaMcmcResults'
setsMcmcPost(x)
x |
an MgsaMcmcResults object |
matrix: posterior estimates of the set marginal probabilities for each MCMC run.
Number of items of the set in the population, the number of items of the set in the study set, the estimate of its marginal posterior probability and its standard error.
setsResults(x)
## S4 method for signature 'MgsaResults'
setsResults(x)
x |
an MgsaResults object |
data.frame: for each set, number of items of the set in the population, number of items of the set in the study set, estimate of its marginal posterior probability and standard error.
Show an MgsaResults.
## S4 method for signature 'MgsaResults'
show(object)
object |
an instance of class MgsaResults |
an invisible NULL
Show an MgsaSets.
## S4 method for signature 'MgsaSets'
show(object)
object |
an instance of class MgsaSets |
an invisible NULL
how many steps per MCMC run.
steps(x)
## S4 method for signature 'MgsaMcmcResults'
steps(x)
x |
an MgsaMcmcResults object |
integer: how many steps per MCMC run.
The size of the study set on which the analysis was run.
studySetSizeInPopulation(x)
## S4 method for signature 'MgsaResults'
studySetSizeInPopulation(x)
x |
an MgsaResults object |
integer: the size of the study set.
Returns a subset of an MgsaSets that contains only the specified items. Empty sets are removed.
subMgsaSets(sets, items)
## S4 method for signature 'MgsaSets,character'
subMgsaSets(sets, items)
sets |
an MgsaSets |
items |
|
an MgsaSets.
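The effect described for subMgsaSets can be pictured on plain lists in base R. This sketch works on a list of character vectors rather than on an MgsaSets and is not the package's implementation; the set and item names are invented.

```r
sets <- list(set1 = c("a", "b"), set2 = c("b", "c"), set3 = c("d"))
keep <- c("a", "b")

# Restrict every set to the requested items.
restricted <- lapply(sets, function(s) s[s %in% keep])

# Drop sets that became empty.
sub_sets <- restricted[lengths(restricted) > 0]
```

Here set3 disappears because none of its items was requested, while set2 shrinks to the single item "b".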