Title: | Interpretation and enrichment for metabolomics data |
---|---|
Description: | Enrichment of metabolomics data using KEGG entries. Given a set of affected compounds, FELLA suggests affected reactions, enzymes, modules and pathways using label propagation in a knowledge model network. The resulting subnetwork can be visualised and exported. |
Authors: | Sergio Picart-Armada [aut, cre], Francesc Fernandez-Albert [aut], Alexandre Perera-Lluna [aut] |
Maintainer: | Sergio Picart-Armada <[email protected]> |
License: | GPL-3 |
Version: | 1.27.0 |
Built: | 2024-11-20 06:45:15 UTC |
Source: | https://github.com/bioc/FELLA |
An internal S4 class for the diffusion data
matrix
Numeric (dense) matrix [optional]
rowSums
Numeric named vector with rowSums internal data
squaredRowSums
Numeric named vector with squaredRowSums internal data
An internal S4 class for the binary matrix (hypergeometric test)
matrix
Binary sparse matrix
An internal S4 class to represent the KEGG graph and related files
graph
KEGG graph
id2name
Mapping list: KEGG ID to KEGG name (can contain multiple hits)
pvalues.size
Numeric matrix for the evaluation of CC through their size
id
List with character vectors for KEGG categories
status
Character that specifies the current status of this S4 class
An internal S4 class for the PageRank data
matrix
Numeric (dense) matrix [optional]
rowSums
Numeric named vector with rowSums internal data
squaredRowSums
Numeric named vector with squaredRowSums internal data
Function buildGraphFromKEGGREST
makes use of the KEGG
REST API (requires internet connection)
to build and return the curated KEGG graph.
Function buildDataFromGraph
takes as input the KEGG graph
generated by buildGraphFromKEGGREST
and writes the KEGG knowledge model in the desired permanent directory.
Function loadKEGGdata
loads the internal files
containing the KEGG knowledge model into a
FELLA.DATA
object.
In general, generateGraphFromKEGGREST
and
generateDataFromGraph
are one-time executions
for a given organism and knowledge model,
in this precise order.
On the other hand, the user needs to run loadKEGGdata
in every new R session to load such model into a
FELLA.DATA
object.
buildGraphFromKEGGREST(organism = "hsa", filter.path = NULL) buildDataFromGraph(keggdata.graph = NULL, databaseDir = NULL, internalDir = TRUE, matrices = c("hypergeom", "diffusion", "pagerank"), normality = c("diffusion", "pagerank"), dampingFactor = 0.85, niter = 100) loadKEGGdata(databaseDir = tail(listInternalDatabases(), 1), internalDir = TRUE, loadMatrix = NULL)
buildGraphFromKEGGREST(organism = "hsa", filter.path = NULL) buildDataFromGraph(keggdata.graph = NULL, databaseDir = NULL, internalDir = TRUE, matrices = c("hypergeom", "diffusion", "pagerank"), normality = c("diffusion", "pagerank"), dampingFactor = 0.85, niter = 100) loadKEGGdata(databaseDir = tail(listInternalDatabases(), 1), internalDir = TRUE, loadMatrix = NULL)
organism |
Character, KEGG code for the organism of interest |
filter.path |
Character vector, pathways to filter.
This is a pattern matched using regexp.
E.g: |
keggdata.graph |
An igraph
object generated by the function
|
databaseDir |
Character containing the directory to save KEGG files.
It is a relative directory inside the library location
if |
internalDir |
Logical, should the directory be internal in the package directory? |
matrices |
A character vector, containing any of these:
|
normality |
A character vector, containing any of these:
|
dampingFactor |
Numeric value between 0 and 1 (none inclusive),
damping factor |
niter |
Numeric value, number of iterations to estimate the p-values for the CC size. Between 10 and 1e3. |
loadMatrix |
Character vector to choose if
heavy matrices should be loaded.
Can contain: |
In function buildGraphFromKEGGREST
,
The user specifies (i) an organism, and (ii) patterns matching
pathways that should not be included as nodes.
A graph object, as described in [Picart-Armada, 2017],
is built from the comprehensive
KEGG database [Kanehisa, 2017].
As described in the main vignette, accessible through
browseVignettes("FELLA")
, this graph has five levels that
represent categories of KEGG nodes.
From top to bottom: pathways, modules, enzymes, reactions and compounds.
This knowledge representation is resemblant to the one formerly
used by MetScape [Karnovsky, 2011], in which enzymes connect
to genes instead of modules and pathways.
The necessary KEGG annotations
are retrieved through KEGGREST R package [Tenenbaum, 2013].
Connections between pathways/modules and enzymes are inferred through
organism-specific genes, i.e. an edge is added if a gene
connects both entries.
However, in order to enrich metabolomics data, the user has to
pass the graph object to buildDataFromGraph
to obtain the FELLA.USER
object.
All the networks are handled with the igraph R package [Csardi, 2006].
Using buildDataFromGraph
is the second step
to use the FELLA
package.
The knoledge graph is used to compute other internal variables that are
required to run any enrichment.
The main point behind the enrichment is to provide a small
part of the knowledge graph relevant to the supplied metabolites.
This is accomplished through diffusion processes and random walks,
followed by a statistical normalisation,
as described in [Picart-Armada, 2017].
When building the internal files,
the user can choose whether to store (i) matrices for each
provided method, and (ii) vectors derived from such matrices
to use the parametric approaches.
These are optional but enable (i) faster permutations and custom
metabolite backgrounds, and (ii) parametric approaches.
WARNING: diffusion and PageRank matrices in (i)
can allocate up to 250MB each.
On the other hand, the niter
parameter
controls the amount of trials to approximate the
distribution of the connected component size under
uniform node sampling.
For further info, see the option thresholdConnectedComponent
in the details from ?generateResultsGraph
.
Regarding the destination, the user can specify
the name of the directory.
Otherwise a name containing the creation date, the organism
and the KEGG release will be used.
The database can be stored within the library path or in a
custom location.
Function loadKEGGdata
returns a
FELLA.DATA
object from any of the
databases generated by FELLA.DATA
.
This object is the starting point of any enrichment
using FELLA
.
In case the user built the matrices for "diffusion" and "pagerank",
he or she can choose to load them.
Further detail on the methods can be found in [Picart-Armada, 2017].
The matrices allow a faster computation and the definition
of a custom background, but use up to 250MB of memory each.
buildGraphFromKEGGREST
returns the
curated KEGG graph (class igraph)
buildDataFromGraph
returns
invisible(TRUE)
if successful.
As a side effect, the
directory outdir
is created, containing
the internal data.
loadKEGGdata
returns the
FELLA.DATA
object
that contains the KEGG knowledge representation.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.
Karnovsky, A., Weymouth, T., Hull, T., Tarcea, V. G., Scardoni, G., Laudanna, C., ... & Athey, B. (2011). Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. Bioinformatics, 28(3), 373-380.
Tenenbaum, D. (2013). KEGGREST: Client-side REST access to KEGG. R package version, 1(1).
Chang, W., Cheng, J., Allaire, JJ., Xie, Y., & McPherson, J. (2017). shiny: Web Application Framework for R. R package version 1.0.5. https://CRAN.R-project.org/package=shiny
Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.
class FELLA.DATA
## Toy example ## In this case, the graph is not built from current KEGG. ## It is loaded from sample data in FELLA data("FELLA.sample") ## Graph to build the database (this example is a bit hacky) g.sample <- FELLA:::getGraph(FELLA.sample) dir.tmp <- paste0(tempdir(), "/", paste(sample(letters), collapse = "")) ## Build internal files in a temporary directory buildDataFromGraph( keggdata.graph = g.sample, databaseDir = dir.tmp, internalDir = FALSE, matrices = NULL, normality = NULL, dampingFactor = 0.85, niter = 10) ## Load database myFELLA.DATA <- loadKEGGdata( dir.tmp, internalDir = FALSE) myFELLA.DATA ###################### ## Not run: ## Full example ## First step: graph for Mus musculus discarding the mmu01100 pathway ## (an analog example can be built from human using organism = "hsa") g.mmu <- buildGraphFromKEGGREST( organism = "mmu", filter.path = "mmu01100") summary(g.mmu) cat(comment(g.mmu)) ## Second step: build internal files for this graph ## (consumes some time and memory, especially if we compute "diffusion" and "pagerank" matrices) buildDataFromGraph( keggdata.graph = g.mmu, databaseDir = "example_db_mmu", internalDir = TRUE, matrices = c("hypergeom", "diffusion", "pagerank"), normality = c("diffusion", "pagerank"), dampingFactor = 0.85, niter = 1e3) ## Third step: load the internal files into a FELLA.DATA object FELLA.DATA.mmu <- loadKEGGdata( "example_db_mmu", internalDir = TRUE, loadMatrix = c("diffusion", "pagerank")) FELLA.DATA.mmu ## End(Not run)
## Toy example ## In this case, the graph is not built from current KEGG. ## It is loaded from sample data in FELLA data("FELLA.sample") ## Graph to build the database (this example is a bit hacky) g.sample <- FELLA:::getGraph(FELLA.sample) dir.tmp <- paste0(tempdir(), "/", paste(sample(letters), collapse = "")) ## Build internal files in a temporary directory buildDataFromGraph( keggdata.graph = g.sample, databaseDir = dir.tmp, internalDir = FALSE, matrices = NULL, normality = NULL, dampingFactor = 0.85, niter = 10) ## Load database myFELLA.DATA <- loadKEGGdata( dir.tmp, internalDir = FALSE) myFELLA.DATA ###################### ## Not run: ## Full example ## First step: graph for Mus musculus discarding the mmu01100 pathway ## (an analog example can be built from human using organism = "hsa") g.mmu <- buildGraphFromKEGGREST( organism = "mmu", filter.path = "mmu01100") summary(g.mmu) cat(comment(g.mmu)) ## Second step: build internal files for this graph ## (consumes some time and memory, especially if we compute "diffusion" and "pagerank" matrices) buildDataFromGraph( keggdata.graph = g.mmu, databaseDir = "example_db_mmu", internalDir = TRUE, matrices = c("hypergeom", "diffusion", "pagerank"), normality = c("diffusion", "pagerank"), dampingFactor = 0.85, niter = 1e3) ## Third step: load the internal files into a FELLA.DATA object FELLA.DATA.mmu <- loadKEGGdata( "example_db_mmu", internalDir = TRUE, loadMatrix = c("diffusion", "pagerank")) FELLA.DATA.mmu ## End(Not run)
Function defineCompounds
creates a
FELLA.USER
object from a list of
compounds and a FELLA.DATA
object.
Functions runHypergeom
,
runDiffusion
and runPagerank
perform an enrichment on a FELLA.USER
with
the mapped input metabolites
(through defineCompounds
)
and a FELLA.DATA
object.
They are based on the hypergeometric test, the heat diffusion model
and the PageRank algorithm, respectively.
Function enrich
is a wrapper with
the following order:
loadKEGGdata
(optional),
defineCompounds
and one or more in
runHypergeom
, runDiffusion
and runPagerank
defineCompounds(compounds = NULL, compoundsBackground = NULL, data = NULL) runHypergeom(object = NULL, data = NULL, p.adjust = "fdr") runDiffusion(object = NULL, data = NULL, approx = "normality", t.df = 10, niter = 1000) runPagerank(object = NULL, data = NULL, approx = "normality", dampingFactor = 0.85, t.df = 10, niter = 1000) enrich(compounds = NULL, compoundsBackground = NULL, methods = listMethods(), loadMatrix = "none", approx = "normality", t.df = 10, niter = 1000, databaseDir = NULL, internalDir = TRUE, data = NULL, ...)
defineCompounds(compounds = NULL, compoundsBackground = NULL, data = NULL) runHypergeom(object = NULL, data = NULL, p.adjust = "fdr") runDiffusion(object = NULL, data = NULL, approx = "normality", t.df = 10, niter = 1000) runPagerank(object = NULL, data = NULL, approx = "normality", dampingFactor = 0.85, t.df = 10, niter = 1000) enrich(compounds = NULL, compoundsBackground = NULL, methods = listMethods(), loadMatrix = "none", approx = "normality", t.df = 10, niter = 1000, databaseDir = NULL, internalDir = TRUE, data = NULL, ...)
compounds |
Character vector containing the KEGG IDs of the compounds considered as affected |
compoundsBackground |
Character vector containing the KEGG IDs of
the compounds that belong to the background. Can be |
data |
FELLA.DATA object |
object |
FELLA.USER object |
p.adjust |
Character passed to the
|
approx |
Character: "simulation" for Monte Carlo, "normality", "gamma" or "t" for parametric approaches |
t.df |
Numeric value; number of degrees of freedom
of the t distribution
if the approximation |
niter |
Number of iterations (permutations) for Monte Carlo ("simulation"), must be a numeric value between 1e2 and 1e5 |
dampingFactor |
Numeric value between 0 and 1 (none inclusive),
damping factor |
methods |
Character vector, containing some of:
|
loadMatrix |
Character vector to choose if
heavy matrices should be loaded.
Can contain: |
databaseDir |
Character, path to load the
|
internalDir |
Logical, is the directory located in the package directory? |
... |
Further arguments for the enrichment function(s)
|
Function defineCompounds
maps the
specficied list of KEGG compounds [Kanehisa, 2017], usually from an
experimental metabolomics study, to the graph contained in the
FELLA.DATA
object.
Importantly, the names must be KEGG ids, so other formats
(common names, HMDB ids, etc) must be mapped to KEGG first.
For example, through the "Compound ID Conversion"
tool in MetaboAnalyst [Xia, 2015].
The user can also define a personalised background as a
list of KEGG compound ids, which should be more extensive than
the list of input metabolites.
Once the compounds are mapped, the enrichment
can be performed through runHypergeom
,
runDiffusion
and runPagerank
.
Function runHypergeom
performs an over representation analysis
through the hypergeometric test [Fisher, 1935] on a
FELLA.USER
object with mapped metabolites
and a FELLA.DATA
object.
If a custom background was specified, it will be used.
This approach is included for completeness and it is not the
main purpose behind the FELLA
package.
Importantly, runHypergeom
is not a hypergeometric test using the
original KEGG pathways.
Instead, a compound "belongs" to a "pathway" if
it can reach the original pathway in the
upwards-directed KEGG graph.
This is a way to evaluate enrichment including indirect connections
to a pathway, e.g. through an enzymatic family.
New "pathways" are expected to be larger than the original pathways
in this analysis and therefore the results can differ from the
standard over representation.
Function runDiffusion
performs
the diffusion-based enrichment on a
FELLA.USER
object with mapped metabolites
and a FELLA.DATA
object [Picart-Armada, 2017].
If a custom background was specified, it will be used.
The idea behind the heat diffusion is the usage of the
finite difference formulation of the heat equation to
propagate labels from the metabolites to the rest of the graph.
Following the notation in [Picart-Armada, 2017], the temperatures (diffusion scores) are computed as:
G
is an indicator vector of the input metabolites
(1
if input metabolite, 0
otherwise).
KI
is the matrix -KI = L + B
, being
L
the unnormalised graph Laplacian and
B
the diagonal matrix with B[i,i] = 1
if
node i
is a pathway and B[i,i] = 0
otherwise.
Equivalently, with the notation in the HotNet approach [Vandin, 2011],
the stationary temperature is named fs
:
bs
is the indicator vector G
from above.
Lgamma
, on the other hand, is found as
Lgamma = L + gamma*I
, where L
is the unnormalised
graph Laplacian, gamma
is the first order leaking rate
and I
is the identity matrix.
In our formulation, only the pathway nodes are allowed to leak,
therefore I
is switched to B
.
The parameter gamma
is set to gamma = 1
.
The input metabolites are forced to stay warm, propagating flow to all the nodes in the network. However, only pathway nodes are allowed to evacuate this flow, so that its directionality is bottom-up. Further details on the setup of the diffusion process can be found in the supplementary file S2 from [Picart-Armada, 2017].
Finally, the warmest nodes in the graph are reported as the relevant sub-network. This will probably include some input metabolites and also reactions, enzymes, modules and pathways. Other metabolites can be suggested as well.
Function runPagerank
performs the random walk
based enrichment on a
FELLA.USER
object with mapped metabolites
and a FELLA.DATA
object.
If a custom background was specified, it will be used.
PageRank was originally conceived as a scoring system for websites
[Page, 1999].
Intuitively, PageRank favours nodes that
(1) have a large amount of nodes pointing
at them, and (2) whose pointing nodes also have high scores.
Classical PageRank is formulated in terms of a random walker -
the PageRank of a given node is the stationary probability
of the walker visiting it.
The walker chooses, in each step,
whether to continue the random walk with probability
dampingFactor
or to restart it with probability
1 - dampingFactor
.
In the original publication, dampingFactor = 0.85
,
which is the value used in FELLA
by default.
If he or she continues, an edge is picked from the outgoing edges
in the current node with a probability proportional to its weight.
If he or she restarts it, a node is uniformly picked from the
whole graph.
The "personalised PageRank" variant allows a user-defined
distribution as the source of new random walks.
The R package igraph
contains such variant in its
page.rank
function [Csardi, 2006].
As described in the supplement S3 from [Picart-Armada, 2017],
the PageRank PR
can be computed as
a column vector by imposing a stationary
state in the probability.
With a damping factor d
and the user-defined
distribution p
as a column vector:
M
is the matrix whose element M[i,j]
is the
probability of transitioning from j
to i
.
If node j
has outgoing edges, their probability is proportional
to their weight - all weights must be positive.
If node j
has no outgoing edges, the probability is
uniform over all the nodes, i.e. M[i,j] = 1/nrow(M)
for every i
.
Note that all the columns from M
sum up exactly 1
.
This leads to an expression to compute PageRank:
The idea behind the method "pagerank"
is closely related
to "diffusion"
.
Relevant metabolites are the sources of new random walks and
nodes are scored through their PageRank.
Specifically, p
is set to a uniform probability on the
input metabolites.
More details on the setup can be found in
the supplementary file S3 from [Picart-Armada, 2017].
There is an important detail for "diffusion"
and "pagerank"
: the scores are statistically normalised.
Omitting this normalisation leads to a systematic bias,
especially in pathway nodes, as described in [Picart-Armada, 2017].
Therefore, in both cases, scores undergo a normalisation
through permutation analysis.
The score of a node i
is compared to its null distribution
under input permutation, leading to their p-scores.
As described in [Picart-Armada, 2017], two alternatives are offered:
a parametric and deterministic approach
and a non-parametric, stochastic one.
Stochastic Monte Carlo trials ("simulation"
) imply
randomly permuting the input niter
times and counting,
for each node i
, how many trials
led to an equally or more extreme value than the original score.
An empirical p-value is returned [North, 2002].
On the other hand, the parametric
scores (approx = "normality"
)
give a z-score for such permutation analysis.
The expected value and variance of such null distributions
are known quantities, see supplementary
file S4 from [Picart-Armada, 2017].
To work in the same range [0,1]
, z-scores are
transformed using the routine pnorm
.
The user can also choose the Student's t using
approx = "t"
and choosing a number of degrees of freedom
through t.df
.
This uses the function pt
instead.
Alternatively, a gamma distribution can be used by setting
approx = "gamma"
.
The theoretical mean (E) and variance (V)
are used to define the shape
(E^2/V) and scale (V/E) of the gamma distribution, and
pgamma
to map to [0,1].
Any sub-network prioritised by "diffusion"
and "pagerank"
is selected by applying
a threshold on the p-scores.
Finally, the function enrich
is a wrapper to perform the enrichment analysis.
If no FELLA.DATA
object is supplied,
it loads it, maps the affected compounds and performs
the desired enrichment(s) with a single call.
Returned is a list with the loaded
FELLA.DATA
object
and the results in a FELLA.USER
object.
Conversely, the user can supply the
FELLA.DATA
object and the wrapper
will map the metabolites and run the desired enrichment
method(s).
In this case, only the FELLA.USER
will be returned.
defineCompounds
returns
the FELLA.USER
object
with the mapped metabolites, ready to be enriched.
runHypergeom
returns a
FELLA.USER
object
updated with the hypergeometric test results
runDiffusion
returns a
FELLA.USER
object
updated with the diffusion enrichment results
runPagerank
returns a
FELLA.USER
object
updated with the PageRank enrichment results
enrich
returns a
FELLA.USER
object
updated with the desired enrichment results if
the FELLA.DATA
was supplied.
Otherwise, a list with the freshly loaded
FELLA.DATA
object and the
corresponding enrichment in the
FELLA.USER
object.
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.
Xia, J., Sinelnikov, I. V., Han, B., & Wishart, D. S. (2015). MetaboAnalyst 3.0 - making metabolomics more meaningful. Nucleic acids research, 43(W1), W251-W257.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98(1), 39-82.
Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.
Vandin, F., Upfal, E., & Raphael, B. J. (2011). Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), 507-522.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.
North, B. V., Curtis, D., & Sham, P. C. (2002). A note on the calculation of empirical P values from Monte Carlo procedures. American journal of human genetics, 71(2), 439.
## Load the internal database. ## This one is a toy example! ## Do not use as a regular database data(FELLA.sample) ## Load a list of compounds to enrich data(input.sample) ###################### ## Example, step by step ## First, map the compounds obj <- defineCompounds( compounds = c(input.sample, "I_dont_map", "me_neither"), data = FELLA.sample) obj ## See the mapped and unmapped compounds getInput(obj) getExcluded(obj) ## Compounds are already mapped ## We can enrich using any method now ## If no compounds are mapped an error is thrown. Example: ## Not run: data(FELLA.sample) obj <- defineCompounds( compounds = c("C00049", "C00050"), data = FELLA.sample) ## End(Not run) ## Enrich using hypergeometric test obj <- runHypergeom( object = obj, data = FELLA.sample) obj ## Enrich using diffusion ## Note how the results are added; ## the hypergeometric results are not overwritten obj <- runDiffusion( object = obj, approx = "normality", data = FELLA.sample) obj ## Enrich using PageRank ## Again, this does not overwrite other methods obj <- runPagerank( object = obj, approx = "simulation", data = FELLA.sample) obj ###################### ## Example using the "enrich" wrapper ## Only diffusion obj.wrap <- enrich( compounds = input.sample, method = "diffusion", data = FELLA.sample) obj.wrap ## All the methods obj.wrap <- enrich( compounds = input.sample, methods = FELLA::listMethods(), data = FELLA.sample) obj.wrap
## Load the internal database. ## This one is a toy example! ## Do not use as a regular database data(FELLA.sample) ## Load a list of compounds to enrich data(input.sample) ###################### ## Example, step by step ## First, map the compounds obj <- defineCompounds( compounds = c(input.sample, "I_dont_map", "me_neither"), data = FELLA.sample) obj ## See the mapped and unmapped compounds getInput(obj) getExcluded(obj) ## Compounds are already mapped ## We can enrich using any method now ## If no compounds are mapped an error is thrown. Example: ## Not run: data(FELLA.sample) obj <- defineCompounds( compounds = c("C00049", "C00050"), data = FELLA.sample) ## End(Not run) ## Enrich using hypergeometric test obj <- runHypergeom( object = obj, data = FELLA.sample) obj ## Enrich using diffusion ## Note how the results are added; ## the hypergeometric results are not overwritten obj <- runDiffusion( object = obj, approx = "normality", data = FELLA.sample) obj ## Enrich using PageRank ## Again, this does not overwrite other methods obj <- runPagerank( object = obj, approx = "simulation", data = FELLA.sample) obj ###################### ## Example using the "enrich" wrapper ## Only diffusion obj.wrap <- enrich( compounds = input.sample, method = "diffusion", data = FELLA.sample) obj.wrap ## All the methods obj.wrap <- enrich( compounds = input.sample, methods = FELLA::listMethods(), data = FELLA.sample) obj.wrap
In general, generateResultsTable
, generateEnzymesTable
and generateResultsGraph
provide the results of an enrichment
in several formats.
Function generateResultsTable
returns a table
that contains the best hits from
a FELLA.USER
object
with a successful enrichment analysis.
Similarly, generateEnzymesTable
returns
a data frame with the best scoring enzyme families and their
annotated genes.
Function generateResultsGraph
gives a sub-network, plottable through
plotGraph
, witht the nodes with
the lowest p.score
from an enrichment analysis.
Function addGOToGraph
can be applied to such
sub-networks to overlay GO labels and
similarity to a user-defined GO term.
Function exportResults
is a wrapper around generateResultsTable
,
generateEnzymesTable
and generateResultsGraph
to write the results to files.
generateResultsTable(method = "diffusion", threshold = 0.05, plimit = 15, nlimit = 250, LabelLengthAtPlot = 45, capPscores = 1e-06, object = NULL, data = NULL, ...) generateEnzymesTable(method = "diffusion", threshold = 0.05, nlimit = 250, LabelLengthAtPlot = 45, capPscores = 1e-06, mart.options = list(biomart = "ensembl", dataset = "hsapiens_gene_ensembl"), object = NULL, data = NULL, ...) generateResultsGraph(method = "diffusion", threshold = 0.05, plimit = 15, nlimit = 250, thresholdConnectedComponent = 0.05, LabelLengthAtPlot = 22, object = NULL, data = NULL, ...) exportResults(format = "csv", file = "myOutput", method = "diffusion", object = NULL, data = NULL, ...) addGOToGraph(graph = NULL, GOterm = NULL, godata.options = list(OrgDb = "org.Hs.eg.db", ont = "CC"), mart.options = list(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")) plotGraph(graph = NULL, layout = FALSE, graph.layout = NULL, plotLegend = TRUE, plot.fun = "plot.igraph", NamesAsLabels = TRUE, ...)
generateResultsTable(method = "diffusion", threshold = 0.05, plimit = 15, nlimit = 250, LabelLengthAtPlot = 45, capPscores = 1e-06, object = NULL, data = NULL, ...) generateEnzymesTable(method = "diffusion", threshold = 0.05, nlimit = 250, LabelLengthAtPlot = 45, capPscores = 1e-06, mart.options = list(biomart = "ensembl", dataset = "hsapiens_gene_ensembl"), object = NULL, data = NULL, ...) generateResultsGraph(method = "diffusion", threshold = 0.05, plimit = 15, nlimit = 250, thresholdConnectedComponent = 0.05, LabelLengthAtPlot = 22, object = NULL, data = NULL, ...) exportResults(format = "csv", file = "myOutput", method = "diffusion", object = NULL, data = NULL, ...) addGOToGraph(graph = NULL, GOterm = NULL, godata.options = list(OrgDb = "org.Hs.eg.db", ont = "CC"), mart.options = list(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")) plotGraph(graph = NULL, layout = FALSE, graph.layout = NULL, plotLegend = TRUE, plot.fun = "plot.igraph", NamesAsLabels = TRUE, ...)
method |
one in |
threshold |
Numeric value between 0 and 1.
|
plimit |
Pathway limit, must be a numeric value between 1 and 50.
Limits the amount of pathways in |
nlimit |
Node limit, must be a numeric value between 1 and 1000.
Limits the order of the solution sub-graph when
in |
LabelLengthAtPlot |
Numeric value between 10 and 50. Maximum length that a label can reach when plotting the graph. The remaining characters will be truncated using "..." |
capPscores |
Numeric value, minimum p-score
admitted for the readable
formatting. Smaller p-scores will be displayed
as |
object |
FELLA.USER object |
data |
FELLA.DATA object |
... |
Optional arguments for the plotting function
in |
mart.options |
List, options for the |
thresholdConnectedComponent |
Numeric value between 0 and 1. Connected components that are below the threshold are kept, while the ones exceeding it (because they are too small) are discarded. |
format |
Character, one of: |
file |
Character specifying the output file name |
graph |
An igraph object,
typically a small one,
coming from an enrichment through |
GOterm |
Character, GO entry to draw
semantic similarity in the solution graph.
If |
godata.options |
List, options for the database creator
|
layout |
Logical, should the plot be returned as a layout? |
graph.layout |
Two-column numeric matrix, if this argument is not null then it is used as graph layout |
plotLegend |
Logical, should the legend be plotted as well? |
plot.fun |
Character, can be either
|
NamesAsLabels |
Logical, should KEGG names be displayed as labels instead of KEGG identifiers? |
Functions generateResultsTable
and
generateEnzymesTable
need a
FELLA.DATA
object and a
FELLA.USER
object with a successful enrichment.
generateResultsTable
provides the entries
whose p-score is below the chosen threshold
in a tabular format.
generateEnzymesTable
returns a table
that contains (1) the enzymes that are below the user-defined
p-score threshold, along with (2) the genes that belong to
the enzymatic families in the organism defined in the database,
and (3) GO labels of such enzymes, if mart.options
is
not NULL
and points to the right database.
Function generateResultsGraph
returns an
igraph
object with a relevant sub-network
for manual examination.
A FELLA.USER
object with a successful enrichment analysis and the corresponding
FELLA.DATA
must be supplied.
Graph nodes are prioritised by p.score
and selected through
the most stringent between (1) p.score threshold
and
(2) maximum number of nodes nlimit
.
There is an additional filtering feature for tiny connected components,
controllable through thresholdConnectedComponent
(smaller is stricter).
The user can choose to turn off this filter by setting
thresholdConnectedComponent = 1
.
The idea is to discard connected components so small
that are likely to arise from random selection of nodes.
Let k
be the order of the current sub-network.
A connected component of order r
will
be kept only if the probability that a
random subgraph from the whole KEGG knowledge model
of order k
contains a
connected component of order at least r
is smaller than thresholdConnectedComponent
.
Such probabilities are estimated during
buildDataFromGraph
; the amount of random
trials can be controlled by its niter
argument.
Function exportResults
writes the enrichment results
as the specified filetype.
Options are: a csv table ("csv"
),
an enzyme csv table ("enzyme"
)
an igraph
object as an RData
file,
or any format supported by igraph's
write_graph
.
Function addGOToGraph
takes and returns
a graph object with class
igraph
adding the following attributes:
GO labels in V(graph)$GO
, and
semantic similarities in V(graph)$GO.simil
if
GOterm != NULL
.
The GO database describes genes in terms of three ontologies:
molecular function (MF), biological process (BP) and
cellular component (CC) [Gene Ontology Consortium, 2015].
The user can be interested in finding which enzymatic families
reported with a low p.score
are closest to a particular GO term.
To assess similarity between GO labels, FELLA uses the
semantic similarity defined in [Yu, 2010] and their implementation
in the GOSemSim R package.
The user will obtain, for each enzymatic family, the closest GO
term to his or her GO query and the semantic similarity between them.
Exact matches have a similarity of 1
.
Function plotGraph
detects the presence
of the GO similarity option and plots its magnitude.
Function plotGraph
plots a solution graph from the diffusion and pagerank analysis.
For plotting hypergeom results, please use plot
instead.
Specific colors and shapes for each KEGG category are used:
pathways are maroon, modules are violet, enzymes are orange,
reactions are blue and compounds are green.
If the graph contains the similarity to a GO term, enzymes will
be displayed as triangles whose color depicts the strength of
such measure (yellow: weak, purple: strong).
At the moment, plotGraph
allows plotting
throug the static plot.igraph
and the
interactive tkplot
.
generateResultsTable
returns a
data.frame that contains the nodes below the p.score
threshold
from an enrichment analysis
generateEnzymesTable
returns a
data.frame that contains the enzymes below the p.score
threshold,
along with their genes and GO labels
generateResultsGraph
returns
an igraph
object: a sub-network from the whole
KEGG knowledge model under the specified thresholds
(threshold
and thresholdConnectedComponent
)
exportResults
returns invisible()
,
but as a side effect the specified file
is created.
addGOToGraph
returns
an igraph object,
which is the input graph
with
extra attributes: GO labels in V(graph)$GO
, and
semantic similarities in V(graph)$GO.simil
if
GOterm != NULL
plotGraph
returns
invisible()
if layout = F
and
the plotting layout as a data.frame otherwise.
Gene Ontology Consortium. (2015). Gene ontology consortium: going forward. Nucleic acids research, 43(D1), D1049-D1056.
Yu, G., Li, F., Qin, Y., Bo, X., Wu, Y., & Wang, S. (2010). GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics, 26(7), 976-978.
## First generate a toy enrichment library(igraph) data(FELLA.sample) data(input.sample) ## Enrich input obj <- enrich( compounds = input.sample, data = FELLA.sample) ###################### ## Results table tab.res <- generateResultsTable( method = "hypergeom", threshold = 0.1, object = obj, data = FELLA.sample) head(tab.res) tab.res <- generateResultsTable( method = "diffusion", threshold = 0.1, object = obj, data = FELLA.sample) head(tab.res) ###################### ## Use wrapper to write the table to a file out.file <- tempfile() exportResults( format = "csv", threshold = 0.1, file = out.file, object = obj, data = FELLA.sample) tab.wrap <- read.csv(out.file) head(tab.wrap) ###################### ## Enzymes table tab.ec <- generateEnzymesTable( threshold = 0.1, object = obj, data = FELLA.sample, mart.options = NULL) head(tab.ec) ###################### ## Generate graph g.res <- generateResultsGraph( method = "pagerank", threshold = 0.1, object = obj, data = FELLA.sample) g.res ## Plot graph (without GO terms) plotGraph(g.res) ## Add similarity to the GO CC term "mitochondrion" ## Not run: g.cc <- FELLA:::addGOToGraph( graph = g.res, GOterm = "GO:0005739") ## Plot graph (with GO terms) plotGraph(g.cc) ## Without the CC any(V(g.res)$GO.simil >= 0) ## With the CC v.cc <- unlist(V(g.cc)$GO.simil) sum(v.cc >= 0, na.rm = TRUE) ## Similarity values table(v.cc) ## End(Not run)
## First generate a toy enrichment library(igraph) data(FELLA.sample) data(input.sample) ## Enrich input obj <- enrich( compounds = input.sample, data = FELLA.sample) ###################### ## Results table tab.res <- generateResultsTable( method = "hypergeom", threshold = 0.1, object = obj, data = FELLA.sample) head(tab.res) tab.res <- generateResultsTable( method = "diffusion", threshold = 0.1, object = obj, data = FELLA.sample) head(tab.res) ###################### ## Use wrapper to write the table to a file out.file <- tempfile() exportResults( format = "csv", threshold = 0.1, file = out.file, object = obj, data = FELLA.sample) tab.wrap <- read.csv(out.file) head(tab.wrap) ###################### ## Enzymes table tab.ec <- generateEnzymesTable( threshold = 0.1, object = obj, data = FELLA.sample, mart.options = NULL) head(tab.ec) ###################### ## Generate graph g.res <- generateResultsGraph( method = "pagerank", threshold = 0.1, object = obj, data = FELLA.sample) g.res ## Plot graph (without GO terms) plotGraph(g.res) ## Add similarity to the GO CC term "mitochondrion" ## Not run: g.cc <- FELLA:::addGOToGraph( graph = g.res, GOterm = "GO:0005739") ## Plot graph (with GO terms) plotGraph(g.cc) ## Without the CC any(V(g.res)$GO.simil >= 0) ## With the CC v.cc <- unlist(V(g.cc)$GO.simil) sum(v.cc >= 0, na.rm = TRUE) ## Similarity values table(v.cc) ## End(Not run)
FELLA
is a metabolomics data enrichment tool that contextualises
a list of metabolites using KEGG reactions,
enzymes, modules and pathways [Picart-Armada, 2017].
FELLA
can build knowledge models for the desired organism
from the KEGG database [Kanehisa, 2017].
Once a model is ready, the input for the enrichment is introduced
as a list of affected metabolites (as KEGG IDs).
The output contains a comprehensive biological network layout
that relates relevant pathways to the affected metabolites.
Results are available in network and tabular format.
FELLA
is equipped with a simple graphical interface
for the lay user, deployed through launchApp.
FELLA
relies mainly on the following packages:
KEGGREST for the queries
to the KEGG server [Tenenbaum, 2013],
igraph for the
network support [Csardi, 2006] and
shiny for the
graphical user interface [Chang, 2017].
Methodology:
Picart-Armada, S., Fernandez-Albert, F., Vinaixa, M., Rodriguez, M. A., Aivio, S., Stracker, T. H., Yanes, O., & Perera-Lluna, A. (2017). Null diffusion-based enrichment for metabolomics data. PLOS ONE, 12(12), e0189012.
Database:
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 45(D1), D353-D361.
Main dependencies:
Tenenbaum, D. (2013). KEGGREST: Client-side REST access to KEGG. R package version, 1(1).
Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1-9.
Chang, W., Cheng, J., Allaire, JJ., Xie, Y., & McPherson, J. (2017). shiny: Web Application Framework for R. R package version 1.0.5. https://CRAN.R-project.org/package=shiny
## Walkthrough browseVignettes("FELLA") ## I: create database ?buildGraphFromKEGGREST ## II: enrich data ?enrich ## III: export results ?exportResults
## Walkthrough browseVignettes("FELLA") ## I: create database ?buildGraphFromKEGGREST ## II: enrich data ?enrich ## III: export results ?exportResults
An S4 class to represent all the necessary KEGG data
"show"
is an S4 method to show a FELLA.DATA object
## S4 method for signature 'FELLA.DATA' show(object)
## S4 method for signature 'FELLA.DATA' show(object)
object |
A |
show
returns invisible()
keggdata
A D.keggdata
S4 object
hypergeom
A D.hypergeom
S4 object
diffusion
A D.diffusion
S4 object
pagerank
A D.pagerank
S4 object
This FELLA.DATA
object is
a small KEGG graph object.
Despite being a small database that only contains
the two metabolic pathways
hsa00010 - Glycolysis / Gluconeogenesis, and hsa00640 -
Propanoate metabolism,
it is useful to play around with FELLA
's
functions. It is also used for internal testing of this package.
data(FELLA.sample)
data(FELLA.sample)
An object of class FELLA.DATA
of length 1.
A FELLA.DATA
object
Generated from a mid-2017 KEGG release (http://www.genome.jp/kegg/)
data(FELLA.sample)
data(FELLA.sample)
Assigning the value of show to a variable will provide
small data frames with the best scoring pathways (hypergeom
)
and the best nodes in the KEGG network (diffusion
and
pagerank
)
## S4 method for signature 'FELLA.USER' show(object) ## S4 method for signature 'FELLA.USER,missing' plot(x = new("FELLA.USER"), method = "hypergeom", threshold = 0.05, plimit = 15, nlimit = 250, layout = FALSE, thresholdConnectedComponent = 0.05, LabelLengthAtPlot = 22, data = NULL, ...)
## S4 method for signature 'FELLA.USER' show(object) ## S4 method for signature 'FELLA.USER,missing' plot(x = new("FELLA.USER"), method = "hypergeom", threshold = 0.05, plimit = 15, nlimit = 250, layout = FALSE, thresholdConnectedComponent = 0.05, LabelLengthAtPlot = 22, data = NULL, ...)
object |
A |
x |
A |
method |
Character, exactly one of:
|
threshold |
Numeric value between 0 and 1.
|
plimit |
Pathway limit, must be a numeric value between 1 and 50.
Limits the amount of pathways in |
nlimit |
Node limit, must be a numeric value between 1 and 1000.
Limits the order of the solution sub-graph when
in |
layout |
Logical, should the plot be returned as a layout? |
thresholdConnectedComponent |
Numeric value between 0 and 1. Connected components that are below the threshold are kept, while the ones exceeding it (because they are too small) are discarded. |
LabelLengthAtPlot |
Numeric value between 10 and 50. Maximum length that a label can reach when plotting the graph. The remaining characters will be truncated using "..." |
data |
FELLA.DATA object |
... |
Additional arguments passed to plotting functions |
show
invisibly returns a list of data frames with the
best hits for each applied method
plot
returns a layout if layout = T
,
otherwise invisible()
userinput
A U.userinput
S4 object
hypergeom
A U.hypergeom
S4 object
diffusion
A U.diffusion
S4 object
pagerank
A U.pagerank
S4 object
Extractor function for the compounds defined as background
getBackground(object)
getBackground(object)
object |
FELLA.USER object |
Vector of compounds in the background. If this vector is empty, all the compounds are used as background by default.
data(FELLA.sample) data(input.sample) input <- head(input.sample, 12) ## If the background is default, we see an empty vector ## Note that the number of iterations is really small in the example obj <- enrich( compounds = input, method = "diffusion", approx = "simulation", niter = 100, data = FELLA.sample) getBackground(obj) ## Otherwise we see the background compounds that mapped to the graph obj <- enrich( compounds = input, compoundsBackground = input.sample, method = "diffusion", approx = "simulation", niter = 100, data = FELLA.sample) getBackground(obj)
data(FELLA.sample) data(input.sample) input <- head(input.sample, 12) ## If the background is default, we see an empty vector ## Note that the number of iterations is really small in the example obj <- enrich( compounds = input, method = "diffusion", approx = "simulation", niter = 100, data = FELLA.sample) getBackground(obj) ## Otherwise we see the background compounds that mapped to the graph obj <- enrich( compounds = input, compoundsBackground = input.sample, method = "diffusion", approx = "simulation", niter = 100, data = FELLA.sample) getBackground(obj)
Extractor function for all the nodes from a level/community of KEGG graph
getCom(data, level, format = "name")
getCom(data, level, format = "name")
data |
FELLA.DATA object |
level |
Desired level, can be coded as a number or a character: 1 or "pathway"; 2 or "module"; 3 or "enzyme"; 4 or "reaction"; 5 or "compound". |
format |
Format of the output, "name" returns KEGG IDs whereas "id" returns vertices IDs |
Vector of the names/ids of the desired KEGG graph community
data(FELLA.sample) ## Pathways getCom(FELLA.sample, 1, format = "name") getCom(FELLA.sample, 1, format = "id") ## Modules getCom(FELLA.sample, 2) ## Enzymes head(getCom(FELLA.sample, 3)) ## Reactions head(getCom(FELLA.sample, 4)) ## Compounds head(getCom(FELLA.sample, 5))
data(FELLA.sample) ## Pathways getCom(FELLA.sample, 1, format = "name") getCom(FELLA.sample, 1, format = "id") ## Modules getCom(FELLA.sample, 2) ## Enzymes head(getCom(FELLA.sample, 3)) ## Reactions head(getCom(FELLA.sample, 4)) ## Compounds head(getCom(FELLA.sample, 5))
Extractor function for the compounds in the input that were not mapped to the KEGG graph
getExcluded(object)
getExcluded(object)
object |
FELLA.USER object |
Vector of the excluded compounds
data(FELLA.sample) data(input.sample) ## No excluded compounds obj <- defineCompounds( compounds = input.sample, data = FELLA.sample) getExcluded(obj) ## One compound does not map ## The user gets a warning as well obj <- defineCompounds( compounds = c(input.sample, "intruder"), data = FELLA.sample) getExcluded(obj)
data(FELLA.sample) data(input.sample) ## No excluded compounds obj <- defineCompounds( compounds = input.sample, data = FELLA.sample) getExcluded(obj) ## One compound does not map ## The user gets a warning as well obj <- defineCompounds( compounds = c(input.sample, "intruder"), data = FELLA.sample) getExcluded(obj)
Extractor function for the KEGG graph from the FELLA.DATA object
getGraph(data)
getGraph(data)
data |
FELLA.DATA object |
KEGG graph as an igraph object
data(FELLA.sample) g <- getGraph(FELLA.sample) class(g)
data(FELLA.sample) g <- getGraph(FELLA.sample) class(g)
Extractor function for the info about the KEGG version used to build the FELLA.DATA object
getInfo(data)
getInfo(data)
data |
FELLA.DATA object |
Character containing the KEGG release details
data(FELLA.sample) getInfo(FELLA.sample)
data(FELLA.sample) getInfo(FELLA.sample)
Extractor function for the metabolites specified by the user in the input
getInput(object)
getInput(object)
object |
FELLA.USER object |
Vector of metabolites in the input
data(FELLA.sample) data(input.sample) ## No excluded compounds: the input is recovered as is obj <- defineCompounds( compounds = input.sample, data = FELLA.sample) i1 <- getInput(obj) ## One compound does not map: the input contains only the mapped entities obj <- defineCompounds( compounds = c(input.sample, "intruder"), data = FELLA.sample) i2 <- getInput(obj) identical(sort(i1), sort(i2))
data(FELLA.sample) data(input.sample) ## No excluded compounds: the input is recovered as is obj <- defineCompounds( compounds = input.sample, data = FELLA.sample) i1 <- getInput(obj) ## One compound does not map: the input contains only the mapped entities obj <- defineCompounds( compounds = c(input.sample, "intruder"), data = FELLA.sample) i2 <- getInput(obj) identical(sort(i1), sort(i2))
Extractor function for the matrices of hypergeometric, diffusion and PageRank methodologies
getMatrix(data, method)
getMatrix(data, method)
data |
FELLA.DATA object |
method |
Character, exactly one of:
|
Matrix for the desired methodology (internal usage)
## This function is internal attach(environment(FELLA:::getMatrix)) data(FELLA.sample) # When a matrix is loaded: x <- getMatrix(FELLA.sample, "hypergeom") dim(x) # When it is not: y <- getMatrix(FELLA.sample, "diffusion") dim(y) y
## This function is internal attach(environment(FELLA:::getMatrix)) data(FELLA.sample) # When a matrix is loaded: x <- getMatrix(FELLA.sample, "hypergeom") dim(x) # When it is not: y <- getMatrix(FELLA.sample, "diffusion") dim(y) y
Map KEGG identifiers to KEGG names, multiple names for an ID are reported if annotated. The KEGG identifiers may have mixed levels.
getName(data, id)
getName(data, id)
data |
FELLA.DATA object |
id |
KEGG IDs whose name is desired |
List whose names are KEGG IDs and whose entries are the vectors of matches
data(FELLA.sample) getName(FELLA.sample, c("C00002", "C00040"))
data(FELLA.sample) getName(FELLA.sample, c("C00002", "C00040"))
Extractor function for the p-scores using the desired methodology
getPscores(object, method)
getPscores(object, method)
object |
FELLA.USER object |
method |
Character, exactly one of:
|
Named vector of p-scores
data(FELLA.sample) data(input.sample) obj <- enrich( compounds = input.sample, data = FELLA.sample) p <- getPscores(obj, "diffusion") sum(p < 0.1)
data(FELLA.sample) data(input.sample) obj <- enrich( compounds = input.sample, data = FELLA.sample) p <- getPscores(obj, "diffusion") sum(p < 0.1)
Extractor function for the slot "status" for the KEGG data
getStatus(data)
getStatus(data)
data |
FELLA.DATA object |
Slot "status" (internal usage)
## This function is internal data(FELLA.sample) ## Is the object loade? FELLA:::getStatus(FELLA.sample) FELLA:::getStatus(new("FELLA.DATA"))
## This function is internal data(FELLA.sample) ## Is the object loade? FELLA:::getStatus(FELLA.sample) FELLA:::getStatus(new("FELLA.DATA"))
Extractor function for rowSums/squaredRowSums
getSums(data, method, squared)
getSums(data, method, squared)
data |
FELLA.DATA object |
method |
Character, exactly one of:
|
squared |
Logical, whether to return
|
Named vector with rowSums/squaredRowSums (internal usage)
## This function is internal attach(environment(FELLA:::getSums)) data(FELLA.sample) rowsums <- getSums(FELLA.sample, "diffusion", squared = FALSE) hist(rowsums)
## This function is internal attach(environment(FELLA:::getSums)) data(FELLA.sample) rowsums <- getSums(FELLA.sample, "diffusion", squared = FALSE) hist(rowsums)
Extractor function for the slot "valid"
getValid(object, method)
getValid(object, method)
object |
FELLA.USER object |
method |
Character, exactly one of:
|
Slot "valid" (internal usage)
## This function is internal data(FELLA.sample) data(input.sample) obj <- enrich( compounds = input.sample, method = "diffusion", data = FELLA.sample) ## If the analysis is valid FELLA:::getValid(obj, "diffusion") ## Otherwise FELLA:::getValid(new("FELLA.USER"), "diffusion") FELLA:::getValid(obj, "pagerank")
## This function is internal data(FELLA.sample) data(input.sample) obj <- enrich( compounds = input.sample, method = "diffusion", data = FELLA.sample) ## If the analysis is valid FELLA:::getValid(obj, "diffusion") ## Otherwise FELLA:::getValid(new("FELLA.USER"), "diffusion") FELLA:::getValid(obj, "pagerank")
This character
vector object has been
generated using the
sample data in the object FELLA.sample
.
The KEGG compounds have been chosen
with preference for the hsa00640 pathway,
so that the enrichment results choose
pathway hsa00640 over hsa00010.
data(input.sample)
data(input.sample)
An object of class character
of length 30.
A character
vector containing 30 KEGG IDs
Generated from a mid-2017 KEGG release (http://www.genome.jp/kegg/)
data(input.sample)
data(input.sample)
Is x
a FELLA.DATA
object?
is.FELLA.DATA(x = NULL)
is.FELLA.DATA(x = NULL)
x |
Object to check |
Logical value stating if x
is a FELLA.DATA
object
data(FELLA.sample) is.FELLA.DATA(FELLA.sample) is.FELLA.DATA(42)
data(FELLA.sample) is.FELLA.DATA(FELLA.sample) is.FELLA.DATA(42)
Is x
a FELLA.USER
object?
is.FELLA.USER(x = NULL)
is.FELLA.USER(x = NULL)
x |
Object to check |
Logical value stating if x
is a FELLA.USER
object
is.FELLA.USER(new("FELLA.USER")) is.FELLA.USER(42) data(FELLA.sample) data(input.sample) obj <- enrich( compounds = input.sample, method = "diffusion", data = FELLA.sample) is.FELLA.USER(obj)
is.FELLA.USER(new("FELLA.USER")) is.FELLA.USER(42) data(FELLA.sample) data(input.sample) obj <- enrich( compounds = input.sample, method = "diffusion", data = FELLA.sample) is.FELLA.USER(obj)
launchApp
deploys a shiny application
to perform the metabolomics data enrichment.
Although this app does not provide all the options available in
FELLA
, it is easily accessible for the lay user.
launchApp(...)
launchApp(...)
... |
Parameters passed to |
The graphical interface allows to: (1) upload the data and
check if the KEGG ids have successfully mapped,
(2) select database, set analysis and graphical parameters,
(3) interactively browse the resulting sub-network as a graph or
as a table, and (4) export such results as a table or a network.
At least one database is needed before deploying the app.
See ?buildDataFromGraph
for further details.
invisible()
, but as a side effect the app will be launched
## Not run: r <- try(launchApp()) ## End(Not run)
## Not run: r <- try(launchApp()) ## End(Not run)
Available approximations for the analysis
listApprox()
listApprox()
Character vector
listApprox()
listApprox()
Node categories used in the internal representations
listCategories()
listCategories()
Character vector
listCategories()
listCategories()
This function lists the directories in the local database path
listInternalDatabases(full.names = FALSE)
listInternalDatabases(full.names = FALSE)
full.names |
Logical, should full paths be returned? |
Vector with database directories
listInternalDatabases()
listInternalDatabases()
Methods available for the analysis
listMethods()
listMethods()
Character vector
listMethods()
listMethods()
This function adds a legend to a solution plot. It can include the CC similarity.
plotLegend(GO.annot = FALSE, cex = 0.75)
plotLegend(GO.annot = FALSE, cex = 0.75)
GO.annot |
Logical, should GO annotations be included? |
cex |
Numeric value, |
This function is only used for its effect,
so it returns invisible()
## This function is internal library(igraph) g <- barabasi.game(20) plot(g) FELLA:::plotLegend() plot(g) FELLA:::plotLegend(GO.annot = TRUE)
## This function is internal library(igraph) g <- barabasi.game(20) plot(g) FELLA:::plotLegend() plot(g) FELLA:::plotLegend(GO.annot = TRUE)
An internal S4 class for the user data of the diffusion enrichment analysis
valid
Logical value; is the analysis valid?
pscores
Named numeric vector with p-scores
approx
Character; which approximation was used? Can be "simulation" for Monte Carlo; "normality", "gamma" or "t" for parametric approaches
niter
Numeric value, number of iterations for the simulated approach
An internal S4 class for the user data of the hypergeometric over representation analysis
valid
Logical value; is the analysis valid?
pvalues
Named numeric vector with p-values
pathhits
Numeric named vector with the quantities "sample_success" for the hypergeometric distribution (#affected in path)
pathbackground
Numeric named vector with the quantities "total_success" for the hypergeometric distribution (total in path)
nbackground
Numeric value, number of compoudns in the background. Equivalently, number of rows for the hypergeometric binary matrix
ninput
Numeric value, number of affected compounds matched to the rownames
An internal S4 class for the user data of the PageRank enrichment analysis
valid
Logical value; is the analysis valid?
pscores
Named numeric vector with p-scores
approx
Character; which approximation was used? Can be "simulation" for Monte Carlo; "normality", "gamma" or "t" for parametric approaches
niter
Numeric value, number of iterations for the simulated approach
An internal S4 class for the user input data
metabolites
Character vector containing the affected compounds
metabolitesbackground
Character vector containing
the compounds for the personalised background.
Optionally, can be NULL
for default background
excluded
Character vector containing the compounds that have been excluded because they cannot be mapped to KEGG graph compounds