Title: | Creation and Analysis of Pathway Expression Profiles (PEPs) |
---|---|
Description: | Pathway Expression Profiles (PEPs) are based on the expression of pathways (defined as sets of genes) as opposed to individual genes. This package converts gene expression profiles to PEPs and performs enrichment analysis of both pathways and experimental conditions, such as "drug set enrichment analysis" and "gene2drug" drug discovery analysis respectively. |
Authors: | Francesco Napolitano <[email protected]> |
Maintainer: | Francesco Napolitano <[email protected]> |
License: | GPL-3 |
Version: | 1.27.0 |
Built: | 2024-10-30 08:02:43 UTC |
Source: | https://github.com/bioc/gep2pep |
Pathway Expression Profiles (PEPs) are based on the expression of
pathways (or generic gene sets) belonging to a collection, as
opposed to individual genes. gep2pep
supports the conversion
of gene expression profiles (GEPs) to PEPs and performs enrichment
analysis of both pathways and conditions.
gep2pep
creates a local repository of gene sets, which can
also be imported from the MSigDB [1] database. The local repository
is in the repo
format. When a GEP, defined as a ranked list
of genes, is passed to buildPEPs
, the stored database
of pathways is used to convert the GEP to a PEP and permanently
store the latter.
One type of analysis that can be performed on PEPs and that is
directly supported by gep2pep
is the Drug-Set Enrichment
Analysis (DSEA [2]). It finds pathways that are consistently
dysregulated by a set of drugs, as opposed to a background of other
drugs. Of course PEPs may refer to non-pharmacological conditions
(genetic perturbations, disease states, cell types, etc.) for
analogous analyses. See CondSEA
function.
A complementary approach is that of finding conditions that
consistently dysregulate a set of pathways. This is the
pathway-based version of the Gene Set Enrichment Analysis
(GSEA). As an application example, this approach can be used to
find drugs mimicking the dysregulation of a gene by looking for
drugs dysregulating pathways involving the gene (this has been
published as the gene2drug
tool [3]). See
PathSEA
.
Both DSEA and gene2drug analyses can be performed using
preprocessed data from
http://dsea.tigem.it/downloads.php. The data include
Connectivity Map [4] GEPs (drug-induced gene expression profiles)
converted to PEPs in the form of a gep2pep
repository.
Naming conventions:
pathway: any set of gene identifiers (not necessarily representing a molecular pathway).
pathway collection: a set of pathways.
pathway database: a set of pathway collections, like the MSigDB.
Gene Expression Profile (GEP): a named vector where names are gene identifiers of the same type as those in the pathway database and elements are ranks ranging from 1 to the number of genes.
Pathway Expression Profile (PEP): a ranked list of pathways, as converted from a GEP according to a pathway collection.
condition: any transcriptomic-modelled biological state (drug treatment, gene knock-out, disease state, cell type, etc.) characterized by an induced GEP and therefore a PEP.
gep2pep repository: a pathway database and possibly a
related database of PEPs as created by the gep2pep
package. It is implemented in repo
format.
Francesco Napolitano [email protected]
[1] Subramanian A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102, 15545-15550 (2005).
[2] Napolitano F. et al, Drug-set enrichment analysis: a novel tool to investigate drug mode of action. Bioinformatics 32, 235-241 (2016).
[3] Napolitano, F. et al. gene2drug: a computational tool for pathway-based rational drug repositioning. Bioinformatics (2017). https://doi.org/10.1093/bioinformatics/btx800
[4] Lamb, J. et al. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313, 1929-1935 (2006).
This function can be used to add single-gene (as opposed to pathway) -based collections. Sets including a single gene don't need to go through normal Kolmogorov-Smirnov statistic computation and are treated differently for performance.
addSingleGeneSets(rp, genes, organism = "Homo Sapiens")
addSingleGeneSets(rp, genes, organism = "Homo Sapiens")
rp |
A repository created by |
genes |
a character vector containing the gene names. For each
og them a single-gene |
organism |
Character vector used to annotate the created sets. "Homo Sapiens" by default. |
Enrichment Scores and p-values for sets including a single gene are
computed with dedicated (fast) routines. Although a statistic based
on a single gene is not efficient per se, it is useful to have data
in the same format as pathway-based profiles. buildPEPs
internally calls single gene dedicated routines whenever a gene set
collection is tagged (see repo function tag
) with "SGE"
("Single Gene Expression"), which is done automatically by
addSingleGeneSets
. In that case, the min_size
parameter is ignored.
Nothing, used for side effects.
buildPEPs
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## The following will create PEPs in 2 separate files geps <- loadSampleGEP() addSingleGeneSets(rp, rownames(geps)) unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## The following will create PEPs in 2 separate files geps <- loadSampleGEP() addSingleGeneSets(rp, rownames(geps)) unlink(repo_path, TRUE)
Converts GeneSetCollection objects to CategorizedCollection objects.
as.CategorizedCollection(GScollection, category = "uncategorized", subCategory = "uncategorized")
as.CategorizedCollection(GScollection, category = "uncategorized", subCategory = "uncategorized")
GScollection |
An object of class |
category |
The name of the category that all the gene sets will be assigned to (see details). |
subCategory |
The name of the subcategory that all the gene sets will be assigned to (see details). |
This function sets the CollectionType
for each set
in the collection to CategorizedCollection. If
GScollection
contains BroadCollection
gene sets,
their fields category
and subcategory
will be
used. Otherwise the category
and subcategory
fields will be used.
A CategorizedCollection object
## Not run: ## To run this example, first obtain the MSigDB database in XML ## format (see ## http://software.broadinstitute.org/gsea/downloads.jsp). It is ## assumed that the database is locally available as the file ## "msigdb_v6.0.xml". The \code{importMSigDB.xml} function is just a shortcut to the following: db <- getBroadSets("msigdb_v6.1.xml") db <- as.CategorizedCollection(db) ## The database is now in an acceptable format to create a local ## repository using createRepository ## End(Not run) ## A small sample of the MSigDB as imported by importMSigDB.xml is ## included in gep2pep. The following creates (and deletes) a ## gep2pep repository. db_sample <- loadSamplePWS() ## The function \code{as.CategorizedCollection} can also be used to ## create arbitrary gene set collections specifying the categories ## and subcategories once for all the sets: library(GSEABase) mysets <- as.CategorizedCollection( GeneSetCollection( list(GeneSet(c("g1", "g2"), setName="set1"), GeneSet(c("g3", "g4"), setName="set2")) ), category="mycategory", subCategory="mysubcategory" ) newCollection <- GeneSetCollection(c(db_sample, mysets)) ## The created repository will include both the sample gene sets ## and the two sets just created: repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, newCollection) ## removing temporary repository unlink(repo_path, TRUE)
## Not run: ## To run this example, first obtain the MSigDB database in XML ## format (see ## http://software.broadinstitute.org/gsea/downloads.jsp). It is ## assumed that the database is locally available as the file ## "msigdb_v6.0.xml". The \code{importMSigDB.xml} function is just a shortcut to the following: db <- getBroadSets("msigdb_v6.1.xml") db <- as.CategorizedCollection(db) ## The database is now in an acceptable format to create a local ## repository using createRepository ## End(Not run) ## A small sample of the MSigDB as imported by importMSigDB.xml is ## included in gep2pep. The following creates (and deletes) a ## gep2pep repository. db_sample <- loadSamplePWS() ## The function \code{as.CategorizedCollection} can also be used to ## create arbitrary gene set collections specifying the categories ## and subcategories once for all the sets: library(GSEABase) mysets <- as.CategorizedCollection( GeneSetCollection( list(GeneSet(c("g1", "g2"), setName="set1"), GeneSet(c("g3", "g4"), setName="set2")) ), category="mycategory", subCategory="mysubcategory" ) newCollection <- GeneSetCollection(c(db_sample, mysets)) ## The created repository will include both the sample gene sets ## and the two sets just created: repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, newCollection) ## removing temporary repository unlink(repo_path, TRUE)
Given a matrix of ranked lists of genes (GEPs) and a gep2pep
repository, converts GEPs to PEPs and stores the latter in the
repository.
buildPEPs(rp, geps, min_size = 3, max_size = 500, parallel = FALSE, collections = "all", replace_existing = FALSE, donotstore = FALSE, progress_bar = TRUE, rawmode_id = NULL, rawmode_outdir = file.path(rp$root(), "raw"))
buildPEPs(rp, geps, min_size = 3, max_size = 500, parallel = FALSE, collections = "all", replace_existing = FALSE, donotstore = FALSE, progress_bar = TRUE, rawmode_id = NULL, rawmode_outdir = file.path(rp$root(), "raw"))
rp |
A repository created by |
geps |
A matrix of ranks where each row corresponds to a gene and each column to a condition. Each column must include all ranks from 1 to the number of rows. Row and column names must be defined. Row names will be matched against gene identifiers in the pathways collections, and unrecognized gene names will not be used. |
min_size |
An integer representing the minimum number of genes
that must be included in a set before the KS statistic is
computed. Smaller gene sets will get ES=NA and p=NA. Default is
3. Ignored for SGE mode (see |
max_size |
An integer representing the maximum number of genes that must be included in a set before the KS statistic is computed. Larger gene sets will get ES=NA and p=NA. Default is 500. |
parallel |
If TRUE, gene sets will be processed in parallel. Requires a parallel backend. |
collections |
A subset of the collection names returned by
|
replace_existing |
What to do if PEPs, identified by column
names of |
donotstore |
Just compute and return the pathway-based profiles without storing them in the repository. The repository is still required to load pathway data, however it will not be modified. |
progress_bar |
If set to TRUE (default) will show a progress
bar updated after coversion of each column of |
rawmode_id |
An integer to be appended to files produced in raw mode (see details). If set to NULL (default), raw mode is turned off. |
rawmode_outdir |
A charater vector specifying the destination
path for files produced in raw mode (by the fault it is
ROOT/raw, where ROOT is the root of the repository). Ignored if
|
By deault, output is written to the repository as new
items named using the collection name. However, it is possible
to avoid the repository and write the output to regular files
turning 'raw mode' on through the rawmode_id
and
rawmode_outdir
parameters. This is particuarly useful
when dealing with very large corpora of GEPs, and conversions
are split into independent jobs submitted to a scheduler. At
the end, the data will need to be reconstructed and put into
the repository using importFromRawMode
in order to
perform CondSEA
or PathSEA
analysis.
Nothing. The computed PEPs will be available in the repository.
buildPEPs
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Repo root created. ## Repo created. ## [15:45:06] Storing pathway data for collection: c3_TFT ## [15:45:06] Storing pathway data for collection: c3_MIR ## [15:45:06] Storing pathway data for collection: c4_CGN rp ## ID Dims Size ## c3_TFT_sets 10 18.16 kB ## c3_MIR_sets 10 17.25 kB ## c4_CGN_sets 10 6.9 kB ## Loading sample gene expression profiles geps <- loadSampleGEP() geps[1:3,1:3] ## (+)_chelidonine (+)_isoprenaline (+/_)_catechin ## AKT3 88 117 417 ## MED6 357 410 34 ## NR2E3 383 121 453 buildPEPs(rp, geps) rp ## ID Dims Size ## c3_TFT_sets 10 18.16 kB ## c3_MIR_sets 10 17.25 kB ## c4_CGN_sets 10 6.9 kB ## c3_TFT 2 1.07 kB ## c3_MIR 2 1.07 kB ## c4_CGN 2 1.04 kB unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Repo root created. ## Repo created. ## [15:45:06] Storing pathway data for collection: c3_TFT ## [15:45:06] Storing pathway data for collection: c3_MIR ## [15:45:06] Storing pathway data for collection: c4_CGN rp ## ID Dims Size ## c3_TFT_sets 10 18.16 kB ## c3_MIR_sets 10 17.25 kB ## c4_CGN_sets 10 6.9 kB ## Loading sample gene expression profiles geps <- loadSampleGEP() geps[1:3,1:3] ## (+)_chelidonine (+)_isoprenaline (+/_)_catechin ## AKT3 88 117 417 ## MED6 357 410 34 ## NR2E3 383 121 453 buildPEPs(rp, geps) rp ## ID Dims Size ## c3_TFT_sets 10 18.16 kB ## c3_MIR_sets 10 17.25 kB ## c4_CGN_sets 10 6.9 kB ## c3_TFT 2 1.07 kB ## c3_MIR 2 1.07 kB ## c4_CGN 2 1.04 kB unlink(repo_path, TRUE)
See CategorizedCollection-class
.
CategorizedCollection(category = "uncategorized", subCategory = "uncategorized")
CategorizedCollection(category = "uncategorized", subCategory = "uncategorized")
category |
A character defining the main category that the gene set belongs to. |
subCategory |
A character defining the secondary category that the gene set belongs to. |
An object of class CategorizedCollection
.
library(GSEABase) gs1 <- GeneSet(setName="set1", setIdentifier="101") collectionType(gs1) <- CategorizedCollection()
library(GSEABase) gs1 <- GeneSet(setName="set1", setIdentifier="101") collectionType(gs1) <- CategorizedCollection()
This class is a simple generalization of the
BroadCollection
function of GSEABase
to store gene
sets having assigned categories and subcategories that can be
different from those of the MSigDB.
category
A character defining the main category that the gene set belongs to.
subCategory
A character defining the secondary category that the gene set belongs to.
Check both repository data consistency (see repo_check
from
the repo
package) and specific gep2pep data consistency.
checkRepository(rp)
checkRepository(rp)
rp |
A repository created by |
Nothing.
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) checkRepository(rp) unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) checkRepository(rp) unlink(repo_path, TRUE)
Clear cached ranked matrices
clearCache(rp_peps)
clearCache(rp_peps)
rp_peps |
A repository created with
|
This will clear everything in the repository tagged with
"stashed", which by default includes only matrices ranked by some
gep2pep functions such as CondSEA
.
Nothing, used for side effects
CondSEA
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset, usecache=TRUE) ## the repository contains cached data print(rp, all=TRUE) clearCache(rp) unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset, usecache=TRUE) ## the repository contains cached data print(rp, all=TRUE) clearCache(rp) unlink(repo_path, TRUE)
Condition Set Enrichment Analysis (CondSEA) can be seen as a Gene-SEA performed over rows (as opposed to columns) of a matrix of GEPs. It tells how much a pathway is consistently dysregulated under a set of conditions (such as a set of drug treatments, disease states, cell types, etc.) when compared to a statistical background of other conditions.
CondSEA(rp_peps, pgset, bgset = "all", collections = "all", details = TRUE, rankingFun = rankPEPsByRows.ES, usecache = FALSE, sortoutput = TRUE)
CondSEA(rp_peps, pgset, bgset = "all", collections = "all", details = TRUE, rankingFun = rankPEPsByRows.ES, usecache = FALSE, sortoutput = TRUE)
rp_peps |
A repository created with
|
pgset |
A vector of names of conditions. Corresponding PEPs
must exist in all the pathway collections currently in
|
bgset |
The background against which to compare
|
collections |
A subset of the collection names returned by
|
details |
If TRUE (default) rank details will be reported for
each condition in |
rankingFun |
The function used to rank PEPs column-wise. By
default |
usecache |
If set to TRUE, the computed ranked matrix will be stored in the the repository (see details). FALSE by default. |
sortoutput |
If TRUE (default) the output gene sets will be sorted in order of increasing p-value. |
For each pathway, all conditions are ranked by how much
they dysregulate it (from the most UP-regulating to the most
DOWN-regulating). Then, a Kolmogorov-Smirnov (KS) test is performed
to compare the ranks assigned to conditions in pgset
against
the ranks assigned to conditions in bgset
. A positive
(negative) Enrichment Score (ES) of the KS test indicates whether
each pathway is UP- (DOWN-) regulated by pgset
as compared
to bgset
. A p-value is associated to the ES.
When PEPs are obtained from drug-induced gene expression profiles,
PathSEA
is the Drug-Set Enrichment Analysis [1].
The rankingFun
must take in input PEPs like those loaded
from the repository and return a matrix of row-wise ranks. Each row
must contains ranks from 1 to the number of PEPs minus the number
of NAs in the row.
When usecache=TRUE
, the ranked matrix is permanently stored
in HDF5 format, and subsequent calls to CondSEA
will load
from the disk the necessary ranks (not the whole matrix). The
correct cached data is identified by the alphabetically sorted set
union(pgset, bgset)
, by the collection name, and by the
ranking function. Additional alls to CondSEA with variations of
these inputs will create additional cache. Cached data is hidden in
the repository by default and can be printed with
rp_peps$print(all=TRUE)
, and cleared with
clearCache(rp_peps)
.
A list of 2, by names "CondSEA" and "details". The
"CondSEA" entry is a 2-columns matrix including ESs and p-values
(see details) for each pathway database and condition. The
"details" entry reports the rank of each condition in pgset
for each pathway.
[1] Napolitano F. et al, Drug-set enrichment analysis: a novel tool to investigate drug mode of action. Bioinformatics 32, 235-241 (2016).
getResults, getDetails, clearCache
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) res <- getResults(psea, "c3_TFT") ## getting the names of the top pathways setId2setName(loadCollection(rp, "c3_TFT"), rownames(res)) unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) res <- getResults(psea, "c3_TFT") ## getting the names of the top pathways setId2setName(loadCollection(rp, "c3_TFT"), rownames(res)) unlink(repo_path, TRUE)
Merge multiple PEPs to build a repository of consensus PEPs
createMergedRepository(rpIn_path, rpOut_path, mergestr, progressBar = TRUE, collections = "all")
createMergedRepository(rpIn_path, rpOut_path, mergestr, progressBar = TRUE, collections = "all")
rpIn_path |
path to existing gep2pep repository |
rpOut_path |
path where the new merged repository will be created |
mergestr |
a named list of character vectors, each one including a set of PEP names. For each list entry, a consensus PEP will be created and assigned the entry name. |
progressBar |
if TRUE, show a progress bar |
collections |
A subset of the collection names returned by
|
The merging is performed as follows. Given N PEPs, the corresponding consensus PEP will get as enrichement score the average enrichment scores of the N PEPs, and as p-value the composition of the N PEP p-values by Fisher's method.
Nothing, used for side effects.
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) mergestr <- list( che_iso = c("(+)_chelidonine", "(+)_isoprenaline"), cat_mk = c("(+/_)_catechin", "(_)_mk_801") ) merged_path <- file.path(tempdir(), "gep2pepTempMerged") createMergedRepository(repo_path, merged_path, mergestr) unlink(repo_path, TRUE) unlink(merged_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) mergestr <- list( che_iso = c("(+)_chelidonine", "(+)_isoprenaline"), cat_mk = c("(+/_)_catechin", "(_)_mk_801") ) merged_path <- file.path(tempdir(), "gep2pepTempMerged") createMergedRepository(repo_path, merged_path, mergestr) unlink(repo_path, TRUE) unlink(merged_path, TRUE)
Given a database of collections, stores them in a local repository
to be used by gep2pep
functions.
createRepository(path, sets, name = NULL, description = NULL)
createRepository(path, sets, name = NULL, description = NULL)
path |
Path to a non-existing directory where the repository will be created. |
sets |
An object of class |
name |
Name of the repository. Defaults to |
description |
Description of the repository. If NULL (default), a generic description will be given. |
sets
can be created by
importMSigDB.xml
or using GSEABase
GeneSetCollection
class and then converting it to
CategorizedCollection. See examples.
An object of class repo
that can be passed to
gep2pep
functions.
buildPEPs
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Repo root created. ## Repo created. ## [15:45:06] Storing pathway data for collection: c3_TFT ## [15:45:06] Storing pathway data for collection: c3_MIR ## [15:45:06] Storing pathway data for collection: c4_CGN rp ## ID Dims Size ## c3_TFT_sets 10 18.16 kB ## c3_MIR_sets 10 17.25 kB ## c4_CGN_sets 10 6.9 kB unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Repo root created. ## Repo created. ## [15:45:06] Storing pathway data for collection: c3_TFT ## [15:45:06] Storing pathway data for collection: c3_MIR ## [15:45:06] Storing pathway data for collection: c4_CGN rp ## ID Dims Size ## c3_TFT_sets 10 18.16 kB ## c3_MIR_sets 10 17.25 kB ## c4_CGN_sets 10 6.9 kB unlink(repo_path, TRUE)
The XLS output includes the full CondSEA or PathSEA results,
together with additional gene set information for the CondSEA. If
the PathSEA or CondSEA analysis was performed with
details=TRUE
, details will be reported in the XLS file. This
function requires the WriteXLS library.
exportSEA(rp, results, outname = NULL)
exportSEA(rp, results, outname = NULL)
rp |
A repository created by |
results |
The output of |
outname |
Name of the XLS file to be created. |
Nothing.
CondSEA, PathSEA
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) ## Not run: exportSEA(rp, psea) ## End(Not run) unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) ## Not run: exportSEA(rp, psea) ## End(Not run) unlink(repo_path, TRUE)
Given a gene, find the set of pathways that involve it in each
collection of the repository. This can be used to define a set of
pathways for the PathSEA
.
gene2pathways(rp, genes, and = TRUE)
gene2pathways(rp, genes, and = TRUE)
rp |
A repository created by |
genes |
A vector of gene identifiers of the same type as that used to create the repository. |
and |
If set to TRUE (default), will return sets containing
all of |
A database of pathways suitable as input to
PathSEA
.
createRepository, PathSEA
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Finding all pathways containing "FAM126A": subpw <- gene2pathways(rp, "FAM126A") print(names(subpw)) unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Finding all pathways containing "FAM126A": subpw <- gene2pathways(rp, "FAM126A") print(names(subpw)) unlink(repo_path, TRUE)
Given a gep2pep
repository, returns the names of the
stored collections by looking at appropriate repository item
names.
getCollections(rp)
getCollections(rp)
rp |
A repository created by |
Each collection in a database has a "category" and a
"subcategory" assigned, which are used to build the collection
identifier as "category_subcategory". This function obtains the
identifiers by looking at data stored in the repository
rp
(entries that are tagged with "sets").
Vector of collection names (see details).
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Repo root created. ## Repo created. ## [15:45:06] Storing pathway data for collection: c3_TFT ## [15:45:06] Storing pathway data for collection: c3_MIR ## [15:45:06] Storing pathway data for collection: c4_CGN getCollections(rp) ## [1] "c3_TFT" "c3_MIR" "c4_CGN" unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## Repo root created. ## Repo created. ## [15:45:06] Storing pathway data for collection: c3_TFT ## [15:45:06] Storing pathway data for collection: c3_MIR ## [15:45:06] Storing pathway data for collection: c4_CGN getCollections(rp) ## [1] "c3_TFT" "c3_MIR" "c4_CGN" unlink(repo_path, TRUE)
CondSEA
or PathSEA
outputExtracts the details matrix from CondSEA
or PathSEA
output
getDetails(analysis, collection)
getDetails(analysis, collection)
analysis |
The output of either |
collection |
One of the names returned by
|
A matrix including the ranks of each pathway (over rows)
and each condition (over columns) used as input to
CondSEA
or PathSEA
.
CondSEA, PathSEA
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) getDetails(psea, "c3_TFT") unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) getDetails(psea, "c3_TFT") unlink(repo_path, TRUE)
CondSEA
or PathSEA
outputExtracts the results matrix from CondSEA
or PathSEA
output
getResults(analysis, collection)
getResults(analysis, collection)
analysis |
The output of either |
collection |
One of the names returned by
|
A 2-columns matrix including ESs and p-values (see details) for each pathway database and condition.
CondSEA, PathSEA
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) getResults(psea, "c3_TFT") unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pgset <- c("(+)_chelidonine", "(+/_)_catechin") psea <- CondSEA(rp, pgset) getResults(psea, "c3_TFT") unlink(repo_path, TRUE)
Raw mode is meant to deal with large collections of PEPs (like
hundreds of thousands). In this case, problems may arise while
trying to convert GEPs by loading all of them in memory at
once. Raw mode is meant to be used with HDF5 format, which allows
to load subsets of GEPs from the disk. buildPEPs
, when used
in raw mode, can create the corresponding subsets of PEPs, so that
the job can be distributed on a computer
cluster. importFromRawMode
is meant to join the chunks into
HDF5 matrices, which are than stored into the repository. The
.loadPEPs
function can seamlessly load PEPs stored in normal
(RDS) or HDF5 format.
importFromRawMode(rp, path = file.path(rp$root(), "raw"), collections = "all")
importFromRawMode(rp, path = file.path(rp$root(), "raw"), collections = "all")
rp |
A repository created by |
path |
Path were raw PEPs are stored (default is a "raw" directory under the repository root folder). |
collections |
A subset of the collection names returned by
|
PEPs are expect to be found at the specified path
and follow the naming convention as generated by
buildPEPs
. According to such convention, each file is
named usign the format
category_subcategory#chunknumber.RDS. All non-alphanumeric
characters from the original category and subcategory names are
replace with an underscore (in rare cases this could create
ambiguity that should be manually prevented). All chunks for
the same subcategory are joined together following the chunk
numbers into a single HDF5 matrix and stored in the repository
as an "attachment" (see repo
documentation).
Note that raw PEPs (by default everything at repository_root/raw) can be safly removed once they have been imported.
Nothing, used for side effects.
buildPEPs
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## The following will create PEPs in 2 separate files geps <- loadSampleGEP() buildPEPs(rp, geps[,1:2], progress_bar=FALSE, rawmode_id=1) buildPEPs(rp, geps[,3:5], progress_bar=FALSE, rawmode_id=2) ## The separate files are then merged into one (possibly big) file ## in HDF5 format importFromRawMode(rp) ## Now most operations (excluding the addition of new PEPs to ## existing collections) will be available as usual. unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) ## The following will create PEPs in 2 separate files geps <- loadSampleGEP() buildPEPs(rp, geps[,1:2], progress_bar=FALSE, rawmode_id=1) buildPEPs(rp, geps[,3:5], progress_bar=FALSE, rawmode_id=2) ## The separate files are then merged into one (possibly big) file ## in HDF5 format importFromRawMode(rp) ## Now most operations (excluding the addition of new PEPs to ## existing collections) will be available as usual. unlink(repo_path, TRUE)
Creates a GeneSetCollection
object using the XML
distribution of the MSigDB (see references). The returned object
can be passed to createRepository
.
importMSigDB.xml(fname, organism = "Homo Sapiens")
importMSigDB.xml(fname, organism = "Homo Sapiens")
fname |
Path to an XML file downloaded from MSigDB. |
organism |
Select only gene sets for a given organism. Default is "Homo Sapiens". |
This function now just calls getBroadSets(fname)
from the GSEABase
package. However, it is left for
backward compatibility and as an entry point to package
functionalities.
A CategorizedCollection object
http://software.broadinstitute.org/gsea/downloads.jsp
## Not run: ## To run this example, first obtain the MSigDB database in XML ## format (see ## http://software.broadinstitute.org/gsea/downloads.jsp). It is ## assumed that the database is locally available as the file ## "msigdb_v6.0.xml". db <- importMSigDB.xml("msigdb_v6.0.xml") ## The database is now in an acceptable format to create a local ## repository using createRepository ## End(Not run) ## A small excerpt from the MSigDB is included in gep2pep. The ## following creates (and then deletes) a gep2pep repository. db_sample <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db_sample) ## removing temporary repository unlink(repo_path, TRUE)
## Not run: ## To run this example, first obtain the MSigDB database in XML ## format (see ## http://software.broadinstitute.org/gsea/downloads.jsp). It is ## assumed that the database is locally available as the file ## "msigdb_v6.0.xml". db <- importMSigDB.xml("msigdb_v6.0.xml") ## The database is now in an acceptable format to create a local ## repository using createRepository ## End(Not run) ## A small excerpt from the MSigDB is included in gep2pep. The ## following creates (and then deletes) a gep2pep repository. db_sample <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db_sample) ## removing temporary repository unlink(repo_path, TRUE)
Loads a collection of pathways from the repository
loadCollection(rp, collection)
loadCollection(rp, collection)
rp |
A repository created by |
collection |
One of the names returned by
|
a GeneSetCollection
object loaded from the
repository rp
.
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() loadCollection(rp, "c3_TFT") unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() loadCollection(rp, "c3_TFT") unlink(repo_path, TRUE)
Loads the matrix of Enrichment Scores for a collection
loadESmatrix(rp, collection)
loadESmatrix(rp, collection)
rp |
A repository created by |
collection |
One of the names returned by
|
The matrix of Enrichment Scores (ES) of the
Kolmogorov-Smirnov statistic for the pathway collection, if
previously computed with buildPEPs
. The entry i,j
reports the ES for the pathway i
, conditionj. If
buildPEPs
was not run, throws an error.
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) loadESmatrix(rp, "c3_TFT")[1:5,1:2] unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) loadESmatrix(rp, "c3_TFT")[1:5,1:2] unlink(repo_path, TRUE)
Loads the matrix of p-values for a collection
loadPVmatrix(rp, collection)
loadPVmatrix(rp, collection)
rp |
A repository created by |
collection |
One of the names returned by
|
The matrix of p-values (PV) of the
Kolmogorov-Smirnov statistic for the pathway collection, if
previously computed with buildPEPs
. The entry i,j
reports the PV for the pathway i
, conditionj. If
buildPEPs
was not run, throws an error.
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) loadPVmatrix(rp, "c3_TFT") unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) loadPVmatrix(rp, "c3_TFT") unlink(repo_path, TRUE)
Loads sample Gene Expression Profiles
loadSampleGEP()
loadSampleGEP()
Sample gene expression data
geps <- loadSampleGEP()
geps <- loadSampleGEP()
Loads sample pathway collections
loadSamplePWS()
loadSamplePWS()
Sample pathway collections in GeneSetCollection
format
geps <- loadSampleGEP()
geps <- loadSampleGEP()
Given a database, uses "category" and "subcategory" entries to create a vector of collection identifiers. Useful to extract a collection from a database.
makeCollectionIDs(sets)
makeCollectionIDs(sets)
sets |
A pathway database in the same format as output by
|
A vector of identifiers, one per pathway, with the format: "category_subcategory".
importMSigDB.xml
db <- loadSamplePWS() ids <- makeCollectionIDs(db) unique(ids) ## [1] "c3_TFT" "c3_MIR" "c4_CGN" db <- db[ids=="c3_MIR"] length(db) ## [1] 10
db <- loadSamplePWS() ids <- makeCollectionIDs(db) unique(ids) ## [1] "c3_TFT" "c3_MIR" "c4_CGN" db <- db[ids=="c3_MIR"] length(db) ## [1] 10
The repository must have been created by
createRepository
. Provides an R object to interact
with the repository.
openRepository(path)
openRepository(path)
path |
Path to a directory where the repository has been
created with |
This function only calls the repo_open
function
from the repo
package on path
. It is meant to
allow users not to explicitly load the repo
library,
unless they want to access advanced features.
An object of class repo
that can be passed to
gep2pep
functions.
createRepository
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) rp2 <- openRepository(repo_path) ## rp and rp2 point to the same data: identical(rp$entries(), rp2$entries()) ## > [1] TRUE unlink(repo_path, TRUE)
db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) rp2 <- openRepository(repo_path) ## rp and rp2 point to the same data: identical(rp$entries(), rp2$entries()) ## > [1] TRUE unlink(repo_path, TRUE)
PathSEA is analogous to the Gene Set Enrichment Analysis (GSEA), but for pathways instead of single genes. It can therefore be used to look for conditions under which a given set of pathways is consistently UP- or DOWN-regulated.
PathSEA(rp_peps, pathways, bgsets = "all", collections = "all", subset = "all", details = TRUE, rankingFun = rankPEPsByCols.SPV)
PathSEA(rp_peps, pathways, bgsets = "all", collections = "all", subset = "all", details = TRUE, rankingFun = rankPEPsByCols.SPV)
rp_peps |
A repository created with
|
pathways |
A database of pathways in the same format as input
to |
bgsets |
Another list like |
collections |
A subset of the collection names returned by
|
subset |
Character vector including PEP names to be considered (all by default, which may take time). |
details |
If TRUE (default) details will be reported for each
condition in |
rankingFun |
The function used to rank PEPs column-wise. By
default |
For each condition, all pathways are ranked by how much
they are dysregulated by it (from the most UP-regulated to the
most DOWN-regulatied, according to the corresponding
p-values). Then, a Kolmogorov-Smirnov (KS) test is performed to
compare the ranks assigned to pathways in pathways
against the ranks assigned to pathways in bgsets
. A
positive (negative) Enrichment Score (ES) of the KS test
indicates whether each pathway is UP- (DOWN-) regulated by
pgset
as compared to bgset
. A p-value is
associated to the ES.
When PEPs are obtained from drug-induced gene expression
profiles, PathSEA
can be used together with
gene2pathways
to perform gene2drug [1] analysis, which
predicts which drugs may target a gene of interest (or mimick
such effect).
The rankingFun
must take in input PEPs like those loaded
from the repository and return a matrix of column-wise
ranks. Each column must contain ranks from 1 to the number of
gene sets minus the number of NAs in the column.
A list of 2, by names "PathSEA" and "details". The
"PathSEA" entry is a 2-columns matrix including ESs and
p-values for each collection and condition. The "details" entry
reports the rank of each pathway in pathways
for each
condition.
[1] Napolitano, F. et al. gene2drug: a computational tool for pathway-based rational drug repositioning. Bioinformatics (2017). https://doi.org/10.1093/bioinformatics/btx800
getResults, getDetails
library(GSEABase) db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pathways <- c("M11607", "M10817", "M16694", ## from c3_TFT "M19723", "M5038", "M13419", "M1094") ## from c4_CGN w <- sapply(db, setIdentifier) %in% pathways psea <- PathSEA(rp, db[w]) ## [15:35:29] Working on collection: c3_TFT ## [15:35:29] Common pathway sets removed from bgset. ## [15:35:29] Column-ranking collection... ## [15:35:29] Computing enrichments... ## [15:35:29] done. ## [15:35:29] Working on collection: C4_CGN ## [15:35:29] Common pathway sets removed from bgset. ## [15:35:29] Column-ranking collection... ## [15:35:29] Computing enrichments... ## [15:35:29] done. getResults(psea, "c3_TFT") ## ES PV ## (_)_mk_801 0.7142857 0.1666667 ## (_)_atenolol 0.7142857 0.1666667 ## (+)_isoprenaline 0.5714286 0.4000000 ## (+/_)_catechin 0.5714286 0.4000000 ## (+)_chelidonine 0.3333333 0.9333333 unlink(repo_path, TRUE)
library(GSEABase) db <- loadSamplePWS() repo_path <- file.path(tempdir(), "gep2pepTemp") rp <- createRepository(repo_path, db) geps <- loadSampleGEP() buildPEPs(rp, geps) pathways <- c("M11607", "M10817", "M16694", ## from c3_TFT "M19723", "M5038", "M13419", "M1094") ## from c4_CGN w <- sapply(db, setIdentifier) %in% pathways psea <- PathSEA(rp, db[w]) ## [15:35:29] Working on collection: c3_TFT ## [15:35:29] Common pathway sets removed from bgset. ## [15:35:29] Column-ranking collection... ## [15:35:29] Computing enrichments... ## [15:35:29] done. ## [15:35:29] Working on collection: C4_CGN ## [15:35:29] Common pathway sets removed from bgset. ## [15:35:29] Column-ranking collection... ## [15:35:29] Computing enrichments... ## [15:35:29] done. getResults(psea, "c3_TFT") ## ES PV ## (_)_mk_801 0.7142857 0.1666667 ## (_)_atenolol 0.7142857 0.1666667 ## (+)_isoprenaline 0.5714286 0.4000000 ## (+/_)_catechin 0.5714286 0.4000000 ## (+)_chelidonine 0.3333333 0.9333333 unlink(repo_path, TRUE)
Converts gene set IDs to gene set names
setId2setName(sets, ids)
setId2setName(sets, ids)
sets |
An object of class GeneSetCollection |
ids |
character vector of gene set IDs to be converted to set names |
A vector of gene set names
CondSEA, PathSEA
collection <- loadSamplePWS() setId2setName(collection, c("M3128", "M11607"))
collection <- loadSamplePWS() setId2setName(collection, c("M3128", "M11607"))