Title: | a tool set for pathway based data integration and visualization |
---|---|
Description: | Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need to do is supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, Pathview integrates seamlessly with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis. |
Authors: | Weijun Luo |
Maintainer: | Weijun Luo <[email protected]> |
License: | GPL (>=3.0) |
Version: | 1.47.0 |
Built: | 2024-12-29 06:55:15 UTC |
Source: | https://github.com/bioc/pathview |
Pathway based data integration and visualization
Package: | pathview |
Type: | Package |
Version: | 1.0 |
Date: | 2012-12-26 |
License: | GPL (>=3.0) |
LazyLoad: | yes |
Weijun Luo <[email protected]>
Maintainer: Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
combineKEGGnodes combines nodes into a group in a KEGG pathway graph.
reaction2edge converts reactions into edges in a KEGG pathway graph.
combineKEGGnodes(nodes, graph, combo.node)
reaction2edge(path, gR)
nodes |
character, names of the nodes to be combined. |
graph, gR |
an object of "graphNEL" class, the graph parsed and converted from KEGG pathway. |
path |
an object of "KEGGPathway" class, the parsed KEGG pathway. |
combo.node |
character, the name of the resulting combined node. |
combineKEGGnodes not only combines nodes in the graph object, but also the corresponding node data in the KEGG pathway object. This function is needed for KEGG-defined group nodes and parsed enzyme groups involved in the same reaction.
reaction2edge converts a reaction into 2 consecutive edges: one between substrate and enzyme, and one between enzyme and product. This function is needed to faithfully show the compound-enzyme nodes and their interactions in the Graphviz-style view of a KEGG pathway.
combineKEGGnodes returns a combined graph of "graphNEL" class.
reaction2edge returns a list of 3 elements: gR, the converted graph ("graphNEL"); edata.new, the new edge data ("KEGGEdge"); ndata.new, the new node data ("KEGGNode").
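The reaction-to-edge conversion described above can be illustrated with a minimal base R sketch. This is not the implementation of reaction2edge, which works on "KEGGPathway" and "graphNEL" objects; the data frame layout, the helper name reaction_edges and the node names are assumptions made up for illustration.

```r
# Toy reaction table; node names are hypothetical, not taken from a KGML file
reactions <- data.frame(
  substrate = c("cpd_A", "cpd_B"),
  enzyme    = c("enz_1", "enz_2"),
  product   = c("cpd_B", "cpd_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: each reaction becomes two consecutive edges,
# substrate -> enzyme followed by enzyme -> product
reaction_edges <- function(rx) {
  rbind(
    data.frame(from = rx$substrate, to = rx$enzyme, stringsAsFactors = FALSE),
    data.frame(from = rx$enzyme, to = rx$product, stringsAsFactors = FALSE)
  )
}

edges <- reaction_edges(reactions)
print(edges)
```

Two reactions yield four edges, so the enzyme nodes sit between the compounds they connect.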
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
node.info
the main parser function
Mapping data between compound or gene IDs and KEGG accessions
data(cpd.accs)
data(cpd.names)
data(kegg.met)
data(ko.ids)
data(rn.list)
data(gene.idtype.list)
data(gene.idtype.bods)
data(cpd.simtypes)
cpd.accs is a data frame with 30054 observations on the following 4 variables.
cpd.names is a data frame with 12314 observations on the following 5 variables.
kegg.met is a character matrix of 694 rows and 3 columns.
ko.ids is a character vector of 8511 KEGG ortholog gene IDs, as used in KEGG ortholog pathways.
rn.list is a named list of 21 vectors. Each vector records the row numbers for one of 21 different compound ID types in the cpd.accs data.frame.
gene.idtype.list is a character vector of 13 common gene, transcript or protein ID types. Note that some ID types are species specific, for example TAIR or ORF.
gene.idtype.bods is a list of character vectors of common gene, transcript or protein ID types for the 19 major research species in bods. Each element corresponds to a species.
cpd.simtypes is a character vector of 7 common compound-related ID types, each of which has over 1000 unique entries. Hence these ID types are good for generating simulated compound data.
ftp://ftp.ebi.ac.uk/pub/databases/chebi/Flat_file_tab_delimited/
http://www.genome.jp/kegg-bin/get_htext?br08001.keg
data(cpd.accs)
data(rn.list)
names(rn.list)
cpd.accs[rn.list[[1]][1:4],]
lapply(rn.list[1:4], function(rn) cpd.accs[rn[1:4],])
data(kegg.met)
head(kegg.met)
These auxiliary compound ID mappers connect KEGG compound/glycan/drug accessions to compound names/synonyms and other commonly used compound-related IDs.
cpdidmap(in.ids, in.type, out.type)
cpd2kegg(in.ids, in.type)
cpdkegg2name(in.ids, in.type = c("KEGG", "KEGG COMPOUND accession")[1])
cpdname2kegg(in.ids)
in.ids |
character, input IDs to be mapped. |
in.type |
character, the input ID type, needs to be either "KEGG" (including compound, glycan and drug) or one of the compound-related ID types used in CHEMBL database. For a full list of the CHEMBL IDs, do data(rn.list); names(rn.list). |
out.type |
character, the output ID type, needs to be either "KEGG" (including compound/glycan/drug) or one of the compound-related ID types used in CHEMBL database. For a full list of the CHEMBL IDs, do data(rn.list); names(rn.list). |
KEGG has its own compound ID system, including compound (glycan/drug) accessions. Therefore, all compound data need to be mapped to KEGG accessions when working with KEGG pathways. Function cpd2kegg does this mapping by calling cpdname2kegg or cpdidmap. On the other hand, we frequently want to check or show compound full names or other commonly used IDs instead of the less informative KEGG accessions when working with KEGG compound nodes. Functions cpdkegg2name and cpdidmap do this reverse mapping.
Although these functions are written as part of the Pathview mapper module, they are equally useful for other compound ID or data mapping tasks.
The use of these functions depends on a few data objects: "cpd.accs", "cpd.names", "kegg.met" and "rn.list", which are included in this package. To access them, use the data() function.
a 2-column character matrix recording the mapping between input IDs to the target ID type.
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
eg2id and id2eg, the auxiliary gene ID mappers;
mol.sum, the auxiliary molecular data mapper;
node.map, the node data mapper function.
data(cpd.simtypes)
#generate simulated compound data named with non-KEGG ("CAS Registry Number") IDs
cpd.cas <- sim.mol.data(mol.type = "cpd", id.type = cpd.simtypes[2], nmol = 10000)
#construct map between non-KEGG ID and KEGG ID ("KEGG COMPOUND accession")
id.map.cas <- cpdidmap(in.ids = names(cpd.cas), in.type = cpd.simtypes[2], out.type = "KEGG COMPOUND accession")
#Map molecular data onto standard KEGG IDs
cpd.kc <- mol.sum(mol.data = cpd.cas, id.map = id.map.cas)
#check the results
head(cpd.cas)
head(id.map.cas)
head(cpd.kc)
#map KEGG ID to compound name
cpd.names=cpdkegg2name(in.ids=id.map.cas[,2])
head(cpd.names)
demo.paths includes pathway ids and optimal plotting parameters when calling pathview.
GSE16873 is a breast cancer study (Emery et al, 2009) downloaded from
Gene Expression Omnibus (GEO). Dataset gse16873 is pre-processed using FARMS
method and includes 6 patient cases,
each with HN (histologically normal) and DCIS (ductal carcinoma in situ)
RMA samples. The same dataset is also used in gage
package. Dataset gse16873.d includes the gene expression changes of two
pairs of DCIS vs HN samples.
paths.hsa includes the full list of human pathway ID/names from KEGG.
data(demo.paths)
data(gse16873.d)
data(paths.hsa)
demo.paths is a named list with ids and plotting parameters for 3 pathways. For details do: data(demo.paths); demo.paths
gse16873.d is a numeric matrix with over 10000 rows (genes) and 2 columns (samples). For details do: data(gse16873.d); str(gse16873.d)
paths.hsa is a named vector mapping KEGG pathway ID to human pathway names.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16873
This is the downloader function for KEGG pathways. It automatically downloads graph images and the associated KGML data.
download.kegg(pathway.id = "00010", species = "hsa", kegg.dir = ".", file.type=c("xml", "png"))
pathway.id |
character, 5-digit KEGG pathway IDs. Default pathway.id="00010". |
species |
character, either the KEGG code, scientific name or the common name of the target species. When KEGG ortholog pathway is considered, species="ko". Default species="hsa", it is equivalent to use either "Homo sapiens" (scientific name) or "human" (common name). |
kegg.dir |
character, the directory of KEGG pathway data file (.xml) and image file (.png). Default kegg.dir="." (current working directory). |
file.type |
character, the file type(s) to be downloaded, either KEGG pathway data file (xml) or image file (png). Default include both types. |
Species can be specified as either the KEGG code, the scientific name or the common name. Scientific and common names are always mapped to the KEGG code first. The length of species should be either 1 or the same as pathway.id; if not, the same set of pathway.id will be applied to all species.
a named character vector, either "succeed" or "failed", indicating the download status of corresponding pathways.
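As a rough illustration of which files to expect in kegg.dir, the base R sketch below builds file names from species code, pathway ID and file type. It assumes the naming pattern species code + pathway ID + extension, which matches the example file "hsa04110.xml" shipped with the package; the helper kegg_file_names is hypothetical and not part of pathview.

```r
# Hypothetical helper: expected local file names for downloaded pathways,
# assuming the pattern <species code><pathway.id>.<file type>
kegg_file_names <- function(pathway.id, species = "hsa",
                            file.type = c("xml", "png")) {
  stem <- paste0(species, pathway.id)
  # all stem/extension combinations, xml files first, then png files
  as.vector(outer(stem, file.type, paste, sep = "."))
}

files <- kegg_file_names(c("00010", "04110"))
print(files)
```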
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
pathview, the main function;
node.info, the parser.
data(demo.paths)
sel.2paths=demo.paths$sel.paths[1:2]
download.kegg(pathway.id = sel.2paths, species = "hsa")
#pathway files should be downloaded into current working directory
These auxiliary gene ID mappers connect different gene ID or annotation types. In particular, they are used to map Entrez Gene ID to external gene, transcript or protein IDs or vice versa.
eg2id(eg, category = gene.idtype.list[1:2], org = "Hs", pkg.name = NULL, ...)
id2eg(ids, category = gene.idtype.list[1], org = "Hs", pkg.name = NULL, ...)
geneannot.map(in.ids, in.type, out.type, org="Hs", pkg.name=NULL, unique.map=TRUE, na.rm=TRUE, keep.order=TRUE)
eg |
character, input Entrez Gene IDs. |
ids |
character, input gene/transcript/protein IDs to be converted to Entrez Gene IDs. |
in.ids |
character, input gene/transcript/protein IDs to be converted or mapped to other Gene IDs or annotation types. |
category |
character, for |
in.type |
character, the input gene/transcript/protein ID type to be mapped or converted to other ID/annotation types. |
out.type |
character, the output gene/transcript/protein ID type to be mapped or converted to other ID/annotation types. |
org |
character, the two-letter abbreviation of organism name, or KEGG species
code, or the common species name, used to determine the gene annotation
package. For all potential values check: |
pkg.name |
character, name of the gene annotation package. This package should be
one of the standard annotation packages from Bioconductor, such as
"org.Hs.eg.db". Check |
unique.map |
logical, whether to combine multiple entries mapped to the same input ID as a single entry (separated by "; "). Default unique.map=TRUE. |
na.rm |
logical, whether to remove the lines where input ID is not mapped (NA for mapped entries). Default na.rm=TRUE. |
keep.order |
logical, whether to keep the original input order even with all unmapped input IDs. Default keep.order=TRUE. |
... |
other arguments to be passed to geneannot.map function. |
KEGG uses Entrez Gene ID as its standard gene ID. Therefore, all gene data need to be mapped to Entrez Genes when working with KEGG pathways. Function id2eg does this mapping. On the other hand, we frequently want to check or show gene symbols or full names instead of the less informative Entrez Gene ID when working with KEGG gene nodes. Function eg2id does this reverse mapping. Both id2eg and eg2id are wrapper functions of the geneannot.map function. The latter can be used to map between a range of major gene/transcript/protein IDs or annotation types, not just Entrez Gene ID.
Although these functions are written as part of the Pathview mapper module, they are equally useful for other gene ID or data mapping tasks.
The use of these functions depends on gene annotation packages like "org.Hs.eg.db", which are Bioconductor standard. If no such package is available for your organism of interest, you may build one with the Bioconductor AnnotationDbi package.
a 2- or multi-column character matrix recording the mapping between input IDs to the target ID type(s).
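The effect of unique.map and na.rm on a mapping matrix can be sketched in base R. This is only an illustration of the behaviour described in the arguments above, not the code of geneannot.map; the helper collapse_map and all IDs are made up.

```r
# Toy mapping: input ID "P1" maps to two targets, "P3" is unmapped (hypothetical IDs)
id.map <- cbind(
  in.id  = c("P1", "P1", "P2", "P3"),
  out.id = c("101", "102", "201", NA)
)

# Hypothetical helper mimicking the described unique.map / na.rm behaviour
collapse_map <- function(map, unique.map = TRUE, na.rm = TRUE) {
  # drop rows whose input ID has no mapped entry
  if (na.rm) map <- map[!is.na(map[, 2]), , drop = FALSE]
  if (!unique.map) return(map)
  ids <- unique(map[, 1])
  # merge all targets of one input ID into a single "; "-separated entry
  merged <- vapply(ids, function(id) {
    vals <- map[map[, 1] == id, 2]
    if (all(is.na(vals))) NA_character_ else paste(vals[!is.na(vals)], collapse = "; ")
  }, character(1))
  cbind(in.id = ids, out.id = unname(merged))
}

res <- collapse_map(id.map)
print(res)
```

With the defaults, "P1" collapses to a single row and the unmapped "P3" is dropped.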
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
cpd2kegg etc., the auxiliary compound ID mappers;
mol.sum, the auxiliary molecular data mapper;
node.map, the node data mapper function.
data(gene.idtype.list)
#generate simulated gene data named with non-KEGG/Entrez gene IDs
gene.ensprot <- sim.mol.data(mol.type = "gene", id.type = gene.idtype.list[4], nmol = 50000)
#construct map between non-KEGG ID and KEGG ID (Entrez gene)
id.map.ensprot <- id2eg(ids = names(gene.ensprot), category = gene.idtype.list[4], org = "Hs")
#Map molecular data onto Entrez Gene IDs
gene.entrez <- mol.sum(mol.data = gene.ensprot, id.map = id.map.ensprot)
#check the results
head(gene.ensprot)
head(id.map.ensprot)
head(gene.entrez)
#map Entrez Gene to Gene Symbol and Name
eg.symbname=eg2id(eg=id.map.ensprot[,2])
#entries with more than 1 Entrez Genes are not mapped
head(eg.symbname)
#not run: map between other ID types for other species
#ath.tair=sim.mol.data(id.type="tair", species="ath", nmol=1000)
#data(gene.idtype.bods)
#gid.map <-geneannot.map(in.ids=names(ath.tair)[rep(1:100,each=2)],
#in.type="tair", out.type=gene.idtype.bods$ath[-1], org="At")
#gid.map1 <-geneannot.map(in.ids=names(ath.tair)[rep(1:100,each=2)],
#in.type="tair", out.type=gene.idtype.bods$ath[-1], org="At",
#unique.map=F, keep.order=F)
#str(gid.map)
#str(gid.map1)
This function maps species name to KEGG code.
kegg.species.code(species = "hsa", na.rm = FALSE, code.only = TRUE)
species |
character, either the KEGG code, scientific name or the common name of the target species. Default species="hsa", it is equivalent to use either "Homo sapiens" (scientific name) or "human" (common name). |
na.rm |
logical, should unmapped entries be removed. Default na.rm = FALSE. |
code.only |
logical, whether to extract KEGG species code only or with gene ID usage info too. Default code.only = TRUE. |
a character vector of mapped KEGG code of species.
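The lookup idea can be sketched in base R with a small hand-made table. The real function consults the korg data; the three-row table species.tab and the helper lookup_code below are assumptions for illustration only.

```r
# Toy species table (the package uses the much larger korg matrix instead)
species.tab <- cbind(
  kegg.code       = c("hsa", "mmu", "ptr"),
  scientific.name = c("Homo sapiens", "Mus musculus", "Pan troglodytes"),
  common.name     = c("human", "mouse", "chimpanzee")
)

# Hypothetical helper: accept a KEGG code, scientific name or common name
# and return the KEGG code, or NA when nothing matches
lookup_code <- function(species, tab = species.tab, na.rm = FALSE) {
  hit <- vapply(species, function(s) {
    idx <- which(tab == s, arr.ind = TRUE)
    if (nrow(idx) == 0) NA_character_ else unname(tab[idx[1, "row"], "kegg.code"])
  }, character(1))
  if (na.rm) hit <- hit[!is.na(hit)]
  hit
}

codes <- lookup_code(c("ptr", "Mus musculus", "human", "happ"))
print(codes)
```

The unmatched entry "happ" comes back as NA unless na.rm = TRUE is set.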
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
korg, the species and KEGG code mapping data;
cpd2kegg etc., the auxiliary compound ID mappers;
download.kegg, the downloader function.
species=c("ptr", "Mus musculus", "dog", "happ")
kcode=kegg.species.code(species = species, na.rm = FALSE)
print(kcode)
Data on KEGG species, including taxonomy IDs, KEGG code, scientific name, common name, corresponding gene ID types, and gene annotation package names in Bioconductor
data(korg)
data(bods)
korg is a character matrix of ~4800 rows and 10 columns. The first 5 columns are KEGG and NCBI taxonomy IDs, KEGG species code, scientific name and common name, followed by columns on gene ID types used for each species: entrez.gnodes ("1" or "0", whether EntrezGene is the default gene ID) and representative KEGG gene ID, NCBI or Entrez Gene ID, NCBI protein and Uniprot ID. Note that korg includes 4800 KEGG species (as of 06/2017). In the meantime, an updated version of korg is checked out from the Pathview Web server each time the pathview package is loaded.
bods is a character matrix of 19 rows and 3 columns on the mapping between gene annotation package names in Bioconductor, common name and KEGG code of most common research species.
http://www.genome.jp/kegg-bin/get_htext?br08601.keg
http://bioconductor.org/packages/release/BiocViews.html#___OrgDb
data(korg)
data(bods)
head(korg)
head(bods)
Molecular data like gene or metabolite data are frequently annotated by various types of IDs. This function maps and summarizes molecular data onto standard gene or compound IDs. The "standardized" data are then straightforward to integrate, analyze or visualize with pathways or functional categories.
mol.sum(mol.data, id.map, gene.annotpkg = "org.Hs.eg.db", sum.method = c("sum", "mean", "median", "max", "max.abs", "random")[1])
mol.data |
Either a vector (single sample) or matrix-like data (multiple samples). A vector should be numeric with molecule IDs as names, or it may be a character vector of molecule IDs. A character vector is treated as discrete or count data. A matrix-like data structure has molecules as rows and samples as columns. Row names should be molecule IDs. Default mol.data=NULL. This argument is equivalent to gene.data or cpd.data in the pathview function. Check the pathview function for more information. |
id.map |
a two-column character matrix, giving the mapping between molecular IDs
used in mol.data and target/standard molecular IDs. Then mol.data are
gene data, |
gene.annotpkg |
character, name of the gene annotation package. This package should be
one of the standard annotation packages from Bioconductor, such as
"org.Hs.eg.db" (default). Check |
sum.method |
character, the method name to calculate node summary given that multiple genes or compounds are mapped to it. Potential options include "sum", "mean", "median", "max", "max.abs" and "random". Default sum.method="sum". |
This function is called in the pathview main function when gene.idtype or cpd.idtype is not the standard type, so that the molecular data can be mapped and summarized onto standard IDs. This is needed for further mapping to KEGG pathways. The same standard ID mapping is needed when carrying out pathway or functional analysis on molecular data that are labeled by non-standard (or alien) IDs or probe names, as in most microarray or metabolomics datasets. In other words, function mol.sum can be useful in all these situations.
a numeric vector or matrix. Its dimensionality is the same as the input mol.data except row names are standard molecular IDs.
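To make the map-then-summarize idea concrete, here is a minimal base R sketch for the single-sample (vector) case. It is not the code of mol.sum: the helper summarize_by_id, the probe and gene names are hypothetical, and the "random" method is left out.

```r
# Hypothetical helper: map data names through a 2-column ID map, then
# summarize all values that land on the same standard ID
summarize_by_id <- function(mol.data, id.map,
                            sum.method = c("sum", "mean", "median", "max", "max.abs")) {
  sum.method <- match.arg(sum.method)
  fun <- switch(sum.method,
    sum = sum, mean = mean, median = median, max = max,
    max.abs = function(x) x[which.max(abs(x))]  # value with the largest magnitude
  )
  # look up the standard ID for every molecule; unmapped ones become NA
  target <- id.map[match(names(mol.data), id.map[, 1]), 2]
  keep <- !is.na(target)
  res <- tapply(mol.data[keep], target[keep], fun)
  setNames(as.vector(res), names(res))
}

# Toy data: two probes map to gene "g1", one to "g2", one is unmapped
mol.data <- c(probeA = 1.5, probeB = -2.0, probeC = 0.5, probeD = 3)
id.map <- cbind(c("probeA", "probeB", "probeC"), c("g1", "g1", "g2"))

by.sum    <- summarize_by_id(mol.data, id.map, "sum")
by.maxabs <- summarize_by_id(mol.data, id.map, "max.abs")
print(by.sum)
print(by.maxabs)
```

The choice of summary matters: for "g1" the sum partly cancels the two probes, while "max.abs" keeps the stronger signal with its sign.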
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
node.map, the node data mapper function;
id2eg, cpd2kegg etc., the auxiliary molecular ID mappers;
pathview, the main function.
data(gene.idtype.list)
#generate simulated gene data named with non-KEGG/Entrez gene IDs
gene.ensprot <- sim.mol.data(mol.type = "gene", id.type = gene.idtype.list[4], nmol = 50000)
#construct map between non-KEGG ID and KEGG ID (Entrez gene)
id.map.ensprot <- id2eg(ids = names(gene.ensprot), category = gene.idtype.list[4], org = "Hs")
#Map molecular data onto Entrez Gene IDs
gene.entrez <- mol.sum(mol.data = gene.ensprot, id.map = id.map.ensprot)
#check the results
head(gene.ensprot)
head(id.map.ensprot)
head(gene.entrez)
node.color converts the mapped molecular (gene, protein or metabolite etc.) data into pseudo colors on pathway nodes.
col.key draws color key(s) for mapped molecular data on the pathway graph.
node.color(plot.data = NULL, discrete=FALSE, limit, bins, both.dirs = TRUE,
  low = "green", mid = "gray", high = "red", na.col = "transparent", trans.fun = NULL)
col.key(discrete=FALSE, limit = 1, bins = 10, cols = NULL, both.dirs = TRUE,
  low = "green", mid = "gray", high = "red", graph.size, node.size,
  size.by.graph = TRUE, key.pos = "topright", off.sets = c(x = 0, y = 0),
  align = "n", cex = 1, lwd = 1)
plot.data |
the result returned by |
discrete |
logical, whether to treat the molecular data or node summary data as discrete. Default discrete=FALSE; otherwise, mol.data will be a character vector of molecular IDs. |
limit |
a list of two numeric elements with "gene" and "cpd" as the names. This argument specifies the limit values for gene.data and cpd.data when converting them to pseudo colors. Each element of the list could be of length 1 or 2. Length 1 suggests discrete data or 1 directional (positive-valued) data, or the absolute limit for 2 directional data. Length 2 suggests 2 directional data. Default limit=list(gene=0.5, cpd=1). |
bins |
a list of two integer elements with "gene" and "cpd" as the names. This argument specifies the number of levels or bins for gene.data and cpd.data when converting them to pseudo colors. Default bins=list(gene=10, cpd=10). |
both.dirs |
a list of two logical elements with "gene" and "cpd" as the names. This argument specifies whether gene.data and cpd.data are 1 directional or 2 directional data when converting them to pseudo colors. Default both.dirs=list(gene=TRUE, cpd=TRUE). |
trans.fun |
a list of two function (not character) elements with "gene" and "cpd" as the names. This
argument specifies whether and how gene.data and cpd.data are
transformed. Examples are |
low , mid , high
|
each is a list of two colors with "gene" and "cpd" as the names. This argument specifies the color spectra to code gene.data and cpd.data. When data are 1 directional (FALSE value in both.dirs), only mid and high are used to specify the color spectra. Default spectra (low-mid-high) "green"-"gray"-"red" and "blue"-"gray"-"yellow" are used for gene.data and cpd.data respectively. The values for 'low, mid, high' can be given as color names ('red'), plot color index (2=red), and HTML-style RGB ("#FF0000"=red). |
na.col |
color used for NA's or missing values in gene.data and cpd.data. Default na.col="transparent". |
cols |
character, specifying a discrete spectrum of colors to be plotted as
color key. Note this argument is usually NULL (default), otherwise, the
number of discrete colors has to match |
graph.size |
numeric vector of length 2, i.e. the sizes (width, height) of the pathway graph panel. This is needed to determine the sizes and exact location of the color key. |
node.size |
numeric vector of length 2, i.e. the sizes (width, height) of the standard gene nodes (rectangles). This is needed to determine the sizes and exact location of the color key when size.by.graph=FALSE. |
size.by.graph |
logical, whether to determine the sizes and exact location of the color key with respect to the size of the whole graph panel or that of a single node. Default size.by.graph=TRUE. |
key.pos |
character, controlling the position of color key(s). Potential values are "bottomleft", "bottomright", "topleft" and "topright". Default key.pos="topright". |
off.sets |
numeric vector of length 2, with "x" and "y" as the names. This argument
specifies the offset values in x and y axes when plotting a new color
key, as to avoid overlap with existing color keys or boundaries. Note
that the |
align |
character, controlling how the color keys are aligned when needed. Potential values are "x", aligned by x coordinates, and "y", aligned by y coordinates. Default align="x". |
cex |
A numerical value giving the amount by which legend text and symbols should be scaled relative to the default 1. |
lwd |
numeric, the line width, a _positive_ number, defaulting to '1'. |
node.color converts the molecular data (gene.data or cpd.data) mapped by the node.map function into pseudo colors, which can then be plotted on the pathway graph.
col.key is used in combination with node.color in pathview, although this function can be used independently for similar tasks.
node.color returns a vector or matrix of colors. Its dimensionality is the same as the corresponding gene.data or cpd.data.
col.key plots a color key on the existing pathway graph, then returns an updated version of off.sets for the reference of the next color key.
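The conversion of values to pseudo colors can be pictured with the base R sketch below. It is not the implementation of node.color: it only covers the 2-directional, continuous case, and the helper value_to_color with its clamping and binning choices is an assumption for illustration.

```r
# Hypothetical helper: bin values in [-limit, limit] into a low-mid-high spectrum
value_to_color <- function(x, limit = 1, bins = 10,
                           low = "green", mid = "gray", high = "red",
                           na.col = "transparent") {
  # values beyond the limits are clamped to the extreme colors
  x <- pmax(pmin(x, limit), -limit)
  breaks <- seq(-limit, limit, length.out = bins + 1)
  spectrum <- colorRampPalette(c(low, mid, high))(bins)
  idx <- cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  cols <- spectrum[idx]
  # missing values get the NA color
  cols[is.na(cols)] <- na.col
  cols
}

cols <- value_to_color(c(-2, 0.05, 0.9, NA))
print(cols)
```

Values outside the limit saturate at the ends of the spectrum, which is why limit effectively controls the contrast of the rendered pathway.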
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
keggview.native and keggview.graph, the viewer functions;
node.map, the node data mapper function.
xml.file=system.file("extdata", "hsa04110.xml", package = "pathview")
node.data=node.info(xml.file)
names(node.data)
data(gse16873.d)
plot.data.gene=node.map(mol.data=gse16873.d[,1], node.data, node.types="gene")
head(plot.data.gene)
cols.ts.gene=node.color(plot.data.gene, limit=1, bins=10)
head(cols.ts.gene)
The parser function parses a KGML file and/or extracts node information from a KEGG pathway.
node.info(object, short.name = TRUE)
object |
either a character specifying the full KGML file name (with directory), or an object of "KEGGPathway" class, or an object of "graphNEL" class. The latter two are parsed results of a KGML file. |
short.name |
logical, if TRUE, the short labels, i.e. the first item separated by "," in the long labels, are parsed out as node labels. Default short.name=TRUE. |
Parser function node.info extracts node data from parsed KEGG pathways. KGML files are parsed using parseKGML2 and KEGGpathway2Graph2. These functions from the KEGGgraph package have been heavily modified for reaction parsing and conversion to edges.
a named list of 10 elements: "kegg.names", "type", "component", "size", "labels", "shape", "x", "y", "width" and "height". Each element records the corresponding attribute for all nodes in the parsed KEGG pathway.
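The short.name rule (keep the first item of a comma-separated label) amounts to a one-line string operation. The base R sketch below uses made-up labels and is not taken from the node.info source.

```r
# Toy long labels as they might appear on pathway nodes (hypothetical values)
long.labels <- c("CDK1, CDC2, CDC28A", "TP53, p53", "ATP")

# Keep only the text before the first comma
short.labels <- sub(",.*$", "", long.labels)
print(short.labels)
```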
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
pathview, the main function;
combineKEGGnodes and reaction2edge, for special treatment of nodes or edges.
xml.file=system.file("extdata", "hsa04110.xml", package = "pathview")
node.data=node.info(xml.file)
names(node.data)
#or parse into a graph object, then extract node info
gR1=pathview:::parseKGML2Graph2(xml.file, genesOnly=FALSE, expand=FALSE, split.group=FALSE)
node.data=node.info(gR1)
The mapper function maps molecular data (gene expression, metabolite abundance etc.) to nodes in a KEGG pathway.
node.map(mol.data = NULL, node.data, node.types = c("gene", "ortholog", "compound")[1], node.sum = c("sum", "mean", "median", "max", "max.abs", "random")[1], entrez.gnodes=TRUE)
mol.data |
Either a vector (single sample) or matrix-like data (multiple samples). A vector should be numeric with molecule IDs as names, or it may be a character vector of molecule IDs. A character vector is treated as discrete or count data. A matrix-like data structure has molecules as rows and samples as columns. Row names should be molecule IDs. Default mol.data=NULL. This argument is equivalent to gene.data or cpd.data in the pathview function. Check the pathview function for more information. |
node.data |
a named list of 10 elements, the results returned by |
node.types |
character, specifies the node type to map the mol.data to, either "gene", "ortholog", or "compound". Default node.types="gene". |
node.sum |
character, the method name to calculate node summary given that multiple genes or compounds are mapped to it. Potential options include "sum", "mean", "median", "max", "max.abs" and "random". Default node.sum="sum". |
entrez.gnodes |
logical, whether EntrezGene (NCBI GeneID) is used as the default gene ID in the KEGG data files. This is needed because KEGG uses different types of default gene IDs for different species. Some of the most common model species use EntrezGene, but the majority of others use Locus tag. Default entrez.gnodes=TRUE. |
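The three shapes accepted for mol.data can be sketched in base R as follows. This is only an illustration of the description above: the IDs and values are invented (the names are meant to look like Entrez Gene IDs), and nothing here calls pathview.

```r
# Illustrative only: the three mol.data shapes described in the argument table.
# IDs and values are invented for this sketch.

# 1) single sample, continuous: named numeric vector (names = molecule IDs)
gene.vec <- c("1017" = 1.2, "1019" = -0.7, "7157" = 0.3)

# 2) single sample, discrete: character vector of molecule IDs
gene.ids <- c("1017", "1019", "7157")

# 3) multiple samples: matrix-like, molecules as rows, samples as columns
gene.mat <- matrix(c(1.2, -0.7, 0.3, 0.9, -1.1, 0.1), ncol = 2,
                   dimnames = list(c("1017", "1019", "7157"), c("s1", "s2")))

str(gene.vec)
str(gene.ids)
dim(gene.mat)
```

Any of these objects could, in principle, be passed as mol.data; which one fits depends on whether the data are continuous or discrete and on the number of samples.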
Mapper function node.map maps user supplied molecular data to KEGG
pathways. This function takes standard KEGG molecular IDs (Entrez Gene
ID or KEGG Compound Accession) and maps them to pathway nodes. Non-KEGG
gene IDs or compound IDs are pre-mapped to standard KEGG IDs by calling
another function, mol.sum. When multiple molecules map to one node, the
corresponding molecular data are summarized into a single node summary
by calling the function specified by node.sum. The mapped node summary
data, together with the parsed KGML data, are then returned for further
processing.
Proper input data include: gene expression, protein
expression, genetic association, metabolite abundance, genomic data,
literature, and other data types mappable to pathways.
The input mol.data may be NULL. In that case no molecular data are
actually mapped, but all nodes of the specified node.types are considered
"mappable" and their parsed KGML data are returned.
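The node summary step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not pathview's internal code: mol.data, node.of and summarise_nodes are hypothetical names, and the gene-to-node assignment is made up.

```r
# Sketch of node-level summarisation; NOT pathview's internal code.
# `mol.data` and `node.of` are made-up inputs: three genes, two of which
# belong to the same (hypothetical) pathway node "n1".
mol.data <- c(g1 = 2, g2 = -3, g3 = 0.5)
node.of  <- c(g1 = "n1", g2 = "n1", g3 = "n2")

# Value with the largest absolute magnitude, mirroring the "max.abs" option name
max.abs <- function(x) x[which.max(abs(x))]

summarise_nodes <- function(values, nodes, fun) {
  # tapply splits the values by node and applies fun to each group
  tapply(values, nodes[names(values)], fun)
}

summarise_nodes(mol.data, node.of, sum)      # n1 = -1,   n2 = 0.5
summarise_nodes(mol.data, node.of, mean)     # n1 = -0.5, n2 = 0.5
summarise_nodes(mol.data, node.of, max.abs)  # n1 = -3,   n2 = 0.5
```

The choice of summary matters when genes in one node move in opposite directions: "sum" and "mean" let them cancel, while "max.abs" keeps the strongest signal.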
A data.frame composed of parsed KGML data and summary molecular data for each mapped node. Each row is a mapped node, and columns are:
kegg.names |
standard KEGG IDs/Names for mapped nodes. It's Entrez Gene ID or KEGG Compound Accessions. |
labels |
Node labels to be used when needed |
type |
node type, currently 4 types are supported: "gene","enzyme", "compound" and "ortholog". |
x |
x coordinate in the original KEGG pathway graph. |
y |
y coordinate in the original KEGG pathway graph. |
width |
node width in the original KEGG pathway graph. |
height |
node height in the original KEGG pathway graph. |
other columns |
columns of the mapped gene/compound data |
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
mol.sum the auxiliary molecular data mapper, id2eg, cpd2kegg etc. the auxiliary molecular ID mappers, node.color the node color coder, pathview the main function, node.info the parser.
xml.file=system.file("extdata", "hsa04110.xml", package = "pathview")
node.data=node.info(xml.file)
names(node.data)
data(gse16873.d)
plot.data.gene=node.map(mol.data=gse16873.d[,1], node.data, node.types="gene")
head(plot.data.gene)
Pathview is a tool set for pathway based data integration and visualization. It maps and renders user data on relevant pathway graphs. All users need to do is supply their gene or compound data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. Pathview generates both native KEGG views and Graphviz views for pathways. keggview.native and keggview.graph are the two viewer functions, and pathview is the main function providing a unified interface to the downloader, parser, mapper and viewer functions.
pathview(gene.data = NULL, cpd.data = NULL, pathway.id, species = "hsa",
  kegg.dir = ".", cpd.idtype = "kegg", gene.idtype = "entrez",
  gene.annotpkg = NULL, min.nnodes = 3, kegg.native = TRUE, map.null = TRUE,
  expand.node = FALSE, split.group = FALSE, map.symbol = TRUE,
  map.cpdname = TRUE, node.sum = "sum", discrete=list(gene=FALSE, cpd=FALSE),
  limit = list(gene = 1, cpd = 1), bins = list(gene = 10, cpd = 10),
  both.dirs = list(gene = T, cpd = T), trans.fun = list(gene = NULL, cpd = NULL),
  low = list(gene = "green", cpd = "blue"), mid = list(gene = "gray", cpd = "gray"),
  high = list(gene = "red", cpd = "yellow"), na.col = "transparent", ...)
keggview.native(plot.data.gene = NULL, plot.data.cpd = NULL,
  cols.ts.gene = NULL, cols.ts.cpd = NULL, node.data, pathway.name,
  out.suffix = "pathview", kegg.dir = ".", multi.state=TRUE, match.data = TRUE,
  same.layer = TRUE, res = 300, cex = 0.25, discrete = list(gene=FALSE, cpd=FALSE),
  limit= list(gene = 1, cpd = 1), bins = list(gene = 10, cpd = 10),
  both.dirs =list(gene = T, cpd = T), low = list(gene = "green", cpd = "blue"),
  mid = list(gene = "gray", cpd = "gray"), high = list(gene = "red", cpd = "yellow"),
  na.col = "transparent", new.signature = TRUE, plot.col.key = TRUE,
  key.align = "x", key.pos = "topright", ...)
keggview.graph(plot.data.gene = NULL, plot.data.cpd = NULL,
  cols.ts.gene = NULL, cols.ts.cpd = NULL, node.data, path.graph, pathway.name,
  out.suffix = "pathview", pdf.size = c(7, 7), multi.state=TRUE,
  same.layer = TRUE, match.data = TRUE, rankdir = c("LR", "TB")[1],
  is.signal = TRUE, split.group = F, afactor = 1, text.width = 15, cex = 0.5,
  map.cpdname = FALSE, cpd.lab.offset = 1.0, discrete=list(gene=FALSE, cpd=FALSE),
  limit = list(gene = 1, cpd = 1), bins = list(gene = 10, cpd = 10),
  both.dirs = list(gene = T, cpd = T), low = list(gene = "green", cpd = "blue"),
  mid = list(gene = "gray", cpd = "gray"), high = list(gene = "red", cpd = "yellow"),
  na.col = "transparent", new.signature = TRUE, plot.col.key = TRUE,
  key.align = "x", key.pos = "topright", sign.pos = "bottomright", ...)
gene.data |
either a vector (single sample) or matrix-like data (multiple samples). A vector should be numeric with gene IDs as names, or it may be a character vector of gene IDs. A character vector is treated as discrete or count data. A matrix-like data structure has genes as rows and samples as columns. Row names should be gene IDs. Here gene ID is a generic concept, including multiple types of gene, transcript and protein IDs uniquely mappable to KEGG gene IDs. KEGG ortholog IDs are also treated as gene IDs so as to handle metagenomic data. Check details for mappable ID types. Default gene.data=NULL. |
cpd.data |
the same as gene.data, except named with IDs mappable to KEGG compound IDs. Over 20 types of IDs included in CHEMBL database can be used here. Check details for mappable ID types. Default cpd.data=NULL. Note that gene.data and cpd.data can't be NULL simultaneously. |
pathway.id |
character vector, the KEGG pathway ID(s), usually 5 digits, may also include the 3 letter KEGG species code. |
species |
character, either the kegg code, scientific name or the common name of the target species. This applies to both pathway and gene.data or cpd.data. When KEGG ortholog pathway is considered, species="ko". Default species="hsa", it is equivalent to use either "Homo sapiens" (scientific name) or "human" (common name). |
kegg.dir |
character, the directory of KEGG pathway data file (.xml) and image file (.png). Users may supply their own data files in the same format and naming convention of KEGG's (species code + pathway id, e.g. hsa04110.xml, hsa04110.png etc) in this directory. Default kegg.dir="." (current working directory). |
cpd.idtype |
character, ID type used for the cpd.data. Default cpd.idtype="kegg" (include compound, glycan and drug accessions). |
gene.idtype |
character, ID type used for the gene.data, case insensitive. Default
gene.idtype="entrez", i.e. Entrez Gene, which is the primary KEGG gene ID
for many common model organisms. For other species, gene.idtype should
be set to "KEGG" as KEGG uses other types of gene IDs. For the common
model organisms (to check the list, do: |
gene.annotpkg |
character, the name of the annotation package to use for mapping between other gene ID types including symbols and Entrez gene ID. Default gene.annotpkg=NULL. |
min.nnodes |
integer, minimal number of nodes of type "gene","enzyme", "compound" or "ortholog" for a pathway to be considered. Default min.nnodes=3. |
kegg.native |
logical, whether to render pathway graph as native KEGG graph (.png) or using graphviz layout engine (.pdf). Default kegg.native=TRUE. |
map.null |
logical, whether to map the NULL gene.data or cpd.data to pathway. When NULL data are mapped, the gene or compound nodes in the pathway will be rendered as actually mapped nodes, except with NA-valued color. When NULL data are not mapped, the nodes are rendered as unmapped nodes. This argument mainly affects native KEGG graph view, i.e. when kegg.native=TRUE. Default map.null=TRUE. |
expand.node |
logical, whether the multiple-gene nodes are expanded into single-gene nodes. Each expanded single-gene node inherits all edges from the original multiple-gene node. This option only affects graphviz graph view, i.e. when kegg.native=FALSE. This option is not effective for most metabolic pathways, where it conflicts with converting reactions to edges. Default expand.node=FALSE. |
split.group |
logical, whether node groups are split into individual nodes. Each split member node inherits all edges from the node group. This option only affects graphviz graph view, i.e. when kegg.native=FALSE. This option also affects most metabolic pathways, even without group nodes defined originally. For these pathways, genes involved in the same reaction are grouped automatically when converting reactions to edges unless split.group=TRUE. Default split.group=FALSE. |
map.symbol |
logical, whether map gene IDs to symbols for gene node labels or use the graphic name from the KGML file. This option is only effective for kegg.native=FALSE or same.layer=FALSE when kegg.native=TRUE. For same.layer=TRUE when kegg.native=TRUE, the native KEGG labels will be kept. Default map.symbol=TRUE. |
map.cpdname |
logical, whether map compound IDs to formal names for compound node labels or use the graphic name from the KGML file (KEGG compound accessions). This option is only effective for kegg.native=FALSE. When kegg.native=TRUE, the native KEGG labels will be kept. Default map.cpdname=TRUE. |
node.sum |
character, the method name to calculate node summary given that multiple genes or compounds are mapped to it. Potential options include "sum", "mean", "median", "max", "max.abs" and "random". Default node.sum="sum". |
discrete |
a list of two logical elements with "gene" and "cpd" as the names. This argument tells whether gene.data or cpd.data should be treated as discrete. Default discrete=list(gene=FALSE, cpd=FALSE), i.e. both data should be treated as continuous. |
limit |
a list of two numeric elements with "gene" and "cpd" as the names. This argument specifies the limit values for gene.data and cpd.data when converting them to pseudo colors. Each element of the list could be of length 1 or 2. Length 1 suggests discrete data or 1 directional (positive-valued) data, or the absolute limit for 2 directional data. Length 2 suggests 2 directional data. Default limit=list(gene=1, cpd=1). |
bins |
a list of two integer elements with "gene" and "cpd" as the names. This argument specifies the number of levels or bins for gene.data and cpd.data when converting them to pseudo colors. Default bins=list(gene=10, cpd=10). |
both.dirs |
a list of two logical elements with "gene" and "cpd" as the names. This argument specifies whether gene.data and cpd.data are 1 directional or 2 directional data when converting them to pseudo colors. Default both.dirs=list(gene=TRUE, cpd=TRUE). |
trans.fun |
a list of two function (not character) elements with "gene" and "cpd" as the names. This
argument specifies whether and how gene.data and cpd.data are
transformed. Examples are |
low, mid, high |
each is a list of two colors with "gene" and "cpd" as the names. This argument specifies the color spectra to code gene.data and cpd.data. When data are 1 directional (FALSE value in both.dirs), only mid and high are used to specify the color spectra. Default spectra (low-mid-high) "green"-"gray"-"red" and "blue"-"gray"-"yellow" are used for gene.data and cpd.data respectively. The values for low, mid and high can be given as color names ("red"), plot color index (2=red), or HTML-style RGB ("#FF0000"=red). |
na.col |
color used for NA's or missing values in gene.data and cpd.data. Default na.col="transparent". |
... |
extra arguments passed to keggview.native or keggview.graph function. |
special arguments for keggview.native or keggview.graph function.
plot.data.gene |
data.frame returned by node.map function for rendering mapped gene nodes, including node name, type, positions (x, y), sizes (width, height), and mapped gene.data. This data is also used as input for pseudo-color coding through node.color function. Default plot.data.gene=NULL. |
plot.data.cpd |
same as plot.data.gene, except for mapped compound node data. Default plot.data.cpd=NULL. Note that plot.data.gene and plot.data.cpd can't be NULL simultaneously. |
cols.ts.gene |
vector or matrix of colors returned by node.color function for rendering gene.data. Dimensionality is the same as the latter. Default cols.ts.gene=NULL. |
cols.ts.cpd |
same as cols.ts.gene, except corresponding to cpd.data. Default cols.ts.cpd=NULL. Note that cols.ts.gene and cols.ts.cpd can't be NULL simultaneously. |
node.data |
list returned by node.info function, which parses the KGML file directly or indirectly, and extracts the node data. |
pathway.name |
character, the full KEGG pathway name in the format of 3-letter species code with 5-digit pathway id, e.g. "hsa04612". |
out.suffix |
character, the suffix to be added after the pathway name as part of the output graph file. Sample names or column names of the gene.data or cpd.data are also added when there are multiple samples. Default out.suffix="pathview". |
multi.state |
logical, whether multiple states (samples or columns) of gene.data or cpd.data should be integrated and plotted in the same graph. Default multi.state=TRUE. In other words, gene or compound nodes will be sliced into multiple pieces corresponding to the number of states in the data. |
match.data |
logical, whether the samples of gene.data and cpd.data are paired. Default match.data=TRUE. Let the sample sizes of gene.data and cpd.data be m and n. When m>n, extra columns of NA's (mapped to no color) will be added to cpd.data to make the sample sizes the same. This results in the same number of slices in gene nodes and compound nodes when multi.state=TRUE. |
same.layer |
logical, controls plotting layers: 1) whether node colors are plotted in the same layer as the pathway graph when kegg.native=TRUE, 2) whether the edge/node type legend is plotted on the same page when kegg.native=FALSE. |
res |
The nominal resolution in ppi which will be recorded in the bitmap file, if a positive integer. Also used for 'units' other than the default, and to convert points to pixels. This argument is only effective when kegg.native=TRUE. Default res=300. |
cex |
A numerical value giving the amount by which plotting text and symbols should be scaled relative to the default 1. Default cex=0.25 when kegg.native=TRUE, cex=0.5 when kegg.native=FALSE. |
new.signature |
logical, whether pathview signature is added to the pathway graphs. Default new.signature=TRUE. |
plot.col.key |
logical, whether color key is added to the pathway graphs. Default plot.col.key= TRUE. |
key.align |
character, controlling how the color keys are aligned when both gene.data and cpd.data are not NULL. Potential values are "x", aligned by x coordinates, and "y", aligned by y coordinates. Default key.align="x". |
key.pos |
character, controlling the position of color key(s). Potential values are "bottomleft", "bottomright", "topleft" and "topright". Default key.pos="topright". |
sign.pos |
character, controlling the position of pathview signature. Only effective when kegg.native=FALSE. Signature position is fixed in place of the original KEGG signature when kegg.native=TRUE. Potential values are "bottomleft", "bottomright", "topleft" and "topright". Default sign.pos="bottomright". |
path.graph |
a graph object parsed from KGML file, only effective when kegg.native=FALSE. |
pdf.size |
a numeric vector of length 2, giving the width and height of the pathway graph pdf file. Note that pdf width increases by half when same.layer=TRUE to accommodate legends. Only effective when kegg.native=FALSE. Default pdf.size=c(7,7). |
rankdir |
character, either "LR" (left to right) or "TB" (top to bottom), specifying the pathway graph layout direction. Only effective when kegg.native=FALSE. Default rankdir="LR". |
is.signal |
logical, if the pathway is treated as a signaling pathway, where all the unconnected nodes are dropped. This argument also affects the graph layout type, i.e. "dot" for signals or "neato" otherwise. Only effective when kegg.native=FALSE. Default is.signal=TRUE. |
afactor |
numeric, node amplifying factor. This argument is for node size fine-tuning, its effect is subtler than expected. Only effective when kegg.native=FALSE. Default afactor=1. |
text.width |
numeric, specifying the line width for text wrap. Only effective when kegg.native=FALSE. Default text.width=15 (characters). |
cpd.lab.offset |
numeric, specifying how much compound labels should be put above the default position or node center. This argument is useful when map.cpdname=TRUE, i.e. compounds are labelled by full name, which affects the look of compound nodes and color. Only effective when kegg.native=FALSE. Default cpd.lab.offset=1.0. |
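How limit, bins, both.dirs and the low/mid/high colors interact when data are converted to pseudo colors can be sketched in base R. This is a simplified illustration and not the package's node.color function: value2color is a hypothetical helper, and it assumes a single numeric limit and evenly spaced bins.

```r
# Sketch of pseudo-color coding with limit, bins and low/mid/high colors.
# NOT pathview's node.color; value2color is a hypothetical helper.
value2color <- function(x, limit = 1, bins = 10, both.dirs = TRUE,
                        low = "green", mid = "gray", high = "red",
                        na.col = "transparent") {
  # 2 directional data span [-limit, limit], 1 directional data [0, limit]
  lims <- if (both.dirs) c(-limit, limit) else c(0, limit)
  # 2 directional data use low-mid-high, 1 directional data only mid-high
  ramp <- if (both.dirs) c(low, mid, high) else c(mid, high)
  pal  <- colorRampPalette(ramp)(bins)
  x    <- pmin(pmax(x, lims[1]), lims[2])          # clip values to the limits
  idx  <- cut(x, breaks = seq(lims[1], lims[2], length.out = bins + 1),
              include.lowest = TRUE, labels = FALSE)
  cols <- pal[idx]
  cols[is.na(cols)] <- na.col                      # missing values get na.col
  cols
}

value2color(c(-2, 0, 0.5, NA, 3))
```

Values beyond the limit are clipped to the extreme colors, which is why choosing limit to match the data range matters more than the number of bins.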
Pathview maps and renders user data on relevant pathway graphs. Pathview
is a stand alone program for pathway based data integration and
visualization. It also seamlessly integrates with pathway and functional
analysis tools for large-scale and fully automated analysis.
Pathview provides strong support for data integration. It works with: 1)
essentially all types of biological data mappable to pathways, 2) over
10 types of gene or protein IDs, and 20 types of compound or metabolite
IDs, 3) pathways for over 2000 species as well as KEGG orthology, 4)
various data attributes and formats, i.e. continuous/discrete data,
matrices/vectors, single/multiple samples etc.
To see mappable external gene/protein IDs do: data(gene.idtype.list).
To see mappable external compound related IDs do: data(rn.list);
names(rn.list).
Pathview generates both native KEGG view and Graphviz views for
pathways. Currently only KEGG pathways are implemented. Hopefully, pathways from
Reactome, NCI and other databases will be supported in the future.
From version 1.9.3, pathview can accept either a single pathway id or multiple pathway ids. The result returned by pathview function is a named list corresponding to the input pathway ids. Each element (for each pathway) is itself a named list with 2 elements ("plot.data.gene", "plot.data.cpd"). Both elements are either a data.frame or NULL, depending on the corresponding input data gene.data and cpd.data. These data.frames record the plot data for mapped gene or compound nodes: rows are mapped genes/compounds, columns are:
kegg.names |
standard KEGG IDs/Names for mapped nodes. It's Entrez Gene ID or KEGG Compound Accessions. |
labels |
Node labels to be used when needed. |
all.mapped |
All molecule (gene or compound) IDs mapped to this node. |
type |
node type, currently 4 types are supported: "gene","enzyme", "compound" and "ortholog". |
x |
x coordinate in the original KEGG pathway graph. |
y |
y coordinate in the original KEGG pathway graph. |
width |
node width in the original KEGG pathway graph. |
height |
node height in the original KEGG pathway graph. |
other columns |
columns of the mapped gene/compound data and corresponding pseudo-color codes for individual samples |
The results returned by keggview.native and keggview.graph are both a
list of graph plotting parameters. These are not intended to be used
externally.
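The nested result described above for multiple pathway ids can be walked with ordinary list tools. The sketch below builds a mock object of that shape in base R; the pathway ids, node rows and values are placeholders and not real pathview output.

```r
# Mock of the result shape described above for two pathway IDs.
# The pathway IDs, node rows and values are placeholders, not real output.
mock.node.df <- data.frame(kegg.names = c("1017", "1019"),
                           labels = c("CDK2", "CDK4"),
                           type = "gene", x = c(100, 200), y = c(50, 60),
                           stringsAsFactors = FALSE)
pv.list <- list(
  hsa04110 = list(plot.data.gene = mock.node.df, plot.data.cpd = NULL),
  hsa00640 = list(plot.data.gene = mock.node.df[1, ], plot.data.cpd = NULL)
)

# Number of mapped gene nodes per pathway; NULL elements count as 0
n.gene <- vapply(pv.list, function(p) {
  if (is.null(p$plot.data.gene)) 0L else nrow(p$plot.data.gene)
}, integer(1))
n.gene
```

Checking for NULL before using an element is worthwhile because plot.data.cpd is NULL whenever no compound data were supplied.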
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
download.kegg the downloader, node.info the parser, node.map and node.color the mapper.
#load data
data(gse16873.d)
data(demo.paths)
#KEGG view: gene data only
i <- 1
pv.out <- pathview(gene.data = gse16873.d[, 1], pathway.id = demo.paths$sel.paths[i], species = "hsa", out.suffix = "gse16873", kegg.native = TRUE)
str(pv.out)
head(pv.out$plot.data.gene)
#result PNG file in current directory
#Graphviz view: gene data only
pv.out <- pathview(gene.data = gse16873.d[, 1], pathway.id = demo.paths$sel.paths[i], species = "hsa", out.suffix = "gse16873", kegg.native = FALSE, sign.pos = demo.paths$spos[i])
#result PDF file in current directory
#KEGG view: both gene and compound data
sim.cpd.data=sim.mol.data(mol.type="cpd", nmol=3000)
i <- 3
print(demo.paths$sel.paths[i])
pv.out <- pathview(gene.data = gse16873.d[, 1], cpd.data = sim.cpd.data, pathway.id = demo.paths$sel.paths[i], species = "hsa", out.suffix = "gse16873.cpd", key.align = "y", kegg.native = TRUE, key.pos = demo.paths$kpos1[i])
str(pv.out)
head(pv.out$plot.data.cpd)
#multiple states in one graph
set.seed(10)
sim.cpd.data2 = matrix(sample(sim.cpd.data, 18000, replace = TRUE), ncol = 6)
pv.out <- pathview(gene.data = gse16873.d[, 1:3], cpd.data = sim.cpd.data2[, 1:2], pathway.id = demo.paths$sel.paths[i], species = "hsa", out.suffix = "gse16873.cpd.3-2s", key.align = "y", kegg.native = TRUE, match.data = FALSE, multi.state = TRUE, same.layer = TRUE)
str(pv.out)
head(pv.out$plot.data.cpd)
#result PNG file in current directory
##more examples of pathview usages are shown in the vignette.
The molecular data simulator generates either gene.data or cpd.data of different ID types, molecule numbers, sample sizes, either continuous or discrete.
sim.mol.data(mol.type = c("gene", "gene.ko", "cpd")[1], id.type = NULL, species="hsa", discrete = FALSE, nmol = 1000, nexp = 1, rand.seed=100)
mol.type |
character of length 1, specifying the molecular type, either "gene" (including
transcripts, proteins), or "gene.ko" (KEGG ortholog genes, as defined in
KEGG ortholog pathways), or "cpd" (including metabolites, glycans,
drugs). Note that KEGG ortholog genes are considered "gene" in function
|
id.type |
character of length 1, the molecular ID type. When mol.type="gene", proper ID types include "KEGG" and "ENTREZ" (Entrez Gene). Multiple other ID types are also valid when species is among the 19 major species fully annotated in Bioconductor, e.g. "hsa" (human), "mmu" (mouse) etc, check:
|
species |
character, either the kegg code, scientific name or the common name of
the target species. This is only effective when mol.type =
"gene". Setting species="ko" is equivalent to
mol.type="gene.ko". Default species="hsa", equivalent to either "Homo
sapiens" (scientific name) or "human" (common name). Gene data id.type
has multiple other choices for 19 major research species, for details
do: |
discrete |
logical, whether to generate discrete or continuous data. Default discrete=FALSE. Otherwise (discrete=TRUE), mol.data will be a character vector of molecular IDs. |
nmol |
integer, the target number of different molecules. Note that the specified id.type may not have as many different IDs as nmol. In this case, all IDs of id.type are used. |
nexp |
integer, the sample size or the number of columns in the result simulated data. |
rand.seed |
numeric of length 1, the seed number to start the random sampling process. This argument makes the simulation reproducible as long as its value stays the same. Default rand.seed=100. |
This function is written mainly for simulation or experiment with the pathview package. With the simulated molecular data, you may check whether and how pathview works for molecular data of different types, IDs, formats or sample sizes etc. You may also generate both gene.data and cpd.data and check pathway based data integration with pathview.
either a vector (single sample) or matrix-like data (multiple
samples), depending on the value of nexp. A vector is numeric with
molecular IDs as names, or a character vector of molecular IDs,
depending on the value of discrete. A matrix-like data structure has
molecules as rows and samples as columns, with molecular IDs as row
names. The returned data can be used directly as the gene.data or
cpd.data input of the pathview main function.
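To make the return shapes and the role of rand.seed concrete, here is a toy stand-in written in base R. It is not sim.mol.data itself: sim_mock and the ID pool are hypothetical, and the values are simply drawn from a standard normal distribution.

```r
# Toy stand-in for a molecular data simulator; NOT pathview's sim.mol.data.
# `ids` is a hypothetical pool of molecule IDs supplied by the caller.
sim_mock <- function(ids, nmol = 5, nexp = 1, discrete = FALSE, rand.seed = 100) {
  set.seed(rand.seed)                       # same seed gives the same data
  nmol <- min(nmol, length(ids))            # cannot draw more IDs than exist
  picked <- sample(ids, nmol)
  if (discrete) return(picked)              # discrete: character vector of IDs
  vals <- matrix(rnorm(nmol * nexp), ncol = nexp,
                 dimnames = list(picked, paste0("exp", seq_len(nexp))))
  if (nexp == 1) vals[, 1] else vals        # vector for one sample, else matrix
}

# Placeholder IDs shaped like KEGG compound accessions
ids <- paste0("C", sprintf("%05d", 1:20))
v <- sim_mock(ids)                          # named numeric vector
m <- sim_mock(ids, nexp = 3)                # matrix with 5 rows and 3 columns
d <- sim_mock(ids, discrete = TRUE)         # character vector of IDs
identical(v, sim_mock(ids))                 # reproducible for a fixed seed
```

Fixing the seed inside the function, as this sketch does, trades flexibility for reproducibility: repeated calls with the same arguments always return the same data.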
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
node.map the node data mapper function, mol.sum the auxiliary molecular data mapper, id2eg, cpd2kegg etc. the auxiliary molecular ID mappers, pathview the main function.
#continuous compound data
cpd.data.c=sim.mol.data(mol.type="cpd", nmol=3000)
#discrete compound data
cpd.data.d=sim.mol.data(mol.type="cpd", nmol=3000, discrete=TRUE)
head(cpd.data.c)
head(cpd.data.d)
#continuous compound data named with "CAS Registry Number"
cpd.cas <- sim.mol.data(mol.type = "cpd", id.type = "CAS Registry Number", nmol = 10000)
#gene data with two samples
gene.data.2=sim.mol.data(mol.type="gene", nmol=1000, nexp=2)
head(gene.data.2)
#KEGG ortholog gene data
ko.data=sim.mol.data(mol.type="gene.ko", nmol=5000)
strfit does hard wrapping, i.e. it breaks within long words. wordwrap is a wrapper of strfit that also provides a soft wrapping option, i.e. it breaks only between words and keeps long words intact.
wordwrap(s, width = 20, break.word = FALSE)
strfit(s, width = 20)
s |
character, strings to be wrapped or broken down. |
width |
integer, target line width in terms of number of characters. Default width=20. |
break.word |
logical, whether to break within words or only between words so as to fit
the line width. Default break.word=FALSE, i.e. keep words intact and only
break between words. Therefore, some lines may exceed the |
These functions are called to wrap long node labels into shorter
lines on pathway graphs in the keggview.graph function (when
kegg.native=FALSE). They are equally useful for wrapping long
labels in other types of graphs or output formats.
character of the same length as s, except that each element has
been soft- or hard-wrapped.
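The difference between the two wrapping styles can be reproduced with base R alone. The sketch below is not the package's wordwrap or strfit code: it uses strwrap for soft wrapping and substring for hard wrapping, and it returns the lines as separate vector elements.

```r
# Base-R sketch of the two wrapping styles; not the package's own code.
s <- "(S)-Methylmalonate semialdehyde"
width <- 15

# Soft wrapping: break only between words, so a long word may exceed width
soft <- strwrap(s, width = width)

# Hard wrapping: cut every `width` characters, even inside a word
starts <- seq(1, nchar(s), by = width)
hard <- substring(s, starts, pmin(starts + width - 1, nchar(s)))

cat(soft, sep = "\n")
cat(hard, sep = "\n")
```

Soft wrapping keeps chemical names readable at the cost of uneven line lengths, while hard wrapping guarantees the width limit but may split a name mid-word.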
Weijun Luo <[email protected]>
Luo, W. and Brouwer, C., Pathview: an R/Bioconductor package for pathway based data integration and visualization. Bioinformatics, 2013, 29(14): 1830-1831, doi: 10.1093/bioinformatics/btt285
strwrap in R base.
long.str="(S)-Methylmalonate semialdehyde"
wr1=wordwrap(long.str, width=15) #long word intact
cat(wr1, sep="\n")
wr2=strfit(long.str, width=15) #long word split
cat(wr2, sep="\n")