Title: | PaIRKAT |
---|---|
Description: | PaIRKAT is a model framework for assessing statistical relationships between networks of metabolites (pathways) and an outcome of interest (phenotype). PaIRKAT queries the KEGG database to determine interactions between metabolites, from which network connectivity is constructed. This model framework improves testing power on high dimensional data by including graph topology in the kernel machine regression setting. Studies on high dimensional data can struggle to include the complex relationships between variables. The semi-parametric kernel machine regression model is a powerful tool for capturing these types of relationships. It provides a framework for testing for relationships between outcomes of interest and high dimensional data such as metabolomic, genomic, or proteomic pathways. PaIRKAT uses known biological connections between high dimensional variables by representing them as edges of ‘graphs’ or ‘networks.’ It is common for nodes (e.g. metabolites) to be disconnected from all others within the graph, which leads to meaningful decreases in testing power whether or not the graph information is included. We include a graph regularization or ‘smoothing’ approach for managing this issue. |
Authors: | Charlie Carpenter [aut], Cameron Severn [aut], Max McGrath [cre, aut] |
Maintainer: | Max McGrath <[email protected]> |
License: | GPL-3 |
Version: | 1.13.0 |
Built: | 2024-10-31 00:53:04 UTC |
Source: | https://github.com/bioc/pairkat |
Takes a SummarizedExperiment
and constructs a list with KEGG pathways
GatherNetworks(SE, keggID = "KEGG", species = "hsa", minPathwaySize = 5)
SE |
A SummarizedExperiment object |
keggID |
column name in pathway data containing KEGG IDs |
species |
The three letter KEGG organism ID |
minPathwaySize |
Filter pathways that are below a minimum size |
Queries KEGG database for known molecular interactions between included metabolites via the KEGGREST API.
a list object containing the original SummarizedExperiment and igraph network objects
data(smokers)

# Query KEGGREST API
networks <- GatherNetworks(SE = smokers, keggID = "kegg_id",
                           species = "hsa", minPathwaySize = 5)
Pathway Integrated Regression-based Kernel Association Test (PaIRKAT) is a model framework for assessing statistical relationships between networks and some outcome of interest while adjusting for potential confounders and covariates.
Use of PaIRKAT is motivated by the analysis of networks of metabolites from a metabolomics assay and the relationship of those networks with a phenotype or clinical outcome of interest, though the method can be generalized to other domains.
PaIRKAT(formula.H0, networks, tau = 1)
formula.H0 |
The null model, given as a standard R "formula" (e.g. log_FEV1_FVC_ratio ~ age) |
networks |
networks object obtained with GatherNetworks |
tau |
A parameter to control the amount of smoothing, analogous to a bandwidth parameter in kernel smoothing. We found that 1 often gave reasonable results; over-smoothing can lead to inflated Type I error rates. |
The PaIRKAT method updates the feature matrix, $Z$, with the regularized normalized Laplacian, $L_R$, before performing the kernel association test. $L_R$ is calculated using a "linear" regularization,

$$L_R = (I + \tau L)^{-1},$$

where $I$ is the identity matrix, $\tau$ is a regularization parameter that controls the amount of smoothing, and $L$ is the graph's normalized Laplacian. The updated feature matrix, $Z L_R$, is the matrix used for the kernel association test.
The linear regularization and the Gaussian kernel are used for all tests.
See Carpenter 2021 for complete details on PaIRKAT and Smola 2003
for information about graph regularization.
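The smoothing step can be illustrated on a toy graph with base R alone. This is a minimal sketch, not the package's internal code. It assumes the linear regularization takes the form $L_R = (I + \tau L)^{-1}$ (the regularized Laplacian of Smola 2003). It also assumes that an isolated node gets a zero row and column in the normalized Laplacian, and it picks the Gaussian kernel scale `rho` arbitrarily. The helper names `normalized_laplacian`, `regularize_laplacian`, and `gaussian_kernel` are hypothetical.

```r
# Toy graph: path 1 - 2 - 3 plus an isolated node 4 (adjacency matrix).
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[2, 3] <- A[3, 2] <- 1

# Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
# Assumption: isolated nodes (degree 0) get an all-zero row and column.
normalized_laplacian <- function(A) {
  deg <- rowSums(A)
  d_inv_sqrt <- ifelse(deg > 0, 1 / sqrt(deg), 0)
  diag(as.numeric(deg > 0)) - (d_inv_sqrt %o% d_inv_sqrt) * A
}

# Assumed "linear" regularization: L_R = (I + tau * L)^{-1}.
regularize_laplacian <- function(L, tau = 1) {
  solve(diag(nrow(L)) + tau * L)
}

# Gaussian kernel on the rows (subjects) of a feature matrix.
# The scale rho is an illustrative choice, not the package default.
gaussian_kernel <- function(Z, rho) {
  exp(-as.matrix(dist(Z))^2 / rho)
}

set.seed(1)
Z <- matrix(rnorm(5 * 4), nrow = 5)  # simulated: 5 subjects x 4 metabolites

L <- normalized_laplacian(A)
L_R <- regularize_laplacian(L, tau = 1)
Z_smooth <- Z %*% L_R                # updated feature matrix Z L_R
K <- gaussian_kernel(Z_smooth, rho = ncol(Z_smooth))
```

Under these assumptions, setting `tau = 0` gives $L_R = I$ and leaves `Z` unchanged. The column for the isolated node 4 is never modified, because it shares no edges with the rest of the graph. Larger values of `tau` blend each connected metabolite's values more strongly with those of its neighbours.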
a list object containing the formula call and results by pathway
Carpenter CM, Zhang W, Gillenwater L, Severn C, Ghosh T, Bowler R, et al. PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes. bioRxiv. 2021 Apr 26;2021.04.23.440821.
Smola AJ, Kondor R. Kernels and Regularization on Graphs. In: Schölkopf B, Warmuth MK, editors. Learning Theory and Kernel Machines. Berlin, Heidelberg: Springer Berlin Heidelberg; 2003. p. 144–58. (Goos G, Hartmanis J, van Leeuwen J, editors. Lecture Notes in Computer Science; vol. 2777). http://link.springer.com/10.1007/978-3-540-45167-9_12
data(smokers)

# Query KEGGREST API
networks <- GatherNetworks(SE = smokers, keggID = "kegg_id",
                           species = "hsa", minPathwaySize = 5)

# Run PaIRKAT Analysis
output <- PaIRKAT(log_FEV1_FVC_ratio ~ age, networks = networks)

# View Results
output$results
Helper function for plotting networks of metabolites gathered
from the KEGG pathways database using the
GatherNetworks
function.
plotNetworks(networks, pathway = "all", ...)
networks |
networks object obtained with GatherNetworks |
pathway |
Pathway to be plotted. Leaving this as 'all' will plot all pathways in 'networks' |
... |
Parameters to be passed to the igraph plotting call (e.g. layout, main) |
Plots the specified network(s) as an igraph
a plot or list of plots generated by igraph
data(smokers)

# Query KEGGREST API
networks <- GatherNetworks(SE = smokers, keggID = "kegg_id",
                           species = "hsa", minPathwaySize = 5)

# Plot all networks
plotNetworks(networks)

# Plot specific network
plotNetworks(networks, pathway = "Glycerophospholipid metabolism",
             layout = igraph::layout_with_kk,
             main = "Glycerophospholipid Metabolism")
A synthetic data set of human subjects with phenotype variables related to lung health among smokers. Subjects have associated metabolomics assay data that are linked to KEGG pathway database IDs.
data(smokers)
A SummarizedExperiment S4 object containing the following components
phenotype
A dataframe containing phenotype variables and outcomes of interest. Row names are subject IDs.
pathways
A dataframe containing pathway database identifiers (i.e. KEGG IDs). Row names are metabolite names.
metabolome
A dataframe containing a metabolomics assay. Row names are metabolite names and column names are subject IDs.
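The row and column naming described above implies that the three components must line up. Metabolite names index both the pathway table and the assay rows, and subject IDs index both the phenotype table and the assay columns. The sketch below checks that alignment on plain data frames before they are assembled into a SummarizedExperiment. All data values, the subject and metabolite names, and the helper `check_alignment` are hypothetical and used only for illustration. Only the column names `kegg_id`, `age`, and `log_FEV1_FVC_ratio` come from the examples in this manual.

```r
# Hypothetical toy data; values and names are illustrative only.
phenotype <- data.frame(
  age = c(61, 55, 70),
  log_FEV1_FVC_ratio = c(-0.35, -0.22, -0.51),
  row.names = c("subj1", "subj2", "subj3")
)
pathways <- data.frame(
  kegg_id = c("C00031", "C00186"),
  row.names = c("glucose", "lactate")
)
metabolome <- data.frame(
  subj1 = c(5.1, 1.2),
  subj2 = c(4.8, 1.5),
  subj3 = c(5.6, 0.9),
  row.names = c("glucose", "lactate")
)

# TRUE only if metabolite names and subject IDs match in the same order.
check_alignment <- function(phenotype, pathways, metabolome) {
  identical(rownames(metabolome), rownames(pathways)) &&
    identical(colnames(metabolome), rownames(phenotype))
}

aligned <- check_alignment(phenotype, pathways, metabolome)
```

Running a check of this kind before building the object would surface a reordered or missing subject early, before it causes a confusing error in a later step.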