Title: | Integrating Multi-Omic Network Data With Summary-Level GWAS Data |
---|---|
Description: | This package integrates data from the Kyoto Encyclopedia of Genes and Genomes (KEGG) with summary-level genome-wide association (GWAS) data, such as that provided by the GWAS Catalog or GWAS Central databases, or a user's own study or dataset, in order to produce biological networks, termed IMONs (Integrated Multi-Omic Networks). IMONs can be used to analyse trait-specific polymorphic data within the context of biochemical and metabolic reaction networks, providing greater biological interpretability for GWAS data. |
Authors: | Luca Anholt [cre, aut] |
Maintainer: | Luca Anholt <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.9.0 |
Built: | 2024-10-30 09:23:08 UTC |
Source: | https://github.com/bioc/PanViz |
internal function that assembles all the KEGG data into a network/graph
adj_list_to_igraph(adjl_G_S)
adjl_G_S |
adjacency list containing relevant adjacent SNPs/KEGG genes |
an igraph object, containing a network representing all the KEGG data
Internal function that constructs an IMON (Integrated Multi-Omic Network) for an inputted adjacency list containing adjacency information between KEGG genes and queried SNPs.
adjl_to_G(adjl_G_S)
adjl_G_S |
- adjacency list containing relevant adjacencies between inputted SNPs and genes from KEGG |
igraph object representing total IMON for inputted SNPs
Internal function that constructs either a variable-coloured or uncoloured IMON (Integrated Multi-Omic Network) for an inputted adjacency list containing adjacency information between KEGG genes and queried SNPs.
adjl_to_G_grouped( adjl_G_S, unique_group_names, unique_group_cols, group_snps, colour_groups, ego, progress_bar )
adjl_G_S |
- adjacency list containing relevant adjacencies between inputted SNPs and genes from KEGG |
unique_group_names |
- a list of the unique group/variable names in the provided GWAS Catalog association file |
unique_group_cols |
- a list of unique colours for each unique group/variable in the provided GWAS Catalog association file |
group_snps |
- a recursive list containing the lists of SNPs belonging to each unique group/variable in the provided GWAS Catalog association file |
colour_groups |
- boolean: whether or not user has chosen to colour the network by the unique group/variables in the provided GWAS Catalog association file |
ego |
- the egocentric order (centred around the SNPs in the network) in which to build the network i.e. pathlength from SNPs downwards towards the metabolome |
progress_bar |
- boolean: whether or not user has decided to have a progress bar print to the console |
- an igraph object containing the IMON
colour network by categorical group levels
colour_IMON(G, progress_bar)
G |
- igraph object containing uncoloured IMON |
progress_bar |
Boolean (default = TRUE) argument that controls whether or not a progress bar for calculations/KEGGREST API GET requests should be printed to the console |
- igraph object containing coloured IMON
dbSNP_query_check
dbSNP_query_check(query)
query |
- raw query data from NCBI dbSNP API |
- vector containing either 0 (denoting successful query) or NA (unsuccessful query)
Internal function that cleans up raw SNP data queried from NCBI dbSNP via the Entrez API, depending on whether or not it could be successfully queried
dbSNP_query_clean(query)
query |
- raw dbSNP query object |
- dataframe of separate chromosome number, position and ID
This function returns a list of fully connected IMONs from a single parent unconnected IMON.
decompose_IMON(G)
G |
- igraph object containing non-fully connected IMON |
- list of igraph objects, where each index contains a fully connected IMON
data("er_snp_vector")
G <- PanViz::get_IMON(snp_list = er_snp_vector, ego = 5, save_file = FALSE)
G_list <- decompose_IMON(G)
Internal function for trimming an IMON to an ego-centred network (centred around the SNPs) of a specified order (path length from the SNPs)
ego_IMON(G, ego)
G |
- igraph object representing IMON |
ego |
- the selected ego-centred path length |
- ego-centred IMON set at desired path length
A dataset containing a vector of SNPs (summary-level GWAS data) associated with estrogen-receptor positive breast cancer (EFO_1000649), collated by the GWAS Catalog.
data(er_snp_vector)
A vector with 110 elements
This function constructs an IMON (Integrated Multi-Omic Network) with the SNPs and/or the whole network coloured by the selected categorical levels (either studies or phenotypes)
get_grouped_IMON( dataframe, groupby = c("studies", "traits"), ego = 5, save_file = c(FALSE, TRUE), export_type = c("igraph", "edge_list", "graphml", "gml"), directory = c("wd", "choose"), colour_groups = c(FALSE, TRUE), progress_bar = c(TRUE, FALSE) )
dataframe |
A dataframe including 3 columns in the following order and with the following names: snps, studies, traits (all character vectors) |
groupby |
Choose whether to group SNP and/or network colouring by either studies or traits |
ego |
Sets the order (path length from the SNPs) of the ego-centred network to construct. If set to 5 (default and recommended), an IMON with the first layer of the connected metabolome will be returned. If set above 5, the corresponding extra layers of the metabolome will be returned. If set to 0 (not recommended), the fully connected metabolome will be returned. Note that this cannot be set to a value between 0 and 5. |
save_file |
Boolean (default = FALSE) argument that indicates whether or not the user wants to save the graph as an exported file in their current working directory |
export_type |
This dictates the network data structure saved in your working directory. By default this outputs an igraph object, however, you can choose to export and save an edge list, graphml or GML file. |
directory |
If set to "choose" this argument allows the user to interactively select the directory of their choice in which they wish to save the constructed IMON, else the file will be saved to the working directory "wd" by default |
colour_groups |
Boolean (default = FALSE) chooses whether or not to colour the whole network by grouping variables |
progress_bar |
Boolean (default = TRUE) argument that controls whether or not a progress bar for calculations/KEGGREST API GET requests should be printed to the console |
An igraph object containing the constructed IMON, with the SNPs and/or the whole network coloured by the selected grouping variable
##getting GWAS Catalog association tsv file and cleaning up using
##the GWAS_data_reader function:
path <- system.file("extdata", "gwas-association-downloaded_2021-09-13-EFO_1000649.tsv", package="PanViz")
df <- PanViz::GWAS_data_reader(file = path, snp_col = "SNPS", study_col = "STUDY", trait_col = "DISEASE/TRAIT")
##creating uncoloured IMON:
G <- PanViz::get_grouped_IMON(dataframe = df, groupby = "studies", ego = 5, save_file = FALSE, colour_groups = FALSE)
##creating IMON where vertices/edges are coloured by the variable study:
G <- PanViz::get_grouped_IMON(dataframe = df, groupby = "studies", ego = 5, save_file = FALSE, colour_groups = TRUE)
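The input requirements documented for get_grouped_IMON (three character columns named snps, studies and traits, in that order, and an ego of 0 or at least 5) can be checked before a network is built. The following is a minimal sketch in base R. The helper check_grouped_input is hypothetical and not part of PanViz, and the study and trait labels are placeholder values.

```r
# Hypothetical input: three character columns in the documented order.
# The study and trait labels are placeholders, not real GWAS records.
df <- data.frame(
  snps    = c("rs185345278", "rs992531"),
  studies = c("Study A", "Study B"),
  traits  = c("Trait X", "Trait X"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of PanViz): checks the documented constraints.
check_grouped_input <- function(dataframe, ego) {
  expected <- c("snps", "studies", "traits")
  if (!identical(names(dataframe), expected)) {
    stop("columns must be, in order: ", paste(expected, collapse = ", "))
  }
  if (!all(vapply(dataframe, is.character, logical(1)))) {
    stop("all three columns must be character vectors")
  }
  # ego may be 0 or any value from 5 upwards, but not 1 to 4
  if (!(ego == 0 || ego >= 5)) {
    stop("ego must be 0 or at least 5")
  }
  invisible(TRUE)
}

check_grouped_input(df, ego = 5)
```

A data frame that passes this check has the shape expected by the dataframe argument; whether PanViz performs the same checks internally is not stated here.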
This function constructs an IMON (Integrated Multi-Omic Network) for an inputted vector of SNPs and exports an igraph file.
get_IMON( snp_list, ego = 5, save_file = c(FALSE, TRUE), export_type = c("igraph", "edge_list", "graphml", "gml"), directory = c("wd", "choose"), progress_bar = c(TRUE, FALSE) )
snp_list |
A vector of SNPs (strings/characters) using standard NCBI dbSNP accession number naming convention (e.g. "rs185345278") |
ego |
Sets the order (path length from the SNPs) of the ego-centred network to construct. If set to 5 (default and recommended), an IMON with the first layer of the connected metabolome will be returned. If set above 5, the corresponding extra layers of the metabolome will be returned. If set to 0 (not recommended), the fully connected metabolome will be returned. Note that this cannot be set to a value between 0 and 5. |
save_file |
Boolean (default = FALSE) argument that indicates whether or not the user wants to save the graph as an exported file in their current working directory |
export_type |
This dictates the network data structure saved in the chosen directory. By default this outputs an igraph object, however, you can choose to export and save an edge list, graphml or GML file. |
directory |
If set to "choose" this argument allows the user to interactively select the directory of their choice in which they wish to save the constructed IMON, else the file will be saved to the working directory "wd" by default |
progress_bar |
Boolean (default = TRUE) argument that controls whether or not a progress bar for calculations/KEGGREST API GET requests should be printed to the console |
An igraph object containing the constructed IMON
##getting vector of SNPs to query:
data("er_snp_vector")
##build IMON using vector:
G <- PanViz::get_IMON(snp_list = er_snp_vector, ego = 5, save_file = FALSE)
GWAS_data_reader
GWAS_data_reader(file, snp_col, study_col, trait_col)
file |
- Character (string) containing the directory path to a .tsv or .csv file containing summary level GWAS data, typically this can be sourced from major GWAS databases such as the GWAS Catalog or GWAS Central. |
snp_col |
- Character (string) reflecting the column name containing the SNP (standard dbSNP accession number, e.g. rs992531) data. In data sourced from the GWAS Catalog, this column will typically be named "SNPS" and in GWAS Central this will typically be "Source Marker Accession". |
study_col |
- Character (string) reflecting the column name containing the study names associated with each SNP. In data sourced from the GWAS Catalog, this column will typically be named "STUDY" and in GWAS Central this will typically be "Study Name". |
trait_col |
- Character (string) reflecting the column name containing the trait/phenotype names associated with each SNP. In data sourced from the GWAS Catalog, this column will typically be named "DISEASE/TRAIT" and in GWAS Central this will typically be "Annotation Name". |
A processed dataframe containing only the columns including GWAS studies, traits/phenotypes and relevant SNPs in NCBI standard accession number naming convention
##getting directory path to GWAS Catalog association .tsv file:
path <- system.file("extdata", "gwas-association-downloaded_2021-09-13-EFO_1000649.tsv", package="PanViz")
##opening/cleaning data:
df <- PanViz::GWAS_data_reader(file = path, snp_col = "SNPS", study_col = "STUDY", trait_col = "DISEASE/TRAIT")
##getting directory path to GWAS Central association .tsv file:
path <- system.file("extdata", "GWASCentralMart_ERplusBC.tsv", package="PanViz")
##opening/cleaning data:
df <- PanViz::GWAS_data_reader(file = path, snp_col = "Source Marker Accession", study_col = "Study Name", trait_col = "Annotation Name")
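To illustrate what a reader of this kind does, the sketch below selects the three named columns from a delimited file and returns them as snps, studies and traits. It is written in base R under stated assumptions and is not the PanViz implementation: read_gwas_sketch is a hypothetical name, the file contents are invented, and filtering on the "rs" accession pattern is an assumed interpretation of "NCBI standard accession number naming convention".

```r
# Hypothetical helper (not the PanViz implementation).
read_gwas_sketch <- function(file, snp_col, study_col, trait_col) {
  # Assumption: the separator is chosen from the file extension.
  sep <- if (grepl("\\.csv$", file, ignore.case = TRUE)) "," else "\t"
  # check.names = FALSE keeps headers such as "DISEASE/TRAIT" unchanged.
  raw <- read.delim(file, sep = sep, check.names = FALSE,
                    stringsAsFactors = FALSE)
  wanted <- c(snp_col, study_col, trait_col)
  missing_cols <- setdiff(wanted, names(raw))
  if (length(missing_cols) > 0) {
    stop("column(s) not found: ", paste(missing_cols, collapse = ", "))
  }
  out <- raw[, wanted, drop = FALSE]
  names(out) <- c("snps", "studies", "traits")
  out[] <- lapply(out, as.character)
  # Assumption: keep only rows whose SNP looks like a dbSNP accession.
  out <- out[grepl("^rs[0-9]+$", out$snps), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Invented example file using GWAS Catalog style column names.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("SNPS\tSTUDY\tDISEASE/TRAIT\tP-VALUE",
             "rs992531\tStudy A\tTrait X\t1e-9",
             "chr1:12345\tStudy B\tTrait X\t2e-8"), tmp)
gwas_df <- read_gwas_sketch(tmp, snp_col = "SNPS", study_col = "STUDY",
                            trait_col = "DISEASE/TRAIT")
```

In this invented file only the first row carries an "rs" accession, so only that row is retained.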
This is a helper function that merges any vector of hex colours
multi_hex_col_mix(col_vector)
col_vector |
- vector of hex colours |
- a single mixed hex color from inputted hex codes
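One simple way to merge several hex colours into one is to average their red, green and blue channels. The sketch below does this with grDevices; it is an illustration only, since the mixing rule actually used by multi_hex_col_mix is not described here, and mix_hex_sketch is a hypothetical name.

```r
# Hypothetical helper: averages RGB channels (an assumed mixing rule).
mix_hex_sketch <- function(col_vector) {
  # col2rgb() returns a 3 x n matrix with rows red, green, blue (0-255).
  rgb_matrix <- grDevices::col2rgb(col_vector)
  mixed <- unname(round(rowMeans(rgb_matrix)))
  grDevices::rgb(mixed[1], mixed[2], mixed[3], maxColorValue = 255)
}

# Pure red, green and blue average to 85 per channel, i.e. "#555555".
mixed_col <- mix_hex_sketch(c("#FF0000", "#00FF00", "#0000FF"))
```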
NCBI_clean
NCBI_clean(queried_data)
queried_data |
- input queried NCBI gene data |
remove genes with no genomic information from NCBI query
NCBI_clean_2
NCBI_clean_2(queried_data)
queried_data |
- rentrez object queried from NCBI |
return chromosome location, start and end position of gene from NCBI query
NCBI_dbSNP_query
NCBI_dbSNP_query(snp_list, progress_bar)
snp_list |
- list of SNPs to be queried via NCBI dbSNP API |
progress_bar |
Boolean (default = TRUE) argument that controls whether or not a progress bar for calculations/KEGGREST API GET requests should be printed to the console |
- raw output from NCBI dbSNP API
This function helps to clean up queried KEGG reaction recursive lists and separates compound/metabolite and reaction pair data into new sections
reaction_cleanup(queried_data)
queried_data |
- input queried KEGG reaction data |
Trimmed recursive lists containing queried KEGG reaction data
Internal function for handling errors when accessing APIs
retry( expr, isError = function(x) "try-error" %in% class(x), maxErrors = 5, sleep = 0 )
expr |
This is the function you want to catch and handle errors from |
isError |
Function for evaluating if provided expression is throwing an error |
maxErrors |
The maximum number of errors it should handle from the function |
sleep |
The amount of sleep between a caught error and the next attempt |
The expression that has been either successfully run or retried the maximum number of times
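The retry pattern described by these arguments can be sketched in base R as follows. This is an illustrative version built on try(), not the package's internal code: retry_sketch and flaky are hypothetical names, and stopping with an error once maxErrors is reached is an assumption about the behaviour at the limit.

```r
# Hypothetical sketch of a retry wrapper (not the PanViz internal function).
retry_sketch <- function(expr,
                         isError = function(x) "try-error" %in% class(x),
                         maxErrors = 5,
                         sleep = 0) {
  # Capture the unevaluated expression so it can be re-run on each attempt.
  call_expr <- substitute(expr)
  caller <- parent.frame()
  attempts <- 0
  result <- try(eval(call_expr, caller), silent = TRUE)
  while (isError(result)) {
    attempts <- attempts + 1
    # Assumption: give up with an error once the limit is reached.
    if (attempts >= maxErrors) {
      stop("too many errors (", attempts, "), giving up")
    }
    if (sleep > 0) Sys.sleep(sleep)
    result <- try(eval(call_expr, caller), silent = TRUE)
  }
  result
}

# Usage: a function that fails twice before succeeding.
counter <- 0
flaky <- function() {
  counter <<- counter + 1
  if (counter < 3) stop("transient failure")
  "ok"
}
value <- retry_sketch(flaky(), maxErrors = 5)
```

Capturing the expression with substitute() matters here: evaluating the argument directly would run it only once, so a failed call could never be repeated.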
set_base_graph_attributes
set_base_graph_attributes(G, colour_groups)
G |
igraph object containing KEGG network |
colour_groups |
logical - whether or not the user has chosen to colour the network by a categorical variable, i.e. study or trait/phenotype (only available via PanViz::get_grouped_IMON()) |
igraph object with node attributes set
snp grouping by chosen categorical variable
set_snp_grouping(G, unique_group_names, unique_group_cols, group_snps)
G |
- igraph object containing IMON |
unique_group_names |
- vector containing unique grouping variable names |
unique_group_cols |
- vector containing unique grouping colours for each variable |
group_snps |
- snps split by each variable/group |
- igraph object containing IMON with labelled and coloured snps by grouping variable
snp_gene_chr_match
snp_gene_chr_match(snp_loc, gene_loc)
snp_loc |
- snp locations |
gene_loc |
- dataframe of genes and their chromosome numbers and start/stop positions |
- a recursive list of genes with their respective SNPs that have the same chromosome number
Fast vectorised SNP to gene chromosome number and genomic location mapping
snp_gene_map(gene_loc, snp_loc)
gene_loc |
dataframe containing KEGG genes and relevant chromosome number and positions |
snp_loc |
dataframe containing queried SNPs and relevant chromosome number and positions |
an adjacency list of SNPs and the genes they map to by genomic location
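A vectorised mapping of this kind can be sketched with outer(), which compares every gene with every SNP in one step. The example below is a base R illustration under stated assumptions, not the PanViz implementation: map_snps_sketch, the column names (gene, chr, start, end, snp, pos), the identifiers and the rule that a SNP maps to a gene when it lies between the gene's start and end on the same chromosome are all hypothetical.

```r
# Invented gene and SNP locations with hypothetical column names.
gene_loc <- data.frame(gene = c("geneA", "geneB"),
                       chr = c("1", "2"),
                       start = c(100, 500),
                       end = c(200, 900),
                       stringsAsFactors = FALSE)
snp_loc <- data.frame(snp = c("rs1", "rs2", "rs3"),
                      chr = c("1", "2", "2"),
                      pos = c(150, 950, 600),
                      stringsAsFactors = FALSE)

# Hypothetical helper: returns a named list of SNPs per gene.
map_snps_sketch <- function(gene_loc, snp_loc) {
  # Each matrix has one row per gene and one column per SNP.
  same_chr <- outer(gene_loc$chr, snp_loc$chr, "==")
  inside <- outer(gene_loc$start, snp_loc$pos, "<=") &
    outer(gene_loc$end, snp_loc$pos, ">=")
  hits <- same_chr & inside
  adj <- lapply(seq_len(nrow(gene_loc)), function(i) snp_loc$snp[hits[i, ]])
  names(adj) <- gene_loc$gene
  adj
}

adj <- map_snps_sketch(gene_loc, snp_loc)
```

With these invented inputs, rs1 falls inside geneA and rs3 inside geneB, while rs2 lies past the end of geneB and maps to no gene.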