Title: | Interactive Visualization of Integrated Differential Expression and Differential Network Analysis for Biomarker Candidate Selection Package |
---|---|
Description: | An R package for integrated differential expression and differential network analysis of omic data for cancer biomarker discovery. Either correlation or partial correlation can be used to build the differential network, which complements traditional differential expression analysis by identifying changes in biomolecules at both the expression level and the pairwise association level. A detailed description of the methodology has been published in the journal Methods (PMID: 27592383). An interactive visualization feature allows for the exploration and selection of candidate biomarkers. |
Authors: | Yiming Zuo <[email protected]>, Kian Ghaffari <[email protected]>, Zhenzhi Li <[email protected]> |
Maintainer: | Ressom group <[email protected]>, Yiming Zuo <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.21.0 |
Built: | 2024-11-29 06:10:16 UTC |
Source: | https://github.com/bioc/INDEED |
This function draws an error curve using cross-validation.
choose_rho(data, n_fold, rho)
data |
This is a matrix. |
n_fold |
This parameter specifies n, the number of folds in n-fold cross-validation. |
rho |
These are the regularization parameter values to be evaluated in terms of their errors. |
A list of errors and their corresponding .
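The cross-validation idea behind this function can be sketched in base R. This is only an illustration, not the package source: the helper name cv_error_curve is hypothetical, rows of data are assumed to be samples, and a ridge-style estimate solve(S + rho * I) is swapped in for graphical lasso so that the sketch needs no extra packages.

```r
# Hypothetical sketch of an n-fold cross-validation error curve over rho.
# NOTE: the ridge-style precision estimate is a stand-in for graphical lasso.
cv_error_curve <- function(data, n_fold, rho) {
  n <- nrow(data)
  p <- ncol(data)
  # Randomly assign each sample (row) to one of n_fold folds.
  fold_id <- sample(rep(seq_len(n_fold), length.out = n))
  errors <- matrix(NA_real_, nrow = n_fold, ncol = length(rho))
  for (k in seq_len(n_fold)) {
    s_train <- cov(data[fold_id != k, , drop = FALSE])
    s_test <- cov(data[fold_id == k, , drop = FALSE])
    for (j in seq_along(rho)) {
      theta <- solve(s_train + rho[j] * diag(p))
      log_det <- as.numeric(determinant(theta, logarithm = TRUE)$modulus)
      # Held-out error: tr(S_test %*% theta) - log det(theta).
      errors[k, j] <- sum(diag(s_test %*% theta)) - log_det
    }
  }
  list(rho = rho,
       mean_error = colMeans(errors),
       se_error = apply(errors, 2, sd) / sqrt(n_fold))
}

set.seed(1)
x <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)  # 40 samples, 3 variables
cv <- cv_error_curve(x, n_fold = 5, rho = c(0.01, 0.1, 1))
```

Plotting cv$mean_error against log(cv$rho) would give an error curve from which a rho value could be picked.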
This function computes either the Pearson or Spearman correlation coefficient.
compute_cor(data_group_1, data_group_2, type_of_cor)
data_group_1 |
This is an n*p matrix. |
data_group_2 |
This is an n*p matrix. |
type_of_cor |
If this is NULL, the Pearson correlation coefficient is calculated by default. Otherwise, the character string "spearman" selects the Spearman correlation coefficient. |
A list of correlation matrices for both group 1 and group 2.
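A minimal sketch of the documented behavior, using only base R's cor(). The helper name cor_by_group and the list element names are hypothetical, and rows are assumed to be samples and columns biomolecules.

```r
# Hypothetical helper mirroring the documented interface; not the package source.
cor_by_group <- function(data_group_1, data_group_2, type_of_cor = NULL) {
  # NULL selects Pearson; the string "spearman" selects Spearman.
  method <- if (is.null(type_of_cor)) "pearson" else type_of_cor
  list(
    cor_1 = cor(data_group_1, method = method),
    cor_2 = cor(data_group_2, method = method)
  )
}

set.seed(42)
group_1 <- matrix(rnorm(60), nrow = 20, ncol = 3)  # 20 samples, 3 biomolecules
group_2 <- matrix(rnorm(60), nrow = 20, ncol = 3)
res <- cor_by_group(group_1, group_2, type_of_cor = "spearman")
dim(res$cor_1)  # one p x p correlation matrix per group
```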
This function calculates the differential network score from the binary link and z-scores.
compute_dns(binary_link, z_score)
binary_link |
This is the binary correlation matrix with 1 indicating positive correlation and -1 indicating negative correlation for each biomolecular pair. |
z_score |
This is converted from the given or calculated p-value. |
An activity score associated with each biomarker candidate.
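One simple way to combine a binary link matrix with z-scores is sketched below. Both steps are assumptions made for illustration and are not confirmed to match the package's formula: the z-score is obtained from a two-sided p-value with qnorm(), and each node's score is taken as its own absolute z-score plus the absolute z-scores of its linked neighbors.

```r
# Assumed two-sided conversion from p-values to z-scores.
p_value <- c(0.01, 0.20, 0.50)
z_score <- abs(qnorm(p_value / 2))

# Toy binary link matrix: 1 = positive link, -1 = negative link, 0 = no link.
binary_link <- matrix(c(0,  1,  0,
                        1,  0, -1,
                        0, -1,  0), nrow = 3, byrow = TRUE)

# Assumed scoring rule: own |z| plus the |z| of every linked neighbor.
activity_score <- as.vector(z_score + abs(binary_link) %*% z_score)

# Node degree: number of links per node, regardless of sign.
node_degree <- rowSums(abs(binary_link))
```

Under this rule a node with an unremarkable p-value can still score highly if it is linked to strongly changed neighbors, which is the motivation for combining the two analyses.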
This function computes the partial correlation coefficient.
compute_par(pre_inv)
pre_inv |
This is an inverse covariance matrix. |
A partial correlation matrix.
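The standard relation between an inverse covariance (precision) matrix and partial correlations is pc_ij = -theta_ij / sqrt(theta_ii * theta_jj). The sketch below applies it in base R. The helper name is hypothetical, and setting the diagonal to 1 is a convention assumed here, which may differ from what the package returns.

```r
# Hypothetical helper: partial correlations from a precision matrix.
partial_from_precision <- function(pre_inv) {
  d <- sqrt(diag(pre_inv))
  pc <- -pre_inv / outer(d, d)
  diag(pc) <- 1  # assumed convention for the diagonal
  pc
}

# Toy covariance matrix (already a correlation matrix) for three variables.
sigma <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.3,
                  0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)
theta <- solve(sigma)
pc <- partial_from_precision(theta)
```

For three variables the entry pc[1, 2] agrees with the textbook formula (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2)).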
The INDEED R package provides the following main functions: non_partial_cor(), select_rho_partial(), partial_cor(), and network_display().
non_partial_cor() performs standard correlation analysis based on the user's input data, class labels, IDs, number of permutations, and correlation method (default: Pearson). The p-value input is optional. It returns a score table and a differential network.
select_rho_partial() preprocesses data for partial correlation analysis. It returns a list containing the preprocessed data and rho values, along with an error plot that helps the user choose a rho value for graphical lasso.
partial_cor() performs partial correlation analysis based on the preprocessed list from the select_rho_partial() step, the rule or values for choosing rho, and the number of permutations (default: 1000). The p-value input is optional. It returns a score table and a differential network.
network_display() assists in visualizing the network that results from the INDEED functions non_partial_cor() and partial_cor().
This function calculates the log likelihood error.
loglik_ave(data, theta)
data |
This is a matrix. |
theta |
This is a precision matrix. |
log likelihood error
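For a Gaussian model with precision matrix theta and sample covariance S, the negative log likelihood is, up to constants and scaling, tr(S %*% theta) - log det(theta). The sketch below computes that quantity. The helper name, the use of cov() on rows as samples, and the exact scaling are assumptions, so the package's value may differ by a constant or factor.

```r
# Hypothetical helper: Gaussian negative log likelihood, up to constants.
neg_loglik <- function(data, theta) {
  s <- cov(data)  # sample covariance; rows are assumed to be samples
  log_det <- as.numeric(determinant(theta, logarithm = TRUE)$modulus)
  sum(diag(s %*% theta)) - log_det
}

set.seed(7)
x <- matrix(rnorm(200), nrow = 50, ncol = 4)
theta_fit <- solve(cov(x))  # inverse of the sample covariance
theta_identity <- diag(4)   # a deliberately naive precision matrix

err_fit <- neg_loglik(x, theta_fit)
err_identity <- neg_loglik(x, theta_identity)
```

Because the inverse sample covariance minimizes this expression, err_fit is no larger than err_identity; a lower value indicates a precision matrix that fits the data better.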
A dataset containing group information (CIRR group: 0 and HCC group: 1).
Met_Group_GU
A data frame with 1 row and 120 (subjects) columns.
A dataset containing the expression levels for each of the 120 subjects (HCC: 60; CIRR: 60) in terms of 39 metabolites.
Met_GU
A data frame with 39 variables (rows) and 120 subjects (columns).
A dataset containing the KEGG ID for each metabolite.
Met_name_GU
A data frame with 39 KEGG IDs as rows and 1 column.
An interactive tool to assist in visualizing the results from the INDEED functions non_partial_cor() or partial_cor(). Users can set the size and the color of each node to represent Node_Degree, Activity_Score, Z_Score, or P_Value. The color of each edge is based on its binary value: 1 corresponds to a positive correlation, depicted in green, and -1 corresponds to a negative correlation, depicted in red. The user also has the option of making the width of each edge proportional to its weight value. The layout of the network can be customized by choosing from the options 'nice', 'sphere', 'grid', 'star', and 'circle'. Nodes can be moved and zoomed in on. Each node and edge displays extra information when clicked. Secondary interactions are also highlighted when a node is clicked.
network_display(results = NULL, nodesize = "P_Value", nodecolor = "Activity_Score", edgewidth = "NO", layout = "nice")
results |
This is the result from calling either non_partial_cor() or partial_cor(). |
nodesize |
This parameter determines what the size of each node represents. The options are 'Node_Degree', 'Activity_Score', 'P_Value', and 'Z_Score'. The title of the resulting network identifies which parameter was selected to represent the node size. The default is 'P_Value'. |
nodecolor |
This parameter determines the color of each node, based on a yellow-to-blue color gradient. The options are 'Node_Degree', 'Activity_Score', 'P_Value', and 'Z_Score'. A color bar is created based on which parameter is chosen. The default is 'Activity_Score'. |
edgewidth |
This is a 'YES' or 'NO' option that controls whether the width of each edge represents the weight value corresponding to the correlation change between two nodes. The default is 'NO'. |
layout |
The user can choose from a handful of network visualization templates: 'nice', 'sphere', 'grid', 'star', and 'circle'. The default is 'nice'. |
An interactive depiction of the network resulting from the INDEED functions non_partial_cor() or partial_cor().
result = non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, method = "spearman", permutation_thres = 0.05, permutation = 1000)
network_display(results = result, nodesize = 'P_Value', nodecolor = 'Activity_Score', edgewidth = 'NO', layout = 'nice')
A method that integrates differential expression (DE) analysis and differential network (DN) analysis to select biomarker candidates for cancer studies. non_partial_cor is a one-step function that performs the analysis using standard correlation; no pre-processing step is required.
non_partial_cor(data = NULL, class_label = NULL, id = NULL, method = "pearson", p_val = NULL, permutation = 1000, permutation_thres = 0.05)
data |
This is a matrix of expression from all biomolecules and all samples. |
class_label |
This is a binary array with 0 for group 1 and 1 for group 2. |
id |
This is an array of biomolecule IDs. |
method |
This is a character string indicating which correlation coefficient is to be computed. The options are "pearson" (the default) or "spearman". |
p_val |
This is optional. It is a data frame containing the p-value for each biomolecule. |
permutation |
This is a positive integer representing the desired number of permutations. The default is 1000. |
permutation_thres |
This is the threshold for permutation. The default is 0.05, corresponding to 95 percent confidence. |
A list containing a score table with "ID", "P_value", "Node_Degree", "Activity_Score" and a differential network table with "Node1", "Node2", the binary link value and the weight link value.
non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, method = "pearson", permutation = 1000, permutation_thres = 0.05)
A method that integrates differential expression (DE) analysis and differential network (DN) analysis to select biomarker candidates for cancer studies. partial_cor is the second step of the partial correlation calculation and takes the result of the select_rho_partial function as input.
partial_cor(data_list = NULL, rho_group1 = NULL, rho_group2 = NULL, permutation = 1000, p_val = NULL, permutation_thres = 0.05)
data_list |
This is a list of pre-processed data output by the select_rho_partial function. |
rho_group1 |
This is the rule for choosing rho for group 1: "min" for the minimum rho, "ste" for one standard error from the minimum, or a rho value of the user's choice. The default is the minimum. |
rho_group2 |
This is the rule for choosing rho for group 2: "min" for the minimum rho, "ste" for one standard error from the minimum, or a rho value of the user's choice. The default is the minimum. |
permutation |
This is a positive integer representing the desired number of permutations. The default is 1000. |
p_val |
This is optional. It is a data frame that contains the p-value for each biomolecule. |
permutation_thres |
This is the threshold for permutation. The default is 0.05, corresponding to 95 percent confidence. |
A list containing a score table with "ID", "P_value", "Node_Degree", "Activity_Score" and a differential network table with "Node1", "Node2", the binary link value and the weight link value.
# step 1: select_rho_partial
preprocess <- select_rho_partial(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, error_curve = "YES")
# step 2: partial_cor
partial_cor(data_list = preprocess, rho_group1 = "min", rho_group2 = "min", permutation = 1000, p_val = pvalue_M_GU, permutation_thres = 0.05)
A permutation test that randomly permutes the sample labels in distinct biological groups for each biomolecule. The difference in correlation for each pair of biomolecules is considered statistically significant if it falls into either 2.5 percent tail of the empirical distribution curve.
permutation_cor(m, p, n_group_1, n_group_2, data_group_1, data_group_2, type_of_cor)
m |
This is the number of permutations desired. |
p |
This is the number of biomarker candidates present. |
n_group_1 |
This is the number of subjects in group 1. |
n_group_2 |
This is the number of subjects in group 2. |
data_group_1 |
This is a |
data_group_2 |
This is a |
type_of_cor |
If this is NULL, the Pearson correlation coefficient is calculated by default. Otherwise, the character string "spearman" selects the Spearman correlation coefficient. |
A multi-dimensional matrix that contains the permutation result.
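The permutation scheme described above can be sketched in base R as follows. The helper name permute_cor_diff is hypothetical, rows are assumed to be samples, and the result is stored as a p x p x m array of correlation differences; the package's own layout and sign convention may differ.

```r
# Hypothetical sketch of a label-permutation test on correlation differences.
permute_cor_diff <- function(m, data_group_1, data_group_2, type_of_cor = NULL) {
  method <- if (is.null(type_of_cor)) "pearson" else type_of_cor
  pooled <- rbind(data_group_1, data_group_2)
  n1 <- nrow(data_group_1)
  p <- ncol(pooled)
  diff_p <- array(NA_real_, dim = c(p, p, m))
  for (i in seq_len(m)) {
    # Shuffle the sample labels, then split into groups of the original sizes.
    idx <- sample(nrow(pooled))
    g1 <- pooled[idx[seq_len(n1)], , drop = FALSE]
    g2 <- pooled[idx[-seq_len(n1)], , drop = FALSE]
    diff_p[, , i] <- cor(g2, method = method) - cor(g1, method = method)
  }
  diff_p
}

set.seed(123)
g1 <- matrix(rnorm(90), nrow = 30)  # 30 samples, 3 biomolecules
g2 <- matrix(rnorm(90), nrow = 30)
diff_p <- permute_cor_diff(m = 50, g1, g2)
dim(diff_p)  # p x p x m
```

Each slice diff_p[, , i] holds the differences obtained under one random relabeling, so the third dimension forms the empirical null distribution for every pair.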
A permutation test that randomly permutes the sample labels in distinct biological groups for each biomolecule. The difference in paired partial correlation is considered statistically significant if it falls into either 2.5 percent tail of the empirical distribution curve.
permutation_pc(m, p, n_group_1, n_group_2, data_group_1, data_group_2, rho_group_1_opt, rho_group_2_opt)
m |
This is the number of permutations desired. |
p |
This is the number of biomarker candidates present. |
n_group_1 |
This is the number of subjects in group 1. |
n_group_2 |
This is the number of subjects in group 2. |
data_group_1 |
This is a |
data_group_2 |
This is a |
rho_group_1_opt |
This is an optimal tuning parameter to obtain a sparse differential network for group 1. |
rho_group_2_opt |
This is an optimal tuning parameter to obtain a sparse differential network for group 2. |
A multi-dimensional matrix that contains the permutation result.
This function calculates the positive and negative thresholds based on the permutation result.
permutation_thres(thres_left, thres_right, p, diff_p)
thres_left |
This is the threshold representing 2.5 percent of the left tail of the empirical distribution curve. |
thres_right |
This is the threshold representing 2.5 percent of the right tail of the empirical distribution curve. |
p |
This is the number of biomarker candidates present. |
diff_p |
This is the permutation result from either permutation_cor or permutation_pc. |
A list of positive and negative thresholds.
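A sketch of how tail thresholds might be read off a permutation result with base R's quantile(). The helper name tail_thresholds is hypothetical, the argument p is dropped because the dimensions are taken from diff_p, and passing the tails as the probabilities 0.025 and 0.975 is an assumption about how the thresholds are expressed.

```r
# Hypothetical helper: per-pair thresholds from a p x p x m permutation array.
tail_thresholds <- function(thres_left, thres_right, diff_p) {
  list(
    negative = apply(diff_p, c(1, 2), quantile, probs = thres_left),
    positive = apply(diff_p, c(1, 2), quantile, probs = thres_right)
  )
}

set.seed(99)
# Toy permutation result: 3 biomolecules, 200 permutations.
diff_p <- array(rnorm(3 * 3 * 200), dim = c(3, 3, 200))
thres <- tail_thresholds(0.025, 0.975, diff_p)
dim(thres$negative)  # one threshold per pair of biomolecules
```

An observed difference for a pair would then be flagged when it lies below thres$negative or above thres$positive for that pair.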
This function calculates p-values using logistic regression when p-values are not provided.
pvalue_logit(x, class_label, Met_name)
x |
This is a data frame consisting of data from group 1 and group 2. |
class_label |
This is a binary array indicating 0 for group 1 and 1 for group 2. |
Met_name |
This is an array of IDs. |
p-values
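Per-biomolecule p-values from logistic regression can be sketched with base R's glm(). The helper name logit_p_values and the output column names are hypothetical, rows of x are assumed to be biomolecules and columns subjects, and taking the Wald test p-value of the slope is an assumption about which p-value is meant.

```r
# Hypothetical helper: one logistic regression per biomolecule (row of x).
logit_p_values <- function(x, class_label, ids) {
  x <- as.matrix(x)
  p <- vapply(seq_len(nrow(x)), function(i) {
    fit <- glm(class_label ~ x[i, ], family = binomial())
    # Wald p-value for the slope (second row of the coefficient table).
    coef(summary(fit))[2, "Pr(>|z|)"]
  }, numeric(1))
  data.frame(ID = ids, P_value = p)
}

set.seed(2024)
class_label <- rep(c(0, 1), each = 20)
x <- rbind(rnorm(40, mean = class_label * 0.8),  # shifted between groups
           rnorm(40))                            # no group difference
p_table <- logit_p_values(x, class_label, ids = c("M1", "M2"))
```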
A dataset containing the p-value for each metabolite obtained through DE analysis.
pvalue_M_GU
A data frame with 39 rows and 2 variables:
KEGG ID
p-value
This function rescales data values to the range 0 to 1, which makes the values easier to distinguish when they are later passed to the network_display function.
scale_range(x)
x |
This is a list of numbers taken from one of the columns output by the non_partial_cor or partial_cor functions. |
A scaled version of the data that fits between 0 and 1.
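A common way to map values onto the range 0 to 1 is min-max scaling, sketched below. Whether the package uses exactly this formula is an assumption, and the helper name min_max_scale is hypothetical.

```r
# Hypothetical helper: min-max scaling to the interval [0, 1].
min_max_scale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

scaled <- min_max_scale(c(2, 4, 6, 10))
scaled  # 0.00 0.25 0.50 1.00
```

The smallest input maps to 0 and the largest to 1. The sketch assumes the inputs are not all equal, since that would divide by zero.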
A method that integrates differential expression (DE) analysis and differential network (DN) analysis to select biomarker candidates for cancer studies. select_rho_partial is the pre-processing step for INDEED partial differential analysis.
select_rho_partial(data = NULL, class_label = NULL, id = NULL, error_curve = "YES")
data |
This is a matrix of expression from all biomolecules and all samples. |
class_label |
This is a binary array with 0 for group 1 and 1 for group 2. |
id |
This is an array of biomolecule IDs. |
error_curve |
This option controls whether an error curve plot is provided to the user. The user can choose "YES" or "NO". The default is "YES". |
A list of processed data for the next step. An error curve is also generated to help select rho for graphical lasso.
select_rho_partial(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, error_curve = "YES")