Title: | Molecular Degree of Perturbation calculates scores for transcriptome data samples based on their perturbation from controls |
---|---|
Description: | The Molecular Degree of Perturbation webtool quantifies the heterogeneity of samples. It takes a data.frame of omic data that contains at least two classes (control and test) and assigns a score to all samples based on how perturbed they are compared to the controls. It is based on the Molecular Distance to Health (Pankla et al. 2009), and expands on this algorithm by adding the options to calculate the z-score using the modified z-score (using median absolute deviation), change the z-score zeroing threshold, and look at genes that are most perturbed in the test versus control classes. |
Authors: | Melissa Lever [aut], Pedro Russo [aut], Helder Nakaya [aut, cre] |
Maintainer: | Helder Nakaya <[email protected]> |
License: | GPL-3 |
Version: | 1.27.0 |
Built: | 2024-11-04 06:05:59 UTC |
Source: | https://github.com/bioc/mdp |
Compute gene score
Computes gene scores for each gene within each class, and perturbation frequencies.
compute_gene_score(zscore, pdata, control_lab, score_type = c("gene_score", "gene_freq"))
zscore |
zscore data frame |
pdata |
phenotypic data with Class and Sample columns |
control_lab |
character specifying control class |
score_type |
set to 'gene_score' or 'gene_freq' to compute gene scores or frequencies |
data frame of gene scores or gene frequencies
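To make the two score types concrete, here is a minimal base R sketch. It follows the package's description elsewhere in this manual: a gene score is the mean z-score of a gene within a class, and a gene frequency is how often a gene has a non-zero z-score in that class. The helper `per_class_summary` and the toy values are hypothetical, frequency is assumed to be a proportion, and the output layout is not necessarily that of `compute_gene_score`.

```r
# Toy thresholded z-scores: genes in rows, samples in columns (made-up values).
zscore <- data.frame(
  s1 = c(0, 0), s2 = c(0, 2.5), s3 = c(3, 0), s4 = c(5, 4),
  row.names = c("geneA", "geneB")
)
pdata <- data.frame(
  Sample = c("s1", "s2", "s3", "s4"),
  Class  = c("baseline", "baseline", "infected", "infected")
)

# Hypothetical helper (not part of mdp): apply `fun` to each gene's
# z-scores within each class and return a genes x classes data frame.
per_class_summary <- function(zscore, pdata, fun) {
  classes <- unique(as.character(pdata$Class))
  out <- sapply(classes, function(cl) {
    samples <- as.character(pdata$Sample[pdata$Class == cl])
    apply(zscore[, samples, drop = FALSE], 1, fun)
  })
  as.data.frame(out)
}

# Gene score: mean z-score per gene per class.
gene_score <- per_class_summary(zscore, pdata, mean)

# Gene frequency: proportion of samples in the class with a non-zero z-score.
gene_freq <- per_class_summary(zscore, pdata, function(x) mean(x != 0))
```

With these toy values, `geneA` has a gene score of 0 in `baseline` and 4 in `infected`, while `geneB` is non-zero in half of the samples of each class.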
Compute perturbed genes
Finds the top fraction of genes that are more perturbed in test versus control samples.
compute_perturbed_genes(gmdp_results, control_lab, fraction_genes)
gmdp_results |
results table of gene scores |
control_lab |
label specifying control class |
fraction_genes |
fraction of top perturbed genes that will make the set of perturbed genes |
vector of perturbed genes
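The selection step can be illustrated as follows. This is only one plausible reading of "more perturbed in test versus controls": genes are ranked by the difference between their highest test-class gene score and their control gene score, and the top `fraction_genes` are kept. The helper `top_perturbed`, the wide table layout and the values are assumptions made for illustration, and the package's own ranking rule may differ.

```r
# Toy gene scores: one column per class (made-up values, hypothetical layout).
gene_scores <- data.frame(
  baseline = c(0.2, 1.0, 0.1, 0.5),
  infected = c(3.0, 1.1, 2.0, 0.4),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Hypothetical helper (not part of mdp): keep the top fraction of genes
# ranked by (largest test-class score - control score).
top_perturbed <- function(gene_scores, control_lab, fraction_genes) {
  test_cols <- setdiff(colnames(gene_scores), control_lab)
  test_max <- apply(gene_scores[, test_cols, drop = FALSE], 1, max)
  delta <- test_max - gene_scores[[control_lab]]
  n_keep <- ceiling(fraction_genes * nrow(gene_scores))
  names(sort(delta, decreasing = TRUE))[seq_len(n_keep)]
}

perturbed <- top_perturbed(gene_scores, "baseline", fraction_genes = 0.5)
```

Here `perturbed` contains `geneA` and `geneC`, the two genes whose scores rise most in the test class.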
Compute sample scores for each pathway
compute_sample_scores(zscore, perturbed_genes, control_samples, test_samples, pathways, pdata)
zscore |
zscore data frame |
perturbed_genes |
list of perturbed genes |
control_samples |
vector of control sample names |
test_samples |
vector of test sample names |
pathways |
list of pathways |
pdata |
phenotypic data with Sample and Class columns |
data frame of sample scores
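As a rough illustration of the scoring rule described in this manual (the sum of a sample's z-scored gene values divided by the number of genes in the geneset), the sketch below scores each sample for several genesets. The helper `score_samples`, the geneset names and the values are hypothetical. Dropping geneset genes that are absent from the data is an assumption, not documented behaviour.

```r
# Toy thresholded z-scores: genes in rows, samples in columns (made-up values).
zscore <- data.frame(
  s1 = c(0, 0, 0), s2 = c(4, 0, 2), s3 = c(6, 3, 0),
  row.names = c("geneA", "geneB", "geneC")
)

# Hypothetical genesets: all genes, a perturbed subset, and one pathway.
genesets <- list(
  allgenes  = rownames(zscore),
  perturbed = c("geneA", "geneB"),
  pathway_x = c("geneC", "geneZ")  # geneZ is absent from the data
)

# Hypothetical helper (not part of mdp): per-sample sum of z-scores over
# the geneset, divided by the number of geneset genes found in the data.
score_samples <- function(zscore, genes) {
  genes <- intersect(genes, rownames(zscore))  # assumption: ignore missing genes
  colSums(zscore[genes, , drop = FALSE]) / length(genes)
}

sample_scores <- lapply(genesets, score_samples, zscore = zscore)
```

For example, `sample_scores$perturbed` gives 0, 2 and 4.5 for samples `s1`, `s2` and `s3`.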
Computes the thresholded Z score
Computes the Z score, using the control samples to compute the average and standard deviation.
compute_zscore(data, control_samples, measure = c("mean", "median"), std = 2)
data |
Gene expression data with gene symbols in rows, sample names in columns |
control_samples |
Character vector specifying the control sample names |
measure |
Either 'mean' or 'median'. 'mean' uses mean and standard deviation. 'median' uses the median and the median absolute deviation to estimate the standard deviation (modified z-score). |
std |
Set as default to 2. This controls the standard deviation threshold for the Z-score calculation. Normalised expression values less than 'std' will be set to 0. |
zscore data frame
control_samples <- example_pheno$Sample[example_pheno$Class == 'baseline']
compute_zscore(example_data, control_samples, 'median', 2)
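The thresholding idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `thresholded_zscore` and the toy data are hypothetical. It follows the description in this manual, where the centre and spread come from the control samples only, the absolute z-score is taken, and values below `std` are set to zero. For the 'median' option it uses `stats::mad`, which by default scales the median absolute deviation by 1.4826 so that it estimates the standard deviation for normally distributed data.

```r
# Hypothetical helper (not part of mdp): thresholded absolute z-score.
thresholded_zscore <- function(data, control_samples,
                               measure = c("mean", "median"), std = 2) {
  measure <- match.arg(measure)
  mat <- as.matrix(data)
  ctrl <- mat[, control_samples, drop = FALSE]

  if (measure == "mean") {
    centre <- rowMeans(ctrl)
    spread <- apply(ctrl, 1, sd)
  } else {
    centre <- apply(ctrl, 1, median)
    spread <- apply(ctrl, 1, mad)  # scaled MAD as an estimate of the sd
  }

  # Per-gene vectors recycle down the columns, so row i uses centre[i].
  z <- abs((mat - centre) / spread)
  z[z < std] <- 0  # zero out values below the threshold
  as.data.frame(z)
}

# Toy expression data: genes in rows, samples in columns (made-up values).
toy <- data.frame(
  c1 = c(1, 5), c2 = c(2, 5.5), c3 = c(3, 4.5),
  t1 = c(10, 5), t2 = c(2.5, 9),
  row.names = c("geneA", "geneB")
)

z <- thresholded_zscore(toy, c("c1", "c2", "c3"), measure = "mean", std = 2)
```

In this toy run the control columns are all zero, `geneA` keeps a score only in `t1`, and `geneB` only in `t2`. Genes with zero spread in the controls would yield infinite or undefined values, which this sketch does not handle.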
HGNC gene names
sample expression data
...
example_data
A data frame with 13838 rows and 40 variables:
Expression data, as provided by the authors, for GEO dataset GSE17156: transcriptome blood samples from patients who were inoculated with RSV. The data have been altered by collapsing to HGNC gene symbols.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17156
Subset of the annotation data for GEO dataset GSE17156, using only patients that have been inoculated with the RSV virus
example_pheno
A data frame with 40 rows and 2 variables:
GSM identifier
Symptomatic state
...
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17156
Based on the Molecular Distance to Health, this function calculates a score for each sample based on its perturbation from the healthy (control) samples.
mdp(data, pdata, control_lab, directory = "", pathways, print = TRUE, measure = c("mean", "median"), std = 2, fraction_genes = 0.25, save_tables = TRUE, file_name = "")
data |
|
pdata |
|
control_lab |
character |
directory |
(optional) character string of output directory |
pathways |
(optional) |
print |
set as default to TRUE for pdfs of the sample scores to be saved |
measure |
'mean' as default, can change to 'median'.
|
std |
|
fraction_genes |
|
save_tables |
Set as default to TRUE. Tables of zscore and gene and sample scores will be saved. |
file_name |
(optional) character string that will be added to the saved file names |
A list: zscore, gene_scores, gene_freq, sample_scores, perturbed_genes
Z-score - z-score is calculated using the control samples to compute the average and the standard deviation. The absolute value of this matrix is taken and values less than the std are set to zero. This z-score data frame is used to compute the gene and sample scores.
Gene scores - mean z-score value for each gene in each class
Gene frequency - frequency with which a gene has a non zero z-score value in each class
Sample scores - list containing sample scores for different genesets. Sample scores are the sum of the z-scored gene values for each sample, averaged for the number of genes in that geneset.
Perturbed genes - vector of the top fraction of genes that have higher gene scores in the test classes compared to the control.
Pathways - if genesets are provided, they are ranked according to the signal-to-noise ratio of test sample scores versus control sample scores calculated using that geneset.
A list of pathways can be loaded from a .gmt file using the gmtPathways function from the fgsea package, e.g. fgsea::gmtPathways('gmt.file.location').
If median is selected, the z-score will be calculated using the median, and the standard deviation will be estimated using the median absolute deviation, utilising the mad function.
# basic run
mdp(example_data, example_pheno, 'baseline')

# run with pathways
pathway_file <- system.file('extdata', 'ReactomePathways.gmt', package = 'mdp')
mypathway <- fgsea::gmtPathways(pathway_file) # load a gmt file
mdp(data = example_data, pdata = example_pheno, control_lab = 'baseline', pathways = mypathway)
Print pathways
Generates a summary plot for the pathways and a sample score plot for the best gene set.
pathway_summary(sample_results, path, file_name, control_samples, control_lab)
sample_results |
list of sample scores for each geneset |
path |
directory to save images |
file_name |
name of saved image |
control_samples |
list of control sample names |
control_lab |
label that specifies control class |
data frame of signal-to-noise ratio of control vs test sample scores for each pathway
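The manual does not spell out the signal-to-noise formula. The sketch below assumes the definition commonly used in gene set analysis: the difference between the test and control means divided by the sum of their standard deviations. The helper `signal_to_noise` and the scores are hypothetical, and the package may compute the ratio differently. The `Sample`, `Score` and `Class` columns match the layout expected by `sample_plot`.

```r
# Toy sample scores for one geneset (made-up values).
sample_scores <- data.frame(
  Sample = paste0("s", 1:6),
  Score  = c(1, 2, 3, 6, 8, 10),
  Class  = rep(c("baseline", "infected"), each = 3)
)

# Hypothetical helper (not part of mdp), assuming
# SNR = (mean(test) - mean(control)) / (sd(test) + sd(control)).
signal_to_noise <- function(scores, control_lab) {
  is_control <- scores$Class == control_lab
  control <- scores$Score[is_control]
  test <- scores$Score[!is_control]
  (mean(test) - mean(control)) / (sd(test) + sd(control))
}

snr <- signal_to_noise(sample_scores, "baseline")
```

With these values the control scores have mean 2 and standard deviation 1, the test scores have mean 8 and standard deviation 2, so `snr` is 2. Computing this ratio for every geneset and sorting in decreasing order would give a ranking of the kind described for the pathways output.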
Resultant sample scores when the mdp is applied to example_data and example_pheno
sample_data
A data frame with 40 rows and 3 variables:
GSM identifier
Sample score
Symptomatic state
...
Plot sample scores
Plots the sample scores data.frame for a given geneset. The data frame must have Score, Sample and Class columns.
sample_plot(sample_data, filename = "", directory = "", title = "", print = TRUE, display = TRUE, control_lab)
sample_data |
|
filename |
(optional) character string that will be added to the saved pdf filename |
directory |
(optional) character string of directory to save file |
title |
(optional) character string of title name for graph |
print |
(default TRUE) Save as a pdf file |
display |
(default TRUE) Display plot |
control_lab |
(optional) character string. Specifying control_lab will set the control class as light blue as a default |
generates a plot of the sample scores
sample_plot(sample_data = sample_data, control_lab = 'baseline')