Title: | The Discordant Method: A Novel Approach for Differential Correlation |
---|---|
Description: | Discordant is an R package that identifies pairs of features that correlate differently between phenotypic groups, with application to -omics data sets. Discordant uses a mixture model that “bins” molecular feature pairs based on their type of coexpression or coabbundance. Algorithm is explained further in "Differential Correlation for Sequencing Data"" (Siska et al. 2016). |
Authors: | Charlotte Siska [aut], McGrath Max [aut, cre], Katerina Kechris [aut, cph, ths] |
Maintainer: | McGrath Max <[email protected]> |
License: | GPL-3 |
Version: | 1.31.0 |
Built: | 2024-11-07 05:59:01 UTC |
Source: | https://github.com/bioc/discordant |
Calculates correlation coefficients based on two groups of -omics bivariate data. Currently, only two groups of samples can be specified. Used to make input for discordantRun().
createVectors( x, y = NULL, groups, cor.method = c("spearman", "pearson", "bwmc", "sparcc") )
createVectors( x, y = NULL, groups, cor.method = c("spearman", "pearson", "bwmc", "sparcc") )
x |
ExpressionSet of -omics data |
y |
Optional second ExpressionSet of -omics data, induces dual -omics analysis |
groups |
n-length vector of 1s and 2s matching samples belonging to groups 1 and 2 |
cor.method |
Correlation method to measure association. Options are "spearman", "pearson", "bwmc" and "sparcc" |
Creates vectors of correlation coefficents based on feature pairs within x or between x and y. The names of the vectors are the feature pairs taken from x and y.
List of two named numeric vectors. Vectors give the correlation coefficients for groups 1 and 2 respectively, and vector names give the each feature for the resptive feature pair seperated by an underscore.
Charlotte Siska [email protected]
Max McGrath [email protected]
Siska C, Bowler R and Kechris K. The Discordant Method: A Novel Approach for Differential Correlation. (2015) Bioinformatics. 32(5): 690-696.
Friedman J and Alm EJ. Inferring Correlation Networks from Genomic Survey Data. (2012) PLoS Computational Biology. 8:9, e1002687.
## load data data("TCGA_GBM_miRNA_microarray") data("TCGA_GBM_transcript_microarray") print(colnames(TCGA_GBM_transcript_microarray)) # look at groups groups <- c(rep(1,10), rep(2,20)) # transcript-transcript pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, groups = groups, cor.method = c("pearson")) # miRNA-transcript pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, TCGA_GBM_miRNA_microarray, groups = groups)
## load data data("TCGA_GBM_miRNA_microarray") data("TCGA_GBM_transcript_microarray") print(colnames(TCGA_GBM_transcript_microarray)) # look at groups groups <- c(rep(1,10), rep(2,20)) # transcript-transcript pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, groups = groups, cor.method = c("pearson")) # miRNA-transcript pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, TCGA_GBM_miRNA_microarray, groups = groups)
Discordant is a method to determine differential correlation of molecular feature pairs from -omics data using mixture models. Algorithm is explained further in Siska et al.
Charlotte Siska
Max McGrath
Katerina Kechris
Runs discordant algorithm on two vectors of correlation coefficients.
discordantRun( v1, v2, x, y = NULL, transform = TRUE, subsampling = FALSE, subSize = NULL, iter = 100, components = 3 )
discordantRun( v1, v2, x, y = NULL, transform = TRUE, subsampling = FALSE, subSize = NULL, iter = 100, components = 3 )
v1 |
Vector of correlation coefficients in group 1 |
v2 |
Vector of correlation coefficients in group 2 |
x |
ExpressionSet of -omics data |
y |
ExpressionSet of -omics data, induces dual -omics analysis |
transform |
If TRUE v1 and v2 will be Fisher transformed |
subsampling |
If TRUE subsampling will be run |
subSize |
Indicates how many feature pairs to be used for subsampling. Default is the feature size in x |
iter |
Number of iterations for subsampling. Default is 100 |
components |
Number of components in mixture model. |
The discordant algorithm is based on a Gaussian mixture model. If there are three components, correlation coefficients are clustered into negative correlations (-), positive correlations (+) and no correlation (0). If there are five components, then there are two more classes for very negative correlation (–) and very positive correlations (++). All possible combinations for these components are made into classes. If there are three components, there are 9 classes. If there are five components, there are 25 classes.
The posterior probabilities for each class are generated and outputted into the value probMatrix. The value probMatrix is a matrix where each column is a class and each row is a feature pair. The values discordPPVector and discordPPMatrix are the summed differential correlation posterior probability for each feature pair. The values classVector and classMatrix are the class with the highest posterior probability for each feature pair.
Vector of differentially correlated posterior probabilities.
Matrix of differentially correlated posterior probabilities where rows and columns reflect features
Vector of classes that have the highest posterior probability
Matrix of classes that have hte highest posterior probability where rows and columns reflect features
Matrix of posterior probabilities where rows are each molecular feature pair and columns are nine different classes
Final log likelihood
Charlotte Siska [email protected]
Max McGrath [email protected]
Siska C, Bowler R and Kechris K. The Discordant Method: A Novel Approach for Differential Correlation (2015), Bioinformatics. 32 (5): 690-696.
Lai Y, Zhang F, Nayak TK, Modarres R, Lee NH and McCaffrey TA. Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets. (2014) BMC Genomics 15, S6.
Lai Y, Adam B-l, Podolsky R, She J-X. A mixture model approach to the tests of concordance and discordancd between two large-scale experiments with two sample groups. (2007) Bioinformatics 23, 1243-1250.
# Load Data data(TCGA_GBM_miRNA_microarray) data(TCGA_GBM_transcript_microarray) print(colnames(TCGA_GBM_transcript_microarray)) # look at groups groups <- c(rep(1,10), rep(2,20)) ## DC analysis on only transcripts pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, groups = groups) result <- discordantRun(vectors$v1, vectors$v2, TCGA_GBM_transcript_microarray) ## DC analysis on miRNA-transcript pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, TCGA_GBM_miRNA_microarray, groups = groups, cor.method = c("pearson")) result <- discordantRun(vectors$v1, vectors$v2, TCGA_GBM_transcript_microarray, TCGA_GBM_miRNA_microarray)
# Load Data data(TCGA_GBM_miRNA_microarray) data(TCGA_GBM_transcript_microarray) print(colnames(TCGA_GBM_transcript_microarray)) # look at groups groups <- c(rep(1,10), rep(2,20)) ## DC analysis on only transcripts pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, groups = groups) result <- discordantRun(vectors$v1, vectors$v2, TCGA_GBM_transcript_microarray) ## DC analysis on miRNA-transcript pairs vectors <- createVectors(TCGA_GBM_transcript_microarray, TCGA_GBM_miRNA_microarray, groups = groups, cor.method = c("pearson")) result <- discordantRun(vectors$v1, vectors$v2, TCGA_GBM_transcript_microarray, TCGA_GBM_miRNA_microarray)
Transforms Pearsons correlation coefficients into z scores using Fishers method.
fishersTrans(rho)
fishersTrans(rho)
rho |
Integer or numeric vector of Pearson's correlation coefficients |
Fisher's transformation is when correlation coefficients are transformed into a z score. These z scores have an approximately normal distribution.
Returns Fisher-transformed correlation coefficients
Fisher, R.A. (1915). "Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population". Biometrika (Biometrika Trust) 10 (4).
## Create integer or list of Pearson's correlation coefficients. library(MASS) rhoV <- as.vector(cor(t(mvrnorm(10,rep(3,100),diag(100))))) ## Determine Fisher-Transformed z scores of rho zV <- fishersTrans(rhoV)
## Create integer or list of Pearson's correlation coefficients. library(MASS) rhoV <- as.vector(cor(t(mvrnorm(10,rep(3,100),diag(100))))) ## Determine Fisher-Transformed z scores of rho zV <- fishersTrans(rhoV)
Identify features with outliers using left and right median absolute deviation (MAD).
splitMADOutlier(mat, filter0 = TRUE, threshold = 2)
splitMADOutlier(mat, filter0 = TRUE, threshold = 2)
mat |
m by n matrix of -omics data, where rows are features and columns samples. |
filter0 |
Option to filter out features if they have at least one 0 value. Default is TRUE. |
threshold |
Threshold of how many MADs outside the left or right median is used to determine features with outliers. |
The purpose of this function is to determine outliers in non-symmetric distributions. The distribution is split by the median. Outliers are identifed by being however many median absolute deviations (MAD) from either split distribution.
Input matrix where features with outliers filtered out.
Index of features that have no outliers.
Leys C, Klein O, Bernard P and Licata L. "Detecting Outliers: Do Not Use Standard Deviation Around the Mean, Use Absolute Deivation Around the Median." Journal of Experimental Social Psychology, 2013. 49(4), 764-766.
Magwene, PM, Willis JH, Kelly JK and Siepel A. "The Statistics of Bulk Segregant Analysis Using Next Generation Sequencing." PLoS Computational Biology, 2011. 7(11), e1002255.
## Simulate matrix of continuous -omics data. data(TCGA_Breast_miRNASeq) ## Filter matrix based on outliers. mat.filtered <- splitMADOutlier(TCGA_Breast_miRNASeq)$mat.filtered
## Simulate matrix of continuous -omics data. data(TCGA_Breast_miRNASeq) ## Filter matrix based on outliers. mat.filtered <- splitMADOutlier(TCGA_Breast_miRNASeq)$mat.filtered
This dataset contains TMM normalized voom-transformed miRNA count values from miRNASeq that was taken from the Cancer Genome Atlas, or TCGA. The dataset has 100 miRNA and 57 samples. The original dataset has 212 miRNA and 57 samples.
data(TCGA_Breast_miRNASeq)
data(TCGA_Breast_miRNASeq)
An ExpressionSet with 100 features, 57 samples
National Institutes of Health. The Cancer Genome Atlas.
data(TCGA_Breast_miRNASeq)
data(TCGA_Breast_miRNASeq)
This dataset contains TMM normalized voom-transformed miRNA count values from miRNASeq that was taken from the Cancer Genome Atlas, or TCGA. The dataset has 100 miRNA and 57 samples. The original dataset has 212 miRNA and 57 samples.
data(TCGA_Breast_miRNASeq_voom)
data(TCGA_Breast_miRNASeq_voom)
An ExpressionSet with 100 features and 57 samples
Charity W Law, Yunshun Chen, Wei Shi, Gordon K Smyth. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. 2014. Genome Biology, 15:R29.
National Institues of Health. The Cancer Genome Atlas.
data(TCGA_Breast_miRNASeq_voom)
data(TCGA_Breast_miRNASeq_voom)
This dataset contains TMM normalized RNA count values from RNASeq that was taken from the Cancer Genome Atlas, or TCGA. It has 100 features and 57 samples. The original dataset had 17972 features and 57 samples.
data(TCGA_Breast_RNASeq)
data(TCGA_Breast_RNASeq)
An ExpressionSet with 100 features and 57 samples
Charity W Law, Yunshun Chen, Wei Shi, Gordon K Smyth. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. 2014. Genome Biology, 15:R29.
National Institues of Health. The Cancer Genome Atlas.
data(TCGA_Breast_RNASeq)
data(TCGA_Breast_RNASeq)
This dataset contains TMM normalized voom-transformed RNA count values from RNASeq that was taken from the Cancer Genome Atlas, or TCGA.
data(TCGA_Breast_miRNASeq_voom)
data(TCGA_Breast_miRNASeq_voom)
An ExpressionSet with 100 features and 57 samples
Charity W Law, Yunshun Chen, Wei Shi, Gordon K Smyth. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. 2014. Genome Biology, 15:R29.
National Institues of Health. The Cancer Genome Atlas.
data(TCGA_Breast_miRNASeq_voom)
data(TCGA_Breast_miRNASeq_voom)
This dataset contains miRNA expression values from a microarray that was taken from the Cancer Genome Atlas, or TCGA. It has 10 features and 30 samples. The original dataset had 331 features and 30 samples.
data(TCGA_GBM_miRNA_microarray)
data(TCGA_GBM_miRNA_microarray)
An ExpressionSet with 10 features, 30 samples
National Institutes of Health. The Cancer Genome Atlas.
data(TCGA_GBM_miRNA_microarray)
data(TCGA_GBM_miRNA_microarray)
This dataset contains transcript expression values from a microarray that was taken from the Cancer Genome Atlas, or TCGA. It has 10 features and 30 samples. The original dataset had 72656 features and 30 samples.
data(TCGA_GBM_transcript_microarray)
data(TCGA_GBM_transcript_microarray)
An ExpressionSet with 10 features, 30 samples
National Institutes of Health. The Cancer Genome Atlas.
data(TCGA_GBM_transcript_microarray)
data(TCGA_GBM_transcript_microarray)