Title: | PROMISE analysis with Canonical Correlation for Two Forms of High Dimensional Genetic Data |
---|---|
Description: | Performs canonical correlation between two forms of high-dimensional genetic data and associates the first component of each form of data with a specific, biologically interesting pattern of associations with multiple endpoints. A probe-level analysis is also implemented. |
Authors: | Xueyuan Cao <[email protected]> and Stanley Pounds <[email protected]> |
Maintainer: | Xueyuan Cao <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.33.0 |
Built: | 2024-10-31 05:51:58 UTC |
Source: | https://github.com/bioc/CCPROMISE |
A tool to identify genes that are correlated between two sets of genomic variables and are associated with a predefined pattern of associations with multiple endpoint variables.
Package: | CCPROMISE |
Type: | Package |
Version: | 0.99.3 |
Date: | 2016-10-11 |
License: | GPL (>=2) |
LazyLoad: | yes |
CCPROMISE (canonical correlation with PROMISE analysis) is performed by calling the function CCPROMISE. The two forms of genomic data, such as gene expression and methylation, are passed as minimal ExpressionSet objects. The gene annotation (defining the relationship between a gene and the two forms of genomic data), the phenotypic data, and the definition of R routines for calculating association statistics with each individual endpoint variable are the same as in the PROMISE package. Please refer to the PROMISE package for guidance on writing user-defined routines.
Xueyuan Cao [email protected], Stanley Pounds [email protected]
Maintainer: Xueyuan Cao [email protected]
Cao X, Crews KR, Downing J, Lamba J and Pounds SB (2016) CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints. BMC Bioinformatics 17(Suppl 13):382
Hotelling H (1936) Relations between two sets of variates. Biometrika 28: 321-377
Pounds S, Cheng C, Cao X, Crews KR, Plunkett W, Gandhi V, Rubnitz J, Ribeiro RC, Downing JR, and Lamba J (2009) PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables. Bioinformatics 25: 2013-2019
Wilks SS (1935) On the independence of k sets of normally distributed statistical variables. Econometrica 3: 309-326
library(CCPROMISE)

## load data
data(exmplESet)
data(exmplMSet)
data(exmplGeneSet)
data(exmplPat)

## Perform CCPROMISE test
test <- CCPROMISE(geneSet=exmplGeneSet, ESet=exmplESet, MSet=exmplMSet,
                  promise.pattern=exmplPat, strat.var=NULL, prlbl=NULL,
                  EMlbl=c("Expr", "Methyl"), nbperm=TRUE, max.ntail=10,
                  nperms=100, seed=13)
Compute canonical correlation between two sets of genomic data.
CANN (geneSet, Edat, Mdat, EMlbl = c("Expr", "Methyl"), phdat)
geneSet |
a gene set collection to annotate probes to gene |
Edat |
data frame of the first form of genomic data, such as gene expression data, with rows being probes and columns being subjects. The column names should match the row names of phdat |
Mdat |
data frame of the second form of genomic data, such as methylation data, with rows being probes and columns being subjects. The column names should match the row names of phdat |
EMlbl |
labels of the genomic data, default=c("Expr", "Methyl") for Edat and Mdat |
phdat |
phenotype data with row being subjects and column being phenotype variables. The row names should match the column names of Edat and Mdat |
The function performs canonical correlation between the two forms of genomic data (Edat and Mdat) for each gene defined by geneSet. If a gene has only one form of genomic data, the first principal component is used. If one form of data has more probe sets than there are subjects, only the first n probe sets are used, where n is the number of subjects. The function returns a list of three components; see Value for details.
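As a rough illustration of the per-gene computation described above, the following sketch runs base R's `cancor()` on simulated data for a single hypothetical gene. The matrices, probe counts, and simulated values are assumptions made for illustration. This is not CANN's internal code, and it leaves out the Wilks' Lambda tests and the special cases for genes with a single data form.

```r
# Simulated data for one hypothetical gene: 3 expression probes and
# 4 methylation probes on 50 subjects (probes in rows, as in Edat/Mdat).
set.seed(13)
n.subj <- 50
shared <- rnorm(n.subj)                       # signal shared by both data forms
E <- t(sapply(1:3, function(i) shared + rnorm(n.subj)))
M <- t(sapply(1:4, function(i) -shared + rnorm(n.subj)))
colnames(E) <- colnames(M) <- paste0("S", seq_len(n.subj))

# cancor() expects subjects in rows, so transpose the probe-by-subject matrices.
cc <- cancor(t(E), t(M))

# First canonical correlation and the first canonical variate of each form.
first.cor <- cc$cor[1]
E.score <- scale(t(E), center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1]
M.score <- scale(t(M), center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1]

# Check: the two first variates correlate at the first canonical correlation.
all.equal(abs(cor(E.score, M.score)[1, 1]), first.cor)
```

The per-subject scores `E.score` and `M.score` play the role that the first canonical components play in the description above: one summary value per subject for each data form.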
The output of the function is a list of length 3 with the following components:
CCres |
canonical correlation result: a data frame with a row for each gene and six columns (Gene: gene names; n.EMlbl[1]: number of probes of the first form of genomic data; n.EMlbl[2]: number of probes of the second form of genomic data; CanonicalCR: canonical correlation of the first components; WilksPermPval: permutation p-value of Wilks' Lambda; WilksAsymPval: p-value of the F-approximation of Wilks' Lambda). |
FSTccscore |
the first component of canonical correlation: a data frame with a row for each gene, the first half of the columns holding the first component of the first form of genomic data and the second half of the columns holding the first component of the second form of genomic data. |
CCload |
a data frame of loadings (each row is for a gene; the first column is the gene names, the second column is the probe set ids of the first form of genomic data separated by '|', the third column is the loading for each probe set in the first form of genomic data separated by '|', the fourth column is the probe set ids of the second form of genomic data separated by '|', and the fifth column is the loading for each probe set in the second form of genomic data separated by '|') |
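Because CCload packs probe set ids and loadings into '|'-separated strings, they usually need to be unpacked before further use. The sketch below shows one way to do this in base R. The data frame is a hypothetical stand-in: its column names and values are invented for illustration and may differ from the real output.

```r
# Hypothetical stand-in for one row of CCload; column names and values
# are invented for illustration and may differ from the real output.
ccload <- data.frame(
  Gene       = "GENE1",
  Expr.probe = "e1|e2|e3",
  Expr.load  = "0.71|0.52|-0.13",
  stringsAsFactors = FALSE
)

# '|' is a regex metacharacter, so split on it literally with fixed = TRUE.
probes <- strsplit(ccload$Expr.probe, "|", fixed = TRUE)[[1]]
loads  <- as.numeric(strsplit(ccload$Expr.load, "|", fixed = TRUE)[[1]])

# Named vector of loadings, one entry per probe set.
expr.load <- setNames(loads, probes)
expr.load
```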
Xueyuan Cao [email protected], Stanley Pounds [email protected]
Hotelling H (1936) Relations between two sets of variates. Biometrika 28: 321-377
library(CCPROMISE)
library(Biobase)  # provides exprs() and pData()

## load example data
data(exmplESet)
data(exmplMSet)
data(exmplGeneSet)

## Perform canonical correlation test
test1 <- CANN(geneSet=exmplGeneSet, Edat=exprs(exmplESet),
              Mdat=exprs(exmplMSet), EMlbl=c("Expr", "Methyl"),
              phdat=pData(exmplESet))
PROMISE analysis of two genomic sets with multiple phenotypes under a predefined association pattern at gene level.
CCPROMISE (geneSet, ESet, MSet, promise.pattern, strat.var = NULL, prlbl = NULL, EMlbl = c("Expr", "Mthyl"), nbperm = FALSE, max.ntail = 100, nperms = 10000, seed = 13)
geneSet |
a gene set collection to annotate probes to gene |
ESet |
an ExpressionSet object containing, at minimum, exprs (the expression matrix) of the first form of genomic data, such as gene expression, and phenoData (an AnnotatedDataFrame of endpoint data). Please refer to Biobase for details on how to create such an ExpressionSet. |
MSet |
an ExpressionSet object for the second form of genomic data, such as methylation levels. The subject ids of MSet and ESet should be exactly the same |
promise.pattern |
PROMISE pattern definition: a data frame with columns stat.coef, stat.func, and endpt.vars (see exmplPat) |
strat.var |
stratum variable |
prlbl |
labels |
EMlbl |
labels of the genomic data, default=c('Expr', 'Methyl') for ESet and MSet |
nbperm |
indicator of fast permutation using the negative binomial strategy, taking two valid values: FALSE or TRUE. The default is FALSE. |
max.ntail |
number of successes if nbperm = TRUE. No further permutations are performed for gene(s) or gene set(s) for which max.ntail permuted statistics are greater than or equal to the observed statistic. The default is 100. |
nperms |
number of permutations; default = 10,000 |
seed |
initial seed of random number generator. The default is 13. |
The function performs PROMISE analysis for two forms of genomic data in minimal ExpressionSet format with a predefined phenotypic pattern. It calls two other functions, CANN and PROMISE2.
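The nbperm and max.ntail arguments describe a permutation scheme that stops early once enough permuted statistics reach the observed one. The sketch below illustrates that idea for a single statistic in base R. The helper `nb.perm.pvalue`, the use of an absolute Pearson correlation as the statistic, and the p-value estimator (tail count divided by permutations run) are all assumptions made for illustration; the package's own statistic and estimator may differ.

```r
# Hypothetical helper: permutation p-value with early stopping.
# Permutation stops once max.ntail permuted statistics are at least as
# large as the observed one, or after nperms permutations.
nb.perm.pvalue <- function(x, y, max.ntail = 100, nperms = 10000, seed = 13) {
  set.seed(seed)
  obs <- abs(cor(x, y))
  ntail <- 0
  nrun <- 0
  while (nrun < nperms && ntail < max.ntail) {
    nrun <- nrun + 1
    perm.stat <- abs(cor(x, sample(y)))   # permute one variable
    if (perm.stat >= obs) ntail <- ntail + 1
  }
  # Assumed estimator: proportion of permutations in the tail.
  list(p = ntail / nrun, ntail = ntail, nperms.run = nrun)
}

set.seed(1)
x <- rnorm(40)
null.y  <- rnorm(40)                 # unrelated to x: typically stops early
assoc.y <- x + rnorm(40, sd = 0.3)   # strongly related: uses all permutations

res.null  <- nb.perm.pvalue(x, null.y,  max.ntail = 10, nperms = 1000)
res.assoc <- nb.perm.pvalue(x, assoc.y, max.ntail = 10, nperms = 1000)
c(null = res.null$nperms.run, assoc = res.assoc$nperms.run)
```

The design point is that features with clearly non-significant statistics consume few permutations, so the permutation budget is spent mostly on the features where a precise small p-value matters.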
The output is a list of length 4 with the following components:
PRres |
PROMISE result for the first component of canonical correlation between the two forms of genomic data: individual genes' test statistics and p-values for each individual endpoint and for the PROMISE analysis |
CCres |
result of canonical correlation analysis with six columns: Gene: gene names; n.EMlbl[1]: number of probe sets in the first form of data; n.EMlbl[2]: number of probe sets in the second form of data; CanonicalCR: canonical correlation of the first components; WilksPermPval: permutation p-value of Wilks' Lambda; WilksAsymPval: p-value of the F-approximation of Wilks' Lambda. |
FSTccscore |
the first component of canonical correlation: a data frame with a row for each gene, the first half of the columns holding the first component of the first form of genomic data and the second half of the columns holding the first component of the second form of genomic data. |
CCload |
a data frame of loadings (each row is for a gene; the first column is the gene names, the second column is the probe set ids of the first form of genomic data separated by '|', the third column is the loading for each probe set in the first form of genomic data separated by '|', the fourth column is the probe set ids of the second form of genomic data separated by '|', and the fifth column is the loading for each probe set in the second form of genomic data separated by '|') |
Xueyuan Cao [email protected], Stanley Pounds [email protected]
Cao X, Crews KR, Downing J, Lamba J and Pounds SB (2016) CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints. BMC Bioinformatics 17(Suppl 13):382
library(CCPROMISE)

## load data
data(exmplESet)
data(exmplMSet)
data(exmplGeneSet)
data(exmplPat)

## Perform CCPROMISE test
test <- CCPROMISE(geneSet=exmplGeneSet, ESet=exmplESet, MSet=exmplMSet,
                  promise.pattern=exmplPat, strat.var=NULL, prlbl=NULL,
                  EMlbl=c("Expr", "Methyl"), nbperm=FALSE, max.ntail=10,
                  nperms=100, seed=13)
An ExpressionSet object containing, at minimum, exprs (the expression matrix) of gene expression and phenoData (an AnnotatedDataFrame of endpoint data).
data(exmplESet)
An example ExpressionSet containing conceptual data for 105 expression features measured by the U133A array in 151 subjects. The phenotype data has 8 columns for the same 151 subjects.
A conceptual example of a gene set collection that annotates both forms of genomic data to genes. The gene names can be extracted with the setName() method and the probe ids with the geneIds() method.
data(exmplGeneSet)
a conceptual gene set collection of 10 genes with 319 unique U133A expression probe ids or Infinium HumanMethylation450 probe ids.
A conceptual ExpressionSet object containing, at minimum, exprs (a matrix) of DNA methylation and phenoData (an AnnotatedDataFrame of endpoint data).
data(exmplMSet)
A conceptual example ExpressionSet of 735 DNA methylation probe ids for 151 subjects. The phenotype data has 8 columns for the same 151 subjects.
A conceptual example of a phenotype pattern definition set with three columns: stat.coef, stat.func, and endpt.vars. It defines an association pattern for three phenotypes.
data(exmplPat)
a data frame
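To make the three-column layout concrete, the sketch below builds a pattern definition data frame by hand. Only the column names come from the description above. The routine names in stat.func and the endpoint names in endpt.vars are hypothetical placeholders, and reading the sign of stat.coef as the expected direction of association is an assumption; consult the PROMISE package for the routines and conventions it actually supports.

```r
# Hypothetical pattern definition with the three documented columns.
# Routine names and endpoint names are placeholders, not necessarily
# those shipped with PROMISE or stored in exmplPat.
pattern <- data.frame(
  stat.coef  = c(1, 1, -1),        # assumed: sign = expected direction
  stat.func  = c("my.stat.A", "my.stat.B", "my.stat.C"),
  endpt.vars = c("endpointA", "endpointB", "endpointC"),
  stringsAsFactors = FALSE
)
pattern
```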
Compute Spearman correlation for all probe combinations between two sets of genomic data within a gene.
PrbCor (geneSet, Edat, Mdat, EMlbl = c("Expr", "Methyl"), phdat, pcut = 0.05)
geneSet |
a gene set collection to annotate probes to gene |
Edat |
data frame of the first form of genomic data, such as gene expression data, with rows being probes and columns being subjects. The column names should match the row names of phdat |
Mdat |
data frame of the second form of genomic data, such as methylation data, with rows being probes and columns being subjects. The column names should match the row names of phdat |
EMlbl |
labels of the genomic data, default=c("Expr", "Methyl") for Edat and Mdat |
phdat |
phenotype data with row being subjects and column being phenotype variables. The row names should match the column names of Edat and Mdat |
pcut |
p value cutoff to eliminate probe pairs that are not significantly correlated. Default is 0.05 |
The function computes the Spearman correlation for all probe pairs between the two forms of genomic data (Edat and Mdat) within each gene defined by geneSet. If a gene has only one form of genomic data, the other form is coded as NA. The function returns a list of two components; see Value for details.
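The pairwise step described above can be sketched in base R with `expand.grid()` and `cor.test()`. The simulated matrices, probe ids, and gene name are assumptions made for illustration, and the output column names simply mirror those listed in the Value section. This is not PrbCor's internal code.

```r
# Simulated probe-by-subject matrices for one hypothetical gene.
set.seed(13)
n.subj <- 60
shared <- rnorm(n.subj)
E <- rbind(e1 = shared + rnorm(n.subj, sd = 0.5), e2 = rnorm(n.subj))
M <- rbind(m1 = -shared + rnorm(n.subj, sd = 0.5),
           m2 = rnorm(n.subj), m3 = rnorm(n.subj))

# All expression-by-methylation probe pairs within the gene.
pairs <- expand.grid(Expr = rownames(E), Methyl = rownames(M),
                     stringsAsFactors = FALSE)

# Spearman correlation and p-value for each pair (one row per pair).
stats <- t(mapply(function(e, m) {
  ct <- cor.test(E[e, ], M[m, ], method = "spearman")
  c(Spearman.rstat = unname(ct$estimate), Spearman.p = ct$p.value)
}, pairs$Expr, pairs$Methyl, USE.NAMES = FALSE))
res <- data.frame(Gene = "GENE1", pairs, stats)

# Keep only pairs whose p-value is below the cutoff.
pcut <- 0.05
res[res$Spearman.p < pcut, ]
```

In this simulation the e1/m1 pair is built to be negatively correlated, so it survives the pcut filter with a negative Spearman statistic.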
The output of the function is a list of length 2 with the following components:
res |
Spearman correlation result: a data frame with a row for each probe pair with correlation p-value < pcut and five columns: Gene: gene names; EMlbl[1]: probe id in the first form of data; EMlbl[2]: probe id in the second form of data; Spearman.rstat: Spearman r statistic; Spearman.p: Spearman p-value. |
gen |
Probe-level data: a data frame with a row for each probe pair, the first half of the columns for the first form of genomic data and the second half of the columns for the second form of genomic data, with the sign reflecting the correlation of the probe pair. |
Xueyuan Cao [email protected], Stanley Pounds [email protected]
library(CCPROMISE)
library(Biobase)  # provides exprs() and pData()

## load example data
data(exmplESet)
data(exmplMSet)
data(exmplGeneSet)

## Perform probe-level Spearman correlation
test1 <- PrbCor(geneSet=exmplGeneSet, Edat=exprs(exmplESet),
                Mdat=exprs(exmplMSet), EMlbl=c("Expr", "Methyl"),
                phdat=pData(exmplESet))
PROMISE analysis of two genomic sets with multiple phenotypes under a predefined association pattern at probe level.
PrbPROMISE (geneSet, ESet, MSet, promise.pattern, strat.var = NULL, prlbl = NULL, EMlbl = c("Expr", "Mthyl"), pcut = 0.05, nbperm = FALSE, max.ntail = 100, nperms = 10000, seed = 13)
geneSet |
a gene set collection to annotate probes to gene |
ESet |
an ExpressionSet object containing, at minimum, exprs (the expression matrix) of the first form of genomic data, such as gene expression, and phenoData (an AnnotatedDataFrame of endpoint data). Please refer to Biobase for details on how to create such an ExpressionSet. |
MSet |
an ExpressionSet object for the second form of genomic data, such as methylation levels. The subject ids of MSet and ESet should be exactly the same |
promise.pattern |
PROMISE pattern definition: a data frame with columns stat.coef, stat.func, and endpt.vars (see exmplPat) |
strat.var |
stratum variable |
prlbl |
labels |
EMlbl |
labels of the genomic data, default=c('Expr', 'Methyl') for ESet and MSet |
pcut |
p-value cutoff to eliminate probe pairs that are not significantly correlated. The default is 0.05 |
nbperm |
indicator of fast permutation using the negative binomial strategy, taking two valid values: FALSE or TRUE. The default is FALSE. |
max.ntail |
number of successes if nbperm = TRUE. No further permutations are performed for gene(s) or gene set(s) for which max.ntail permuted statistics are greater than or equal to the observed statistic. The default is 100. |
nperms |
number of permutations; default = 10,000 |
seed |
initial seed of random number generator. The default is 13. |
The function performs PROMISE analysis for two forms of genomic data in minimal ExpressionSet format with a predefined phenotypic pattern. It calls two other functions, PrbCor and PROMISE2.
The output of the function is a list of length 2 with the following components:
PRres |
PROMISE result for the first component of canonical correlation between the two forms of genomic data: individual genes' test statistics and p-values for each individual endpoint and for the PROMISE analysis |
CORres |
result of Spearman correlation analysis of probe pairs within a gene, with five columns: Gene: gene names; EMlbl[1]: probe id in the first form of data; EMlbl[2]: probe id in the second form of data; Spearman.rstat: Spearman r statistic; Spearman.p: Spearman p-value. |
Xueyuan Cao [email protected], Stanley Pounds [email protected]
library(CCPROMISE)

## load data
data(exmplESet)
data(exmplMSet)
data(exmplGeneSet)
data(exmplPat)

## Perform probe level PROMISE analysis
test <- PrbPROMISE(geneSet=exmplGeneSet, ESet=exmplESet, MSet=exmplMSet,
                   promise.pattern=exmplPat, strat.var=NULL,
                   prlbl=c('LC50', 'MRD22', 'EFS', 'PR3'),
                   EMlbl=c("Expr", "Methyl"), nbperm=TRUE, max.ntail=10,
                   nperms=100, seed=13)
PROMISE analysis of two genomic sets with multiple phenotypes.
PROMISE2 (exprSet, exprSet2, geneSet = NULL, promise.pattern, strat.var = NULL, nbperm = FALSE, max.ntail = 100, nperms = 10000, seed = 13)
exprSet |
expression set of first genomic data |
exprSet2 |
expression set of second genomic data |
geneSet |
geneSet should be NULL. |
promise.pattern |
PROMISE pattern definition: a data frame with columns stat.coef, stat.func, and endpt.vars (see exmplPat) |
strat.var |
stratum variable |
nbperm |
indicator of fast permutation using the negative binomial strategy, taking two valid values: FALSE or TRUE. The default is FALSE. |
max.ntail |
number of successes if nbperm = TRUE. No further permutations are performed for gene(s) or gene set(s) for which max.ntail permuted statistics are greater than or equal to the observed statistic. The default is 100. |
nperms |
number of permutations; default = 10,000 |
seed |
random seed, default = 13 |
The function performs PROMISE analysis for two sets of genomic data with a predefined phenotypic pattern. It is an intermediate function called by CCPROMISE to perform PROMISE analysis with canonical correlation.
The output of the function is a list of length 2 with the following components:
generes |
individual genes' test statistics and p-values for each individual endpoint and PROMISE analysis. |
setres |
Gene-set-level analysis is not implemented; this component is NULL. |
Xueyuan Cao [email protected], Stanley Pounds [email protected]
library(CCPROMISE)

## load data
data(exmplESet)
data(exmplMSet)
data(exmplGeneSet)
data(exmplPat)

## Perform PROMISE2 analysis on the first 10 features of each data form
test <- PROMISE2(exmplESet[1:10], exmplMSet[1:10],
                 promise.pattern=exmplPat, strat.var=NULL, nbperm=FALSE,
                 max.ntail=10, nperms=100, seed=13)