Title: | Outlier profile and pathway analysis in R |
---|---|
Description: | The R implementation of mCOPA package published by Wang et al. (2012). Oppar provides methods for Cancer Outlier profile Analysis. Although initially developed to detect outlier genes in cancer studies, methods presented in oppar can be used for outlier profile analysis in general. In addition, tools are provided for gene set enrichment and pathway analysis. |
Authors: | Chenwei Wang [aut], Alperen Taciroglu [aut], Stefan R Maetschke [aut], Colleen C Nelson [aut], Mark Ragan [aut], Melissa Davis [aut], Soroor Hediyeh zadeh [cre], Momeneh Foroutan [ctr] |
Maintainer: | Soroor Hediyeh zadeh <[email protected]> |
License: | GPL-2 |
Version: | 1.35.0 |
Built: | 2024-11-19 03:57:39 UTC |
Source: | https://github.com/bioc/oppar |
An ExpressionSet object containing trimmed GSE46141 data. The object contains gene expression measurements on local breast tumours and liver, lymph node, skin local- regional etc metastatic tumours. Contains 9503 features and 80 samples (ascite, bone, lung and skin samples were removed).
data(GSE46141)
data(GSE46141)
Contains gene expression matrix, phenotype (pData) and feature (fData) data:
In fData(e) – probe IDs
In fData(e) – Entrez Ids
In fData(e) – Gene Symbols
In pData(e) – GEO accession IDs
In pData(e) – Sample information
An ExpressionSet object
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46141
An ExpressionSet object containing microarray gene expression measurements on normal tissue and metastatic prostate cancer tumoures, and the corresponsing feature and phynotypic meta data.
data(tomlins)
data(tomlins)
An ExpressionSet object with 10945 features and 86 samples.
In pData(eset) – Sample names
In pData(eset) –GEO accession numbers
In pData(eset) – sample description
In fData(eset) – probes Ids
...
In fData(eset) – Gene Symbol
In fData(eset) – Entrez Gene IDs
An ExpressionSet object
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6099
Returns a list of outlier genes for each given sample
getSampleOutlier(profileMatrix, sample.name) ## S4 method for signature 'matrix,nuchar' getSampleOutlier(profileMatrix, sample.name) ## S4 method for signature 'OPPARList,nuchar' getSampleOutlier(profileMatrix, sample.name)
getSampleOutlier(profileMatrix, sample.name) ## S4 method for signature 'matrix,nuchar' getSampleOutlier(profileMatrix, sample.name) ## S4 method for signature 'OPPARList,nuchar' getSampleOutlier(profileMatrix, sample.name)
profileMatrix |
A matrix of 0, -1 and 1 representing outlier genes in samples.
Also an object of type |
sample.name |
A character vector containing one or more sample names, or a numeric vector containing sample indices. |
A list of lists. For each given sample, the fuction return up-regulated and down-regulated outlier genes.
profileMatrix = matrix,sample.name = nuchar
: A method for getSampleOutlier
profileMatrix = OPPARList,sample.name = nuchar
: A method for getSampleOutlier
data(GSE46141) library(Biobase) group <- sapply(pData(bcm)$source_name_ch1, function(x){ ifelse(x == "breast",0,1)}) group <- factor(group) bcm.opa <- opa(bcm,group=group) # Outlier profile for sample "GSM1124929" getSampleOutlier(bcm.opa, "GSM1124929") # Also can use sample index, instead of sample name getSampleOutlier(bcm.opa, 11) # A vector of sample names/indices are accepted as well getSampleOutlier(bcm.opa, c(1,2)) getSampleOutlier(bcm.opa, c("GSM1124929","GSM1124941"))
data(GSE46141) library(Biobase) group <- sapply(pData(bcm)$source_name_ch1, function(x){ ifelse(x == "breast",0,1)}) group <- factor(group) bcm.opa <- opa(bcm,group=group) # Outlier profile for sample "GSM1124929" getSampleOutlier(bcm.opa, "GSM1124929") # Also can use sample index, instead of sample name getSampleOutlier(bcm.opa, 11) # A vector of sample names/indices are accepted as well getSampleOutlier(bcm.opa, c(1,2)) getSampleOutlier(bcm.opa, c("GSM1124929","GSM1124941"))
Returns a list of genes that are outlier in a group of samples, such as samples from the same subtype.
getSubtypeProbes(profileMatrix, sample.names) ## S4 method for signature 'matrix,nuchar' getSubtypeProbes(profileMatrix, sample.names) ## S4 method for signature 'OPPARList,nuchar' getSubtypeProbes(profileMatrix, sample.names)
getSubtypeProbes(profileMatrix, sample.names) ## S4 method for signature 'matrix,nuchar' getSubtypeProbes(profileMatrix, sample.names) ## S4 method for signature 'OPPARList,nuchar' getSubtypeProbes(profileMatrix, sample.names)
profileMatrix |
A matrix of 0,1 and -1, representing outlier genes in samples.
Also an object of type |
sample.names |
A character vector containing sample names, or a numeric vector containing the indices of the samples. |
A list of lists. The sub-lists are up-regulated outlier genes, and down-regulated outlier genes.
profileMatrix = matrix,sample.names = nuchar
: A method for getSubtypeProbes with signature profileMatrix = matrix
and sample.names = nuchar
profileMatrix = OPPARList,sample.names = nuchar
: A method for getSubtypeProbes with signature profileMatrix = OPPARList
and sample.names = nuchar
data(GSE46141) library(Biobase) group <- sapply(pData(bcm)$source_name_ch1, function(x){ ifelse(x == "breast",0,1)}) group <- factor(group) bcm.opa <- opa(bcm,group=group) # extracting liver samples index <- which(pData(bcm)$source_name_ch1 == "liver") samples <- rownames(pData(bcm)[index,]) samples <- match(samples, colnames(bcm.opa$profileMatrix)) samples <- Reduce(c,samples) # liver subtype outlier profile liver.subtype.outlier <- getSubtypeProbes(bcm.opa, samples)
data(GSE46141) library(Biobase) group <- sapply(pData(bcm)$source_name_ch1, function(x){ ifelse(x == "breast",0,1)}) group <- factor(group) bcm.opa <- opa(bcm,group=group) # extracting liver samples index <- which(pData(bcm)$source_name_ch1 == "liver") samples <- rownames(pData(bcm)[index,]) samples <- match(samples, colnames(bcm.opa$profileMatrix)) samples <- Reduce(c,samples) # liver subtype outlier profile liver.subtype.outlier <- getSubtypeProbes(bcm.opa, samples)
Gene Set Variation Analysis
gsva(expr, gset.idx.list, ...) ## S4 method for signature 'ExpressionSet,list' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE) ## S4 method for signature 'ExpressionSet,GeneSetCollection' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE) ## S4 method for signature 'matrix,GeneSetCollection' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE) ## S4 method for signature 'matrix,list' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE)
gsva(expr, gset.idx.list, ...) ## S4 method for signature 'ExpressionSet,list' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE) ## S4 method for signature 'ExpressionSet,GeneSetCollection' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE) ## S4 method for signature 'matrix,GeneSetCollection' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE) ## S4 method for signature 'matrix,list' gsva(expr, gset.idx.list, annotation, method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE, abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0, bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK", mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25, NA), kernel = TRUE, ssgsea.norm = TRUE, verbose = TRUE, is.gset.list.up.down = FALSE)
expr |
Gene expression data which can be given either as an |
gset.idx.list |
Gene sets provided either as a |
... |
other optional arguments. |
annotation |
In the case of calling |
method |
Method to employ in the estimation of gene-set enrichment scores per sample. By default
this is set to |
rnaseq |
Flag to inform whether the input gene expression data comes from microarray
( |
abs.ranking |
Flag to determine whether genes should be ranked according to
their sign ( |
min.sz |
Minimum size of the resulting gene sets. |
max.sz |
Maximum size of the resulting gene sets. |
no.bootstraps |
Number of bootstrap iterations to perform. |
bootstrap.percent |
.632 is the ideal percent samples bootstrapped. |
parallel.sz |
Number of processors to use when doing the calculations in parallel.
This requires to previously load either the |
parallel.type |
Type of cluster architecture when using |
mx.diff |
Offers two approaches to calculate the enrichment statistic (ES)
from the KS random walk statistic. |
tau |
Exponent defining the weight of the tail in the random walk performed by both the |
kernel |
Logical, set to |
ssgsea.norm |
Logical, set to |
verbose |
Gives information about each calculation step. Default: |
is.gset.list.up.down |
logical. Is the gene list divided into up/down sublists? Please note that it is important to name the up-regulated gene set list 'up', and the down-regulated gene set list to 'down', if this argument is used (e.g gset = list(up = up_gset, down = down_gset)) |
returns gene set enrichment scores for each sample and gene set
expr = ExpressionSet,gset.idx.list = list
: Method for ExpressionSet and list
expr = ExpressionSet,gset.idx.list = GeneSetCollection
: Method for ExpressionSet and GeneSetCollection
expr = matrix,gset.idx.list = GeneSetCollection
: Method for matrix and GeneSetCollection
expr = matrix,gset.idx.list = list
: Method for matrix and list
Hanzelmann, S., Castelo, R., & Guinney, J. (2013). GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14, 7. http://doi.org/10.1186/1471-2105-14-7
data("Maupin") names(maupin) geneSet<- maupin$sig$EntrezID #Symbol ##EntrezID # both up and down genes: up_sig<- maupin$sig[maupin$sig$upDown == "up",] d_sig<- maupin$sig[maupin$sig$upDown == "down",] u_geneSet<- up_sig$EntrezID #Symbol # up_sig$Symbol ## EntrezID d_geneSet<- d_sig$EntrezID es.dif <- gsva(maupin$data, list(up = u_geneSet, down= d_geneSet), mx.diff=1, verbose=TRUE, abs.ranking=FALSE, is.gset.list.up.down=TRUE, parallel.sz = 1 )$es.obs
data("Maupin") names(maupin) geneSet<- maupin$sig$EntrezID #Symbol ##EntrezID # both up and down genes: up_sig<- maupin$sig[maupin$sig$upDown == "up",] d_sig<- maupin$sig[maupin$sig$upDown == "down",] u_geneSet<- up_sig$EntrezID #Symbol # up_sig$Symbol ## EntrezID d_geneSet<- d_sig$EntrezID es.dif <- gsva(maupin$data, list(up = u_geneSet, down= d_geneSet), mx.diff=1, verbose=TRUE, abs.ranking=FALSE, is.gset.list.up.down=TRUE, parallel.sz = 1 )$es.obs
A list consisting of two components: data: A matrix consisting of gene expression values on 3 control and 3 TGFb treated samples. sig: A TGFb gene signature. A dataframe containing a gene signature and information on whether genes in the signature are up or dow regulated.
data(Maupin)
data(Maupin)
A list of two components:
In data; Control Sample - Replicate 1
In data; Control Sample - Replicate 2
In data; Control Sample - Replicate 3
In data; Control Sample - Replicate 1
In data; Control Sample - Replicate 2
In data; Control Sample - Replicate 3
In sig; The Entrez IDs
In sig; Gene symbols
In sig; Direction of regulation e.g. up, down
A list of 2
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23952
Returns a matrix with 0, -1 and 1 entries that describe outlier profiles in samples. The rows reperesent genes and the columns represent samples. -1 implies that the gene is a down-regulated outlier, 1 indicates an up-regulate outlier and 0 means that the gene is not an outlier in a sample.
opa(exprs.matrix, ...) ## S4 method for signature 'matrix' opa(exprs.matrix, group, upper.quantile = 0.95, lower.quantile = 0.05) ## S4 method for signature 'ExpressionSet' opa(exprs.matrix, group, upper.quantile = 0.95, lower.quantile = 0.05)
opa(exprs.matrix, ...) ## S4 method for signature 'matrix' opa(exprs.matrix, group, upper.quantile = 0.95, lower.quantile = 0.05) ## S4 method for signature 'ExpressionSet' opa(exprs.matrix, group, upper.quantile = 0.95, lower.quantile = 0.05)
exprs.matrix |
Gene expression data. Can be either a matrix or an object of type ExpressionSet. |
... |
Numeric. To supply values for upper.quantile and lower.quantile arguments if default values are going to be override. |
group |
A vector of factors representing the groups to which each sample belong. This can be either a vector of 0s and 1s, or normal and cases. |
upper.quantile |
Numeric. The cut-off for upper quantile when determining outliers. Default to 0.95 |
lower.quantile |
Numeric. The cut-off for lower quantile when determining outliers. Default to 0.05 |
opa
returns an object of type OPPARList
. The outlier profiles
are stored in profileMatrix
and can be accessed using $. It it also
possible to retrieve parameters used to run the outlier profile analysis, such
as upper.quantile
, lower.quantile
via the $ operator.
matrix
: opa(exprs.matrix, group, lower.quantile = 0.05, upper.quantile = 0.95)
ExpressionSet
: opa(eset, group, lower.quantile = 0.05, upper.quantile = 0.95)
Wang, C., Taciroglu, A., Maetschke, S. R., Nelson, C. C., Ragan, M. A., & Davis, M. J. (2012). mCOPA: analysis of heterogeneous features in cancer expression data. Journal of Clinical Bioinformatics, 2, 22. http://doi.org/10.1186/2043-9113-2-22
# loading bcm object from GSE46141 dataset data(GSE46141) library(Biobase) # defining the group variable. local breast tumors are the controls # and the rest of the samples are the diseased samples group <- sapply(pData(bcm)$source_name_ch1, function(x){ ifelse(x == "breast",0,1)}) group <- factor(group) # running opa with default values (i.e upper.quantile = 0.95, lower.quantile = 0.05) # the result is an object of type OPPARList opa(bcm,group = group)
# loading bcm object from GSE46141 dataset data(GSE46141) library(Biobase) # defining the group variable. local breast tumors are the controls # and the rest of the samples are the diseased samples group <- sapply(pData(bcm)$source_name_ch1, function(x){ ifelse(x == "breast",0,1)}) group <- factor(group) # running opa with default values (i.e upper.quantile = 0.95, lower.quantile = 0.05) # the result is an object of type OPPARList opa(bcm,group = group)
The oppar package provides 3 main function for outlier profile analysis: opa, getSampleOutlier and getSubtypeProbes, and 1 function for pathway and gene set enrichment analysis, based on gsva function implemented in the GSVA package
calculates the outlier profile matrix, using the method proposed in Wang et al. (2012) paper
extracts outlier profile for individual samples
extracts outlier profile for a group of related samples, such as subtypes
A S4 class for the output of OPPAR main function, opa.
## S4 method for signature 'OPPARList' show(object) ## S4 method for signature 'OPPARList' x$name
## S4 method for signature 'OPPARList' show(object) ## S4 method for signature 'OPPARList' x$name
object |
An object of type OPPARList |
x |
Object of type OPPARList. |
name |
Name of the slot to access. |
returns the number of outlier features detected, the number of samples retained,
and the parameters used to run the opa
function
extracts slots from an object of type OPPARList
.
show
: A show method for objects of class OPPARList
$
: A method to extract slots in OPPARList
profileMatrix
A matrix of 0, -1 and 1 representing outlier genes in samples
upper.quantile
Numeric. The upper quantile cut-off for detection of outliers
lower.quantile
Numeric. The lower quantile cut-off for detection of outliers
group
A factor vector representing the group to which each sample belong
.Data
matrix