Title: | Methods to Find the Gene Expression Modules that Represent the Drivers of Kauffman's Attractor Landscape |
---|---|
Description: | This package contains the functions to find the gene expression modules that represent the drivers of Kauffman's attractor landscape. The modules are the core attractor pathways that discriminate between different cell types or groups of interest. Each pathway has a set of synexpression groups, which show transcriptionally-coordinated changes in gene expression. |
Authors: | Jessica Mar |
Maintainer: | Samuel Zimmerman <[email protected]> |
License: | LGPL (>= 2.0) |
Version: | 1.59.0 |
Built: | 2024-11-19 03:13:13 UTC |
Source: | https://github.com/bioc/attract |
This package contains functions used to determine the gene expression modules that represent the drivers of Kauffman's attractor landscape.
Package: | attract |
Type: | Package |
Version: | 1.33.2 |
Date: | 2018-06-29 |
License: | |
LazyLoad: | yes |
The method can be summarized in the following key steps:
(1) Determine core KEGG or reactome pathways that discriminate the most strongly between cell types or experimental groups of interest (see findAttractors).
(2) Find the different synexpression groups that are present within a core attractor pathway (see findSynexprs).
(3) Find sets of genes that show highly similar profiles to the synexpression groups within an attractor pathway module (see findCorrPartners).
(4) Test for functional enrichment for each of the synexpression groups to detect any potentially shared biological themes (see calcFuncSynexprs).
Jessica Mar <[email protected]>
Kauffman S. 2004. A proposal for using the ensemble approach to understand genetic regulatory networks. J Theor Biol. 230:581.
Mar JC, Wells CA, Quackenbush J. 2010. Identifying Gene Expression Modules that Represent the Drivers of Kauffman's Attractor Landscape. To Appear.
Müller F et al. 2008. Regulatory networks define phenotypic classes of human stem cell lines. Nature. 455(7211): 401.
Mar JC, Wells CA, Quackenbush J. 2010. Defining an Informativeness Metric for Clustering Gene Expression Data. To Appear.
## Not run:
data(subset.loring.eset)
attractor.states <- findAttractors(subset.loring.eset, "celltype", nperm=10, annotation="illuminaHumanv1.db")
remove.these.genes <- removeFlatGenes(subset.loring.eset, "celltype", contrasts=NULL, limma.cutoff=0.05)
# cellTypeTag is the third argument of findSynexprs; removeGenes is passed by name
mapk.syn <- findSynexprs("04010", attractor.states, "celltype", removeGenes=remove.these.genes)
mapk.cor <- findCorrPartners(mapk.syn, subset.loring.eset, remove.these.genes)
mapk.func <- calcFuncSynexprs(mapk.syn, attractor.states, "CC", annotation="illuminaHumanv1.db")
## End(Not run)
This is a class representation for storing the output of the findAttractors function.
Objects are output by the function findAttractors.
Objects can also be created by using new("AttractorModuleSet", ...).
eSet: ExpressionSet which primarily stores the expression data and the phenotype/sample data sets.
cellTypeTag: character string of the tag which stores the group membership information for the samples. Must be a column name of the data frame pData(eset).
incidenceMatrix: incidence matrix used as input to GSEAlm.
rankedPathways: data frame of significantly enriched pathways, ranked first by significance and then by size.
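The ranking described for rankedPathways can be pictured with a small base R sketch. The column names and values below are hypothetical, and breaking ties by putting larger pathways first is an assumption; the source only says pathways are ranked by significance and then by size.

```r
# Hypothetical pathway table; the column names are illustrative only and
# are not taken from the package.
pathways <- data.frame(
  id     = c("04010", "04110", "03010", "04510"),
  pvalue = c(0.001, 0.020, 0.001, 0.300),
  size   = c(120, 45, 80, 150),
  stringsAsFactors = FALSE
)

# Sort by p-value first; break ties by size, larger pathways first (assumption).
ranked <- pathways[order(pathways$pvalue, -pathways$size), ]
rownames(ranked) <- NULL
print(ranked)
```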
No methods have yet been defined with class "AttractorModuleSet" in the signature.
This class is described in more detail in the vignette.
Jessica Mar [email protected]
## Not run:
new.attractmodule <- new("AttractorModuleSet", eSet=new("ExpressionSet"), cellTypeTag=character(1), incidenceMatrix=matrix(0), rankedPathways=data.frame())
## End(Not run)
This function builds an incidence matrix for custom gene sets.
buildCustomIncidenceMatrix(geneSetFrame, geneNames, databaseGeneFormat, expressionSetGeneFormat,geneSetNames)
geneSetFrame |
a |
geneNames |
a |
databaseGeneFormat |
a character string specifying the type of identifier for a gene in a database (KEGG, reactome, MsigDB) gene set. The default value is NULL. (ex. SYMBOL, ENTREZID, REFSEQ, ENSEMBL) |
expressionSetGeneFormat |
a character string specifying the type of identifier for a gene in your expression data set. The default value is NULL. (ex. SYMBOL, ENTREZID, REFSEQ, ENSEMBL) |
geneSetNames |
a |
This function creates an incidence matrix from a dataframe where the rows are the names of gene sets and the columns are genes.
A matrix object with 0 and 1 entries where 1 denotes membership of a gene in a custom gene set, 0 denotes non-membership.
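As a rough illustration of what a 0/1 incidence matrix looks like, the base R sketch below builds one from a named list of gene sets. The helper `build_incidence` and the toy gene names are hypothetical; this is not the package's buildCustomIncidenceMatrix and it does no identifier conversion.

```r
# Hypothetical helper for illustration; not the package implementation.
# gene_sets: named list of character vectors (members of each gene set)
# gene_names: character vector of all genes to use as columns
build_incidence <- function(gene_sets, gene_names) {
  # vapply returns one column per gene set; transpose so rows are gene sets.
  inc <- t(vapply(gene_sets,
                  function(members) as.numeric(gene_names %in% members),
                  numeric(length(gene_names))))
  dimnames(inc) <- list(names(gene_sets), gene_names)
  inc
}

gene_sets <- list(setA = c("g1", "g3"), setB = c("g2", "g3", "g4"))
inc <- build_incidence(gene_sets, c("g1", "g2", "g3", "g4", "g5"))
print(inc)
```

Rows correspond to gene sets and columns to genes, matching the layout described above; a gene that belongs to no set (here g5) simply gets a column of zeros.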
Jessica Mar
Mar, J., C. Wells, and J. Quackenbush, Identifying the Gene Expression Modules that Represent the Drivers of Kauffman's Attractor Landscape. to appear, 2010.
This function performs functional enrichment for a given set of synexpression groups.
calcFuncSynexprs(mySynExpressionSet, myAttractorModuleSet, ontology = "BP", min.pvalue = 0.05, min.pwaysize = 5, annotation = "illuminaHumanv2.db", analysis="microarray", expressionSetGeneFormat=NULL, ...)
mySynExpressionSet |
|
myAttractorModuleSet |
|
ontology |
character string specifying which GO ontology to use, either "MF", "BP", or "CC"; defaults to "BP". |
min.pvalue |
numeric value specifying adjusted P-value cut-off to use, categories with P-values <= min.pvalue will be reported. |
min.pwaysize |
|
annotation |
character string specifying the annotation package that corresponds to the chip platform the data was generated from. |
analysis |
a character string specifying what type of experiment you performed, microarray or RNAseq. |
expressionSetGeneFormat |
a character string specifying the type of identifier for a gene in your expression data set. The default value is NULL. (ex. SYMBOL, ENTREZID, REFSEQ, ENSEMBL) |
... |
additional arguments. |
This function performs a functional enrichment analysis on each synexpression group using the hyperGTest from the GOstats package. P-values are adjusted using the Benjamini-Hochberg correction method. Results are returned only if they satisfy the minimum P-value level, as specified by the min.pvalue argument.
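The enrichment logic described above can be sketched with base R alone. This is a plain hypergeometric test via `phyper` followed by `p.adjust(method = "BH")`, used here as a stand-in for GOstats' hyperGTest; the gene universe, the group, and the three categories are made up for illustration.

```r
# Toy universe of 100 genes and a synexpression group of 10 (hypothetical data).
universe <- paste0("g", 1:100)
group <- universe[1:10]
categories <- list(
  catA = universe[1:20],            # contains all 10 group genes
  catB = universe[51:70],           # no overlap with the group
  catC = universe[c(1:3, 80:96)]    # overlaps the group in 3 genes
)

# Upper-tail hypergeometric P-value: probability of seeing at least the
# observed overlap when drawing length(group) genes from the universe.
enrich_p <- vapply(categories, function(cat_genes) {
  overlap <- length(intersect(group, cat_genes))
  phyper(overlap - 1,
         m = length(cat_genes),
         n = length(universe) - length(cat_genes),
         k = length(group),
         lower.tail = FALSE)
}, numeric(1))

# Benjamini-Hochberg adjustment, then keep categories at or below the cut-off.
adj_p <- p.adjust(enrich_p, method = "BH")
min.pvalue <- 0.05
significant <- adj_p[adj_p <= min.pvalue]
print(significant)
```

Only categories whose adjusted P-value is at or below min.pvalue survive the final filter, which mirrors the reporting rule stated for the min.pvalue argument.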
A list object.
Jessica Mar
Falcon, S. and R. Gentleman, Using GOstats to test gene lists for GO term association. Bioinformatics, 2007. 23(2): p. 257-8.
data(subset.loring.eset)
attractor.states <- findAttractors(subset.loring.eset, "celltype", nperm=10, annotation="illuminaHumanv1.db", analysis="microarray")
remove.these.genes <- removeFlatGenes(subset.loring.eset, "celltype", contrasts=NULL, limma.cutoff=0.05)
# cellTypeTag is the third argument of findSynexprs; removeGenes is passed by name
mapk.syn <- findSynexprs("04010", attractor.states, "celltype", removeGenes=remove.these.genes)
mapk.func <- calcFuncSynexprs(mapk.syn, attractor.states, "CC", annotation="illuminaHumanv1.db", analysis="microarray", expressionSetGeneFormat=NULL)
Function calculates the informativeness metric (average MSS) for a set of cluster assignments.
calcInform(exprs.dat, cl, class.vector)
exprs.dat |
a |
cl |
a |
class.vector |
a |
This function is also called internally by findSynexprs.
A numeric value representing the average MSS value (informativeness metric) for a set of cluster assignments. For an informative cluster, the RSS values should be very small relative to those produced by the informativeness metric (the MSS values).
Jessica Mar
Mar, J., C. Wells, and J. Quackenbush, Defining an Informativeness Metric for Clustering Gene Expression Data. to appear, 2010.
## Not run:
library(cluster)
data(subset.loring.eset)
clustObj <- agnes(as.dist(1-t(cor(exprs(subset.loring.eset)))))
cinform.vals <- NULL
for( i in 1:10 ){
  cinform.vals <- c(cinform.vals, calcInform(exprs(subset.loring.eset), cutree(clustObj,i), pData(subset.loring.eset)$celltype))
}
k <- (1:10)[cinform.vals==max(cinform.vals)] # gives the optimal number of clusters
## End(Not run)
Function calculates a modified F-statistic for a set of cluster assignments.
calcModfstat(exprs.dat, cl, class.vector)
exprs.dat |
a |
cl |
a |
class.vector |
a |
This function is called internally by findSynexprs.
a modified F-statistic (average MSS/average RSS) value for a set of cluster assignments.
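The ratio of average MSS to average RSS can be sketched in base R as follows. The source only states the ratio; how MSS and RSS are computed per cluster is an assumption here (a one-way ANOVA of each cluster's average profile on the class labels, taking the model and residual mean squares). The helper `cluster_modf` and the toy data are hypothetical.

```r
# Hypothetical helper, not the package's calcModfstat(); the ANOVA-based
# definitions of MSS and RSS below are assumptions made for illustration.
cluster_modf <- function(exprs.dat, cl, class.vector) {
  class.vector <- factor(class.vector)
  per.cluster <- vapply(sort(unique(cl)), function(k) {
    # Average profile of the genes assigned to cluster k, one value per sample.
    profile <- colMeans(exprs.dat[cl == k, , drop = FALSE])
    tab <- anova(lm(profile ~ class.vector))
    c(mss = tab[["Mean Sq"]][1], rss = tab[["Mean Sq"]][2])
  }, c(mss = 0, rss = 0))
  avg.mss <- mean(per.cluster["mss", ])
  avg.rss <- mean(per.cluster["rss", ])
  c(avg.mss = avg.mss, avg.rss = avg.rss, mod.f = avg.mss / avg.rss)
}

# Toy data: 6 genes x 8 samples, a small deterministic wiggle plus a group shift.
class.vector <- rep(c("A", "B"), each = 4)
exprs.dat <- matrix(sin(1:48), nrow = 6) * 0.2
exprs.dat[1:3, 5:8] <- exprs.dat[1:3, 5:8] + 3  # genes 1-3 are higher in group B
cl <- c(1, 1, 1, 2, 2, 2)

res <- cluster_modf(exprs.dat, cl, class.vector)
print(res)
```

Under these assumptions, a clustering that groups the shifted genes together yields a large average MSS relative to the average RSS, which is the pattern the documentation describes for informative clusters.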
Jessica Mar
## Not run:
library(cluster)
data(subset.loring.eset)
clustObj <- agnes(as.dist(1-t(cor(exprs(subset.loring.eset)))))
cfmod.vals <- NULL
for( i in 1:10 ){
  cfmod.vals <- c(cfmod.vals, calcModfstat(exprs(subset.loring.eset), cutree(clustObj,i), pData(subset.loring.eset)$celltype))
}
k <- (1:10)[cfmod.vals==max(cfmod.vals)]
## End(Not run)
Function calculates the average RSS for a set of cluster assignments.
calcRss(exprs.dat, cl, class.vector)
exprs.dat |
a |
cl |
a |
class.vector |
a |
This function is called internally by findSynexprs.
For an informative cluster, the RSS values should be very small relative to those produced by the informativeness metric (the MSS values).
A numeric value representing the average RSS value for this set of cluster assignments.
Jessica Mar
## Not run:
library(cluster)
data(subset.loring.eset)
clustObj <- agnes(as.dist(1-t(cor(exprs(subset.loring.eset)))))
crss.vals <- NULL
for( i in 1:10 ){
  crss.vals <- c(crss.vals, calcRss(exprs(subset.loring.eset), cutree(clustObj,i), pData(subset.loring.eset)$celltype))
}
# The RSS values are expected to be smaller than the informativeness metric values in the presence of genuine cluster structure.
## End(Not run)
This is a matrix object containing published gene expression data from Mueller et al. (NCBI GEO accession id GSE11508). The data set contains 11044 probes for 68 samples. From the original data set, we have selected four cell lines giving a total of 68 samples - embryonic stem cells (12 samples), neural progenitors (31 samples), neural stem cells (8 samples) and teratoma-differentiated cells (17 samples). The lines have also been restricted based on Illumina BeadChip platform, and only those collected using the WG-6 version have been used.
We also applied a quality filter to the original gene expression data where a probe was retained if it passed a 0.99 detection score in 75
data(exprs.dat)
A matrix with normalized log2 expression intensities for 11044 probes on 68 samples (representing 4 different cell types).
A matrix object containing published gene expression data from Mueller et al. (NCBI GEO accession id GSE11508). The data set contains 11044 probes for 68 samples.
Müller F, et al., Regulatory networks define phenotypic classes of human stem cell lines. Nature, 2008. 455(7211): p. 401-405.
data(exprs.dat)
This function filters out lowly expressed genes in RNAseq data.
filterDataSet(data,filterPerc=0.75)
data |
A dataset with genes as rows and samples as columns. |
filterPerc |
a number specifying the percent of expression values that are not equal to 0 for a gene. |
This function removes any genes in a dataset that have an expression value of 0 for a specified percentage of samples.
A data frame is returned.
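A minimal sketch of this kind of filter in base R is shown below. The helper `filter_nonzero` is hypothetical, and the exact rule is an assumption: a gene is kept when the fraction of its values that are not 0 is at least filterPerc (threshold inclusive). The package's own filterDataSet may treat the boundary differently.

```r
# Hypothetical helper, not the package's filterDataSet().
# Keeps genes (rows) whose fraction of non-zero values is >= filterPerc.
filter_nonzero <- function(data, filterPerc = 0.75) {
  frac_nonzero <- rowMeans(as.matrix(data) != 0)
  as.data.frame(data[frac_nonzero >= filterPerc, , drop = FALSE])
}

# Toy count matrix with genes as rows and samples as columns.
counts <- matrix(c(5, 0, 3, 2,
                   0, 0, 0, 7,
                   1, 4, 6, 9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("s", 1:4)))

kept <- filter_nonzero(counts)
print(kept)
```

In this toy input, geneB is non-zero in only one of four samples and is dropped, while geneA (three of four) and geneC (four of four) are retained.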
Jessica Mar
data(exprs.dat)
exprs.filtered.dat <- filterDataSet(exprs.dat)
The function infers a set of KEGG pathways that correspond to the cell-lineage specific gene expression modules, as determined using GSEA. These pathways represent those that show the greatest discrimination between the different cell types or tissues in the expression data set supplied.
findAttractors(myEset, cellTypeTag, min.pwaysize = 5, annotation = "illuminaHumanv2.db", database="KEGG", analysis="microarray", databaseGeneFormat=NULL, expressionSetGeneFormat=NULL, ...)
myEset |
|
cellTypeTag |
character string of the variable name which stores the cell-lineages or experimental groups of interest for the samples in the data set (this string should be one of the column names of pData(myEset)). |
min.pwaysize |
|
annotation |
character string specifying the annotation package that corresponds to the chip platform or organism (for RNAseq data) the data was generated from. |
database |
a character string specifying what pathway database you would like to use. |
analysis |
a character string specifying what type of experiment you performed, microarray or RNAseq. |
databaseGeneFormat |
a character string specifying the type of identifier for a gene in a database (KEGG, REACTOME, MsigDB) gene set. The default value is NULL. (ex. SYMBOL, ENTREZID, REFSEQ, ENSEMBL) |
expressionSetGeneFormat |
a character string specifying the type of identifier for a gene in your expression data set. The default value is NULL. (ex. SYMBOL, ENTREZID, REFSEQ, ENSEMBL) |
... |
additional arguments. |
This function subsets the expression data so that only those genes with annotations in KEGG or reactome are used for the downstream gene set enrichment analysis. This subset is stored in the eSet slot of the AttractorModuleSet output object.
The GSEAlm algorithm finds the KEGG or reactome pathway modules which discriminate between the cell types or experimental groups of interest. It also ranks the results of the GSEAlm step by significance of these pathway modules, as stored in rankedPathways.
The output object of the findAttractors function also contains the incidence matrix that was built for the KEGG or reactome pathways, stored in the slot incidenceMatrix, and the character string denoting which column of the sample data represents the cell type or experimental groups of interest, stored in the slot cellTypeTag.
An AttractorModuleSet object.
Jessica Mar
Jiang, Z. and R. Gentleman, Extensions to gene set enrichment. Bioinformatics, 2007. 23(3): p. 306-313.
Kanehisa, M. and S. Goto, KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res., 2000. 28: p. 27-30.
Mar, J., C. Wells, and J. Quackenbush, Identifying the Gene Expression Modules that Represent the Drivers of Kauffman's Attractor Landscape. to appear, 2010.
data(subset.loring.eset)
attractor.states <- findAttractors(subset.loring.eset, "celltype",
    annotation="illuminaHumanv1.db", database="KEGG", analysis="microarray",
    databaseGeneFormat=NULL, expressionSetGeneFormat=NULL)
# custom gene sets supplied as a path to a .gmt file shipped with the package
MSigDBpath <- system.file("extdata", "c4.cgn.v5.0.entrez.gmt", package="attract")
attractor.states.custom <- findAttractors(subset.loring.eset, "celltype",
    annotation="illuminaHumanv1.db", database=MSigDBpath, analysis="microarray",
    databaseGeneFormat="ENTREZID", expressionSetGeneFormat="PROBEID")
This function finds genes with expression profiles highly correlated to a synexpression group.
findCorrPartners(mySynExpressionSet, myEset, removeGenes = NULL, cor.cutoff = 0.85, ...)
mySynExpressionSet |
|
myEset |
|
removeGenes |
|
cor.cutoff |
numeric value specifying the correlation cut-off. |
... |
additional arguments. |
Genes with profiles highly correlated to the synexpression groups (e.g. R > 0.85) are also likely to be integral in maintaining cell type-specific differences. However, because they are not included in resources like KEGG, they would not have been picked up by the first GSEA step using findAttractors.
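The idea of a correlated partner can be illustrated with a short base R sketch. The helper `find_partners` and the toy matrix are hypothetical, and correlating each candidate gene against the group's average profile with Pearson correlation is an assumption about how the comparison is made.

```r
# Hypothetical helper, not the package's findCorrPartners().
# exprs.mat: genes x samples matrix with gene names as row names
# group_genes: genes that make up one synexpression group
find_partners <- function(exprs.mat, group_genes, cor.cutoff = 0.85) {
  avg_profile <- colMeans(exprs.mat[group_genes, , drop = FALSE])
  candidates <- setdiff(rownames(exprs.mat), group_genes)
  # Pearson correlation of every candidate gene with the average profile.
  r <- apply(exprs.mat[candidates, , drop = FALSE], 1, cor, y = avg_profile)
  names(r)[r > cor.cutoff]
}

base <- c(1, 2, 3, 4, 5, 6)
exprs.mat <- rbind(
  g1 = base,
  g2 = base + c(0.1, -0.1, 0.1, -0.1, 0.1, -0.1),
  g3 = 2 * base + 1,               # same shape as the group profile
  g4 = rev(base),                  # anti-correlated
  g5 = c(1, -1, 1, -1, 1, -1)      # unrelated zig-zag
)

partners <- find_partners(exprs.mat, group_genes = c("g1", "g2"))
print(partners)
```

Genes that are strongly anti-correlated (g4) are not returned under this rule, since the cut-off is applied to the signed correlation rather than its absolute value; that choice is also an assumption.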
A SynExpressionSet object which stores the genes that are highly correlated with the synexpression group provided, and their average expression profile.
Jessica Mar
data(subset.loring.eset)
attractor.states <- findAttractors(subset.loring.eset, "celltype", annotation="illuminaHumanv1.db")
remove.these.genes <- removeFlatGenes(subset.loring.eset, "celltype", contrasts=NULL, limma.cutoff=0.05)
# cellTypeTag is the third argument of findSynexprs; removeGenes is passed by name
mapk.syn <- findSynexprs("04010", attractor.states, "celltype", removeGenes=remove.these.genes)
mapk.cor <- findCorrPartners(mapk.syn, subset.loring.eset, remove.these.genes)
This function takes the modules that were inferred from the GSEA step (using findAttractors) and finds a transcriptionally coherent set of genes associated with a particular core attractor pathway, i.e. the synexpression groups.
findSynexprs(myIDs, myDataSet, cellTypeTag, removeGenes = NULL, min.clustersize = 5, ...)
myIDs |
either a single character string or |
myDataSet |
|
cellTypeTag |
character string of the variable name which stores the cell-lineages or experimental groups of interest for the samples in the data set (this string should be one of the column names of pData(myEset)). |
removeGenes |
|
min.clustersize |
|
... |
additional arguments. |
This function performs a hierarchical cluster analysis of the genes in a core attractor pathway module, and uses an informativeness metric to determine the optimal number of clusters (synexpression groups) that describe the data.
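The general pattern (cluster the genes hierarchically, score each candidate number of clusters, keep the best) can be sketched in base R. Everything below is illustrative: `hclust` with a correlation distance stands in for whatever clustering the package uses, and the score `avg_mss` (average ANOVA model mean square of each cluster's mean profile) is an assumed stand-in for the informativeness metric.

```r
# Assumed stand-in score: average model mean square across clusters.
avg_mss <- function(exprs.mat, cl, classes) {
  classes <- factor(classes)
  mean(vapply(sort(unique(cl)), function(k) {
    profile <- colMeans(exprs.mat[cl == k, , drop = FALSE])
    anova(lm(profile ~ classes))[["Mean Sq"]][1]
  }, numeric(1)))
}

# Toy pathway: genes 1-3 are higher in group B, genes 4-6 higher in group A.
classes <- rep(c("A", "B"), each = 4)
exprs.mat <- matrix(sin(1:48), nrow = 6) * 0.2
exprs.mat[1:3, 5:8] <- exprs.mat[1:3, 5:8] + 3
exprs.mat[4:6, 1:4] <- exprs.mat[4:6, 1:4] + 3
rownames(exprs.mat) <- paste0("g", 1:6)

# Cluster genes using 1 - Pearson correlation as the distance.
tree <- hclust(as.dist(1 - cor(t(exprs.mat))), method = "average")

# Score each candidate number of clusters and keep the highest-scoring one.
ks <- 1:4
scores <- vapply(ks, function(k) avg_mss(exprs.mat, cutree(tree, k), classes),
                 numeric(1))
best.k <- ks[which.max(scores)]
groups <- split(rownames(exprs.mat), cutree(tree, best.k))
print(scores)
print(groups)
```

With a single cluster the two opposing patterns cancel in the average profile, so its score is low; any split that separates them scores much higher, which is the behaviour a cluster-number criterion of this kind relies on.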
If a single KEGG or reactome ID is specified in myIDs, then a SynExpressionSet object is returned.
If multiple KEGG or reactome IDs are specified, then an environment object is returned where the keys are labeled "pwayIDsynexprs" (e.g. for MAPK, KEGG ID 04010, the key is pway04010synexprs). The value associated with each key is a SynExpressionSet object.
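The key naming scheme for the multi-pathway case can be mimicked in base R to show how results would be looked up. The environment below is a stand-in built by hand, and its values are plain lists rather than SynExpressionSet objects.

```r
# Stand-in environment; in practice the values would be SynExpressionSet
# objects returned by findSynexprs, not plain lists.
results <- new.env()
ids <- c("04010", "04110")
for (id in ids) {
  assign(paste0("pway", id, "synexprs"), list(pathway = id), envir = results)
}

# List the available keys and retrieve the entry for the MAPK pathway.
keys <- sort(ls(results))
mapk <- get("pway04010synexprs", envir = results)
print(keys)
```

The same `ls()` and `get()` calls should apply to a real result environment, assuming the keys follow the "pwayIDsynexprs" pattern described above.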
Jessica Mar
Mar, J., C. Wells, and J. Quackenbush, Identifying the Gene Expression Modules that Represent the Drivers of Kauffman's Attractor Landscape. to appear, 2010.
data(subset.loring.eset)
attractor.states <- findAttractors(subset.loring.eset, "celltype", annotation="illuminaHumanv1.db")
remove.these.genes <- removeFlatGenes(subset.loring.eset, "celltype", contrasts=NULL, limma.cutoff=0.05)
mapk.syn <- findSynexprs("04010", attractor.states, "celltype", remove.these.genes)
top5.syn <- findSynexprs(attractor.states@rankedPathways[1:5,1], attractor.states, "celltype", removeGenes=remove.these.genes)
vec.geneid <- c("GI_17999531-S","GI_17978503-A")
custom.syn <- findSynexprs(vec.geneid, subset.loring.eset, "celltype", removeGenes=remove.these.genes)
This is an ExpressionSet object containing the published data from Müller et al. (NCBI GEO accession id GSE11508). The expression data set contains 11044 probes for 68 samples.
data(loring.eset)
An ExpressionSet object.
An ExpressionSet object containing the published data from Müller et al. (NCBI GEO accession id GSE11508). The expression data set contains 11044 probes for 68 samples.
Müller, F, et al., Regulatory networks define phenotypic classes of human stem cell lines. Nature, 2008. 455(7211): p. 401-405.
data(loring.eset)
exprs.dat <- exprs(loring.eset) # gene expression matrix
This function plots the average expression profile for a specific synexpression group.
plotsynexprs(mySynExpressionSet, tickMarks, tickLabels, vertLines, index=1, ...)
mySynExpressionSet |
|
tickMarks |
numeric vector specifying the location of the tick marks along the x-axis. There should be one tick for each cell type or group. |
tickLabels |
character vector specifying the labels to appear underneath the tick marks on the x-axis. These should correspond to the cell type or group names. |
vertLines |
numeric vector specifying the location of the vertical lines that indicate the cell type or group-specific regions along the x-axis. |
index |
numeric value specifying which synexpression group should be plotted. |
... |
additional arguments. |
Generic plotting parameters can be passed to this function to create a more sophisticated plot, e.g. col="blue", main="Synexpression Group 1".
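One way to derive tickMarks and vertLines is from the number of samples per group. The sketch below assumes the samples are ordered by group as ESC, PRO, NSC, TER with the group sizes reported for this data set (12, 31, 8 and 17); rounding the tick positions down is an arbitrary choice.

```r
# Group sizes in plotting order (assumed ordering: ESC, PRO, NSC, TER).
group.sizes <- c(ESC = 12, PRO = 31, NSC = 8, TER = 17)

ends <- cumsum(group.sizes)          # last sample index of each group
starts <- ends - group.sizes + 1     # first sample index of each group

# Vertical separators sit halfway between the last sample of one group
# and the first sample of the next.
vertLines <- unname(ends[-length(ends)] + 0.5)

# One tick per group, placed at the (rounded-down) middle of the group.
tickMarks <- unname(floor((starts + ends) / 2))
tickLabels <- names(group.sizes)

print(vertLines)
print(tickMarks)
```

These vectors can then be passed as the tickMarks, tickLabels and vertLines arguments of plotsynexprs.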
A plot showing the average expression profile for the synexpression group specified.
Jessica Mar
data(subset.loring.eset)
attractor.states <- findAttractors(subset.loring.eset, "celltype", nperm=10, annotation="illuminaHumanv1.db")
remove.these.genes <- removeFlatGenes(subset.loring.eset, "celltype", contrasts=NULL, limma.cutoff=0.05)
# cellTypeTag is the third argument of findSynexprs; removeGenes is passed by name
mapk.syn <- findSynexprs("04010", attractor.states, "celltype", removeGenes=remove.these.genes)
par(mfrow=c(2,2))
pretty.col <- rainbow(3)
for( i in 1:3 ){
  plotsynexprs(mapk.syn, tickMarks=c(6, 28, 47, 60), tickLabels=c("ESC", "PRO", "NSC", "TER"),
      vertLines=c(12.5, 43.5, 51.5), index=i,
      main=paste("Synexpression Group ", i, sep=""), col=pretty.col[i])
}
This function uses a linear model set up in limma to assess the degree of association between cell type and a gene's expression profile. In this way, we can flag those genes whose profiles show very little change across different cell type groups, or in other words are "flat".
removeFlatGenes(eSet, cellTypeTag, contrasts = NULL, limma.cutoff = 0.05, ...)
eSet |
|
cellTypeTag |
character string of the variable name which stores the cell-lineages or experimental groups of interest for the samples in the data set (this string should be one of the column names of pData(myEset)). |
contrasts |
optional vector of contrasts that specify the comparisons of interest. By default, all comparisons between the different groups are generated. |
limma.cutoff |
numeric specifying the P-value cutoff. Genes with P-values greater than this value are considered "flat" and will be included in the set of flat genes. |
... |
additional arguments. |
Flat genes are removed from the analysis after the core attractor pathway modules are first inferred (i.e. the findAttractors step).
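The notion of a "flat" gene can be illustrated without limma. The sketch below swaps in an ordinary per-gene one-way ANOVA F-test from base R in place of limma's moderated statistics, so the P-values will not match the package; the helper `flag_flat_genes` and the toy matrix are hypothetical.

```r
# Hypothetical helper, not the package's removeFlatGenes(). Uses a plain
# per-gene one-way ANOVA instead of limma's moderated linear model.
flag_flat_genes <- function(exprs.mat, classes, cutoff = 0.05) {
  classes <- factor(classes)
  pvals <- apply(exprs.mat, 1, function(y) {
    anova(lm(y ~ classes))[["Pr(>F)"]][1]
  })
  # Genes with P-values above the cut-off are treated as flat.
  names(pvals)[pvals > cutoff]
}

classes <- rep(c("A", "B"), each = 4)
exprs.mat <- rbind(
  g1 = c(0.1, -0.1, 0.1, -0.1, 3.1, 2.9, 3.1, 2.9),  # clear group difference
  g2 = c(0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2, -0.2), # same mean in both groups
  g3 = c(5.2, 4.8, 5.1, 4.9, 4.9, 5.1, 4.8, 5.2)      # same mean in both groups
)

flat <- flag_flat_genes(exprs.mat, classes)
print(flat)
```

As in the documented limma.cutoff argument, genes whose P-value exceeds the cut-off are the ones reported as flat.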
A vector with gene names (as defined in the eset) of those genes with expression profiles that hardly vary across different cell type or experimental groups.
Jessica Mar
limma package.
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3.
data(subset.loring.eset)
remove.these.genes <- removeFlatGenes(subset.loring.eset, "celltype", contrasts=NULL, limma.cutoff=0.05)
This is a sample information data frame for the samples in the Mueller data set (NCBI GEO accession id GSE11508). The data frame contains the cell type groups for the 68 samples.
data(samp.info)
A data.frame object with one column of sample IDs (these are the column IDs of the exprs.dat expression matrix object) and a second column indicating which cell type each sample represents.
ChipID: A vector of sample IDs.
celltype: A vector denoting the cell type a sample represents.
A sample data frame for the samples in the Mueller data set (NCBI GEO accession id GSE11508). The data frame contains the cell type groups for the 68 samples.
Müller F, et al., Regulatory networks define phenotypic classes of human stem cell lines. Nature, 2008. 455(7211): p. 401-405.
data(samp.info)
This is an ExpressionSet object containing a subset of the published data from Müller et al. (NCBI GEO accession id GSE11508). The expression data set contains 5522 probes for 68 samples. This ExpressionSet object was created specifically to demonstrate the functions in this package. If you're looking for the full Müller data set, see loring.eset.
data(subset.loring.eset)
An ExpressionSet object.
An ExpressionSet object containing a subset of the published data from Müller et al. (NCBI GEO accession id GSE11508). The expression data set contains 5522 probes for 68 samples.
Müller, F, et al., Regulatory networks define phenotypic classes of human stem cell lines. Nature, 2008. 455(7211): p. 401-405.
exprs.dat, samp.info, loring.eset
data(subset.loring.eset)
subset.exprs.dat <- exprs(subset.loring.eset) # gene expression matrix
This is a class representation for storing synexpression group information.
Objects are output by the function findSynexprs.
Objects can also be created by using new("SynExpressionSet", ...).
groups: A list object denoting the probes or gene IDs (RNAseq) belonging to each synexpression group.
profiles: A matrix of average expression profiles for each synexpression group; each profile is stored as a row.
No methods have yet been defined with class "SynExpressionSet" in the signature.
This class is described in more detail in the vignette.
Jessica Mar [email protected]
new.synexpressionset <- new("SynExpressionSet", groups=list(), profiles=matrix(0))