Title: | Soft clustering of omics time series data |
---|---|
Description: | The Mfuzz package implements noise-robust soft clustering of omics time-series data, including transcriptomic, proteomic or metabolomic data. It is based on the use of c-means clustering. For convenience, it includes a graphical user interface. |
Authors: | Matthias Futschik <[email protected]> |
Maintainer: | Matthias Futschik <[email protected]> |
License: | GPL-2 |
Version: | 2.67.0 |
Built: | 2024-10-30 08:22:23 UTC |
Source: | https://github.com/bioc/Mfuzz |
This function extracts genes forming the alpha cores of soft clusters
acore(eset,cl,min.acore=0.5)
eset |
object of the class ExpressionSet. |
cl |
An object of class |
min.acore |
minimum membership value of genes belonging to the cluster core. |
The function produces a list of alpha cores, including genes and their membership values for the corresponding cluster.
Matthias E. Futschik (http://www.sysbiolab.eu/matthias.html)
if (interactive()){
  ### Data loading and pre-processing
  data(yeast) # data set includes 17 measurements
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  ### Soft clustering and extraction of alpha cores
  cl <- mfuzz(yeastF,c=20,m=1.25)
  acore.list <- acore(yeastF,cl=cl,min.acore=0.7)
}
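The idea of an alpha core can be sketched in base R without Mfuzz. The membership matrix `u` is invented for illustration, and `alpha_core` is a hypothetical helper rather than the package's `acore` implementation. It assumes that a gene belongs to the core of a cluster whenever its membership value for that cluster reaches the threshold.

```r
# Toy membership matrix: 4 genes (rows) x 2 clusters (columns); invented values.
u <- matrix(c(0.9, 0.1,
              0.6, 0.4,
              0.2, 0.8,
              0.5, 0.5),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), c("cl1", "cl2")))

# Hypothetical helper: for each cluster, keep the genes whose membership
# reaches the threshold and report them with their membership values.
alpha_core <- function(u, min.acore = 0.5) {
  lapply(seq_len(ncol(u)), function(j) {
    keep <- u[, j] >= min.acore
    data.frame(gene = rownames(u)[keep],
               membership = u[keep, j],
               row.names = NULL)
  })
}

cores <- alpha_core(u, min.acore = 0.7)
cores[[1]]  # gene1 (membership 0.9)
cores[[2]]  # gene3 (membership 0.8)
```

Raising the threshold shrinks the cores to the genes that are most clearly associated with a cluster.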
This function performs repeated soft clustering for a range of cluster numbers c and reports the number of empty clusters detected.
cselection(eset,m,crange=seq(4,32,4),repeats=5,visu=TRUE,...)
eset |
object of class ExpressionSet. |
m |
value of fuzzy c-means parameter |
crange |
range of number of clusters |
repeats |
number of repeated clusterings. |
visu |
If |
... |
additional arguments for underlying |
A soft cluster is considered empty if none of the genes has a corresponding membership value larger than 0.5.
A matrix with the number of empty clusters detected is generated.
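The emptiness criterion above can be illustrated with a few lines of base R. The membership matrix is invented, and the snippet only shows the counting rule, not the repeated clusterings that `cselection` performs.

```r
# Toy membership matrix: 3 genes (rows) x 3 clusters (columns); invented values.
u <- matrix(c(0.7, 0.2, 0.1,
              0.6, 0.3, 0.1,
              0.1, 0.4, 0.5),
            ncol = 3, byrow = TRUE)

# A cluster is empty if no gene has a membership value larger than 0.5.
is_empty <- apply(u, 2, function(memb) !any(memb > 0.5))
n_empty <- sum(is_empty)
n_empty  # 2: clusters 2 and 3 (a value of exactly 0.5 does not count)
```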
The cselection function may help to determine an accurate cluster number. However, it should be used with care, as the determination remains difficult, especially for short time series and overlapping clusters. Alternatively, the Dmin function can be used to select an optimal number of clusters based on the distances between centroids. Another way to select the cluster number is to use external annotation. For instance, one might perform clustering with a range of cluster numbers and subsequently assess their biological relevance, e.g. by GO analyses.
Matthias E. Futschik (http://www.sysbiolab.eu)
M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, 3 (4), 965-988, 2005
L. Kumar and M. Futschik, Mfuzz: a software package for soft clustering of microarray data, Bioinformation, 2(1) 5-7,2007
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  #### parameter selection
  # Empty clusters should not appear
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))

  # Note: The following calculation might take some time
  tmp <- cselection(yeastF,m=1.25,crange=seq(5,40,5),repeats=5,visu=TRUE)
  # deviation of the number of non-empty clusters (crosses) from the diagonal
  # line indicates the appearance of empty clusters

  # Empty clusters might appear
  cl <- mfuzz(yeastF,c=40,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))
}
This function performs repeated soft clustering for a range of cluster numbers c and reports the minimum centroid distance.
Dmin(eset,m,crange=seq(4,40,4),repeats=3,visu=TRUE)
eset |
object of class ExpressionSet. |
m |
value of fuzzy c-means parameter |
crange |
range of number of clusters |
repeats |
number of repeated clusterings. |
visu |
If |
The minimum centroid distance is defined as the minimum distance between two cluster centers produced by the c-means clusterings.
The average minimum centroid distance for the given range of cluster number is returned.
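As a small illustration of the definition, the minimum centroid distance of a single clustering can be computed from the matrix of cluster centres. The centres below are invented, and the use of the Euclidean distance via `dist` is an assumption of this sketch; `Dmin` itself additionally averages over repeated clusterings.

```r
# Toy cluster centres: 3 clusters (rows) x 2 time points (columns); invented values.
centres <- rbind(c(0, 0),
                 c(3, 4),
                 c(0, 1))

# dist() returns all pairwise Euclidean distances between the rows;
# the minimum centroid distance is the smallest of them.
d_min <- min(dist(centres))
d_min  # 1: the distance between centres 1 and 3
```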
The minimum centroid distance can be used as a cluster validity index. For an optimal cluster number, we may see a ‘drop’ of the minimum centroid distance when plotted versus a range of cluster numbers, and a slower decrease of the minimum centroid distance for higher cluster numbers. More information and some examples can be found in the study of Schwaemmle and Jensen (2010). However, the index should be used with care, as the determination remains difficult, especially for short time series and overlapping clusters. Alternatively, the function cselection can be used, or functional enrichment analysis (e.g. using Gene Ontology) can help to adjust the cluster number.
Matthias E. Futschik (http://www.sysbiolab.eu/matthias.html)
M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, 3 (4), 965-988, 2005
L. Kumar and M. Futschik, Mfuzz: a software package for soft clustering of microarray data, Bioinformation, 2(1) 5-7,2007
Schwaemmle and Jensen, Bioinformatics,Vol. 26 (22), 2841-2848, 2010
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  #### parameter selection
  # For fuzzifier m, we could use mestimate
  m1 <- mestimate(yeastF)
  m1 # 1.15

  # or the function partcoef (see example there)

  # For selection of c, either cselection (see example there)
  # or
  tmp <- Dmin(yeastF,m=m1,crange=seq(4,40,4),repeats=3,visu=TRUE) # Note: This calculation might take some time

  # It seems that the decrease slows down for c ~ 20 - 25, and thus 20 might be
  # a suitable number of clusters
}
Methods for replacement of missing values. Missing values should be indicated by NA in the expression matrix.
fill.NA(eset,mode="mean",k=10)
eset |
object of the class ExpressionSet. |
mode |
method for replacement of missing values:
|
k |
Number of neighbours, if one of the knn method for
replacement is chosen ( |
The function produces an object of the ExpressionSet class with missing values replaced.
The replacement methods knn and knnw can be computationally intensive for large gene expression data sets. It may be a good idea to run these methods as a ‘lunchtime’ or ‘overnight’ job.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik) and Lokesh Kumar
if (interactive()){
  data(yeast) # data set includes 17 measurements
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
}
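Mean replacement can be sketched on a plain matrix in base R. The helper `fill_row_mean` is hypothetical, and the sketch assumes that a missing value is replaced by the mean of the observed values of the same gene; it is not the package's `fill.NA` code.

```r
# Toy expression matrix: 2 genes (rows) x 4 time points; NA marks missing values.
x <- rbind(g1 = c(1, NA, 3, 2),
           g2 = c(4, 4, NA, NA))

# Hypothetical helper: replace each NA by the mean of the observed
# values of the same gene (row).
fill_row_mean <- function(x) {
  for (i in seq_len(nrow(x))) {
    miss <- is.na(x[i, ])
    x[i, miss] <- mean(x[i, !miss])
  }
  x
}

filled <- fill_row_mean(x)
filled["g1", ]  # 1 2 3 2
filled["g2", ]  # 4 4 4 4
```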
This function can be used to exclude genes with a large number of expression values not available.
filter.NA(eset,thres=0.25)
eset |
object of the class “ExpressionSet”. |
thres |
threshold for excluding genes. If the percentage
of missing values (indicated by NA in the expression matrix)
is larger than |
The function produces an object of the ExpressionSet class. It is the same as the input eset object, except for the genes excluded.
Matthias E. Futschik (http://www.sysbiolab.eu)
if (interactive()){
  data(yeast) # data set includes 17 measurements
  yeastF <- filter.NA(yeast) # genes are excluded if more than 4 measurements are missing
}
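The thresholding rule can be sketched on a plain matrix. With 17 measurements and a threshold of 0.25, a gene may have at most 4 missing values (17 * 0.25 = 4.25), which matches the comment in the example. The matrix below is invented, and the sketch operates on a matrix rather than an ExpressionSet.

```r
# Toy expression matrix: 3 genes (rows) x 4 time points; NA marks missing values.
x <- rbind(g1 = c(1, 2, 3, 4),
           g2 = c(1, NA, 3, 4),
           g3 = c(NA, NA, 3, 4))
thres <- 0.25

# Fraction of missing values per gene
frac_missing <- rowMeans(is.na(x))

# Genes are excluded only if the fraction is larger than the threshold.
kept <- x[frac_missing <= thres, , drop = FALSE]
rownames(kept)  # "g1" "g2"
```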
This function can be used to exclude genes with low standard deviation.
filter.std(eset,min.std,visu=TRUE)
eset |
object of the class ExpressionSet. |
min.std |
threshold for minimum standard deviation. If
the standard deviation of a gene's expression is smaller than
|
visu |
If |
The function produces an object of the ExpressionSet class. It is the
same as the input eset
object, except for the genes excluded.
As soft clustering is noise robust, pre-filtering can usually be avoided. However, if the number of genes with small expression changes is large, such pre-filtering may be necessary to reduce noise.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
data(yeast) # data set includes 17 measurements
yeastF <- filter.NA(yeast) # filtering of genes based on missing values
yeastF <- filter.std(yeastF,min.std=0.3) # filtering of genes based on standard deviation
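The same filtering idea can be sketched on a plain matrix. The data are invented, and the sketch assumes that genes with a standard deviation below the threshold are removed.

```r
# Toy expression matrix: 2 genes (rows) x 4 time points; invented values.
x <- rbind(flat    = c(0, 0.1, 0, 0.1),
           dynamic = c(-1, 1, -1, 1))
min.std <- 0.3

# Standard deviation of each gene across time points
s <- apply(x, 1, sd)

# Keep only genes whose standard deviation reaches the threshold.
kept <- x[s >= min.std, , drop = FALSE]
rownames(kept)  # "dynamic"
```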
This function is a wrapper function for kmeans of the stats package. It performs hard clustering of genes based on their expression values using the k-means algorithm.
kmeans2(eset,k,iter.max=100)
eset |
object of the class ExpressionSet. |
k |
number of clusters. |
iter.max |
maximal number of iterations. |
A list of clustering components (see kmeans).
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # K-means clustering and visualisation
  kl <- kmeans2(yeastF,k=20)
  kmeans2.plot(yeastF,kl=kl,mfrow=c(2,2))
}
This function visualises the clusters produced by kmeans2.
kmeans2.plot(eset,kl,mfrow=c(1,1))
eset |
object of the class “ExpressionSet”. |
kl |
list produced by |
mfrow |
determines splitting of graphic window. |
The function displays the temporal profiles of clusters detected by k-means.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # K-means clustering and visualisation
  kl <- kmeans2(yeastF,k=20)
  kmeans2.plot(yeastF,kl=kl,mfrow=c(2,2))
}
This function calculates the membership values of genes based on provided data and an existing clustering.
membership(x,clusters,m)
x |
expression vector or expression matrix |
clusters |
cluster centroids from existing clustering |
m |
fuzzification parameter |
Matrix of membership values for new genes
This function calculates membership values for new data based on existing cluster centroids and fuzzification parameter. It can be useful, for instance, when comparing two time series, to assess whether the same gene in the different time series changes its cluster association.
Matthias E. Futschik (http://www.sysbiolab.eu)
if (interactive()){
  data(yeast)
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF) # for illustration only; rather use knn method
  yeastF <- standardise(yeastF)

  cl <- mfuzz(yeastF,c=20,m=1.25)

  m <- 1.25
  clusters <- cl[[1]]
  x <- matrix(rnorm(2*17),nrow=2) # new expression matrix with two genes

  mem.tmp <- membership(x,clusters=clusters,m=m) # membership values
}
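For readers who want to see how memberships follow from centroids, the standard fuzzy c-means rule assigns gene i to cluster j with a weight proportional to d_ij^(-2/(m-1)), where d_ij is the Euclidean distance to the centroid, normalised so that the memberships of a gene sum to 1. The helper `fcm_membership` below is a hypothetical base R sketch of that rule with invented centroids; it is not taken from the Mfuzz sources and does not handle a gene that coincides exactly with a centroid.

```r
# Hypothetical helper implementing the standard fuzzy c-means membership rule.
# x: expression vector or matrix (genes in rows); centres: centroids in rows.
fcm_membership <- function(x, centres, m) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  # Squared Euclidean distance of every gene to every centroid
  d2 <- sapply(seq_len(nrow(centres)), function(k) {
    rowSums(sweep(x, 2, centres[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))
  # Weight d^(-2/(m-1)), written in terms of the squared distance
  w <- d2^(-1 / (m - 1))
  w / rowSums(w)
}

# Two invented centroids over two time points
centres <- rbind(c(0, 0),
                 c(2, 0))

u_mid  <- fcm_membership(c(1, 0),   centres, m = 2)  # equidistant: 0.5 0.5
u_near <- fcm_membership(c(0.5, 0), centres, m = 2)  # closer to centroid 1: 0.9 0.1
```

Values of m closer to 1 push the memberships towards 0 and 1, so the result approaches a hard assignment.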
This function estimates an optimal setting of the fuzzifier m.
mestimate(eset)
eset |
object of class “ExpressionSet” |
Schwaemmle and Jensen proposed a method to estimate m, which was motivated by the evaluation of fuzzy clustering applied to randomized datasets. The estimated m should give the minimum fuzzifier value which prevents clustering of randomized data.
Estimate for optimal fuzzifier.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
Schwaemmle and Jensen, Bioinformatics,Vol. 26 (22), 2841-2848, 2010
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  #### parameter selection
  # For fuzzifier m, we could use mestimate
  m1 <- mestimate(yeastF)
  m1 # 1.15

  cl <- mfuzz(yeastF,c=20,m=m1)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))
}
This function is a wrapper function for cmeans of the e1071 package. It performs soft clustering of genes based on their expression values using the fuzzy c-means algorithm.
mfuzz(eset,centers,m,...)
eset |
object of the class “ExpressionSet”. |
centers |
number of clusters. |
m |
fuzzification parameter. |
... |
additional parameters for |
This function is the core function for soft clustering. It groups genes based on the Euclidean distance and the c-means objective function, which is a weighted square error function. Each gene is assigned a membership value between 0 and 1 for each cluster. Hence, genes can be assigned to different clusters in a gradual manner. This contrasts with hard clustering, where each gene belongs to exactly one cluster.
An object of class flclust (see cmeans), which is a list with components:
centers |
the final cluster centers. |
size |
the number of data points in each cluster of the closest hard clustering. |
cluster |
a vector of integers containing the indices of the clusters where the data points are assigned to for the closest hard clustering, as obtained by assigning points to the (first) class with maximal membership. |
iter |
the number of iterations performed. |
membership |
a matrix with the membership values of the data points to the clusters. |
withinerror |
the value of the objective function. |
call |
the call used to create the object. |
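The relation between the membership matrix and the closest hard clustering can be sketched in base R. The matrix below is invented; the variables `cluster` and `size` mirror the component names above but are computed here by hand, not taken from a real flclust object.

```r
# Toy membership matrix: 4 genes (rows) x 3 clusters (columns); invented values.
u <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.8, 0.1,
              0.4, 0.4, 0.2,
              0.6, 0.3, 0.1),
            ncol = 3, byrow = TRUE)

# Closest hard clustering: each gene goes to the (first) cluster with
# maximal membership; which.max() returns the first maximum on ties.
cluster <- apply(u, 1, which.max)

# Number of genes per cluster in that hard clustering
size <- tabulate(cluster, nbins = ncol(u))

cluster  # 1 2 1 1
size     # 3 1 0
```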
Note that the clustering is based solely on the exprs matrix and no information is used from the phenoData. In particular, the ordering of samples (arrays) is the same as the ordering of the columns in the exprs matrix. Also, replicated arrays in the exprs matrix are treated as independent by the mfuzz function, i.e. they should be averaged prior to clustering or placed into distinct “ExpressionSet” objects.
Matthias E. Futschik (http://www.sysbiolab.eu)
M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, 3 (4), 965-988, 2005
L. Kumar and M. Futschik, Mfuzz: a software package for soft clustering of microarray data, Bioinformation, 2(1) 5-7,2007
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF) # for illustration only; rather use knn method
  yeastF <- standardise(yeastF)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(2,2))

  # Plotting center of cluster 1
  X11(); plot(cl[[1]][1,],type="l",ylab="Expression")

  # Getting the membership values for the first 10 genes in cluster 1
  cl[[4]][1:10,1]
}
This function visualises the clusters produced by mfuzz.
mfuzz.plot(eset,cl,mfrow=c(1,1),colo,min.mem=0,time.labels,new.window=TRUE)
eset |
object of the class ExpressionSet. |
cl |
object of class flclust. |
mfrow |
determines splitting of graphic window. |
colo |
color palette to be used for plotting. If the color argument remains empty, the default palette is used. |
min.mem |
Genes with membership values below
|
time.labels |
labels can be given for the time axis. |
new.window |
should a new window be opened for graphics. |
The function generates plots where the membership of genes is color-encoded.
Matthias E. Futschik (http://www.sysbiolab.eu/matthias)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(2,2))

  # display of cluster cores with alpha = 0.5
  mfuzz.plot(yeastF,cl=cl,mfrow=c(2,2),min.mem=0.5)

  # display of cluster cores with alpha = 0.7
  mfuzz.plot(yeastF,cl=cl,mfrow=c(2,2),min.mem=0.7)
}
This function visualises the clusters produced by mfuzz. It is similar to mfuzz.plot, but offers more options for adjusting the plots.
mfuzz.plot2(eset,cl,mfrow=c(1,1),colo,min.mem=0,time.labels,time.points, ylim.set=c(0,0), xlab="Time",ylab="Expression changes",x11=TRUE, ax.col="black",bg = "white",col.axis="black",col.lab="black", col.main="black",col.sub="black",col="black",centre=FALSE, centre.col="black",centre.lwd=2, Xwidth=5,Xheight=5,single=FALSE,...)
eset |
object of the class ExpressionSet. |
cl |
object of class flclust. |
mfrow |
determines splitting of graphic window. Use
|
colo |
color palette to be used for plotting. If the
color argument remains empty, the default palette is
used. If the |
min.mem |
Genes with membership values below
|
time.labels |
labels for ticks on x axis. |
time.points |
numerical values for the ticks on x axis. These can be used if the measured time points are not equidistant. |
ylim.set |
Vector of min. and max. y-value set for
plotting. If |
xlab |
label for x axis |
ylab |
label for y axis |
x11 |
If TRUE, a new window will be opened for plotting. |
ax.col |
Color of axis line. |
bg |
Background color. |
col.axis |
Color for axis annotation. |
col.lab |
Color for axis labels. |
col.main |
Color for main titles. |
col.sub |
Color for sub-titles. |
col |
Default plotting color. |
centre |
If TRUE, a line for the cluster centre will be drawn. |
centre.col |
Color of the line for the cluster centre |
centre.lwd |
Width of the line for the cluster centre |
Xwidth |
Width of window. |
Xheight |
Height of window. |
single |
Integer if a specific cluster is to be plotted, otherwise it should be set to FALSE. |
... |
Additional, optional plotting arguments passed to plot.default
and axes functions such as |
The function generates plots where the membership of genes is color-encoded.
Matthias E. Futschik (http://www.sysbiolab.eu/matthias)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot2(yeastF,cl=cl,mfrow=c(2,2)) # same output as mfuzz.plot
  mfuzz.plot2(yeastF, cl=cl,mfrow=c(2,2),centre=TRUE) # lines for cluster centres will be included

  # More fancy choice of colors
  mfuzz.plot2(yeastF,cl=cl,mfrow=c(2,2),colo="fancy",
              ax.col="red",bg = "black",col.axis="red",col.lab="white",
              col.main="green",col.sub="blue",col="blue",cex.main=1.3,cex.lab=1.1)

  ### Single cluster with colorbar (cluster # 3)
  X11(width=12)
  mat <- matrix(1:2,ncol=2,nrow=1,byrow=TRUE)
  l <- layout(mat,width=c(5,1))
  mfuzz.plot2(yeastF,cl=cl,mfrow=NA,colo="fancy",
              ax.col="red",bg = "black",col.axis="red",col.lab="white",
              col.main="green",col.sub="blue",col="blue",cex.main=2, single=3,x11=FALSE)

  mfuzzColorBar(col="fancy",main="Membership",cex.main=1)

  ### Single cluster with colorbar (cluster # 3)
  X11(width=14)
  mat <- matrix(1:2,ncol=2,nrow=1,byrow=TRUE)
  l <- layout(mat,width=c(5,1))
  mfuzz.plot2(yeastF,cl=cl,mfrow=NA,colo="fancy",
              ax.col="red",bg = "black",col.axis="red",col.lab="white",time.labels = c(paste(seq(0,160,10),"min")),
              col.main="green",col.sub="blue",col="blue",cex.main=2, single=3,x11=FALSE)

  mfuzzColorBar(col="fancy",main="Membership",cex.main=1)
}
This function produces a (separate) colour bar for graphs produced by mfuzz.plot
mfuzzColorBar(col, horizontal=FALSE,...)
col |
vector of colours used. If missing, the same vector
as the default vector for mfuzz.plot is used. If col="fancy", an
alternative color palette is used (see |
horizontal |
If TRUE, a horizontal colour bar is generated, otherwise a vertical one will be produced. |
... |
additional parameter passed to |
Matthias E. Futschik (http://www.sysbiolab.eu/matthias.html)
M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, 3 (4), 965-988, 2005
L. Kumar and M. Futschik, Mfuzz: a software package for soft clustering of microarray data, Bioinformation, 2(1) 5-7,2007
if (interactive()){
  X11(w=1.5,h=5)
  par(mar=c(1,1,1,5))
  mfuzzColorBar()
  mfuzzColorBar(col="fancy",main="Membership value")
  mfuzzColorBar(rev(heat.colors(100))) # example of using heat colors with red indicating high membership values
}
The function Mfuzzgui provides a graphical user interface for clustering of microarray data and visualisation of results. It is based on the functions of the Mfuzz package.
Mfuzzgui()
The function Mfuzzgui launches a graphical user interface for the Mfuzz package. It is based on Tk widgets using the R TclTk interface by Peter Dalgaard. It also employs some pre-made widgets from the tkWidgets Bioconductor package by Jianhua Zhang for the selection of objects/files to be loaded.
Mfuzzgui provides a convenient interface to most functions of the Mfuzz package without restriction of flexibility. Exceptions are batch processes such as the partcoef and cselection routines, which are used for parameter selection in fuzzy c-means clustering of microarray data. These routines are not included in Mfuzzgui. To select various parameters, the underlying Mfuzz routines may be applied.
Usage of Mfuzzgui does not require a pre-built exprSet object; it can also be used with tab-delimited text files containing the gene expression data. Note, however, that the clustering is based on the ordering of samples (arrays) given by the columns of the expression matrix of the exprSet object or of the uploaded table, respectively. Also, replicated arrays in the expression matrix (or table) are treated as independent by the mfuzz function and, thus, should be averaged prior to clustering.
For an overview of the functionality of Mfuzzgui, please refer to the package vignette. For a description of the underlying functions, please refer to the Mfuzz package.
Mfuzzgui returns a tclObj object.
The newest versions of Mfuzzgui can be found at the Mfuzz webpage (http://itb.biologie.hu-berlin.de/~futschik/software/R/Mfuzz).
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik) and Lokesh Kumar
M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, Vol. 3, No. 4, 965-988, 2005.
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW, A genome-wide transcriptional analysis of the mitotic cell cycle, Mol Cell,(2):65-73, 1998.
Mfuzz web-page: http://itb.biologie.hu-berlin.de/~futschik/software/R/Mfuzz
This function calculates the overlap of clusters produced by mfuzz.
overlap(cl)
cl |
object of class flclust |
The function generates a matrix of the normalised overlap of soft clusters. The overlap indicates the extent of “shared” genes between clusters. For a mathematical definition of the overlap, see the vignette of the package or the reference below.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
M.E. Futschik and B. Carlisle, Noise robust clustering of gene expression time-course data, Journal of Bioinformatics and Computational Biology, 3 (4), 965-988, 2005
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))

  # Calculation of cluster overlap and visualisation
  O <- overlap(cl)
  X11()
  Ptmp <- overlap.plot(cl,over=O,thres=0.05)
}
This function visualises the cluster overlap produced by overlap.
overlap.plot(cl,overlap,thres=0.1,scale=TRUE,magni=30,P=NULL)
cl |
object of class “flclust” |
overlap |
matrix of cluster overlap produced by |
thres |
threshold for visualisation. Cluster overlaps below the threshold will not be visualised. |
scale |
Scale parameter for principal component
analysis by |
magni |
Factor for increasing the line width for cluster overlap. |
P |
Projection matrix produced by principal component analysis. |
A plot is generated based on a principal component analysis of the cluster centers. The overlap is visualised by lines with variable width indicating the strength of the overlap. Additionally, the matrix of principal components is returned. This matrix can be re-used for other projections to compare the overlap and global cluster structure of different clusterings.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # Soft clustering
  cl <- mfuzz(yeastF,c=20,m=1.25)
  X11();mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))
  O <- overlap(cl)
  X11();Ptmp <- overlap.plot(cl,over=O,thres=0.05)

  # Alternative clustering
  cl <- mfuzz(yeastF,c=10,m=1.25)
  X11();mfuzz.plot(yeastF,cl=cl,mfrow=c(3,4))
  O <- overlap(cl)

  X11();overlap.plot(cl,over=O,P=Ptmp,thres=0.05)
  # visualisation based on principal components from previous projection
}
This function calculates partition coefficient for clusters within a range of cluster parameters. It can be used to determine the parameters which lead to uniform clustering.
partcoef(eset,crange=seq(4,32,4),mrange=seq(1.05,2,0.1),...)
eset |
object of class “ExpressionSet”. |
crange |
range of number of clusters |
mrange |
range of clustering parameter |
... |
additional arguments for underlying |
Introduced by Bezdek (1981), the partition coefficient F is defined as the sum of squares of values of the partition matrix divided by the number of values. It is maximal if the partition is hard and reaches a minimum for U=1/c when every gene is equally assigned to every cluster.
It is well-known that the partition coefficient tends to decrease monotonically with increasing number of clusters c. To reduce this tendency, we defined a normalized partition coefficient where the partition coefficients for uniform partitions are subtracted from the actual partition coefficients (Futschik and Kasabov, 2002).
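The two extreme cases can be checked with a short base R sketch. The helper `partition_coefficient` is hypothetical and assumes Bezdek's normalisation by the number of genes, so that a hard partition gives 1 and a uniform partition gives 1/c; the matrices are invented.

```r
# Hypothetical helper: sum of squared memberships divided by the number of genes.
partition_coefficient <- function(u) sum(u^2) / nrow(u)

# Hard partition: 4 genes, each fully assigned to one of 3 clusters
hard <- diag(3)[c(1, 2, 3, 1), ]

# Uniform partition: every gene equally assigned to every cluster (u = 1/c)
uniform <- matrix(1/3, nrow = 4, ncol = 3)

F_hard    <- partition_coefficient(hard)     # 1
F_uniform <- partition_coefficient(uniform)  # 1/3

# Normalised coefficient in the sense described above:
# subtract the value obtained for the uniform partition.
F_norm <- F_hard - F_uniform                 # 2/3
```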
The function generates the matrix of partition coefficients for a range of c and m values. It also produces a matrix of normalised partition coefficients as well as a matrix with partition coefficients for uniform partitions.
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
J.C.Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum, 1981
M.E. Futschik and N.K. Kasabov. Fuzzy clustering of gene expression data, Proceedings of World Congress of Computational Intelligence WCCI 2002, Hawaii, IEEE Press, 2002
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  #### Parameter selection
  yeastFR <- randomise(yeastF)
  cl <- mfuzz(yeastFR,c=20,m=1.1)
  mfuzz.plot(yeastFR,cl=cl,mfrow=c(4,5)) # shows cluster structures (non-uniform partition)

  tmp <- partcoef(yeastFR) # This might take some time.
  F <- tmp[[1]];F.n <- tmp[[2]];F.min <- tmp[[3]]

  # Which clustering parameters result in a uniform partition?
  F > 1.01 * F.min

  cl <- mfuzz(yeastFR,c=20,m=1.25) # produces uniform partition
  mfuzz.plot(yeastFR,cl=cl,mfrow=c(4,5)) # uniform coloring of temporal profiles indicates uniform partition
}
This function randomises the time order for each gene separately.
randomise(eset)
eset |
object of the class ExpressionSet. |
The function produces an object of the ExpressionSet class with randomised expression data.
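The idea of shuffling the time order gene by gene can be sketched in base R on a plain matrix. This is an illustration only and does not reproduce the package's code; the matrix `X` and its dimension names are hypothetical, and the real function operates on an ExpressionSet rather than a matrix.

```r
set.seed(1)

# Toy expression matrix: 3 genes (rows) measured at 4 time points (columns).
X <- matrix(rnorm(12), nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("t", 1:4)))

# Permute the time points independently within each row (gene).
# apply() returns the permuted rows as columns, hence the transpose.
X_rand <- t(apply(X, 1, sample))
dimnames(X_rand) <- dimnames(X)
```

Each gene keeps its own set of values, so per-gene means and standard deviations are unchanged while any temporal structure is destroyed.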
Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
data(yeast) # data set includes 17 measurements
yeastR <- randomise(yeast)
Standardisation of the expression values of every gene/transcript/protein is carried out, so that the average expression value for each gene/transcript/protein is zero and the standard deviation of its expression profile is one.
standardise(eset)
eset |
object of the class ExpressionSet. |
The function produces an object of the ExpressionSet class with standardised expression values.
Mfuzz assumes that the given expression data are preprocessed (including the normalisation). The function standardise does not replace the normalisation step. Note the difference: Normalisation is carried out to make different samples comparable, while standardisation (in Mfuzz) is carried out to make transcripts (genes) comparable.
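The row-wise standardisation described for standardise can be sketched in base R on a plain matrix. This is a hedged illustration, not the package's code: the matrix `X` is hypothetical, missing values are not handled, and the real function works on an ExpressionSet.

```r
set.seed(1)

# Toy expression matrix: 3 genes (rows) measured at 5 time points (columns).
X <- matrix(rnorm(15, mean = 5, sd = 2), nrow = 3)

# Centre each gene on its mean and scale by its standard deviation.
# The length-3 vectors are recycled down the columns, so row i is always
# paired with the mean and standard deviation of gene i.
X_std <- (X - rowMeans(X)) / apply(X, 1, sd)
```

After this step every gene has mean zero and standard deviation one, so profiles are compared by shape rather than by absolute expression level.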
Matthias E. Futschik (http://www.sysbiolab.eu)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))
}
Standardisation of the expression values of every gene is performed, so that the expression values at a chosen time point are zero and the standard deviation of expression profiles of individual genes/transcripts/proteins is one.
standardise2(eset,timepoint=1)
eset |
object of the class ExpressionSet. |
timepoint |
integer: which time point should have expression values of zero. |
The function produces an object of the ExpressionSet class with standardised expression values.
Mfuzz assumes that the given expression data are preprocessed (including the normalisation). The function standardise2 does not replace the normalisation step. Note the difference: Normalisation is carried out to make different samples comparable, while standardisation (in Mfuzz) is carried out to make transcripts (genes) comparable.
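The time-point-anchored variant described for standardise2 can be sketched in base R in the same spirit. This is an assumption-laden illustration, not the package's code: the matrix `X` and the variable names are hypothetical, missing values are ignored, and the exact order of operations inside standardise2 may differ.

```r
set.seed(1)

# Toy expression matrix: 3 genes (rows) measured at 5 time points (columns).
X <- matrix(rnorm(15, mean = 5, sd = 2), nrow = 3)
timepoint <- 2

# Shift each gene so that its value at the chosen time point becomes zero.
X_shift <- X - X[, timepoint]

# Scale each gene by the standard deviation of its profile.
X_std2 <- X_shift / apply(X_shift, 1, sd)
```

Subtracting a constant does not change a profile's standard deviation, so the shifted and scaled profiles have unit standard deviation while passing through zero at the chosen time point.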
Matthias E. Futschik (http://www.sysbiolab.eu)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise2(yeastF,timepoint=1)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  mfuzz.plot(yeastF,cl=cl,mfrow=c(4,5))
}
An expression matrix stored as a table (in a defined format) is read and converted to an ExpressionSet object.
table2eset(filename)
filename |
name of file to be scanned in |
The expression matrix stored as a table in the file has to follow some conventions in order to be converted to an ExpressionSet object: The first row of the file contains sample labels and, optionally, the second row can contain the time points. If the second row is used to input the time, the first field in the second row must contain “Time”. Similarly, the first column contains unique gene IDs and, optionally, the second column can contain gene names. If the second column contains gene names, the second field in the first row must contain “Gene.Name”. The rest of the file contains expression data. As examples, two tables with expression data are provided. These examples can be viewed by entering data(yeast.table) and data(yeast.table2) in the R console.
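The layout described above can be made concrete with a small base R sketch that writes such a table to a temporary file and separates its parts again. Several details are assumptions rather than documented behaviour: the file is taken to be tab-delimited, the first header field `ID` is a hypothetical placeholder, and the parsing shown here uses read.table directly instead of table2eset.

```r
# A toy table with the optional "Time" row and "Gene.Name" column.
tab <- c(
  "ID\tGene.Name\tS1\tS2\tS3",
  "Time\t\t0\t10\t20",
  "g1\tGeneA\t0.1\t0.5\t0.9",
  "g2\tGeneB\t1.2\t0.7\t0.3"
)
path <- tempfile(fileext = ".txt")
writeLines(tab, path)

raw <- read.table(path, sep = "\t", header = TRUE,
                  stringsAsFactors = FALSE, check.names = FALSE)

# The optional time row is recognised by "Time" in its first field.
has_time <- raw[[1]][1] == "Time"
times <- as.numeric(unlist(raw[1, 3:5]))

# The remaining rows hold the expression values, one gene per row.
expr <- as.matrix(raw[-1, 3:5])
rownames(expr) <- raw[[1]][-1]
gene_names <- raw[["Gene.Name"]][-1]
```

In the package workflow, the path to a file laid out like this would be passed to table2eset, which returns the ExpressionSet directly.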
An ExpressionSet object is generated.
Matthias E. Futschik (http://www.sysbiolab.eu)
This function calculates the number of times each gene has the top membership score in the partition matrix of clusters produced by mfuzz.
top.count(cl)
cl |
object of class “fclust” |
The function generates a vector containing a count for each gene, which is just the number of times that particular gene has acquired the top membership score.
Lokesh Kumar and Matthias E. Futschik (http://itb.biologie.hu-berlin.de/~futschik)
if (interactive()){
  data(yeast)
  # Data pre-processing
  yeastF <- filter.NA(yeast)
  yeastF <- fill.NA(yeastF)
  yeastF <- standardise(yeastF)

  # Soft clustering and visualisation
  cl <- mfuzz(yeastF,c=20,m=1.25)
  top.count(cl)
}
The data contains gene expression measurements for 3000 randomly chosen genes of the yeast mutant cdc28 as performed and described by Cho et al. For details, see the reference.
data(yeast)
An object of class “ExpressionSet”.
The data was downloaded from the Yeast Cell Cycle Analysis Project website and converted to an ExpressionSet object.
Cho et al., A genome-wide transcriptional analysis of the mitotic cell cycle, Mol Cell. 1998 Jul;2(1):65-73.
The data serves as an example for the format required for uploading tables with expression data into Mfuzzgui. The first row contains the names of the samples, the second row contains the measured time points. Note that “TIME” has to be placed in the first field of the second row.
The first column contains unique identifiers for genes; optionally, the second column can contain gene names if “GENE.NAMES” is in the second field of the first row.
An example of a table without optional fields is the dataset yeast.table2.
The exemplary tables can be found in the data sub-folder of the Mfuzzgui package.
Cho et al., A genome-wide transcriptional analysis of the mitotic cell cycle, Mol Cell. 1998 Jul;2(1):65-73.
The data serves as an example for the format required to upload tables with expression data into Mfuzzgui. The first row contains the names of the samples and the first column contains unique identifiers for genes. To input measurement time and gene names, refer to yeast.table.
The exemplary tables can be found in the data sub-folder of the Mfuzzgui package.
Cho et al., A genome-wide transcriptional analysis of the mitotic cell cycle, Mol Cell. 1998 Jul;2(1):65-73.