Title: Cluster and Tree Conversion.
Description: Tools for exporting and importing classification trees and clusters to and from other programs.
Authors: Antoine Lucas, Laurent Gautier
Maintainer: Antoine Lucas
License: GPL-2
Version: 1.81.0
Built: 2024-12-29 05:05:40 UTC
Source: https://github.com/bioc/ctc
Convert hclust objects to Newick format files.
hc2Newick(hc, flat=TRUE)
hc: an object of class hclust.
flat: a boolean (see section Value).
If flat=TRUE, the result is a string (which you can write to a file).
If flat=FALSE, the result is a list (of lists). Each list is made up of the elements left, right and dist.
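To illustrate what the flat=TRUE string encodes, here is a minimal sketch that walks the merge matrix of an hclust object and builds a Newick string, taking each branch length as the difference between the parent's and the child's merge heights. The helper hclust_to_newick_sketch is hypothetical and is not ctc's implementation; the exact output of hc2Newick (number formatting, label quoting, branch lengths) may differ.

```r
# Hypothetical helper, not part of ctc: build a Newick string from an
# hclust object by walking its merge matrix recursively.
hclust_to_newick_sketch <- function(hc) {
  n <- nrow(hc$merge) + 1
  labels <- if (is.null(hc$labels)) as.character(seq_len(n)) else hc$labels
  # height of a node: 0 for a leaf (negative index), merge height otherwise
  node_height <- function(i) if (i < 0) 0 else hc$height[i]
  # render node i; its branch length is parent height minus node height
  render <- function(i, parent_height) {
    len <- format(parent_height - node_height(i), digits = 4)
    if (i < 0) return(paste0(labels[-i], ":", len))
    h <- hc$height[i]
    paste0("(", render(hc$merge[i, 1], h), ",",
           render(hc$merge[i, 2], h), "):", len)
  }
  root <- n - 1
  h <- hc$height[root]
  paste0("(", render(hc$merge[root, 1], h), ",",
         render(hc$merge[root, 2], h), ");")
}

# Usage: three labelled points on a line
x <- matrix(c(0, 1, 10), ncol = 1, dimnames = list(c("A", "B", "C"), NULL))
hc <- hclust(dist(x))
s <- hclust_to_newick_sketch(hc)
```

The recursion mirrors the nested left/right structure described for flat=FALSE; a string produced this way could then be passed to write() just like the output of hc2Newick.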
Laurent
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
data(USArrests)
h <- hclust(dist(USArrests))
write(hc2Newick(h), file='hclust.newick')
This function computes a hierarchical clustering with the function hcluster and exports the clusters to TreeView file formats.
hclust2treeview(x,file="cluster.cdt",method = "euclidean",link = "complete",keep.hclust=FALSE)
x: a numeric matrix, a data frame or an object of class "exprSet".
file: file name of the export file.
method: the distance measure to be used. This must be one of …
link: the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of …
keep.hclust: if TRUE, the function returns a list of 2 objects of class hclust.
This function produces all 3 files needed by TreeView, with extensions cdt, gtr and atr.
If keep.hclust = FALSE, the function returns 1; otherwise it returns 2 objects of class hclust: the first is the hierarchical clustering of the rows, the second the hierarchical clustering of the columns.
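The two objects returned with keep.hclust=TRUE correspond to clustering the rows and the columns of the same data. The sketch below reproduces that idea with base R's stats::hclust standing in for amap's hcluster (a substitution, so heights and tie-breaking may differ), then reorders the data by both dendrograms. That this reordering matches the row and column order of the written .cdt file is an assumption.

```r
# Sketch only: stats::hclust stands in for amap::hcluster here.
data(USArrests)
x <- as.matrix(USArrests)
# first object: clustering of the rows (individuals)
hr <- hclust(dist(x, method = "euclidean"), method = "complete")
# second object: clustering of the columns (variables)
hc <- hclust(dist(t(x), method = "euclidean"), method = "complete")
res <- list(hr, hc)
# data reordered by both dendrograms (assumed to mirror the .cdt layout)
ordered <- x[hr$order, hc$order]
```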
Antoine Lucas, http://mulcyber.toulouse.inra.fr/projects/amap/
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
data(USArrests)
hclust2treeview(USArrests, file="cluster.cdt")
Converting data to Cluster format
r2cluster(data,labels=FALSE,colname="ACC",description=FALSE, file="cluster.txt",dec='.')
file: the path of the file.
data: a matrix (or data frame) which provides the data to put into the file.
labels: a logical value indicating whether the first column is used as labels (ACC column in the Cluster file).
colname: a character string indicating what kind of objects are in each row. YORF, MCLID, CLID and ACC can be used: see Details.
description: a logical value indicating whether the second column is used as description (NAME column in the Cluster file).
dec: the character used in the file for decimal points.
The software Cluster, made by M. Eisen, needs input data formatted like this:

    ACC      NAME   GWEIGHT  GORDER  V3   V4   V5
    EWEIGHT                          1    1    1
    gbk01    Gene1  1        1       0.9  0.4  1.4
    gbk02    Gene2  1        2       0.6  0.2  0.2
    gbk03    Gene3  1        3       1.6  1.1  0.9
    gbk04    Gene4  1        4       0.4  1    1
The first field of the first line (i.e. "ACC") is a special field that tells the program what kind of objects are in each row.
Four special values (YORF, MCLID, CLID and ACC) are defined, each associated with a web link when visualized with TreeView, for example:
YORF: http://genome-www.stanford.edu/cgi-bin/dbrun/SacchDB?find+Locus+%22UNIQID%22
MCLID: http://genome.rtc.riken.go.jp/cgi-bin/getseq?g+R+UNIQID
The line beginning with EWEIGHT gives a weight for each column (variable). The GWEIGHT column gives a weight for each row (individual).
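A minimal sketch of the layout described above, written with base R only. write_cluster_sketch is a hypothetical helper, not ctc's r2cluster: it assumes every weight equals 1, reuses the row names for both the ID and NAME columns, and fills GORDER with the row index.

```r
# Hypothetical helper, not ctc's r2cluster: write a numeric matrix in the
# tab-delimited Cluster layout (ID, NAME, GWEIGHT, GORDER, data columns).
write_cluster_sketch <- function(m, file, colname = "ACC") {
  m <- as.matrix(m)
  n <- nrow(m)
  ids  <- if (is.null(rownames(m))) paste0("row", seq_len(n)) else rownames(m)
  vars <- if (is.null(colnames(m))) paste0("V", seq_len(ncol(m))) else colnames(m)
  header <- c(colname, "NAME", "GWEIGHT", "GORDER", vars)
  # EWEIGHT row: blank NAME/GWEIGHT/GORDER, weight 1 for every column
  eweight <- c("EWEIGHT", "", "", "", rep("1", ncol(m)))
  # one line per row: ID, NAME (assumed = ID), GWEIGHT = 1, GORDER = row index
  body <- cbind(ids, ids, "1", as.character(seq_len(n)),
                matrix(as.character(m), nrow = n))
  lines <- c(paste(header, collapse = "\t"),
             paste(eweight, collapse = "\t"),
             apply(body, 1, paste, collapse = "\t"))
  writeLines(lines, file)
  invisible(lines)
}

# Usage: two genes, two arrays
m <- matrix(c(0.9, 0.6, 0.4, 0.2), ncol = 2,
            dimnames = list(c("g1", "g2"), c("a", "b")))
f <- tempfile(fileext = ".txt")
write_cluster_sketch(m, f)
got <- readLines(f)
```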
Cluster is a program made by M. Eisen that performs hierarchical clustering, K-means and SOM.
Cluster is copyrighted. For information about Cluster or to obtain it, see: http://rana.lbl.gov/EisenSoftware.htm
Antoine Lucas, http://antoinelucas.free.fr/ctc
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
# Create data
set.seed(1)
m <- matrix(rep(1,3*24),ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3   # create 3 groups
m <- m+rnorm(24*3,0,0.5)          # add noise
m <- floor(10*m)/10               # keep just one digit
r2cluster(m)
Write a data frame and hclust objects to .gtr, .atr and .cdt files (Xcluster or Cluster output format). The clusters can then be visualized with tools such as TreeView.
r2gtr(hr,file="cluster.gtr",distance=hr$dist.method,dec='.',digits=5)
r2atr(hc,file="cluster.atr",distance=hc$dist.method,dec='.',digits=5)
r2cdt(hr,hc,data,labels=FALSE,description=FALSE,file="cluster.cdt",dec='.')
file: the path of the file.
data: a matrix (or data frame) which provides the data to put into the file.
hr, hc: objects of class hclust (clustering of the rows and of the columns, respectively).
distance: the distance measure used. This must be one of "euclidean", "maximum", "manhattan", "canberra" or "binary". Any unambiguous substring can be given.
digits: number of digits of precision.
labels: a logical value indicating whether the first column is used as labels (NAME column in the Cluster file).
description: a logical value indicating whether the second column is used as description (DESCRIPTION column in the Cluster file).
dec: the character used in the file for decimal points.
The function hclust2treeview computes the hierarchical clustering and exports all files at once.
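The .gtr and .atr files store the row tree and the column tree as one line per merge. As a rough illustration only, the hypothetical helper below turns an hclust merge matrix into such a table (node, left child, right child, height). The NODEiX/GENEjX identifiers and the use of raw merge heights are assumptions made for the sketch; the identifiers and values that r2gtr and r2atr actually write may differ.

```r
# Hypothetical sketch, not ctc's r2gtr: one row per merge step of an hclust
# object, with assumed NODEiX (internal) and GENEjX (leaf) identifiers.
merge_table_sketch <- function(hc) {
  id <- function(i) ifelse(i < 0, paste0("GENE", -i, "X"), paste0("NODE", i, "X"))
  data.frame(node   = paste0("NODE", seq_len(nrow(hc$merge)), "X"),
             left   = id(hc$merge[, 1]),
             right  = id(hc$merge[, 2]),
             height = hc$height,
             stringsAsFactors = FALSE)
}

# Usage: three points on a line
x <- matrix(c(0, 1, 10), ncol = 1)
tab <- merge_table_sketch(hclust(dist(x)))
# A tab-delimited file could then be written with, e.g.:
# write.table(tab, "sketch.gtr", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```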
Antoine Lucas, http://antoinelucas.free.fr/ctc
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
See also: r2xcluster, xcluster2r, hclust, hcluster
# Create data
set.seed(1)
m <- matrix(rep(1,3*24),ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3   # create 3 groups
m <- m+rnorm(24*3,0,0.5)          # add noise
m <- floor(10*m)/10               # keep just one digit
# uses hclust from the stats package
# Cluster columns
hc <- hclust(dist(t(m)))
# Cluster rows
hr <- hclust(dist(m))
# Export files
r2atr(hc,file="cluster.atr")
r2gtr(hr,file="cluster.gtr")
r2cdt(hr,hc,m,file="cluster.cdt")
Converting data to Xcluster format
r2xcluster(data,labels=FALSE,description=FALSE,file="xcluster.txt")
file: the path of the file.
data: a matrix (or data frame) which provides the data to put into the file.
labels: a logical value indicating whether the first column is used as labels (NAME column in the Xcluster file).
description: a logical value indicating whether the second column is used as description (DESCRIPTION column in the Xcluster file).
The software Xcluster, made by G. Sherlock, needs input data formatted like this:

    NAME     DESCRIPTION  GWEIGHT  V2   V3   V4
    EWEIGHT                        1    1    1
    gbk01    Gene1        1        0.9  0.4  1.4
    gbk02    Gene2        1        0.6  0.2  0.2
    gbk03    Gene3        1        1.6  1.1  0.9
    gbk04    Gene4        1        0.4  1    1
The line beginning with EWEIGHT gives a weight for each column (variable). The GWEIGHT column gives a weight for each row (individual).
Xcluster is a C program made by Gavin Sherlock that performs hierarchical clustering, K-means and SOM.
Xcluster is copyrighted. For information about Xcluster or to obtain it, see: http://genome-www.stanford.edu/~sherlock/cluster.html
Antoine Lucas, http://antoinelucas.free.fr/ctc
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
See also: xcluster, xcluster2r, hclust, hcluster
## Create data
set.seed(1)
m <- matrix(rep(1,3*24),ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3   # create 3 groups
m <- m+rnorm(24*3,0,0.5)          # add noise
m <- floor(10*m)/10               # keep just one digit
r2xcluster(m)
## And once you have the Xcluster program:
## Not run:
system('Xcluster -f xcluster.txt -e 0 -p 0 -s 0 -l 0')
h <- xcluster2r('xcluster.gtr')
plot(h,hang=-1)
## End(Not run)
The input for Eisen clustering is a slight variation of a tab-delimited file. This method reads the expression data from such files as a matrix and provides optional additional information on the experiments as attributes.
read.eisen(file,sep="\t",dec=".", format.check = TRUE)
file: the relative or absolute path to the file to be read, as internally forwarded to the read.table function.
sep: separator of fields, passed on to read.table.
dec: decimal-point character, passed on to read.table. This is particularly helpful for interpreting data from localised spreadsheet programs.
format.check: TRUE or FALSE; set to FALSE to disable the file format check.
The software of Michael Eisen and its plain tab-separated format for presenting gene expression data prior to clustering are supported by many hardware and software providers, both as input for their tools and as the output of the analysis and normalisation of chip images. Being able to read and write this format lets Bioconductor users easily reanalyse or extend older experiments that may have been analysed with the Eisen tools.
A numerical matrix is returned. It is a complete analogue of the Eisen format, except that descriptions, weights and other information are passed as attributes. The first row provides the column names and the first column the row names. A second row whose first field is empty is available via the attribute "second.row". A NAME column is stored in the attribute "NAME".
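For intuition about the returned value, here is a simplified reader in base R. read_eisen_sketch is hypothetical and is not read.eisen: it assumes a header line, an optional EWEIGHT line and annotation columns named NAME, GWEIGHT and GORDER, and it keeps the NAME column and the EWEIGHT line as attributes (the attribute name "EWEIGHT" is the sketch's own choice).

```r
# Hypothetical sketch, not ctc's read.eisen: read a Cluster-style file into a
# numeric matrix, keeping NAME and the EWEIGHT line as attributes.
read_eisen_sketch <- function(file, sep = "\t", dec = ".") {
  tab <- read.table(file, header = TRUE, sep = sep, dec = dec, quote = "",
                    comment.char = "", check.names = FALSE,
                    stringsAsFactors = FALSE, na.strings = "")
  is_eweight <- tab[[1]] %in% "EWEIGHT"
  # every column except the ID and the annotation columns holds data
  data_cols <- setdiff(names(tab)[-1], c("NAME", "GWEIGHT", "GORDER"))
  m <- as.matrix(tab[!is_eweight, data_cols, drop = FALSE])
  rownames(m) <- tab[[1]][!is_eweight]
  if ("NAME" %in% names(tab)) attr(m, "NAME") <- tab$NAME[!is_eweight]
  if (any(is_eweight)) attr(m, "EWEIGHT") <- unlist(tab[is_eweight, data_cols])
  m
}

# Usage with the Cluster example layout (two genes, two arrays)
f <- tempfile(fileext = ".txt")
writeLines(c("ACC\tNAME\tGWEIGHT\tGORDER\tV3\tV4",
             "EWEIGHT\t\t\t\t1\t1",
             "gbk01\tGene1\t1\t1\t0.9\t0.4",
             "gbk02\tGene2\t1\t2\t0.6\t0.2"), f)
m <- read_eisen_sketch(f)
```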
Steffen Moeller
Michael Eisen Lab http://rana.lbl.gov/
Michiel de Hoon's Cluster 3.0 http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
Eisen M.B., P.T. Spellman, P.O. Brown, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95:14863-14868.
De Hoon M.J.L., S. Imoto, J. Nolan, and S. Miyano. 2004. Open source clustering software. Bioinformatics, 20(9):1453-1454.
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
Performs a hierarchical cluster analysis on a set of dissimilarities (this function launches an external program: Xcluster).
xcluster(data,distance="euclidean",clean=FALSE,tmp.in="tmp.txt",tmp.out="tmp.gtr")
data: a matrix (or data frame) which provides the data to analyze.
distance: the distance measure used with Xcluster. This must be one of …
clean: a logical value indicating whether you want the true distances (…
tmp.in, tmp.out: temporary files for Xcluster.
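Available distance measures are (written for two vectors $x$ and $y$):
Euclidean: the usual square distance between the two vectors (2-norm), $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$.
Pearson: $d(x, y) = 1 - \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$.
Pearson not centered: $d(x, y) = 1 - \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}$.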
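The three measures can be written as small R functions. These follow the usual definitions, with the correlation-based distances taken as one minus the centered or uncentered correlation; that Xcluster applies exactly this scaling is an assumption. The example shows why centering matters: a shifted copy of a vector is at Pearson distance 0 but not at uncentered distance 0.

```r
# Assumed definitions; Xcluster's internal scaling may differ.
euclidean_dist <- function(x, y) sqrt(sum((x - y)^2))
# centered Pearson: 1 - correlation coefficient
pearson_dist <- function(x, y) 1 - cor(x, y)
# uncentered Pearson: 1 - cosine of the angle between x and y
uncentered_pearson_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

x <- c(1, 2, 3)
y <- c(3, 4, 5)   # y = x + 2: perfectly correlated, but shifted
pearson_dist(x, y)             # 0 up to rounding: centering removes the shift
uncentered_pearson_dist(x, y)  # small but positive: the shift changes the angle
```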
Xcluster does not use the usual agglomerative methods (single, average, complete); instead, it uses the distance between the barycenters of two groups as the distance between those groups.
This causes a problem for data such as:

    A  0    0
    B  0    1
    C  0.9  0.5

That is, for this triangle in the plane, the distance between A and B is larger than the distance between the group {A, B} and C (with Euclidean distance).
In such a case it can be useful to use clean=TRUE, which means that A and B must not be considered as a group without C.
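The A, B, C example can be checked numerically in base R. The sketch computes the barycenter distance by hand and then uses stats::hclust with method = "centroid" on squared Euclidean distances, a standard centroid linkage used here only as a stand-in for Xcluster's group distance. Both show the second merge happening at a lower height (0.9) than the first (1), i.e. a non-monotone tree.

```r
# The three points A, B, C from the example above
x <- rbind(A = c(0, 0), B = c(0, 1), C = c(0.9, 0.5))
d <- dist(x)                                   # Euclidean distances
d_AB <- as.matrix(d)["A", "B"]                 # closest pair: distance 1
centroid_AB <- colMeans(x[c("A", "B"), ])      # barycenter of {A, B}: (0, 0.5)
d_ABC <- sqrt(sum((centroid_AB - x["C", ])^2)) # barycenter to C: 0.9
# Centroid linkage in stats::hclust expects squared Euclidean distances
hc <- hclust(d^2, method = "centroid")
merge_heights <- sqrt(hc$height)               # back on the distance scale
```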
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
merge: an n-1 by 2 matrix. Row i of merge describes the merging of clusters at step i of the clustering.
height: a set of n-1 real values giving the clustering height at each merging step.
order: a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.
labels: labels for each of the objects being clustered.
call: the call which produced the result.
method: the cluster method that has been used.
dist.method: the distance that has been used.
Xcluster is a C program made by Gavin Sherlock that performs hierarchical clustering, K-means and SOM.
Xcluster is copyrighted. For information about Xcluster or to obtain it, see: http://genome-www.stanford.edu/~sherlock/cluster.html
Antoine Lucas, http://mulcyber.toulouse.inra.fr/projects/amap/
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
See also: r2xcluster, xcluster2r, hclust, hcluster
# Create data
set.seed(1)
m <- matrix(rep(1,3*24),ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3   # create 3 groups
m <- m+rnorm(24*3,0,0.5)          # add noise
m <- floor(10*m)/10               # keep just one digit
# And once you have the Xcluster program:
# h <- xcluster(m)
# plot(h)
Converting Xcluster/Cluster output (.gtr or .atr) to an R hclust object
xcluster2r(file,distance="euclidean",labels=FALSE,fast=FALSE,clean=FALSE, dec='.')
file: the path of an Xcluster/Cluster file (.gtr or .atr).
distance: the distance measure used with Xcluster/Cluster. This must be one of …
labels: a logical value indicating whether label values (from the .cdt file) are used or not.
fast: a logical value indicating whether the data are reorganized as R does (…
clean: a logical value indicating whether you want the true distances (…
dec: the character used in the file for decimal points.
See xcluster for more details.
An object of class hclust which describes the tree produced by the clustering process.
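Conceptually, converting a .gtr/.atr file back to R means rebuilding the merge matrix, heights and leaf order of an hclust object from one line per merge. The sketch below does this for an in-memory table. table_to_hclust_sketch and its 1-based GENEjX/NODEiX identifiers are hypothetical; it assumes nodes are listed in merge order with the root last, and it is not how xcluster2r parses real files (which may, for example, store similarities rather than heights).

```r
# Hypothetical sketch, not ctc's xcluster2r: rebuild an hclust object from a
# table of merges (node, left, right, height) using assumed GENEjX / NODEiX ids.
table_to_hclust_sketch <- function(tab, labels) {
  # leaves become negative indices, internal nodes positive (hclust convention)
  to_index <- function(id) {
    num <- as.integer(gsub("[^0-9]", "", id))
    ifelse(startsWith(id, "GENE"), -num, num)
  }
  merge <- cbind(to_index(tab$left), to_index(tab$right))
  # leaf order from a depth-first walk starting at the last (root) merge
  walk <- function(i) if (i < 0) -i else c(walk(merge[i, 1]), walk(merge[i, 2]))
  structure(list(merge = unname(merge), height = tab$height,
                 order = walk(nrow(merge)), labels = labels,
                 method = "unknown", call = match.call(),
                 dist.method = "unknown"),
            class = "hclust")
}

# Usage: A and B merge first, then C joins them
tab <- data.frame(node = c("NODE1X", "NODE2X"),
                  left = c("GENE1X", "GENE3X"),
                  right = c("GENE2X", "NODE1X"),
                  height = c(1, 10), stringsAsFactors = FALSE)
h <- table_to_hclust_sketch(tab, labels = c("A", "B", "C"))
# plot(h, hang = -1)
```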
Xcluster is a C program made by Gavin Sherlock that performs hierarchical clustering, K-means and SOM.
Xcluster is copyrighted. For information about Xcluster or to obtain it, see: http://genome-www.stanford.edu/~sherlock/cluster.html
Cluster is a program made by Michael Eisen that performs hierarchical clustering, K-means and SOM.
Cluster is copyrighted. For information about Cluster or to obtain it, see: http://rana.lbl.gov/EisenSoftware.htm
Antoine Lucas, http://mulcyber.toulouse.inra.fr/projects/amap/
Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol 6, issue 5 pages 58-60.
See also: xcluster, r2xcluster, hclust, hcluster
# Create data
set.seed(1)
m <- matrix(rep(1,3*24),ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3   # create 3 groups
m <- m+rnorm(24*3,0,0.5)          # add noise
m <- floor(10*m)/10               # keep just one digit
r2xcluster(m)
# And once you have the Xcluster program:
# system('Xcluster -f xcluster.txt -e 0 -p 0 -s 0 -l 0')
# h <- xcluster2r('xcluster.gtr')
# plot(h,hang=-1)