Title: | Cluster Merging for Flow Cytometry Data |
---|---|
Description: | Merging of mixture components for model-based automated gating of flow cytometry data using the flowClust framework. Note: users should have a working copy of flowClust 2.0 installed. |
Authors: | Greg Finak <[email protected]>, Raphael Gottardo <[email protected]> |
Maintainer: | Greg Finak <[email protected]> |
License: | Artistic-2.0 |
Version: | 2.55.0 |
Built: | 2024-10-30 07:52:01 UTC |
Source: | https://github.com/bioc/flowMerge |
Merges mixture components from the flowClust framework based on the entropy of clustering and provides a simple representation of complicated, non-convex cell populations.
Package: | flowMerge |
Type: | Package |
Version: | 0.4.1 |
Date: | 2009-09-07 |
License: | Artistic-2.0 |
LazyLoad: | yes |
Depends: | methods |
High density, non-convex cell populations in flow cytometry data often require multiple mixture components for a good model fit. The components are often overlapping, resulting in a complicated representation of individual cell populations. flowMerge merges overlapping mixture components (based on the max BIC flowClust model fit) in an iterative manner based on an entropy criterion, allowing these cell populations to be represented by individual mixture components while retaining the good model fitting properties of the BIC solution. Estimates of the number of clusters from a flowMerge model more accurately represent the "true" number of cell populations in the data.
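A small base-R sketch makes the entropy criterion concrete. It assumes the usual definition of the entropy of a soft clustering, Ent = -sum over i and k of z_ik log(z_ik), where z is the n x K matrix of posterior membership probabilities. The helper clusteringEntropy is hypothetical, not a flowMerge function, and flowMerge's own computation may differ in detail.

```r
# Hypothetical helper (not part of flowMerge): entropy of a soft clustering,
# -sum_i sum_k z_ik * log(z_ik), with 0 * log(0) taken as 0.
clusteringEntropy <- function(z) {
  p <- z[z > 0]          # drop zero probabilities before taking logs
  -sum(p * log(p))
}

crisp <- rbind(c(1, 0), c(0, 1))      # both events assigned with certainty
fuzzy <- rbind(c(0.5, 0.5), c(1, 0))  # first event split between 2 components
clusteringEntropy(crisp)              # 0
clusteringEntropy(fuzzy)              # log(2), about 0.693

# Merging the two overlapping components (adding their columns) removes
# the ambiguity, so the entropy drops back to 0.
merged <- cbind(rowSums(fuzzy))
clusteringEntropy(merged)             # 0
```

Merging heavily overlapping components removes assignment ambiguity and so lowers the entropy, which is why the entropy can serve as a merging criterion.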
Running flowMerge is relatively straightforward. A flowClust object is converted to a flowObj object, which groups the model and the data (a flowFrame) into a single object. This is done by a call to flowObj(model, data). The mixture components are then merged with a call to merge, which takes a flowObj object.
The algorithm may be run in parallel on a multi-core machine or a networked cluster of machines, using the functionality in the snow package. Parallelized calls to flowClust are available via the pFlowClust and pFlowMerge functions.
flowMerge has functionality to automatically select the "correct" number of clusters by fitting a piecewise linear model to the entropy of clustering vs. number of clusters and locating the position of the changepoint. The piecewise linear model fitting is invoked by a call to fitPiecewiseLinreg, which returns the location of the changepoint.
Greg Finak <[email protected]>, Raphael Gottardo <[email protected]>
Maintainer: Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
flowClust, flowObj, pFlowMerge, pFlowClust, fitPiecewiseLinreg, merge, getData, plot
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
i <- fitPiecewiseLinreg(m)
m <- m[[i]]
plot(m, pch = 20, level = 0.9)
```
Overrides the snow checkForRemoteErrors function. When cluster nodes produce errors, try-error objects are returned instead of the whole computation being aborted. Not meant to be called by the user.
checkForRemoteErrors(val)
val | The result returned from an individual cluster node. |
This function is meant to be called internally, but must be exported so that it can mask the native checkForRemoteErrors function in the snow package.
The result from the snow cluster node, or an object of class try-error if there was an error.
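The behaviour described above, returning try-error objects rather than aborting, can be imitated locally with base R's try(). The sketch below uses a hypothetical helper, safeLapply, and runs serially; it only illustrates the idea and is not the snow or flowMerge code.

```r
# Hypothetical sketch: apply FUN to every element of X and keep going when
# one element fails, returning a "try-error" object in its place.
safeLapply <- function(X, FUN, ...) {
  lapply(X, function(x) try(FUN(x, ...), silent = TRUE))
}

res <- safeLapply(list(1, "a", 3), function(x) x * 2)
failed <- vapply(res, inherits, logical(1), what = "try-error")
failed   # FALSE TRUE FALSE: only the non-numeric input failed
```

The caller can then inspect or drop the failed elements instead of losing the results of the successful ones.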
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
Fits a two–component piecewise linear regression to the entropy vs number of clusters for a list of merged cluster solutions.
fitPiecewiseLinreg(x, plot=FALSE, normalized=TRUE, ...)
x | A list of flowMerge objects. |
plot | A logical value (default FALSE). |
normalized | A logical value (default TRUE). |
... | Additional arguments, not currently used. |
An S4 method that takes a list of flowMerge objects output by the merge method, extracts the entropy, and fits a piecewise linear regression to the entropy vs. number of clusters in order to find the position of the changepoint. The location of the changepoint corresponds to the optimal merged cluster solution. The piecewise linear regression is now fitted to the entropy vs. the cumulative sum of merged observations at each number of clusters. This normalizes the change in entropy for the number of data points, as described in Baudry et al.
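As a rough illustration of the changepoint idea, the base-R sketch below fits two least-squares line segments for every candidate breakpoint and returns the breakpoint with the smallest total residual sum of squares. The helper findChangepoint is hypothetical; it is not the flowMerge implementation, and fitPiecewiseLinreg may parameterize the segments differently (for example, constraining them to meet at the changepoint).

```r
# Hypothetical sketch, not flowMerge's fitPiecewiseLinreg: find the
# breakpoint of a two-segment piecewise linear fit by exhaustive search.
# x: abscissa (number of clusters, or e.g. cumulative merged observations)
# y: entropy of clustering of each merged solution
findChangepoint <- function(x, y) {
  m <- length(y)
  stopifnot(length(x) == m, m >= 3)
  candidates <- 2:(m - 1)
  rss <- vapply(candidates, function(b) {
    # each segment gets its own least-squares line; they share point b
    left  <- lm(y ~ x, data = data.frame(x = x[1:b], y = y[1:b]))
    right <- lm(y ~ x, data = data.frame(x = x[b:m], y = y[b:m]))
    sum(residuals(left)^2) + sum(residuals(right)^2)
  }, numeric(1))
  candidates[which.min(rss)]
}

K   <- 1:8
ent <- c(0, 0.5, 1, 1.5, 3.5, 5.5, 7.5, 9.5)  # slope changes at K = 4
findChangepoint(K, ent)                        # 4
```

To mimic the normalized variant described above, the cumulative number of merged observations for each solution could be passed as x instead of K.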
An integer value corresponding to the position of the changepoint.
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
i <- fitPiecewiseLinreg(m)
```
Methods for the function fitPiecewiseLinreg in the package flowMerge.
A list of flowMerge objects derived from a call to the merge function.
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
Update the flagOutliers slot in a flowMerge object. This method is internal and called automatically from within the merging code.
flagOutliers(object,...)
object | An object of type flowMerge. |
... | Additional arguments, currently unused. |
Methods that update the flagOutliers slot in a flowMerge object so that it reflects the outliers in the new merged clustering. This is an internal function, not meant for user consumption. It is called from within the merge method.
Update the flagOutliers slot for an object of type flowMerge.
The Rituximab data set, accessible via data(rituximab) in the flowClust package, fitted with flowClust models containing from one to ten components. The results are in the object flowClust.res.
data(RituximabFlowClustFit)
The format is: flowClust.res is a flowClustList, where each element of the list is a flowClust model of the rituximab data, for K=1 through K=10 components, respectively. The structure of flowClustList and flowClust can be found in the corresponding documentation of the flowClust package. The format of the rituximab data is found in the documentation for that data set.
The models have been precomputed for use in flowMerge examples to save computation time. flowClust was called on the rituximab data to generate these models with the following command: flowClust.res<-flowClust(rituximab,K=1:10,B=1000,B.init=100,tol=1e-5,tol.init=1e-2,nu=4,randomStart=50,trans=1,nu.est=1).
Gasparetto, M., Gentry, T., Sebti, S., O'Bryan, E., Nimmanapalli, R., Blaskovich, M. A., Bhalla, K., Rizzieri, D., Haaland, P., Dunne, J. and Smith, C. (2004) Identification of compounds that enhance the anti-lymphoma activity of rituximab using flow cytometric high-content screening. J. Immunol. Methods 292, 59-71.
```r
library(flowClust)
library(flowMerge)
data(RituximabFlowClustFit)
summary(flowClust.res)
```
A class to represent flowMerge objects
The object unites the flowMerge model output and the data being modeled and contains additional slots for various characteristics of a merged cluster solution, including the entropy of clustering.
merged: The number of observations merged at the current step in the algorithm.
mtree: A tree-structured graph representing the order of merged components in the model. Inspired by SPADE (Bendall et al.).
entropy: The entropy of clustering of the current solution.
DATA: An environment whose first element contains the flowFrame with the data modeled by this flowMerge object.
expName: See the flowClust package for details.
varNames: See the flowClust package for details.
K: The number of clusters in the merged solution. See the flowClust package for details.
w: The proportions for each component in the merged solution. See the flowClust package for details.
mu: The means of the components in the merged solution. See the flowClust package for details.
sigma: The covariances of the components in the merged solution. See the flowClust package for details.
lambda: See the flowClust package for details.
nu: See the flowClust package for details.
z: See the flowClust package for details.
u: The uncertainties for each data point.
label: See the flowClust package for details.
uncertainty: See the flowClust package for details.
ruleOutliers: See the flowClust package for details.
flagOutliers: See the flowClust package for details.
rm.min: See the flowClust package for details.
rm.max: See the flowClust package for details.
logLike: See the flowClust package for details.
BIC: See the flowClust package for details.
ICL: See the flowClust package for details.
Class "flowObj", directly.
Class "flowClust", by class "flowObj", distance 2.
signature(obj = "flowMerge"): Retrieves the flowFrame in the DATA environment slot.
signature(x = "flowMerge", y = "missing"): Plots the clusters in this object.
signature(x="flowMerge"): Prints a summary of the object.
signature(x="flowMerge"): Prints information about the object.
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (Submitted)
Convenience method that creates a flowObj object from a flowClust object and a flowFrame object, so as to group the model and data together. Useful for high-throughput analysis, where one may want to access the data to compute other statistics.
flowObj(flowC = NULL, flowF = NULL)
flowC | A flowClust object. |
flowF | A flowFrame object. |
Calls the new("flowObj", ...) constructor.
An object of class flowObj-class
Greg Finak <[email protected]>, Raphael Gottardo <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
```
A class inheriting from flowClust that groups the model and data in a single object.
Objects can be created by calls of the form new("flowObj", ...). Has a convenience method flowObj(flowClustObj, flowFrameObj) for creating instances of the class.
DATA: An "environment" that holds a pointer to the flowFrame data in position [[1]].
expName: As described in the flowClust documentation.
varNames: As described in the flowClust documentation.
K: As described in the flowClust documentation.
w: As described in the flowClust documentation.
mu: As described in the flowClust documentation.
sigma: As described in the flowClust documentation.
lambda: As described in the flowClust documentation.
nu: As described in the flowClust documentation.
z: As described in the flowClust documentation.
u: As described in the flowClust documentation.
label: As described in the flowClust documentation.
uncertainty: As described in the flowClust documentation.
ruleOutliers: As described in the flowClust documentation.
flagOutliers: As described in the flowClust documentation.
rm.min: As described in the flowClust documentation.
rm.max: As described in the flowClust documentation.
logLike: As described in the flowClust documentation.
BIC: As described in the flowClust documentation.
ICL: As described in the flowClust documentation.
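One plausible reason for keeping the data in an environment-valued DATA slot rather than an ordinary slot is that R environments have reference semantics: every object holding the same environment sees one shared copy of the data. The base-R sketch below shows this; shared, holderA and holderB are hypothetical, and the plain lists only stand in for flowObj instances.

```r
# Minimal sketch of the reference semantics that an environment-valued
# slot such as DATA relies on. Plain lists stand in for model objects.
shared <- new.env()
shared$data <- matrix(1:6, ncol = 2)   # stand-in for a flowFrame

holderA <- list(DATA = shared)          # hypothetical "model" objects
holderB <- list(DATA = shared)

# Both holders point at the same environment, so the data is stored once
# and a change made through one holder is visible through the other.
holderA$DATA$data[1, 1] <- 100L
holderB$DATA$data[1, 1]                 # 100
identical(holderA$DATA, holderB$DATA)   # TRUE
```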
Class "flowClust", directly.
signature(obj = "flowObj"): Retrieves the contents of the DATA environment.
signature(x = "flowObj", y = "missing"): The flowMerge algorithm is called via this function on objects of type flowObj.
signature(x = "flowObj", y = "missing"): A simplified plotting method. Does not require specification of the data, since it is contained in the flowObj object. Takes most of the same parameters as plot.flowClust, except the data parameter.
Greg Finak <[email protected]>, Raphael Gottardo <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
Initializes a snow cluster for use with flowMerge, ensuring that the flowMerge library is loaded on all cluster nodes. Not meant to be called by the user.
initPFlowMerge(cl)
cl | A snow cluster. |
A valid snow cluster.
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
Traverse the rows of a matrix of probabilities of size n x k, where the n rows are samples, and the k columns are the probability of assignment of the sample to each of k classes. The most probable class assignment is selected for each row and a vector of classes is returned.
map(z, ...)
z | A matrix of probabilities. |
... | Additional arguments, not currently used. |
A vector of class assignments of length n.
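The rule described above amounts to a row-wise argmax. A base-R sketch follows; mapClasses is a hypothetical name, and flowMerge's own map may differ, for example in how ties are broken.

```r
# Hypothetical sketch of the row-wise "most probable class" rule.
# z: n x k matrix of membership probabilities (rows sum to 1)
mapClasses <- function(z) {
  apply(z, 1, which.max)   # index of the largest probability in each row
}

z <- rbind(c(0.7, 0.2, 0.1),
           c(0.1, 0.3, 0.6),
           c(0.2, 0.5, 0.3))
mapClasses(z)                        # 1 3 2
max.col(z, ties.method = "first")    # same result via base R's max.col
```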
Greg Finak <[email protected]>, Raphael Gottardo <[email protected]>
```r
library(flowMerge)
z <- t(apply(t(replicate(100, rgamma(5, 0.1, 1))), 1, function(x) x / sum(x)))
map(z)   # most probable class for each row
```
Merge the clusters in a flowClust solution using the cluster merging algorithm and entropy criterion.
merge(x,y,...)
x | A flowObj object. |
y | missing |
... | Additional arguments, e.g. metric. |
Run the cluster merging algorithm on the max BIC solution from a call to flowClust. The optional argument metric specifies the measure used for merging: either "mahalanobis" or "entropy". Defaults to "entropy".
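A single entropy-based merging step can be sketched as follows. The sketch assumes, in the spirit of Baudry et al., that merging two components amounts to adding their posterior probability columns, and that the pair chosen is the one whose merge lowers the entropy the most. The helpers entropy and mergeStep are hypothetical; only the posterior matrix is updated here, whereas flowMerge also combines the component parameters, and the "mahalanobis" metric is not shown.

```r
# Hypothetical helpers illustrating one entropy-based merging step; only the
# posterior matrix z is updated, not the mixture parameters.
entropy <- function(z) {
  p <- z[z > 0]            # 0 * log(0) is taken as 0
  -sum(p * log(p))
}

mergeStep <- function(z) {
  k <- ncol(z)
  stopifnot(k >= 2)
  best <- NULL
  bestEnt <- Inf
  for (a in 1:(k - 1)) {
    for (b in (a + 1):k) {
      # merging components a and b adds their membership probabilities
      zab <- cbind(z[, -c(a, b), drop = FALSE], z[, a] + z[, b])
      e <- entropy(zab)
      if (e < bestEnt) {
        bestEnt <- e
        best <- c(a, b)
      }
    }
  }
  list(pair = best, entropy = bestEnt)
}

# Components 1 and 2 overlap heavily; component 3 is well separated.
z <- rbind(c(0.50, 0.50, 0.00),
           c(0.40, 0.60, 0.00),
           c(0.00, 0.00, 1.00),
           c(0.05, 0.05, 0.90))
res <- mergeStep(z)
res$pair                   # 1 2
```

Repeating the step on the merged matrix until one column remains would give the sequence of K-cluster solutions that merge returns as a list.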
An unnamed list of flowMerge objects. The first element of the list corresponds to the 1-cluster merged solution, the second element to the 2-cluster merged solution, and so on.
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (Submitted)
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
```
Merge mixture components in a flowObj derived from a flowClust result and a flowFrame using the cluster merging algorithm.
An unnamed list of flowMerge objects, with the kth element corresponding to the k-cluster merged solution.
The generic method. Should not be called.
The merge method for a flowObj.
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (To Appear)
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
```
Internal cluster merging function.
mergeClusters(object, metric)
object | not meant to be called by the user |
metric | not meant to be called by the user |
Not meant to be called by the user
Not meant to be called by the user
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (Submitted)
Internal function not meant to be called by the user.
mergeClusters2(object, a, b)
object | Internal function not meant to be called by the user. |
a | Internal function not meant to be called by the user. |
b | Internal function not meant to be called by the user. |
Internal function not meant to be called by the user.
Internal function not meant to be called by the user.
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (Submitted)
Extracts the normalized entropy from a list of flowMerge objects.
NENT(x)
x | A list of flowMerge objects. |
The normalized entropy is computed for each flowMerge object from its entropy, normalized by K and n, the number of clusters and data points, respectively.
Returns a vector of normalized entropy values for the flowMerge objects.
This function doesn't do enough error checking and will try to extract the entropy from a list of anything.
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data; Advances in Bioinformatics (To Appear)
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(RituximabFlowClustFit)
data(rituximab)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
flowMerge:::ENT(m)
flowMerge:::NENT(m)
```
A parallelized call to flowClust via the snow package and framework. Not called by the user.
pFlowClust(flowData,cl, K = 1:15, B.init = 100, tol.init = 0.01, tol = 1e-05, B = 1000, randomStart = 50, nu = 4, nu.est = 1, trans = 1, varNames = NA)
flowData | The data object; must be a flowFrame, flowSet or list of flowFrames. |
cl | The snow cluster object. |
K | The number of clusters to try for each flowFrame. Can be a vector. This is what is parallelized across processors. |
B.init | See flowClust. |
tol.init | See flowClust. |
tol | See flowClust. |
B | See flowClust. |
randomStart | See flowClust. |
nu | See flowClust. |
nu.est | See flowClust. |
trans | See flowClust. |
varNames | See flowClust. |
Calls flowClust via the clusterMap method of the snow package. The fits for the different numbers of components (K) of a single flowFrame are computed in parallel, in a loop over multiple flowFrames. If the snow cluster is NULL, the call is made via mapply.
Returns a list of lists of flowClust objects. The outer list corresponds to the flowFrames passed into the method. The inner list corresponds to the K cluster solutions passed into the method, for each flowFrame. (I.e., if the input is a list of two flowFrames and K=1:10, the result is a list of length 2. Each element of the list is itself a list of length 10. The kth element of the inner list is the flowClust k-cluster solution.)
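The dispatch described above (clusterMap when a cluster is given, mapply otherwise) can be sketched with base R's parallel package, which incorporates snow's cluster API. In this hypothetical sketch, fitAllK is an illustrative name and stats::kmeans stands in for flowClust; it is not the pFlowClust code.

```r
library(parallel)  # ships with R; provides snow-style clusterMap()

# Hypothetical sketch: fit every K for one data set, in parallel when a
# cluster is supplied and serially via mapply() otherwise.
# stats::kmeans is only a stand-in for flowClust here.
fitAllK <- function(data, K, cl = NULL) {
  fitOne <- function(k, x) kmeans(x, centers = k, nstart = 5)
  if (is.null(cl)) {
    mapply(fitOne, K, MoreArgs = list(x = data), SIMPLIFY = FALSE)
  } else {
    clusterMap(cl, fitOne, K, MoreArgs = list(x = data), SIMPLIFY = FALSE)
  }
}

set.seed(1)
x <- rbind(matrix(rnorm(100, 0), ncol = 2), matrix(rnorm(100, 5), ncol = 2))
fits <- fitAllK(x, K = 1:4)               # serial fallback (cl = NULL)
sapply(fits, function(f) length(f$size))  # 1 2 3 4: one fit per K
```

With a cluster, the same call becomes cl <- makeCluster(2); fits <- fitAllK(x, 1:4, cl); stopCluster(cl). Looping over several data sets, e.g. lapply(list(x, x), fitAllK, K = 1:4), yields the list-of-lists shape described above.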
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (Submitted)
flowClust,snow
Calls the flowMerge methods to compute the merged solution from a flowClust object or set of objects in a parallelized manner using the snow framework.
pFlowMerge(flowData, cl, K = 1:15, B.init = 100, tol.init = 0.01, tol = 1e-05, B = 500, randomStart = 10, nu = 4, nu.est = 0, trans = 1, varNames = NA)
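flowData | The data to be fit. A list of flowFrame objects. |
cl | The snow cluster object. |
K | See flowClust. |
B.init | See flowClust. |
tol.init | See flowClust. |
tol | See flowClust. |
B | See flowClust. |
randomStart | See flowClust. |
nu | See flowClust. |
nu.est | See flowClust. |
trans | See flowClust. |
varNames | See flowClust. |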
Makes a parallelized call to flowClust. Parses the results to extract the max BIC solution, merges clusters, finds the optimal k-cluster solution using the entropy, and returns it. If cl is NULL, a non-parallel call is made to the flowClust function.
A list of flowMerge objects. One per flowFrame passed into the method.
This function does not do any special memory management. A large data set will likely cause it to run out of memory and start swapping incessantly. If you have lots of data, it's best to feed it piecewise to pFlowClust.
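Feeding the data piecewise can be organized with a small chunking helper. The sketch below is an assumption about how one might do it, not a flowMerge facility: chunkIndices is hypothetical, and the commented loop assumes a list of flowFrames named frames and an existing cluster cl.

```r
# Hypothetical helper (not part of flowMerge): split n items into
# consecutive chunks of at most `size` items each.
chunkIndices <- function(n, size) {
  split(seq_len(n), ceiling(seq_len(n) / size))
}

chunks <- chunkIndices(5, 2)   # list(1:2, 3:4, 5)

# Sketch of processing a long list of flowFrames (`frames`, assumed) one
# chunk at a time, writing each result to disk before the next chunk:
# for (j in seq_along(chunks)) {
#   res <- pFlowClust(frames[chunks[[j]]], cl, K = 1:10)
#   saveRDS(res, sprintf("pFlowClust_chunk%02d.rds", j))
#   rm(res); gc()
# }
```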
Greg Finak <[email protected]>
Finak G, Bashashati A, Brinkman R, Gottardo R. Merging Mixture Model Components for Improved Cell Population Identification in High Throughput Flow Cytometry Data (Submitted)
pFlowClust, flowClust, merge, snow, fitPiecewiseLinreg
```r
data(rituximab)
# Parallelized call below:
## Not run: cl <- makeSOCKcluster(rep("finakg@localhost", 7))
## Not run: result <- pFlowMerge(rituximab, cl, varNames = c("FSC.H", "SSC.H"))
## Not run: plot(result)
# cl <- NULL
# result <- pFlowMerge(rituximab, cl = NULL, varNames = c("FSC.H", "SSC.H"), K = 1:8)
# plot(result)
```
Plots all possible two-dimensional projections of the parameters in a flowMerge or flowObj object, and does not require specification of the flowFrame, since a pointer to the data is stored in the object. Informative axis names are used, rather than the usual FL1/FL2/FS/SS channel names. This function can take most of the usual additional arguments provided to plot for the flowClust package, although some, like the axis names and the data, are fixed. In order for flowMerge objects to display outliers correctly with plot (following merging), the updateU method must be called on them first.
x is a flowMerge object.
x is a flowObj object.
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
i <- fitPiecewiseLinreg(m)
m <- m[[i]]
plot(m, pch = 20, level = 0.9)
```
This function generates and returns a new function which can be used to plot the merging tree for a flowMerge model, with nodes highlighted based on the expression of different parameters for each cell population.
ptree(x,y)
x | A character string giving the name of the variable holding the list of merged models returned from flowMerge. |
y | The index of the best fitting merged model in that list. |
ptree generates a function that plots the merging tree from a flowMerge model. Nodes are colored by the intensity of staining of that population in a given dimension. Calling f<-ptree("model.name",fitPiecewiseLinreg(model.name)) assigns the function to f. Calling f(3) then plots the merging tree with nodes highlighted according to parameter 3, presuming that the model has at least that many parameters.
Returns a function.
A plot will be drawn on the current device.
Greg Finak <[email protected]>
Accessors to describe a flowObj or flowMerge object.
Describe a flowMerge object.
Describe a flowObj object.
Split method defined for flowMerge objects. Pulls out the population based on cluster number.
signature(x = "flowMerge", f = "missing"): Split a flowMerge object into its component clusters.
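Splitting by cluster number amounts to grouping the events by their cluster label. The base-R sketch below illustrates this on a plain matrix; splitByLabel is hypothetical and is not how flowMerge's split method is implemented.

```r
# Hypothetical sketch: split a data matrix into one piece per cluster label.
# Toy data: 4 events measured in 2 channels, plus a cluster label per event.
x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
label <- c(1, 2, 1, NA)   # NA marks an event not assigned (e.g. an outlier)

splitByLabel <- function(x, label) {
  idx <- split(seq_len(nrow(x)), label)   # events with NA labels are dropped
  lapply(idx, function(i) x[i, , drop = FALSE])
}

pieces <- splitByLabel(x, label)
pieces[["1"]]   # rows 1 and 3 of x
```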
Summary method for flowMerge objects.
Summarize a flowMerge object.
Summarize a flowObj object.
Updates the uncertainties in a flowMerge object after merging clusters. This function is now internal and no longer exported. It is called automatically within the cluster merging method.
updateU(object)
object | An object of type flowMerge. |
Updates the u slot of the flowMerge object following merging. The update is computationally intensive and so is not automatically performed on each flowMerge object. It should only be done on objects used in further analysis.
A flowMerge object with the u slot updated to reflect the new parameter values.
Greg Finak <[email protected]>
```r
library(flowClust)   # for the rituximab data
library(flowMerge)
data(rituximab)
data(RituximabFlowClustFit)
o <- flowObj(flowClust.res[[which.max(flowMerge:::BIC(flowClust.res))]], rituximab)
m <- merge(o)
i <- fitPiecewiseLinreg(m)
m <- m[[i]]
plot(m, pch = 20, level = 0.9)
```