Title: | Low Dimensions projection of cytometry samples |
---|---|
Description: | This package implements a low dimensional visualization of a set of cytometry samples, in order to visually assess the 'distances' between them. This, in turn, can greatly help the user to identify quality issues like batch effects or outlier samples, and/or check the presence of potential sample clusters that might align with the exeprimental design. The CytoMDS algorithm combines, on the one hand, the concept of Earth Mover's Distance (EMD), a.k.a. Wasserstein metric and, on the other hand, the Multi Dimensional Scaling (MDS) algorithm for the low dimensional projection. Also, the package provides some diagnostic tools for both checking the quality of the MDS projection, as well as tools to help with the interpretation of the axes of the projection. |
Authors: | Philippe Hauchamps [aut, cre] , Laurent Gatto [aut] , Dan Lin [ctb] |
Maintainer: | Philippe Hauchamps <[email protected]> |
License: | GPL-3 |
Version: | 1.3.1 |
Built: | 2024-11-17 03:10:47 UTC |
Source: | https://github.com/bioc/CytoMDS |
Computation of summary statistic for selected channels, for all flowFrames of a flowSet, or for all expression matrices of a list. This method provides three different input modes:
the user provides directly a flowCore::flowSet loaded in memory (RAM)
the user provides directly a list of expression matrices of which the column names are the channel/marker names
the user provides (1.) a number of samples nSamples
; (2.) an ad-hoc
function that takes as input an index between 1 and nSamples
, and codes
the method to load the corresponding expression matrix in memory;
channelSummaryStats( x, loadExprMatrixFUN = NULL, loadExprMatrixFUNArgs = NULL, channels = NULL, statFUNs = stats::median, verbose = FALSE, BPPARAM = BiocParallel::SerialParam(), BPOPTIONS = BiocParallel::bpoptions(packages = c("flowCore")) )
channelSummaryStats( x, loadExprMatrixFUN = NULL, loadExprMatrixFUNArgs = NULL, channels = NULL, statFUNs = stats::median, verbose = FALSE, BPPARAM = BiocParallel::SerialParam(), BPOPTIONS = BiocParallel::bpoptions(packages = c("flowCore")) )
x |
can be:
|
loadExprMatrixFUN |
the function used to translate an integer index
into an expression matrix. In other words, the function should code how to
load the |
loadExprMatrixFUNArgs |
(optional) a named list containing
additional input parameters of |
channels |
which channels needs to be included:
|
statFUNs |
a list (possibly of length one) of functions to call to calculate the statistics, or a simple function. This list can be named, in that case, these names will be transfered to the returned list. |
verbose |
if |
BPPARAM |
sets the |
BPOPTIONS |
sets the BPOPTIONS to be passed to |
a list of named statistic matrices.
In each stat matrix, the columns are the channel statistics
for all flowFrames of the flowSet.
Exception: if only one stat function (and not a list) is passed in
statFUNs
, the return value is simplified to the stat matrix itself.
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) channelsOrMarkers <- c("FSC-A", "SSC-A", "BV785 - CD3") # calculate mean for each 4 selected channels, for each 2 samples channelMeans <- channelSummaryStats( OMIP021Trans, channels = channelsOrMarkers, statFUNs = mean) # calculate median AND std deviation # for each 4 selected channels, for each 2 samples channelMedians <- channelSummaryStats( OMIP021Trans, channels = channelsOrMarkers, statFUNs = list("median" = stats::median, "std.dev" = stats::sd))
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) channelsOrMarkers <- c("FSC-A", "SSC-A", "BV785 - CD3") # calculate mean for each 4 selected channels, for each 2 samples channelMeans <- channelSummaryStats( OMIP021Trans, channels = channelsOrMarkers, statFUNs = mean) # calculate median AND std deviation # for each 4 selected channels, for each 2 samples channelMedians <- channelSummaryStats( OMIP021Trans, channels = channelsOrMarkers, statFUNs = list("median" = stats::median, "std.dev" = stats::sd))
Multi-dimensional scaling projection of samples,
using a distance matrix as an input.
The MDS algorithm is not the classical MDS
(cmdscale
alike, aka Torgerson's algorithm),
but is the SMACOF algorithm for metric distances that are not
necessarily euclidean.
After having obtained the projections on the nDim
dimensions,
we always apply svd decomposition to visualize as first axes the ones that
contain the most variance of the projected dataset in nDim
dimensions.
Instead of being provided directly by the user, the nDim
parameter can
otherwise be found iteratively by finding the minimum nDim
parameter that
allows the projection to reach a target pseudo RSquare.
If this is the case, the maxDim
parameter is used to avoid
looking for too big projection spaces.
computeMetricMDS( pwDist, nDim = NULL, seed = NULL, targetPseudoRSq = 0.95, maxDim = 128, ... )
computeMetricMDS( pwDist, nDim = NULL, seed = NULL, targetPseudoRSq = 0.95, maxDim = 128, ... )
pwDist |
( |
nDim |
number of dimensions of projection, as input to SMACOF algorithm
if not provided, will be found iteratively using |
seed |
seed to be set when launching SMACOF algorithm
(e.g. when |
targetPseudoRSq |
target pseudo RSquare to be reached
(only used when |
maxDim |
in case |
... |
additional parameters passed to SMACOF algorithm |
an object of S4 class MDS
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999)
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999)
Calculate Earth Mover's distance between two samples
EMDDist( x1, x2, channels = NULL, binSize = 0.05, minRange = -10, maxRange = 10, returnAll = FALSE )
EMDDist( x1, x2, channels = NULL, binSize = 0.05, minRange = -10, maxRange = 10, returnAll = FALSE )
x1 |
can be either a flowCore::flowFrame, or an expression matrix |
x2 |
can be either a flowCore::flowFrame, or an expression matrix |
channels |
which channels (integer index(ices) or character(s)):
|
binSize |
size of equal bins to approximate the marginal distributions. |
minRange |
minimum value taken when approximating the marginal distributions |
maxRange |
maximum value taken when approximating the marginal distributions |
returnAll |
If |
the Earth Mover's distance between x1
and x2
,
which is calculated by summing up all EMD approximates for
the marginal distributions of each channel
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # distance with itself (all channels at once) # => should return 0 dist0 <- EMDDist( x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[1]]) # returning only distance, 2 channels dist1 <- EMDDist( x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[2]], channels = c("FSC-A", "SSC-A")) # using only one channel, passed by marker name dist2 <- EMDDist(x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[2]], channels = c("BV785 - CD3")) # using only one channel, passed by index dist3 <- EMDDist(x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[2]], channels = 10) dist2 == dist3
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # distance with itself (all channels at once) # => should return 0 dist0 <- EMDDist( x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[1]]) # returning only distance, 2 channels dist1 <- EMDDist( x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[2]], channels = c("FSC-A", "SSC-A")) # using only one channel, passed by marker name dist2 <- EMDDist(x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[2]], channels = c("BV785 - CD3")) # using only one channel, passed by index dist3 <- EMDDist(x1 = OMIP021Trans[[1]], x2 = OMIP021Trans[[2]], channels = 10) dist2 == dist3
ggplotMarginalDensities
uses ggplot2
to draw plots of marginal densities of selected channels of a flowSet.
If the flowSet contains several flowFrames, all events are concatenated
together.
By default, a pseudo Rsquare projection quality indicator,
and the number of dimensions of the MDS projection are provided in sub-title
ggplotMarginalDensities( x, sampleSubset, channels, pDataForColour, pDataForGroup, nEventInSubsample = Inf, seed = NULL, transList )
ggplotMarginalDensities( x, sampleSubset, channels, pDataForColour, pDataForGroup, nEventInSubsample = Inf, seed = NULL, transList )
x |
a |
sampleSubset |
(optional) a logical vector, of size |
channels |
(optional) |
pDataForColour |
(optional) which |
pDataForGroup |
(optional) which |
nEventInSubsample |
how many event to take (per flowFrame of the flowSet). |
seed |
if not null, used in subsampling. |
transList |
a |
a ggplot object
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) flowCore::pData(fsAll)$lbl <- paste0("S", 1:5) # plot densities, all samples together p <- ggplotMarginalDensities(fsAll) # plot densities, per sample p <- ggplotMarginalDensities(fsAll, pDataForGroup = "lbl") # plot densities, per sample and coloured by group p <- ggplotMarginalDensities( fsAll, pDataForGroup = "lbl", pDataForColour = "grpId")
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) flowCore::pData(fsAll)$lbl <- paste0("S", 1:5) # plot densities, all samples together p <- ggplotMarginalDensities(fsAll) # plot densities, per sample p <- ggplotMarginalDensities(fsAll, pDataForGroup = "lbl") # plot densities, per sample and coloured by group p <- ggplotMarginalDensities( fsAll, pDataForGroup = "lbl", pDataForColour = "grpId")
ggplotSampleMDS
uses ggplot2
to provide plots of Metric MDS results.
By default, a pseudo Rsquare projection quality indicator,
and the number of dimensions of the MDS projection are provided in sub-title
ggplotSampleMDS( mdsObj, pData, sampleSubset, projectionAxes = c(1, 2), biplot = FALSE, biplotType = c("correlation", "regression"), extVariables, pDataForColour, pDataForShape, pDataForLabel, pDataForAdditionalLabelling, sizeReflectingStress = FALSE, title = "Multi Dimensional Scaling", displayPointLabels = TRUE, pointLabelSize = 3.88, repelPointLabels = TRUE, displayArrowLabels = TRUE, arrowLabelSize = 3.88, repelArrowLabels = FALSE, arrowThreshold = 0.8, flipXAxis = FALSE, flipYAxis = FALSE, displayPseudoRSq = TRUE, ... )
ggplotSampleMDS( mdsObj, pData, sampleSubset, projectionAxes = c(1, 2), biplot = FALSE, biplotType = c("correlation", "regression"), extVariables, pDataForColour, pDataForShape, pDataForLabel, pDataForAdditionalLabelling, sizeReflectingStress = FALSE, title = "Multi Dimensional Scaling", displayPointLabels = TRUE, pointLabelSize = 3.88, repelPointLabels = TRUE, displayArrowLabels = TRUE, arrowLabelSize = 3.88, repelArrowLabels = FALSE, arrowThreshold = 0.8, flipXAxis = FALSE, flipYAxis = FALSE, displayPseudoRSq = TRUE, ... )
mdsObj |
a MDS object, output of the |
pData |
(optional) a data.frame providing user input sample data. These can be design of experiment variables, phenotype data per sample,... and will be used to highlight sample categories in the plot and/or for subsetting. |
sampleSubset |
(optional) a logical vector, of size |
projectionAxes |
which two axes should be plotted (should be a numeric vector of length 2) |
biplot |
if TRUE, adds projection of external variables |
biplotType |
type of biplot used:
|
extVariables |
are used to generate a biplot these are the external variables that will be used in the biplot. They should be provided as a matrix with named columns corresponding to the variables. The number of rows should be the same as the number of samples. The matrix might contain some NA's, in that case only complete rows will be used to calculate biplot arrows. |
pDataForColour |
(optional) which |
pDataForShape |
(optional) which |
pDataForLabel |
(optional) which |
pDataForAdditionalLabelling |
(optional) which |
sizeReflectingStress |
if TRUE, size of points will appear proportional to stress by point, i.e. the bigger the sample point appears, the less accurate its representation is (in terms of distances w.r.t. other points) |
title |
title to give to the plot |
displayPointLabels |
if TRUE, displays labels attached to points
(see |
pointLabelSize |
size of point labels
(default: 3.88 as in |
repelPointLabels |
if TRUE, uses |
displayArrowLabels |
if TRUE, displays arrows labels (only with biplot) |
arrowLabelSize |
size of arrow labels
(default: 3.88 as in |
repelArrowLabels |
if TRUE, uses |
arrowThreshold |
(only with biplot), arrows will be made barely visible if their length is (in absolute value) less than this threshold. |
flipXAxis |
if TRUE, take the opposite of x values (provided as it might ease low dimensional projection comparisons) |
flipYAxis |
if TRUE, take the opposite of y values (provided as it might ease low dimensional projection comparisons) |
displayPseudoRSq |
if TRUE, display pseudo RSquare in subtitle, on top of nb of dimensions |
... |
additional parameters passed to |
a ggplot object
ggplotSampleMDSWrapBiplots, ggplotSampleMDSShepard, computeMetricMDS
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999) # plot mds projection on axes 1 and 2, # use 'grpId' for colour, 'type' for shape, and no label p_12 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "grpId", pDataForShape = "type") # plot mds projection on axes 3 and 4, # use 'grpId' for colour, and 'name' as point label p_34 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(3,4), pDataForColour = "grpId", pDataForLabel = "name") # plot mds projection on axes 1 and 2, # use 'group' for colour, 'type' for shape, and 'name' as point label # have sample point size reflecting 'stress' # i.e. quality of projection w.r.t. distances to other points p12_Stress <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "grpId", pDataForLabel = "name", pDataForShape = "type", sizeReflectingStress = TRUE) # try to associate axes with median of each channel # => use bi-plot extVars <- channelSummaryStats( fsAll, channels = c("FSC-A", "SSC-A"), statFUNs = stats::median) bp_12 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), biplot = TRUE, extVariables = extVars, pDataForColour = "grpId", pDataForShape = "type", seed = 0) bp_34 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(3,4), biplot = TRUE, extVariables = extVars, pDataForColour = "grpId", pDataForLabel = "name", seed = 0)
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999) # plot mds projection on axes 1 and 2, # use 'grpId' for colour, 'type' for shape, and no label p_12 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "grpId", pDataForShape = "type") # plot mds projection on axes 3 and 4, # use 'grpId' for colour, and 'name' as point label p_34 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(3,4), pDataForColour = "grpId", pDataForLabel = "name") # plot mds projection on axes 1 and 2, # use 'group' for colour, 'type' for shape, and 'name' as point label # have sample point size reflecting 'stress' # i.e. quality of projection w.r.t. distances to other points p12_Stress <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "grpId", pDataForLabel = "name", pDataForShape = "type", sizeReflectingStress = TRUE) # try to associate axes with median of each channel # => use bi-plot extVars <- channelSummaryStats( fsAll, channels = c("FSC-A", "SSC-A"), statFUNs = stats::median) bp_12 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), biplot = TRUE, extVariables = extVars, pDataForColour = "grpId", pDataForShape = "type", seed = 0) bp_34 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(3,4), biplot = TRUE, extVariables = extVars, pDataForColour = "grpId", pDataForLabel = "name", seed = 0)
ggplotSampleMDSShepard
uses ggplot2
to provide plot of Metric MDS results.
Shepard diagram provides a scatter plot of :
on the x axis, the high dimensional pairwise distances between each sample pairs
on the y axis, the corresponding pairwise distances in the obtained low dimensional projection
ggplotSampleMDSShepard( mdsObj, nDim, title = "Multi Dimensional Scaling - Shepard's diagram", pointSize = 0.5, lineWidth = 0.5, displayPseudoRSq = TRUE )
ggplotSampleMDSShepard( mdsObj, nDim, title = "Multi Dimensional Scaling - Shepard's diagram", pointSize = 0.5, lineWidth = 0.5, displayPseudoRSq = TRUE )
mdsObj |
a MDS object, output of the |
nDim |
(optional) number of dimensions to use when calculating
Shepard's diagram and pseudoRSquare.
If missing, it will be set equal to the number of projection dimensions
as calculated in |
title |
title to give to the plot |
pointSize |
point size in plot |
lineWidth |
line width in plot |
displayPseudoRSq |
if TRUE, display pseudo RSquare in subtitle, on top of nb of dimensions |
a ggplot object
ggplotSampleMDS, computeMetricMDS
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999) # Shepard diagrams p2D <- ggplotSampleMDSShepard( mdsObj, nDim = 2, pointSize = 1, title = "Shepard with 2 dimensions") p3D <- ggplotSampleMDSShepard( mdsObj, nDim = 3, title = "Shepard with 3 dimensions") #' pDefD <- ggplotSampleMDSShepard( mdsObj, title = "Shepard with default nb of dimensions")
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999) # Shepard diagrams p2D <- ggplotSampleMDSShepard( mdsObj, nDim = 2, pointSize = 1, title = "Shepard with 2 dimensions") p3D <- ggplotSampleMDSShepard( mdsObj, nDim = 3, title = "Shepard with 3 dimensions") #' pDefD <- ggplotSampleMDSShepard( mdsObj, title = "Shepard with default nb of dimensions")
ggplotSampleMDSWrapBiplots
calls ggplotSampleMDS
repeatly to generate biplots with different sets of external variables
and align them in a grid using the patchwork
package, in a similar fashion
as ggplot2::facet_wrap()
does.
ggplotSampleMDSWrapBiplots( mdsObj, extVariableList, ncol = NULL, nrow = NULL, byrow = NULL, displayLegend = TRUE, ... )
ggplotSampleMDSWrapBiplots( mdsObj, extVariableList, ncol = NULL, nrow = NULL, byrow = NULL, displayLegend = TRUE, ... )
mdsObj |
a MDS object, output of the |
extVariableList |
should be a named list of external variable matrices Each element of the list should be a matrix with named columns corresponding to the variables. The number of rows should be the same as the number of samples. |
ncol |
passed to |
nrow |
passed to |
byrow |
passed to |
displayLegend |
if FALSE, will de-active the legend display |
... |
additional parameters passed to |
a ggplot object
ggplotSampleMDS, ggplotSampleMDSShepard, computeMetricMDS
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999) # plot mds projection on axes 1 and 2, # use 'group' for colour, 'type' for shape, and no label p_12 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "grpId", pDataForShape = "type") # try to associate axes with median or std deviation of each channel # => use bi-plots extVarList <- channelSummaryStats( fsAll, channels = c("FSC-A", "SSC-A"), statFUNs = c("median" = stats::median, "std.dev" = stats::sd)) bpFull <- ggplotSampleMDSWrapBiplots( mdsObj = mdsObj, extVariableList = extVarList, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "group", pDataForShape = "type", seed = 0)
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # As there are only 2 samples in OMIP021Samples dataset, # we create artificial samples that are random combinations of both samples ffList <- c( flowCore::flowSet_to_list(OMIP021Trans), lapply(3:5, FUN = function(i) { aggregateAndSample( OMIP021Trans, seed = 10*i, nTotalEvents = 5000)[,1:22] })) fsNames <- c("Donor1", "Donor2", paste0("Agg",1:3)) names(ffList) <- fsNames fsAll <- as(ffList,"flowSet") flowCore::pData(fsAll)$type <- factor(c("real", "real", rep("synthetic", 3))) flowCore::pData(fsAll)$grpId <- factor(c("D1", "D2", rep("Agg", 3))) # calculate all pairwise distances pwDist <- pairwiseEMDDist(fsAll, channels = c("FSC-A", "SSC-A"), verbose = FALSE) # compute Metric MDS object with explicit number of dimensions mdsObj <- computeMetricMDS(pwDist, nDim = 4, seed = 0) dim <- nDim(mdsObj) # should be 4 #' # compute Metric MDS object by reaching a target pseudo RSquare mdsObj2 <- computeMetricMDS(pwDist, seed = 0, targetPseudoRSq = 0.999) # plot mds projection on axes 1 and 2, # use 'group' for colour, 'type' for shape, and no label p_12 <- ggplotSampleMDS( mdsObj = mdsObj, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "grpId", pDataForShape = "type") # try to associate axes with median or std deviation of each channel # => use bi-plots extVarList <- channelSummaryStats( fsAll, channels = c("FSC-A", "SSC-A"), statFUNs = c("median" = stats::median, "std.dev" = stats::sd)) bpFull <- ggplotSampleMDSWrapBiplots( mdsObj = mdsObj, extVariableList = extVarList, pData = flowCore::pData(fsAll), projectionAxes = c(1,2), pDataForColour = "group", pDataForShape = "type", seed = 0)
Class representing Multi Dimensional Scaling (MDS) projection.
returns the value of the stress criterion, minimized by the SMACOF algorithm.
returns a vector of nPoints dimension, containing the stress
indicator per point. The stress
minimization criterion can indeed be
allocated per represented point. The more the stress of a particular point,
the less accurate its distances w.r.t. the other points.
## S4 method for signature 'MDS' show(object) nDim(x) nPoints(x) pwDist(x) projections(x) projDist(x) stress(x) spp(x) eigenVals(x) pctvar(x) RSq(x) RSqVec(x) GoF(x) smacofRes(x)
## S4 method for signature 'MDS' show(object) nDim(x) nPoints(x) pwDist(x) projections(x) projDist(x) stress(x) spp(x) eigenVals(x) pctvar(x) RSq(x) RSqVec(x) GoF(x) smacofRes(x)
object |
a |
x |
a |
nothing
nDim
numeric
, nb of dimensions of the projection
pwDist
An object of class dist
storing
the triangular relevant part of the symmetric, zero diagonal
pairwise distance matrix (nPoints * nPoints), BEFORE projection.
proj
The projection matrix, resulting from MDS
projDist
An object of class dist
storing
the triangular relevant part of the symmetric, zero diagonal
pairwise distance matrix (nPoints * nPoints), AFTER projection.
eigen
numeric
, vector of nDim
length, containing the eigen
values of the PCA that is applied after the Smacof algorithm.
pctvar
numeric
, vector of nDim
length, containing the percentage
of explained variance per axis.
RSq
numeric
, vector of pseudo R square indicators,
as a function of number of dimensions.
RSq[nDim]
is the global pseudo R square, as displayed on plots.
GoF
numeric
, vector of goodness of fit indicators,
as a function of number of dimensions.
GoF[nDim]
is the global goodness of fit.
Note pseudo R square and goodness of fit indicators are essentially the same indicator, only the definition of total sum of squares differ:
for pseudo RSq: TSS is calculated using the mean pairwise distance as minimum
for goodness of fit: TSS is calculated using 0 as minimum
smacofRes
an object of class 'smacofB' containing the algorithmic
optimization results, for example stress and stress per point,
as returned by smacof::smacofSym()
method.
nHD <- 10 nLD <- 2 nPoints <- 20 # generate uniformly distributed points in 10 dimensions points <- matrix( data = runif(n = nPoints * nHD), nrow = nPoints) # calculate euclidian distances pwDist <- dist(points) # compute Metric MDS object by reaching a target pseudo RSquare mdsObj <- computeMetricMDS(pwDist, targetPseudoRSq = 0.95) show(mdsObj)
nHD <- 10 nLD <- 2 nPoints <- 20 # generate uniformly distributed points in 10 dimensions points <- matrix( data = runif(n = nPoints * nHD), nrow = nPoints) # calculate euclidian distances pwDist <- dist(points) # compute Metric MDS object by reaching a target pseudo RSquare mdsObj <- computeMetricMDS(pwDist, targetPseudoRSq = 0.95) show(mdsObj)
Computation of all EMD between pairs of flowFrames belonging to a flowSet. This method provides three different input modes:
the user provides directly a flowCore::flowSet loaded in memory (RAM).
the user provides directly a list of expression matrices loaded in RAM, of which the column names are the channel/marker names
the user provides (1.) a number of samples nSamples
; (2.) an ad-hoc
function that takes as input an index between 1 and nSamples
, and codes
the method to load the corresponding expression matrix in memory;
Optional row and column ranges can be provided to limit the calculation
to a specific rectangle of the matrix. These i.e. can be specified as a way
to split heavy calculations of large distance matrices
on several computation nodes.
pairwiseEMDDist( x, rowRange = c(1, nSamples), colRange = c(min(rowRange), nSamples), loadExprMatrixFUN = NULL, loadExprMatrixFUNArgs = NULL, channels = NULL, verbose = FALSE, BPPARAM = BiocParallel::SerialParam(), BPOPTIONS = BiocParallel::bpoptions(packages = c("flowCore")), binSize = 0.05, minRange = -10, maxRange = 10 )
pairwiseEMDDist( x, rowRange = c(1, nSamples), colRange = c(min(rowRange), nSamples), loadExprMatrixFUN = NULL, loadExprMatrixFUNArgs = NULL, channels = NULL, verbose = FALSE, BPPARAM = BiocParallel::SerialParam(), BPOPTIONS = BiocParallel::bpoptions(packages = c("flowCore")), binSize = 0.05, minRange = -10, maxRange = 10 )
x |
can be:
|
rowRange |
the range of rows of the distance matrix to be calculated |
colRange |
the range of columns of the distance matrix to be calculated |
loadExprMatrixFUN |
the function used to translate an integer index
into an expression matrix. In other words, the function should code how to
load the |
loadExprMatrixFUNArgs |
(optional) a named list containing
additional input parameters of |
channels |
which channels (integer index(ices) or character(s)):
|
verbose |
if |
BPPARAM |
sets the |
BPOPTIONS |
sets the BPOPTIONS to be
passed to |
binSize |
size of equal bins to approximate the marginal distributions. |
minRange |
minimum value taken when approximating the marginal distributions |
maxRange |
maximum value taken when approximating the marginal distributions |
a distance matrix of pairwise distances (full symmetric with 0. diagonal)
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # calculate pairwise distances using only FSC-A & SSC-A channels pwDist <- pairwiseEMDDist( x = OMIP021Trans, channels = c("FSC-A", "SSC-A"))
library(CytoPipeline) data(OMIP021Samples) # estimate scale transformations # and transform the whole OMIP021Samples transList <- estimateScaleTransforms( ff = OMIP021Samples[[1]], fluoMethod = "estimateLogicle", scatterMethod = "linearQuantile", scatterRefMarker = "BV785 - CD3") OMIP021Trans <- CytoPipeline::applyScaleTransforms( OMIP021Samples, transList) # calculate pairwise distances using only FSC-A & SSC-A channels pwDist <- pairwiseEMDDist( x = OMIP021Trans, channels = c("FSC-A", "SSC-A"))