sigma
The other important parameter for DiffusionMap is the Gaussian kernel width sigma (σ), which determines the transition probability between data points. The default call of destiny, DiffusionMap(data), is equivalent to DiffusionMap(data, 'local') and uses a local sigma per cell, derived from a local density estimate around each cell.
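
To make the role of the kernel width concrete, here is a minimal base-R sketch of row-normalised Gaussian transition probabilities. It is a simplified stand-in and not destiny's actual implementation; the toy data and the helper name transition_matrix are assumptions made for illustration.

```r
# Toy data: 5 points on a line (hypothetical, for illustration only)
x <- matrix(c(0, 1, 2, 5, 6), ncol = 1)

# Row-normalised Gaussian kernel: a simplified stand-in for the
# transition probabilities, not destiny's exact computation
transition_matrix <- function(x, sigma) {
  d2 <- as.matrix(dist(x))^2        # squared Euclidean distances
  k <- exp(-d2 / (2 * sigma^2))     # Gaussian kernel weights
  diag(k) <- 0                      # no self-transitions
  k / rowSums(k)                    # each row sums to 1
}

p_narrow <- transition_matrix(x, sigma = 0.5)
p_wide   <- transition_matrix(x, sigma = 5)
round(p_narrow, 3)
round(p_wide, 3)
```

With the narrow kernel, almost all of the probability mass in each row goes to the nearest neighbour; with the wide kernel, it is spread over more distant points as well.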
The 1.0 default, sigma = 'global', instead estimates sigma using a heuristic. It is also possible to specify this parameter manually to tweak the result. The eigenvector plot explained above will show a continuous decline instead of sharp drops if either the dataset is too big or sigma is chosen too small.
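
The effect of a too-small sigma on the eigenvalue spectrum can be reproduced on toy data. The following base-R sketch uses a plain normalised Gaussian kernel, which is an assumption and simpler than what destiny computes; the data and the helper name kernel_eigenvalues are hypothetical.

```r
set.seed(42)
# Hypothetical toy data: two well-separated 2-D clusters of 30 points each
x <- rbind(matrix(rnorm(60), ncol = 2),
           matrix(rnorm(60, mean = 8), ncol = 2))
d2 <- as.matrix(dist(x))^2

# Eigenvalues of the symmetrically normalised Gaussian kernel; they equal
# those of the corresponding row-normalised transition matrix
kernel_eigenvalues <- function(d2, sigma) {
  k <- exp(-d2 / (2 * sigma^2))
  d <- rowSums(k)
  m <- k / sqrt(outer(d, d))
  eigen(m, symmetric = TRUE, only.values = TRUE)$values  # largest first
}

ev_small <- kernel_eigenvalues(d2, sigma = 0.05)
ev_large <- kernel_eigenvalues(d2, sigma = 2)
round(head(ev_small, 10), 3)
round(head(ev_large, 10), 3)
```

For the small sigma the kernel is close to the identity matrix, so the leading eigenvalues decline only gradually. For the larger sigma the spectrum should drop off sharply after the first few eigenvalues.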
The sigma estimation algorithm is explained in detail in Haghverdi, Buettner, and Theis (2015). In brief, it works by finding a maximum in the slope of the log-log plot of local density versus sigma.
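
The idea behind the heuristic can be sketched in base R as follows. This is a rough illustration only, not the code behind find_sigmas: the toy data, the sigma grid, and the use of the summed kernel weights as "local density" are all assumptions.

```r
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)          # 100 toy points in 2-D (hypothetical data)
d2 <- as.matrix(dist(x))^2

sigmas <- 10^seq(-2, 1, length.out = 50)   # candidate kernel widths

# Mean log "local density": log of the summed Gaussian weights around
# each point (the self-weight of 1 keeps every sum at or above 1)
log_density <- sapply(sigmas, function(s) {
  mean(log(rowSums(exp(-d2 / (2 * s^2)))))
})

# Slope of the log-log curve via finite differences
slope <- diff(log_density) / diff(log(sigmas))

# Geometric midpoints of neighbouring sigmas, one per slope value
sigma_mid <- exp((head(log(sigmas), -1) + tail(log(sigmas), -1)) / 2)

# Candidate sigma: where the log-log slope is steepest
sigma_est <- sigma_mid[which.max(slope)]
sigma_est
```

Plotting log_density against log(sigmas) shows a curve that is flat for very small and very large sigma and rises in between; the estimate is the point of steepest rise.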
An efficient variant of that procedure is provided by find_sigmas. This function determines the optimal sigma for a subset of the given data and provides the default sigma for a DiffusionMap call. Due to a different starting point, the resulting sigma is different from above:
## [1] 10.8946
The resulting diffusion map’s approximation depends on the chosen sigma. Note that the sigma estimation heuristic only finds local optima and even the global optimum of the heuristic might not be ideal for your data.