The external R package igraph is required for the computation of the normalized mutual information to assess the results of the clustering.
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
We now run SIMLR as an example on an input dataset from Buettner, Florian, et al. “Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.” Nature biotechnology 33.2 (2015): 155-160. For this dataset we have a ground true of 3 cell populations, i.e., clusters.
set.seed(11111)
data(BuettnerFlorian)
example = SIMLR(X = BuettnerFlorian$in_X, c = BuettnerFlorian$n_clust, cores.ratio = 0)
## Computing the multiple Kernels.
## Performing network diffiusion.
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Iteration: 7
## Iteration: 8
## Iteration: 9
## Iteration: 10
## Iteration: 11
## Warning in SIMLR(X = BuettnerFlorian$in_X, c = BuettnerFlorian$n_clust, : Maybe
## you should set a larger value of c.
## Performing t-SNE.
## Epoch: Iteration # 100 error is: 0.09963693
## Epoch: Iteration # 200 error is: 0.06058465
## Epoch: Iteration # 300 error is: 0.05917217
## Epoch: Iteration # 400 error is: 0.05894224
## Epoch: Iteration # 500 error is: 0.05875543
## Epoch: Iteration # 600 error is: 0.05859694
## Epoch: Iteration # 700 error is: 0.05846076
## Epoch: Iteration # 800 error is: 0.05834064
## Epoch: Iteration # 900 error is: 0.05823708
## Epoch: Iteration # 1000 error is: 0.05814519
## Performing Kmeans.
## Performing t-SNE.
## Epoch: Iteration # 100 error is: 10.68672
## Epoch: Iteration # 200 error is: 0.7104486
## Epoch: Iteration # 300 error is: 0.5736699
## Epoch: Iteration # 400 error is: 0.7495346
## Epoch: Iteration # 500 error is: 0.4888358
## Epoch: Iteration # 600 error is: 0.7009221
## Epoch: Iteration # 700 error is: 0.5901729
## Epoch: Iteration # 800 error is: 0.4479219
## Epoch: Iteration # 900 error is: 0.4278976
## Epoch: Iteration # 1000 error is: 0.4230764
We now compute the normalized mutual information between the inferred clusters by SIMLR and the true ones. This measure with values in [0,1], allows us to assess the performance of the clustering with higher values reflecting better performance.
## [1] 0.888298
As a further understanding of the results, we now visualize the cell populations in a plot.
plot(example$ydata,
col = c(topo.colors(BuettnerFlorian$n_clust))[BuettnerFlorian$true_labs[,1]],
xlab = "SIMLR component 1",
ylab = "SIMLR component 2",
pch = 20,
main="SIMILR 2D visualization for BuettnerFlorian")
We also run SIMLR feature ranking on the same inputs to get a rank of the key genes with the related pvalues.
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32
## 33
## 34
## 35
## 36
## 37
## 38
## 39
## 40
## 41
## 42
## 43
## 44
## 45
## 46
## 47
## 48
## 49
## 50
## 51
## 52
## 53
## 54
## 55
## 56
## 57
## 58
## 59
## 60
## 61
## 62
## 63
## 64
## 65
## 66
## 67
## 68
## 69
## 70
## 71
## 72
## 73
## 74
## 75
## 76
## 77
## 78
## 79
## 80
## 81
## 82
## 83
## 84
## 85
## 86
## 87
## 88
## 89
## 90
## 91
## 92
## 93
## 94
## 95
## 96
## 97
## 98
## 99
## 100
## [1] 1.086748e-128 1.189327e-90 5.504924e-80 4.652359e-75 5.593957e-73
## [6] 3.373056e-69
## [1] 5701 1689 7549 57 2653 8081
Similarly, we show an example for SIMLR large scale on an input dataset being a reduced version of the dataset provided in Buettner, Zeisel, Amit, et al. “Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.” Science 347.6226 (2015): 1138-1142. For this dataset we have a ground true of 9 cell populations, i.e., clusters. But we should notice that here we use for computational reasons a reduced version of the data, so please refer to the original publication for the full data.
set.seed(11111)
data(ZeiselAmit)
example_large_scale = SIMLR_Large_Scale(X = ZeiselAmit$in_X, c = ZeiselAmit$n_clust, kk = 10)
## Performing fast PCA.
## Performing k-nearest neighbour search.
## Computing the multiple Kernels.
## Performing the iterative procedure 5 times.
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Performing Kmeans.
## Performing t-SNE.
## The main loop will be now performed with a maximum of 300 iterations.
## Performing iteration 1.
## Performing iteration 2.
## Performing iteration 3.
## Performing iteration 4.
## Performing iteration 5.
## Performing iteration 6.
## Performing iteration 7.
## Performing iteration 8.
## Performing iteration 9.
## Performing iteration 10.
## Performing iteration 11.
## Performing iteration 12.
## Performing iteration 13.
## Performing iteration 14.
## Performing iteration 15.
## Performing iteration 16.
## Performing iteration 17.
## Performing iteration 18.
## Performing iteration 19.
## Performing iteration 20.
## Performing iteration 21.
## Performing iteration 22.
## Performing iteration 23.
## Performing iteration 24.
## Performing iteration 25.
## Performing iteration 26.
## Performing iteration 27.
## Performing iteration 28.
## Performing iteration 29.
## Performing iteration 30.
## Performing iteration 31.
## Performing iteration 32.
## Performing iteration 33.
## Performing iteration 34.
## Performing iteration 35.
## Performing iteration 36.
## Performing iteration 37.
## Performing iteration 38.
## Performing iteration 39.
## Performing iteration 40.
## Performing iteration 41.
## Performing iteration 42.
## Performing iteration 43.
## Performing iteration 44.
## Performing iteration 45.
## Performing iteration 46.
## Performing iteration 47.
## Performing iteration 48.
## Performing iteration 49.
## Performing iteration 50.
## Performing iteration 51.
## Performing iteration 52.
## Performing iteration 53.
## Performing iteration 54.
## Performing iteration 55.
## Performing iteration 56.
## Performing iteration 57.
## Performing iteration 58.
## Performing iteration 59.
## Performing iteration 60.
## Performing iteration 61.
## Performing iteration 62.
## Performing iteration 63.
## Performing iteration 64.
## Performing iteration 65.
## Performing iteration 66.
## Performing iteration 67.
## Performing iteration 68.
## Performing iteration 69.
## Performing iteration 70.
## Performing iteration 71.
## Performing iteration 72.
## Performing iteration 73.
## Performing iteration 74.
## Performing iteration 75.
## Performing iteration 76.
## Performing iteration 77.
## Performing iteration 78.
## Performing iteration 79.
## Performing iteration 80.
## Performing iteration 81.
## Performing iteration 82.
## Performing iteration 83.
## Performing iteration 84.
## Performing iteration 85.
## Performing iteration 86.
## Performing iteration 87.
## Performing iteration 88.
## Performing iteration 89.
## Performing iteration 90.
## Performing iteration 91.
## Performing iteration 92.
## Performing iteration 93.
## Performing iteration 94.
## Performing iteration 95.
## Performing iteration 96.
## Performing iteration 97.
## Performing iteration 98.
## Performing iteration 99.
## Performing iteration 100.
## Performing iteration 101.
## Performing iteration 102.
## Performing iteration 103.
## Performing iteration 104.
## Performing iteration 105.
## Performing iteration 106.
## Performing iteration 107.
## Performing iteration 108.
## Performing iteration 109.
## Performing iteration 110.
## Performing iteration 111.
## Performing iteration 112.
## Performing iteration 113.
## Performing iteration 114.
## Performing iteration 115.
## Performing iteration 116.
## Performing iteration 117.
## Performing iteration 118.
## Performing iteration 119.
## Performing iteration 120.
## Performing iteration 121.
## Performing iteration 122.
## Performing iteration 123.
## Performing iteration 124.
## Performing iteration 125.
## Performing iteration 126.
## Performing iteration 127.
## Performing iteration 128.
## Performing iteration 129.
## Performing iteration 130.
## Performing iteration 131.
## Performing iteration 132.
## Performing iteration 133.
## Performing iteration 134.
## Performing iteration 135.
## Performing iteration 136.
## Performing iteration 137.
## Performing iteration 138.
## Performing iteration 139.
## Performing iteration 140.
## Performing iteration 141.
## Performing iteration 142.
## Performing iteration 143.
## Performing iteration 144.
## Performing iteration 145.
## Performing iteration 146.
## Performing iteration 147.
## Performing iteration 148.
## Performing iteration 149.
## Performing iteration 150.
## Performing iteration 151.
## Performing iteration 152.
## Performing iteration 153.
## Performing iteration 154.
## Performing iteration 155.
## Performing iteration 156.
## Performing iteration 157.
## Performing iteration 158.
## Performing iteration 159.
## Performing iteration 160.
## Performing iteration 161.
## Performing iteration 162.
## Performing iteration 163.
## Performing iteration 164.
## Performing iteration 165.
## Performing iteration 166.
## Performing iteration 167.
## Performing iteration 168.
## Performing iteration 169.
## Performing iteration 170.
## Performing iteration 171.
## Performing iteration 172.
## Performing iteration 173.
## Performing iteration 174.
## Performing iteration 175.
## Performing iteration 176.
## Performing iteration 177.
## Performing iteration 178.
## Performing iteration 179.
## Performing iteration 180.
## Performing iteration 181.
## Performing iteration 182.
## Performing iteration 183.
## Performing iteration 184.
## Performing iteration 185.
## Performing iteration 186.
## Performing iteration 187.
## Performing iteration 188.
## Performing iteration 189.
## Performing iteration 190.
## Performing iteration 191.
## Performing iteration 192.
## Performing iteration 193.
## Performing iteration 194.
## Performing iteration 195.
## Performing iteration 196.
## Performing iteration 197.
## Performing iteration 198.
## Performing iteration 199.
## Performing iteration 200.
## Performing iteration 201.
## Performing iteration 202.
## Performing iteration 203.
## Performing iteration 204.
## Performing iteration 205.
## Performing iteration 206.
## Performing iteration 207.
## Performing iteration 208.
## Performing iteration 209.
## Performing iteration 210.
## Performing iteration 211.
## Performing iteration 212.
## Performing iteration 213.
## Performing iteration 214.
## Performing iteration 215.
## Performing iteration 216.
## Performing iteration 217.
## Performing iteration 218.
## Performing iteration 219.
## Performing iteration 220.
## Performing iteration 221.
## Performing iteration 222.
## Performing iteration 223.
## Performing iteration 224.
## Performing iteration 225.
## Performing iteration 226.
## Performing iteration 227.
## Performing iteration 228.
## Performing iteration 229.
## Performing iteration 230.
## Performing iteration 231.
## Performing iteration 232.
## Performing iteration 233.
## Performing iteration 234.
## Performing iteration 235.
## Performing iteration 236.
## Performing iteration 237.
## Performing iteration 238.
## Performing iteration 239.
## Performing iteration 240.
## Performing iteration 241.
## Performing iteration 242.
## Performing iteration 243.
## Performing iteration 244.
## Performing iteration 245.
## Performing iteration 246.
## Performing iteration 247.
## Performing iteration 248.
## Performing iteration 249.
## Performing iteration 250.
## Performing iteration 251.
## Performing iteration 252.
## Performing iteration 253.
## Performing iteration 254.
## Performing iteration 255.
## Performing iteration 256.
## Performing iteration 257.
## Performing iteration 258.
## Performing iteration 259.
## Performing iteration 260.
## Performing iteration 261.
## Performing iteration 262.
## Performing iteration 263.
## Performing iteration 264.
## Performing iteration 265.
## Performing iteration 266.
## Performing iteration 267.
## Performing iteration 268.
## Performing iteration 269.
## Performing iteration 270.
## Performing iteration 271.
## Performing iteration 272.
## Performing iteration 273.
## Performing iteration 274.
## Performing iteration 275.
## Performing iteration 276.
## Performing iteration 277.
## Performing iteration 278.
## Performing iteration 279.
## Performing iteration 280.
## Performing iteration 281.
## Performing iteration 282.
## Performing iteration 283.
## Performing iteration 284.
## Performing iteration 285.
## Performing iteration 286.
## Performing iteration 287.
## Performing iteration 288.
## Performing iteration 289.
## Performing iteration 290.
## Performing iteration 291.
## Performing iteration 292.
## Performing iteration 293.
## Performing iteration 294.
## Performing iteration 295.
## Performing iteration 296.
## Performing iteration 297.
## Performing iteration 298.
## Performing iteration 299.
## Performing iteration 300.
We compute the normalized mutual information between the inferred clusters by SIMLR large scale and the true ones.
## [1] 0.04158302
As a further understanding of the results, also in this case we visualize the cell populations in a plot.
plot(example_large_scale$ydata,
col = c(topo.colors(ZeiselAmit$n_clust))[ZeiselAmit$true_labs[,1]],
xlab = "SIMLR component 1",
ylab = "SIMLR component 2",
pch = 20,
main="SIMILR 2D visualization for ZeiselAmit")
Now, as a final example, we also provide the results of two heuristics (see the original SIMLR paper) to estimate the number of clusters from data.
set.seed(53900)
NUMC = 2:5
res_example = SIMLR_Estimate_Number_of_Clusters(BuettnerFlorian$in_X,
NUMC = NUMC,
cores.ratio = 0)
Best number of clusters, K1 heuristic:
## [1] 2
K2 heuristic:
## [1] 2
Results of the two heuristics:
## $K1
## [1] -63.04223 -19.49278 -23.77182 13.30109
##
## $K2
## [1] -94.56335 -25.99037 -29.71477 15.96130