×

T-SNE, forceful colorings, and mean field limits. (English) Zbl 07562340

Summary: t-SNE is a commonly used force-based nonlinear dimensionality reduction method. This paper has two contributions: the first is forceful colorings, an idea that is also applicable to other force-based methods (UMAP, ForceAtlas2,…). In every equilibrium, the attractive and repulsive forces acting on a particle cancel out: however, both the size and the direction of the attractive (or repulsive) forces acting on a particle are related to its properties: the force vector can serve as an additional feature. Secondly, we analyze the case of t-SNE acting on a single homogeneous cluster (modeled by affinities coming from the adjacency matrix of a random \(k\)-regular graph); we derive a mean-field model that leads to interesting questions in classical calculus of variations. The model predicts that, in the limit, the t-SNE embedding of a single perfectly homogeneous cluster is not a point but a thin annulus of diameter \(\sim k^{-1/4}n^{-1/4}\). This is supported by numerical results. The mean field ansatz extends to other force-based dimensionality reduction methods. query Please check the edit made in the article title.

MSC:

68R12 Metric embeddings as related to computational problems and algorithms
82M99 Basic methods in statistical mechanics
62R07 Statistical aspects of big data and data science

References:

[1] Alon, N.; Chung, FK, Explicit construction of linear sized tolerant networks, Discrete Math., 72, 15-19 (1989) · Zbl 0657.05068 · doi:10.1016/0012-365X(88)90189-6
[2] Arora, S.; Hu, W.; Kothari, PK, An analysis of the t-SNE algorithm for data visualization, Proc. Mach. Learn. Res., 75, 1-8 (2018)
[3] Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in Neural Information Processing Systems, pp. 585-591 (2002)
[4] Blaschke, W., Eine isoperimetrische Eigenschaft des Kreises, Math. Z, 1, 52-57 (1918) · JFM 46.0426.01 · doi:10.1007/BF01726042
[5] Böhm, J. N., Berens, P., Kobak, D.: A Unifying Perspective on Neighbor Embeddings along the Attraction-Repulsion Spectrum, arXiv:2007.08902
[6] Bonnet, G., Gusakova, A., Thäle, C., Zaporozhets, D.: Sharp inequalities for the mean distance of random points in convex bodies, arXiv:2010.03351 · Zbl 1467.52010
[7] Burgstaller, B.; Pillichshammer, F., The average distance between two points, Bull. Aust. Math. Soc., 80, 353-359 (2009) · Zbl 1183.52006 · doi:10.1017/S0004972709000707
[8] Tony Cai, T., Ma, R.: Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data, arXiv:2105.07536
[9] Carreira-Perpinan, M.A.: The elastic embedding algorithm for dimensionality reduction. In: International Conference on Machine Learning, vol. 10, pp. 167-174 (2010)
[10] Coifman, R.; Lafon, S., Diffusion maps, Appl. Comput. Harmonic Anal., 21, 5-30 (2006) · Zbl 1095.68094 · doi:10.1016/j.acha.2006.04.006
[11] Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems, pp. 857-864 (2003)
[12] Jacomy, M., Venturini, T., Heymann, S., Bastian, M.: ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PloS ONE, 9(6), (2014)
[13] Kobak, D., Linderman, G.C.: UMAP does not preserve global structure any better than t-SNE when using the same initialization, bioRxiv 2019.12.19.877522
[14] Kobak, D., Linderman, G., Steinerberger, S., Kluger, Y., Berens, P.: Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations, ECML PKDD 2019, Würzburg, Germany, September 16-20, (2019)
[15] Kobak, D.; Berens, P., The art of using t-SNE for single-cell transcriptomics, Nat. Commun., 10, 5416 (2019) · doi:10.1038/s41467-019-13056-x
[16] Liao, J.; Berg, A., Sharpening Jensen’s inequality, Am. Stat., 73, 278-281 (2019) · Zbl 07588155 · doi:10.1080/00031305.2017.1419145
[17] Linderman, G.; Steinerberger, S., Clustering with t-SNE, provably, SIAM J. Math. Data Sci., 1, 313-332 (2019) · Zbl 1499.60259 · doi:10.1137/18M1216134
[18] Linderman, G.; Rachh, M.; Hoskins, JG; Steinerberger, S.; Kluger, Y., Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data, Nat. Meth., 16, 243 (2019) · doi:10.1038/s41592-018-0308-4
[19] McInnes, L., Healy, J., Melville, J.: UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426, (2018)
[20] Pfiefer, R., Maximum and minimum sets for some geometric mean values, J. Theor. Probab., 3, 169-179 (1990) · Zbl 0715.60010 · doi:10.1007/BF01045156
[21] Tang, J., Liu, J., Zhang, M., Mei, Q.: Visualizing large-scale and highdimensional data. In: International Conference on World Wide Web, pp. 287-297 (2016)
[22] van der Maaten, L.; Hinton, G., Visualizing data using t-SNE, J. Mach. Learn. Res., 9, 2008, 2579-2605 (2008) · Zbl 1225.68219
[23] Wang, Y., Huang, H., Rudin, C., Shaposhnik, Y.: Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization, arXiv:2012.04456
[24] Wattenberg, M., Viegas, F., Johnson, I.: How to use t-SNE effectively. Distill 1, e2 (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.