
Conditional density estimation, latent variable discovery, and optimal transport. (English) Zbl 07513414

Summary: A framework is proposed that addresses both conditional density estimation and latent variable discovery. The objective function maximizes explanation of variability in the data, achieved through the optimal transport barycenter generalized to a collection of conditional distributions indexed by a covariate – either given or latent – in any suitable space. Theoretical results establish the existence of barycenters, a minimax formulation of optimal transport maps, and a general characterization of variability via the optimal transport cost. This framework leads to a family of nonparametric neural network-based algorithms, the BaryNet, with a supervised version that estimates conditional distributions and an unsupervised version that assigns latent variables. The efficacy of BaryNets is demonstrated by tests on both artificial and real-world data sets. A parallel drawn between autoencoders and the barycenter framework leads to the Barycentric autoencoder algorithm (BAE).
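For orientation, the supervised objective can be read as a minimax problem: a transport map pulls each conditional distribution toward a shared barycenter at minimal cost, while an adversary tests whether the mapped samples still carry information about the covariate. Below is a minimal PyTorch sketch of that reading, not the authors' BaryNet implementation; the quadratic cost, the shuffled-covariate critic, and all names (`transport_map`, `critic`, `lam`) are illustrative assumptions.

```python
# Minimal sketch (not the authors' BaryNet code) of the barycenter idea:
# a map T(y, z) transports each conditional distribution rho(y|z) toward a
# common barycenter at small quadratic cost, while an adversarial critic
# penalizes residual dependence of the mapped samples on the covariate z.
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=64):
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, d_out))

d_y, d_z, lam = 1, 1, 10.0
transport_map = mlp(d_y + d_z, d_y)   # (y, z) -> barycenter sample
critic = mlp(d_y + d_z, 1)            # scores dependence of (T(y,z), z)
opt_T = torch.optim.Adam(transport_map.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)

for step in range(2000):
    # toy conditional data: y | z ~ N(sin(z), 0.1^2), z ~ Uniform(-2, 2)
    z = 4 * torch.rand(256, d_z) - 2
    y = torch.sin(z) + 0.1 * torch.randn(256, d_y)
    z_perm = z[torch.randperm(len(z))]  # shuffling breaks the (t, z) pairing

    # critic ascent: contrast paired vs. shuffled covariates; if T(y, z) is
    # independent of z, the two batches are indistinguishable and dep -> 0
    # (a real implementation would also constrain the critic, as in [5])
    t = transport_map(torch.cat([y, z], dim=1)).detach()
    dep = (critic(torch.cat([t, z], dim=1)).mean()
           - critic(torch.cat([t, z_perm], dim=1)).mean())
    opt_c.zero_grad(); (-dep).backward(); opt_c.step()

    # map descent: quadratic transport cost plus dependence penalty
    t = transport_map(torch.cat([y, z], dim=1))
    cost = ((t - y) ** 2).sum(dim=1).mean()
    dep = (critic(torch.cat([t, z], dim=1)).mean()
           - critic(torch.cat([t, z_perm], dim=1)).mean())
    opt_T.zero_grad(); (cost + lam * dep).backward(); opt_T.step()
```

This is only a caricature: the paper states its minimax formulation at the level of optimal transport maps and barycenters, and dedicated saddle-point solvers (see, e.g., [14], [29], [32] below) would replace the naive alternating gradient steps used here.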

MSC:

62-XX Statistics
68-XX Computer science

References:

[1] Advani, M. S.; Saxe, A. M. High-dimensional dynamics of generalization error in neural networks. Preprint, 2017. 1710.03667 [stat.ML]
[2] Agnelli, J. P.; Cadeiras, M.; Tabak, E. G.; Turner, C. V.; Vanden-Eijnden, E. Clustering and classification through normalizing flows in feature space. Multiscale Model. Simul. 8 (2010), no. 5, 1784-1802. doi: https://doi.org/10.1137/100783522 · Zbl 1208.62100 · doi:10.1137/100783522
[3] Agueh, M.; Carlier, G. Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43 (2011), no. 2, 904-924. doi: https://doi.org/10.1137/100805741 · Zbl 1223.49045 · doi:10.1137/100805741
[4] Alpaydin, E. Introduction to machine learning. Second edition. MIT Press, Cambridge, Mass., 2009.
[5] Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. Preprint, 2017. 1701.07875 [stat.ML]
[6] Billingsley, P. Convergence of probability measures. Second edition. Wiley, New York, 1999. · Zbl 0944.60003
[7] Bishop, C. M. Mixture density networks. Working Paper. Aston University, Birmingham, UK, 1994.
[8] Bishop, C. M. Pattern recognition and machine learning. Information Science and Statistics. Springer, New York, 2006. doi: https://doi.org/10.1007/978-0-387-45528-0 · Zbl 1107.68072 · doi:10.1007/978-0-387-45528-0
[9] Carlier, G.; Ekeland, I. Matching for teams. Econom. Theory 42 (2010), no. 2, 397-418. doi: https://doi.org/10.1007/s00199-008-0415-z · Zbl 1183.91112 · doi:10.1007/s00199-008-0415-z
[10] Chang, J. T.; Pollard, D. Conditioning as disintegration. Statist. Neerlandica 51 (1997), no. 3, 287-317. doi: https://doi.org/10.1111/1467-9574.00056 · Zbl 0889.62003 · doi:10.1111/1467-9574.00056
[11] Chiappori, P.-A.; McCann, R. J.; Nesheim, L. P. Hedonic price equilibria, stable matching, and optimal transport: equivalence, topology, and uniqueness. Econom. Theory 42 (2010), no. 2, 317-354. doi: https://doi.org/10.1007/s00199-009-0455-z · Zbl 1183.91056 · doi:10.1007/s00199-009-0455-z
[12] E, W.; Ma, C.; Wu, L. A priori estimates of the population risk for two-layer neural networks. Commun. Math. Sci. 17 (2019), no. 5, 1407-1425. doi: https://doi.org/10.4310/CMS.2019.v17.n5.a11 · Zbl 1427.68277 · doi:10.4310/CMS.2019.v17.n5.a11
[13] E, W.; Ma, C.; Wu, L. Barron spaces and the compositional function spaces for neural network models. Preprint, 2019. 1906.08039 [cs.LG]
[14] Essid, M.; Tabak, E.; Trigila, G. An implicit gradient-descent procedure for minimax problems. Preprint, 2019. 1906.00233 [math.OC]
[15] Goodfellow, I.; Bengio, Y.; Courville, A. Deep learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, Mass., 2016. · Zbl 1373.68009
[16] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Advances in neural information processing systems, 2672-2680, 2014.
[17] Gretton, A.; Borgwardt, K. M.; Rasch, M. J.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 13 (2012), 723-773. · Zbl 1283.62095
[18] Hastie, T.; Tibshirani, R.; Friedman, J.; Franklin, J. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer 27 (2005), no. 2, 83-85.
[19] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, 770-778.
[20] Holmes, M. P.; Gray, A. G.; Isbell, C. L. Fast nonparametric conditional density estimation. Preprint, 2012. 1206.5278 [stat.ME]
[21] Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks 4 (1991), no. 2, 251-257. doi: https://doi.org/10.1016/0893-6080(91)90009-T · doi:10.1016/0893-6080(91)90009-T
[22] Hyndman, R. J.; Bashtannyk, D. M.; Grunwald, G. K. Estimating and visualizing conditional densities. J. Comput. Graph. Statist. 5 (1996), no. 4, 315-336. doi: https://doi.org/10.2307/1390887 · doi:10.2307/1390887
[23] Kallenberg, O. Random measures, theory and applications. Probability Theory and Stochastic Modelling, 77. Springer, Cham, 2017. doi: https://doi.org/10.1007/978-3-319-41598-7 · Zbl 1376.60003 · doi:10.1007/978-3-319-41598-7
[24] Kantorovich, L. V. On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS (N.S.) 37 (1942), 199-201. · Zbl 0061.09705
[25] Kim, Y.-H.; Pass, B. Wasserstein barycenters over Riemannian manifolds. Adv. Math. 307 (2017), 640-683. doi: https://doi.org/10.1016/j.aim.2016.11.026 · Zbl 1373.60006 · doi:10.1016/j.aim.2016.11.026
[26] Kingma, D. P.; Welling, M. Auto-encoding variational Bayes. Preprint, 2013. 1312.6114 [stat.ML]
[27] Li, X.; Plataniotis, K. N. A complete color normalization approach to histopathology images using color cues computed from saturation-weighted statistics. IEEE Transactions on Biomedical Engineering 62 (2015), no. 7, 1862-1873.
[28] Li, M.; Soltanolkotabi, M.; Oymak, S. Gradient descent with early stopping is provably robust to label noise for overparameterized neural networks. Preprint, 2019. 1903.11680 [cs.LG]
[29] Lin, T.; Jin, C.; Jordan, M. I. On gradient descent ascent for nonconvex-concave minimax problems. Preprint, 2019. 1906.00331 [cs.LG]
[30] Lu, Z.; Pu, H.; Wang, F.; Hu, Z.; Wang, L. The expressive power of neural networks: a view from the width. Advances in neural information processing systems, 6231-6239, 2017.
[31] Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.; Frey, B. Adversarial autoencoders. Preprint, 2015. 1511.05644 [cs.LG]
[32] Mertikopoulos, P.; Lecouat, B.; Zenati, H.; Foo, C.-S.; Chandrasekhar, V.; Piliouras, G. Optimistic mirror descent in saddle-point problems: going the extra (gradient) mile. International Conference on Learning Representations, 1-23. ICLR, 2019.
[33] Mirza, M.; Osindero, S. Conditional generative adversarial nets. Preprint, 2014. 1411.1784 [cs.LG]
[34] Monge, G. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris (1781).
[35] NOAA. Daily temperature data set. https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/
[36] NOAA. Hourly temperature data set. https://www1.ncdc.noaa.gov/pub/data/uscrn/products/hourly02/
[37] Pass, B. Multi-marginal optimal transport and multi-agent matching problems: uniqueness and structure of solutions. Preprint, 2012. 1210.7372 [math.AP]
[38] Pass, B. Optimal transportation with infinitely many marginals. J. Funct. Anal. 264 (2013), no. 4, 947-963. doi: https://doi.org/10.1016/j.jfa.2012.12.002 · Zbl 1258.49073 · doi:10.1016/j.jfa.2012.12.002
[39] Rabin, J.; Peyré, G.; Delon, J.; Bernot, M. Wasserstein barycenter and its application to texture mixing. International Conference on Scale Space and Variational Methods in Computer Vision, 435-446. Springer, 2011.
[40] Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint, 2015. 1511.06434 [cs.LG]
[41] Rahaman, N.; Baratin, A.; Arpit, D.; Draxler, F.; Lin, M.; Hamprecht, F. A.; Bengio, Y.; Courville, A. On the spectral bias of neural networks. Preprint, 2018. 1806.08734 [stat.ML]
[42] Simons, S. Minimax theorems and their proofs. Minimax and applications, 1-23. Nonconvex Optimization and Its Applications, 4. Kluwer Acad., Dordrecht, 1995. · Zbl 0862.49010
[43] Sipser, M. A definition of information. Introduction to the theory of computation. Third edition. Cengage Learning, Boston, 2013.
[44] Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. Advances in neural information processing systems, 3483-3491, 2015.
[45] Tabak, E. G.; Trigila, G. Explanation of variability and removal of confounding factors from data through optimal transport. Comm. Pure Appl. Math. 71 (2018), no. 1, 163-199. doi: https://doi.org/10.1002/cpa.21706 · Zbl 1381.49052 · doi:10.1002/cpa.21706
[46] Tabak, E. G.; Trigila, G.; Zhao, W. Conditional density estimation and simulation through optimal transport. Mach. Learn. 109 (2020), no. 4, 665-688. · Zbl 1496.62068
[47] Tolstikhin, I.; Bousquet, O.; Gelly, S.; Schoelkopf, B. Wasserstein auto-encoders. Preprint, 2017. 1711.01558 [stat.ML]
[48] Trippe, B. L.; Turner, R. E. Conditional density estimation with Bayesian normalising flows. Preprint, 2018. 1802.04908 [stat.ML]
[49] USGS. Centennial earthquake catalog, 2008. https://earthquake.usgs.gov/data/centennial/centennial_Y2K.CAT
[50] Villani, C. Topics in optimal transportation. Graduate Studies in Mathematics, 58. American Mathematical Society, Providence, R.I., 2003. doi: https://doi.org/10.1090/gsm/058 · Zbl 1106.90001 · doi:10.1090/gsm/058
[51] Villani, C. Optimal transport. Old and new. Grundlehren der mathematischen Wissenschaften, 338. Springer, Berlin, 2009. doi: https://doi.org/10.1007/978-3-540-71050-9 · Zbl 1156.53003 · doi:10.1007/978-3-540-71050-9
[52] Xu, Z.-Q. J.; Zhang, Y.; Luo, T.; Xiao, Y.; Ma, Z. Frequency principle: Fourier analysis sheds light on deep neural networks. Preprint, 2019. 1901.06523 [cs.LG]
[53] Yang, H.; Tabak, E. G. Clustering, factor discovery and optimal transport. Preprint, 2019. 1902.10288 [math.OC]