Entropy-based guidance of deep neural networks for accelerated convergence and improved performance. (English) Zbl 07897644

Summary: Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results for efficiently measuring the changes in entropy as fully-connected and convolutional neural networks process data. By tracking these entropy changes, patterns critical to a well-performing network can be visualized and identified. Entropy-based loss terms are then developed that improve the accuracy and efficiency of dense and convolutional models by promoting these ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate that these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy.
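
The summary names two ingredients: efficient layer-by-layer entropy measurements, and loss terms that steer training toward favorable entropy patterns. As a rough illustration of the second ingredient only, the PyTorch sketch below attaches a differentiable entropy penalty on hidden activations to an ordinary task loss. The soft-binning estimator, the helper names (activation_entropy, entropy_guided_loss), and the weight lam are illustrative assumptions, not the estimators or loss terms derived in the paper.

```python
import torch

def activation_entropy(z: torch.Tensor, bins: int = 32) -> torch.Tensor:
    """Differentiable Shannon-entropy estimate for a batch of layer
    activations via soft histogram binning. Generic illustration only,
    not the closed-form measurement derived in the paper."""
    z = z.flatten()
    lo, hi = z.min().detach(), z.max().detach()
    centers = torch.linspace(lo.item(), hi.item(), bins, device=z.device)
    width = (hi - lo) / bins + 1e-8
    # Soft-assign each activation to the histogram bins so gradients flow.
    weights = torch.exp(-((z.unsqueeze(1) - centers) ** 2) / (2 * width ** 2))
    p = weights.sum(dim=0)
    p = p / (p.sum() + 1e-8)  # normalized bin probabilities
    return -(p * torch.log(p + 1e-8)).sum()

def entropy_guided_loss(task_loss: torch.Tensor,
                        activations: list[torch.Tensor],
                        lam: float = 0.01) -> torch.Tensor:
    """Augment a task loss with an entropy penalty on hidden activations.
    The sign and per-layer weighting are hypothetical; they would depend
    on whichever entropy pattern one wants to promote."""
    penalty = sum(activation_entropy(a) for a in activations)
    return task_loss + lam * penalty
```

In a training loop, one would collect hidden activations from selected layers (for example, via forward hooks) and minimize entropy_guided_loss(criterion(output, target), activations). Soft binning is used here because a hard histogram would block gradients from reaching the activations.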

MSC:

68Txx Artificial intelligence
