On dynamical system modeling of learned primal-dual with a linear operator \(\mathcal{K}\): stability and convergence properties. (English) Zbl 07867349

Summary: Learned Primal-Dual (LPD) is a deep-learning-based method for composite optimization problems, obtained by unrolling/unfolding the primal-dual hybrid gradient algorithm. While LPD has achieved great success in applications, its mathematical interpretation as a truncated iterative scheme is not necessarily sufficient to fully explain its properties. In this paper, we study LPD with a general linear operator. We model the forward propagation of LPD as a system of difference equations in the discrete-time setting and as a system of differential equations in the continuous-time setting (for the primal and dual variables/trajectories), named discrete-time LPD and continuous-time LPD, respectively. We give forward analyses, including stability results and the convergence of the state variables of discrete-time LPD to the solution of continuous-time LPD. Moreover, we analyze the learning problems, with and without regularization terms, of both discrete-time and continuous-time LPD from the optimal control viewpoint. We prove convergence results for their optimal solutions with respect to the network state initialization and the training data, showing in a certain sense the topological stability of the learning problems. We also establish, via a piecewise linear extension, convergence of the solution of the discrete-time LPD learning problem to that of the continuous-time LPD learning problem, under appropriate assumptions on the space of learnable parameters. This study theoretically demonstrates the robustness of the LPD structure and of the associated training process, and may motivate future research and applications.
{© 2024 IOP Publishing Ltd}
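For orientation, here is a minimal sketch of the iteration that LPD unrolls, following [7] for the primal-dual hybrid gradient step and [2] for the learned version; the layer maps \(\Lambda_{\theta_k}\), \(\Gamma_{\theta_k}\) below are illustrative notation, not necessarily the paper's exact parametrization. For the composite problem \(\min_x f(\mathcal{K}x)+g(x)\), rewritten as the saddle-point problem \(\min_x\max_y \langle \mathcal{K}x,y\rangle + g(x) - f^*(y)\), one step with step sizes \(\sigma,\tau>0\) and extrapolation parameter \(\theta\in[0,1]\) reads
\[
\begin{aligned}
y^{k+1} &= \operatorname{prox}_{\sigma f^*}\!\bigl(y^{k}+\sigma\mathcal{K}\bar{x}^{k}\bigr),\\
x^{k+1} &= \operatorname{prox}_{\tau g}\!\bigl(x^{k}-\tau\mathcal{K}^{*}y^{k+1}\bigr),\\
\bar{x}^{k+1} &= x^{k+1}+\theta\,\bigl(x^{k+1}-x^{k}\bigr).
\end{aligned}
\]
Discrete-time LPD replaces the proximal maps by learned, layer-dependent mappings,
\[
y^{k+1}=\Lambda_{\theta_k}\bigl(y^{k},\mathcal{K}x^{k}\bigr),\qquad
x^{k+1}=\Gamma_{\theta_k}\bigl(x^{k},\mathcal{K}^{*}y^{k+1}\bigr),
\]
and continuous-time LPD is the associated coupled system of differential equations for the primal and dual trajectories obtained as the step size vanishes.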

MSC:

68T07 Artificial neural networks and deep learning
Full Text: DOI

References:

[1] Adams, R. A.; Fournier, J. J., Sobolev Spaces, 2003, Elsevier · Zbl 1098.46001
[2] Adler, J.; Öktem, O., Learned primal-dual reconstruction, IEEE Trans. Med. Imaging, 37, 1322-32, 2018 · doi:10.1109/TMI.2018.2799231
[3] Arridge, S.; Maass, P.; Öktem, O.; Schönlieb, C-B, Solving inverse problems using data-driven models, Acta Numer., 28, 1-174, 2019 · Zbl 1429.65116 · doi:10.1017/S0962492919000059
[4] Arrow, K. J.; Hurwicz, L.; Uzawa, H., Studies in Linear and non-Linear Programming, 1958, Stanford University Press · Zbl 0091.16002
[5] Braides, A., Gamma-Convergence for Beginners, vol 22, 2002, Clarendon · Zbl 1198.49001
[6] Cai, J-F; Dong, B.; Osher, S.; Shen, Z., Image restoration: total variation, wavelet frames and beyond, J. Am. Math. Soc., 25, 1033-89, 2012 · Zbl 1277.35019 · doi:10.1090/S0894-0347-2012-00740-1
[7] Chambolle, A.; Pock, T., A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis., 40, 120-45, 2011 · Zbl 1255.68217 · doi:10.1007/s10851-010-0251-1
[8] Chang, B.; Chen, M.; Haber, E.; Chi, E. H., AntisymmetricRNN: a dynamical system view on recurrent neural networks, 2019
[9] Chen, X.; Zhang, Y.; Reisinger, C.; Song, L., Understanding deep architecture with reasoning layer, 2020
[10] Chen, Y.; Lan, G.; Ouyang, Y., Optimal primal-dual methods for a class of saddle point problems, SIAM J. Optim., 24, 1779-814, 2014 · Zbl 1329.90090 · doi:10.1137/130919362
[11] Chen, Y.; Pock, T., Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration, IEEE Trans. Pattern Anal. Mach. Intell., 39, 1256-72, 2016 · doi:10.1109/TPAMI.2016.2596743
[12] Combettes, P. L.; Pesquet, J-C, Deep neural network structures solving variational inequalities, Set Valued Var. Anal., 28, 491-518, 2020 · Zbl 1448.49014 · doi:10.1007/s11228-019-00526-z
[13] Dal Maso, G., An Introduction to Γ-Convergence, vol 8, 2012, Springer
[14] Daubechies, I.; Defrise, M.; De Mol, C., An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math., 57, 1413-57, 2004 · Zbl 1077.65055 · doi:10.1002/cpa.20042
[15] Dong, C.; Loy, C. C.; He, K.; Tang, X., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell., 38, 295-307, 2016 · doi:10.1109/TPAMI.2015.2439281
[16] Dragomir, S. S., Some Gronwall type inequalities and applications, 2002 (available at: http://rgmia.vu.edu.au/SSDragomirWeb.html)
[17] Ehrhardt, M. J.; Thielemans, K.; Pizarro, L.; Atkinson, D.; Ourselin, S.; Hutton, B. F.; Arridge, S. R., Joint reconstruction of PET-MRI by exploiting structural similarity, Inverse Problems, 31, 2014 · Zbl 1320.92057 · doi:10.1088/0266-5611/31/1/015001
[18] Elliott, C. M., On the convergence of a one-step method for the numerical solution of an ordinary differential inclusion, IMA J. Numer. Anal., 5, 3-21, 1985 · Zbl 0598.65052 · doi:10.1093/imanum/5.1.3
[19] Ernst, P.; Chatterjee, S.; Rose, G.; Speck, O.; Nürnberger, A., Sinogram upsampling using primal-dual UNet for undersampled CT and radial MRI reconstruction, Neural Netw., 166, 704-21, 2023 · doi:10.1016/j.neunet.2023.08.004
[20] Esser, E.; Zhang, X.; Chan, T. F., A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM J. Imaging Sci., 3, 1015-46, 2010 · Zbl 1206.90117 · doi:10.1137/09076934X
[21] Frecon, J.; Salzo, S.; Pontil, M., Bilevel learning of the group lasso structure, 2018
[22] Gao, Y.; Pan, X.; Chen, C., An extended primal-dual algorithm framework for nonconvex problems: application to image reconstruction in spectral CT, Inverse Problems, 38, 2022 · Zbl 1510.65044 · doi:10.1088/1361-6420/ac79c8
[23] Géron, A., Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, 2022, O’Reilly Media, Inc.
[24] Gilton, D.; Ongie, G.; Willett, R., Deep equilibrium architectures for inverse problems in imaging, IEEE Trans. Comput. Imaging, 7, 1123-33, 2021 · doi:10.1109/TCI.2021.3118944
[25] Glowinski, R.; Osher, S. J.; Yin, W., Splitting Methods in Communication, Imaging, Science and Engineering, 2017, Springer
[26] Goldstein, T.; Li, M.; Yuan, X.; Esser, E.; Baraniuk, R., Adaptive primal-dual hybrid gradient methods for saddle-point problems, 2013
[27] Gregor, K.; LeCun, Y., Learning fast approximations of sparse coding, pp 399-406, 2010
[28] Haber, E.; Lensink, K.; Treister, E.; Ruthotto, L., IMEXnet: a forward stable deep neural network, pp 2525-34, 2019
[29] Haber, E.; Ruthotto, L., Stable architectures for deep neural networks, Inverse Problems, 34, 2017 · Zbl 1426.68236 · doi:10.1088/1361-6420/aa9a90
[30] He, B.; Ma, F.; Xu, S.; Yuan, X., A generalized primal-dual algorithm with improved convergence condition for saddle point problems, SIAM J. Imaging Sci., 15, 1157-83, 2022 · Zbl 1494.94009 · doi:10.1137/21M1453463
[31] He, B.; You, Y.; Yuan, X., On the convergence of primal-dual hybrid gradient algorithm, SIAM J. Imaging Sci., 7, 2526-37, 2014 · Zbl 1308.90129 · doi:10.1137/140963467
[32] He, B.; Yuan, X., Convergence analysis of primal-dual algorithms for total variation image restoration, 2010, Technical Report
[33] He, K.; Zhang, X.; Ren, S.; Sun, J., Deep residual learning for image recognition, pp 770-8, 2016
[34] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Comput., 9, 1735-80, 1997 · doi:10.1162/neco.1997.9.8.1735
[35] Kobler, E.; Effland, A.; Kunisch, K.; Pock, T., Total deep variation for linear inverse problems, pp 7549-58, 2020
[36] Kobler, E.; Klatzer, T.; Hammernik, K.; Pock, T., Variational networks: connecting variational methods and deep learning, pp 281-93, 2017, Springer
[37] LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning, Nature, 521, 436-44, 2015 · doi:10.1038/nature14539
[38] LeCun, Y.; Touretzky, D.; Hinton, G.; Sejnowski, T., A theoretical framework for back-propagation, Proceedings of the 1988 Connectionist Models Summer School, 21-28, 1988, Morgan Kaufmann
[39] Leoni, G., A First Course in Sobolev Spaces, 2017, American Mathematical Society · Zbl 1382.46001
[40] Li, Q.; Hao, S., An optimal control approach to deep learning and applications to discrete-weight neural networks, pp 2985-94, 2018
[41] Li, Q.; Lin, T.; Shen, Z., Deep learning via dynamical systems: an approximation perspective, J. Eur. Math. Soc., 25, 1671-709, 2022 · Zbl 1528.41106 · doi:10.4171/jems/1221
[42] Li, Y.; Tofighi, M.; Geng, J.; Monga, V.; Eldar, Y. C., Efficient and interpretable deep blind image deblurring via algorithm unrolling, IEEE Trans. Comput. Imaging, 6, 666-81, 2020 · doi:10.1109/TCI.2020.2964202
[43] Lin, X.; Wu, C., Deep layer limit and stability analysis of the basic forward-backward-splitting induced network (i): feed-forward systems, 2023, submitted
[44] Lin, X.; Wu, C., Deep layer limit and stability analysis of the basic forward-backward-splitting induced network (ii): learning problems, 2023, submitted
[45] Long, Z.; Lu, Y.; Ma, X.; Dong, B., PDE-Net: learning PDEs from data, pp 3208-16, 2018
[46] Loshchilov, I.; Hutter, F., SGDR: stochastic gradient descent with warm restarts, 2016
[47] Lu, Y.; Zhong, A.; Li, Q.; Dong, B., Beyond finite layer neural networks: bridging deep architectures and numerical differential equations, pp 3276-85, 2018
[48] Mikolov, T.; Karafiát, M.; Burget, L.; Černocký, J.; Khudanpur, S., Recurrent neural network based language model, pp 1045-8, 2010, Makuhari
[49] Monga, V.; Li, Y.; Eldar, Y. C., Algorithm unrolling: interpretable, efficient deep learning for signal and image processing, IEEE Signal Process. Mag., 38, 18-44, 2021 · doi:10.1109/MSP.2020.3016905
[50] Moriakov, N.; Michielsen, K.; Adler, J.; Mann, R.; Sechopoulos, I.; Teuwen, J., Deep learning framework for digital breast tomosynthesis reconstruction, Medical Imaging 2019: Physics of Medical Imaging, vol 10948, pp 9-14, 2019, SPIE
[51] Mukherjee, S.; Hauptmann, A.; Öktem, O.; Pereyra, M.; Schönlieb, C-B, Learned reconstruction methods with convergence guarantees: a survey of concepts and applications, IEEE Signal Process. Mag., 40, 164-82, 2023 · doi:10.1109/MSP.2022.3207451
[52] Ongie, G.; Jalal, A.; Metzler, C. A.; Baraniuk, R. G.; Dimakis, A. G.; Willett, R., Deep learning techniques for inverse problems in imaging, IEEE J. Sel. Areas Inf. Theory, 1, 39-56, 2020 · doi:10.1109/JSAIT.2020.2991563
[53] Rudin, L. I.; Osher, S.; Fatemi, E., Nonlinear total variation based noise removal algorithms, Physica D, 60, 259-68, 1992 · Zbl 0780.49028 · doi:10.1016/0167-2789(92)90242-F
[54] Rudzusika, J.; Bajic, B.; Öktem, O.; Schönlieb, C-B; Etmann, C., Invertible learned primal-dual, 2021
[55] Ruthotto, L.; Haber, E., Deep neural networks motivated by partial differential equations, J. Math. Imaging Vis., 62, 352-64, 2020 · Zbl 1434.68522 · doi:10.1007/s10851-019-00903-1
[56] Ryu, E. K.; Yin, W., Large-Scale Convex Optimization: Algorithm Analysis via Monotone Operators, 2022, Cambridge University Press
[57] Scherzer, O., Handbook of Mathematical Methods in Imaging, 2010, Springer
[58] Sherstinsky, A., Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network, Physica D, 404, 2020 · Zbl 1508.68298 · doi:10.1016/j.physd.2019.132306
[59] Teschl, G., Ordinary Differential Equations and Dynamical Systems, vol 140, 2012, American Mathematical Society · Zbl 1263.34002
[60] Thorpe, M.; van Gennip, Y., Deep limits of residual neural networks, Res. Math. Sci., 10, 6, 2023 · Zbl 1538.68057 · doi:10.1007/s40687-022-00370-y
[61] Valkonen, T., A primal-dual hybrid gradient method for nonlinear operators with applications to MRI, Inverse Problems, 30, 2014 · Zbl 1310.47081 · doi:10.1088/0266-5611/30/5/055012
[62] Wei, C.; Lee, J. D.; Liu, Q.; Ma, T., Regularization matters: generalization and optimization of neural nets vs their induced kernel, vol 32, 2019
[63] Weinan, E., A proposal on machine learning via dynamical systems, Commun. Math. Stat., 1, 1-11, 2017 · Zbl 1380.37154 · doi:10.1007/s40304-017-0103-z
[64] Weinan, E.; Li, Q., A mean-field optimal control formulation of deep learning, Res. Math. Sci., 6, 10, 2018 · Zbl 1421.49021 · doi:10.1007/s40687-018-0172-y
[65] Yang, Y.; Sun, J.; Li, H.; Xu, Z., Deep ADMM-Net for compressive sensing MRI, 2016
[66] Zhang, H.; Dong, B.; Liu, B., A reweighted joint spatial-radon domain CT image reconstruction model for metal artifact reduction, SIAM J. Imaging Sci., 11, 707-33, 2018 · Zbl 1421.94011 · doi:10.1137/17M1140212
[67] Zhang, J.; Ghanem, B., ISTA-Net: interpretable optimization-inspired deep network for image compressive sensing, pp 1828-37, 2018
[68] Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L., Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Process., 26, 3142-55, 2017 · Zbl 1409.94754 · doi:10.1109/TIP.2017.2662206
[69] Zhang, L.; Schaeffer, H., Forward stability of resnet and its variants, J. Math. Imaging Vis., 62, 328-51, 2020 · Zbl 1434.68528 · doi:10.1007/s10851-019-00922-y
[70] Zhu, M.; Chan, T., An efficient primal-dual hybrid gradient algorithm for total variation image restoration, 2008, UCLA CAM Report
[71] Shepp, L. A.; Logan, B. F., The Fourier reconstruction of a head section, IEEE Trans. Nucl. Sci., 21, 21-43, 1974 · doi:10.1109/TNS.1974.6499235