
Accelerated forward-backward optimization using deep learning. (English) Zbl 07836044

Summary: We propose several deep-learning-accelerated optimization solvers with convergence guarantees. We use ideas from the analysis of accelerated forward-backward schemes like FISTA, but instead of the classical approach of proving convergence for a fixed choice of parameters, such as a step size, we show convergence whenever the update is chosen in a specific set. Rather than picking a point in this set by some predefined rule, we train a deep neural network to pick the best update within a given space. Finally, we demonstrate that the method applies to several cases of smooth and nonsmooth optimization and that it outperforms established accelerated solvers.
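The scheme described above can be illustrated with a minimal sketch (not the paper's exact construction): plain FISTA for the LASSO problem min_x 0.5·||Ax − b||² + λ·||x||₁, extended with a safeguard that accepts an arbitrary candidate update (e.g. one proposed by a trained network) only if it does at least as well as the classical FISTA step. The `candidate` oracle, the acceptance test, and all parameter names here are illustrative assumptions, standing in for the paper's notion of restricting learned updates to a set on which convergence is guaranteed.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, b, lam, x):
    """LASSO objective: 0.5*||Ax - b||^2 + lam*||x||_1."""
    r = A @ x - b
    return 0.5 * r @ r + lam * np.abs(x).sum()

def fista(A, b, lam, n_iter=200, candidate=None):
    """FISTA with an optional candidate oracle (e.g. a trained network).

    `candidate(y, grad)` may propose any update; it is accepted only if
    its objective value is no worse than that of the safeguard FISTA
    step, so the classical convergence guarantee is retained.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)   # safeguard FISTA step
        if candidate is not None:
            x_cand = candidate(y, grad)
            if objective(A, b, lam, x_cand) <= objective(A, b, lam, x_new):
                x_new = x_cand                          # accept learned update
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + (t - 1.0) / t_new * (x_new - x)     # Nesterov extrapolation
        x, t = x_new, t_new
    return x
```

With `candidate=None` this is standard FISTA; plugging in a neural network as the oracle mimics the learned-solver setting, with the objective comparison acting as a crude membership test for the "safe" update set.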

MSC:

90C25 Convex programming
90C06 Large-scale problems in mathematical programming
68T07 Artificial neural networks and deep learning
49M37 Numerical methods based on nonlinear programming
65K10 Numerical optimization and variational techniques

Software:

ODL; GitHub; ASTRA; TensorFlow

References:

[1] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, https://arxiv.org/abs/1603.04467v2, 2016.
[2] Adler, J., Kohr, H., and Öktem, O., Operator Discretization Library (ODL), https://github.com/odlgroup/odl/, 2017.
[3] Adler, J. and Öktem, O., Solving ill-posed inverse problems using iterative deep neural networks, Inverse Problems, 33 (2017), 124007, doi:10.1088/1361-6420/aa9581. · Zbl 1394.92070
[4] Adler, J. and Öktem, O., Learned primal-dual reconstruction, IEEE Trans. Med. Imaging, 37 (2018), pp. 1322-1332, doi:10.1109/TMI.2018.2799231.
[5] Ahookhosh, M. and Neumaier, A., An optimal subgradient algorithm for large-scale bound-constrained convex optimization, Math. Methods Oper. Res., 86 (2017), pp. 123-147, doi:10.1007/s00186-017-0585-1. · Zbl 1380.90215
[6] Ahookhosh, M. and Neumaier, A., Optimal subgradient algorithms for large-scale convex optimization in simple domains, Numer. Algorithms, 76 (2017), pp. 1071-1097, doi:10.1007/s11075-017-0297-x. · Zbl 1411.90261
[7] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B., and De Freitas, N., Learning to learn by gradient descent by gradient descent, in Advances in Neural Information Processing Systems, Vol. 29, Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., eds., 2016, https://papers.nips.cc/paper/2016/hash/fb87582825f9d28a8d42c5e5e5e8b23d-Abstract.html.
[8] Arridge, S., Maass, P., Öktem, O., and Schönlieb, C.-B., Solving inverse problems using data-driven models, Acta Numer., 28 (2019), pp. 1-174, doi:10.1017/S0962492919000059. · Zbl 1429.65116
[9] Banert, S., Ringh, A., Adler, J., Karlsson, J., and Öktem, O., Data-driven nonsmooth optimization, SIAM J. Optim., 30 (2020), pp. 102-131, doi:10.1137/18M1207685. · Zbl 1435.90105
[10] Bauschke, H. H. and Combettes, P. L., Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed., , Springer, Cham, 2017, doi:10.1007/978-3-319-48311-5. · Zbl 1359.26003
[11] Beck, A. and Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183-202, doi:10.1137/080716542. · Zbl 1175.94009
[12] Bello, I., Zoph, B., Vasudevan, V., and Le, Q. V., Neural optimizer search with reinforcement learning, in Proceedings of the 34th International Conference on Machine Learning, , PMLR, 2017, pp. 459-468, https://proceedings.mlr.press/v70/bello17a.html.
[13] Benning, M. and Burger, M., Modern regularization methods for inverse problems, Acta Numer., 27 (2018), pp. 1-111, doi:10.1017/S0962492918000016. · Zbl 1431.65080
[14] Briceño-Arias, L. M. and Combettes, P. L., A monotone + skew splitting model for composite monotone inclusions in duality, SIAM J. Optim., 21 (2011), pp. 1230-1250, doi:10.1137/10081602X. · Zbl 1239.47053
[15] Burger, M. and Osher, S., A guide to the TV zoo, in Level Set and PDE Based Reconstruction Methods in Imaging, Springer, Cham, 2013, pp. 1-70, doi:10.1007/978-3-319-01712-9_1. · Zbl 1342.94014
[16] Caselles, V., Chambolle, A., and Novaga, M., Total variation in imaging, in Handbook of Mathematical Methods in Imaging, 2nd ed., Springer New York, 2015, pp. 1455-1499, doi:10.1007/978-1-4939-0790-8_23. · Zbl 1331.68264
[17] Cauchy, A., Méthode générale pour la résolution des systèmes d’équations simultanées, C. R. Acad. Sci., 25 (1847), pp. 536-538, https://gallica.bnf.fr/ark:/12148/bpt6k2982c/f540.item.
[18] Chen, T., Chen, X., Chen, W., Heaton, H., Liu, J., Wang, Z., and Yin, W., Learning to optimize: A primer and a benchmark, J. Mach. Learn. Res., 23 (2022), pp. 1-59, http://jmlr.org/papers/v23/21-0308.html.
[19] Chen, X., Liu, J., Wang, Z., and Yin, W., Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds, in Advances in Neural Information Processing Systems, Vol. 31, Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., eds., 2018, https://proceedings.neurips.cc/paper/2018/hash/cf8c9be2a4508a24ae92c9d3d379131d-Abstract.html.
[20] Combettes, P. L. and Pesquet, J.-C., Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators, Set-Valued Var. Anal., 20 (2012), pp. 307-330, doi:10.1007/s11228-011-0191-y. · Zbl 1284.47043
[21] d’Aspremont, A., Scieur, D., Taylor, A., et al., Acceleration methods, Found. Trends Optim., 5 (2021), pp. 1-245.
[22] Drori, Y. and Teboulle, M., Performance of first-order methods for smooth convex minimization: A novel approach, Math. Program., 145 (2014), pp. 451-482, doi:10.1007/s10107-013-0653-0. · Zbl 1300.90068
[23] Fornasier, M. and Rauhut, H., Compressive sensing, in Handbook of Mathematical Methods in Imaging, 2nd ed., Springer, New York, 2015, pp. 205-256, doi:10.1007/978-1-4939-0790-8_6. · Zbl 1331.94019
[24] Goldstein, A. A., Convex programming in Hilbert space, Bull. Amer. Math. Soc., 70 (1964), pp. 709-710, doi:10.1090/S0002-9904-1964-11178-2. · Zbl 0142.17101
[25] Gonzaga, C. C., Karas, E. W., and Rossetto, D. R., An optimal algorithm for constrained differentiable convex optimization, SIAM J. Optim., 23 (2013), pp. 1939-1955, doi:10.1137/110836602. · Zbl 1288.65087
[26] Gregor, K. and LeCun, Y., Learning fast approximations of sparse coding, in Proceedings of the 27th International Conference on Machine Learning (ICML 2010), , 2010, pp. 399-406, https://icml.cc/Conferences/2010/papers/449.pdf.
[27] Hammernik, K., Klatzer, T., Kobler, E., Recht, M. P., Sodickson, D. K., Pock, T., and Knoll, F., Learning a variational network for reconstruction of accelerated MRI data, Magn. Reson. Med., 79 (2018), pp. 3055-3071, doi:10.1002/mrm.26977.
[28] Heaton, H., Chen, X., Wang, Z., and Yin, W., Safeguarded learned convex optimization, Proc. AAAI Conf. Artif. Intell., 37 (2023), pp. 7848-7855.
[29] Hsieh, J.-T., Zhao, S., Eismann, S., Mirabella, L., and Ermon, S., Learning neural PDE solvers with convergence guarantees, in International Conference on Learning Representations (ICLR 2019), , 2019, https://openreview.net/forum?id=rklaWn0qK7.
[30] Li, K. and Malik, J., Learning to optimize, in International Conference on Learning Representations, , 2016.
[31] Liang, J., Luo, T., and Schönlieb, C.-B., Improving “fast iterative shrinkage-thresholding algorithm”: Faster, smarter, and greedier, SIAM J. Sci. Comput., 44 (2022), pp. A1069-A1091, doi:10.1137/21M1395685. · Zbl 1492.65163
[32] Liu, R., Cheng, S., He, Y., Fan, X., Lin, Z., and Luo, Z., On the convergence of learning-based iterative methods for nonconvex inverse problems, IEEE Trans. Pattern Anal. Mach. Intell., 42 (2020), pp. 3027-3039, doi:10.1109/TPAMI.2019.2920591.
[33] Maheswaranathan, N., Sussillo, D., Metz, L., Sun, R., and Sohl-Dickstein, J., Reverse engineering learned optimizers reveals known and novel mechanisms, in Advances in Neural Information Processing Systems, Vol. 34, Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W., eds., Curran Associates, 2021, pp. 19910-19922, https://proceedings.neurips.cc/paper/2021/file/a57ecd54d4df7d999bd9c5e3b973ec75-Paper.pdf.
[34] McCollough, C. H., Bartley, A. C., Carter, R. E., Chen, B., Drees, T. A., Edwards, P., Holmes, D. R., Huang, A. E., Khan, F., Leng, S., McMillan, K. L., Michalak, G. J., Nunez, K. M., Yu, L., and Fletcher, J. G., Low-dose CT for the detection and classification of metastatic liver lesions: Results of the 2016 Low Dose CT Grand Challenge, Med. Phys., 44 (2017), pp. e339-e352, doi:10.1002/mp.12345.
[35] Metz, L., Maheswaranathan, N., Freeman, C. D., Poole, B., and Sohl-Dickstein, J., Tasks, Stability, Architecture, and Compute: Training More Effective Learned Optimizers, and Using Them to Train Themselves, https://arxiv.org/abs/2009.11243v1, 2020.
[36] Nesterov, Y. E., A method of solving a convex programming problem with convergence rate \(O(1/k^2)\), Sov. Math. Dokl., 27 (1983), pp. 372-376. · Zbl 0535.90071
[37] Nesterov, Y. E., Introductory Lectures on Convex Optimization. A Basic Course, , Springer, New York, 2004, doi:10.1007/978-1-4419-8853-9. · Zbl 1086.90045
[38] Neumaier, A., OSGA: A fast subgradient algorithm with optimal complexity, Math. Program., 158 (2016), pp. 1-21, doi:10.1007/s10107-015-0911-4. · Zbl 1346.90671
[39] Rockafellar, R. T., Convex Analysis, , Princeton University Press, Princeton, NJ, 1970, doi:10.1515/9781400873173. · Zbl 0193.18401
[40] Rudin, L. I., Osher, S., and Fatemi, E., Nonlinear total variation based noise removal algorithms, Phys. D, 60 (1992), pp. 259-268, doi:10.1016/0167-2789(92)90242-F. · Zbl 0780.49028
[41] Scherzer, O., Grasmair, M., Grossauer, H., Haltmeier, M., and Lenzen, F., Variational Methods in Imaging, , Springer, New York, 2009, doi:10.1007/978-0-387-69277-7. · Zbl 1177.68245
[42] Van Aarle, W., Palenstijn, W. J., Cant, J., Janssens, E., Bleichrodt, F., Dabravolski, A., De Beenhouwer, J., Batenburg, K. J., and Sijbers, J., Fast and flexible X-ray tomography using the ASTRA toolbox, Opt. Express, 24 (2016), pp. 25129-25147, doi:10.1364/OE.24.025129.
[43] Vũ, B. C., A splitting algorithm for dual monotone inclusions involving cocoercive operators, Adv. Comput. Math., 38 (2013), pp. 667-681, doi:10.1007/s10444-011-9254-8. · Zbl 1284.47045
[44] Wichrowska, O., Maheswaranathan, N., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Freitas, N., and Sohl-Dickstein, J., Learned optimizers that scale and generalize, in International Conference on Machine Learning, , PMLR, 2017, pp. 3751-3760, https://proceedings.mlr.press/v70/wichrowska17a.html.
[45] Yang, Y., Sun, J., Li, H., and Xu, Z., Deep ADMM-net for compressive sensing MRI, in Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), , 2016, pp. 10-18, https://papers.nips.cc/paper/2016/hash/1679091c5a880faf6fb5e6087eb1b2dc-Abstract.html.