
Do ideas have shape? Idea registration as the continuous limit of artificial neural networks. (English) Zbl 07642857

Summary: We introduce a Gaussian Process (GP) generalization of ResNets (with the unknown functions of the network replaced by GPs and identified via MAP estimation), which includes ResNets (trained with \(L_2\) regularization on weights and biases) as a particular case (when particular kernels are employed). We show that ResNets (and their warping GP regression extension) converge, in the infinite-depth limit, to a generalization of image registration variational algorithms. In this generalization, images are replaced by functions mapping input/output spaces to a space of unexpressed abstractions (ideas), and material points are replaced by data points. Whereas computational anatomy aligns images via warping of the material space, this generalization aligns ideas (or abstract shapes, as in Plato’s theory of forms) via warping of the Reproducing Kernel Hilbert Space (RKHS) of functions mapping the input space to the output space. While the Hamiltonian interpretation of ResNets is not new, it was based on an Ansatz. We do not rely on this Ansatz and present the first rigorous proof of convergence of ResNets with trained weights and biases towards a Hamiltonian-dynamics-driven flow. Since our proof is constructive and based on discrete and continuous mechanics, it reveals several remarkable properties of ResNets and their GP generalization. ResNet regressors are kernel regressors with data-dependent warping kernels. Minimizers of \(L_2\)-regularized ResNets satisfy a discrete least action principle implying the near preservation of the norm of weights and biases across layers. The trained weights of ResNets with scaled/strong \(L_2\) regularization can be identified by solving an autonomous Hamiltonian system. The trained ResNet parameters are unique up to (a function of) the initial momentum, and the initial-momentum representation of those parameters is generally sparse. The kernel (nugget) regularization strategy provides a provably robust alternative to Dropout for ANNs. We introduce a functional generalization of GPs and show that pointwise GP/RKHS error estimates lead to probabilistic and deterministic generalization error estimates for ResNets. When performed with feature maps, the proposed analysis identifies the (EPDiff) mean-field limit of trained ResNet parameters as the number of data points goes to infinity. The search for good architectures can be reduced to that of good kernels, and we show that the composition of warping regression blocks with reduced equivariant multichannel kernels (introduced here) recovers and generalizes CNNs to arbitrary spaces and groups of transformations.
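As a minimal illustrative sketch of the discrete-to-continuous picture described above (in notation chosen for this summary, not necessarily the paper's): an \(L\)-block residual network propagates data points through
\[
x_{s+1} \;=\; x_s + \Delta t\, v_{s+1}(x_s), \qquad s = 0,\dots,L-1,\quad \Delta t = 1/L,
\]
and \(L_2\)-regularized training can be read as the discrete least action problem
\[
\min_{v_1,\dots,v_L \in \mathcal{V}} \;\; \nu\, \Delta t \sum_{s=1}^{L} \|v_s\|_{\mathcal{V}}^2 \;+\; \ell\big(x_L(X), Y\big),
\]
where \(\mathcal{V}\) is the RKHS containing the layer functions, \(\nu > 0\) the regularization strength, and \(\ell\) the data-fit loss on inputs \(X\) with labels \(Y\). In the infinite-depth limit \(\Delta t \to 0\), minimizers follow a flow \(\dot{q}_i = v(q_i, t)\) with \(v(\cdot, t) = \sum_j K(\cdot, q_j(t))\, p_j(t)\), where the positions \(q_i\) (images of the data points) and momenta \(p_i\) solve an autonomous Hamiltonian system with
\[
H(q, p) \;=\; \tfrac{1}{2} \sum_{i,j} p_i^{\top} K(q_i, q_j)\, p_j,
\]
\(K\) being the operator-valued kernel of \(\mathcal{V}\); this is the landmark-matching/EPDiff structure familiar from computational anatomy, transposed from material points to data points.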

MSC:

68-XX Computer science
65-XX Numerical analysis

References:

[1] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian, Deep residual learning for image recognition, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)), 770-778
[2] E, Weinan, A proposal on machine learning via dynamical systems, Commun. Math. Stat., 5, 1, 1-11 (2017) · Zbl 1380.37154
[3] Chen, Tian Qi; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David K., Neural ordinary differential equations, (Advances in Neural Information Processing Systems (2018)), 6571-6583
[4] Thorpe, Matthew; van Gennip, Yves, Deep limits of residual neural networks (2018), arXiv preprint arXiv:1810.11741
[5] Haber, Eldad; Ruthotto, Lars, Stable architectures for deep neural networks, Inverse Problems, 34, 1, Article 014004 pp. (2017) · Zbl 1426.68236
[6] Li, Qianxiao; Chen, Long; Tai, Cheng; E, Weinan, Maximum principle based algorithms for deep learning, J. Mach. Learn. Res., 18 (2017), Paper No. 165, 29 · Zbl 1467.68156
[7] Han, Jiequn; Li, Qianxiao, A mean-field optimal control formulation of deep learning, Res. Math. Sci., 6, 1, 1-41 (2019) · Zbl 1421.49021
[8] LeCun, Yann; Haffner, Patrick; Bottou, Léon; Bengio, Yoshua, Object recognition with gradient-based learning, (Shape, Contour and Grouping in Computer Vision (1999), Springer), 319-345
[9] Belkin, Mikhail, Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation (2021), arXiv preprint arXiv:2105.14368
[10] Owhadi, Houman, Computational graph completion, Res. Math. Sci., 9, 2, 1-33 (2022) · Zbl 1497.68464
[11] Owhadi, Houman; Yoo, Gene Ryan, Kernel flows: From learning kernels from data into the abyss, J. Comput. Phys., 389, 22-47 (2019) · Zbl 1452.65028
[12] Chen, Yifan; Owhadi, Houman; Stuart, Andrew, Consistency of empirical Bayes and kernel flow for hierarchical parameter estimation, Math. Comp., 90, 332, 2527-2578 (2021) · Zbl 1472.62012
[13] Hamzi, Boumediene; Owhadi, Houman, Learning dynamical systems from data: A simple cross-validation perspective, part I: Parametric kernel flows, Physica D, 421, Article 132817 pp. (2021) · Zbl 1509.68217
[14] Akian, Jean-Luc; Bonnet, Luc; Owhadi, Houman; Savin, Éric, Learning “best” kernels from data in Gaussian process regression. With application to aerodynamics, J. Comput. Phys., 470 (2022) · Zbl 07599627
[15] Hamzi, B.; Maulik, R.; Owhadi, H., Simple, low-cost and accurate data-driven geophysical forecasting with learned kernels, Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 477, 2252, Article 20210326 pp. (2021)
[16] Yoo, Gene Ryan; Owhadi, Houman, Deep regularization and direct training of the inner layers of neural networks with kernel flows, Physica D, 426, Article 132952 pp. (2021) · Zbl 1515.68273
[17] Shirdel, Mahdy; Asadi, Reza; Do, Duc; Hintlian, Micheal, Deep learning with kernel flow regularization for time series forecasting (2021), arXiv preprint arXiv:2109.11649
[18] Owhadi, Houman, Do Ideas Have Shape? Plato’s Theory of Forms as the Continuous Limit of Artificial Neural Networks, Fields Institute, YouTube
[19] Nelsen, Nicholas H.; Stuart, Andrew M., The random feature model for input-output maps between Banach spaces (2020), arXiv preprint arXiv:2005.10224 · Zbl 07398767
[20] Owhadi, Houman; Scovel, Clint, Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization: from a Game Theoretic Approach to Numerical Approximation and Algorithm Design, Vol. 35 (2019), Cambridge University Press · Zbl 1477.65004
[21] Micchelli, Charles A.; Pontil, Massimiliano, Kernels for multi-task learning, (Advances in Neural Information Processing Systems (2005)), 921-928
[22] Cohen, Alain-Sam; Cont, Rama; Rossier, Alain; Xu, Renyuan, Scaling properties of deep residual networks (2021), arXiv preprint arXiv:2105.12245
[23] Alvarez, Mauricio A.; Rosasco, Lorenzo; Lawrence, Neil D., Kernels for vector-valued functions: A review, Found. Trends Mach. Learn., 4, 3, 195-266 (2012) · Zbl 1301.68212
[24] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey, Deep learning, Nature, 521, 7553, 436-444 (2015)
[25] Arino, Julien, Fundamental Theory of Ordinary Differential Equations, Lecture Notes (2006), University of Manitoba
[26] Teixeira, Eduardo V., Strong solutions for differential equations in abstract spaces, J. Differential Equations, 214, 1, 65-91 (2005) · Zbl 1094.34043
[27] Li, Tien-Yien, Existence of solutions for ordinary differential equations in Banach spaces, J. Differential Equations, 18, 1, 29-40 (1975) · Zbl 0325.34069
[28] Grenander, Ulf; Miller, Michael I., Computational anatomy: An emerging discipline, Quart. Appl. Math., 56, 4, 617-694 (1998) · Zbl 0952.92016
[29] Brown, Lisa Gottesfeld, A survey of image registration techniques, ACM Comput. Surv., 24, 4, 325-376 (1992)
[30] Younes, Laurent, Shapes and Diffeomorphisms, Vol. 171 (2010), Springer · Zbl 1205.68355
[31] Younes, Laurent, Computable elastic distances between shapes, SIAM J. Appl. Math., 58, 2, 565-586 (1998) · Zbl 0907.68158
[32] Trouvé, Alain, Diffeomorphisms groups and pattern matching in image analysis, Int. J. Comput. Vis., 28, 3, 213-221 (1998)
[33] Dupuis, Paul; Grenander, Ulf; Miller, Michael I., Variational problems on flows of diffeomorphisms for image matching, Quart. Appl. Math., 587-600 (1998) · Zbl 0949.49002
[34] Miller, Michael I.; Trouvé, Alain; Younes, Laurent, On the metrics and Euler-Lagrange equations of computational anatomy, Annu. Rev. Biomed. Eng., 4, 1, 375-405 (2002)
[35] Joshi, Sarang C.; Miller, Michael I., Landmark matching via large deformation diffeomorphisms, IEEE Trans. Image Process., 9, 8, 1357-1370 (2000) · Zbl 0965.37065
[36] Plato, The Republic, Book VII, 375 BCE.
[37] Sampson, Paul D.; Guttorp, Peter, Nonparametric estimation of nonstationary spatial covariance structure, J. Amer. Statist. Assoc., 87, 417, 108-119 (1992)
[38] Perrin, O.; Monestiez, P., Modelling of non-stationary spatial structure using parametric radial basis deformations, (GeoENV II—Geostatistics for Environmental Applications (1999), Springer), 175-186
[39] Schmidt, Alexandra M.; O’Hagan, Anthony, Bayesian inference for non-stationary spatial covariance structure via spatial deformations, J. R. Stat. Soc. Ser. B Stat. Methodol., 65, 3, 743-758 (2003) · Zbl 1063.62034
[40] Zammit-Mangion, Andrew; Ng, Tin Lok James; Vu, Quan; Filippone, Maurizio, Deep compositional spatial models (2019), arXiv preprint arXiv:1906.02840
[41] Owhadi, Houman; Zhang, Lei, Metric-based upscaling, Commun. Pure Appl. Math.: J. Issued Courant Inst. Math. Sci., 60, 5, 675-723 (2007) · Zbl 1190.35070
[42] Allassonnière, Stéphanie; Trouvé, Alain; Younes, Laurent, Geodesic shooting and diffeomorphic matching via textured meshes, (International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (2005), Springer), 365-381
[43] Chen, Yifan; Hosseini, Bamdad; Owhadi, Houman; Stuart, Andrew M., Solving and learning nonlinear PDEs with gaussian processes, J. Comput. Phys., 447 (2021) · Zbl 07516428
[44] Marsden, Jerrold E.; West, Matthew, Discrete mechanics and variational integrators, Acta Numer., 10, 357-514 (2001) · Zbl 1123.37327
[45] Hairer, Ernst; Lubich, Christian; Wanner, Gerhard, Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, Vol. 31 (2006), Springer Science & Business Media · Zbl 1094.65125
[46] Marsden, Jerrold E.; Ratiu, Tudor S., Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, Vol. 17 (2013), Springer Science & Business Media · Zbl 0933.70003
[47] Bruveris, Martins; Gay-Balmaz, François; Holm, Darryl D.; Ratiu, Tudor S., The momentum map representation of images, J. Nonlinear Sci., 21, 1, 115-150 (2011) · Zbl 1211.58010
[48] Vialard, François-Xavier; Risser, Laurent; Rueckert, Daniel; Cotter, Colin J., Diffeomorphic 3D image registration via geodesic shooting using an efficient adjoint calculation, Int. J. Comput. Vis., 97, 2, 229-241 (2012) · Zbl 1235.68295
[49] Fishbaugh, James; Prastawa, Marcel; Gerig, Guido; Durrleman, Stanley, Geodesic image regression with a sparse parameterization of diffeomorphisms, (International Conference on Geometric Science of Information (2013), Springer), 95-102 · Zbl 1368.62182
[50] Steinwart, Ingo; Christmann, Andreas, Support Vector Machines (2008), Springer Science & Business Media · Zbl 1203.68171
[51] Micheli, Mario, The Differential Geometry of Landmark Shape Manifolds: Metrics, Geodesics, and Curvature (2008), Brown University, (Ph.D. thesis)
[52] Srivastava, Nitish; Hinton, Geoffrey; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 15, 1, 1929-1958 (2014) · Zbl 1318.68153
[53] Owhadi, Houman; Scovel, Clint; Sullivan, Tim, Brittleness of Bayesian inference under finite information in a continuous world, Electron. J. Stat., 9, 1, 1-79 (2015), arXiv:1304.6772 (April 2013) · Zbl 1305.62123
[54] Owhadi, Houman; Scovel, Clint; Sullivan, Tim, On the brittleness of Bayesian inference, SIAM Rev., 57, 4, 566-582 (2015) · Zbl 1341.62094
[55] Owhadi, Houman; Scovel, Clint, Brittleness of Bayesian inference and new Selberg formulas, Commun. Math. Sci., 14, 1, 83-145 (2016) · Zbl 1357.62125
[56] Szegedy, Christian; Zaremba, Wojciech; Sutskever, Ilya; Bruna, Joan; Erhan, Dumitru; Goodfellow, Ian; Fergus, Rob, Intriguing properties of neural networks (2013), arXiv preprint arXiv:1312.6199
[57] McKerns, Mike, Mystic: a framework for predictive science, SciPy 2013 presentation, https://www.youtube.com/watch?v=o-nwSnLC6DU&feature=youtu.be&t=74
[58] Owhadi, Houman; Scovel, Clint, Qualitative robustness in Bayesian inference, ESAIM Probab. Stat., 21, 251-274 (2017) · Zbl 1395.62059
[59] Casetti, Lapo; Clementi, Cecilia; Pettini, Marco, Riemannian theory of Hamiltonian chaos and Lyapunov exponents, Phys. Rev. E, 54, 6, 5969 (1996)
[60] Schäfer, Florian; Katzfuss, Matthias; Owhadi, Houman, Sparse Cholesky factorization by Kullback-Leibler minimization, SIAM J. Sci. Comput., 43, 3, A2019-A2046 (2021) · Zbl 07364386
[61] Holmstrom, Lasse; Koistinen, Petri, Using additive noise in back-propagation training, IEEE Trans. Neural Netw., 3, 1, 24-38 (1992)
[62] An, Guozhong, The effects of adding noise during backpropagation training on a generalization performance, Neural Comput., 8, 3, 643-674 (1996)
[63] Gulcehre, Caglar; Moczulski, Marcin; Denil, Misha; Bengio, Yoshua, Noisy activation functions, (International Conference on Machine Learning (2016)), 3059-3068
[64] Bajgiran, Hamed Hamze; Franch, Pau Batlle; Owhadi, Houman; Scovel, Clint; Shirdel, Mahdy; Stanley, Michael; Tavallali, Peyman, Uncertainty quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball (2021), arXiv preprint arXiv:2108.10517 · Zbl 07605580
[65] Carreira-Perpinan, Miguel; Wang, Weiran, Distributed optimization of deeply nested systems, (Artificial Intelligence and Statistics (2014), PMLR), 10-19
[66] Choromanska, Anna; Cowen, Benjamin; Kumaravel, Sadhana; Luss, Ronny; Rigotti, Mattia; Rish, Irina; Diachille, Paolo; Gurev, Viatcheslav; Kingsbury, Brian; Tejwani, Ravi, Beyond backprop: Online alternating minimization with auxiliary variables, (International Conference on Machine Learning (2019), PMLR), 1193-1202
[67] Owhadi, H.; Scovel, C., Operator Adapted Wavelets, Fast Solvers, and Numerical Homogenization, from a Game Theoretic Approach to Numerical Approximation and Algorithm Design, Cambridge Monographs on Applied and Computational Mathematics (2019), Cambridge University Press · Zbl 1477.65004
[68] Still, Georg, Lectures on parametric optimization: An introduction, Optim. Online (2018)
[69] Baxendale, Peter, Brownian motions in the diffeomorphism group I, Compos. Math., 53, 1, 19-50 (1984) · Zbl 0547.58041
[70] Kunita, Hiroshi, Stochastic Flows and Stochastic Differential Equations, Vol. 24 (1997), Cambridge University Press · Zbl 0865.60043
[71] Damianou, Andreas; Lawrence, Neil, Deep Gaussian processes, (Artificial Intelligence and Statistics (2013)), 207-215
[72] Wu, Zong-min; Schaback, Robert, Local error estimates for radial basis function interpolation of scattered data, IMA J. Numer. Anal., 13, 1, 13-27 (1993) · Zbl 0762.41006
[73] Owhadi, Houman, Bayesian numerical homogenization, Multiscale Model. Simul., 13, 3, 812-828 (2015) · Zbl 1322.35002
[74] Barron, Andrew R., Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inform. Theory, 39, 3, 930-945 (1993) · Zbl 0818.68126
[75] E, Weinan; Ma, Chao; Wu, Lei, Barron spaces and the compositional function spaces for neural network models (2019), arXiv preprint arXiv:1906.08039
[76] Dunlop, Matthew M.; Helin, Tapio; Stuart, Andrew M., Hyperparameter estimation in Bayesian MAP estimation: parameterizations and consistency, SMAI J. Comput. Math., 6, 69-100 (2020) · Zbl 1441.62084
[77] Hart, Gabriel L.; Zach, Christopher; Niethammer, Marc, An optimal control approach for deformable registration, (2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2009), IEEE), 9-16
[78] Mei, Song; Montanari, Andrea; Nguyen, Phan-Minh, A mean field view of the landscape of two-layer neural networks, Proc. Natl. Acad. Sci., 115, 33, E7665-E7671 (2018) · Zbl 1416.92014
[79] Rotskoff, Grant M.; Vanden-Eijnden, Eric, Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error, Stat, 1050, 22 (2018)
[80] Holm, Darryl; Trouvé, Alain; Younes, Laurent, The Euler-Poincaré theory of metamorphosis, Quart. Appl. Math., 67, 4, 661-685 (2009) · Zbl 1186.68413
[81] Holm, Darryl D.; Marsden, Jerrold E., Momentum maps and measure-valued solutions (peakons, filaments, and sheets) for the EPDiff equation, (The Breadth of Symplectic and Poisson Geometry (2005), Springer), 203-235
[82] Smirnov, Alexandre; Hamzi, Boumediene; Owhadi, Houman, Mean-field limits of trained weights in deep learning: A dynamical systems perspective (2022), RG.2.2.26186.24007
[83] Schäfer, Florian; Sullivan, T. J.; Owhadi, Houman, Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity, Multiscale Model. Simul., 19, 2, 688-730 (2021) · Zbl 1461.65067
[84] Reisert, Marco; Burkhardt, Hans, Learning equivariant functions with matrix valued kernels, J. Mach. Learn. Res., 8, Mar, 385-408 (2007) · Zbl 1222.68286
[85] Bohn, Bastian; Rieger, Christian; Griebel, Michael, A representer theorem for deep kernel learning, J. Mach. Learn. Res., 20, 64, 1-32 (2019) · Zbl 1489.62197
[86] Joshi, Sarang C., Large Deformation Diffeomorphisms and Gaussian Random Fields for Statistical Characterization of Brain Sub-Manifolds (1998), Washington University, (Ph.D. thesis)
[87] Camion, Vincent; Younes, Laurent, Geodesic interpolating splines, (International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (2001), Springer), 513-527 · Zbl 1001.68698
[88] Miller, Michael I.; Trouvé, Alain; Younes, Laurent, Geodesic shooting for computational anatomy, J. Math. Imaging Vision, 24, 2, 209-228 (2006) · Zbl 1478.92084
[89] Charon, Nicolas; Charlier, Benjamin; Trouvé, Alain, Metamorphoses of functional shapes in Sobolev spaces, Found. Comput. Math., 18, 6, 1535-1596 (2018) · Zbl 1403.49021
[90] Beg, M. Faisal; Miller, Michael I.; Trouvé, Alain; Younes, Laurent, Computing large deformation metric mappings via geodesic flows of diffeomorphisms, Int. J. Comput. Vis., 61, 2, 139-157 (2005) · Zbl 1477.68459
[91] Trouvé, Alain; Younes, Laurent, Metamorphoses through lie group action, Found. Comput. Math., 5, 2, 173-198 (2005) · Zbl 1099.68116
[92] Glaunes, Joan; Trouvé, Alain; Younes, Laurent, Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching, (Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004. Vol. 2 (2004), IEEE), II
[93] Younes, Laurent, Diffeomorphic matching, (Shapes and Diffeomorphisms (2019), Springer), 291-346 · Zbl 1423.53002
[94] Hennig, Philipp; Osborne, Michael A.; Girolami, Mark, Probabilistic numerics and uncertainty in computations, Proc. R. Soc. A: Math. Phys. Eng. Sci., 471, 2179, Article 20150142 pp. (2015) · Zbl 1372.65010
[95] Cockayne, Jon; Oates, Chris J.; Sullivan, Timothy John; Girolami, Mark, Bayesian probabilistic numerical methods, SIAM Rev., 61, 4, 756-789 (2019) · Zbl 1451.65179
[96] Owhadi, Houman; Scovel, Clint; Schäfer, Florian, Statistical numerical approximation, Notices Amer. Math. Soc. (2019) · Zbl 07146568
[97] Rico-Martinez, Ramiro; Kevrekidis, Ioannis G., Continuous time modeling of nonlinear systems: A neural network-based approach, (IEEE International Conference on Neural Networks (1993), IEEE), 1522-1525
[98] Owhadi, Houman, Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games, SIAM Rev., 59, 1, 99-149 (2017) · Zbl 1358.65071
[99] Raissi, Maziar; Perdikaris, Paris; Karniadakis, George E., Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys., 378, 686-707 (2019) · Zbl 1415.68175
[100] Belkin, Mikhail; Ma, Siyuan; Mandal, Soumik, To understand deep learning we need to understand kernel learning (2018), arXiv preprint arXiv:1802.01396
[101] Zhang, Chiyuan; Bengio, Samy; Hardt, Moritz; Recht, Benjamin; Vinyals, Oriol, Understanding deep learning requires rethinking generalization (2016), arXiv preprint arXiv:1611.03530
[102] Rousseau, François; Fablet, Ronan, Residual networks as geodesic flows of diffeomorphisms (2018), arXiv preprint arXiv:1805.09585
[103] Vialard, François-Xavier; Kwitt, Roland; Wei, Susan; Niethammer, Marc, A shooting formulation of deep learning, Adv. Neural Inf. Process. Syst., 33 (2020)
[104] Chang, Bo; Meng, Lili; Haber, Eldad; Ruthotto, Lars; Begert, David; Holtham, Elliot, Reversible architectures for arbitrarily deep residual neural networks, (Thirty-Second AAAI Conference on Artificial Intelligence (2018))
[105] Greydanus, Samuel; Dzamba, Misko; Yosinski, Jason, Hamiltonian neural networks, Adv. Neural Inf. Process. Syst., 32 (2019)
[106] Sander, Michael E.; Ablin, Pierre; Blondel, Mathieu; Peyré, Gabriel, Momentum residual neural networks (2021), arXiv preprint arXiv:2102.07870
[107] Dupont, Emilien; Doucet, Arnaud; Teh, Yee Whye, Augmented neural odes, Adv. Neural Inf. Process. Syst., 32 (2019)
[108] Barks, Coleman, The Essential Rumi, (Elephant in the Dark, Vol. 252 (1995), HarperSanFrancisco)
[109] Jacot, Arthur; Gabriel, Franck; Hongler, Clément, Neural tangent kernel: Convergence and generalization in neural networks, (Advances in Neural Information Processing Systems (2018)), 8571-8580
[110] Wilson, Andrew Gordon; Hu, Zhiting; Salakhutdinov, Ruslan; Xing, Eric P., Deep kernel learning, (Artificial Intelligence and Statistics (2016)), 370-378
[111] LeCun, Yann; Touretzky, D.; Hinton, G.; Sejnowski, T., A theoretical framework for back-propagation, (Proceedings of the 1988 Connectionist Models Summer School, Vol. 1 (1988), Morgan Kaufmann: Pittsburgh, PA), 21-28
[112] Owhadi, Houman; Scovel, Clint; Yoo, Gene Ryan, Kernel Mode Decomposition and the Programming of Kernels (2021), Springer, arXiv:1907.08592 · Zbl 1504.62012
[113] Kadri, Hachem; Duflos, Emmanuel; Preux, Philippe; Canu, Stéphane; Rakotomamonjy, Alain; Audiffren, Julien, Operator-valued kernels for learning from functional response data, J. Mach. Learn. Res., 17, 1, 613-666 (2016) · Zbl 1360.68682
[114] Micheli, Mario; Michor, Peter W.; Mumford, David, Sectional curvature in terms of the cometric, with applications to the Riemannian manifolds of landmarks, SIAM J. Imaging Sci., 5, 1, 394-433 (2012) · Zbl 1276.37047
[115] Bruveris, Martins; Vialard, François-Xavier, On completeness of groups of diffeomorphisms, J. Eur. Math. Soc., 19, 5, 1507-1544 (2017) · Zbl 1370.58003
[116] West, Matthew, Variational Integrators (2004), California Institute of Technology, (Ph.D. thesis)
[117] Blanes, Sergio; Casas, Fernando, A Concise Introduction to Geometric Numerical Integration (2017), CRC Press · Zbl 1343.65146
[118] Müller, Stefan; Ortiz, Michael, On the \(\gamma \)-convergence of discrete dynamics and variational integrators, J. Nonlinear Sci., 14, 3, 279-296 (2004) · Zbl 1136.37350
[119] Huang, Gao; Liu, Zhuang; Van Der Maaten, Laurens; Weinberger, Kilian Q., Densely connected convolutional networks, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)), 4700-4708
[120] Hairer, Ernst; Lubich, Christian; Wanner, Gerhard, Geometric numerical integration illustrated by the Störmer-Verlet method, Acta Numer., 12, 399-450 (2003) · Zbl 1046.65110
[121] Tao, Molei, Explicit symplectic approximation of nonseparable Hamiltonians: Algorithm and long time performance, Phys. Rev. E, 94, 4, Article 043303 pp. (2016)
[122] Rahimi, Ali; Recht, Benjamin, Random features for large-scale kernel machines, (Advances in Neural Information Processing Systems (2008)), 1177-1184
[123] Haasdonk, Bernard; Vossen, A.; Burkhardt, Hans, Invariance in kernel methods by haar-integration kernels, (Scandinavian Conference on Image Analysis (2005), Springer), 841-851
[124] Cohen, Taco; Welling, Max, Group equivariant convolutional networks, (International Conference on Machine Learning (2016)), 2990-2999
[125] Sabour, Sara; Frosst, Nicholas; Hinton, Geoffrey E., Dynamic routing between capsules, (Advances in Neural Information Processing Systems (2017)), 3856-3866
[126] LeCun, Yann; Bengio, Yoshua, Convolutional networks for images, speech, and time series, Handb. Brain Theory Neural Netw., 3361, 10, 1995 (1995)
[127] Chan, Tsung-Han; Jia, Kui; Gao, Shenghua; Lu, Jiwen; Zeng, Zinan; Ma, Yi, PCANet: A simple deep learning baseline for image classification?, IEEE Trans. Image Process., 24, 12, 5017-5032 (2015) · Zbl 1408.94080
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.