A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems. (English) Zbl 1506.74436

Summary: This work aims to advance computational methods for projection-based reduced-order models (ROMs) of linear time-invariant (LTI) dynamical systems. For such systems, current practice relies on ROM formulations expressing the state as a rank-1 tensor (i.e., a vector), leading to computational kernels that are memory bandwidth bound and, therefore, ill-suited for scalable performance on modern architectures. This weakness can be particularly limiting when tackling many-query studies, where one needs to run a large number of simulations. This work introduces a reformulation, called rank-2 Galerkin, of the Galerkin ROM for LTI dynamical systems which converts the nature of the ROM problem from memory bandwidth to compute bound. We present the details of the formulation and its implementation, and demonstrate its utility through numerical experiments using, as a test case, the simulation of elastic seismic shear waves in an axisymmetric domain. We quantify and analyze performance and scaling results for varying numbers of threads and problem sizes. Finally, we present an end-to-end demonstration of using the rank-2 Galerkin ROM for a Monte Carlo sampling study. We show that the rank-2 Galerkin ROM is one order of magnitude more efficient than the rank-1 Galerkin ROM (the current practice) and about 970 times more efficient than the full-order model, while maintaining accuracy in both the mean and statistics of the field.
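The contrast the summary draws between memory-bandwidth-bound and compute-bound kernels can be illustrated with a small sketch. It assumes, beyond what the summary states, that the reduced system has the form dx/dt = A_hat x, that the rank-2 formulation stacks M independent reduced states as the columns of a matrix so that one matrix-matrix product replaces M matrix-vector products, and that time stepping is forward Euler. All names below (`A_hat`, `step_rank1`, `step_rank2`, `arithmetic_intensity`) are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
# Illustrative sketch only; pure Python, no external libraries.
# Assumption: reduced LTI system dx/dt = A_hat x, forward-Euler time stepping.

def matvec(A, x):
    """Dense matrix-vector product (the rank-1 kernel)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmat(A, X):
    """Dense matrix-matrix product (the rank-2 kernel).
    X is a list of rows; each column of X is one independent reduced state."""
    cols = list(zip(*X))
    return [[sum(a * c for a, c in zip(row, col)) for col in cols] for row in A]

def step_rank1(A_hat, x, dt):
    """Advance a single reduced state (a vector) by one forward-Euler step."""
    Ax = matvec(A_hat, x)
    return [xi + dt * axi for xi, axi in zip(x, Ax)]

def step_rank2(A_hat, X, dt):
    """Advance M reduced states at once, stored as the columns of X."""
    AX = matmat(A_hat, X)
    return [[xij + dt * axij for xij, axij in zip(xrow, axrow)]
            for xrow, axrow in zip(X, AX)]

def arithmetic_intensity(n, m, bytes_per_value=8):
    """Rough flops-per-byte estimate for an (n x n) times (n x m) dense product,
    counting one read of A, one read of X and one write of the result."""
    flops = 2 * n * n * m
    traffic = bytes_per_value * (n * n + 2 * n * m)
    return flops / traffic

# Hypothetical 2 x 2 reduced operator and three stacked states (columns of X).
A_hat = [[0.0, 1.0], [-1.0, 0.0]]
X = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0]]

X_next = step_rank2(A_hat, X, dt=0.5)
columns_next = [step_rank1(A_hat, list(col), dt=0.5) for col in zip(*X)]
```

Under these assumptions the two approaches produce the same states, column by column, but the flops-per-byte estimate for the product is below 0.25 for a single vector (m = 1) and grows with the number of stacked columns, which is the sense in which batching moves the kernel toward the compute-bound regime.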

MSC:

74S05 Finite element methods applied to problems in solid mechanics
65N30 Finite element, Rayleigh-Ritz and Galerkin methods for boundary value problems involving PDEs
74L05 Geophysical solid mechanics

Software:

ViennaCL; Kokkos; BLIS

References:

[1] Benner, P.; Gugercin, S.; Willcox, K., A survey of projection-based model reduction methods for parametric dynamical systems, SIAM Rev., 57, 4, 483-531 (2015) · Zbl 1339.37089
[2] Baur, U.; Benner, P.; Feng, L., Model order reduction for linear and nonlinear systems: A system-theoretic perspective, Arch. Comput. Methods Eng., 21, 4, 331-358 (2014) · Zbl 1348.93075
[3] Moore, B., Principal component analysis in linear systems: Controllability, observability, and model reduction, IEEE Trans. Automat. Control, 26, 1, 17-32 (1981) · Zbl 0464.93022
[4] Willcox, K.; Peraire, J., Balanced model reduction via the proper orthogonal decomposition, AIAA J., 40, 11, 2323-2330 (2002)
[5] Mullis, C. T.; Roberts, R. A., Synthesis of minimum roundoff noise fixed point digital filters, IEEE Trans. Circuits Syst., 23, 9, 551-562 (1976) · Zbl 0342.93066
[6] Lall, S.; Marsden, J. E.; Glavaški, S., Empirical model reduction of controlled nonlinear systems, IFAC Proc. Vol., 32, 2, 2598-2603 (1999), 14th IFAC World Congress 1999, Beijing, China, 5-9 July
[7] Gugercin, S.; Antoulas, A.; Beattie, C., \( \mathcal{H}_2\) model reduction for large-scale linear dynamical systems, SIAM J. Matrix Anal. Appl., 30, 2, 609-638 (2008) · Zbl 1159.93318
[8] Wilson, D., Optimum solution of model-reduction problem, Proc. Inst. Electr. Eng., 117, 6, 1161-1165 (1970)
[9] Hyland, D.; Bernstein, D., The optimal projection equations for model reduction and the relationships among the methods of Wilson, Skelton, and Moore, IEEE Trans. Automat. Control, 30, 12, 1201-1211 (1985) · Zbl 0583.93004
[10] Lall, S.; Krysl, P.; Marsden, J. E., Structure-preserving model reduction for mechanical systems, Physica D, 184, 1, 304-318 (2003), Complexity and Nonlinearity in Physical Systems - A Special Issue to Honor Alan Newell, URL http://www.sciencedirect.com/science/article/pii/S0167278903002276 · Zbl 1041.70011
[11] Benner, P.; Saak, J.; Uddin, M. M., Structure preserving model order reduction of large sparse second-order index-1 systems and application to a mechatronics model, Math. Comput. Model. Dyn. Syst., 22, 6, 509-523 (2016), URL https://doi.org/10.1080/13873954.2016.1218347 · Zbl 1348.93025
[12] Bui-Thanh, T.; Willcox, K.; Ghattas, O., Parametric reduced-order models for probabilistic analysis of unsteady aerodynamic applications, AIAA J., 46, 10, 2520-2529 (2008)
[13] Grepl, M. A.; Patera, A. T., A posteriori error bounds for reduced-basis approximations of parametrized parabolic partial differential equations, ESAIM: M2AN, 39, 1, 157-181 (2005) · Zbl 1079.65096
[14] Prud’Homme, C.; Rovas, D.; Veroy, K.; Machiels, L.; Maday, Y.; Patera, A.; Turinici, G., Reliable real-time solution of parametrized partial differential equations: Reduced-basis output bound methods, J. Fluids Eng., 124, 1, 70-80 (2001)
[15] Rovas, D.; Machiels, L.; Maday, Y., Reduced-basis output bound methods for parabolic problems, IMA J. Numer. Anal., 26, 423-445 (2006) · Zbl 1101.65099
[16] Rovas, D. V., Reduced-Basis Output Bound Methods for Parametrized Partial Differential Equations (2003), Massachusetts Institute of Technology, URL http://hdl.handle.net/1721.1/16956
[17] Kunisch, K.; Volkwein, S., Galerkin proper orthogonal decomposition methods for parabolic problems, Numer. Math., 90, 1, 117-148 (2001) · Zbl 1005.65112
[18] Singler, J. R., New POD error expressions, error bounds, and asymptotic results for reduced order models of parabolic PDEs, SIAM J. Numer. Anal., 52, 2, 852-876 (2014) · Zbl 1298.65140
[19] Volkwein, S., Model reduction using proper orthogonal decomposition (2011), Lecture Notes, University of Konstanz, URL www.math.uni-konstanz.de/numerik/personen/volkwein/teaching/POD-vorlesung.pdf
[20] Gugercin, S.; Antoulas, A. C., A survey of model reduction by balanced truncation and some new results, Internat. J. Control, 77, 8, 748-766 (2004), URL https://doi.org/10.1080/00207170410001713448 · Zbl 1061.93022
[21] Edwards, H. C.; Trott, C. R.; Sunderland, D., Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, J. Parallel Distrib. Comput., 74, 12, 3202-3216 (2014)
[22] Dowell, E., A Modern Course in Aeroelasticity, Solid Mechanics and its Applications (2015), Springer International Publishing · Zbl 1297.74001
[23] Meirovitch, L., Fundamentals of Vibrations (2010), Waveland Press
[24] Goman, M.; Khrabrov, A., State-space representation of aerodynamic characteristics of an aircraft at high angles of attack, J. Aircr., 31, 5, 1109-1115 (1994)
[25] Bazilevs, Y.; Hsu, M.-C.; Scott, M., Isogeometric fluid-structure interaction analysis with emphasis on non-matching discretizations, and with application to wind turbines, Comput. Methods Appl. Mech. Engrg., 249, 28-41 (2012) · Zbl 1348.74094
[26] Borg, M.; Collu, M., Frequency-domain characteristics of aerodynamic loads of offshore floating vertical axis wind turbines, Appl. Energy, 155, 629-636 (2015) · Zbl 1353.70004
[27] Incropera, F. P.; Lavine, A. S.; Bergman, T. L.; DeWitt, D. P., Fundamentals of Heat and Mass Transfer (2007), Wiley
[28] Zhao, C.; Lu, T., Analysis of microchannel heat sinks for electronics cooling, Int. J. Heat Mass Transfer, 45, 24, 4857-4869 (2002) · Zbl 1032.76680
[29] Banerjee, S.; Mukhopadhyay, A.; Sen, S.; Ganguly, R., Thermomagnetic convection in square and shallow enclosures for electronics cooling, Numer. Heat Transfer Part A, 55, 10, 931-951 (2009)
[30] Epting, J., Thermal management of urban subsurface resources - delineation of boundary conditions, Procedia Eng., 209, 83-91 (2017), The Urban Subsurface - from Geoscience and Engineering to Spatial Planning and Management
[31] Shah, R. K.; Sekulic, D. P., Fundamentals of Heat Exchanger Design (2003), John Wiley & Sons
[32] Lewis, E. E.; Miller, W. F., Computational methods of neutron transport (1984), URL https://www.osti.gov/biblio/5538794 · Zbl 0594.65096
[33] Tencer, J.; Carlberg, K.; Larsen, M.; Hogan, R., Accelerated solution of discrete ordinates approximation to the boltzmann transport equation for a gray absorbing-emitting medium via model reduction, J. Heat Transfer, 139, 12 (2017)
[34] Lighthill, M. J., On sound generated aerodynamically I. General theory, Proc. R. Soc. A, 211, 1107, 564-587 (1952) · Zbl 0049.25905
[35] Ffowcs Williams, J. E.; Hawkings, D. L., Sound generation by turbulence and surfaces in arbitrary motion, Proc. R. Soc. A, 264, 1151, 321-342 (1969) · Zbl 0182.59205
[36] Phillips, O. M., On the generation of sound by supersonic turbulent shear layers, J. Fluid Mech., 9, 1, 1-28 (1960) · Zbl 0097.41502
[37] Lombard, O.; Barrière, C.; Leroy, V., Nonlinear multiple scattering of acoustic waves by a layer of bubbles, Europhys. Lett., 112, 2, 24002 (2015)
[38] Vlach, J.; Singhal, K., Computer Methods for Circuit Analysis and Design (1983), Springer Science & Business Media
[39] Chetty, P., Current injected equivalent circuit approach to modeling of switching DC-DC converters in discontinuous inductor conduction mode, IEEE Trans. Ind. Electron., 3, 230-234 (1982)
[40] Chen, Z., Linear circuit model combination for coupled noise simulation by using directional junction, (IEEE 14th Topical Meeting on Electrical Performance of Electronic Packaging, 2005 (2005)), 83-86
[41] Sirovich, L., Turbulence and the dynamics of coherent structures part I: coherent structures, Quart. Appl. Math., 45, 3, 561-571 (1987), URL http://www.jstor.org/stable/43637457 · Zbl 0676.76047
[42] Berkooz, G.; Holmes, P.; Lumley, J. L., The proper orthogonal decomposition in the analysis of turbulent flows, Annu. Rev. Fluid Mech., 25, 1, 539-575 (1993)
[43] Hutcheson, A.; Natoli, V., Memory Bound vs. Compute Bound: A Quantitative Study of Cache and Memory Bandwidth in High Performance Applications (2011), URL http://docplayer.net/33565764-Memory-bound-vs-compute-bound-a-quantitative-study-of-cache-and-memory-bandwidth-in-high-performance-applications.html
[44] Yuen, D.; Wang, J.; Johnsson, L.; Chi, C.-H.; Shi, Y., GPU Solutions to Multi-Scale Problems in Science and Engineering (2011), Springer Publishing Company, Inc.
[45] Elafrou, A.; Goumas, G.; Koziris, N., Performance analysis and optimization of sparse matrix-vector multiplication on Intel Xeon Phi, (2017 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW (2017)), 1389-1398
[46] Yang, H.; Gunzburger, M., Algorithms and analyses for stochastic optimization for turbofan noise reduction using parallel reduced-order modeling, Comput. Methods Appl. Mech. Engrg., 319, 217-239 (2017) · Zbl 1439.76136
[47] Bell, N.; Garland, M., Efficient Sparse Matrix-Vector Multiplication on CUDA, NVIDIA Technical Report (2008), NVIDIA Corporation, URL https://www.nvidia.com/docs/IO/66889/nvr-2008-004.pdf
[48] Peise, E., Performance modeling and prediction for dense linear algebra (2017), URL arXiv:1706.01341
[49] Ahmad, K.; Venkat, A.; Hall, M., Optimizing LOBPCG: Sparse matrix loop and data transformations in action, (Ding, C.; Criswell, J.; Wu, P., Languages and Compilers for Parallel Computing (2017), Springer International Publishing: Cham), 218-232
[50] Hong, C.; Sukumaran-Rajam, A.; Bandyopadhyay, B.; Kim, J.; Kurt, S. E.; Nisa, I.; Sabhlok, S.; Çatalyürek, U. V.; Parthasarathy, S.; Sadayappan, P., Efficient sparse-matrix multi-vector product on GPUs, (Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’18 (2018), Association for Computing Machinery: New York, NY, USA), 66-79
[51] Jiang, N.; Layton, W., An algorithm for fast calculation of flow ensembles, Int. J. Uncertain. Quantif., 4, 4, 273-301 (2014) · Zbl 1301.65099
[52] Gunzburger, M.; Jiang, N.; Schneier, M., An ensemble-proper orthogonal decomposition method for the nonstationary Navier-Stokes equations, SIAM J. Numer. Anal., 55, 1, 286-304 (2017) · Zbl 1394.76067
[53] Gunzburger, M.; Jiang, N.; Wang, Z., An efficient algorithm for simulating ensembles of parameterized flow problems, IMA J. Numer. Anal., 39, 3, 1180-1205 (2018) · Zbl 1466.65133
[54] Phipps, E.; D’Elia, M.; Edwards, H. C.; Hoemmen, M.; Hu, J.; Rajamanickam, S., Embedded ensemble propagation for improving performance, portability and scalability of uncertainty quantification on emerging computational architectures (2015), URL arXiv:1511.03703 · Zbl 1365.65017
[55] Igel, H.; Weber, M., SH-Wave propagation in the whole mantle using high-order finite differences, Geophys. Res. Lett., 22, 6, 731-734 (1995)
[56] Jahnke, G.; Thorne, M. S.; Cochard, A.; Igel, H., Global SH-wave propagation using a parallel axisymmetric spherical finite-difference scheme: Application to whole mantle scattering, Geophys. J. Int., 173, 3, 815-826 (2008)
[57] Chaljub, E.; Tarantola, A., Sensitivity of SS precursors to topography on the upper-mantle 660-km discontinuity, Geophys. Res. Lett., 24, 21, 2613-2616 (1997)
[58] Wang, Y.; Takenaka, H.; Jiang, X.; Lei, J., Modelling two-dimensional global seismic wave propagation in a laterally heterogeneous whole-Moon model, Geophys. J. Int., 192, 3, 1271-1287 (2012)
[59] Virieux, J., SH-Wave propagation in heterogeneous media: velocity-stress finite-difference method, Explor. Geophys., 15, 4, 265 (1984)
[60] Virieux, J., P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference method, GEOPHYSICS, 51, 4, 889-901 (1986)
[61] Pereyra, V.; Kaelin, A., Fast wave propagation by model order reduction, Electron. Trans. Numer. Anal. Vol., 30, 406-419 (2008), URL http://emis.impa.br/EMIS/journals/ETNA/vol.30.2008/pp406-419.dir/pp406-419.pdf · Zbl 1171.65072
[62] Pereyra, V., Model order reduction with oblique projections for large scale wave propagation, J. Comput. Appl. Math., 295, 103-114 (2016), VIII Pan-American Workshop in Applied and Computational Mathematics · Zbl 1329.86018
[63] Dziewonski, A. M.; Anderson, D. L., Preliminary reference Earth model, Phys. Earth Planet. Inter., 25, 4, 297-356 (1981)
[64] Kennett, B. L.N.; Engdahl, E. R.; Buland, R., Constraints on seismic velocities in the Earth from traveltimes, Geophys. J. Int., 122, 1, 108-124 (1995)
[65] Montagner, J.-P.; Kennett, B. L.N., How to reconcile body-wave and normal-mode reference Earth models, Geophys. J. Int., 125, 1, 229-248 (1996)
[66] Rabinovich, E.; Filipenko, N. Y.; Shefel, G. S., Generalized model of seismic pulse, J. Phys. Conf. Ser., 1015, Article 052025 pp. (2018)
[67] Van Zee, F. G.; van de Geijn, R. A., BLIS: A framework for rapidly instantiating BLAS functionality, ACM Trans. Math. Software, 41, 3, 14:1-14:33 (2015) · Zbl 1347.65054
[68] Van Zee, F. G.; Smith, T.; Igual, F. D.; Smelyanskiy, M.; Zhang, X.; Kistler, M.; Austel, V.; Gunnels, J.; Low, T. M.; Marker, B.; Killough, L.; van de Geijn, R. A., The BLIS framework: Experiments in portability, ACM Trans. Math. Software, 42, 2, 12:1-12:19 (2016)
[69] Li, A.; Hammad Mazhar, R. S.; Negrut, D., Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL (2015), URL https://sbel.wiscweb.wisc.edu/wp-content/uploads/sites/569/2018/05/TR-2015-02.pdf
[70] Swischuk, R.; Kramer, B.; Huang, C.; Willcox, K., Learning physics-based reduced-order models for a single-injector combustion process, AIAA J., 58, 6, 2658-2672 (2020)
[71] Handbook of Linear Algebra (2006), CRC Press: Boca Raton, FL, USA
[72] Kalashnikova, I.; Arunajatesan, S.; Barone, M. F.; van Bloemen Waanders, B. G.; Fike, J. A., Reduced order modeling for prediction and control of large-scale systems, URL http://dx.doi.org/10.2172/1177206 · Zbl 1296.93165
[73] Amsallem, D.; Tezaur, R.; Farhat, C., Real-time solution of linear computational problems using databases of parametric reduced-order models with arbitrary underlying meshes, J. Comput. Phys., 326, 373-397 (2016) · Zbl 1373.68446
[74] Afra, S.; Gildin, E., Tensor based geology preserving reservoir parameterization with Higher Order Singular Value Decomposition (HOSVD), Comput. Geosci., 94, 110-120 (2016)
[75] Shi, Y.; Niranjan, U. N.; Anandkumar, A.; Cecka, C., Tensor contractions with extended BLAS kernels on CPU and GPU, (2016 IEEE 23rd International Conference on High Performance Computing, HiPC (2016)), 193-202
[76] Abdelfattah, A.; Haidar, A.; Tomov, S.; Dongarra, J., Performance, design, and autotuning of batched GEMM for GPUs, (Kunkel, J. M.; Balaji, P.; Dongarra, J., High Performance Computing (2016), Springer International Publishing: Cham), 21-38
[77] Li, X.; Liang, Y.; Yan, S.; Jia, L.; Li, Y., A coordinated tiling and batching framework for efficient GEMM on GPUs, (Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, PPoPP ’19 (2019), Association for Computing Machinery: New York, NY, USA), 229-241
[78] Jiang, L.; Yang, C.; Ma, W., Enabling highly efficient batched matrix multiplications on SW26010 many-core processor, ACM Trans. Archit. Code Optim., 17, 1 (2020)
[79] Markidis, S.; Chien, S. W.D.; Laure, E.; Peng, I. B.; Vetter, J. S., NVIDIA Tensor Core programmability, performance & precision, (2018 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW (2018)), 522-531
[80] Jouppi, N. P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; Boyle, R.; Cantin, P.-l.; Chao, C.; Clark, C.; Coriell, J.; Daley, M.; Dau, M.; Dean, J.; Gelb, B.; Ghaemmaghami, T. V.; Gottipati, R.; Gulland, W.; Hagmann, R.; Ho, C. R.; Hogberg, D.; Hu, J.; Hundt, R.; Hurt, D.; Ibarz, J.; Jaffey, A.; Jaworski, A.; Kaplan, A.; Khaitan, H.; Killebrew, D.; Koch, A.; Kumar, N.; Lacy, S.; Laudon, J.; Law, J.; Le, D.; Leary, C.; Liu, Z.; Lucke, K.; Lundin, A.; MacKean, G.; Maggiore, A.; Mahony, M.; Miller, K.; Nagarajan, R.; Narayanaswami, R.; Ni, R.; Nix, K.; Norrie, T.; Omernick, M.; Penukonda, N.; Phelps, A.; Ross, J.; Ross, M.; Salek, A.; Samadiani, E.; Severn, C.; Sizikov, G.; Snelham, M.; Souter, J.; Steinberg, D.; Swing, A.; Tan, M.; Thorson, G.; Tian, B.; Toma, H.; Tuttle, E.; Vasudevan, V.; Walter, R.; Wang, W.; Wilcox, E.; Yoon, D. H., In-datacenter performance analysis of a tensor processing unit, (Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA ’17 (2017), Association for Computing Machinery: New York, NY, USA), 1-12
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.