
Output-feedback \(H_\infty\) quadratic tracking control of linear systems using reinforcement learning. (English) Zbl 1417.93141

Summary: This paper presents an online learning algorithm based on integral reinforcement learning (IRL) to design an output-feedback (OPFB) \(H_\infty\) tracking controller for partially unknown linear continuous-time systems. Although reinforcement learning techniques have been successfully applied to find optimal state-feedback controllers, measuring the full system state is impractical in most control applications, so it is desirable to design OPFB controllers. To this end, a general bounded \(L_2\)-gain tracking problem with a discounted performance function is formulated for OPFB \(H_\infty\) tracking. A tracking game algebraic Riccati equation is then developed that gives the Nash equilibrium solution of the associated min-max optimization problem. An IRL algorithm is developed to solve this game algebraic Riccati equation online without requiring complete knowledge of the system dynamics. At each iteration, the proposed algorithm solves an IRL Bellman equation online, in real time, to evaluate an OPFB policy and then updates the OPFB gain using the information provided by the evaluated policy. An adaptive observer supplies full-state estimates for the IRL Bellman equation during learning; the observer is no longer needed once learning is finished. A simulation example verifies the convergence of the proposed algorithm to a suboptimal OPFB solution and illustrates the performance of the proposed method.
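To illustrate the kind of policy iteration that the IRL scheme emulates online, the following offline, model-based sketch alternates policy evaluation through a Lyapunov (Bellman-like) equation with policy improvement of both the control and the worst-case disturbance gains for a discounted zero-sum game. It is only a minimal sketch under stated assumptions: the matrices, the discount factor alpha, the attenuation level beta, the initial gain, and the function name game_are_policy_iteration are illustrative placeholders and are not taken from the paper, whose IRL algorithm instead evaluates policies from measured data, does not require the drift matrix, and works with output-feedback gains.

    # Minimal model-based sketch (assumed placeholder data, not the paper's example).
    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def game_are_policy_iteration(A, B, D, Q, R, beta, alpha, K0,
                                  n_iter=50, tol=1e-9):
        """Offline policy iteration on a discounted zero-sum game ARE.

        Each step solves a Lyapunov equation for the cost matrix P
        (the model-based counterpart of an IRL Bellman equation) and
        updates the control gain K and disturbance gain L from P.
        """
        n = A.shape[0]
        K = K0                              # stabilizing initial control gain (assumed)
        L = np.zeros((D.shape[1], n))       # initial worst-case disturbance gain
        P_prev = np.zeros((n, n))
        for _ in range(n_iter):
            # Discounted closed loop under u = -K x, w = L x
            Ac = A - B @ K + D @ L - 0.5 * alpha * np.eye(n)
            Qc = Q + K.T @ R @ K - beta**2 * L.T @ L
            # Policy evaluation: Ac' P + P Ac + Qc = 0
            P = solve_continuous_lyapunov(Ac.T, -Qc)
            # Policy improvement for both players
            K = np.linalg.solve(R, B.T @ P)
            L = (1.0 / beta**2) * D.T @ P
            if np.linalg.norm(P - P_prev) < tol:
                break
            P_prev = P
        return P, K, L

    # Toy usage on a placeholder second-order system
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])
    B = np.array([[0.0], [1.0]])
    D = np.array([[0.0], [0.2]])
    Q, R = np.eye(2), np.eye(1)
    P, K, L = game_are_policy_iteration(A, B, D, Q, R, beta=5.0, alpha=0.1,
                                        K0=np.zeros((1, 2)))

In the paper's setting, the Lyapunov step above is replaced by an IRL Bellman equation evaluated along measured system trajectories, so knowledge of the drift dynamics is not required, and the resulting gains are constrained to the output-feedback structure.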

MSC:

93B52 Feedback control
93B36 \(H^\infty\)-control
93C05 Linear systems in control theory
68T05 Learning and adaptive systems in artificial intelligence
91A80 Applications of game theory
49N90 Applications of optimal control and differential games
Full Text: DOI
