
Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. (English) Zbl 1191.49038

Summary: We discuss an online algorithm based on policy iteration for learning the continuous-time optimal control solution with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns, online and in real time, the solution to the optimal control problem described by the Hamilton-Jacobi-Bellman (HJB) equation. The method finds, in real time, suitable approximations of both the optimal cost and the optimal control policy, while also guaranteeing closed-loop stability. We present an online adaptive algorithm implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of both actor and critic neural networks; we call this 'synchronous' policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra nonstandard terms in the actor tuning law required to guarantee closed-loop dynamical stability. Convergence to the optimal solution is proven, and stability of the system is guaranteed. Simulation examples show the effectiveness of the new algorithm.
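
To make the actor/critic structure concrete, the following is a minimal sketch of a synchronous tuning loop for a scalar nonlinear system x' = f(x) + g(x)u with cost integrand Q(x) + R u^2. The basis functions, gains, probing signal, and the simplified actor law are illustrative assumptions for this sketch, not the paper's exact tuning laws (in particular, the paper's actor law carries extra nonstandard stabilizing terms that are only hinted at here).

import numpy as np

# Known dynamics (illustrative choices for this sketch)
f = lambda x: -x + 0.5 * x**3      # drift
g = lambda x: 1.0                  # input gain
Q, R = 1.0, 1.0                    # state and control weights

phi  = lambda x: np.array([x**2, x**4])   # value-function basis
dphi = lambda x: np.array([2*x, 4*x**3])  # its gradient

Wc = np.array([1.0, 1.0])   # critic weights, V(x) ~ Wc . phi(x)
Wa = np.array([1.0, 1.0])   # actor weights for the control policy

alpha_c, alpha_a = 10.0, 1.0   # critic / actor learning rates
dt, T = 1e-3, 20.0
x = 1.0

for k in range(int(T / dt)):
    t = k * dt
    # probing noise for persistence of excitation (decays over time)
    n = 0.1 * np.exp(-0.05 * t) * np.sin(5 * t)

    # actor: u = -(1/2) R^{-1} g(x) dphi(x)^T Wa
    u = -0.5 / R * g(x) * (dphi(x) @ Wa) + n

    xdot = f(x) + g(x) * u
    sigma = dphi(x) * xdot                   # time derivative of the basis along the trajectory
    e = sigma @ Wc + Q * x**2 + R * u**2     # continuous-time Bellman error

    m = 1.0 + sigma @ sigma                  # normalization term
    Wc_dot = -alpha_c * sigma / m**2 * e     # normalized gradient descent on e^2
    # simplified actor law: pull Wa toward Wc (a stand-in for the
    # paper's actor tuning law with its extra stabilizing terms)
    Wa_dot = -alpha_a * (Wa - Wc)

    # Euler integration of state and both weight vectors simultaneously
    x  += dt * xdot
    Wc += dt * Wc_dot
    Wa += dt * Wa_dot

print("critic weights:", Wc)
print("actor  weights:", Wa)
print("final state:", x)

Both weight vectors adapt continuously while the system runs, which is the sense in which the adaptation is 'synchronous'; the decaying sinusoidal probing term plays the role of the persistence of excitation condition required for critic convergence.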

MSC:

49M30 Other numerical methods in calculus of variations (MSC2010)
93C10 Nonlinear systems in control theory
93C40 Adaptive control/observation systems
