Optimal dynamic output feedback control of unknown linear continuous-time systems by adaptive dynamic programming. (English) Zbl 1537.93261

Summary: In this paper, we present an approximate optimal dynamic output feedback control learning algorithm to solve the linear quadratic regulation problem for unknown linear continuous-time systems. First, a dynamic output feedback controller is designed by constructing an internal state. Then, an adaptive dynamic programming based learning algorithm is proposed to estimate the optimal feedback control gain using only input and output data. By adding a constructed virtual observer error to the iterative learning equation, the proposed learning algorithm is immune to the observer error. In addition, the value iteration based learning equation is established without storing a series of past data, which reduces memory storage requirements. The proposed algorithm also eliminates the need for repeated finite window integrals, which may reduce the computational load. Moreover, a convergence analysis shows that the estimated control policy converges to the optimal control policy. Finally, a physical experiment on an unmanned quadrotor illustrates the effectiveness of the proposed approach.
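For orientation, the standard continuous-time LQR setting the summary refers to, together with a value-iteration update of the kind proposed in [1], can be sketched as follows; the notation below is the usual textbook one and is not taken from the paper itself. Given the system \(\dot{x} = Ax + Bu\), \(y = Cx\), and the cost \(J(u) = \int_0^\infty \left( y^\top Q y + u^\top R u \right) dt\), the optimal gain is \(K^* = R^{-1} B^\top P^*\), where \(P^*\) solves the algebraic Riccati equation
\[
A^\top P + P A + C^\top Q C - P B R^{-1} B^\top P = 0.
\]
A model-based value iteration in the spirit of [1] updates
\[
P_{k+1} = P_k + \epsilon_k \left( A^\top P_k + P_k A + C^\top Q C - P_k B R^{-1} B^\top P_k \right)
\]
with suitable step sizes \(\epsilon_k > 0\) so that \(P_k \to P^*\); the algorithm of the paper replaces the model-dependent terms in such an update with quantities estimated from input and output measurements alone.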

MSC:

93B52 Feedback control
93C40 Adaptive control/observation systems
49L20 Dynamic programming in optimal control and differential games
49N10 Linear-quadratic optimal control problems
93C05 Linear systems in control theory
Full Text: DOI

References:

[1] Bian, T.; Jiang, Z. P., Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design, Automatica, 71, 348-360, 2016 · Zbl 1343.93095
[2] Chen, C.; Modares, H.; Xie, K.; Lewis, F. L.; Wan, Y.; Xie, S., Reinforcement learning-based adaptive optimal exponential tracking control of linear systems with unknown dynamics, IEEE Transactions on Automatic Control, 64, 11, 4423-4438, 2019 · Zbl 1482.93302
[3] Chen, C.; Xie, L.; Jiang, Y.; Xie, K.; Xie, S., Robust output regulation and reinforcement learning-based output tracking design for unknown linear discrete-time systems, IEEE Transactions on Automatic Control, 68, 4, 2391-2398, 2023 · Zbl 1529.93050
[4] Gao, W.; Jiang, Z. P., Adaptive dynamic programming and adaptive optimal output regulation of linear systems, IEEE Transactions on Automatic Control, 61, 12, 4164-4169, 2016 · Zbl 1359.93224
[5] Gao, W.; Jiang, Z., Learning-based adaptive optimal tracking control of strict-feedback nonlinear systems, IEEE Transactions on Neural Networks and Learning Systems, 29, 6, 2614-2624, 2018
[6] Gao, W.; Jiang, Z. P., Adaptive optimal output regulation of time-delay systems via measurement feedback, IEEE Transactions on Neural Networks and Learning Systems, 30, 3, 938-945, 2019
[7] Gao, W.; Jiang, Y.; Davari, M., Data-driven cooperative output regulation of multi-agent systems via robust adaptive dynamic programming, IEEE Transactions on Circuits and Systems II: Express Briefs, 66, 3, 447-451, 2019
[8] Gao, W.; Jiang, Y.; Jiang, Z. P.; Chai, T., Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming, Automatica, 72, 37-45, 2016 · Zbl 1344.93060
[9] Gao, W.; Jiang, Z. P.; Lewis, F. L., Leader-to-formation stability of multi-agent systems: an adaptive optimal control approach, IEEE Transactions on Automatic Control, 63, 10, 3581-3588, 2018 · Zbl 1423.93015
[10] Jha, S. K.; Roy, S. B.; Bhasin, S., Initial excitation-based iterative algorithm for approximate optimal control of completely unknown LTI systems, IEEE Transactions on Automatic Control, 64, 12, 5230-5237, 2019 · Zbl 1482.93309
[11] Jiang, Y.; Jiang, Z. P., Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, Automatica, 48, 10, 2699-2704, 2012 · Zbl 1271.93088
[12] Jiang, Y.; Jiang, Z. P., Robust adaptive dynamic programming with an application to power systems, IEEE Transactions on Neural Networks and Learning Systems, 24, 7, 1150-1156, 2013
[13] Jiang, Y.; Jiang, Z. P., Robust adaptive dynamic programming, 2017, Wiley: Wiley Hoboken, NJ, USA · Zbl 1406.90003
[14] Jiang, H.; Zhang, H.; Zhang, K.; Cui, X., Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems, Neurocomputing, 275, 649-658, 2018
[15] Lancaster, P.; Rodman, L., Algebraic Riccati equations, 1995, Oxford University Press Inc.: Oxford University Press Inc. New York, NY, USA · Zbl 0836.15005
[16] Lewis, F. L.; Vamvoudakis, K. G., Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41, 1, 14-25, 2011
[17] Lewis, F. L.; Vrabie, D., Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9, 3, 32-50, 2009
[18] Lewis, F. L.; Vrabie, D. L.; Syrmos, V. L., Optimal control, 2012, John Wiley & Sons, Inc. · Zbl 1284.49001
[19] Liu, D.; Wei, Q.; Ding, W.; Yang, X.; Li, H., Adaptive dynamic programming with applications in optimal control, 2017, Springer: Springer Cham, Switzerland · Zbl 1390.93003
[20] Liu, D.; Xue, S.; Zhao, B.; Luo, B.; Wei, Q., Adaptive dynamic programming for control: A survey and recent advances, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51, 1, 142-160, 2021
[21] Luo, B.; Yang, Y.; Liu, D., Adaptive Q-learning for data-based optimal output regulation with experience replay, IEEE Transactions on Cybernetics, 48, 12, 3337-3348, 2018
[22] Modares, H.; Lewis, F. L.; Jiang, Z., Optimal output-feedback control of unknown continuous-time linear systems using off-policy reinforcement learning, IEEE Transactions on Cybernetics, 46, 11, 2401-2410, 2016
[23] Peng, Y.; Meng, Q.; Sun, W., Adaptive output-feedback quadratic tracking control of continuous-time systems via value iteration with its application, IET Control Theory & Applications, 14, 20, 3621-3631, 2020
[24] Powell, W., Approximate dynamic programming: solving the curses of dimensionality, 2007, Wiley: Wiley Hoboken, NJ, USA
[25] Rizvi, S. A. A.; Lin, Z., Output feedback adaptive dynamic programming for linear differential zero-sum games, Automatica, 122, Article 109272 pp., 2020 · Zbl 1451.91026
[26] Rizvi, S. A. A.; Lin, Z., Reinforcement learning-based linear quadratic regulation of continuous-time systems using dynamic output feedback, IEEE Transactions on Cybernetics, 50, 11, 4670-4679, 2020
[27] Rizvi, S. A. A.; Pertzborn, A. J.; Lin, Z., Reinforcement learning based optimal tracking control under unmeasurable disturbances with application to HVAC systems, IEEE Transactions on Neural Networks and Learning Systems, 33, 12, 7523-7533, 2022
[28] Roy, S. B.; Bhasin, S.; Kar, I. N., Combined MRAC for unknown MIMO LTI systems with parameter convergence, IEEE Transactions on Automatic Control, 63, 1, 283-290, 2018 · Zbl 1390.93453
[29] Sun, W.; Zhao, G.; Peng, Y., Adaptive optimal output feedback tracking control for unknown discrete-time linear systems using a combined reinforcement Q-learning and internal model method, IET Control Theory & Applications, 13, 18, 3075-3086, 2019
[30] Sutton, R. S.; Barto, A. G., Reinforcement learning: An introduction, 1998, MIT Press: MIT Press Cambridge, MA, USA
[31] Vamvoudakis, K. G.; Lewis, F. L., Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46, 5, 878-888, 2010 · Zbl 1191.49038
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. This list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or perfect matching.