
Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning. (English) Zbl 1418.93250

Summary: This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is carried out by reinforcement learning (RL) using system data instead of system dynamics information. In the multiagent system, the agents interact with each other and at least one agent can communicate with the leader directly; this interaction is described by an algebraic graph structure. The objective is to make all agents synchronize with the leader and to drive the performance indices to a Nash equilibrium. On one hand, the solutions of the optimal consensus control problem for multiagent systems are obtained by solving the coupled Hamilton-Jacobi-Bellman (HJB) equations; however, analytical solutions of the discrete-time HJB equation are difficult to obtain directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed using system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent \(i\). Then, the Q-function Bellman equation is derived on the basis of the Q-function. Policy iteration is adopted to compute the optimal control iteratively, and the least-squares (LS) method is employed to implement the iteration process. A stability analysis of the proposed policy-iteration-based Q-learning algorithm for multiagent systems is given. Two simulation examples are presented to verify the effectiveness of the proposed scheme.
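The data-driven loop the summary describes — evaluate the Q-function from its Bellman equation by least squares, then improve the policy — can be sketched for a single-agent linear-quadratic case. This is a minimal illustration, not the paper's method: the matrices `A`, `B`, `Q_cost`, `R` are hypothetical, and the paper's multiagent setting additionally couples the agents' performance indices through the communication graph.

```python
import numpy as np

# Hypothetical linear system x_{k+1} = A x_k + B u_k with quadratic cost.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q_cost = np.eye(2)
R = np.eye(1)
n, m = 2, 1

def quad_basis(z):
    """Quadratic basis so that z' H z = theta . quad_basis(z), with theta
    the upper-triangular entries of the symmetric matrix H."""
    outer = np.outer(z, z)
    outer = outer + outer.T - np.diag(np.diag(outer))  # doubles off-diagonals
    return outer[np.triu_indices(len(z))]

rng = np.random.default_rng(0)
K = np.zeros((m, n))  # initial policy u = -K x (A itself is stable here)

for _ in range(20):
    # Policy evaluation: fit H in the Q-function Bellman equation
    #   z_k' H z_k = x_k' Q x_k + u_k' R u_k + z_{k+1}' H z_{k+1},
    # where z = [x; u]. Only measured data (x, u, x_next) enters the fit;
    # A and B appear solely in the data-generating simulation.
    Phi, y = [], []
    for _ in range(60):
        x = rng.standard_normal(n)
        u = -K @ x + 0.1 * rng.standard_normal(m)  # exploration noise
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])
        Phi.append(quad_basis(z) - quad_basis(z_next))
        y.append(x @ Q_cost @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + H.T - np.diag(np.diag(H))  # rebuild symmetric H
    # Policy improvement: u = -inv(H_uu) H_ux x
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print(K)  # the gain approaches the optimal LQR gain
```

Because the cost and dynamics here are deterministic, the LS regression recovers the policy's Q-function exactly once enough independent samples are collected, and the iteration converges to the optimal gain without ever using `A` or `B` in the update step — the model-free property the summary emphasizes.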

MSC:

93D99 Stability of control systems
93A14 Decentralized systems
93C55 Discrete-time control/observation systems
93D05 Lyapunov and other classical stabilities (Lagrange, Poisson, \(L^p, l^p\), etc.) in control theory
05C90 Applications of graph theory

References:

[1] Meng, W.; Yang, Q.; Sarangapani, J.; Sun, Y., Distributed control of nonlinear multiagent systems with asymptotic consensus, IEEE Trans. Syst. Man Cybern.: Syst., 47, 5, 749-757 (2017)
[2] Wang, H.; Liao, X.; Huang, T.; Li, C., Cooperative distributed optimization in multiagent networks with delays, IEEE Trans. Syst. Man Cybern.: Syst., 45, 2, 363-369 (2015)
[3] Chen, Y.; Wen, G.; Peng, Z.; Rahmani, A., Consensus of fractional-order multiagent system via sampled-data event-triggered control, J. Frankl. Inst. (2019), In press. doi:10.1016/j.jfranklin.2018.01.043 · Zbl 1425.93248
[4] Chen, C.; Ren, C.; Du, T., Fuzzy observed-based adaptive consensus tracking control for second-order multiagent systems with heterogeneous nonlinear dynamics, IEEE Trans. Fuzzy Syst., 24, 4, 906-915 (2016)
[5] Rezaee, H.; Abdollahi, F., Discrete-time consensus strategy for a class of high-order linear multiagent systems under stochastic communication topologies, J. Frankl. Inst., 354, 9, 3690-3705 (2017) · Zbl 1367.93029
[6] Cao, Y.; Yu, W.; Ren, W.; Chen, G., An overview of recent progress in the study of distributed multi-agent coordination, IEEE Trans. Ind. Inform., 9, 1, 427-438 (2013)
[7] Ren, W.; Beard, R.; Atkins, E., A survey of consensus problems in multi-agent coordination, Proceedings of American Control Conference, 1859-1864 (2005)
[8] Fax, J.; Murray, R., Information flow and cooperative control of vehicle formations, IEEE Trans. Autom. Control, 49, 9, 1465-1476 (2004) · Zbl 1365.90056
[9] Lin, J.; Morse, A.; Anderson, B., The multi-agent rendezvous problem: the asynchronous case, SIAM J. Control Optim., 2, 1926-1931 (2004)
[10] Zhu, J.; Lu, J.; Yu, X., Flocking of multi-agent non-holonomic systems with proximity graphs, IEEE Trans. Circuits Syst. I: Regul. Pap., 60, 1, 199-210 (2013) · Zbl 1468.93047
[11] Saber, R. O.; Murray, R. M., Consensus problems in networks of agents with switching topology and time-delays, IEEE Trans. Autom. Control, 49, 9, 1520-1533 (2004) · Zbl 1365.93301
[12] Lesser, V.; Ortiz, C.; Tambe, M., Distributed Sensor Networks: A Multiagent Perspective (2012), Springer: Springer New York, NY, USA
[13] Vamvoudakis, K.; Lewis, F., Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations, Automatica, 47, 8, 1556-1569 (2011) · Zbl 1237.91015
[14] Si, J.; Wang, Y., Online learning control by association and reinforcement, IEEE Trans. Neural Netw., 12, 2, 264-276 (2001)
[15] Liu, F.; Sun, J.; Si, J.; Guo, W.; Mei, S., A boundedness result for the direct heuristic dynamic programming, Neural Netw., 32, 229-235 (2012) · Zbl 1254.90286
[16] Sokolov, Y.; Kozma, R.; Werbos, L.; Werbos, P., Complete stability analysis of a heuristic approximate dynamic programming control design, Automatica, 59, 9-18 (2015) · Zbl 1338.90442
[17] Modares, H.; Lewis, F.; Naghibi-Sistani, M., Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks, IEEE Trans. Neural Netw. Learn. Syst., 24, 10, 1513-1525 (2013)
[18] Abu-Khalaf, M.; Lewis, F., Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 5, 779-791 (2005) · Zbl 1087.49022
[19] Wei, Q.; Zhang, H.; Liu, D.; Zhao, Y., An optimal control scheme for a class of discrete-time nonlinear systems with time delays using adaptive dynamic programming, Acta Autom. Sin., 36, 1, 121-129 (2010) · Zbl 1240.49044
[20] Wang, B.; Zhao, D.; Alippi, C.; Liu, D., Dual heuristic dynamic programming for nonlinear discrete-time uncertain systems with state delay, Neurocomputing, 134, 222-229 (2014)
[21] Kiumarsi, B.; Lewis, F., Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems, IEEE Trans. Neural Netw. Learn. Syst., 26, 1, 140-151 (2015)
[22] Modares, H.; Lewis, F., Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Automatica, 50, 7, 1780-1792 (2014) · Zbl 1296.93073
[23] Zhang, H.; Jiang, H.; Luo, Y.; Xiao, G., Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method, IEEE Trans. Ind. Electron., 64, 5, 4091-4100 (2017)
[24] Vamvoudakis, K.; Lewis, F., Online solution of nonlinear two-player zero-sum games using synchronous policy iteration, Int. J. Robust Nonlinear Control, 22, 13, 1460-1483 (2012) · Zbl 1292.91011
[25] Zhang, H.; Cui, L.; Luo, Y., Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP, IEEE Trans. Cybern., 43, 1, 206-216 (2013)
[26] Mu, C.; Sun, C.; Song, A.; Yu, H., Iterative GDHP-based approximate optimal tracking control for a class of discrete-time nonlinear systems, Neurocomputing, 214, 775-784 (2016)
[27] Mu, C.; Wang, D.; He, H., Novel iterative neural dynamic programming for data-based approximate optimal control design, Automatica, 81, 240-252 (2017) · Zbl 1373.90170
[28] Zhang, H.; Liu, Z.; Huang, G.; Wang, Z., Novel weighting-delay-based stability criteria for recurrent neural networks with time-varying delay, IEEE Trans. Neural Netw. Learn. Syst., 21, 1, 91-106 (2010)
[29] Zhang, C.; Zou, W.; Cheng, N.; Gao, J., Trajectory tracking control for rotary steerable systems using interval type-2 fuzzy logic and reinforcement learning, J. Frankl. Inst., 355, 2, 803-826 (2018) · Zbl 1384.93074
[30] Zhang, H.; Wang, Z.; Liu, D., A comprehensive review of stability analysis of continuous-time recurrent neural networks, IEEE Trans. Neural Netw. Learn. Syst., 25, 7, 1229-1262 (2014)
[31] Zhang, K.; Zhang, H.; Gao, Z.; Su, H., Online adaptive policy iteration based fault-tolerant control algorithm for continuous-time nonlinear tracking systems with actuator failures, J. Frankl. Inst., 355, 15, 6947-6968 (2018) · Zbl 1398.93141
[32] Zhang, H.; Yang, F.; Liu, X.; Zhang, Q., Stability analysis for neural networks with time-varying delay based on quadratic convex combination, IEEE Trans. Neural Netw. Learn. Syst., 24, 4, 513-521 (2013)
[33] Jiang, Y.; Fan, J.; Chai, T.; Lewis, F.; Li, J., Tracking control for linear discrete-time networked control systems with unknown dynamics and dropout, IEEE Trans. Neural Netw. Learn. Syst., 29, 10, 4607-4620 (2018)
[34] Al-Tamimi, A.; Lewis, F.; Abu-Khalaf, M., Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control, Automatica, 43, 3, 473-481 (2007) · Zbl 1137.93321
[35] Sahoo, A.; Jagannathan, S., Event-triggered optimal regulation of uncertain linear discrete-time systems by using Q-learning scheme, Proceedings of the IEEE Conference on Decision and Control, 1233-1238 (2015)
[36] Fu, Y.; Chai, T.; Fan, J., Robust adaptive quadratic tracking control of continuous-time linear systems with unknown dynamics, Proceedings of American Control Conference, 2230-2235 (2015)
[37] Vamvoudakis, K.; Lewis, F.; Hudas, G., Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality, Automatica, 48, 8, 1598-1611 (2012) · Zbl 1267.93190
[38] Abouheaf, M.; Lewis, F., Multi-agent differential graphical games: Nash online adaptive learning solutions, Proceedings of the 52nd IEEE Conference on Decision and Control, 5803-5809 (2013)
[39] Abouheaf, M.; Lewis, F.; Vamvoudakis, K.; Haesaert, S.; Babuska, R., Multi-agent discrete-time graphical games and reinforcement learning solutions, Automatica, 50, 12, 3038-3053 (2014) · Zbl 1367.91032
[40] Abouheaf, M.; Lewis, F.; Haesaert, S.; Babuska, R.; Vamvoudakis, K., Multi-agent discrete-time graphical games: interactive Nash equilibrium and value iteration solution, Proceedings of American Control Conference (ACC), 4189-4195 (2013)
[41] Wang, C.; Zuo, Z.; Sun, J.; Yang, J.; Ding, Z., Consensus disturbance rejection for Lipschitz nonlinear multi-agent systems with input delay: A DOBC approach, J. Frankl. Inst., 354, 1, 298-315 (2017) · Zbl 1355.93020
[42] Chen, C.; Wen, G.; Liu, Y.; Wang, F., Adaptive consensus control for a class of nonlinear multiagent time-delay systems using neural networks, IEEE Trans. Neural Netw. Learn. Syst., 25, 6, 1217-1226 (2014)
[43] Wen, G.; Chen, C.; Liu, Y.; Liu, Z., Neural network-based adaptive leader-following consensus control for a class of nonlinear multiagent state-delay systems, IEEE Trans. Cybern., 47, 8, 2151-2160 (2017)
[44] Abouheaf, M.; Lewis, F.; Mahmoud, M.; Mikulski, D., Discrete-time dynamic graphical games: model-free reinforcement learning solution, Control Theory Technol., 13, 1, 55-69 (2015) · Zbl 1340.91018