
\(Q\)-learning-based non-zero sum games for Markov jump multiplayer systems under actor-critic NNs structure. (English) Zbl 07897614

Summary: This article addresses non-zero sum games for Markov jump multiplayer systems (MJMSs) using reinforcement \(Q\)-learning. First, the \(Q\)-function of each player is derived from the system states and the control inputs. On this basis, we design a novel reinforcement learning approach for MJMSs that combines the integral reinforcement learning scheme with an actor-critic neural network structure. Notably, the designed algorithm requires no information about the system dynamics or the transition probabilities. Furthermore, the designed algorithm ensures the stochastic stability of the MJMSs and attains a Nash equilibrium. Finally, a simulation example illustrates the effectiveness of the designed approach.
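The summary does not state the \(Q\)-functions in closed form. The sketch below is a hedged illustration of the general idea only, not the authors' construction. It assumes a scalar, two-player, two-mode linear jump system with quadratic costs, and it builds a mode-dependent \(Q\)-function as "value function plus Hamiltonian", in the spirit of the construction used for deterministic linear systems by Vamvoudakis [21]. The jump-coupling term and all names (`q_kernel`, `q_value`, `policy_gain`, the parameter values) are hypothetical. The point it illustrates is why a \(Q\)-function of states and inputs supports model-free policy updates: each player's gain is read off the cross and input blocks of the kernel alone.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def q_kernel(p_i: Sequence[float], mode: int, a: Sequence[float],
             b: Sequence[Sequence[float]], pi_rates: Sequence[Sequence[float]],
             q_i: float, r_i: Sequence[float]) -> Matrix:
    """Kernel H of Q_i(x, u0, u1 | mode) = z^T H z with z = [x, u0, u1].

    Illustrative assumptions (not taken from the paper): scalar mode dynamics
    dx/dt = a[m] x + b[0][m] u0 + b[1][m] u1, player i's cost rate
    q_i x^2 + r_i[0] u0^2 + r_i[1] u1^2, value V_i(x | m) = p_i[m] x^2,
    and a transition-rate matrix pi_rates whose rows sum to zero.
    """
    p = p_i[mode]
    # Coupling of player i's value across modes through the Markov chain generator.
    jump = sum(pi_rates[mode][s] * p_i[s] for s in range(len(p_i)))
    # Q = V + Hamiltonian: p x^2 + 2 p x (a x + b0 u0 + b1 u1) + jump x^2 + cost rate.
    h_xx = p + 2.0 * a[mode] * p + jump + q_i
    return [
        [h_xx, p * b[0][mode], p * b[1][mode]],
        [p * b[0][mode], r_i[0], 0.0],  # no u0-u1 cross term in this construction
        [p * b[1][mode], 0.0, r_i[1]],
    ]


def q_value(h: Matrix, z: Sequence[float]) -> float:
    """Evaluate the quadratic form z^T H z."""
    n = len(z)
    return sum(z[m] * h[m][k] * z[k] for m in range(n) for k in range(n))


def policy_gain(h: Matrix, player: int) -> float:
    """Gain k in u_player = -k x, obtained from dQ/du_player = 0.

    Reads only the cross entry H[x, u] and the input entry H[u, u];
    the entry H[x, x], where a[mode] appears, is not needed.
    """
    j = player + 1  # index of u_player in z = [x, u0, u1]
    return h[0][j] / h[j][j]


# Usage: kernel of player 0 in mode 0, then that player's feedback gain.
H0 = q_kernel(p_i=[1.0, 2.0], mode=0, a=[-1.0, 0.5],
              b=[[1.0, 2.0], [0.5, 1.0]],
              pi_rates=[[-3.0, 3.0], [1.0, -1.0]],
              q_i=1.0, r_i=[1.0, 2.0])
k0 = policy_gain(H0, player=0)
print(k0)  # 1.0
```

Here the kernel is assembled from assumed model data only so that the result can be checked against the familiar gain \(p_i b_i / r_{ii}\). In a model-free scheme such as the one the summary describes, the kernel (or a neural approximation of the \(Q\)-function) would instead be estimated from measured trajectories, for example by critic networks trained with an integral reinforcement relation.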

MSC:

91A15 Stochastic games, stochastic differential games
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Aliev, Rafik A.; Pedrycz, Witold; Guirimov, Babek G.; Aliev, Rashad R.; Ilhan, Umit; Babagil, Mustafa; Mammadli, Sadik, Type-2 fuzzy neural networks with fuzzy clustering and differential evolution optimization, Inf. Sci., 181, 9, 1591-1608, 2011
[2] Bian, Tao; Jiang, Zhong-Ping, Reinforcement learning and adaptive optimal control for continuous-time nonlinear systems: a value iteration approach, IEEE Trans. Neural Netw. Learn. Syst., 33, 7, 2781-2790, 2022
[3] Dong, Shanling; Liu, Meiqin, Adaptive fuzzy asynchronous control for nonhomogeneous Markov jump power systems under hybrid attacks, IEEE Trans. Fuzzy Syst., 31, 3, 1009-1019, 2023
[4] Guo, Ge; Zhang, Renyongkang; Zhou, Zeng-Di, A local-minimization-free zero-gradient-sum algorithm for distributed optimization, Automatica, 157, Article 111247, 2023 · Zbl 1525.93008
[5] Hei, Shuping; Song, Jun; Ding, Zhengtao; Liu, Fei, Online adaptive optimal control for continuous-time Markov jump linear systems using a novel policy iteration algorithm, IET Control Theory Appl., 9, 10, 1536-1543, 2015
[6] He, Yongming; Xing, Lining; Chen, Yingwu; Pedrycz, Witold; Wang, Ling; Wu, Guohua, A generic Markov decision process model and reinforcement learning method for scheduling agile Earth observation satellites, IEEE Trans. Syst. Man Cybern. Syst., 52, 3, 1463-1474, 2022
[7] Huang, Zhen; Tu, Yidong; Fang, Haiyang; Wang, Hai; Zhang, Liang; Shi, Kaibo; He, Shuping, Off-policy reinforcement learning for tracking control of discrete-time Markov jump linear systems with completely unknown dynamics, J. Franklin Inst., 360, 3, 2361-2378, 2023 · Zbl 1507.93140
[8] Li, Hongyi; Shi, Peng; Yao, Deyin; Wu, Ligang, Observer-based adaptive sliding mode control for nonlinear Markovian jump systems, Automatica, 64, 133-142, 2016 · Zbl 1329.93126
[9] Li, Menghua; Wang, Ding; Zhao, Mingming; Qiao, Junfei, Event-triggered constrained neural critic control of nonlinear continuous-time multiplayer nonzero-sum games, Inf. Sci., 631, 412-428, 2023 · Zbl 1536.93552
[10] Li, Yongming; Wang, Tiechao; Liu, Wei; Tong, Shaocheng, Neural network adaptive output-feedback optimal control for active suspension systems, IEEE Trans. Syst. Man Cybern. Syst., 52, 6, 4021-4032, 2022
[11] Liu, Mushuang; Wan, Yan; Lewis, Frank L.; Lopez, Victor G., Adaptive optimal control for stochastic multiplayer differential games using on-policy and off-policy reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst., 31, 12, 5522-5533, 2020
[12] Long, Mingkang; An, Qing; Su, Housheng; Luo, Hui; Zhao, Jin, Model-free algorithm for consensus of discrete-time multi-agent systems using reinforcement learning method, J. Franklin Inst., 360, 14, 10564-10581, 2023 · Zbl 1521.93179
[13] Ming, Zhongyang; Zhang, Huaguang; Li, Weihua; Luo, Yanhong, Base on Q-learning Pareto optimality for linear Itô stochastic systems with Markovian jumps, IEEE Trans. Autom. Sci. Eng., 21, 1, 965-975, 2024
[14] Mu, Chaoxu; Wang, Ke; Ni, Zhen, Adaptive learning and sampled-control for nonlinear game systems using dynamic event-triggering strategy, IEEE Trans. Neural Netw. Learn. Syst., 33, 9, 4437-4450, 2022
[15] Pedrycz, Witold, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Trans. Neural Netw., 9, 4, 601-612, 1998
[16] Peng, Zhinan; Luo, Rui; Hu, Jiangping; Shi, Kaibo; Nguang, Sing Kiong; Ghosh, Bijoy Kumar, Optimal tracking control of nonlinear multiagent systems using internal reinforce Q-learning, IEEE Trans. Neural Netw. Learn. Syst., 33, 8, 4043-4055, 2022
[17] Qi, Wenhai; Zong, Guangdeng; Karimi, Hamid Reza, Sliding mode control for nonlinear stochastic semi-Markov switching systems with application to SRMM, IEEE Trans. Ind. Electron., 67, 5, 3955-3966, 2020
[18] Song, Jun; He, Shuping; Ding, Zhengtao; Liu, Fei, A new iterative algorithm for solving \(H_\infty\) control problem of continuous-time Markovian jumping linear systems based on online implementation, Int. J. Robust Nonlinear Control, 26, 17, 3737-3754, 2016 · Zbl 1351.93055
[19] Tan, Cheng; Gao, Chengzhen; Zhang, Zhengqiang; Wong, Wing Shing, Non-fragile guaranteed cost control for networked nonlinear Markov jump systems under multiple cyber-attacks, J. Franklin Inst., 360, 13, 9446-9467, 2023 · Zbl 1521.93199
[20] Ugrinovskii, Valery; Pota, Hemanshu Roy, Decentralized control of power systems via robust control of uncertain Markov jump parameter systems, Int. J. Control, 78, 9, 662-677, 2005 · Zbl 1121.93362
[21] Vamvoudakis, Kyriakos G., Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems, Automatica, 61, 274-281, 2015 · Zbl 1336.91022
[22] Vargas, Alessandro N.; Pujol, Gisela; Acho, Leonardo, Stability of Markov jump systems with quadratic terms and its application to RLC circuits, J. Franklin Inst., 354, 1, 332-344, 2017 · Zbl 1355.93207
[23] Wang, Dong; Liu, Jiaxun; Lian, Jie; Liu, Yang; Wang, Zhu; Wang, Wei, Distributed delayed dual averaging for distributed optimization over time-varying digraphs, Automatica, 150, Article 110869, 2023 · Zbl 1519.93020
[24] Wang, Dong; Wang, Wei, Necessary and sufficient conditions for containment control of multi-agent systems with time delay, Automatica, 103, 418-423, 2019 · Zbl 1415.93033
[25] Wang, Jing; Wu, Jiacheng; Shen, Hao; Cao, Jinde; Rutkowski, Leszek, Fuzzy \(H_\infty\) control of discrete-time nonlinear Markov jump systems via a novel hybrid reinforcement Q-learning method, IEEE Trans. Cybern., 53, 11, 7380-7391, 2023
[26] Wang, Ke; Mu, Chaoxu, Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system, ISA Trans., 129, 295-308, 2022
[27] Wang, Ke; Mu, Chaoxu, Learning-based control with decentralized dynamic event-triggering for vehicle systems, IEEE Trans. Ind. Inform., 19, 3, 2629-2639, 2023
[28] Wei, Qinglai; Zhu, Liao; Song, Ruizhuo; Zhang, Pinjia; Liu, Derong; Xiao, Jun, Model-free adaptive optimal control for unknown nonlinear multiplayer nonzero-sum game, IEEE Trans. Neural Netw. Learn. Syst., 33, 2, 879-892, 2022
[29] Xie, Lifei; Cheng, Jun; Zou, Yanli; Wu, Zheng-Guang; Yan, Huaicheng, A dynamic-memory event-triggered protocol to multiarea power systems with semi-Markov jumping parameter, IEEE Trans. Cybern., 53, 10, 6577-6587, 2023
[30] Xin, Xilin; Tu, Yidong; Stojanovic, Vladimir; Wang, Hai; Shi, Kaibo; He, Shuping; Pan, Tianhong, Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems, Appl. Math. Comput., 412, Article 126537, 2022 · Zbl 1510.91006
[31] Yin, Yanyan; Shi, Peng; Liu, Fei; Teo, Kok Lay; Lim, Cheng-Chew, Robust filtering for nonlinear nonhomogeneous Markov jump systems by fuzzy approximation approach, IEEE Trans. Cybern., 45, 9, 1706-1716, 2015
[32] Zamfirache, Iuliu Alexandru; Precup, Radu-Emil; Roman, Raul-Cristian; Petriu, Emil M., Reinforcement learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system, Inf. Sci., 583, 99-120, 2022 · Zbl 1532.93132
[33] Zhang, Chengke; Li, Fangchao, Non-zero sum differential game for stochastic Markovian jump systems with partially unknown transition probabilities, J. Franklin Inst., 358, 15, 7528-7558, 2021 · Zbl 1472.93203
[34] Zhang, Haoyan; Wang, Huanqing; Niu, Ben; Zhang, Liang; Ahmad, Adil M., Sliding-mode surface-based adaptive actor-critic optimal control for switched nonlinear systems with average dwell time, Inf. Sci., 580, 756-774, 2021 · Zbl 07786227
[35] Zhang, Huiyan; Chen, Zixian; Zhao, Ning; Xing, Bin; Kalidass, Mathiyalagan, Adaptive neural dissipative control for Markovian jump cyber-physical systems against sensor and actuator attacks, J. Franklin Inst., 360, 12, 7676-7698, 2023 · Zbl 1520.93276
[36] Zhang, Jilie; Wang, Zhanshan; Zhang, Hongwei, Data-based optimal control of multiagent systems: a reinforcement learning design approach, IEEE Trans. Cybern., 49, 12, 4441-4449, 2019
[37] Zhang, Kun; Su, Rong; Zhang, Huaguang, A novel resilient control scheme for a class of Markovian jump systems with partially unknown information, IEEE Trans. Cybern., 52, 8, 8191-8200, 2022
[38] Zhang, Kun; Zhang, Hua-guang; Cai, Yuliang; Su, Rong, Parallel optimal tracking control schemes for mode-dependent control of coupled Markov jump systems via integral RL method, IEEE Trans. Autom. Sci. Eng., 17, 3, 1332-1342, 2020
[39] Zhang, Yongwei; Zhao, Bo; Liu, Derong; Zhang, Shunchao, Event-triggered optimal tracking control of multiplayer unknown nonlinear systems via adaptive critic designs, Int. J. Robust Nonlinear Control, 32, 1, 29-51, 2022 · Zbl 1527.93306
[40] Zhao, Yanwei; Wang, Huanqing; Xu, Ning; Zong, Guangdeng; Zhao, Xudong, Reinforcement learning-based decentralized fault tolerant control for constrained interconnected nonlinear systems, Chaos Solitons Fractals, 167, Article 113034, 2023
[41] Zhou, Peixin; Xue, Huiwen; Wen, Jiwei; Shi, Peng; Luan, Xiaoli, Model-free optimal tracking policies for Markov jump systems by solving non-zero-sum games, Inf. Sci., 647, Article 119423, 2023 · Zbl 1521.93213
[42] Zhu, J.; Wang, L. P.; Spiryagin, Maksym, Control and decision strategy for a class of Markovian jump systems in failure prone manufacturing process, IET Control Theory Appl., 6, 12, 1803-1811, 2012
[43] Zhu, Xinye; An, Tianjiao; Dong, Bo, Multiplayer zero-sum games optimal control for modular robot manipulators with interconnected dynamic couplings, Int. J. Adapt. Control Signal Process., 36, 12, 3254-3270, 2022 · Zbl 07842395