
Discrete-time dynamic graphical games: model-free reinforcement learning solution. (English) Zbl 1340.91018

Summary: This paper introduces a model-free reinforcement learning technique to solve a class of dynamic games known as dynamic graphical games. The graphical game arises from multi-agent dynamical systems in which pinning control is used to make all the agents synchronize to the state of a command generator or leader agent. Novel coupled Bellman equations and Hamiltonian functions are developed for the dynamic graphical games, and Hamiltonian mechanics are used to derive the necessary conditions for optimality. The solution of the dynamic graphical game, and in particular its Nash equilibrium, is given in terms of the solution to a set of coupled Hamilton-Jacobi-Bellman equations developed herein. An online model-free policy iteration algorithm is developed to learn the Nash solution for the dynamic graphical game; this algorithm requires no knowledge of the agents' dynamics. A proof of convergence for this multi-agent learning algorithm is given under mild assumptions about the interconnectivity properties of the graph. A gradient-descent technique with critic network structures is used to implement the policy iteration algorithm and solve the graphical game online in real time.
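The model-free policy iteration idea behind the paper can be illustrated on the simplest possible case: a single agent with scalar linear dynamics and quadratic cost. The sketch below is not the paper's multi-agent graphical-game algorithm; it is a minimal, assumed single-agent analogue in the Q-learning/ADP style the paper builds on. The dynamics (a, b) appear only to simulate transitions; the learner fits a quadratic Q-function from observed data and improves its policy without ever using the model.

```python
import numpy as np

# Minimal model-free policy iteration for a scalar discrete-time LQR
# problem (an assumed single-agent analogue, NOT the paper's algorithm).
a, b = 0.9, 1.0          # true dynamics: x' = a*x + b*u (simulation only)
q, r = 1.0, 1.0          # stage cost:   q*x^2 + r*u^2

rng = np.random.default_rng(0)

def features(x, u):
    # Q_K(x,u) = h11*x^2 + 2*h12*x*u + h22*u^2, so theta = [h11, 2*h12, h22]
    return np.array([x * x, x * u, u * u])

K = 0.0                  # initial policy u = K*x (a=0.9 is already stable)
for _ in range(10):
    # Policy evaluation: solve the Bellman equation
    #   Q_K(x,u) = q*x^2 + r*u^2 + Q_K(x', K*x')
    # in least squares, using only sampled transitions.
    A, c = [], []
    for _ in range(200):
        x = rng.uniform(-1, 1)
        u = rng.uniform(-1, 1)          # exploratory (off-policy) input
        xn = a * x + b * u              # observed next state
        A.append(features(x, u) - features(xn, K * xn))
        c.append(q * x * x + r * u * u)
    theta, *_ = np.linalg.lstsq(np.array(A), np.array(c), rcond=None)
    h11, h12, h22 = theta[0], theta[1] / 2.0, theta[2]
    # Policy improvement: u = argmin_u Q_K(x,u) = -(h12/h22)*x
    K = -h12 / h22

# Verification only (uses the model): scalar Riccati fixed point.
P = 1.0
for _ in range(500):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
K_star = -(a * b * P) / (r + b * b * P)
print(K, K_star)   # learned gain matches the Riccati gain, approx -0.5377
```

The paper's contribution generalizes this pattern to coupled Q-functions, one per agent, whose Bellman equations are interlinked through the graph topology, with critic networks playing the role of the quadratic parameterization above.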

MSC:

91A25 Dynamic games
68T05 Learning and adaptive systems in artificial intelligence
91A43 Games involving graphs
