A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 1, Issue 3, July 2014

IEEE/CAA Journal of Automatica Sinica

Citation: Zhen Zhang and Dongbin Zhao, "Clique-based Cooperative Multiagent Reinforcement Learning Using Factor Graphs," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 248-256, 2014.

Clique-based Cooperative Multiagent Reinforcement Learning Using Factor Graphs

Funds:

This work was supported by the National Natural Science Foundation of China (61273136, 61034002), the Beijing Natural Science Foundation (4122083), and the Visiting Professorship of the Chinese Academy of Sciences.

  • In this paper, we propose a clique-based sparse reinforcement learning (RL) algorithm for solving cooperative tasks. The aim is to accelerate the learning speed of the original sparse RL algorithm and to make it applicable to tasks decomposed in a more general manner. First, a transition function is estimated and used to update the Q-value function, which greatly reduces the learning time. Second, the agents are divided into cliques, each responsible only for a specific subtask, which is a more natural decomposition than that of the original algorithm. In this way, the global Q-value function is decomposed into the sum of several simpler local Q-value functions. This decomposition is expressed by a factor graph and exploited by the general max-plus algorithm to obtain the greedy joint action. Experimental results show that the proposed approach outperforms the compared methods. (An illustrative sketch of this decomposition and action selection follows below.)
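To make the idea concrete, below is a minimal, hypothetical Python sketch, not code from the paper: it represents a global Q-function factored over agent cliques, selects a greedy joint action with loopy max-plus message passing on the corresponding factor graph (agents as variable nodes, cliques as factor nodes), and performs a simple per-clique temporal-difference update. The agent/clique layout, action-set size, per-clique reward split, and the model-free update rule are illustrative assumptions; the paper's method additionally estimates a transition function to accelerate the Q-value updates.

```python
# Hypothetical sketch: clique-based Q decomposition + max-plus greedy action selection.
import itertools
from collections import defaultdict

N_ACTIONS = 3                       # assumed identical, small action set per agent
AGENTS = [0, 1, 2, 3]               # hypothetical 4-agent example
CLIQUES = [(0, 1), (1, 2), (2, 3)]  # each clique owns one local Q-table (one factor node)

# Local Q-tables, one per clique, indexed by (state, joint action of the clique members).
q_local = {c: defaultdict(float) for c in CLIQUES}

def max_plus(q_local, state, n_iters=10):
    """Approximate the greedy joint action by max-plus message passing."""
    msg_i2c = {(i, c): [0.0] * N_ACTIONS for c in CLIQUES for i in c}  # agent -> clique
    msg_c2i = {(c, i): [0.0] * N_ACTIONS for c in CLIQUES for i in c}  # clique -> agent
    for _ in range(n_iters):
        # Agent -> clique: sum of incoming clique messages, excluding the target clique,
        # normalized so the messages stay bounded on a loopy graph.
        for c in CLIQUES:
            for i in c:
                msg = [sum(msg_c2i[(g, i)][a] for g in CLIQUES if i in g and g != c)
                       for a in range(N_ACTIONS)]
                mean = sum(msg) / N_ACTIONS
                msg_i2c[(i, c)] = [v - mean for v in msg]
        # Clique -> agent: maximize local Q-value plus the other members' messages.
        for c in CLIQUES:
            for i in c:
                others = [j for j in c if j != i]
                new = []
                for a_i in range(N_ACTIONS):
                    best = float("-inf")
                    for a_others in itertools.product(range(N_ACTIONS), repeat=len(others)):
                        joint = dict(zip(others, a_others))
                        joint[i] = a_i
                        a_c = tuple(joint[j] for j in c)
                        val = (q_local[c][(state, a_c)]
                               + sum(msg_i2c[(j, c)][joint[j]] for j in others))
                        best = max(best, val)
                    new.append(best)
                msg_c2i[(c, i)] = new
    # Each agent picks the action that maximizes the sum of its incoming messages.
    return {i: max(range(N_ACTIONS),
                   key=lambda a: sum(msg_c2i[(c, i)][a] for c in CLIQUES if i in c))
            for i in AGENTS}

def update(q_local, state, joint_action, rewards, next_state, next_joint_action,
           alpha=0.1, gamma=0.95):
    """Per-clique TD update (the paper additionally uses an estimated transition model)."""
    for c in CLIQUES:
        a_c = tuple(joint_action[i] for i in c)
        a_c_next = tuple(next_joint_action[i] for i in c)
        target = rewards[c] + gamma * q_local[c][(next_state, a_c_next)]
        q_local[c][(state, a_c)] += alpha * (target - q_local[c][(state, a_c)])
```

A learning step under these assumptions would be: pick a joint action with max_plus(q_local, s) (plus exploration), observe per-clique rewards and the next state, compute next_joint_action = max_plus(q_local, s_next), and call update(...). Because each local Q-table only covers the actions of its clique, the message passing avoids enumerating the full joint action space, which is the point of the factor-graph decomposition.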

     

