A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 1, Issue 3, July 2014

IEEE/CAA Journal of Automatica Sinica

Citation: Zhen Zhang and Dongbin Zhao, "Clique-based Cooperative Multiagent Reinforcement Learning Using Factor Graphs," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 248-256, 2014.

Clique-based Cooperative Multiagent Reinforcement Learning Using Factor Graphs

Funds:

This work was supported by the National Natural Science Foundation of China (61273136, 61034002), the Beijing Natural Science Foundation (4122083), and the Visiting Professorship of the Chinese Academy of Sciences.

  • In this paper, we propose a clique-based sparse reinforcement learning (RL) algorithm for solving cooperative tasks. The aim is to accelerate the learning speed of the original sparse RL algorithm and to make it applicable to tasks decomposed in a more general manner. First, a transition function is estimated and used to update the Q-value function, which greatly reduces the learning time. Second, the agents are divided into cliques, each responsible only for a specific subtask, which is a more natural decomposition than that of the original algorithm. In this way, the global Q-value function is decomposed into the sum of several simpler local Q-value functions. This decomposition is expressed by a factor graph and exploited by the general max-plus algorithm to obtain the greedy joint action. Experimental results show that the proposed approach outperforms the compared methods. (An illustrative sketch of this decomposition and action selection follows below.)
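To make the idea concrete, below is a minimal, hypothetical Python sketch, not code from the paper: it represents a global Q-function factored over agent cliques, selects a greedy joint action with loopy max-plus message passing on the corresponding factor graph (agents as variable nodes, cliques as factor nodes), and performs a simple per-clique temporal-difference update. The agent/clique layout, action-set size, per-clique reward split, and the model-free update rule are illustrative assumptions; the paper's method additionally estimates a transition function to accelerate the Q-value updates.

```python
# Hypothetical sketch: clique-based Q decomposition + max-plus greedy action selection.
import itertools
from collections import defaultdict

N_ACTIONS = 3                       # assumed identical, small action set per agent
AGENTS = [0, 1, 2, 3]               # hypothetical 4-agent example
CLIQUES = [(0, 1), (1, 2), (2, 3)]  # each clique owns one local Q-table (one factor node)

# Local Q-tables, one per clique, indexed by (state, joint action of the clique members).
q_local = {c: defaultdict(float) for c in CLIQUES}

def max_plus(q_local, state, n_iters=10):
    """Approximate the greedy joint action by max-plus message passing."""
    msg_i2c = {(i, c): [0.0] * N_ACTIONS for c in CLIQUES for i in c}  # agent -> clique
    msg_c2i = {(c, i): [0.0] * N_ACTIONS for c in CLIQUES for i in c}  # clique -> agent
    for _ in range(n_iters):
        # Agent -> clique: sum of incoming clique messages, excluding the target clique,
        # normalized so the messages stay bounded on a loopy graph.
        for c in CLIQUES:
            for i in c:
                msg = [sum(msg_c2i[(g, i)][a] for g in CLIQUES if i in g and g != c)
                       for a in range(N_ACTIONS)]
                mean = sum(msg) / N_ACTIONS
                msg_i2c[(i, c)] = [v - mean for v in msg]
        # Clique -> agent: maximize local Q-value plus the other members' messages.
        for c in CLIQUES:
            for i in c:
                others = [j for j in c if j != i]
                new = []
                for a_i in range(N_ACTIONS):
                    best = float("-inf")
                    for a_others in itertools.product(range(N_ACTIONS), repeat=len(others)):
                        joint = dict(zip(others, a_others))
                        joint[i] = a_i
                        a_c = tuple(joint[j] for j in c)
                        val = (q_local[c][(state, a_c)]
                               + sum(msg_i2c[(j, c)][joint[j]] for j in others))
                        best = max(best, val)
                    new.append(best)
                msg_c2i[(c, i)] = new
    # Each agent picks the action that maximizes the sum of its incoming messages.
    return {i: max(range(N_ACTIONS),
                   key=lambda a: sum(msg_c2i[(c, i)][a] for c in CLIQUES if i in c))
            for i in AGENTS}

def update(q_local, state, joint_action, rewards, next_state, next_joint_action,
           alpha=0.1, gamma=0.95):
    """Per-clique TD update (the paper additionally uses an estimated transition model)."""
    for c in CLIQUES:
        a_c = tuple(joint_action[i] for i in c)
        a_c_next = tuple(next_joint_action[i] for i in c)
        target = rewards[c] + gamma * q_local[c][(next_state, a_c_next)]
        q_local[c][(state, a_c)] += alpha * (target - q_local[c][(state, a_c)])
```

A learning step under these assumptions would be: pick a joint action with max_plus(q_local, s) (plus exploration), observe per-clique rewards and the next state, compute next_joint_action = max_plus(q_local, s_next), and call update(...). Because each local Q-table only covers the actions of its clique, the message passing avoids enumerating the full joint action space, which is the point of the factor-graph decomposition.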

     

