A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 5 Issue 1
Jan. 2018

IEEE/CAA Journal of Automatica Sinica

Lei Xue, Changyin Sun, Donald Wunsch, Yingjiang Zhou and Fang Yu, "An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game," IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 301-310, Jan. 2018. doi: 10.1109/JAS.2017.7510466
An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game


the National Natural Science Foundation (NNSF) of China 61603196

the National Natural Science Foundation (NNSF) of China 61503079

the National Natural Science Foundation (NNSF) of China 61520106009

the National Natural Science Foundation (NNSF) of China 61533008

the Natural Science Foundation of Jiangsu Province of China BK20150851

China Postdoctoral Science Foundation 2015M581842

Jiangsu Postdoctoral Science Foundation 1601259C

Nanjing University of Posts and Telecommunications Science Foundation (NUPTSF) NY215011

Priority Academic Program Development of Jiangsu Higher Education Institutions, the open fund of Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education MCCSE2015B02

the Research Innovation Program for College Graduates of Jiangsu Province CXLX1309

  • The iterated prisoner's dilemma (IPD) is an ideal model for analyzing interactions between agents in complex networks. It has attracted wide interest in the development of novel strategies since the success of tit-for-tat in Axelrod's tournament. This paper studies a new adaptive strategy for the IPD in different complex networks, where agents learn and adapt their strategies through reinforcement learning. A temporal difference learning method is applied to design the adaptive strategy and optimize the agents' decision-making process. Previous studies indicated that mutual cooperation rarely emerges in the IPD. Therefore, three examples based on square lattice and scale-free networks are provided to show two features of the adaptive strategy. First, a group of adaptive agents can achieve mutual cooperation on a scale-free network, and once evolution has converged to mutual cooperation, it is unlikely to shift. Second, the adaptive strategy earns a better payoff than other strategies on the square lattice network. The analytical properties are discussed to verify the evolutionary stability of the adaptive strategy.
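To make the idea of a temporal difference learner in the IPD concrete, the following is a minimal sketch and not the paper's algorithm. It swaps in tabular Q-learning, one common temporal difference method, for a single learner playing against tit-for-tat rather than on a network. The payoff values (5, 3, 1, 0), the state encoding (both players' previous moves), the hyperparameters, and all function names are assumptions chosen for illustration; none of them come from the abstract.

```python
import random

# Assumed payoffs (Axelrod's classic values, not taken from the abstract):
# temptation 5, reward 3, punishment 1, sucker 0. Keys are (my move, opponent move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def tit_for_tat(opponent_last):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if opponent_last is None else opponent_last


def td_update(q, state, action, reward, next_state, alpha, gamma):
    """One Q-learning (off-policy TD(0)) update on the table q."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def greedy(q, state):
    """Highest-valued action in this state; ties resolve to cooperation."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(rounds=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Play the learner against tit-for-tat and return the Q-table."""
    rng = random.Random(seed)
    q = {}
    my_last, opp_last = None, None
    for _ in range(rounds):
        # Hypothetical state encoding: both players' previous moves.
        state = (my_last, opp_last)
        # Epsilon-greedy exploration: occasionally try a random move.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = greedy(q, state)
        opp_action = tit_for_tat(my_last)
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        td_update(q, state, action, reward, next_state, alpha, gamma)
        my_last, opp_last = action, opp_action
    return q


if __name__ == "__main__":
    q = train()
    for state in [("C", "C"), ("D", "C"), ("C", "D"), ("D", "D")]:
        print(state, "->", greedy(q, state))
```

Under these assumed payoffs and a discount factor of 0.9, sustained cooperation with tit-for-tat is worth 3 / (1 - 0.9) = 30, while a single defection followed by a return to cooperation is worth 5 + 0.9 × (0 + 0.9 × 30) = 29.3, so cooperation is the optimal policy for this toy setting. Whether a finite training run reaches it depends on the seed and exploration rate, and the sketch says nothing about the network results reported in the paper.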


