A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation.
Volume 1, Issue 3, July 2014

IEEE/CAA Journal of Automatica Sinica

Citation: Rushikesh Kamalapurkar, Justin R. Klotz and Warren E. Dixon, "Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 3, pp. 239-247, 2014.

Concurrent Learning-based Approximate Feedback-Nash Equilibrium Solution of N-player Nonzero-sum Differential Games

Funds:

This work was supported by National Science Foundation Awards 1161260 and 1217908, Office of Naval Research Grant N00014-13-1-0151, and a contract with the Air Force Research Laboratory Mathematical Modeling and Optimization Institute. Recommended by Associate Editor Zhongsheng Hou.

Abstract: This paper presents a concurrent learning-based actor-critic-identifier architecture to obtain an approximate feedback-Nash equilibrium solution to an infinite-horizon N-player nonzero-sum differential game. The solution is obtained online for a nonlinear control-affine system with uncertain, linearly parameterized drift dynamics. It is shown that, under a condition milder than persistence of excitation (PE), uniformly ultimately bounded convergence of the developed control policies to the feedback-Nash equilibrium policies can be established. Simulation results demonstrate the performance of the developed technique without an added excitation signal.
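The key idea behind relaxing persistence of excitation is that concurrent learning augments the instantaneous gradient-based parameter update with error terms evaluated on a stored history stack of past data; convergence then hinges on a rank condition on the recorded regressors rather than on ongoing excitation of the current signal. The following is a generic numerical sketch of this mechanism for a linearly parameterized model y = φ(x)ᵀθ — not the paper's actual identifier or critic update laws; all names, gains, and data are illustrative.

```python
import numpy as np

# Concurrent-learning parameter estimation for a linearly
# parameterized model y = phi(x) @ theta (a generic sketch; the
# paper applies the same principle to the drift-dynamics
# identifier and critic weight updates).

theta_true = np.array([1.5, -2.0])            # unknown parameters
phi = lambda x: np.array([x, x**2])           # known regressor

# History stack: recorded (regressor, measurement) pairs. Convergence
# requires the stacked regressors to span the parameter space -- a
# rank condition on past data, milder than persistence of excitation.
stack = [(phi(x), phi(x) @ theta_true) for x in (0.5, -1.0, 2.0)]
assert np.linalg.matrix_rank(np.array([p for p, _ in stack])) == 2

theta = np.zeros(2)                           # initial estimate
gamma, k_cl, dt = 1.0, 1.0, 0.01              # gains and step size
for _ in range(20000):
    x = 0.3                                   # constant, non-exciting input
    p = phi(x)
    e_inst = p @ theta - p @ theta_true       # instantaneous error term
    # Concurrent-learning term: replay the stored data points.
    e_stack = sum(pj * (pj @ theta - yj) for pj, yj in stack)
    theta -= dt * gamma * (e_inst * p + k_cl * e_stack)

print(np.round(theta, 3))                     # converges to theta_true
```

With a constant input, the instantaneous term alone cannot identify both parameters; the replayed history stack makes the combined information matrix positive definite, which is what drives the estimate to θ without injected excitation.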

