Data-Based Optimal Tracking of Autonomous Nonlinear Switching Systems

Xiaofeng Li; Lu Dong; Changyin Sun

doi:10.1109/JAS.2020.1003486

Volume 8 Issue 1

Jan. 2021

IEEE/CAA Journal of Automatica Sinica


Xiaofeng Li, Lu Dong and Changyin Sun, "Data-Based Optimal Tracking of Autonomous Nonlinear Switching Systems," IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 227-238, Jan. 2021. doi: 10.1109/JAS.2020.1003486


Funds: This work was supported by the National Natural Science Foundation of China (61921004, U1713209, 61803085, and 62041301)


Author Bio:
Xiaofeng Li received the B.S. and M.S. degrees in engineering from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2012 and 2016, respectively. He is currently a Ph.D. candidate in control science and engineering at the School of Automation, Southeast University, Nanjing, China. His research interests include deep reinforcement learning, optimal control, and adaptive dynamic programming.

Lu Dong (S’16−M’18) received the B.S. degree in physics and the Ph.D. degree in electrical engineering from Southeast University, Nanjing, China, in 2012 and 2017, respectively. She is currently an Associate Professor with the College of Electronics and Information Engineering, Tongji University, China. Her current research interests include adaptive dynamic programming, event-triggered control, nonlinear system control, and optimization.

Changyin Sun (SM’20) received the B.S. degree in applied mathematics from the College of Mathematics, Sichuan University, Chengdu, China, in 1996, and the M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 2001 and 2004, respectively. He is currently a Professor with the School of Automation, Southeast University, Nanjing, China. His current research interests include intelligent control, flight control, and optimal theory. He is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, Neural Processing Letters, and the IEEE/CAA Journal of Automatica Sinica.
Corresponding author: L. Dong is with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: ldong@tongji.edu.cn).
Received Date: 2020-03-23
Revised Date: 2020-04-27
Accepted Date: 2020-06-03

Available Online: 2020-07-08

Abstract

In this paper, a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems. The system state is forced to track the reference signal by minimizing the performance function. First, the problem is transformed into solving the corresponding Bellman optimality equation in terms of the Q-function (also called the action-value function). Then, an iterative algorithm based on adaptive dynamic programming (ADP) is developed to find the optimal solution using only sampled data. A linear-in-parameter (LIP) neural network is used as the value function approximator. Considering the presence of approximation error at each iteration step, the generated sequence of approximate value functions is proved to be bounded around the exact optimal solution under some verifiable assumptions. Moreover, the effect of terminating the learning process after a finite number of iterations is investigated. A sufficient condition for asymptotic stability of the tracking error is derived. Finally, the effectiveness of the algorithm is demonstrated with three simulation examples.
Keywords: Adaptive dynamic programming, approximation error, data-based control, Q-learning, switching system
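
To make the idea in the abstract concrete, the following is a minimal sketch of a data-based Q-function iteration with a linear-in-parameter approximator. It is not the authors' algorithm. The two scalar subsystems in `step`, the degree-2 polynomial basis in `features`, the constant reference, the discount factor `GAMMA`, and the batch least-squares fit are all assumptions chosen for illustration. The learner only touches the sampled tuples (x, r, v, x_next), never the model itself.

```python
import numpy as np

GAMMA = 0.5  # discount factor: an assumption of this sketch


def step(x, v):
    """Hypothetical autonomous subsystems: mode 0 drifts to +1, mode 1 to -1."""
    target = np.where(np.asarray(v) == 0, 1.0, -1.0)
    return x + 0.1 * (target - x)


def features(x, r):
    """LIP basis phi(x, r): monomials up to degree 2 (an illustrative choice)."""
    x, r = np.asarray(x, dtype=float), np.asarray(r, dtype=float)
    return np.stack([np.ones_like(x), x, r, x * x, x * r, r * r], axis=-1)


def sample_transitions(rng, n):
    """Collect (x, r, v, x_next) tuples; the learner sees only these samples."""
    x = rng.uniform(-1.0, 1.0, n)   # sampled states
    r = rng.uniform(-0.5, 0.5, n)   # constant reference attached to each sample
    v = rng.integers(0, 2, n)       # randomly chosen active mode
    return x, r, v, step(x, v)


def fitted_q_iteration(x, r, v, x_next, n_modes=2, n_iter=50):
    """Iterate Q_{i+1}(x, r, v) = U(x, r) + GAMMA * min_v' Q_i(x_next, r, v')."""
    phi, phi_next = features(x, r), features(x_next, r)
    cost = (x - r) ** 2                      # tracking cost U(x, r)
    W = np.zeros((n_modes, phi.shape[1]))    # one weight vector per mode
    for _ in range(n_iter):
        target = cost + GAMMA * (phi_next @ W.T).min(axis=1)
        # Least-squares fit of each mode's weights on that mode's samples.
        W = np.stack([
            np.linalg.lstsq(phi[v == m], target[v == m], rcond=None)[0]
            for m in range(n_modes)
        ])
    return W


def greedy_mode(W, x, r):
    """Switching law: activate the mode with the smallest approximate Q-value."""
    return int(np.argmin(W @ features(x, r)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = fitted_q_iteration(*sample_transitions(rng, 2000))
    x, r = 0.8, 0.0
    for _ in range(30):
        x = float(step(x, greedy_mode(W, x, r)))
    print("final tracking error:", abs(x - r))
```

Because each mode moves the state by a fixed fraction toward ±1, the closed loop in this toy setting can only chatter around the reference rather than settle on it exactly. The sketch also stops after a fixed number of iterations and uses a basis that cannot represent the Q-function exactly, which are the two effects the abstract says the paper analyzes.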




Highlights

Develops a data-based method for optimal tracking of autonomous switching systems.
Considers the effects of approximation error and of a finite number of iterations.
Provides theoretical analysis of continuity, convergence, and stability.
