IEEE/CAA Journal of Automatica Sinica
Citation: Yongliang Yang, Zihao Ding, Rui Wang, Hamidreza Modares and Donald C. Wunsch, "Data-Driven Human-Robot Interaction Without Velocity Measurement Using Off-Policy Reinforcement Learning," IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 47–63, Jan. 2022. doi: 10.1109/JAS.2021.1004258
[1] 
E. Nuño, R. Ortega, and L. Basañez, “An adaptive controller for nonlinear teleoperators,” Automatica, vol. 46, no. 1, pp. 155–159, 2010. doi: 10.1016/j.automatica.2009.10.026

[2] 
S. Cai, Z. Ma, M. J. Skibniewski, and S. Bao, “Construction automation and robotics for high-rise buildings over the past decades: A comprehensive review,” Advanced Engineering Informatics, vol. 42, Article No. 100989, 2019. doi: 10.1016/j.aei.2019.100989

[3] 
D. Han, P. Huang, X. Liu, and Y. Yang, “Combined spacecraft stabilization control after multiple impacts during the capture of a tumbling target by a space robot,” Acta Astronautica, vol. 176, pp. 24–32, 2020. doi: 10.1016/j.actaastro.2020.05.035

[4] 
S. E. Fasoli, H. I. Krebs, J. Stein, W. R. Frontera, and N. Hogan, “Effects of robotic therapy on motor impairment and recovery in chronic stroke,” Archives of Physical Medicine and Rehabilitation, vol. 84, no. 4, pp. 477–482, 2003. doi: 10.1053/apmr.2003.50110

[5] 
J. C. Perry, J. Rosen, and S. Burns, “Upper-limb powered exoskeleton design,” IEEE/ASME Transactions on Mechatronics, vol. 12, no. 4, pp. 408–417, 2007. doi: 10.1109/TMECH.2007.901934

[6] 
M. Bergamasco, B. Allotta, L. Bosio, L. Ferretti, G. Parrini, G. Prisco, F. Salsedo, and G. Sartini, “An arm exoskeleton system for teleoperation and virtual environments applications,” in Proc. IEEE Int. Conf. Robotics and Automation, 1994, pp. 1449–1454.

[7] 
H. Modares, I. Ranatunga, F. L. Lewis, and D. O. Popa, “Optimized assistive human–robot interaction using reinforcement learning,” IEEE Trans. Cybernetics, vol. 46, no. 3, pp. 655–667, 2015.

[8] 
K. Guo, Y. Pan, D. Zheng, and H. Yu, “Composite learning control of robotic systems: A least squares modulated approach,” Automatica, vol. 111, Article No. 108612, 2020. doi: 10.1016/j.automatica.2019.108612

[9] 
T. Sun and Y. Pan, “Robust adaptive control for prescribed performance tracking of constrained uncertain nonlinear systems,” J. Franklin Institute, vol. 356, no. 1, pp. 18–30, 2019. doi: 10.1016/j.jfranklin.2018.09.005

[10] 
K. Dupree, P. M. Patre, Z. D. Wilcox, and W. E. Dixon, “Asymptotic optimal control of uncertain nonlinear Euler-Lagrange systems,” Automatica, vol. 47, no. 1, pp. 99–107, 2011. doi: 10.1016/j.automatica.2010.10.007

[11] 
Z. Li, J. Liu, Z. Huang, Y. Peng, H. Pu, and L. Ding, “Adaptive impedance control of human–robot cooperation using reinforcement learning,” IEEE Trans. Industrial Electronics, vol. 64, no. 10, pp. 8013–8022, 2017. doi: 10.1109/TIE.2017.2694391

[12] 
T. Sun, L. Peng, L. Cheng, Z. Hou, and Y. Pan, “Stability-guaranteed variable impedance control of robots based on approximate dynamic inversion,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 51, no. 7, pp. 4193–4200, 2019. doi: 10.1109/TSMC.2019.2930582

[13] 
T. Sun, L. Cheng, L. Peng, Z. Hou, and Y. Pan, “Learning impedance control of robots with enhanced transient and steady-state control performances,” Science China Information Sciences, vol. 63, no. 9, pp. 1–13, 2020.

[14] 
T. Sun, L. Peng, L. Cheng, Z. Hou, and Y. Pan, “Composite learning enhanced robot impedance control,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 3, pp. 1052–1059, 2020. doi: 10.1109/TNNLS.2019.2912212

[15] 
R. Colbaugh, H. Seraji, and K. Glass, “Direct adaptive impedance control of robot manipulators,” J. Robotic Systems, vol. 10, no. 2, pp. 217–248, 1993. doi: 10.1002/rob.4620100205

[16] 
S. Ge, C. Hang, L. Woon, and X. Chen, “Impedance control of robot manipulators using adaptive neural networks,” Int. J. Intelligent Control and Systems, vol. 2, no. 3, pp. 433–452, 1998.

[17] 
C. Wang, Y. Li, S. S. Ge, and T. H. Lee, “Reference adaptation for robots in physical interactions with unknown environments,” IEEE Transactions on Cybernetics, vol. 47, no. 11, pp. 3504–3515, 2016.

[18] 
W.-S. Lu and Q.-H. Meng, “Impedance control with adaptation for robotic manipulations,” IEEE Trans. Robotics and Automation, vol. 7, no. 3, pp. 408–415, 1991. doi: 10.1109/70.88152

[19] 
H. N. Rahimi, I. Howard, and L. Cui, “Neural impedance adaption for assistive human–robot interaction,” Neurocomputing, vol. 290, pp. 50–59, 2018. doi: 10.1016/j.neucom.2018.02.025

[20] 
Y. Wang, W. Sun, Y. Xiang, and S. Miao, “Neural network-based robust tracking control for robots,” Intelligent Automation & Soft Computing, vol. 15, no. 2, pp. 211–222, 2009.

[21] 
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. USA: A Bradford Book, 2018.

[22] 
Y. Yang, D. Wunsch, and Y. Yin, “Hamiltonian-driven adaptive dynamic programming for continuous nonlinear dynamical systems,” IEEE Trans. Neural Networks and Learning Systems, vol. 28, no. 8, pp. 1929–1940, 2017. doi: 10.1109/TNNLS.2017.2654324

[23] 
Y. Yang, K. G. Vamvoudakis, H. Modares, Y. Yin, and D. C. Wunsch, “Hamiltonian-driven hybrid adaptive dynamic programming,” IEEE Trans. Systems, Man, and Cybernetics: Systems, to be published, 2020.

[24] 
Y. Yang, K. G. Vamvoudakis, H. Modares, Y. Yin, and D. C. Wunsch, “Safe intermittent reinforcement learning with static and dynamic event generators,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5441–5455, 2020. doi: 10.1109/TNNLS.2020.2967871

[25] 
D. Wang and X. Zhong, “Advanced policy learning near-optimal regulation,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 743–749, 2019. doi: 10.1109/JAS.2019.1911489

[26] 
Y. Yang, Z. Guo, H. Xiong, D. Ding, Y. Yin, and D. C. Wunsch, “Data-driven robust control of discrete-time uncertain linear systems via off-policy reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 30, no. 12, pp. 3735–3747, 2019. doi: 10.1109/TNNLS.2019.2897814

[27] 
D. Wang, D. Liu, C. Mu, and Y. Zhang, “Neural network learning and robust stabilization of nonlinear systems with dynamic uncertainties,” IEEE Trans. Neural Networks and Learning Systems, vol. 29, no. 4, pp. 1342–1351, 2018. doi: 10.1109/TNNLS.2017.2749641

[28] 
Q. Zhang and D. Zhao, “Data-based reinforcement learning for nonzero-sum games with unknown drift dynamics,” IEEE Trans. Cybernetics, vol. 49, no. 8, pp. 2874–2885, 2019. doi: 10.1109/TCYB.2018.2830820

[29] 
H. Modares, F. L. Lewis, and Z.-P. Jiang, “H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning,” IEEE Trans. Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2550–2562, 2015. doi: 10.1109/TNNLS.2015.2441749

[30] 
B. Luo, H.-N. Wu, and T. Huang, “Off-policy reinforcement learning for H∞ control design,” IEEE Trans. Cybernetics, vol. 45, no. 1, pp. 65–76, 2014.

[31] 
W. Gao, Z. Jiang, and K. Ozbay, “Data-driven adaptive optimal control of connected vehicles,” IEEE Trans. Intelligent Transportation Systems, vol. 18, no. 5, pp. 1122–1133, 2017. doi: 10.1109/TITS.2016.2597279

[32] 
W. Gao, J. Gao, K. Ozbay, and Z. Jiang, “Reinforcement-learning-based cooperative adaptive cruise control of buses in the Lincoln Tunnel corridor with time-varying topology,” IEEE Trans. Intelligent Transportation Systems, vol. 20, no. 10, pp. 3796–3805, 2019. doi: 10.1109/TITS.2019.2895285

[33] 
T. Degris, M. White, and R. S. Sutton, “Off-policy actor-critic,” arXiv preprint arXiv:1205.4839, 2012.

[34] 
Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012. doi: 10.1016/j.automatica.2012.06.096

[35] 
J. Kober and J. R. Peters, “Policy search for motor primitives in robotics,” in Advances in Neural Information Processing Systems, in Learning Motor Skills, Cham: Springer, 2014, pp. 83–117.

[36] 
F. Zhang, D. M. Dawson, M. S. de Queiroz, and W. E. Dixon, “Global adaptive output feedback tracking control of robot manipulators,” IEEE Trans. Automatic Control, vol. 45, no. 6, pp. 1203–1208, 2000. doi: 10.1109/9.863607

[37] 
F. L. Lewis, D. M. Dawson, and C. T. Abdallah, Robot Manipulator Control: Theory and Practice. Boca Raton, Florida: CRC Press, 2003.

[38] 
J. E. Slotine and W. Li, “On the adaptive control of robot manipulators,” Int. J. Robot. Res., vol. 6, no. 3, pp. 49–59, 1987. doi: 10.1177/027836498700600303

[39] 
A. T. Hasan, N. Ismail, A. Hamouda, I. Aris, M. Marhaban, and H. Al-Assadi, “Artificial neural network-based kinematics Jacobian solution for serial manipulator passing through singular configurations,” Advances in Engineering Software, vol. 41, no. 2, pp. 359–367, 2010. doi: 10.1016/j.advengsoft.2009.06.006

[40] 
R. C. Miall, D. J. Weir, D. M. Wolpert, and J. F. Stein, “Is the cerebellum a Smith predictor?” Journal of Motor Behavior, vol. 25, no. 3, pp. 203–216, 1993. doi: 10.1080/00222895.1993.9942050

[41] 
A. Phatak, H. Weinert, I. Segall, and C. N. Day, “Identification of a modified optimal control model for the human operator,” Automatica, vol. 12, no. 1, pp. 31–41, 1976. doi: 10.1016/0005-1098(76)90066-2

[42] 
J. Ragazzini, “Engineering aspects of the human being as a servomechanism,” Am. Psychol., vol. 3, pp. 219–314, 1948. doi: 10.1037/h0056536

[43] 
E. Zergeroglu, W. Dixon, D. Haste, and D. Dawson, “A composite adaptive output feedback tracking controller for robotic manipulators,” Robotica, vol. 17, no. 6, pp. 591–600, 1999. doi: 10.1017/S0263574799001848

[44] 
H. Y. Lau and L. C. Wai, “A Jacobian-based redundant control strategy for the 7-DOF WAM,” in Proc. 7th Int. Conf. Control, Automation, Robotics and Vision (ICARCV 2002), IEEE, 2002, vol. 2, pp. 1060–1065.

[45] 
F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. Hoboken, New Jersey: John Wiley & Sons, 2012.
