A Spatial-Temporal Attention Model for Human Trajectory Prediction

Xiaodong Zhao; Yaran Chen; Jin Guo; Dongbin Zhao

doi:10.1109/JAS.2020.1003228

Volume 7 Issue 4

Jun. 2020

IEEE/CAA Journal of Automatica Sinica



Xiaodong Zhao, Yaran Chen, Jin Guo and Dongbin Zhao, "A Spatial-Temporal Attention Model for Human Trajectory Prediction," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 965-974, July 2020. doi: 10.1109/JAS.2020.1003228


Funds: This work was supported by the National Key Research and Development Program of China (2018AAA0101005, 2018AAA0102404), the Program of the Huawei Technologies Co. Ltd. (FA2018111061SOW12), the National Natural Science Foundation of China (61773054), and the Youth Research Fund of the State Key Laboratory of Complex Systems Management and Control (20190213)


Author Bio:
Xiaodong Zhao received the B.Eng. degree in automation from North China University of Technology, China, in 2018. He is currently working toward the M.A.Sc. degree in control science and engineering at University of Science and Technology Beijing. His research interests include trajectory prediction, semantic segmentation, neural architecture search, and autonomous driving.

Yaran Chen received the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, in 2018. She is currently an Assistant Researcher at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Her research interests include deep learning, neural architecture search, deep reinforcement learning, and autonomous driving.

Jin Guo received the B.S. degree in mathematics from Shandong University, China, in 2008, and the Ph.D. degree in system modeling and control theory from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, in 2013. He is currently a Professor with the School of Automation and Electrical Engineering, University of Science and Technology Beijing. His research interests include identification and control of set-valued systems and cyber-physical systems.

Dongbin Zhao (M’06–SM’10–F’20) received the B.S., M.S., and Ph.D. degrees from Harbin Institute of Technology, Harbin, China, in 1994, 1996, and 2000, respectively. He was a Postdoctoral Fellow at Tsinghua University, Beijing, China, from 2000 to 2002. He has been a Professor at the Institute of Automation, Chinese Academy of Sciences, since 2002, and is also a Professor with the University of Chinese Academy of Sciences, China. From 2007 to 2008, he was also a visiting scholar at the University of Arizona. He has published 6 books and over 90 international journal papers. He serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cybernetics, IEEE Transactions on Artificial Intelligence, IEEE Computational Intelligence Magazine, etc. His current research interests include deep reinforcement learning, computational intelligence, autonomous driving, game artificial intelligence, robotics, and smart grids.
Corresponding author: Y. R. Chen and D. B. Zhao are with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: chenyaran2013@ia.ac.cn; dongbin.zhao@ia.ac.cn); J. Guo is with the School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, and also with the Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, Beijing 100083, China (e-mail: guojin@ustb.edu.cn)
Received Date: 2020-03-24
Accepted Date: 2020-04-13

Available Online: 2020-05-20

Abstract

Human trajectory prediction is essential to many related applications. The task is challenging because human behavior is uncertain: it is influenced not only by the individuals themselves but also by the surrounding environment. Recent works based on long-short term memory (LSTM) models have substantially improved trajectory prediction. However, most of them focus on the spatial influence of humans and ignore the temporal influence. In this paper, we propose a novel spatial-temporal attention (ST-Attention) model, which studies spatial and temporal affinities jointly. Specifically, we introduce an attention mechanism to extract temporal affinity, learning the importance of historical trajectory information at different time instants. To explore spatial affinity, a deep neural network is employed to measure the differing importance of the neighbors. Experimental results show that our method achieves competitive performance compared with state-of-the-art methods on publicly available datasets.
Keywords: Attention mechanism, long-short term memory (LSTM), spatial-temporal model, trajectory prediction
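
The temporal attention idea described in the abstract, weighting historical trajectory information at different time instants, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the importance scores are computed, so the dot-product scoring, the choice of the last hidden state as the query, and the names `softmax`, `temporal_attention`, and `history` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, query):
    """Weight each historical hidden state by its relevance to `query`.

    Assumption: relevance is a plain dot product; the paper may use a
    different (learned) scoring function.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Context vector: attention-weighted sum of the historical states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for three past time instants (toy values).
history = [[0.1, 0.0], [0.4, 0.2], [0.9, 0.5]]
weights, context = temporal_attention(history, query=history[-1])
```

Here `weights` holds one importance value per past time instant, and `context` is the summary vector that a decoder could consume in place of only the most recent hidden state.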



Highlights

Trajectory prediction is studied jointly through temporal and spatial affinities.
An LSTM model that uses an attention mechanism to improve the accuracy of trajectory prediction.
An experimental error analysis using data in both the world plane and the image plane.
