A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 8 Issue 7
Jul.  2021

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 6.171, Top 11% (SCI Q1)
    CiteScore: 11.2, Top 5% (Q1)
    Google Scholar h5-index: 51, TOP 8
Tian Wang, Xing Xu, Fumin Shen and Yang Yang, "A Cognitive Memory-Augmented Network for Visual Anomaly Detection," IEEE/CAA J. Autom. Sinica, vol. 8, no. 7, pp. 1296-1307, July 2021. doi: 10.1109/JAS.2021.1004045

A Cognitive Memory-Augmented Network for Visual Anomaly Detection

doi: 10.1109/JAS.2021.1004045
Funds: This work was supported in part by the National Natural Science Foundation of China (61976049, 62072080, U20B2063), the Fundamental Research Funds for the Central Universities (ZYGX2019Z015), the Sichuan Science and Technology Program, China (2018GZDZX0032, 2019ZDZX0008, 2019YFG0003, 2019YFG0533, 2020YFS0057), and the Dongguan Songshan Lake Introduction Program of Leading Innovative and Entrepreneurial Talents.
  • With the rapid development of automated visual analysis, visual analysis systems have become a popular research topic in computer vision and automated analysis. Such systems can help humans detect anomalous events (e.g., fighting or walking alone on the grass). Existing methods for visual anomaly detection are usually based on an autoencoder architecture that either reconstructs the current frame or predicts a future frame, and the reconstruction error is then used as the criterion for deciding whether an input is abnormal. A flaw of these methods is that abnormal samples can also be reconstructed well, so the reconstruction error alone does not reliably separate normal from abnormal inputs. In this paper, inspired by human memory, we propose a novel deep neural network (DNN) based model, termed the cognitive memory-augmented network (CMAN), for visual anomaly detection. CMAN assumes that the visual analysis system imitates humans: it remembers normal samples and then distinguishes abnormal events in the collected videos. Specifically, we introduce a memory module that simulates the memory capacity of humans and a density estimation network that learns the data distribution. The reconstruction errors and the novelty scores are used together to distinguish abnormal events in videos. In addition, we develop a two-step training scheme so that the memory module and the density estimation network cooperate to improve performance. Comprehensive experiments on various popular benchmarks show the superiority and effectiveness of the proposed CMAN model for visual anomaly detection compared with state-of-the-art methods. The implementation code of our CMAN method can be accessed at https://github.com/CMAN-code/CMAN_pytorch.
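The abstract states that reconstruction errors and novelty scores are both used to flag abnormal events, but it does not say how the two are combined. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the min-max normalization, and the weighted sum with weight `alpha` are all hypothetical choices made for illustration.

```python
def reconstruction_error(frame, reconstruction):
    """Mean squared error between a flattened frame and its reconstruction."""
    if len(frame) != len(reconstruction):
        raise ValueError("frame and reconstruction must have the same length")
    return sum((x - y) ** 2 for x, y in zip(frame, reconstruction)) / len(frame)


def min_max_normalize(values):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def anomaly_scores(recon_errors, novelty_scores, alpha=0.5):
    """Combine per-frame reconstruction errors and novelty scores.

    ASSUMPTION: a weighted sum of min-max normalized terms. The abstract
    only says both signals are used; the actual combination rule may differ.
    """
    if len(recon_errors) != len(novelty_scores):
        raise ValueError("both score lists must cover the same frames")
    r = min_max_normalize(recon_errors)
    n = min_max_normalize(novelty_scores)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(r, n)]
```

Under this sketch, a frame that is both poorly reconstructed and assigned low likelihood by the density estimator receives the highest score, while a frame that is reconstructed well but is unlikely under the learned distribution can still be flagged through the novelty term.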

     



    Figures(8)  / Tables(5)


    Highlights

    • A Cognitive Memory-Augmented Network is proposed for visual anomaly detection.
    • A memory module is designed to simulate the memory capacity of humans.
    • A density estimation module is developed to learn the data distribution.
    • A two-step scheme is proposed to enable the cooperation of the two modules.
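The highlights mention a memory module but give no details of how it is read. As a rough illustration, the sketch below shows one common design used in memory-augmented autoencoders: the encoder output (the query) is compared with each stored memory item by cosine similarity, the similarities are turned into attention weights by a softmax, and the weighted sum of memory items is passed on to the decoder. This addressing scheme and all names here are assumptions for illustration; the CMAN memory module may work differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Return (read_vector, weights) for a query against a list of memory items.

    ASSUMPTION: soft attention over memory items, as in generic
    memory-augmented autoencoders; not taken from the CMAN paper.
    """
    weights = softmax([cosine_similarity(query, item) for item in memory])
    dim = len(memory[0])
    read = [sum(w * item[d] for w, item in zip(weights, memory)) for d in range(dim)]
    return read, weights
```

Because the read vector is built only from items learned on normal data, an abnormal query tends to be pulled toward normal patterns, which is the usual motivation for such a module: it makes abnormal inputs harder to reconstruct well.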
