A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 8 Issue 9
Sep.  2021

IEEE/CAA Journal of Automatica Sinica

Citation: Ke Zhang, Yukun Su, Xiwang Guo, Liang Qi and Zhenbing Zhao, "MU-GAN: Facial Attribute Editing Based on Multi-Attention Mechanism," IEEE/CAA J. Autom. Sinica, vol. 8, no. 9, pp. 1614-1626, Sept. 2021. doi: 10.1109/JAS.2020.1003390

MU-GAN: Facial Attribute Editing Based on Multi-Attention Mechanism

doi: 10.1109/JAS.2020.1003390
Funds: This work was supported in part by the National Natural Science Foundation of China (NSFC) (62076093, 61871182, 61302163, 61401154), the Beijing Natural Science Foundation (4192055), the Natural Science Foundation of Hebei Province of China (F2015502062, F2016502101, F2017502016), the Fundamental Research Funds for the Central Universities (2020YJ006, 2020MS099), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201900051).
  • Facial attribute editing has two main objectives: 1) translating an image from a source domain to a target domain, and 2) changing only the facial regions related to the target attribute while preserving attribute-excluding details. In this work, we propose a multi-attention U-Net-based generative adversarial network (MU-GAN). First, we replace the classic convolutional encoder-decoder in the generator with a symmetric U-Net-like structure, and apply an additive attention mechanism to build attention-based U-Net connections that adaptively transfer encoder representations, complementing the decoder with attribute-excluding details and enhancing attribute editing ability. Second, a self-attention (SA) mechanism is incorporated into the convolutional layers to model long-range and multi-level dependencies across image regions. Experimental results indicate that our method balances attribute editing ability and detail preservation ability, and can decouple the correlation among attributes. It outperforms state-of-the-art methods in terms of attribute manipulation accuracy and image quality. Our code is available at https://github.com/SuSir1996/MU-GAN.
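The attention-based U-Net connection described in the abstract can be illustrated with a minimal sketch of one common form of additive attention gating. This is an assumption for illustration, not necessarily the exact formulation used in MU-GAN: the function name `additive_attention_gate`, the weight names `W_enc`, `W_dec`, `psi`, and the choice of a ReLU followed by a sigmoid are all hypothetical. Feature maps are flattened to one channel vector per spatial position, and the encoder and decoder features are assumed to share the same resolution.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def additive_attention_gate(enc, dec, W_enc, W_dec, psi, bias=0.0):
    """Gate encoder features with a per-position additive attention weight.

    enc, dec: lists of per-position channel vectors (same number of positions).
    W_enc, W_dec: hypothetical projections into a shared intermediate space.
    psi: hypothetical vector that reduces the intermediate features to a score.
    Returns the gated encoder features and the attention weights in (0, 1).
    """
    gated, alphas = [], []
    for x, g in zip(enc, dec):
        # Additive step: project both inputs, sum them, apply ReLU.
        hidden = [max(0.0, a + b)
                  for a, b in zip(matvec(W_enc, x), matvec(W_dec, g))]
        # Reduce to a scalar score and squash it with a sigmoid.
        score = sum(p * h for p, h in zip(psi, hidden)) + bias
        alpha = 1.0 / (1.0 + math.exp(-score))
        alphas.append(alpha)
        # Scale the encoder features before passing them to the decoder.
        gated.append([alpha * c for c in x])
    return gated, alphas

# Toy example: two positions, two channels, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
enc = [[1.0, 2.0], [3.0, 4.0]]
dec = [[0.0, 0.0], [0.0, 0.0]]
gated, alphas = additive_attention_gate(enc, dec, identity, identity, [0.0, 0.0])
```

With `psi` set to zero, every score is zero and each encoder feature is scaled by 0.5. In a trained network the weights would presumably be learned, so that positions carrying attribute-excluding detail pass through the skip connection while others are suppressed.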

     



    Highlights

    • Constructing a symmetric U-Net-like generator architecture based on an additive attention mechanism, which effectively enhances both detail preservation and attribute manipulation abilities.
    • Incorporating a self-attention mechanism into the existing encoder-decoder architecture, thereby effectively enforcing geometric constraints on the generated results.
    • Introducing a multi-attention mechanism to help decouple attributes, i.e., to handle interference among attributes and change only the attributes that need to be changed.
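The self-attention mechanism named in the highlights can be sketched as follows. This is a minimal illustration under stated assumptions, written in the style commonly used for self-attention in GANs, and is not taken from the MU-GAN code: the function name `self_attention`, the projection names `W_q`, `W_k`, `W_v`, and the residual scale `gamma` are hypothetical. The feature map is flattened to one channel vector per spatial position, so every position can attend to every other position regardless of distance.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(feats, W_q, W_k, W_v, gamma=1.0):
    """Apply dot-product self-attention with a residual connection.

    feats: list of per-position channel vectors.
    W_q, W_k: hypothetical query/key projections (same output size).
    W_v: hypothetical value projection; must keep the channel count of feats.
    gamma: hypothetical scale on the attended features.
    """
    queries = [matvec(W_q, x) for x in feats]
    keys = [matvec(W_k, x) for x in feats]
    values = [matvec(W_v, x) for x in feats]
    out = []
    for q, x in zip(queries, feats):
        # Similarity of this position to every position in the map.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        # Weighted sum of the values over all positions.
        attended = [sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(x))]
        # Residual connection: original features plus scaled attention output.
        out.append([xc + gamma * ac for xc, ac in zip(x, attended)])
    return out

# Toy example: zero query/key projections give uniform attention weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
mixed = self_attention(feats, zeros, zeros, identity, gamma=1.0)
unchanged = self_attention(feats, zeros, zeros, identity, gamma=0.0)
```

Setting `gamma` to zero returns the input unchanged, which shows how such a layer could start as an identity mapping and gradually blend in long-range context during training.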
