A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 9, Issue 8, Aug. 2022

IEEE/CAA Journal of Automatica Sinica

Q. M. Cheng, Y. Z. Zhou, H. Y. Huang, and Z. Y. Wang, “Multi-attention fusion and fine-grained alignment for bidirectional image-sentence retrieval in remote sensing,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1532–1535, Aug. 2022. doi: 10.1109/JAS.2022.105773

Multi-Attention Fusion and Fine-Grained Alignment for Bidirectional Image-Sentence Retrieval in Remote Sensing

doi: 10.1109/JAS.2022.105773


Figures (6) / Tables (5)

