
Adaptive Feature Fusion for Multi-modal Entity Alignment

Guo Hao, Li Xin-Yi, Tang Jiu-Yang, Guo Yan-Ming, Zhao Xiang

Citation: Guo Hao, Li Xin-Yi, Tang Jiu-Yang, Guo Yan-Ming, Zhao Xiang. Adaptive feature fusion for multi-modal entity alignment. Acta Automatica Sinica, 2024, 50(4): 1001−1013. doi: 10.16383/j.aas.c210518

doi: 10.16383/j.aas.c210518
Funds: Supported by National Natural Science Foundation of China (62002373, 61872446, 71971212, U19B2024)

    Author Bio:

    GUO Hao  Ph.D. candidate at National University of Defense Technology. His research interest covers knowledge graph construction and fusion. E-mail: guo_hao@nudt.edu.cn

    LI Xin-Yi  Ph.D. at National University of Defense Technology. His research interest covers natural language processing and information retrieval. Corresponding author of this paper. E-mail: lixinyimichael@163.com

    TANG Jiu-Yang  Professor at National University of Defense Technology. His research interest covers intelligence analytics, big data, and social computing. E-mail: 13787319678@163.com

    GUO Yan-Ming  Associate professor at National University of Defense Technology. His research interest covers deep learning, cross-media information processing, and intelligent adversarial techniques. E-mail: guoyanming@nudt.edu.cn

    ZHAO Xiang  Professor at National University of Defense Technology. His research interest covers graph data management and mining, and intelligence analytics. E-mail: xiangzhao@nudt.edu.cn


  • Abstract: The emergence of interactive tasks across multi-modal data places high demands on the comprehensive use of knowledge from different modalities. Multi-modal knowledge graphs, which fuse knowledge from different modalities, have arisen to meet the needs of such tasks. However, existing multi-modal knowledge graphs suffer from incompleteness, which severely hinders effective use of their information; the key to alleviating this problem is to complete the graphs through entity alignment. Current multi-modal entity alignment methods fuse information from multiple modalities with fixed weights, ignoring the differing contribution of each modality during fusion. To address this, we design an adaptive feature fusion mechanism that dynamically fuses entity structural information and visual information according to the data quality of each modality. In addition, since low-quality visual information and structural differences between knowledge graphs also affect alignment performance, we design a visual feature processing module that improves the effective utilization of visual information and a triple filtering module that mitigates structural differences. Experimental results on the multi-modal entity alignment task show that the proposed method outperforms the current state-of-the-art methods.
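The adaptive fusion idea in the abstract can be illustrated with a minimal per-entity sketch. The function name, the confidence inputs, and the softmax-over-confidence weighting are assumptions for illustration only; the paper's exact quality estimate and fusion formula are not reproduced here.

```python
import math

def fuse_features(struct_vec, visual_vec, struct_conf, visual_conf):
    """Fuse one entity's structural and visual embeddings with adaptive
    weights derived from per-modality confidence scores, instead of a
    single fixed global weight.

    struct_vec, visual_vec: embedding vectors of equal length.
    struct_conf, visual_conf: modality-quality scores, e.g. neighborhood
    density for structure and image quality for vision (illustrative
    choices, not the paper's exact estimators).
    """
    es, ev = math.exp(struct_conf), math.exp(visual_conf)
    ws, wv = es / (es + ev), ev / (es + ev)           # adaptive weights, sum to 1
    fused = [ws * s + wv * v for s, v in zip(struct_vec, visual_vec)]
    norm = math.sqrt(sum(x * x for x in fused)) or 1.0
    return [x / norm for x in fused]                  # re-normalize for cosine scoring
```

With equal confidences this degenerates to fixed 50/50 fusion; raising one modality's confidence pulls the fused vector toward that modality's embedding.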
  • Fig. 1  Illustration of structural differences between the knowledge graphs FreeBase and DBpedia

    Fig. 2  Multi-modal entity alignment framework based on adaptive feature fusion

    Fig. 3  Visual feature processing module

    Fig. 4  Triple filtering module

    Fig. 5  Comparison of entity alignment Hits@1 between adaptive feature fusion and fixed-weight fusion

    Table 1  Statistics of the MMKG datasets

    Dataset   Entities   Relations   Triples   Images   SameAs
    FB15K     14 915     1 345       592 213   13 444   —
    DB15K     14 777     279         99 028    12 841   12 846
    Yago15K   15 404     32          122 886   11 194   11 199

    Table 2  Results of multi-modal entity alignment

    Dataset         Method      seed = 0.2                 seed = 0.5
                                Hits@1   Hits@10   MRR     Hits@1   Hits@10   MRR
    FB15K-DB15K     IKRL        2.96     11.45     0.059   5.53     24.41     0.121
                    GCN-align   6.26     18.81     0.105   13.79    34.60     0.210
                    PoE         11.1     17.8      —       23.5     33.0      —
                    HMEA        12.16    34.86     0.191   27.24    51.77     0.354
                    AF2MEA      17.75    34.14     0.233   29.45    50.25     0.365
    FB15K-Yago15K   IKRL        3.84     12.50     0.075   6.16     20.45     0.111
                    GCN-align   6.44     18.72     0.106   14.09    34.80     0.209
                    PoE         8.7      13.3      —       18.5     24.7      —
                    HMEA        10.03    29.38     0.168   27.91    55.31     0.371
                    AF2MEA      21.65    40.22     0.282   35.72    56.03     0.423
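The Hits@k and MRR columns in the tables above are standard ranking metrics computed from the rank of each gold counterpart entity. A minimal sketch (function name hypothetical) might look like:

```python
def alignment_metrics(ranks, ks=(1, 10)):
    """Compute Hits@k (in percent) and MRR from 1-based ranks.

    ranks: for each source entity, the rank of its gold counterpart in
    the candidate list (rank 1 = the correct entity scored highest).
    """
    ranks = list(ranks)
    n = len(ranks)
    # Hits@k: fraction of gold counterparts ranked within the top k
    hits = {k: 100.0 * sum(1 for r in ranks if r <= k) / n for k in ks}
    # MRR: mean reciprocal rank of the gold counterparts
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mrr
```

For example, ranks [1, 2, 11, 1] give Hits@1 = 50.0, Hits@10 = 75.0, and MRR ≈ 0.648.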

    Table 3  Entity alignment results of the ablation study

    Dataset         Method            seed = 0.2                 seed = 0.5
                                      Hits@1   Hits@10   MRR     Hits@1   Hits@10   MRR
    FB15K-DB15K     AF2MEA            17.75    34.14     0.233   29.45    50.25     0.365
                    AF2MEA-Adaptive   16.03    31.01     0.212   26.29    45.35     0.331
                    AF2MEA-Visual     16.19    30.71     0.212   26.14    45.38     0.323
                    AF2MEA-Filter     14.13    28.77     0.191   22.91    43.08     0.297
    FB15K-Yago15K   AF2MEA            21.65    40.22     0.282   35.72    56.25     0.423
                    AF2MEA-Adaptive   19.32    37.38     0.255   31.77    53.24     0.393
                    AF2MEA-Visual     19.75    36.38     0.254   32.08    51.53     0.388
                    AF2MEA-Filter     15.84    32.36     0.216   27.38    48.14     0.345

    Table 4  Entity alignment results of visual features

    Dataset         Method       seed = 0.2                 seed = 0.5
                                 Hits@1   Hits@10   MRR     Hits@1   Hits@10   MRR
    FB15K-DB15K     HMEA-v       2.07     9.82      0.058   3.91     14.41     0.086
                    Att          8.81     20.16     0.128   9.57     21.13     0.139
                    Att+Filter   8.98     20.52     0.131   9.96     22.58     0.144
    FB15K-Yago15K   HMEA-v       2.77     11.49     0.072   4.28     15.38     0.095
                    Att          9.25     21.38     0.137   10.56    23.55     0.157
                    Att+Filter   9.43     21.91     0.138   11.07    24.51     0.158

    Table 5  Entity alignment results of structural features under different triple filtering mechanisms

    Dataset         Method       seed = 0.2                 seed = 0.5
                                 Hits@1   Hits@10   MRR     Hits@1   Hits@10   MRR
    FB15K-DB15K     Baseline     6.26     18.81     0.105   13.79    34.60     0.210
                    F_PageRank   8.03     21.37     0.125   18.90    39.25     0.259
                    F_random     7.57     20.76     0.120   16.32    36.48     0.231
                    F_our        9.74     25.28     0.150   22.09    44.85     0.297
    FB15K-Yago15K   Baseline     6.44     18.72     0.106   15.88    36.70     0.229
                    F_PageRank   9.54     23.45     0.144   21.67    42.30     0.290
                    F_random     8.17     20.86     0.126   18.22    38.55     0.254
                    F_our        11.59    28.44     0.175   24.88    47.85     0.327
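Table 5 compares the proposed filtering criterion (F_our) against PageRank-based and random triple selection. The paper's own criterion is not reproduced here; as an illustration of the F_PageRank baseline, the following sketch (function names and the keep-ratio parameter are hypothetical) ranks triples by the PageRank of their endpoints and drops the long tail first.

```python
from collections import defaultdict

def pagerank(triples, d=0.85, iters=50):
    """Plain power-iteration PageRank over the directed entity graph
    induced by (head, relation, tail) triples."""
    out = defaultdict(list)
    nodes = set()
    for h, _, t in triples:
        out[h].append(t)
        nodes.update((h, t))
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - d) / n for v in nodes}
        # mass of dangling nodes (no outgoing edges) is spread uniformly
        dangling = d * sum(pr[v] for v in nodes if v not in out)
        for u, succs in out.items():
            share = d * pr[u] / len(succs)
            for v in succs:
                nxt[v] += share
        for v in nodes:
            nxt[v] += dangling / n
        pr = nxt
    return pr

def filter_triples(triples, keep_ratio=0.8):
    """Keep the triples whose head and tail carry the highest combined
    PageRank, discarding long-tail triples first."""
    pr = pagerank(triples)
    ranked = sorted(triples, key=lambda tr: pr[tr[0]] + pr[tr[2]], reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]
```

On a toy graph where three triples point at one hub entity and one triple links two isolated entities, the isolated triple is the first to be filtered out.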

    Table 6  Multi-modal entity alignment results of adaptive feature fusion and fixed-weight fusion

    Method     Group 1            Group 2            Group 3
               Hits@1   Hits@10   Hits@1   Hits@10   Hits@1   Hits@10
    FB15K-DB15K
    Adaptive   16.44    32.97     17.43    33.47     19.29    35.40
    Fixed      13.87    28.91     15.82    31.08     18.12    34.33
    FB15K-Yago15K
    Adaptive   16.44    32.97     17.43    33.47     19.29    35.40
    Fixed      16.21    33.23     19.55    37.11     22.27    45.52

    Table 7  Multi-modal entity alignment results of the additional experiment

    Method   seed = 0.2                 seed = 0.5
             Hits@1   Hits@10   MRR     Hits@1   Hits@10   MRR
    PoE      16.44    32.97     17.43   34.7     53.6      0.414
    MMEA     13.87    28.91     15.82   40.26    64.51     0.486
    AF2MEA   28.65    48.22     0.382   48.25    75.83     0.569
Publication History
  • Received: 2021-06-09
  • Accepted: 2021-11-26
  • Available online: 2022-11-10
