
自适应特征融合的多模态实体对齐研究

郭浩 李欣奕 唐九阳 郭延明 赵翔

引用本文: 郭浩, 李欣奕, 唐九阳, 郭延明, 赵翔. 自适应特征融合的多模态实体对齐研究. 自动化学报, 2022, 48(x): 1−13 doi: 10.16383/j.aas.c210518
Citation: Guo Hao, Li Xin-Yi, Tang Jiu-Yang, Guo Yan-Ming, Zhao Xiang. Adaptive feature fusion for multi-modal entity alignment. Acta Automatica Sinica, 2022, 48(x): 1−13 doi: 10.16383/j.aas.c210518

自适应特征融合的多模态实体对齐研究

doi: 10.16383/j.aas.c210518
基金项目: 国家自然科学基金(62002373, 61872446, 71971212, U19B2024)资助
详细信息
    作者简介:

    郭浩:中国人民解放军国防科技大学博士研究生. 主要研究方向为知识图谱构建与融合技术. E-mail: guo_hao@nudt.edu.cn

    李欣奕:中国人民解放军国防科技大学博士. 主要研究方向为自然语言处理和信息检索. 本文通信作者. E-mail: lixinyimichael@163.com

    唐九阳:中国人民解放军国防科技大学教授. 主要研究方向为智能分析, 大数据和社会计算. E-mail: 13787319678@163.com

    郭延明:中国人民解放军国防科技大学副教授. 主要研究方向为深度学习, 跨媒体信息处理与智能对抗. E-mail: guoyanming@nudt.edu.cn

    赵翔:中国人民解放军国防科技大学教授. 主要研究方向为图数据管理与挖掘和智能分析. E-mail: xiangzhao@nudt.edu.cn

Adaptive Feature Fusion for Multi-modal Entity Alignment

Funds: Supported by National Natural Science Foundation of China (62002373, 61872446, 71971212, U19B2024)
More Information
    Author Bio:

    GUO Hao  Ph.D. candidate at National University of Defense Technology. His research interest covers knowledge graph construction and fusion.

    LI Xin-Yi  Ph.D. at National University of Defense Technology. His research interest covers natural language processing and information retrieval. Corresponding author of this paper.

    TANG Jiu-Yang  Professor at National University of Defense Technology. His research interest covers intelligence analytics, big data, and social computing.

    GUO Yan-Ming  Associate Professor at National University of Defense Technology. His research interest covers deep learning, cross-media information processing, and adversarial attack.

    ZHAO Xiang  Professor at National University of Defense Technology. His research interest covers graph data management and mining, and intelligence analytics.

  • 摘要: 多模态数据间交互式任务的涌现对综合利用不同模态的知识提出了高要求, 多模态知识图谱应运而生, 其通过融合不同模态的知识来满足这类任务的需求. 然而, 现有多模态知识图谱存在图谱知识不完整的问题, 严重阻碍对信息的有效利用. 缓解此问题的关键是通过实体对齐方法对图谱进行补全. 当前多模态实体对齐方法以固定权重融合多种模态信息, 在融合过程中忽略了不同模态信息贡献的差异性. 为解决上述问题, 本文设计一套自适应特征融合机制, 根据不同模态数据质量动态融合实体结构信息和视觉信息. 此外, 考虑到视觉信息质量不高、知识图谱之间的结构差异也影响实体对齐的效果, 本文分别设计提升视觉信息有效利用率的视觉特征处理模块以及缓和结构差异性的三元组筛选模块. 在多模态实体对齐任务上的实验结果表明, 本文提出的多模态实体对齐方法的性能优于当前最好的方法.

    Abstract: The emergence of interactive tasks across multi-modal data places high demands on the comprehensive use of knowledge from different modalities, and multi-modal knowledge graphs have arisen to meet these demands by fusing knowledge of different modalities. However, existing multi-modal knowledge graphs are incomplete, which severely hinders the effective use of their information. The key to alleviating this problem is to complete the graphs via entity alignment. Current multi-modal entity alignment methods fuse information from multiple modalities with fixed weights, ignoring the differing contributions of the modalities during fusion. To address this issue, this paper designs an adaptive feature fusion mechanism that dynamically fuses entity structure information and visual information according to the data quality of each modality. In addition, since low-quality visual information and structural differences between knowledge graphs also affect alignment performance, a visual feature processing module that improves the effective utilization of visual information and a triple filtering module that mitigates structural differences are designed. Experimental results on multi-modal entity alignment tasks show that the proposed method outperforms the current state-of-the-art methods.
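    As an illustration of the adaptive fusion idea described in the abstract, the sketch below weights an entity's structural and visual embeddings by per-modality quality scores before alignment. It is a minimal sketch only: the function names, the softmax weighting over quality scores, and the example quality values are assumptions, not the paper's formulation.

```python
# Minimal sketch of quality-weighted (adaptive) feature fusion.
# Assumptions: the softmax weighting over scalar quality scores and all
# names below are illustrative, not the paper's method.
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def adaptive_fuse(struct_emb, visual_emb, struct_quality, visual_quality, temp=1.0):
    """Fuse one entity's structural and visual embeddings.

    struct_quality / visual_quality: scalars in [0, 1], e.g. lower when the
    entity has few triples or a missing / low-quality image.
    """
    logits = np.array([struct_quality, visual_quality]) / temp
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                      # per-entity modality weights
    fused = np.concatenate([weights[0] * l2_normalize(struct_emb),
                            weights[1] * l2_normalize(visual_emb)])
    return l2_normalize(fused)

# An entity with rich structure but a poor image leans on the structural view.
fused = adaptive_fuse(np.random.rand(200), np.random.rand(4096), 0.9, 0.3)
```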
  • 图  1  知识图谱FreeBase和DBpedia的结构差异性表现

    Fig.  1  Structural differences between the knowledge graphs FreeBase and DBpedia

    图  2  自适应特征融合的多模态实体对齐框架

    Fig.  2  Multi-modal entity alignment framework based on adaptive feature fusion

    图  3  视觉特征处理模块

    Fig.  3  Visual feature processing module

    图  4  三元组筛选模块

    Fig.  4  Triples filtering module

    图  5  自适应特征融合与固定权重融合的实体对齐Hits@1对比

    Fig.  5  Comparison of entity alignment Hits@1 between adaptive feature fusion and fixed-weight fusion

    表  1  多模态知识图谱

    Table  1  Statistics of the MMKG datasets

    数据集      实体      关系    三元组     图片      SameAs
    FB15K      14 915   1 345   592 213   13 444    —
    DB15K      14 777   279     99 028    12 841    12 846
    Yago15K    15 404   32      122 886   11 194    11 199

    表  2  多模态实体对齐结果

    Table  2  Results of multi-modal entity alignment

    数据集           方法         seed = 0.2                seed = 0.5
                                Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR
    FB15K-DB15K     IKRL         2.96   11.45    0.059     5.53   24.41    0.121
                    GCN-align    6.26   18.81    0.105    13.79   34.60    0.210
                    PoE         11.1    17.8     —        23.5    33.0     —
                    HMEA        12.16   34.86    0.191    27.24   51.77    0.354
                    AF2MEA      17.75   34.14    0.233    29.45   50.25    0.365
    FB15K-Yago15K   IKRL         3.84   12.50    0.075     6.16   20.45    0.111
                    GCN-align    6.44   18.72    0.106    14.09   34.80    0.209
                    PoE          8.7    13.3     —        18.5    24.7     —
                    HMEA        10.03   29.38    0.168    27.91   55.31    0.371
                    AF2MEA      21.65   40.22    0.282    35.72   56.03    0.423
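    The Hits@1, Hits@10 and MRR columns in Table 2 (and in the tables below) are standard ranking metrics for entity alignment. A minimal sketch of how they are typically computed from pairwise similarities follows; the use of cosine similarity and the function name are assumptions for illustration, not the paper's implementation.

```python
# Illustrative Hits@k / MRR computation over a similarity ranking.
# Assumption: src_emb[i] and tgt_emb[i] embed the same real-world entity.
import numpy as np

def hits_and_mrr(src_emb, tgt_emb, ks=(1, 10)):
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                              # cosine similarity matrix
    order = np.argsort(-sim, axis=1)               # candidates, best first
    ranks = np.array([int(np.where(order[i] == i)[0][0]) + 1
                      for i in range(len(src))])   # rank of the true match
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    mrr = float(np.mean(1.0 / ranks))
    return hits, mrr

hits, mrr = hits_and_mrr(np.random.rand(100, 64), np.random.rand(100, 64))
```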

    表  3  消融实验实体对齐结果

    Table  3  Entity alignment results of the ablation study

    方法                 seed = 0.2                seed = 0.5
                        Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR
    FB15K-DB15K
    AF2MEA              17.75   34.14    0.233    29.45   50.25    0.365
    AF2MEA-Adaptive     16.03   31.01    0.212    26.29   45.35    0.331
    AF2MEA-Visual       16.19   30.71    0.212    26.14   45.38    0.323
    AF2MEA-Filter       14.13   28.77    0.191    22.91   43.08    0.297
    FB15K-Yago15K
    AF2MEA              21.65   40.22    0.282    35.72   56.25    0.423
    AF2MEA-Adaptive     19.32   37.38    0.255    31.77   53.24    0.393
    AF2MEA-Visual       19.75   36.38    0.254    32.08   51.53    0.388
    AF2MEA-Filter       15.84   32.36    0.216    27.38   48.14    0.345

    表  4  实体视觉特征的对齐结果

    Table  4  Entity alignment results of visual features

    数据集           方法          seed = 0.2                seed = 0.5
                                 Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR
    FB15K-DB15K     HMEA-v        2.07    9.82    0.058     3.91   14.41    0.086
                    Att           8.81   20.16    0.128     9.57   21.13    0.139
                    Att+Filter    8.98   20.52    0.131     9.96   22.58    0.144
    FB15K-Yago15K   HMEA-v        2.77   11.49    0.072     4.28   15.38    0.095
                    Att           9.25   21.38    0.137    10.56   23.55    0.157
                    Att+Filter    9.43   21.91    0.138    11.07   24.51    0.158

    表  5  不同三元组筛选机制下实体结构特征对齐结果

    Table  5  Entity alignment results of structure features under different filtering mechanisms

    数据集           方法           seed = 0.2                seed = 0.5
                                  Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR
    FB15K-DB15K     Baseline       6.26   18.81    0.105    13.79   34.60    0.210
                    F_PageRank     8.03   21.37    0.125    18.90   39.25    0.259
                    F_random       7.57   20.76    0.120    16.32   36.48    0.231
                    F_our          9.74   25.28    0.150    22.09   44.85    0.297
    FB15K-Yago15K   Baseline       6.44   18.72    0.106    15.88   36.7     0.229
                    F_PageRank     9.54   23.45    0.144    21.67   42.30    0.290
                    F_random       8.17   20.86    0.126    18.22   38.55    0.254
                    F_our         11.59   28.44    0.175    24.88   47.85    0.327
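    Table 5 contrasts the paper's triple filtering (F_our) with PageRank-based and random filtering of triples. As a point of reference only, a PageRank-style baseline filter could be sketched as below; this is an assumed illustration of the F_PageRank baseline idea, not the paper's own module.

```python
# Illustrative PageRank-style triple filtering baseline (not the paper's F_our).
import networkx as nx

def filter_triples_by_pagerank(triples, keep_ratio=0.8):
    """Keep the triples whose head and tail carry the most PageRank mass.

    triples: iterable of (head, relation, tail) identifiers.
    """
    graph = nx.DiGraph()
    graph.add_edges_from((h, t) for h, _, t in triples)
    scores = nx.pagerank(graph)                    # importance score per entity
    ranked = sorted(triples,
                    key=lambda tr: scores.get(tr[0], 0.0) + scores.get(tr[2], 0.0),
                    reverse=True)
    return ranked[:int(len(ranked) * keep_ratio)]

kept = filter_triples_by_pagerank([("a", "r1", "b"), ("b", "r2", "c"), ("c", "r3", "a")])
```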

    表  6  自适应特征融合与固定权重融合多模态实体对齐结果

    Table  6  Multi-modal entity alignment results of adaptive feature fusion and fixed-weight fusion

    方法          Group1            Group2            Group3
                 Hits@1  Hits@10   Hits@1  Hits@10   Hits@1  Hits@10
    FB15K-DB15K
    Adaptive     16.44   32.97     17.43   33.47     19.29   35.40
    Fixed        13.87   28.91     15.82   31.08     18.12   34.33
    FB15K-Yago15K
    Adaptive     16.44   32.97     17.43   33.47     19.29   35.40
    Fixed        16.21   33.23     19.55   37.11     22.27   45.52

    表  7  补充实验多模态实体对齐结果

    Table  7  Multi-modal entity alignment results of the additional experiment

    方法       seed = 0.2                seed = 0.5
              Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR
    PoE       16.44   32.97    17.43    34.7    53.6     0.414
    MMEA      13.87   28.91    15.82    40.26   64.51    0.486
    AF2MEA    28.65   48.22    0.382    48.25   75.83    0.569

出版历程
  • 收稿日期:  2021-06-09
  • 录用日期:  2021-11-26
  • 网络出版日期:  2022-11-10
