Abstract: The recent surge of interactive tasks involving multi-modal data places high demands on the joint use of knowledge from different modalities. This has led to multi-modal knowledge graphs, which integrate knowledge across modalities to meet the needs of such tasks. However, existing multi-modal knowledge graphs suffer from knowledge incompleteness, which severely hinders the effective use of information. A key way to mitigate this problem is to complete the graphs via entity alignment. Current multi-modal entity alignment methods fuse information from multiple modalities with fixed weights, ignoring the differing contributions of individual modalities. To address this, we design an adaptive feature fusion mechanism that dynamically fuses entity structure information and visual information according to the data quality of each modality. In addition, because low-quality visual information and structural differences between knowledge graphs also affect entity alignment, we design a visual feature processing module that improves the effective utilization of visual information and a triple filtering module that eases structural differences. Experiments on multi-modal entity alignment show that the proposed method outperforms the current state-of-the-art methods.
Key words:
- Multi-modal knowledge graph
- entity alignment
- pre-trained model
- feature fusion
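
The contrast the abstract draws between fixed-weight and adaptive fusion can be illustrated with a minimal sketch. The abstract does not say how the weights are derived, so everything below is an assumption made for illustration: each modality is given a scalar quality score per entity, a softmax over the two scores yields the fusion weights, and the weighted, L2-normalized embeddings are concatenated. The function names (`fuse_fixed`, `fuse_adaptive`) and the quality scores are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_fixed(structure_vec, visual_vec, w_structure=0.5):
    """Fixed-weight fusion: every entity uses the same modality weights."""
    s = l2_normalize(structure_vec)
    v = l2_normalize(visual_vec)
    return [w_structure * x for x in s] + [(1.0 - w_structure) * x for x in v]


def fuse_adaptive(structure_vec, visual_vec, structure_quality, visual_quality):
    """Adaptive fusion (assumed form): per-entity weights come from a softmax
    over hypothetical per-modality quality scores."""
    w_s, w_v = softmax([structure_quality, visual_quality])
    s = l2_normalize(structure_vec)
    v = l2_normalize(visual_vec)
    return [w_s * x for x in s] + [w_v * x for x in v]


if __name__ == "__main__":
    structure = [3.0, 4.0]
    visual = [1.0, 0.0]
    print(fuse_fixed(structure, visual))
    # A low visual quality score shifts weight toward the structure part.
    print(fuse_adaptive(structure, visual, structure_quality=2.0, visual_quality=0.0))
```

With equal quality scores the adaptive variant reduces to the fixed variant with weight 0.5; the two differ only when the scores differ between entities, which is the situation the abstract argues fixed weighting ignores.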
Table 1 Statistics of the MMKG datasets

| Dataset | Entities | Relations | Triples | Images | SameAs |
|---|---|---|---|---|---|
| FB15K | 14 915 | 1 345 | 592 213 | 13 444 | - |
| DB15K | 14 777 | 279 | 99 028 | 12 841 | 12 846 |
| Yago15K | 15 404 | 32 | 122 886 | 11 194 | 11 199 |
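Table 2 Results of multi-modal entity alignment

| Dataset | Method | Hits@1 (seed = 0.2) | Hits@10 (seed = 0.2) | MRR (seed = 0.2) | Hits@1 (seed = 0.5) | Hits@10 (seed = 0.5) | MRR (seed = 0.5) |
|---|---|---|---|---|---|---|---|
| FB15K-DB15K | IKRL | 2.96 | 11.45 | 0.059 | 5.53 | 24.41 | 0.121 |
| FB15K-DB15K | GCN-align | 6.26 | 18.81 | 0.105 | 13.79 | 34.60 | 0.210 |
| FB15K-DB15K | PoE | 11.1 | 17.8 | - | 23.5 | 33.0 | - |
| FB15K-DB15K | HMEA | 12.16 | 34.86 | 0.191 | 27.24 | 51.77 | 0.354 |
| FB15K-DB15K | AF2MEA | 17.75 | 34.14 | 0.233 | 29.45 | 50.25 | 0.365 |
| FB15K-Yago15K | IKRL | 3.84 | 12.50 | 0.075 | 6.16 | 20.45 | 0.111 |
| FB15K-Yago15K | GCN-align | 6.44 | 18.72 | 0.106 | 14.09 | 34.80 | 0.209 |
| FB15K-Yago15K | PoE | 8.7 | 13.3 | - | 18.5 | 24.7 | - |
| FB15K-Yago15K | HMEA | 10.03 | 29.38 | 0.168 | 27.91 | 55.31 | 0.371 |
| FB15K-Yago15K | AF2MEA | 21.65 | 40.22 | 0.282 | 35.72 | 56.03 | 0.423 |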
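Table 3 Entity alignment results of the ablation study

| Dataset | Method | Hits@1 (seed = 0.2) | Hits@10 (seed = 0.2) | MRR (seed = 0.2) | Hits@1 (seed = 0.5) | Hits@10 (seed = 0.5) | MRR (seed = 0.5) |
|---|---|---|---|---|---|---|---|
| FB15K-DB15K | AF2MEA | 17.75 | 34.14 | 0.233 | 29.45 | 50.25 | 0.365 |
| FB15K-DB15K | AF2MEA-Adaptive | 16.03 | 31.01 | 0.212 | 26.29 | 45.35 | 0.331 |
| FB15K-DB15K | AF2MEA-Visual | 16.19 | 30.71 | 0.212 | 26.14 | 45.38 | 0.323 |
| FB15K-DB15K | AF2MEA-Filter | 14.13 | 28.77 | 0.191 | 22.91 | 43.08 | 0.297 |
| FB15K-Yago15K | AF2MEA | 21.65 | 40.22 | 0.282 | 35.72 | 56.25 | 0.423 |
| FB15K-Yago15K | AF2MEA-Adaptive | 19.32 | 37.38 | 0.255 | 31.77 | 53.24 | 0.393 |
| FB15K-Yago15K | AF2MEA-Visual | 19.75 | 36.38 | 0.254 | 32.08 | 51.53 | 0.388 |
| FB15K-Yago15K | AF2MEA-Filter | 15.84 | 32.36 | 0.216 | 27.38 | 48.14 | 0.345 |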
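Table 4 Entity alignment results of visual features

| Dataset | Method | Hits@1 (seed = 0.2) | Hits@10 (seed = 0.2) | MRR (seed = 0.2) | Hits@1 (seed = 0.5) | Hits@10 (seed = 0.5) | MRR (seed = 0.5) |
|---|---|---|---|---|---|---|---|
| FB15K-DB15K | HMEA-v | 2.07 | 9.82 | 0.058 | 3.91 | 14.41 | 0.086 |
| FB15K-DB15K | Att | 8.81 | 20.16 | 0.128 | 9.57 | 21.13 | 0.139 |
| FB15K-DB15K | Att+Filter | 8.98 | 20.52 | 0.131 | 9.96 | 22.58 | 0.144 |
| FB15K-Yago15K | HMEA-v | 2.77 | 11.49 | 0.072 | 4.28 | 15.38 | 0.095 |
| FB15K-Yago15K | Att | 9.25 | 21.38 | 0.137 | 10.56 | 23.55 | 0.157 |
| FB15K-Yago15K | Att+Filter | 9.43 | 21.91 | 0.138 | 11.07 | 24.51 | 0.158 |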
Table 5 Entity alignment results of structural features under different triple filtering mechanisms

| Dataset | Method | Hits@1 (seed = 0.2) | Hits@10 (seed = 0.2) | MRR (seed = 0.2) | Hits@1 (seed = 0.5) | Hits@10 (seed = 0.5) | MRR (seed = 0.5) |
|---|---|---|---|---|---|---|---|
| FB15K-DB15K | Baseline | 6.26 | 18.81 | 0.105 | 13.79 | 34.60 | 0.210 |
| FB15K-DB15K | ${\rm{F}}_{\text{PageRank}}$ | 8.03 | 21.37 | 0.125 | 18.90 | 39.25 | 0.259 |
| FB15K-DB15K | ${\rm{F}}_{\text{random}}$ | 7.57 | 20.76 | 0.120 | 16.32 | 36.48 | 0.231 |
| FB15K-DB15K | ${\rm{F}}_{\text{our}}$ | 9.74 | 25.28 | 0.150 | 22.09 | 44.85 | 0.297 |
| FB15K-Yago15K | Baseline | 6.44 | 18.72 | 0.106 | 15.88 | 36.7 | 0.229 |
| FB15K-Yago15K | ${\rm{F}}_{\text{PageRank}}$ | 9.54 | 23.45 | 0.144 | 21.67 | 42.30 | 0.290 |
| FB15K-Yago15K | ${\rm{F}}_{\text{random}}$ | 8.17 | 20.86 | 0.126 | 18.22 | 38.55 | 0.254 |
| FB15K-Yago15K | ${\rm{F}}_{\text{our}}$ | 11.59 | 28.44 | 0.175 | 24.88 | 47.85 | 0.327 |
Table 6 Multi-modal entity alignment results of fixed feature fusion and adaptive feature fusion

| Dataset | Method | Hits@1 (Group1) | Hits@10 (Group1) | Hits@1 (Group2) | Hits@10 (Group2) | Hits@1 (Group3) | Hits@10 (Group3) |
|---|---|---|---|---|---|---|---|
| FB15K-DB15K | Adaptive | 16.44 | 32.97 | 17.43 | 33.47 | 19.29 | 35.40 |
| FB15K-DB15K | Fixed | 13.87 | 28.91 | 15.82 | 31.08 | 18.12 | 34.33 |
| FB15K-Yago15K | Adaptive | 16.44 | 32.97 | 17.43 | 33.47 | 19.29 | 35.40 |
| FB15K-Yago15K | Fixed | 16.21 | 33.23 | 19.55 | 37.11 | 22.27 | 45.52 |
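Table 7 Multi-modal entity alignment results of the additional experiment

| Method | Hits@1 (seed = 0.2) | Hits@10 (seed = 0.2) | MRR (seed = 0.2) | Hits@1 (seed = 0.5) | Hits@10 (seed = 0.5) | MRR (seed = 0.5) |
|---|---|---|---|---|---|---|
| PoE | 16.44 | 32.97 | 17.43 | 34.7 | 53.6 | 0.414 |
| MMEA | 13.87 | 28.91 | 15.82 | 40.26 | 64.51 | 0.486 |
| AF2MEA | 28.65 | 48.22 | 0.382 | 48.25 | 75.83 | 0.569 |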
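
The tables report Hits@1, Hits@10 and MRR. As a reading aid, the sketch below shows how these ranking metrics are commonly computed from the rank of the correct counterpart entity for each test pair. It assumes, based only on the magnitudes in the tables, that Hits@k is given as a percentage and MRR as a value between 0 and 1; the function names and the toy similarity scores are hypothetical and do not come from the paper's evaluation code.

```python
def rank_of_target(similarities, target_index):
    """1-based rank of the target candidate when candidates are sorted by
    descending similarity; ties are counted in the target's favor."""
    target_score = similarities[target_index]
    return 1 + sum(1 for score in similarities if score > target_score)


def hits_at_k(ranks, k):
    """Percentage of test entities whose correct match is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Toy example: ranks of the correct match for four test entities.
    ranks = [1, 3, 12, 2]
    print(hits_at_k(ranks, 1))
    print(hits_at_k(ranks, 10))
    print(mean_reciprocal_rank(ranks))
    # Rank derived from raw similarity scores for one source entity.
    print(rank_of_target([0.2, 0.9, 0.5], target_index=2))
```

Higher values are better for all three metrics; Hits@1 counts only exact top-ranked matches, while MRR also gives partial credit to matches ranked slightly lower.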
