基于关系指数和表示学习的领域集成实体链接

蒋胜臣; 王红斌; 余正涛; 线岩团; 王红涛

doi:10.16383/j.aas.c180705

基于关系指数和表示学习的领域集成实体链接

doi: 10.16383/j.aas.c180705

蒋胜臣^{1, 2,},
王红斌^{1, 2,},
余正涛^{1, 2,},
线岩团^{1, 2,},
王红涛^{1, 2,}

1.
昆明理工大学信息工程与自动化学院昆明 650500
2.
昆明理工大学云南省人工智能重点实验室昆明 650500

基金项目: 国家自然科学基金(61562052, 61462054)

详细信息

作者简介:
蒋胜臣：昆明理工大学信息工程与自动化学院硕士研究生. 主要研究方向为自然语言处理, 知识图谱. E-mail: jsc_study@hotmail.com

王红斌：博士, 昆明理工大学信息工程与自动化学院副教授. 主要研究方向为智能信息系统, 自然语言处理, 信息检索. E-mail: whbin2007@126.com

余正涛：博士, 昆明理工大学信息工程与自动化学院教授. 主要研究方向为自然语言处理, 机器翻译, 信息检索.本文通信作者. E-mail: ztyu@hotmail.com

线岩团：昆明理工大学信息工程与自动化学院博士研究生. 主要研究方向为自然语言处理, 信息抽取, 机器翻译. E-mail: yantuan.xian@gmail.com

王红涛：昆明理工大学信息工程与自动化学院硕士研究生. 主要研究方向为自然语言处理, 信息抽取. E-mail: 15893739522@163.com

计量
- 文章访问数: 921
- HTML全文浏览量: 193
- PDF下载量: 147
- 被引次数: 0
出版历程
- 收稿日期: 2019-05-24
- 录用日期: 2019-07-17
- 网络出版日期: 2021-09-28
- 刊出日期: 2021-10-20

Domain Integrated-Entity Links Based on Relationship Indices andRepresentation Learning

JIANG Sheng-Chen^{1, 2
,},
WANG Hong-Bin^{1, 2
,},
YU Zheng-Tao^{1, 2
,},
XIAN Yan-Tuan^{1, 2
,},
WANG Hong-Tao^{1, 2
,}

1.
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
2.
Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500

Funds: Supported by National Natural Science Foundation of China (61562052, 61462054)

More Information

Author Bio:
JIANG Sheng-Chen　Master student at Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His research interest covers natural language processing and knowledge graph

WANG Hong-Bin　Ph.D., associate professor at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His research interest covers intelligent information system, natural language processing, information retrieval

YU Zheng-Tao　Ph.D., professor at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His research interest covers natural language processing, machine translation, and information retrieval. Corresponding author of this paper

XIAN Yan-Tuan　Ph.D. candidate at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His research interest covers natural language processing, information extraction, and machine translation

WANG Hong-Tao　Master student at the Faculty of Information Engineering and Automation, Kunming University of Science and Technology. His research interest covers natural language processing, and information extraction

摘要

摘要: 本文针对现有方法不能很好结合文本信息和知识库信息的问题, 提出一种基于关系指数和表示学习的领域集成实体链接方法.首先, 本文构建了特定领域知识库; 其次, 运用表示学习从文本信息中得到的向量表示计算实体指称项的上下文、主题关键词、扩展词三个特征的相似度; 然后, 利用知识库中的关系信息计算候选实体的关系指数; 最后, 将这三种相似度及关系指数相融合, 用于实体链接. 实验结果表明, 相较于现有方法, 本文方法能够有效地提高F1值, 并且该方法不需要标注语料, 更加简单高效, 适应于缺少标注语料的特定领域.
- 集成实体链接 /
- 特定领域 /
- 表示学习 /
- 关系指数
Abstract: Aiming at the problem that the existing methods can′t combine text information and knowledge base information well, this paper proposes a domain integrated-entity links based on relationship indices and representation learning. Firstly, this paper builds a domain-specific knowledge base; Secondly, using the vector representation of learning representation from the text information to calculate the similarity of the three features of the context, topic keywords and extension words of the entity referential item; Then using the relationship information in the knowledge base to calculate the relationship index of the candidate entities; Finally, these three similarities and relationship indices are combined for physical links; The experimental results show that compared with the existing methods, the proposed method can effectively improve the F1 value, and the method does not need to label the corpus, which is simpler and more efficient, and is suitable for the specific field lacking the labeled corpus.
- Integrated-entity link /
- specific field /
- representation learning /
- relationship index

HTML全文

图 1 模型框架图

Fig. 1 Frame diagram of the model

下载: 全尺寸图片幻灯片

图 2 融合知识和主题信息的词向量表示模型

Fig. 2 Word vector representation model that fuses knowledge and subject information

下载: 全尺寸图片幻灯片

表 1 不同特征组合实验结果统计

Table 1 Statistics of experimental results of different feature combinations

特征组合	P(%)	R(%)	F1
上下文	64.8	61.7	63.2
上下文+主题关键词	79.3	80.9	80.1
上下文+主题关键词+扩展词	87.7	86.5	87.1
上下文+主题关键词+扩展词+关系指数	92.6	90.4	91.5

下载: 导出CSV

表 2 不同v值实验结果统计

Table 2 Statistical results of different v values

v	P(%)	R(%)	F1
1	84.4	81.8	83.1
2	89.6	87.1	88.3
3	92.6	90.4	91.5
4	90.3	89.4	89.8

下载: 导出CSV

表 3 不同w值实验结果统计

Table 3 Statistical results of different w values

w	P(%)	R(%)	F1
1	86.4	83.5	84.9
2	89.8	87.6	88.7
3	90.5	89.2	89.8
4	92.6	90.4	91.5
5	89.7	88.6	89.1

下载: 导出CSV

表 4 各个关系子属性的实验结果统计

Table 4 Statistical results of experimental results for each relationship sub-attribute

关系属性	P(%)	R(%)	F1
直接关系	89.3	87.2	88.2
直接关系+垂直间接关系	91.8	88.7	90.2
直接关系+水平间接关系	91.1	87.6	89.3
直接关系+两个间接关系	92.6	90.4	91.5

下载: 导出CSV

表 5 本文方法与其他方法的比较

Table 5 Comparison of methods in this paper withother methods

方法名	P(%)	R(%)	F1
Wikify	71.3	73.9	72.6
Cucerzan	76.5	80.2	78.3
SVM^[27]	83.1	85.3	84.2
Score^[28]	87.4	86.5	86.9
EAT^[21]	80.7	82.9	81.8
Zero-shot^[29]	91.4	88.0	89.7
本文的方法	92.6	90.4	91.5

下载: 导出CSV

表 6 不同领域的实验结果统计

Table 6 Statistics of experimental results in different fields

领域名称	P(%)	R(%)	F1
旅游领域	92.6	90.4	91.5
少数民族领域	91.8	89.6	90.7
药材领域	90.3	91.4	90.8

下载: 导出CSV

参考文献(29)

[1]	Hoffart J, Yosef M A, Bordino I, et al. Robust disambiguation of named entities in text. In: Proceedings of Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011. 782−792
[2]	Burdick D, Fagin R, Kolaitis P G, et al. Expressive power of entity-linking frameworks. Journal of Computer and System Sciences, 2019, 100: 44-69 doi: 10.1016/j.jcss.2018.09.001
[3]	Saeedi A, Peukert E, Rahm E. Using link features for entity clustering in knowledge graphs. In: Proceedings of European Semantic Web Conference. Springer, Cham, 2018. 576−592
[4]	Dubey M, Banerjee D, Chaudhuri D, et al. Earl: Joint entity and relation linking for question answering over knowledge graphs. In: Proceedings of International Semantic Web Conference. Springer, Cham, 2018. 108−126
[5]	Bunescu R C. Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of Conference of the European Chapter of the Association for Computational Linguistics, 2006. 9−16
[6]	Ganea O E, Ganea M, Lucchi A, et al. Probabilistic bag-of-hyperlinks model for entity linking. In: Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016. 927−938
[7]	Cucerzan S. Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007. 708−716
[8]	Nguyen H T, Cao T H. Exploring wikipedia and text features for named entity disambiguation. In: Proceedings of Asian Conference on Intelligent Information and Database Systems. Springer, Berlin, Heidelberg, 2010. 11−20
[9]	Zeng Y, Wang D, Zhang T, et al. Linking entities in short texts based on a Chinese semantic knowledge base. Natural Language Processing and Chinese Computing. Springer, Berlin, Heidelberg, 2013. 266−276
[10]	Han X, Sun L, Zhao J. Collective entity linking in web text: A graph-based method. In: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 2011. 765−774
[11]	Liu Q, Zhong Y, Li Y, et al. Graph-based collective Chinese entity linking algorithm. Comput. Res. Develop, 2016, 53(2): 270-283
[12]	Ferragina P, Scaiella U. Tagme: On-the-fly annotation of short text fragments. In: Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, 2010. 1625−1628
[13]	Francis-Landau M, Durrett G, Klein D. Capturing semantic similarity for entity linking with convolutional neural networks. In: Proceedings of the North American Chapter of the Association for Computational Linguistics, 2016. 1256−1261
[14]	Wang W, Arora R, Livescu K, et al. On deep multi-view representation learning: Objectives and optimization. In: Proceedings of International Conference on Machine Learning 2015, 2016. 1083−1092
[15]	Bengio Y, Courville A, Vincent P. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(8): 1789-1828
[16]	ZENG Qi, ZHOU Gang, LAN Ming-jing, et al. Polysemous Word Multi-embedding Calculation. Journal of Chinese Computer Systems, 2016, 37(5): 1417-1421
[17]	Raiman J R, Raiman O M. DeepType: Multilingual entity linking by neural type system evolution. In: Proceedings of 32nd AAAI Conference on Artificial Intelligence, 2018. 5406-5413
[18]	Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations, 2013. 1−12
[19]	Goldberg Y, Levy O. Word2vec Explained: Deriving Mikolov et al.′s negative-sampling word-embedding method. CoRR abs/1402.3722, 2014, 1−5
[20]	Kar R, Reddy S, Bhattacharya S, et al. Task-specific representation learning for web-scale entity disambiguation. In: Proceedings of Thirty-Second AAAI Conference on Artificial Intelligence, 2018. 5812−5819
[21]	Moreno J G, Besancon R, Beaumont R, et al. Combining Word and Entity Embeddings for Entity Linking. In: Proceedings of European Semantic Web Conference. Springer, Cham, 2017. 337−352
[22]	You M, Yang H, Lin Z, et al. BTM Topic Modeling Approach to Named Entity Linking. Journal of Physics: Conference Series. IOP Publishing, 2018, 1060(1): 012-027
[23]	Gao Y, Li A, Duan L. Entity disambiguation method based on multi-feature fusion graph model for entity linking. Application Research of Computers,, 2017: 2909-2914
[24]	Sakor A, Mulang I O, Singh K, et al. Old is gold: Linguistic driven approach for entity and relation linking of short text. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019. 2336−2346
[25]	Xu C, Bai Y, Bian J, et al. Rc-net: A general framework for incorporating knowledge into word representations. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management. ACM, 2014. 1219−1228
[26]	Ganea O E, Ganea M, Lucchi A, et al. Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data. In: Proceedings of Advances in Neural Information Processing Systems, 2013. 2787−2795
[27]	Bao-Xing H, Bao T F, Zhu H S, et al. Topic modeling approach to named entity linking. Journal of Software, 2014, (9): 2076-2087
[28]	WU Yunbing, ZHU Danhong, LIAO Xiangwen, et al. Knowledge Graph Reasoning Based on Paths of Tensor Factorization. Pattern Recognition and Artificial Intelligence, 2017, 30(5): 473-480
[29]	Kundu G, Sil A, Florian R, et al. Neural cross-lingual coreference resolution and its application to entity linking. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018. 395−400