A Model for Calculating Semantic Relatedness of Words Considering Semantic Relationship Graph
-
Abstract: Semantic computation over words is one of the important problems in natural language processing. Current research focuses mainly on the semantic similarity of words and pays too little attention to their semantic relatedness. We therefore propose a word semantic relatedness calculation model that combines a semantic dictionary with a corpus. First, semantic relation extraction rules are formulated on the basis of HowNet and a large-scale corpus, and a large number of semantic dependency relations are extracted with these rules. Next, a semantic relationship graph is constructed, with the relations stored as semantic relation triples. Finally, graph theory is applied to the relations in the semantic relationship graph, and a word semantic relatedness calculation model is designed on top of the graph. Experimental results show that the proposed model performs well on word semantic relatedness computation: its Spearman rank correlation coefficient on the WordSimilarity-353 dataset reaches 0.5358, a significant improvement in the quality of semantic relatedness computation for Chinese words.
1) Editorial board member responsible for this paper: Zhang Min
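The pipeline outlined in the abstract (relations stored as triples, a graph built from them, graph theory applied to score word pairs) can be illustrated with a minimal Python sketch. This is not the paper's model: the abstract does not give the scoring function, so the sketch assumes an undirected graph and uses breadth-first shortest-path hop count with an exponential decay as a stand-in. The function names, the `decay` parameter, the `REL` label, and every triple except the first (which is the sample row of Table 1) are hypothetical.

```python
from collections import defaultdict, deque

# (start item, end item, relation word). Only the first triple appears in
# Table 1; the other two and the "REL" label are invented for illustration.
TRIPLES = [
    ("拳台", "设施", "DEF"),
    ("拳台", "拳击", "REL"),
    ("拳击", "比赛", "REL"),
]


def build_graph(triples):
    """Store each triple as an undirected edge labelled with its relation."""
    graph = defaultdict(dict)
    for start, end, relation in triples:
        graph[start][end] = relation
        graph[end][start] = relation
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relatedness(graph, a, b, decay=0.5):
    """Stand-in score (an assumption): decay ** hops, 0.0 when no path exists."""
    hops = shortest_path_length(graph, a, b)
    return 0.0 if hops is None else decay ** hops


graph = build_graph(TRIPLES)
print(relatedness(graph, "拳台", "设施"))  # one hop apart
print(relatedness(graph, "设施", "比赛"))  # three hops apart
```

A dictionary of dictionaries keeps the relation word on each edge, so a scoring function could weight edges by relation type instead of treating every hop equally.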
-
Table 1 The storage format of semantic relations
Relation start item     Relation end item     Semantic relation word
拳台 (boxing ring)       设施 (facility)        DEF
...                     ...                   ...
Table 2 Comparison of Spearman coefficients for different methods
Category                    Model                       Spearman coefficient
Knowledge-based             LIU [23]                    0.4202
Knowledge-based             WU [23]                     0.3205
Corpus-based                TFIDF [17]                  0.4030
Corpus-based                COMB [17]                   0.5150
Corpus-based                ICLinkBased [23]            0.2786
Corpus-based                ICSubCategoryNodes [23]     0.2803
Corpus-based                WLM [23]                    0.4984
Corpus-based                WLT [23]                    0.5126
Our methods                 HN                          0.4389
Our methods                 DSR                         0.5012
Our methods                 HN+DSR                      0.5358
Knowledge-based             WUP [24]                    0.3390
Knowledge-based             J & C [24]                  0.3180
Knowledge-based             Lin [24]                    0.3480
Knowledge-based             Resnik [24]                 0.3530
Corpus-based                LSA [24]                    0.5810
Corpus-based                ESA [24]                    0.6290
Corpus-based                SSA [24]                    0.5370
Knowledge + Corpus-based    WTMGW [24]                  0.7500
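Table 3 Experimental results of semantic relatedness computation
Word 1                      Word 2                    Relatedness
足球比赛 (football match)    比分 (score)               0.9004
足球比赛 (football match)    直播 (live broadcast)      0.8438
足球比赛 (football match)    场地 (venue)               0.6034
足球比赛 (football match)    规则 (rules)               0.7925
足球比赛 (football match)    法庭 (court)               0.2016
滑冰 (skating)               足球比赛 (football match)  0.2415
滑冰 (skating)               流畅 (smooth)              0.7924
滑冰 (skating)               速度 (speed)               0.8415
滑冰 (skating)               摔倒 (fall)                0.7524
滑冰 (skating)               法庭 (court)               0.2965
足球比赛 (football match)    流畅 (smooth)              0.0251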
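The Spearman coefficients in Table 2 measure how well a model's ranking of word pairs agrees with a reference ranking. As a hedged illustration of the metric itself (not the authors' evaluation script), the Python sketch below computes it as the Pearson correlation of average ranks. The function names and the four ratings and scores are invented for the example and are not taken from WordSimilarity-353.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented reference ratings and model scores for four word pairs.
human = [7.5, 3.0, 9.0, 1.5]
model = [0.79, 0.24, 0.90, 0.30]
print(round(spearman(human, model), 4))  # 0.8
```

Because only ranks enter the calculation, the model scores and the reference ratings do not need to share a scale.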
[1] Gracia J, Mena E. Web-based measure of semantic relatedness. In: Proceedings of the 9th International Conference on Web Information Systems Engineering. Auckland, New Zealand: Springer, 2008. 136-150
[2] Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc., 1995. 448-453
[3] Liu H W, Xu J J, Zheng K, Liu C F, Du L, Wu X. Semantic-aware query processing for activity trajectories. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, UK: ACM, 2017. 283-292
[4] Ensan F, Bagheri E. Document retrieval model through semantic linking. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, UK: ACM, 2017. 181-190
[5] Liu Kang, Zhang Yuan-Zhe, Ji Guo-Liang, Lai Si-Wei, Zhao Jun. Representation learning for question answering over knowledge base: an overview. Acta Automatica Sinica, 2016, 42(6): 807-818 http://www.aas.net.cn/CN/Y2016/V42/I6/807
[6] Zhang Y M, Iwaihara M. Evaluating semantic relatedness through categorical and contextual information for entity disambiguation. In: Proceedings of the IEEE/ACIS 15th International Conference on Computer and Information Science. Okayama, Japan: IEEE, 2016. 1-6
[7] Li C, Bendersky M, Garg V, Ravi S. Related event discovery. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, UK: ACM, 2017. 355-364
[8] Arab M, Jahromi M Z, Fakhrahmad S M. A graph-based approach to word sense disambiguation. An unsupervised method based on semantic relatedness. In: Proceedings of the 24th Iranian Conference on Electrical Engineering. Shiraz, Iran: IEEE, 2016. 250-255
[9] Xin Yu, Xie Zhi-Qiang, Yang Jing. Semantic community detection research based on topic probability models. Acta Automatica Sinica, 2015, 41(10): 1693-1710 http://www.aas.net.cn/CN/Y2015/V41/I10/1693
[10] Budanitsky A, Hirst G. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 2006, 32(1): 13-47 doi: 10.1162/coli.2006.32.1.13
[11] Taieb M A, Aouicha M B, Hamadou A B. A new semantic relatedness measurement using WordNet features. Knowledge and Information Systems, 2014, 41(2): 467-497 doi: 10.1007/s10115-013-0672-4
[12] Liu Qun, Li Su-Jian. Word similarity computing based on HowNet. Computational Linguistics, 2002, 7(2): 59-76 http://mall.cnki.net/magazine/Article/JSJY201308048.htm
[13] Zhang P Y. A HowNet-based semantic relatedness kernel for text classification. TELKOMNIKA, 2013, 11(4): 1909-1915 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.3337
[14] Zhang G P, Yu C, Cai D F, Song Y, Sun J G. Research on concept-sememe tree and semantic relevance computation. In: Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation. Wuhan, China: Tsinghua University Press, 2006. 398-402
[15] Tian Xuan, Du Xiao-Yong, Li Hai-Hua. Computing term-concept association in semantic-based query expansion. Journal of Software, 2008, 19(8): 2043-2053 https://www.wenkuxiazai.com/doc/3ed9fe8ecc22bcd126ff0c31-2.html
[16] Ye F Y, Zhang F, Luo X F, Xu L Y. Research on measuring semantic correlation based on the Wikipedia hyperlink network. In: Proceedings of the IEEE/ACIS 12th International Conference on Computer and Information Science. Niigata, Japan: IEEE, 2013. 309-314
[17] Wan Fu-Qiang, Wu Yun-Fang. Computing lexical semantic relatedness with Chinese Wikipedia. Journal of Chinese Information Processing, 2013, 27(6): 31-38 http://www.docin.com/p-1630396880.html
[18] Wang Hong-Xian, Zhou Qiang, Wu Xiao-Jun. The automatic construction of lexical semantic relationship graph based on HowNet. Journal of Chinese Information Processing, 2008, 22(5): 90-96 http://d.old.wanfangdata.com.cn/Periodical/zwxxxb200805014
[19] Zheng Li-Juan, Shao Yan-Qiu, Yang Er-Hong. Analysis of the non-projective phenomenon in Chinese semantic dependency graph. Journal of Chinese Information Processing, 2014, 28(6): 41-47 http://d.old.wanfangdata.com.cn/Periodical/zwxxxb201406006
[20] Zhang Yang-Sen, Zheng Jia. Study of semantic error detecting method for Chinese text. Chinese Journal of Computers, 2016, 39, Online Publishing No. 122
[21] Zhang Hu-Yin, Liu Dao-Bo, Wen Chun-Yan. Research on improved algorithm of word semantic similarity based on HowNet. Computer Engineering, 2015, 41(2): 151-156 doi: 10.3969/j.issn.1000-3428.2015.02.029
[22] Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E. Placing search in context: the concept revisited. ACM Transactions on Information Systems, 2002, 20(1): 116-131 doi: 10.1145/503104.503110
[23] Wang Xiang, Jia Yan, Zhou Bin, Ding Zhao-Yun, Liang Zheng. Computing semantic relatedness using Chinese Wikipedia links and taxonomy. Journal of Chinese Computer Systems, 2011, 32(11): 2237-2242 http://www.doc88.com/p-9965404579619.html
[24] Liu B Q, Feng J, Liu M, Liu F, Wang X L, Li P. Computing semantic relatedness using a word-text mutual guidance model. In: Proceedings of the 3rd CCF Conference on Natural Language Processing and Chinese Computing. Shenzhen, China: Springer, 2014. 67-78