Word Semantic Similarity Measurement Based on Evidence Theory

WANG Jun-Hua, ZUO Xiang-Lin, ZUO Wan-Li

Citation: WANG Jun-Hua, ZUO Xiang-Lin, ZUO Wan-Li. Word Semantic Similarity Measurement Based on Evidence Theory. ACTA AUTOMATICA SINICA, 2015, 41(6): 1173-1186. doi: 10.16383/j.aas.2015.c131141

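Details
    About the authors:

    WANG Jun-Hua  Ph.D. candidate at the College of Computer Science and Technology, Jilin University. Received a bachelor's degree from the School of Media Science, Northeast Normal University in 2005. Research interests include natural language processing and Web data mining. E-mail: wangjunhua1982@126.com

    Corresponding author:

    ZUO Wan-Li  Professor at the College of Computer Science and Technology, Jilin University. Received a bachelor's degree from the College of Computer Science and Technology, Jilin University in 1982. Research interests include information retrieval, natural language processing, ontology engineering, and Web data mining. E-mail: zuowl@jlu.edu.cn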

Funds: 

Supported by National Natural Science Foundation of China (60903098, 60973040, 61300148, 61472049), Key Scientific and Technological Break-through Program of Jilin Province (20130206051GX), and Science and Technology Planning Youth Fund Project of Jilin Province (20130522112JH)

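  • Abstract: Measuring the semantic similarity between words is a classic and active problem in natural language processing, and its results have an important influence on applications such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. This paper proposes a novel approach to measuring word semantic similarity that combines evidence theory with a knowledge base. First, evidence is obtained from the general-purpose ontology WordNet. Second, scatter plots are used to analyze the reasonableness of the evidence. Third, basic belief assignment functions are generated by statistics and piecewise linear interpolation. Finally, evidence conflict handling, importance weighting, and the D-S (Dempster-Shafer) combination rule are combined to fuse the information into a global basic belief assignment function, on the basis of which word semantic similarity is quantified. On the RG(65) data set, the correlation between the algorithm's judgments and human judgments, evaluated with 5-fold cross-validation, reaches 0.912, which is 0.4 percentage points higher than the current best method PS and 7%~13% higher than the classic measures reLHS, distJC, simLC, simL, and simR. Good results are also obtained on the MC(30) and WordSim353 data sets, with correlations of 0.915 and 0.941, respectively, and the running efficiency is comparable to that of the classic algorithms. The experimental results show that applying evidence theory to the word semantic similarity problem is reasonable and effective.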
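
The fusion step named in the abstract rests on the D-S combination rule. As a minimal sketch of that rule only, the Python below combines two basic belief assignments over a two-element frame. The frame {similar, dissimilar}, the evidence names `m_path` and `m_depth`, and all mass values are hypothetical illustrations, not taken from the paper. The paper's own conflict handling and importance weighting are not reproduced here.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two basic belief assignments with Dempster's rule.

    Each assignment maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses of one assignment sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing focal elements: the product mass goes to their intersection.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements: the product mass counts as conflict.
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("evidence is in total conflict; Dempster's rule is undefined")
    # Renormalize so the combined masses sum to 1.
    return {focal: mass / norm for focal, mass in combined.items()}


# Hypothetical frame of discernment: a word pair is similar (S) or dissimilar (D).
S = frozenset({"similar"})
D = frozenset({"dissimilar"})
SD = S | D  # mass on the whole frame expresses ignorance

# Hypothetical masses from two WordNet-derived pieces of evidence.
m_path = {S: 0.6, D: 0.1, SD: 0.3}
m_depth = {S: 0.5, D: 0.2, SD: 0.3}

fused = dempster_combine(m_path, m_depth)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

Because the rule is associative and commutative, more than two pieces of evidence can be fused by applying `dempster_combine` repeatedly in any order. How the fused mass is then mapped to a similarity score is left open in this sketch.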
Publication history
  • Received:  2013-12-13
  • Revised:  2014-10-27
  • Published:  2015-06-20
