Abstract: Flexible gene naming conventions make gene names highly ambiguous, which is a major obstacle to deep automatic text mining of the biomedical literature. Gene name normalization (GN) is an effective way to address this problem. This paper proposes a multi-level disambiguation framework for gene name normalization. Different kinds of ambiguity arise at different stages of normalization, and the framework applies a targeted strategy to each: dictionary-based gene name detection, machine-learning-based candidate selection, and semantics-based disambiguation. Experiments show that the method achieves an F-measure of 0.746 on the BioCreAtIvE2006 GN task test set.
Key words:
- Gene name normalization (GN)
- maximum entropy model
- semantic similarity