Number Type Recognition of Chinese Personal Noun Phrase

LANG Jun, QIN Bing, LIU Ting, LI Zheng-Hua, LI Sheng

Citation:

LANG Jun, QIN Bing, LIU Ting, LI Zheng-Hua, LI Sheng. Number Type Recognition of Chinese Personal Noun Phrase. ACTA AUTOMATICA SINICA, 2008, 34(8): 972-979. doi: 10.3724/SP.J.1004.2008.00972

Affiliation: Information Retrieval Laboratory, Harbin Institute of Technology, Harbin 150001

Corresponding author: LANG Jun

CLC number: TP391
Metrics
- Article views: 3266
- HTML full-text views: 64
- PDF downloads: 1702
- Times cited: 0
Publication history
- Received: 2007-05-09
- Revised: 2007-09-20
- Published: 2008-08-20

Abstract: Number type (singular vs. plural) is an indispensable feature for coreference resolution. Unlike English, Chinese, a Sino-Tibetan language, does not mark number on the noun itself; number information must instead be inferred from the noun phrase that contains the noun. This paper extracts number annotations for personal noun phrases from the Automatic Content Extraction (ACE) 2005 corpus and applies two methods to recognize their number type automatically. The rule-based method defines a library of rule templates built on several knowledge resources, with each rule expressed as a set of slots and slot values. The machine learning method uses a maximum entropy model that combines features of word form, part of speech, word sense, and quantitative relations. The two methods reach overall accuracies of 48.24% and 87.48%, respectively. Experimental results show that the rule-based method achieves high precision but cannot guarantee recall, whereas the machine learning method performs the number type recognition task considerably better.
Keywords:
- Personal noun phrase
- Number type
- Machine learning
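The abstract names the two approaches without detailing them. Below is a minimal Python sketch, not the authors' implementation, assuming each phrase is already segmented and POS-tagged and that its last token is the head noun. The slot names (`suffix_men`, `numeral`, `determiner`), the three rules, the feature set, and the toy phrases are all hypothetical, and the word-sense feature is omitted because it would need a thesaurus. The maximum entropy model is written as a two-class log-linear classifier trained by stochastic gradient ascent with L2 regularization.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon mapping Chinese numerals to values.
NUMERALS = {"一": 1, "两": 2, "二": 2, "三": 3, "四": 4, "五": 5, "十": 10}
LABELS = ("singular", "plural")

def extract_slots(phrase):
    """Fill slot values from a tagged phrase [(word, pos), ...]; head = last token."""
    head = phrase[-1][0]
    slots = {"suffix_men": head.endswith("们")}
    for word, pos in phrase[:-1]:
        if pos == "m" and word in NUMERALS:
            slots["numeral"] = "one" if NUMERALS[word] == 1 else "many"
        elif pos == "r":
            slots["determiner"] = word
    return slots

# Hypothetical rule templates: every slot in the condition must match to fire.
RULES = [
    ({"suffix_men": True}, "plural"),
    ({"numeral": "many"}, "plural"),
    ({"numeral": "one"}, "singular"),
]

def rule_based(phrase):
    slots = extract_slots(phrase)
    for condition, label in RULES:
        if all(slots.get(k) == v for k, v in condition.items()):
            return label
    return None  # no rule fires: the method abstains, which costs recall

def features(phrase):
    """Word-form, POS, and quantity features (word sense omitted)."""
    head, head_pos = phrase[-1]
    slots = extract_slots(phrase)
    feats = {"bias", f"head={head}", f"pos={head_pos}"}
    if slots["suffix_men"]:
        feats.add("suffix=们")
    if "numeral" in slots:
        feats.add(f"num={slots['numeral']}")
    if "determiner" in slots:
        feats.add(f"det={slots['determiner']}")
    return feats

def probs(weights, feats):
    # Log-linear (maximum entropy) model: p(y|x) proportional to exp(sum_f w[f, y]).
    scores = [sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(data, epochs=500, lr=0.1, l2=0.01):
    weights = defaultdict(float)
    for _ in range(epochs):
        for phrase, gold in data:
            feats = features(phrase)
            p = probs(weights, feats)
            for k, y in enumerate(LABELS):
                # Log-likelihood gradient: observed minus expected feature count.
                grad = (1.0 if y == gold else 0.0) - p[k]
                for f in feats:
                    weights[(f, y)] += lr * (grad - l2 * weights[(f, y)])
    return weights

def predict(weights, phrase):
    p = probs(weights, features(phrase))
    return LABELS[p.index(max(p))]

# Hypothetical toy examples (not ACE data); POS: m numeral, q classifier, r pronoun, n noun.
TRAIN = [
    ([("三", "m"), ("位", "q"), ("老师", "n")], "plural"),
    ([("一", "m"), ("名", "q"), ("记者", "n")], "singular"),
    ([("学生们", "n")], "plural"),
    ([("总统", "n")], "singular"),
    ([("两", "m"), ("个", "q"), ("孩子", "n")], "plural"),
    ([("这", "r"), ("位", "q"), ("教授", "n")], "singular"),
    ([("教师们", "n")], "plural"),
    ([("那", "r"), ("个", "q"), ("警察", "n")], "singular"),
]

if __name__ == "__main__":
    w = train(TRAIN)
    for phrase, gold in TRAIN:
        text = "".join(word for word, _ in phrase)
        print(text, rule_based(phrase), predict(w, phrase), gold)
```

Running it prints each phrase with its rule-based label (`None` where no rule fires), the maximum entropy label, and the gold label. The abstaining rules mirror the trade-off reported above: a rule that fires tends to be right, but phrases such as 总统 or 这位教授 get no label at all, whereas the statistical model always returns one.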
