Coreference Resolution of Uyghur Noun Phrases Based on Deep Learning

LI Min, YU Long, TIAN Sheng-Wei, Turgun IBRAHIM, ZHAO Jian-Guo

Citation: LI Min, YU Long, TIAN Sheng-Wei, Turgun IBRAHIM, ZHAO Jian-Guo. Coreference Resolution of Uyghur Noun Phrases Based on Deep Learning. ACTA AUTOMATICA SINICA, 2017, 43(11): 1984-1992. doi: 10.16383/j.aas.2017.c160330

doi: 10.16383/j.aas.2017.c160330

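More Information
    Author Bio:

    LI Min  Master's student at Xinjiang University. Her main research interest is natural language processing. E-mail: limin_xju@163.com

    TIAN Sheng-Wei  Professor at Xinjiang University. His research interest covers natural language processing and computer intelligence technology. E-mail: tianshengwei@163.com

    Turgun IBRAHIM  Professor at Xinjiang University. His research interest covers computer intelligence technology and natural language processing. E-mail: mytlgxj@126.com

    ZHAO Jian-Guo  Associate professor at Xinjiang University. His main research interest is Uyghur-Chinese bilingual comparison. E-mail: 13899951918@126.com

    Corresponding author:

    YU Long  Professor at Xinjiang University. Her research interest covers computer intelligence technology and computer networks. Corresponding author of this paper. E-mail: yul_xju@163.com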

Funds: 

National Natural Science Foundation of China 61563051

Regional Scientific and Technological Personnel Training Project QN2016YX0051

National Natural Science Foundation of China 61662074

National Natural Science Foundation of China 61262064

National Natural Science Foundation of China 61331011

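  • Abstract: To address coreference among Uyghur noun phrases, a semantic-feature-based coreference resolution method using a stacked autoencoder (SAE) deep learning algorithm is proposed. From a study of the referentiality of Uyghur noun phrases, 13 features useful for the resolution task are extracted. To improve how well the features express text semantics, word embeddings, which carry rich lexical semantics and contextual position information, are added to the feature set. Deep learning is used to extract latent deep semantic features without supervision, and a Softmax classifier is then trained to perform coreference resolution. On the Uyghur coreference resolution task the method achieves a precision of 74.5%, a recall of 70.6%, and an F-measure of 72.4%. The experimental results show that the deep learning model is better suited to this task than a shallow support vector machine (SVM), and that introducing word embedding features effectively improves the performance of the coreference resolution model.
    1)  Associate editor responsible for this paper: ZHANG Min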
  • Fig. 1  Framework for coreference resolution of Uyghur noun phrases

    Fig. 2  Structure of SAE

    Table 1  The demonstrative thesaurus

    Person or thing    Property      Quantity      Place
    /this one          /like this    /this much    /here
    /this one          /like this    /this much    /here
    /that one          /like that    /that much    /there
    /that one          /like that    /that much    /there
    $\cdots$           $\cdots$      $\cdots$      $\cdots$

    Table 2  Training and testing sample format for Uyghur noun phrases

    Columns: antecedent; anaphor; sample values (13 feature values + 50-dimensional word embeddings of the antecedent and the anaphor); coreferent or not

    Feature values (13)                        Word embeddings (antecedent, then anaphor)
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0      0.133, $-$0.053, 0.114, $\cdots$, $-$0.108
                                               0.177, $-$0.008, 0.127, $\cdots$, $-$0.055
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0      0.076, 0.099, 0.019, $\cdots$, $-$0.069
                                               0.177, $-$0.008, 0.127, $\cdots$, $-$0.055
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0      0.060, $-$0.135, 0.277, $\cdots$, $-$0.042
                                               0.177, $-$0.008, 0.127, $\cdots$, $-$0.055
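The sample layout in Table 2 can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes each sample is the 13 feature values followed by the antecedent's and then the anaphor's 50-dimensional word embedding, which gives a 113-dimensional input. The function name `build_sample` and the constants are hypothetical, and the embedding values in the usage lines are placeholders, since the table elides most components.

```python
from typing import List, Sequence

NUM_FEATURES = 13    # feature values per antecedent/anaphor pair (Table 2)
EMBEDDING_DIM = 50   # word embedding size per mention (Table 2)


def build_sample(features: Sequence[int],
                 antecedent_we: Sequence[float],
                 anaphor_we: Sequence[float]) -> List[float]:
    """Concatenate features and both embeddings into one input vector.

    The ordering (features, antecedent, anaphor) is an assumption.
    """
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} feature values")
    if len(antecedent_we) != EMBEDDING_DIM or len(anaphor_we) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dimensional embeddings")
    return ([float(v) for v in features]
            + [float(v) for v in antecedent_we]
            + [float(v) for v in anaphor_we])


# Usage: the 13 feature values are the first row of Table 2;
# the embeddings are placeholder zeros, not real data.
features = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
antecedent_we = [0.0] * EMBEDDING_DIM
anaphor_we = [0.0] * EMBEDDING_DIM
sample = build_sample(features, antecedent_we, anaphor_we)
print(len(sample))  # 113
```

Keeping the hand-crafted features and the embeddings in one flat vector lets a single network input layer consume both kinds of information.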

    Table 3  Optimal parameters of SAE

    Parameter    $\rho$    $\beta$    $\lambda$    maxIter
    Value        0.1       3          3E$-$3       800
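Table 3 lists the SAE hyperparameters by symbol only. As a rough sketch, assuming they carry their conventional sparse-autoencoder meanings ($\rho$ the target mean activation, $\beta$ the weight of the sparsity penalty, $\lambda$ the weight-decay coefficient; the page itself does not define them), the per-layer training cost could be assembled as below. All function names are hypothetical, and the numbers in the usage lines are arbitrary illustrations rather than the paper's settings.

```python
import math
from typing import Sequence


def kl_divergence(rho: float, rho_hat: float) -> float:
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    if not (0.0 < rho < 1.0 and 0.0 < rho_hat < 1.0):
        raise ValueError("rho and rho_hat must lie strictly between 0 and 1")
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparsity_penalty(mean_activations: Sequence[float],
                     rho: float, beta: float) -> float:
    """beta times the summed KL divergence over all hidden units."""
    return beta * sum(kl_divergence(rho, r) for r in mean_activations)


def weight_decay(weights: Sequence[Sequence[float]], lam: float) -> float:
    """(lambda / 2) times the sum of squared weights."""
    return 0.5 * lam * sum(w * w for row in weights for w in row)


def sae_layer_cost(reconstruction_error: float,
                   mean_activations: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   rho: float, beta: float, lam: float) -> float:
    """Reconstruction error plus weight decay plus sparsity penalty."""
    return (reconstruction_error
            + weight_decay(weights, lam)
            + sparsity_penalty(mean_activations, rho, beta))


# Usage with arbitrary illustrative numbers (not taken from the paper).
cost = sae_layer_cost(reconstruction_error=2.0,
                      mean_activations=[0.1],
                      weights=[[1.0, 2.0]],
                      rho=0.1, beta=3.0, lam=0.5)
print(cost)
```

The sparsity term vanishes when every hidden unit's mean activation equals $\rho$ and grows as activations drift away from it, which is what pushes the hidden layer toward a sparse code.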

    Table 4  Validation of the effectiveness of the SAE model

    Model      $P$ (%)    $R$ (%)    $F$ (%)
    SAE$^1$    61.775     73.319     67.054
    SAE$^2$    66.064     71.256     68.562
    SAE$^3$    66.134     71.995     68.940
    SAE$^4$    68.695     71.743     70.186
    SVM        66.727     70.115     68.379
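The $F$ column in Table 4 is consistent with the balanced F-measure, the harmonic mean of precision and recall, $F = 2PR/(P+R)$. That reading is an assumption, since the page does not state the formula; the helper name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Inputs and output share the same scale (here, percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2.0 * precision * recall / (precision + recall)


# Usage: P and R of the SAE^2 row in Table 4.
print(round(f_measure(66.064, 71.256), 3))  # 68.562
```

Because the harmonic mean is pulled toward the smaller of the two scores, a model cannot reach a high $F$ by trading away recall for precision or the reverse.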

    Table 5  The influence of introducing feature sets

    Feature              $P$ (%)    $R$ (%)    $F$ (%)
    AnProperNoun         46.079     0.856      1.681
    CaProperNoun         66.159     0.713      1.411
    AnDefiniteNP         75.897     1.102      2.172
    CaDefiniteNP         59.932     4.579      8.508
    AnDemonstrativeNP    65.432     8.409      14.903
    CaDemonstrativeNP    57.092     9.112      15.716
    AnPossessionNP       60.411     26.912     37.222
    CaPossessionNP       44.439     38.403     41.201
    AnPossessionNP       48.231     51.082     49.616
    CaPossessionNP       45.831     70.334     55.498
    PropertyFit          64.470     51.108     57.017
    SinglePluralFit      58.631     80.205     67.742
    FullMatch            68.695     71.743     70.186

    Table 6  The influence of introducing word embedding

    Model           $P$ (%)    $R$ (%)    $F$ (%)
    SAE$^1$         60.915     74.569     67.054
    SAE$^1$ + WE    64.382     70.103     67.121
    SAE$^2$         66.064     71.256     68.562
    SAE$^2$ + WE    66.571     71.419     68.910
    SAE$^3$         66.134     71.995     68.940
    SAE$^3$ + WE    68.215     72.375     70.233
    SAE$^4$         68.695     71.743     70.186
    SAE$^4$ + WE    72.352     69.743     71.024

    Table 7  The influence of adjusting word embedding dimension

                 SAE$^4$ + WE                       SVM + WE
    Dimension    $P$ (%)    $R$ (%)    $F$ (%)      $P$ (%)    $R$ (%)    $F$ (%)
    10           72.4       69.7       71.0         67.0       70.3       68.6
    50           73.9       69.8       71.8         70.5       69.8       70.1
    100          74.5       70.6       72.4         69.9       69.9       69.9
    150          75.8       68.4       71.9         69.0       70.4       69.7
    200          77.0       67.0       71.9         68.2       70.9       69.4
Publication history
  • Received:  2016-04-12
  • Accepted:  2016-08-02
  • Published:  2017-11-20
