LIU Guang-Can (刘广灿), CAO Yu (曹宇), XU Jia-Ming (许家铭), XU Bo (徐波)

doi: 10.16383/j.aas.c190076



    LIU Guang-Can  Master student at the School of Automation, Harbin University of Science and Technology. His research interests cover speech separation and natural language understanding and generation. E-mail: c1240754278@163.com

    CAO Yu  Associate professor at Harbin University of Science and Technology. He received his Ph.D. degree from Harbin Institute of Technology in 2009. His research interests cover pattern recognition, machine vision, and robotics. E-mail: cyhit@163.com

    XU Bo  Professor and president of the Institute of Automation, Chinese Academy of Sciences, and deputy director of the Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences. He has long been engaged in artificial intelligence research. His research interests cover brain-inspired intelligence, brain-inspired cognitive computing models, natural language processing and understanding, and brain-inspired robotics. E-mail: xubo@ia.ac.cn

    XU Jia-Ming  Associate professor at the Institute of Automation, Chinese Academy of Sciences. His research interests cover speech processing and auditory attention, question answering and dialogue systems, deep learning, and reinforcement learning. Corresponding author of this paper. E-mail: jiaming.xu@ia.ac.cn

Natural Language Inference Based on Adversarial Regularization

National Natural Science Foundation of China 61602479

Beijing Brain Science Project Z181100001518006

Strategic Priority Research Program of the Chinese Academy of Sciences XDB32070000
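  • Abstract: Current natural language inference (NLI) models rely heavily on word-level information when making inferences. Although discriminative word-level information plays an important role in inference, an inference model should pay more attention to the underlying meaning of continuous text and to how language is expressed. It should infer by grasping the meaning of the sentence as a whole, rather than performing shallow inference based only on opposition or similarity relations between individual words. In addition, traditional supervised learning makes models depend too much on the language priors of the training set and leaves them without an understanding of linguistic logic. To explicitly emphasize the importance of learning sentence sequence encodings and to reduce the influence of language bias, this paper proposes a natural language inference method based on adversarial regularization. The method first introduces an inference model based on word encodings. This model takes the word encodings of the standard inference model as input and can infer successfully only by exploiting language bias. Adversarial training between the two models then keeps the standard inference model from relying too heavily on language bias. Experiments are conducted on two public benchmark datasets, SNLI and Breaking-NLI. On SNLI, the method achieves the best performance among existing sentence-embedding-based inference models, with a test accuracy of 87.60%. On Breaking-NLI, it also achieves the best publicly reported result to date.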
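The adversarial training described in the abstract can be illustrated with a minimal sketch of a per-example objective. This section does not give the paper's exact loss, so the formulation below is an assumption: the word-only adversary minimizes its own cross-entropy, and the standard model minimizes its cross-entropy minus a weighted copy of the adversary's. The function names, the weight `lam`, and the probability values are hypothetical and chosen only for illustration.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label under predicted probabilities."""
    return -math.log(probs[label])


def adversarial_losses(p_standard, p_word_only, label, lam=0.1):
    """Return (adversary_loss, regularized_standard_loss) for one example.

    Assumed formulation, not taken from the paper: the word-only adversary
    minimizes its own cross-entropy, while the standard model minimizes its
    cross-entropy minus lam times the adversary's cross-entropy.
    """
    standard_loss = cross_entropy(p_standard, label)
    adversary_loss = cross_entropy(p_word_only, label)
    return adversary_loss, standard_loss - lam * adversary_loss


# Hypothetical predicted distributions over
# (entailment, neutral, contradiction); the gold label is index 0.
adv_loss, std_loss = adversarial_losses(
    p_standard=[0.7, 0.2, 0.1],
    p_word_only=[0.5, 0.3, 0.2],
    label=0,
    lam=0.1,
)
print(f"adversary loss: {adv_loss:.4f}")
print(f"regularized standard loss: {std_loss:.4f}")
```

Under this assumed form, the standard model is rewarded whenever the word-only model fails, which pushes the shared word encodings to carry less label-predictive bias. The role of `lam` here is assumed to correspond to the weight $\lambda$ whose values are compared in Table 4.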
  • Fig. 1  Overall structure of natural language inference (NLI)

    Fig. 2  Structure of the natural language inference model based on adversarial regularization

    Fig. 3  Network structure of the word encoder and the sentence encoder

    Table 1  Three examples from the SNLI dataset

    | Premise | Hypothesis | Label |
    | --- | --- | --- |
    | A soccer game with multiple males playing. | Some men are playing a sport. | Entailment |
    | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | Neutral |
    | A black race car starts up in front of a crowd of people. | A man is driving down a lonely road. | Contradiction |
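As a small illustration of the task format in Table 1, the sketch below stores each premise-hypothesis pair as a record and maps its label to a class index for a three-way classifier. The type name `NLIExample`, the helper `label_index`, and the label ordering are hypothetical choices for this example, not something specified by the source.

```python
from typing import NamedTuple

# Assumed label order; the source does not specify class indices.
LABELS = ("Entailment", "Neutral", "Contradiction")


class NLIExample(NamedTuple):
    premise: str
    hypothesis: str
    label: str


# The three SNLI examples listed in Table 1.
EXAMPLES = [
    NLIExample(
        "A soccer game with multiple males playing.",
        "Some men are playing a sport.",
        "Entailment",
    ),
    NLIExample(
        "A person on a horse jumps over a broken down airplane.",
        "A person is training his horse for a competition.",
        "Neutral",
    ),
    NLIExample(
        "A black race car starts up in front of a crowd of people.",
        "A man is driving down a lonely road.",
        "Contradiction",
    ),
]


def label_index(example):
    """Map the string label to a class index for a three-way classifier."""
    return LABELS.index(example.label)


print([label_index(e) for e in EXAMPLES])
```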
    Table 2  Experimental results of different methods on SNLI (%)

    | Method | Model | Training accuracy | Test accuracy |
    | --- | --- | --- | --- |
    | Mou et al.[13] (2015) | 300D Tree-based CNN encoders | 83.3 | 82.1 |
    | Liu et al.[12] (2016) | 600D (300 + 300) BiLSTM encoders | 86.4 | 83.3 |
    | Liu et al.[12] (2016) | 600D BiLSTM encoders with intra-attention | 84.5 | 84.2 |
    | Conneau et al.[34] (2017) | 4096D BiLSTM with max-pooling | 85.6 | 84.5 |
    | Shen et al.[6] (2017) | Directional self-attention network encoders | 91.1 | 85.6 |
    | Yi et al.[7] (2018) | 300D CAFE (no cross-sentence attention) | 87.3 | 85.9 |
    | Im et al.[16] (2017) | Distance-based Self-Attention Network | 89.6 | 86.3 |
    | Kim et al.[35] (2018) | DRCN (-Attn, -Flag) | 91.4 | 86.5 |
    | Talman et al.[36] (2018) | 600D HBMP | 89.9 | 86.6 |
    | Chen et al.[37] (2018) | 600D BiLSTM with generalized pooling | 94.9 | 86.6 |
    | Kiela et al.[38] (2018) | 512D Dynamic Meta-Embeddings | 91.6 | 86.7 |
    | Yoon et al.[17] (2018) | 600D Dynamic Self-Attention Model | 87.3 | 86.8 |
    | Yoon et al.[17] (2018) | Multiple-Dynamic Self-Attention Model | 89.0 | 87.4 |
    | This paper | BiLSTM_MP | 89.46 | 86.51 |
    | This paper | EMRIM | 92.71 | 87.36 |
    | This paper | BiLSTM_MP + AR | 89.02 | 86.73 |
    | This paper | EMRIM + AR | 93.26 | $\textbf{87.60}$ |
    Table 3  Experimental results of different methods on Breaking-NLI

    | Model | Test accuracy (%) |
    | --- | --- |
    | Decomposable Attention[39] | 51.9 |
    | Residual-Stacked-Encoder[40] | 62.2 |
    | ESIM[8] | 65.6 |
    | KIM[41] | 83.5 |
    | EMRIM | 88.37 |
    | EMRIM + AR | $\textbf{89.96}$ |
    Table 4  Impact of the weight $\lambda$ on NLI accuracy

    | Weight $\lambda$ | Test accuracy (%) |
    | --- | --- |
    | 0.5 | 86.90 |
    | 0.25 | 87.14 |
    | 0.10 | 87.60 |
    | 0.05 | 87.35 |
    | 0.01 | 87.39 |
