摘要: 目前自然语言推理(Natural language inference, NLI)模型存在严重依赖词信息进行推理的现象.虽然词相关的判别信息在推理中占有重要的地位,但是推理模型更应该去关注连续文本的内在含义和语言的表达,通过整体把握句子含义进行推理,而不是仅仅根据个别词之间的对立或相似关系进行浅层推理.另外,传统有监督学习方法使得模型过分依赖于训练集的语言先验,而缺乏对语言逻辑的理解.为了显式地强调句子序列编码学习的重要性,并降低语言偏置的影响,本文提出一种基于对抗正则化的自然语言推理方法.该方法首先引入一个基于词编码的推理模型,该模型以标准推理模型中的词编码作为输入,并且只有利用语言偏置才能推理成功;再通过两个模型间的对抗训练,避免标准推理模型过多依赖语言偏置.在SNLI和Breaking-NLI两个公开的标准数据集上进行实验,该方法在SNLI数据集已有的基于句子嵌入的推理模型中达到最佳性能,在测试集上取得了87.60%的准确率;并且在Breaking-NLI数据集上也取得了目前公开的最佳结果.

Abstract: Current natural language inference (NLI) models rely heavily on word-level information when making predictions. Although word-related discriminative cues play an important role in inference, an inference model should focus on the underlying meaning of continuous text and the way language is expressed, reasoning from an overall grasp of sentence meaning rather than performing shallow inference based only on the opposition or similarity between individual words. In addition, conventional supervised learning makes a model overly dependent on the language priors of the training set while lacking a real understanding of linguistic logic. To explicitly emphasize the importance of learning sentence sequence encodings and to reduce the influence of language bias, this paper proposes a natural language inference method based on adversarial regularization. The method first introduces an inference model based solely on word encodings, which takes the word encodings of the standard inference model as input and can succeed only by exploiting language bias; adversarial training between the two models then prevents the standard inference model from relying too heavily on that bias. Experiments are carried out on two public benchmark datasets, SNLI and Breaking-NLI. On the SNLI test set, the method achieves 87.60% accuracy, the best performance among existing sentence-embedding-based inference models, and it also achieves the best published result on the Breaking-NLI dataset.
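To make the adversarial regularization idea concrete, the following is a minimal PyTorch sketch of a training scheme consistent with the abstract: a standard sentence-encoding NLI model, a bias-only adversary that sees nothing but the word encodings, and an objective in which the adversarial term is subtracted with a weight. All names (MainNLIModel, WordOnlyAdversary, adv_weight), the BiLSTM/max-pooling encoder, the mean-pooled adversary, and the exact form of the combined loss are illustrative assumptions, not the paper's actual architecture or objective.

```python
# Hedged sketch of adversarial regularization for NLI (all specifics are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MainNLIModel(nn.Module):
    """Standard sentence-embedding NLI model (placeholder: BiLSTM + max pooling)."""
    def __init__(self, vocab_size, emb_dim=300, hidden=300, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.cls = nn.Linear(4 * 2 * hidden, n_classes)

    def encode(self, tokens):
        word_enc = self.emb(tokens)              # word encodings, shared with the adversary
        h, _ = self.rnn(word_enc)
        return word_enc, h.max(dim=1).values     # sentence embedding via max pooling

    def forward(self, prem, hypo):
        wp, p = self.encode(prem)
        wh, h = self.encode(hypo)
        feats = torch.cat([p, h, torch.abs(p - h), p * h], dim=-1)
        return self.cls(feats), (wp, wh)

class WordOnlyAdversary(nn.Module):
    """Bias model that predicts the label from word encodings alone (mean pooling)."""
    def __init__(self, emb_dim=300, n_classes=3):
        super().__init__()
        self.cls = nn.Linear(2 * emb_dim, n_classes)

    def forward(self, wp, wh):
        return self.cls(torch.cat([wp.mean(dim=1), wh.mean(dim=1)], dim=-1))

def training_step(main, adv, opt_main, opt_adv, prem, hypo, label, adv_weight=0.10):
    # 1) Update the adversary so that it exploits word-level bias as well as it can
    #    (word encodings are detached: this step does not touch the main model).
    _, (wp, wh) = main(prem, hypo)
    loss_adv = F.cross_entropy(adv(wp.detach(), wh.detach()), label)
    opt_adv.zero_grad()
    loss_adv.backward()
    opt_adv.step()

    # 2) Update the main model: fit the labels while making its word encodings
    #    uninformative for the adversary (the adversarial term is subtracted,
    #    weighted by adv_weight; only the main model's parameters are stepped).
    logits, (wp, wh) = main(prem, hypo)
    loss_main = F.cross_entropy(logits, label) - adv_weight * F.cross_entropy(adv(wp, wh), label)
    opt_main.zero_grad()
    loss_main.backward()
    opt_main.step()
    return loss_main.item(), loss_adv.item()
```

The point mirrored from the abstract is the wiring, not the particular layers: the adversary only ever sees the word encodings produced by the standard model, so penalizing its success pushes the standard model to carry label information in its sentence-level representation rather than in word-level shortcuts.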
表 1 SNLI数据集上的三个例子
Table 1 Three examples from the SNLI dataset
Premise (前提) | Hypothesis (假设) | Label (标签)
A soccer game with multiple males playing. | Some men are playing a sport. | Entailment
(译文) 一场有多名男子参加的足球比赛. | 有些男人在做运动. | 蕴涵
A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | Neutral
(译文) 一个人骑着马跳过了一架坏掉的飞机. | 为了参加比赛, 一个人正在训练他的马. | 中立
A black race car starts up in front of a crowd of people. | A man is driving down a lonely road. | Contradiction
(译文) 一辆黑色赛车在一群人面前启动. | 一个男人开着车行驶在荒凉的路上. | 矛盾

表 2 不同方法在SNLI上的实验结果(%)
Table 2 Experimental results for different methods on SNLI (%)
对比方法 (Method) | 模型 (Model) | 训练准确率 (Train acc.) | 测试准确率 (Test acc.)
Mou等[13] (2015) | 300D Tree-based CNN encoders | 83.3 | 82.1
Liu等[12] (2016) | 600D (300 + 300) BiLSTM encoders | 86.4 | 83.3
Liu等[12] (2016) | 600D BiLSTM encoders with intra-attention | 84.5 | 84.2
Conneau等[34] (2017) | 4096D BiLSTM with max-pooling | 85.6 | 84.5
Shen等[6] (2017) | Directional self-attention network encoders | 91.1 | 85.6
Yi等[7] (2018) | 300D CAFE (no cross-sentence attention) | 87.3 | 85.9
Im等[16] (2017) | Distance-based Self-Attention Network | 89.6 | 86.3
Kim等[35] (2018) | DRCN (-Attn, -Flag) | 91.4 | 86.5
Talman等[36] (2018) | 600D HBMP | 89.9 | 86.6
Chen等[37] (2018) | 600D BiLSTM with generalized pooling | 94.9 | 86.6
Kiela等[38] (2018) | 512D Dynamic Meta-Embeddings | 91.6 | 86.7
Yoon等[17] (2018) | 600D Dynamic Self-Attention Model | 87.3 | 86.8
Yoon等[17] (2018) | Multiple-Dynamic Self-Attention Model | 89.0 | 87.4
本文方法 (Ours) | BiLSTM_MP | 89.46 | 86.51
本文方法 (Ours) | EMRIM | 92.71 | 87.36
本文方法 (Ours) | BiLSTM_MP + AR | 89.02 | 86.73
本文方法 (Ours) | EMRIM + AR | 93.26 | $\textbf{87.60}$

表 3 不同方法在Breaking-NLI上的测试结果
Table 3 Experimental results for different methods on Breaking-NLI
表 4 权重$\lambda$对NLI准确率的影响
Table 4 Impact of weight $\lambda$ on NLI accuracy
权重值 $\lambda$ (Weight) | 测试准确率 (Test accuracy, %)
0.5 | 86.90
0.25 | 87.14
0.10 | 87.60
0.05 | 87.35
0.01 | 87.39
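As a reading aid for Table 4, one plausible form of the weighted objective is shown below. The subtractive form is an assumption of this sketch (consistent with common adversarial-regularization setups, and with the code sketch after the abstract), not a formula reproduced from the paper:

$$\mathcal{L}_{\text{main}} = \mathcal{L}_{\text{NLI}} - \lambda\,\mathcal{L}_{\text{adv}}$$

Under this reading, Table 4 shows that $\lambda = 0.10$ gives the best SNLI test accuracy (87.60%): a larger weight such as 0.5 over-regularizes the standard model, while a very small weight such as 0.01 weakens the effect of the adversarial term.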
[1] 郭茂盛, 张宇, 刘挺. 文本蕴含关系识别与知识获取研究进展及展望. 计算机学报, 2017, 40(4): 889-910 (Guo Mao-Sheng, Zhang Yu, Liu Ting. Research advances and prospect of recognizing textual entailment and knowledge acquisition. Chinese Journal of Computers, 2017, 40(4): 889-910) http://d.old.wanfangdata.com.cn/Periodical/jsjxb201704008
[2] 任函, 冯文贺, 刘茂福, 万菁. 基于语言现象的文本蕴涵识别. 中文信息学报, 2017, 31(1): 184-191 (Ren Han, Feng Wen-He, Liu Mao-Fu, Wan Jing. Recognizing textual entailment based on inference phenomena. Journal of Chinese Information Processing, 2017, 31(1): 184-191) http://d.old.wanfangdata.com.cn/Periodical/zwxxxb201701024
[3] Bowman S R, Angeli G, Potts C, Manning C D. A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, 2015. 632-642
[4] Glockner M, Shwartz V, Goldberg Y. Breaking NLI systems with sentences that require simple lexical inferences. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers). Melbourne, Australia: Association for Computational Linguistics, 2018. 650-655 https://arxiv.org/abs/1805.02266
[5] Gong Y, Luo H, Zhang J. Natural language inference over interaction space. In: Proceedings of the 6th International Conference on Learning Representations. 2018.
[6] Shen T, Zhou T, Long G, Jiang J, Pan S, Zhang C. DiSAN: directional self-attention network for RNN/CNN-free language understanding. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI, 2018. 5446-5445
[7] Yi T, Luu A T, Siu C H. Compare, compress and propagate: enhancing neural architectures with alignment factorization for natural language inference. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, 2018. 1565-1575 https://arxiv.org/abs/1801.00102
[8] Chen Q, Zhu X, Ling Z H, Wei S, Jiang H, Inkpen D. Enhanced LSTM for natural language inference. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver, Canada: Association for Computational Linguistics, 2017. 1657-1668
[9] Zhang K, Lv G, Wu L, Chen E, Liu Q, Wu H, Wu F. Image-enhanced multi-level sentence representation net for natural language inference. In: Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM). Singapore: IEEE, 2018. 747-756
[10] Camburu O M, Rocktäschel T, Lukasiewicz T, Blunsom P. e-SNLI: natural language inference with natural language explanations. In: Proceedings of the 32nd Conference on Neural Information Processing Systems. Montréal, Canada: NeurIPS, 2018. 9560-9572
[11] Poliak A, Naradowsky J, Haldar A, Rudinger R, Van Durme B. Hypothesis only baselines in natural language inference. In: Proceedings of the 7th Joint Conference on Lexical and Computational Semantics. Vancouver, Canada, 2018. 180-191 https://arxiv.org/abs/1805.01042
[12] Liu Y, Sun C, Lin L, Wang X. Learning natural language inference using bidirectional LSTM model and inner-attention. arXiv preprint arXiv: 1605.09090, 2016.
[13] Mou L, Men R, Li G, Xu Y, Zhang L, Yan R, et al. Natural language inference by tree-based convolution and heuristic matching. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany: Association for Computational Linguistics, 2016. 130 https://arxiv.org/abs/1512.08422
[14] Munkhdalai T, Yu H. Neural semantic encoders. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, 2017. 397-407
[15] Shen T, Zhou T, Long G, Jiang J, Wang S, Zhang C. Reinforced self-attention network: a hybrid of hard and soft attention for sequence modeling. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: IJCAI, 2018. 4345-4352
[16] Im J, Cho S. Distance-based self-attention network for natural language inference. arXiv preprint arXiv: 1712.02047, 2017.
[17] Yoon D, Lee D, Lee S K. Dynamic self-attention: computing attention over words dynamically for sentence embedding. arXiv preprint arXiv: 1808.07383, 2018.
[18] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 2014 Conference on Advances in Neural Information Processing Systems. Montreal, Canada: Curran Associates, Inc., 2014. 2672-2680
[19] Mao X, Li Q, Xie H, Lau R Y, Wang Z, Smolley S P. Least squares generative adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 2813-2821 https://arxiv.org/abs/1611.04076
[20] Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: ICML, 2017. 214-233
[21] Zhu J Y, Zhang R, Pathak D, Darrell T, Efros A A, Wang O, et al. Toward multimodal image-to-image translation. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: NIPS, 2017. 465-476 https://arxiv.org/abs/1711.11586
[22] Yi Z, Zhang H, Tan P, Gong M. DualGAN: unsupervised dual learning for image-to-image translation. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2849-2857
[23] Yu L T, Zhang W N, Wang J, Yu Y. SeqGAN: sequence generative adversarial nets with policy gradient. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017. 2852-2858
[24] Lin K, Li D, He X, Zhang Z, Sun M T. Adversarial ranking for language generation. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017. 3155-3165 https://arxiv.org/abs/1705.11001
[25] Tzeng E, Hoffman J, Saenko K, Darrell T. Adversarial discriminative domain adaptation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 7167-7176
[26] Lample G, Zeghidour N, Usunier N, Bordes A, Denoyer L. Fader networks: manipulating images by sliding attributes. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017. 5967-5976
[27] Pennington J, Socher R, Manning C. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, 2014. 1532-1543
[28] Yang Z, Dhingra B, Yuan Y, Hu J, Cohen W W, Salakhutdinov R. Words or characters? Fine-grained gating for reading comprehension. In: Proceedings of the 5th International Conference on Learning Representations. 2017.
[29] McCann B, Bradbury J, Xiong C, Socher R. Learned in translation: contextualized word vectors. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017. 6294-6305
[30] Anastasopoulos A, Chiang D. Tied multitask learning for neural speech translation. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. New Orleans, USA: Association for Computational Linguistics, 2018. 82-91 https://arxiv.org/abs/1802.06655
[31] Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv: 1207.0580, 2012.
[32] Ba J L, Kiros J R, Hinton G E. Layer normalization. arXiv preprint arXiv: 1607.06450, 2016.
[33] Kingma D P, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv: 1412.6980, 2014.
[34] Conneau A, Kiela D, Schwenk H, Barrault L, Bordes A. Supervised learning of universal sentence representations from natural language inference data. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, 2017. 670-680
[35] Kim S, Hong J H, Kang I, Kwak N. Semantic sentence matching with densely-connected recurrent and co-attentive information. arXiv preprint arXiv: 1805.11360, 2018.
[36] Talman A, Yli-Jyrä A, Tiedemann J. Natural language inference with hierarchical BiLSTM max pooling architecture. arXiv preprint arXiv: 1808.08762, 2018.
[37] Chen Q, Ling Z H, Zhu X. Enhancing sentence embedding with generalized pooling. In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, New Mexico, USA: The COLING 2018 Organizing Committee, 2018. 1815-1826 https://arxiv.org/abs/1806.09828
[38] Kiela D, Wang C, Cho K. Dynamic meta-embeddings for improved sentence representations. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, 2018. 1466-1477 https://arxiv.org/abs/1804.07983
[39] Parikh A, Täckström O, Das D, Uszkoreit J. A decomposable attention model for natural language inference. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas, USA: Association for Computational Linguistics, 2016. 2249-2255 https://arxiv.org/abs/1606.01933
[40] Nie Y, Bansal M. Shortcut-stacked sentence encoders for multi-domain inference. In: Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Copenhagen, Denmark: Association for Computational Linguistics, 2017. 41-45 https://arxiv.org/abs/1708.02312
[41] Chen Q, Zhu X, Ling Z H, Inkpen D, Wei S. Neural natural language inference models enhanced with external knowledge. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia: Association for Computational Linguistics, 2018. 2406-2417 https://arxiv.org/abs/1711.04289