
Conceptual Sentence Embeddings Based on Attention Mechanism

WANG Ya-Shen, HUANG He-Yan, FENG Chong, ZHOU Qiang

Citation: WANG Ya-Shen, HUANG He-Yan, FENG Chong, ZHOU Qiang. Conceptual Sentence Embeddings Based on Attention Mechanism. ACTA AUTOMATICA SINICA, 2020, 46(7): 1390-1400. doi: 10.16383/j.aas.2018.c170295

doi: 10.16383/j.aas.2018.c170295

Funds: 

National Natural Science Foundation of China (Key Program) 61751201

More Information
    Author Bio:

    WANG Ya-Shen   Ph.D. candidate at the School of Computer Science and Technology, Beijing Institute of Technology. He received his bachelor's degree from the School of Computer Science and Technology, Beijing Institute of Technology in 2012. His research interest covers natural language processing and social media analysis. E-mail: yswang@bit.edu.cn

    FENG Chong   Associate professor at the School of Computer Science and Technology, Beijing Institute of Technology. He received his Ph.D. degree from the University of Science and Technology of China in 2005. His research interest covers information extraction and sentiment analysis. E-mail: fengchong@bit.edu.cn

    ZHOU Qiang   Research and development engineer at Baidu, Inc. He received his master's degree from Beijing Institute of Technology in 2016. His research interest covers natural language processing and deep learning. E-mail: qzhou@bit.edu.cn

    Corresponding author: HUANG He-Yan   Professor at the School of Computer Science and Technology, Beijing Institute of Technology. She received her Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences in 1989. Her research interest covers natural language processing and machine translation. Corresponding author of this paper. E-mail: hhy63@bit.edu.cn
  • Abstract: Most sentence embedding models rely only on the surface text of a sentence to build its vector representation, which leaves them unable to discriminate among the senses of the polysemous words that pervade natural language. To strengthen the semantic expressiveness of sentence representations, this paper applies a short-text conceptualization algorithm to assign relevant concepts to every sentence in the corpus and then learns conceptual sentence embeddings (CSE). Because concept information is incorporated, this semantic representation is more expressive than the sentence embedding models in wide use today. Furthermore, we extend the conceptual sentence embedding model with an attention mechanism, which allows the model to selectively weight the relevant words in the surrounding context for more efficient prediction. The proposed conceptual sentence embedding models are evaluated on language understanding tasks such as text classification and information retrieval, and the experimental results show that they outperform the other sentence embedding models compared. (An illustrative sketch of the attended prediction step follows this abstract.)
    Recommended by Associate Editor ZHAO Tie-Jun
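The abstract describes a CBOW-style prediction step in which an attention-weighted combination of the context words, fused with the concepts assigned to the sentence by short-text conceptualization, is used to score candidate target words. The snippet below is a minimal, illustrative sketch of that idea only, not the authors' implementation: it uses a plain full softmax rather than the hierarchical softmax or negative sampling usually employed to train such models, and every name, size, and the equal-weight fusion of context and concept vectors is a hypothetical placeholder.

```python
# Illustrative sketch (not the paper's code) of attention-weighted conceptual
# CBOW prediction: context words + sentence concepts -> probability of target word.
import numpy as np

rng = np.random.default_rng(0)
V, C, D = 1000, 50, 100            # vocabulary size, concept inventory size, embedding dim

word_vecs    = rng.normal(scale=0.1, size=(V, D))   # input word embeddings
concept_vecs = rng.normal(scale=0.1, size=(C, D))   # concept embeddings
out_vecs     = rng.normal(scale=0.1, size=(V, D))   # output (prediction) embeddings
attn_scores  = np.zeros(V)                          # toy per-word attention parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_target(context_word_ids, concept_ids, target_id):
    """Probability of the target word given attended context words and sentence concepts."""
    ctx = word_vecs[context_word_ids]                 # (k, D) context word vectors
    att = softmax(attn_scores[context_word_ids])      # attention weights over the context
    ctx_vec = att @ ctx                               # attention-weighted context vector
    cpt_vec = concept_vecs[concept_ids].mean(axis=0)  # averaged sentence-concept vector
    h = 0.5 * (ctx_vec + cpt_vec)                     # fused hidden state (equal weighting assumed)
    logits = out_vecs @ h                             # score every vocabulary word
    return softmax(logits)[target_id]

# Toy usage: context words 3, 7, 12 and concepts 4, 9 predicting word 42.
print(predict_target([3, 7, 12], [4, 9], target_id=42))
```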
  • Fig. 1  CBOW model and Skip-Gram model

    Fig. 2  CSE-CBOW model and CSE-SkipGram model

    Fig. 3  aCSE-TYPE model

    Table 1  Evaluation results of the text classification task

    Dataset            NewsTitle              Twitter                TREC
    Model              P      R      F        P      R      F        P      R      F
    BOW 0.731 0.719 0.725 0.397 0.415 0.406 0.822 0.820 0.821
    LDA 0.720 0.706 0.713 0.340 0.312 0.325 0.815 0.811 0.813
    PV-DBOW 0.726 0.721 0.723 0.409 0.410 0.409 0.825 0.817 0.821
    PV-DM 0.745 0.738 0.741 0.424 0.423 0.423 0.837 0.824 0.830
    TWE 0.810 0.805 0.807 0.454 0.438 0.446 0.894 0.885 0.885
    SCBOW 0.812$^\beta$ 0.805$^\beta$ 0.809$^\beta$ 0.455$^\beta$ 0.439 0.449$^\beta$ 0.897$^\beta$ 0.887$^\beta$ 0.892$^\beta$
    CSE-CBOW 0.814 0.811 0.812 0.458 0.450 0.454 0.895 0.891 0.893
    CSE-SkipGram 0.827 0.819 0.823 0.477 0.447 0.462 0.899 0.894 0.896
    aCSE-SUR 0.828 0.822 0.825 0.469 0.453 0.462 0.906 0.897 0.901
    aCSE-TYPE 0.838 0.830 0.834 0.483 0.455 0.468 0.911 0.903 0.907
    aCSE-ALL 0.845$^{\alpha\beta}$ 0.832$^{\alpha\beta}$ 0.838$^{\alpha\beta}$ 0.485$^{\alpha\beta}$ 0.462$^{\alpha\beta}$ 0.473$^{\alpha\beta}$ 0.917$^{\alpha\beta}$ 0.914$^{\alpha\beta}$ 0.915$^{\alpha\beta}$
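Table 1 reports precision (P), recall (R), and F-measure (F) for each model on each dataset. As a reference for how such figures are typically computed from predicted class labels, here is a minimal sketch of macro-averaged P/R/F; whether this paper macro- or micro-averages is not stated on this page, so the averaging choice is an assumption and the function name is illustrative.

```python
# Minimal sketch of macro-averaged precision / recall / F1 for multi-class labels.
def macro_prf(y_true, y_pred):
    labels = set(y_true) | set(y_pred)
    ps, rs, fs = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        f1   = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(labels)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n

# Toy usage with two classes.
print(macro_prf(["sports", "tech", "tech", "sports"],
                ["sports", "tech", "sports", "sports"]))
```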

    Table 2  Evaluation results of the information retrieval task

    Query set          TMB2011              TMB2012
    Model              MAP     P@30         MAP     P@30
    BOW 0.304 0.412 0.321 0.494
    LDA 0.281 0.409 0.311 0.486
    PV-DBOW 0.285 0.412 0.324 0.491
    PV-DM 0.327 0.431 0.340 0.524
    TWE 0.331 0.446 0.347 0.509
    SCBOW 0.333 0.448$^{\beta}$ 0.349$^{\beta}$ 0.511
    CSE-CBOW 0.337 0.451 0.344 0.512
    CSE-SkipGram 0.367 0.461 0.360 0.517
    aCSE-SUR 0.342 0.458 0.349 0.520
    aCSE-TYPE 0.373 0.466 0.365 0.525
    aCSE-ALL 0.376$^{\alpha\beta}$ 0.471$^{\alpha\beta}$ 0.369$^{\alpha\beta}$ 0.530$^{\alpha\beta}$
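Table 2 reports MAP and P@30 on the TREC Microblog 2011/2012 query sets. The sketch below gives the standard definitions of these two retrieval metrics, computed from a ranked result list and a set of relevance judgments per query; the function and variable names are illustrative and are not tied to the paper's evaluation scripts.

```python
# Standard retrieval metrics: average precision, P@k, and mean average precision.
def average_precision(ranked_docs, relevant):
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank          # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

def precision_at_k(ranked_docs, relevant, k=30):
    return sum(1 for doc in ranked_docs[:k] if doc in relevant) / k

def mean_average_precision(runs):
    """runs: list of (ranked_docs, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy usage: two queries with tiny ranked lists (k=2 here only because the lists are short).
runs = [(["d1", "d3", "d2"], {"d1", "d2"}),
        (["d5", "d4"], {"d4"})]
print(mean_average_precision(runs),
      precision_at_k(["d1", "d3", "d2"], {"d1", "d2"}, k=2))
```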
Publication history
  • Received: 2017-06-02
  • Accepted: 2018-03-24
  • Published: 2020-07-24

目录

    /

    返回文章
    返回