Fine-grained Entity Type Classification Based on Transfer Learning
Abstract: Fine-grained entity type classification (FETC) aims to map entity mentions in text to a hierarchy of fine-grained entity types. In recent years, deep neural networks have brought great progress to entity classification. However, training a neural network model that recognizes entities accurately requires a sufficient amount of labeled data, and labeled corpora for fine-grained entity classification are scarce, so classifying entities in domains without labeled corpora is a hard problem. For entity classification tasks that lack labeled corpora, this paper proposes a fine-grained entity classification method based on transfer learning. First, a mapping relation model is constructed to mine the semantic relationships between the entity types of a labeled corpus and those of an unlabeled corpus, and for each unlabeled entity type a mapping set of labeled entity types is built. Then, a bidirectional long short-term memory (BiLSTM) model is constructed, and the combination of sentence vectors representing the mapping type set is used as its input to train the unlabeled entity types. Finally, an attention mechanism is built from the semantic distances between the types in the mapping set and the corresponding unlabeled type, yielding an entity classifier that can recognize unseen entity types. Experiments show that our method achieves good results, recognizing unseen named entity types without any labeled corpus.
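As a rough illustration of the mapping step described above, the sketch below builds, for each unlabeled (target-domain) entity type, a mapping set of the most semantically similar labeled (source-domain) types by cosine similarity over type-name embeddings. The helper names, the use of averaged GloVe word vectors for multi-word type names, and the top-k cutoff are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def build_mapping_sets(unlabeled_types, labeled_types, emb, k=3):
    """For each unlabeled type, collect the k most similar labeled types.

    emb: dict from a type name to a vector, e.g. the averaged GloVe
    vectors of the words in the name (an assumption); k is illustrative.
    """
    mapping = {}
    for t in unlabeled_types:
        scored = [(cosine(emb[t], emb[s]), s) for s in labeled_types]
        scored.sort(reverse=True)
        # Keep both the mapped types and their similarity scores; the
        # scores later serve as the semantic distances that weight the
        # attention over the mapping set.
        mapping[t] = [(s, score) for score, s in scored[:k]]
    return mapping
```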
Table 1 Confusion matrix
                   Predicted positive     Predicted negative
Actual positive    TP (true positive)     FN (false negative)
Actual negative    FP (false positive)    TN (true negative)
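Table 1 fixes the notation behind the Acc, Macro F1, and Micro F1 columns reported in Tables 4, 5, 8, and 9. For reference, a minimal sketch of how such scores are commonly aggregated from per-type confusion counts (standard definitions, not code from the paper; the paper may use FETC-specific strict/loose variants):

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of correct predictions in a binary confusion matrix."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_micro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per entity type.

    Macro F1 averages F1 over types; micro F1 pools the counts first.
    """
    macro = sum(f1(*c) for c in counts) / len(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return macro, f1(tp, fp, fn)
```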
Table 2 Hyper-parameter settings
$L_r$     $D_w$   $D_p$   $B$     $P_i$   $P_o$   $\lambda$
0.0002    180     85      256     0.7     0.9     0.0
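Reading the Table 2 symbols in their usual sense (learning rate $L_r$, word and position embedding dimensions $D_w$ and $D_p$, batch size $B$, input/output dropout keep probabilities $P_i$ and $P_o$, and L2 weight $\lambda$; these interpretations are assumptions, as is the hidden size), a minimal PyTorch sketch of a BiLSTM sentence encoder under these settings might look like:

```python
import torch
import torch.nn as nn

D_W, D_P, HIDDEN = 180, 85, 100   # HIDDEN is an illustrative choice
L_R, LAMBDA = 0.0002, 0.0

class BiLSTMEncoder(nn.Module):
    """Encodes a sentence of word + position embeddings with a BiLSTM."""
    def __init__(self, vocab_size, n_positions, n_types):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, D_W)
        self.pos_emb = nn.Embedding(n_positions, D_P)
        # Table 2 reports keep probabilities 0.7 / 0.9, so the drop
        # probabilities would be 1 - P_i and 1 - P_o (an assumption).
        self.drop_in = nn.Dropout(1 - 0.7)
        self.drop_out = nn.Dropout(1 - 0.9)
        self.lstm = nn.LSTM(D_W + D_P, HIDDEN, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * HIDDEN, n_types)

    def forward(self, words, positions):
        x = torch.cat([self.word_emb(words), self.pos_emb(positions)], dim=-1)
        h, _ = self.lstm(self.drop_in(x))
        # Mean-pool over time as a simple sentence vector (illustrative;
        # the paper's attention over the mapping set would replace this).
        return self.classifier(self.drop_out(h.mean(dim=1)))

model = BiLSTMEncoder(vocab_size=50000, n_positions=200, n_types=30)
opt = torch.optim.Adam(model.parameters(), lr=L_R, weight_decay=LAMBDA)
```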
Table 3 Dataset sizes
                      Labeled dataset (source domain)    Unlabeled dataset (target domain)
Number of types       50                                  30
Number of mentions    896 914                             229 685
Number of tokens      15 284 525                          3 929 738
Table 4 Comparative experiment of different models in the unlabeled domain
Model      Acc      Macro F1    Micro F1
TransNER   0.051    0.035       0.041
FNET       0.026    0.027       0.028
TLERMAM    0.369    0.290       0.355
Table 5 Comparative experiment of different models in the sparsely annotated domain
Model      Acc      Macro F1    Micro F1
TransNER   0.500    0.337       0.534
FNET       0.523    0.329       0.447
TLERMAM    0.805    0.487       0.805
Table 6 Entity type sets of the military and culture domains
Domain      Entity types
Military    terrorist_organization, weapon, attack, soldier, military, terrorist_attack, power_station, terrorist, military_conflict
Culture     film, theater, artist, play, ethnicity, author, written_work, language, director, music, musician, newspaper, election, protest, broadcast_network, broadcast_program, tv_channel, religion, educational_institution, library, educational_department, educational_degree, actor, news_agency, instrument
Table 7 Dataset sizes of the military and culture domains
                      Labeled dataset (culture domain)    Unlabeled dataset (military domain)
Number of types       25                                   9
Number of mentions    226 734                              126 036
Number of tokens      3 927 700                            2 104 890
Table 8 Comparison of entity recognition in the unlabeled military domain
Model      Acc      Macro F1    Micro F1
TransNER   0.040    0.023       0.012
FNET       0.013    0.014       0.029
TLERMAM    0.257    0.339       0.339
Table 9 Comparison of entity recognition in the military domain with a sparsely annotated corpus
Model      Acc      Macro F1    Micro F1
TransNER   0.338    0.204       0.285
FNET       0.460    0.424       0.537
TLERMAM    0.572    0.504       0.559