基于多层关系图模型的中文评价对象与评价词抽取方法

廖祥文; 陈兴俊; 魏晶晶; 陈国龙; 程学旗

doi:10.16383/j.aas.2017.c160060

基于多层关系图模型的中文评价对象与评价词抽取方法

doi: 10.16383/j.aas.2017.c160060

廖祥文^1,3,,
陈兴俊^1,3,,
魏晶晶^2,3,,
陈国龙^1,3, ,,
程学旗^4,

1.
福州大学数学与计算机科学学院福州 350116
2.
福建江夏学院电子信息科学学院福州 350108
3.
福建省网络计算与智能信息处理重点实验室 (福州大学) 福州 350116
4.
中国科学院计算技术研究所网络数据科学与技术重点实验室北京 100190

基金项目:

国家自然科学基金青年项目 61300105

中国科学院网络数据科学与技术重点实验室开放基金课题 CASNDST20140X

详细信息

作者简介:
廖祥文福州大学数学与计算机学院副教授.主要研究方向为文本倾向性检索与挖掘.E-mail:liaoxw@fzu.edu.cn

陈兴俊福州大学数学与计算机学院硕士研究生.主要研究方向为文本倾向性检索与挖掘.E-mail:346993131@qq.com

魏晶晶福州大学物理与信息工程学院博士研究生.主要研究方向为观点挖掘.E-mail:weijj@fjjxu.edu.cn

程学旗中国科学院网络数据科学与工程重点实验室主任.主要研究方向为大数据分析与挖掘.E-mail:cxq@ict.ac.cn

通讯作者:
陈国龙福州大学数学与计算机科学学院教授.主要研究方向为智能信息处理.本文通信作者.E-mail:cgl@fzu.edu.cn

计量
- 文章访问数: 2989
- HTML全文浏览量: 293
- PDF下载量: 798
- 被引次数: 0
出版历程
- 收稿日期: 2016-01-20
- 录用日期: 2016-06-17
- 刊出日期: 2017-03-20

A Multi-layer Relation Graph Model for Extracting Opinion Targets and Opinion Words

LIAO Xiang-Wen^{1,3
,},
CHEN Xing-Jun^{1,3
,},
WEI Jing-Jing^{2,3
,},
CHEN Guo-Long^{1,3
, ,},
CHENG Xue-Qi^4
,

1.
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116
2.
College of Electronics and Information Science, Fujian Jiangxia University, Fuzhou 350108
3.
Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, Fuzhou 350116
4.
Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

Funds:

National Natural Science Foundation of China 61300105

Key Laboratory of Network Data Science & Technology, Chinese Science and Technology Foundation CASNDST20140X

More Information

Author Bio:
Associate professor at College of Mathematics and Computer Science, Fuzhou University. His research interest covers text orientation retrieval and mining

Master student at the College of Mathematics and Computer Science, Fuzhou University. His research interest covers text orientation retrieval and mining

Ph.D. candidate at the College of Physics and Information Engineering, Fuzhou University. Her main research interest is opinion mining

Director of Key Laboratory of Network Data Science & Technology, Chinese Science and Technology. His research interest covers big data analysis and mining

Corresponding author: CHEN Guo-Long Professor at the College of Mathematics and Computer Science, Fuzhou University. His main research interest is intelligent information processing. Corresponding author of this paper

摘要

摘要: 中文评价对象与评价词抽取是文本倾向性分析的重要问题.如何利用评价对象与评价词之间的语法、共现等关系设计模型是提高抽取精度的关键.本文提出了一种基于多层关系图模型的中文评价对象与评价词抽取方法.该方法首先利用词对齐模型抽取评价对象与评价词搭配；然后，考虑评价对象与评价词的依存句法关系、评价对象内部的共现关系和评价词内部的共现关系，建立多层情感关系图，接着利用随机游走方法计算候选评价对象与评价词的置信度；最后，选取置信度高的候选评价对象与评价词作为输出.实验结果表明，与现有的方法相比，本文所提出的方法不仅对评价对象和评价词的抽取精度均有显著提升，而且具有良好的鲁棒性.
- 倾向性分析 /
- 观点挖掘 /
- 依存句法分析 /
- 随机游走
Abstract: Mining opinion targets and opinion words is a fundamental task for the Chinese online media to mine opinion and analyze sentiment. The key to enhancing the effectiveness of opinion target and opinion word is to integrate syntactic relations and co-occurrence relations between opinion target and opinion word into the mining model. A novel approach based on a multi-layer relation graph model is proposed to extract opinion targets and opinion words from Chinese social media. First, the word alignment model is employed to extract the candidates of opinion target and opinion word candidates. Second, a multi-layer relation graph is constructed by the syntactic inter-relations between opinion target and opinion word, the co-occurrence intra-relations among opinion targets, and the co-occurrence intra-relations among opinion words. Third, a random-walk algorithm is adopted to calculate the confidence of each opinion target candidate and opinion word candidate. Finally, opinion targets and opinion words are extracted according to their confidence values. Experimental results show that the presented method can not only achieve significant improvement over previous methods, but also have good robustness.
- Sentiment analysis /
- opinion mining /
- dependency syntactic parsing /
- random walk

HTML全文

图 1 两种不同的依存句法关系示例

Fig. 1 Two different syntactical dependency examples

下载: 全尺寸图片幻灯片

图 2 依存句法分析示例

Fig. 2 The example of dependency syntactic parsing

下载: 全尺寸图片幻灯片

图 3 评价对象、句法模式和评价词的多层关系图

Fig. 3 The multi-layer relation graph on opinion target, syntactic pattern and opinion word

下载: 全尺寸图片幻灯片

图 4 评价对象抽取的宏平均和微平均结果

Fig. 4 Micro and macro results of opinion targets extraction

下载: 全尺寸图片幻灯片

图 5 评价词抽取的宏平均和微平均结果

Fig. 5 Micro and macro results of opinion words extraction

下载: 全尺寸图片幻灯片

图 6 不同 $\alpha$ , $\beta$ 的评价对象抽取结果

Fig. 6 Different results of opinion targets extraction according to $\alpha$ , $\beta$

下载: 全尺寸图片幻灯片

图 7 不同 $\alpha$ , $\beta$ 的评价词抽取结果

Fig. 7 Different results of opinion words extraction according to $\alpha$ , $\beta$

下载: 全尺寸图片幻灯片

表 1 语料库中候选项的统计信息

Table 1 The statistics of candidate terms in the corpus

	${T_i}$	$\sim{T_i}$
${T_j}$	${K_1}({T_i},{T_j})$	${K_2}(\sim{T_i},{T_j})$
$\sim{T_j}$	${K_3}({T_i},\sim{T_j})$	${K_4}(\sim{T_i},\sim{T_j})$

下载: 导出CSV

表 2 语料统计表

Table 2 The description of dataset

领域	句子总数	评价对象数	评价词数
电子产品	15 051	3 593	5 068
影视娱乐	7 915	997	1 344
金融证券	6 382	534	605

下载: 导出CSV

表 3 评价对象抽取的对比实验结果

Table 3 The results of our method VS baseline on opinion

领域	方法	Precision	Recall	${\rm{F1\_measure}}$
电子产品	${\rm{COAE}}\_{\rm{BEST}}$	0.223	0.160	0.186
	LIU	0.199	0.221	0.209
	${\rm{WAM\_U}}$	0.338	0.376	0.356
	${\rm{WAM\_I}}$	0.346	0.385	0.364
	MULTI	0.365	0.406	0.384
影视娱乐	${\rm{COAE}}\_{\rm{BEST}}$	0.203	0.053	0.078
	LIU	0.102	0.154	0.123
	${\rm{WAM\_U}}$	0.173	0.261	0.208
	${\rm{WAM\_I}}$	0.185	0.279	0.223
	MULTI	0.171	0.258	0.206
金融证券	${\rm{COAE}}\_{\rm{BEST}}$	0.090	0.045	0.045
	LIU	0.210	0.315	0.252
	${\rm{WAM\_U}}$	0.120	0.180	0.144
	${\rm{WAM\_I}}$	0.183	0.273	0.219
	MULTI	0.275	0.412	0.330

下载: 导出CSV

表 4 评价词抽取的对比实验结果

Table 4 The results of our method VS baseline on opinion word

领域	方法	Precision	Recall	${\rm{F1\_measure}}$
电子产品	${\rm{COAE}}\_{\rm{BEST}}$	0.218	0.113	0.126
	${\rm{WAM\_U}}$	0.187	0.221	0.203
	${\rm{WAM\_I}}$	0.197	0.233	0.214
	MULTI	0.326	0.386	0.353
影视娱乐	${\rm{COAE}}\_{\rm{BEST}}$	0.130	0.096	0.064
	${\rm{WAM\_U}}$	0.081	0.120	0.096
	${\rm{WAM\_I}}$	0.076	0.113	0.091
	MULTI	0.118	0.176	0.141
金融证券	${\rm{COAE}}\_{\rm{BEST}}$	0.044	0.121	0.049
	${\rm{WAM\_U}}$	0.048	0.079	0.060
	${\rm{WAM\_I}}$	0.035	0.058	0.044
	MULTI	0.175	0.289	0.218

下载: 导出CSV

参考文献(20)

[1]	黄萱菁, 赵军.中文文本情感倾向性分析.中国计算机学会通讯, 2008, 4(2): 39-47 Huang Xuan-Jing, Zhao Jun. Chinese text sentiment orientation analysis. Communications of the CCF, 2008, 4(2): 39-47
[2]	Zhuang L, Jing F, Zhu X Y. Movie review mining and summarization. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. Arlington, Virginia, USA: ACM, 2006. 43-50
[3]	Wang B, Wang H F. Bootstrapping both product features and opinion words from Chinese customer reviews with cross-inducing. In: Proceedings of the 3rd International Joint Conference on Natural Language Processing. Hyderabad, India, 2008. 289-295
[4]	Kim S M, Hovy E. Identifying opinion holders for question answering in opinion texts. In: Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains. Pennsylvania, USA: AAAI, 2005. 1367-1373
[5]	Jin W, Ho H H, Srihari R K. OpinionMiner: a novel machine learning system for web opinion mining and extraction. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France: ACM, 2009. 1195-1204
[6]	Wu Y B, Zhang Q, Huang X J, Wu L D. Phrase dependency parsing for opinion mining. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2009. 1533-1541
[7]	Jakob N, Gurevych I. Extracting opinion targets in a single-and cross-domain setting with conditional random fields. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Vancouver, British Columbia, Canada: Association for Computational Linguistics, 2010. 1035-1045
[8]	Hu M Q, Liu B. Mining and summarizing customer reviews. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, USA: ACM, 2004. 168-177
[9]	Qiu G, Liu B, Bu J J, Chen C. Opinion word expansion and target extraction through double propagation. Computational Linguistics, 2011, 37(1): 9-27 doi: 10.1162/coli_a_00034
[10]	Xu L H, Liu K, Lai S W, Chen Y B, Zhao J. Mining opinion words and opinion targets in a two-stage framework. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria: ACL, 2013. 1764-1773
[11]	Liu K, Xu L H, Zhao J. Syntactic patterns versus word alignment: extracting opinion targets from online reviews. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria: ACL, 2013. 1754-1763
[12]	Liu K, Xu L H, Zhao J. Opinion target extraction using word-based translation model. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Jeju Island, South Korea: Association for Computational Linguistics, 2012. 1346-1356
[13]	陈兴俊, 魏晶晶, 廖祥文, 简思远, 陈国龙.基于词对齐模型的中文评价对象与评价词抽取.山东大学学报 (理学版), 2016, 51(1): 58-64, 70 http://www.cnki.com.cn/Article/CJFDTOTAL-SDDX201601008.htm Chen Xing-Jun, Wei Jing-Jing, Liao Xiang-Wen, Jian Si-Yuan, Chen Guo-Long. Extraction of opinion targets and opinion words from Chinese sentences based on word alignment model. Journal of Shandong University (Natural Science), 2016, 51(1): 58-64, 70 http://www.cnki.com.cn/Article/CJFDTOTAL-SDDX201601008.htm
[14]	Zhou X J, Wan X J, Xiao J G. Cross-language opinion target extraction in review texts. In: Proceedings of the 12th IEEE International Conference on Data Mining. Brussels, Belgium: IEEE, 2012. 1200-1205
[15]	Li S S, Wang R Y, Zhou G D. Opinion target extraction using a shallow semantic parsing framework. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI, 2012. 1671-1677
[16]	Huang H, Liu Q T, Huang T. Appraisal expression recognition based on generalized mutual information. Journal of Computers, 2013, 8(7): 1715-1721 https://www.researchgate.net/publication/272798717_Appraisal_Expression_Recognition_Based_on_Generalized_Mutual_Information
[17]	赵妍妍, 秦兵, 车万翔, 刘挺.基于句法路径的情感评价单元识别.软件学报, 2011, 22(5): 887-898 doi: 10.3724/SP.J.1001.2011.03767 Zhao Yan-Yan, Qin Bing, Che Wan-Xiang, Liu Ting. Appraisal expression recognition based on syntactic path. Journal of Software, 2011, 22(5): 887-898 doi: 10.3724/SP.J.1001.2011.03767
[18]	Li F T, Pan S J, Jin O, Yang Q, Zhu X Y. Cross-domain co-extraction of sentiment and topic lexicons. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Jeju Island, South Korea: Association for Computational Linguistics, 2012. 410-419
[19]	Brown P F, Della Pietra V J, Della Pietra S A, Mercer R L. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 1993, 19(2): 263-311 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.8919
[20]	舒万里. 中文领域本体学习中概念和关系抽取的研究[硕士学位论文], 重庆大学, 中国, 2012 Shu Wan-Li. Research on Concept and Relation Extraction of Chinese Domain Ontology [Master dissertation], Chongqing University, China, 2012

施引文献

资源附件(0)

访问统计

图(7) / 表(4)

计量

文章访问数: 2989
HTML全文浏览量: 293
PDF下载量: 798
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

基于多层关系图模型的中文评价对象与评价词抽取方法

doi: 10.16383/j.aas.2017.c160060

通讯作者:
陈国龙福州大学数学与计算机科学学院教授.主要研究方向为智能信息处理.本文通信作者.E-mail:cgl@fzu.edu.cn

计量

A Multi-layer Relation Graph Model for Extracting Opinion Targets and Opinion Words

Corresponding author: CHEN Guo-Long Professor at the College of Mathematics and Computer Science, Fuzhou University. His main research interest is intelligent information processing. Corresponding author of this paper

计量

目录

留言板

基于多层关系图模型的中文评价对象与评价词抽取方法

doi: 10.16383/j.aas.2017.c160060

通讯作者: 陈国龙 福州大学数学与计算机科学学院教授.主要研究方向为智能信息处理.本文通信作者.E-mail:cgl@fzu.edu.cn

计量

出版历程

A Multi-layer Relation Graph Model for Extracting Opinion Targets and Opinion Words

Corresponding author: CHEN Guo-Long Professor at the College of Mathematics and Computer Science, Fuzhou University. His main research interest is intelligent information processing. Corresponding author of this paper

计量

出版历程

目录

通讯作者:
陈国龙福州大学数学与计算机科学学院教授.主要研究方向为智能信息处理.本文通信作者.E-mail:cgl@fzu.edu.cn