A Short Text Sentiment-topic Model for Product Review Analysis

XIONG Shu-Feng, JI Dong-Hong

Citation: XIONG Shu-Feng, JI Dong-Hong. A Short Text Sentiment-topic Model for Product Review Analysis. ACTA AUTOMATICA SINICA, 2016, 42(8): 1227-1237. doi: 10.16383/j.aas.2016.c150591

Funds:

National Natural Science Foundation of China 61373108, 61173062, 61133012

The Major Program of the National Social Science Foundation of China 11&ZD189

More Information
    Author Bio:

    XIONG Shu-Feng Ph. D. candidate at the Computer School of Wuhan University and lecturer at PingDingShan University. His research interest covers natural language processing, machine learning, and opinion mining. E-mail: xsf@whu.edu.cn

    Corresponding author:

    JI Dong-Hong Professor at the Computer School of Wuhan University. His research interest covers natural language processing, data mining, and biological information processing. Corresponding author of this paper. E-mail: dhji@whu.edu.cn
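  • Abstract: Joint sentiment-topic generative models have been applied successfully to online review analysis. However, with the widespread use of smart terminal devices, screen and input limitations make user-written reviews shorter and shorter, so the text sparsity problem in short reviews has to be faced. This paper proposes SSTM (Short-text sentiment-topic model), a joint sentiment-topic model for short texts, to address the sparsity problem. Unlike the document-level generative process usually adopted in general topic models, we directly model the generation of the whole corpus. While generating the document collection, we sample one word pair at a time, and the two words in a pair share the same sentiment polarity and topic. We apply SSTM to two real-world online review datasets. Across three experimental tasks, qualitative analysis verifies the effectiveness of topic discovery, and a quantitative comparison with classical methods shows that SSTM also improves document-level sentiment classification considerably.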
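The abstract says the model works on word pairs pooled over the whole corpus rather than on individual documents. The sketch below illustrates one way such a pair collection could be built. It assumes that every unordered pair of words within a review counts as a word pair, which the abstract does not state; the function names and the toy reviews are hypothetical.

```python
from itertools import combinations


def extract_word_pairs(tokens):
    """Return every unordered word pair (w_i, w_j), i < j, of one tokenized review.

    Assumption: all within-review pairs are used; the source does not say
    how pairs are selected.
    """
    return list(combinations(tokens, 2))


def build_pair_corpus(reviews):
    """Pool the word pairs of all reviews into one corpus-level collection."""
    pairs = []
    for tokens in reviews:
        pairs.extend(extract_word_pairs(tokens))
    return pairs


# Toy, made-up reviews (already tokenized).
reviews = [
    ["screen", "clear", "battery"],
    ["battery", "short"],
]
B = build_pair_corpus(reviews)
print(len(B))
print(B)
```

Pooling pairs across reviews is what lets very short texts contribute co-occurrence evidence: a two-word review still yields one pair, while a one-word review yields none.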
  • Fig. 1  Two opinion texts

    Fig. 2  SSTM model

    Fig. 3  JST model

    Fig. 4  ASUM model

    Fig. 5  Impact of the number of topics on the sentiment identification performance of the three topic models

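    Table 1  Meanings of the notations

    | Symbol | Description | Symbol | Description |
    |---|---|---|---|
    | $D$ | Number of documents | $\beta$ | Asymmetric Dirichlet prior parameter of $\phi$, $\beta = \{\{\{\beta_{z,l,i}\}_{k=1}^{T}\}_{l=1}^{S}\}_{i=1}^{V}$ |
    | $M$ | Number of word pairs | | |
    | $T$ | Number of topics | $\alpha$ | Dirichlet prior parameter of $\theta$ |
    | $S$ | Number of sentiment polarities | $\gamma$ | Dirichlet prior parameter of $\pi$ |
    | $V$ | Vocabulary size | $\Theta$ | Multinomial distribution over topics |
    | $b$ | Word pair, $b = (w_i, w_j)$ | $z_t$ | Topic of the $t$-th word |
    | $w$ | Word | $l_t$ | Sentiment polarity label of the $t$-th word |
    | $z$ | Topic | $B$ | Set of word pairs |
    | $l$ | Sentiment polarity label | $\{z_{-t}\}$ | Topics of all words except the $t$-th word |
    | $\pi_{k,l}$ | Distribution over topic $k$ and sentiment polarity $l$ | $\{l_{-t}\}$ | Sentiment polarities of all words except the $t$-th word |
    | $\Pi$ | Multinomial distribution over sentiment polarity labels | $N_{k,l,i}$ | Number of times $w_i$ is assigned to topic $k$ and sentiment polarity $l$ |
    | $\phi_{k,l,w}$ | Distribution of word $w$ given topic $k$ and sentiment polarity $l$ | $N_{k,l}$ | Number of words assigned to topic $k$ and sentiment polarity $l$ |
    | $\Phi$ | Multinomial distribution over words | $N'(\cdot)$ | Sentence count |
    | $\theta_k$ | Distribution of topic $k$ | $N_k$ | Number of words in topic $k$ |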
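    The notation in Table 1 together with the abstract suggests a corpus-level generative story: a topic is drawn for each word pair, then a sentiment polarity given that topic, then both words from the word distribution of that topic and polarity. The sketch below is one reading of that story, not the authors' specification. It assumes a single corpus-level topic distribution, one sentiment distribution per topic, and symmetric scalar priors (Table 1 describes the prior on the word distributions as asymmetric); all function names and parameter values are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_word_pairs(M, T, S, V, alpha, gamma, beta, seed=0):
    """Forward-sample M word pairs as (topic, sentiment, word_i, word_j) tuples."""
    rng = random.Random(seed)
    # Assumed: one topic distribution shared by the whole corpus.
    theta = sample_dirichlet([alpha] * T, rng)
    # Assumed: one sentiment distribution per topic.
    pi = [sample_dirichlet([gamma] * S, rng) for _ in range(T)]
    # One word distribution per (topic, sentiment); symmetric prior for simplicity.
    phi = [[sample_dirichlet([beta] * V, rng) for _ in range(S)] for _ in range(T)]

    pairs = []
    for _ in range(M):
        z = sample_index(theta, rng)         # topic of the pair
        l = sample_index(pi[z], rng)         # sentiment polarity given the topic
        w_i = sample_index(phi[z][l], rng)   # both words share z and l
        w_j = sample_index(phi[z][l], rng)
        pairs.append((z, l, w_i, w_j))
    return pairs


pairs = generate_word_pairs(M=50, T=3, S=2, V=6, alpha=1.0, gamma=1.0, beta=0.5)
print(pairs[:5])
```

    Inference would run in the opposite direction, recovering the topic and polarity of each observed pair; the count symbols in Table 1 point to a count-based sampler for that step, which this sketch does not attempt.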

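    Table 2  Statistics of the text corpus

    | | Laptop | Mobile phone |
    |---|---|---|
    | Average words per document | 20 | 32 |
    | Number of reviews | 3 988 | 2 289 |
    | Vocabulary size | 7 964 | 8 787 |
    | Number of positive reviews | 1 993 | 1 146 |
    | Number of negative reviews | 1 995 | 1 943 |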

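    Table 3  Example topics discovered from LAPTOP dataset

    Methods: SSTM, BTM, LDA. Topics listed under each method: appearance, battery, heat dissipation.
    fingerprint, battery, heat dissipation, battery, heat dissipation, easy, battery
    piano, hour, easy, time, fingerprint, hour, heat dissipation
    pretty, temperature, fingerprint, hour, good, casing, time, sound
    baked paint, relatively, keyboard, keyboard, battery, piano, fan
    time, baked paint, relatively, baked paint, battery life
    mold, battery life, CPU, relatively, surface, relatively, temperature
    screen, use, hard disk, good, temperature, highlight, use
    casing, internet, fan, casing, good, sound, feel, keyboard, run
    text, machine, piano, use, use, compact
    hehe, relatively, screen, battery life, CPU, screen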

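    Table 4  Example topics discovered from MOBILE dataset

    Methods: SSTM, BTM, LDA. Topics listed under each method: photography, media playback, screen.
    shooting, playback, screen, pixel, MP3, screen, effect, support, screen
    function, speed, camera, playback, camera, MP3, display
    support, good, display, shooting, earphone, display, pixel, playback, relatively
    screen, audio-video, digital, effect, TFT, photography, memory, color
    pixel, phone, effect, phone, effect, photo, Bluetooth
    material, processor, colorful, support, music, color, shooting, clear
    photo, format, design, phone, format
    camera, MP3, TFT, effect, function, digital, expansion, ringtone
    photography, smooth, device, camera, good, 260 thousand, camera, file, convenient
    digital, file, photography, relatively, pixel, video, TFT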

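    Table 5  CM (%) on laptop dataset

    | Method | Annotator 1 | Annotator 2 | Annotator 3 | Annotator 4 | Average |
    |---|---|---|---|---|---|
    | LDA | 58 | 50 | 60 | 56 | 56 |
    | BTM | 70 | 66 | 75 | 72 | 70.75 |
    | SSTM | 69 | 64 | 72 | 67 | 68 |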

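    Table 6  CM (%) on mobile dataset

    | Method | Annotator 1 | Annotator 2 | Annotator 3 | Annotator 4 | Average |
    |---|---|---|---|---|---|
    | LDA | 69 | 65 | 71 | 74 | 69.75 |
    | BTM | 76 | 74 | 81 | 81 | 78 |
    | SSTM | 75 | 72 | 79 | 78 | 76 |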
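    The Average columns in Tables 5 and 6 are consistent with the arithmetic mean of the four annotators' scores. The short check below recomputes them from the per-annotator values in the tables; the variable and function names are illustrative only.

```python
from statistics import mean

# Per-annotator CM scores (%) copied from Tables 5 and 6.
cm_laptop = {
    "LDA": [58, 50, 60, 56],
    "BTM": [70, 66, 75, 72],
    "SSTM": [69, 64, 72, 67],
}
cm_mobile = {
    "LDA": [69, 65, 71, 74],
    "BTM": [76, 74, 81, 81],
    "SSTM": [75, 72, 79, 78],
}


def annotator_average(scores):
    """Return the mean score over annotators for each method."""
    return {method: mean(values) for method, values in scores.items()}


avg_laptop = annotator_average(cm_laptop)
avg_mobile = annotator_average(cm_mobile)
print(avg_laptop)
print(avg_mobile)
```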

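    Table 7  Example sentiment-specific topics discovered by SSTM

    Datasets: laptop, mobile phone. Each dataset is split into positive and negative topics.
    Topic labels: express delivery, cost-performance, appearance, workmanship, after-sales, ringtone, appearance, keys, input method, signal
    speed, good, somewhat, phone call, ringtone, design, keys, SMS, signal
    item, price, pretty, disable, service, good, appearance, hand feel, input method, network
    JD, machine, like, touchpad, earphone, good, feel
    quality, cheap, need, customer service, switch
    appearance, casing, delivery, sound, feel, operation, pinyin, detection
    shipping, performance, notebook, lid, express delivery, like, good, digit, mobile
    problem, good, music, pretty, easy, troublesome, shutdown
    relatively, computer, old version, stylish, use, quality
    very fast, great value, keyboard, attitude, ear, hand feel, joystick, malfunction
    delivery, price cut, suitable, flaw, front desk, effect, body, comfortable, punctuation, call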

    Table 8  Sentiment identification results (the number of topics is 25)

    | | Baseline | JST | ASUM | SSTM | SVM (Uni) | SVM (Bi) |
    |---|---|---|---|---|---|---|
    | Laptop | 0.637645 | 0.50677 | 0.57754 | 0.65503 | 0.66047 | 0.70021 |
    | Mobile phone | 0.602188 | 0.53698 | 0.43694 | 0.64201 | 0.64476 | 0.68953 |
Publication history
  • Received:  2015-09-15
  • Accepted:  2015-12-28
  • Published:  2016-08-01
