Short Text Sentiment Classification Based on Context Reconstruction

Yang Zhen; Lai Yingxu; Duan Lijuan; Li Yu

doi:10.3724/SP.J.1004.2012.00055

1. College of Computer Sciences, Beijing University of Technology, Beijing 100124

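Corresponding author:
Yang Zhen, associate professor at the College of Computer Sciences, Beijing University of Technology. Research interests include information content security, online public opinion analysis, and trusted computing. Corresponding author of this paper. E-mail: yangzhen@bjut.edu.cn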

Publication history
- Received: 2011-03-28
- Revised: 2011-07-07
- Published: 2012-01-20

Abstract

Abstract: Text is inherently ambiguous. When short texts offer only sparse features and no surrounding context, existing methods cannot resolve their meaning, which opens a wide semantic gap between low-level features and high-level interpretation. This paper mines the implicit relationships among texts through factors such as time, space, and association, reconstructs the context of each text, and thereby improves sentiment polarity classification. The approach is a two-stage process: 1) short texts are first reorganized into contexts (domains) according to their internal relationships; 2) each short text to be processed is assigned to a suitable context (domain) and labeled with a polarity tag by the classifier of that context. We first present a basic framework for short-text sentiment polarity classification built on a naive Bayes classifier and show how differences between contexts (domains) affect classification performance. We then discuss an enhanced classification method based on domain assignment, generalize the notion of domain to that of context, and propose a sentiment polarity classification method based on special contexts. To handle cases where missing information makes contexts hard to reorganize, we give a scheme based on a genetic algorithm for reorganizing arbitrary contexts. Theoretical analysis shows that, when certain constraints are satisfied, the method based on context reconstruction reduces both the sample error and the approximation error. Experimental results on real data sets confirm the theoretical analysis.
- Public opinion analysis /
- Short text processing /
- Sentiment classification /
- Error analysis /
- Genetic algorithm
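
The two-stage process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the contexts are already available as labels on the training texts, it assigns a new text to a context by simple vocabulary overlap (a stand-in chosen for brevity), and it tokenizes on whitespace. The names `NaiveBayes`, `train_by_context`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # polarity -> word -> count
        self.doc_counts = Counter()              # polarity -> number of texts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.split()  # assumption: whitespace tokenization
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in text.split():
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_by_context(samples):
    """Stage 1: group (context, text, polarity) samples, one model per context."""
    grouped = defaultdict(lambda: ([], []))
    for context, text, polarity in samples:
        grouped[context][0].append(text)
        grouped[context][1].append(polarity)
    return {c: NaiveBayes().fit(t, p) for c, (t, p) in grouped.items()}


def classify(models, text):
    """Stage 2: choose a context (assumed rule: largest vocabulary overlap),
    then classify the text with that context's own model."""
    tokens = set(text.split())
    context = max(models, key=lambda c: len(tokens & models[c].vocab))
    return context, models[context].predict(text)


# Toy data: "high" is favorable for a phone screen, unfavorable for a hotel price.
samples = [
    ("phone", "screen resolution high", "pos"),
    ("phone", "battery life short", "neg"),
    ("hotel", "room price high", "neg"),
    ("hotel", "room clean quiet", "pos"),
]
models = train_by_context(samples)
print(classify(models, "screen high"))
print(classify(models, "price high"))
```

In this toy data the word "high" carries opposite polarity in the two contexts, which is the kind of ambiguity that a single classifier trained on all texts would have to average over.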