Modeling and Analyzing Topic Evolution
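Summary: The text stream is divided, by temporal order, into document sets over consecutive time slices, and the sub-topics latent in each time slice are extracted online. A model selection method dynamically determines the number of sub-topics in each time slice, and sub-topic information from historical time slices serves as prior knowledge for discovering the current sub-topics. The sub-topics of each time slice are extracted with the OLDA (online latent Dirichlet allocation) model, and the topic model parameters are estimated by Gibbs sampling. For correlation analysis, five types of sub-topic evolution are defined: emergence, extinction, inheritance, splitting, and merging. A correlation analysis method based on relative entropy is proposed, which links sub-topics according to their semantic similarity and temporal order. A topic is composed of sub-topics that are related in both time and content, and topic evolution is described by the changes in sub-topic content and strength. Topic evolution experiments on real online news show that the proposed method effectively detects the evolution of both the content and the strength of online news topics.

Abstract: Topic evolution in online public opinion is investigated. Treating a topic as a set of correlated sub-topics, a topic evolution model is proposed that consists of sub-topic detection and correlation analysis. A sub-topic detection algorithm based on OLDA is presented, using Bayesian model selection to determine the appropriate number of topics and Gibbs sampling to estimate the parameters. Correlations between sub-topics are then defined for the analysis of topic evolution, covering the emergence, extinction, development, merging, and splitting of sub-topics. Experiments verify that the method is effective in detecting topic evolution in online public opinion.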
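The relative-entropy correlation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The symmetrized divergence, the `threshold` value, the function names, and the rules that map links to the five evolution types are all assumptions made for illustration; the abstract states only that sub-topics are linked by relative entropy and temporal order. Each sub-topic is assumed to be a word distribution over a shared vocabulary, as produced by an LDA-style model.

```python
import math
from collections import defaultdict


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) of two word distributions over the same vocabulary.

    eps is an assumed smoothing constant that keeps the logarithm finite
    when q assigns zero probability to a word.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized relative entropy (an assumption; the abstract does not say which form is used)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def link_subtopics(prev_topics, curr_topics, threshold=0.5):
    """Link sub-topic i of the previous time slice to sub-topic j of the current
    slice when their divergence falls below a hypothetical threshold."""
    return [(i, j)
            for i, p in enumerate(prev_topics)
            for j, q in enumerate(curr_topics)
            if symmetric_kl(p, q) < threshold]


def classify_evolution(n_prev, n_curr, links):
    """Map links between adjacent time slices to the five evolution types (assumed rules)."""
    successors = defaultdict(list)
    predecessors = defaultdict(list)
    for i, j in links:
        successors[i].append(j)
        predecessors[j].append(i)

    events = []
    for i in range(n_prev):
        if not successors[i]:
            events.append(("extinction", i, None))      # no successor
        elif len(successors[i]) > 1:
            events.append(("split", i, tuple(successors[i])))
    for j in range(n_curr):
        if not predecessors[j]:
            events.append(("emergence", None, j))       # no predecessor
        elif len(predecessors[j]) > 1:
            events.append(("merge", tuple(predecessors[j]), j))
    for i, j in links:
        if len(successors[i]) == 1 and len(predecessors[j]) == 1:
            events.append(("inheritance", i, j))        # one-to-one link
    return events


# Toy word distributions over a four-word vocabulary (made-up data).
prev_topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.0, 0.1, 0.9]]
curr_topics = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.9, 0.1, 0.0]]

links = link_subtopics(prev_topics, curr_topics)
events = classify_evolution(len(prev_topics), len(curr_topics), links)
```

In this toy example, the first sub-topic of each slice has nearly the same word distribution, so the pair is linked and classified as inheritance, while the remaining two sub-topics have no counterpart and are classified as extinction and emergence. In practice the threshold would have to be tuned on real data.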