Bi-path Evolution Model for Online Topic Model Based on LDA

Cao Jian-Ping; Wang Hui; Xia You-Qing; Qiao Feng-Cai; Zhang Xin

doi:10.3724/SP.J.1004.2014.02877


1. College of Information System and Management, National University of Defense Technology, Changsha 410073

Funds:

Supported by National Natural Science Foundation of China (61105124, 60902091)


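About the authors:
Wang Hui, Professor at the Research Center of Computational Experiments and Parallel Systems Technology, National University of Defense Technology. Research interests: multimedia intelligence analysis and data mining. E-mail: huiwang@nudt.edu.cn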

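Corresponding author:
Cao Jian-Ping, Ph.D. candidate at the College of Information System and Management, National University of Defense Technology. Research interests: text analysis and parallel systems theory. E-mail: caojianping@nudt.edu.cn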

Publication history
- Received: 2013-01-11
- Revised: 2013-09-12
- Published: 2014-12-20


Abstract

Online public opinion analysis and open-source intelligence analysis must process large volumes of time-sensitive text arriving as data streams. For such streams, we propose a bi-path evolution online topic model based on LDA (latent Dirichlet allocation), called BPE-OLDA (bi-path evolution online-LDA). When generating the texts of the next time slice, the model accounts for both content inheritance and intensity inheritance from earlier slices, which closely mimics how people compose time-sensitive texts. To estimate the model parameters, we simplify the Gibbs sampling algorithm. Experiments show that, with the simplified online Gibbs resampling algorithm, BPE-OLDA is clearly effective at extracting topics from time-sensitive text streams and performs better than other approaches.
- Time-sensitive /
- intensity influence /
- Gibbs sampling /
- latent Dirichlet allocation (LDA)
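The abstract does not give BPE-OLDA's update equations, so the Python sketch below is only a rough illustration of how two inheritance paths could be wired into an online LDA loop. It assumes, hypothetically, that the content path turns each topic's word distribution from the previous time slice into pseudo-counts in the next slice's topic-word Dirichlet prior, and that the intensity path raises the document-topic prior of topics that were prominent (high average share) in the previous slice. The functions `gibbs_lda_slice`, `next_slice_priors`, and `run_stream` and the parameters `content_weight` and `intensity_weight` are hypothetical names. The sampler is the standard collapsed Gibbs sampler for LDA, not the simplified online Gibbs resampling algorithm described in the paper.

```python
import random


def gibbs_lda_slice(docs, n_topics, n_words, alpha, beta, n_iters=100, seed=0):
    """Standard collapsed Gibbs sampling for LDA on one time slice.

    docs  : list of documents, each a list of word ids in [0, n_words)
    alpha : per-topic document-topic prior (length n_topics)
    beta  : per-topic topic-word prior (n_topics x n_words)
    Returns point estimates (phi, theta).
    """
    rng = random.Random(seed)
    beta_sum = [sum(row) for row in beta]
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                              # tokens per topic
    ndk = [[0] * n_topics for _ in docs]             # doc-topic counts
    z = []                                           # topic assignment per token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            nkw[k][w] += 1
            nk[k] += 1
            ndk[d][k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts before resampling it.
                nkw[k][w] -= 1
                nk[k] -= 1
                ndk[d][k] -= 1
                weights = [
                    (ndk[d][j] + alpha[j]) * (nkw[j][w] + beta[j][w]) / (nk[j] + beta_sum[j])
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                nkw[k][w] += 1
                nk[k] += 1
                ndk[d][k] += 1
    phi = [[(nkw[k][w] + beta[k][w]) / (nk[k] + beta_sum[k]) for w in range(n_words)]
           for k in range(n_topics)]
    alpha_sum = sum(alpha)
    theta = [[(ndk[d][k] + alpha[k]) / (len(doc) + alpha_sum) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    return phi, theta


def next_slice_priors(phi_prev, theta_prev, base_alpha, base_beta,
                      content_weight, intensity_weight):
    """Hypothetical bi-path prior update (an assumption, not the paper's equations)."""
    n_topics = len(phi_prev)
    # Content path: previous topic-word distributions become pseudo-counts.
    beta = [[base_beta + content_weight * p for p in row] for row in phi_prev]
    # Intensity path: each topic's average share across the previous slice's documents.
    intensity = [sum(th[k] for th in theta_prev) / len(theta_prev) for k in range(n_topics)]
    alpha = [base_alpha + intensity_weight * s for s in intensity]
    return alpha, beta, intensity


def run_stream(slices, n_topics, n_words, base_alpha=0.1, base_beta=0.01,
               content_weight=1.0, intensity_weight=1.0, n_iters=100):
    """Process time slices in order, carrying both paths forward."""
    alpha = [base_alpha] * n_topics
    beta = [[base_beta] * n_words for _ in range(n_topics)]
    results = []
    for t, docs in enumerate(slices):
        phi, theta = gibbs_lda_slice(docs, n_topics, n_words, alpha, beta, n_iters, seed=t)
        results.append((phi, theta))
        alpha, beta, _ = next_slice_priors(phi, theta, base_alpha, base_beta,
                                           content_weight, intensity_weight)
    return results


if __name__ == "__main__":
    # Toy stream: two slices over a 4-word vocabulary.
    stream = [[[0, 0, 1, 1], [2, 3, 2, 3]], [[0, 1, 0, 1], [3, 2, 3, 2]]]
    for t, (phi, theta) in enumerate(run_stream(stream, n_topics=2, n_words=4)):
        print(t, [[round(p, 2) for p in row] for row in phi])
```

Keeping topic index `k` aligned across slices is what lets both paths carry a topic forward. In this sketch, `content_weight` controls how strongly a topic's vocabulary persists and `intensity_weight` controls how strongly its prominence persists; setting both to zero reduces each slice to an independent LDA run.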


