Bi-path Evolution Model for Online Topic Model Based on LDA

CAO Jian-Ping, WANG Hui, XIA You-Qing, QIAO Feng-Cai, ZHANG Xin

Citation: CAO Jian-Ping, WANG Hui, XIA You-Qing, QIAO Feng-Cai, ZHANG Xin. Bi-path Evolution Model for Online Topic Model Based on LDA. ACTA AUTOMATICA SINICA, 2014, 40(12): 2877-2886. doi: 10.3724/SP.J.1004.2014.02877

doi: 10.3724/SP.J.1004.2014.02877
Funds:

Supported by National Natural Science Foundation of China (61105124, 60902091)

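Details
    Author biography:

    WANG Hui Professor at the Research Center for Computational Experiments and Parallel Systems Technology, National University of Defense Technology. Research interests: multimedia intelligence analysis and data mining. E-mail: huiwang@nudt.edu.cn

    Corresponding author:

    CAO Jian-Ping Ph.D. candidate at the College of Information Systems and Management, National University of Defense Technology. Research interests: text analysis and parallel systems theory. Corresponding author of this paper. E-mail: caojianping@nudt.edu.cn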


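  • Abstract: Online public opinion analysis must process large volumes of highly time-sensitive text streams. For such streams, this paper proposes a bi-path evolution online topic model based on LDA (latent Dirichlet allocation), named BPE-OLDA (bi-path evolution online-LDA). When generating the text of the next time slice, the model takes into account both the content inheritance and the intensity inheritance of the text, which closely mimics how people produce time-sensitive text. The Gibbs sampling algorithm is simplified for estimating the model parameters. Experiments show that, with the simplified online Gibbs resampling algorithm, BPE-OLDA is clearly effective at extracting topics from highly time-sensitive text streams.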
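The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the BPE-OLDA model or its simplified sampler. It runs a standard collapsed Gibbs sampler for LDA on one time slice, then builds the priors for the next slice from the previous slice's topic-word distributions (standing in for content inheritance) and topic proportions (standing in for intensity inheritance). The function names `gibbs_slice` and `inherit_priors`, the linear inheritance rule, and the values of `alpha0`, `beta0`, and `weight` are all assumptions made for illustration.

```python
import random


def gibbs_slice(docs, n_topics, vocab_size, alpha, beta, n_iters=30, seed=0):
    """Standard collapsed Gibbs sampling for LDA on one time slice.

    docs:  list of documents, each a list of word ids in [0, vocab_size)
    alpha: per-topic document-topic prior (list of n_topics positive floats)
    beta:  per-topic, per-word prior (n_topics x vocab_size positive floats)
    Returns (phi, strength): topic-word distributions and topic proportions.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                                 # tokens per topic
    beta_sum = [sum(row) for row in beta]

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha[t])
                    * (n_kw[t][w] + beta[t][w])
                    / (n_k[t] + beta_sum[t])
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    phi = [
        [(n_kw[k][w] + beta[k][w]) / (n_k[k] + beta_sum[k])
         for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    total = sum(n_k)
    strength = [n_k[k] / total for k in range(n_topics)]
    return phi, strength


def inherit_priors(phi_prev, strength_prev, alpha0=0.1, beta0=0.01, weight=20.0):
    """Hypothetical inheritance rule (an assumption, not taken from the paper).

    The topic-word prior is pulled toward the previous slice's topic-word
    distributions, and the document-topic prior toward its topic proportions.
    """
    beta = [[beta0 + weight * p for p in row] for row in phi_prev]
    alpha = [alpha0 + weight * s for s in strength_prev]
    return alpha, beta


# Two toy time slices over a 4-word vocabulary (word ids 0..3).
slice_1 = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0, 2]]
slice_2 = [[2, 2, 3, 3], [3, 2, 0, 3]]
K, V = 2, 4
flat_alpha = [0.1] * K
flat_beta = [[0.01] * V for _ in range(K)]

phi_1, strength_1 = gibbs_slice(slice_1, K, V, flat_alpha, flat_beta)
alpha_2, beta_2 = inherit_priors(phi_1, strength_1)
phi_2, strength_2 = gibbs_slice(slice_2, K, V, alpha_2, beta_2)
```

Passing the previous slice's estimates in as priors keeps each slice's sampler independent of earlier raw text, which is the usual reason online variants of LDA are attractive for streams. How the paper actually weights the two inheritance paths is not stated in the abstract.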
Publication history
  • Received:  2013-01-11
  • Revised:  2013-09-12
  • Published:  2014-12-20
