Temporal Summarization Based on Biterm Dirichlet Process

XI Yao-Yi; LI Bi-Cheng; LI Tian-Cai; HUANG Shan-Qi

doi:10.16383/j.aas.2015.c150001

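1. Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450001;
2. Unit 65022, Shenyang 110162

Funds:

Supported by National Social Science Foundation of China (14BXW028)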


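About the author:
LI Bi-Cheng, Professor at the Institute of Information System Engineering, PLA Information Engineering University. Main research interests: text analysis and understanding, speech processing and recognition, image/video processing and recognition, and information fusion. E-mail: lbclm@gmail.com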

Publication history
- Received: 2015-01-04
- Revised: 2015-04-08
- Published: 2015-08-20

Abstract

Temporal summarization extracts sentences in chronological order to give an overview of how a topic evolves. Existing research either neglects the latent subtopics within sentences or fails to discover them accurately. To address this problem, we develop a novel topic model, the biterm Dirichlet process, and propose a temporal summarization method based on it. First, we obtain the subtopic distribution of each sentence through posterior inference on the model. Second, we calculate each sentence's relevance and novelty from its subtopic distribution. Finally, we extract, in chronological order, the sentences that are relevant to the topic and highly novel to form the temporal summary. Experiments show that our approach generates higher-quality temporal summaries than currently representative methods.
- Temporal summarization
- Dirichlet process
- biterm
- topic model
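The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. It skips training the biterm Dirichlet process and instead takes already-learned subtopic weights `theta` and subtopic-word distributions `phi` as input. It borrows the biterm topic model's inference rule, P(z|s) = Σ_b P(z|b) P(b|s), to turn a sentence's biterms into a subtopic distribution. The remaining choices are also hypothetical: relevance is the cosine similarity to the mean subtopic distribution, novelty is one minus the highest similarity to any sentence already selected, and `rel_min`, `nov_min` and `EPS` are illustrative parameters.

```python
from collections import Counter
from itertools import combinations
import math

EPS = 1e-12  # hypothetical smoothing for words unseen in a subtopic


def biterms(words):
    """All unordered word pairs drawn from distinct positions in one sentence."""
    return [tuple(sorted(pair)) for pair in combinations(words, 2)]


def sentence_subtopics(words, theta, phi):
    """Sentence subtopic distribution P(z|s) = sum_b P(z|b) * P(b|s).

    theta[z] is a global subtopic weight and phi[z] maps word -> P(w|z);
    both are assumed to come from an already-trained biterm model.
    """
    k = len(theta)
    counts = Counter(biterms(words))
    total = sum(counts.values())
    if total == 0:  # fewer than two words: fall back to uniform (assumption)
        return [1.0 / k] * k
    dist = [0.0] * k
    for (wi, wj), n in counts.items():
        # P(z|b) is proportional to theta_z * phi_z(wi) * phi_z(wj)
        w = [theta[z] * phi[z].get(wi, EPS) * phi[z].get(wj, EPS) for z in range(k)]
        norm = sum(w)
        for z in range(k):
            dist[z] += (w[z] / norm) * (n / total)  # weight by P(b|s)
    return dist


def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def temporal_summary(sentences, theta, phi, rel_min=0.4, nov_min=0.3):
    """sentences: list of (timestamp, words, text). Returns [(timestamp, text)]."""
    if not sentences:
        return []
    dists = [sentence_subtopics(ws, theta, phi) for _, ws, _ in sentences]
    k = len(theta)
    # Topic centroid: mean subtopic distribution over all sentences (assumption).
    centroid = [sum(d[z] for d in dists) / len(dists) for z in range(k)]
    summary, chosen = [], []
    # Visit sentences in chronological order, keeping relevant and novel ones.
    for i in sorted(range(len(sentences)), key=lambda j: sentences[j][0]):
        relevance = cosine(dists[i], centroid)
        # Novelty: 1 minus the highest similarity to any sentence already chosen.
        novelty = 1.0 - max((cosine(dists[i], d) for d in chosen), default=0.0)
        if relevance >= rel_min and novelty >= nov_min:
            summary.append((sentences[i][0], sentences[i][2]))
            chosen.append(dists[i])
    return summary
```

With two subtopics, a later sentence that only repeats an earlier sentence's subtopic is skipped as non-novel, while a sentence about the other subtopic is kept. The output follows timestamp order regardless of input order. Using a novelty threshold against already-selected sentences is one simple, MMR-style way to control redundancy. The paper's actual relevance and novelty formulas may differ.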