Latent Event-related Burst Detection in Social Networks

doi: 10.16383/j.aas.2017.c160564

JIE Fei^1, XIE Fei^2, LI Lei^1, WU Xin-Dong^{1,3}

1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
2. Department of Computer Science and Technology, Hefei Normal University, Hefei 230601, China
3. School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette LA 70503, USA

Funds:

National Basic Research Program of China (973 Program) 2013CB329604

National Natural Science Foundation of China 61503116

National Natural Science Foundation of China 61503114

More Information

Author Bio:
JIE Fei  Ph. D. candidate at the School of Computer Science and Information Engineering, Hefei University of Technology. He received his bachelor's degree in engineering from Hefei University of Technology in 2014. His research interests cover data mining and social media analytics. E-mail: hfut_jf@163.com

XIE Fei  Associate professor in the Department of Computer Science and Technology, Hefei Normal University. He received his master's and Ph. D. degrees from Hefei University of Technology in 2007 and 2011, respectively. His research interests cover data mining and natural language processing. E-mail: xiefei9815057@sina.com

LI Lei  Associate professor at the School of Computer Science and Information Engineering, Hefei University of Technology. He received his Ph. D. degree in computing from Macquarie University, Australia in 2012. His research interests cover data mining, social computing, and graph computing. E-mail: lilei@hfut.edu.cn

Corresponding author:
WU Xin-Dong  Yangtze River Scholar, IEEE Fellow, AAAS Fellow, professor at the School of Computer Science and Information Engineering, Hefei University of Technology, and professor at the School of Computing and Informatics, University of Louisiana at Lafayette, USA. He received his Ph. D. degree in artificial intelligence from the University of Edinburgh, UK in 1993. His research interests cover data mining, knowledge-based systems, and Web information exploration. Corresponding author of this paper. E-mail: xwu@hfut.edu.cn

Publication history
- Received: 2016-07-31
- Accepted: 2017-03-21
- Published: 2018-04-20

Abstract

Abstract: Social networks are closely tied to daily life, and user behavior on them can be used to detect event-related bursts and thereby locate the time interval in which each event occurs. However, user behavior is easily affected by subjective and external factors, which sometimes produces latent event-related bursts that are difficult to detect. To address latent event-related bursts in social networks, this paper builds on event burst detection via social behavior features and introduces keyword features: the candidate keywords of each time window are adjusted dynamically so that different events are bound to different keyword features. This avoids interference between events and from noise, and allows latent event-related bursts to be identified accurately. Experimental results show that the proposed algorithm improves the performance of event-related burst detection in social networks compared with existing algorithms.

Keywords:
- burst
- event
- detection
- social network
Notes:

1) Recommended by Associate Editor ZHANG Min


Fig. 1 An example of a latent event-related burst

Fig. 2 Schematic diagram of the related definitions

Fig. 3 Schematic diagram of the role of keyword features

Fig. 4 Flow chart of the interval optimization algorithm

Fig. 5 Flow diagram of the event-related burst detection scheme in social networks

Fig. 6 Schematic diagram of the effect of the Comb method

Table 1 Experimental results of different algorithms on dataset HD

Method	Feature/Strategy	$P$	$R$	$F$
Single	all	0.9000	0.3846	0.5389
Single	post	0.8352	0.3462	0.4894
Single	repost	$\textbf{0.9902}$	$\textbf{0.5385}$	$\textbf{0.6976}$
Single	url	0.6803	0.3846	0.4914
Single	user	0.6573	0.4615	0.5423
Multi	post+repost+url	$\textbf{0.9525}$	$\textbf{0.6923}$	$\textbf{0.8018}$
Comb	conjunct	1.0000	0.5385	0.7000
Comb	disjunct	$\textbf{0.8256}$	$\textbf{0.9231}$	$\textbf{0.8716}$
Comb	hybrid	0.9949	0.6923	0.8165
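
The $F$ values reported for dataset HD are consistent with the balanced F-measure, that is, the harmonic mean of $P$ and $R$. The short check below is only a sketch: the function name `f_measure` and the dictionary `rows` are illustrative, and the three rows are copied from Table 1. Small differences in the last digit are expected because $P$ and $R$ are themselves rounded.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) for three rows of Table 1 (dataset HD).
rows = {
    "Single / all": (0.9000, 0.3846, 0.5389),
    "Single / repost": (0.9902, 0.5385, 0.6976),
    "Comb / disjunct": (0.8256, 0.9231, 0.8716),
}

for name, (p, r, reported) in rows.items():
    computed = f_measure(p, r)
    # Compare the recomputed value with the published one.
    print(f"{name}: computed F = {computed:.4f}, reported F = {reported:.4f}")
```
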

Table 2 Experimental results of different algorithms on dataset BA

Method	Feature/Strategy	$P$	$R$	$F$
Single	all	$\textbf{0.9662}$	$\textbf{0.4000}$	$\textbf{0.5658}$
Single	post	0.9740	0.2000	0.3319
Single	repost	0.8640	0.3000	0.4454
Single	url	0.2574	0.1333	0.1757
Single	user	0.7346	0.3333	0.4586
Multi	post+repost+url	$\textbf{0.8787}$	$\textbf{0.4667}$	$\textbf{0.6096}$
Comb	conjunct	0.9554	0.2667	0.4170
Comb	disjunct	$\textbf{0.9030}$	$\textbf{0.5333}$	$\textbf{0.6706}$
Comb	hybrid	0.8051	0.5667	0.6652
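
The strategy names conjunct and disjunct are not defined on this page. One plausible reading, offered here purely as an assumption, is that they intersect and unite the burst windows found by two detectors (for example, one based on behavior features and one based on keyword features). The toy sketch below uses made-up window indices, not data from the paper, to show why such an intersection tends to raise $P$ while a union tends to raise $R$, the pattern visible in the Comb rows. The hybrid strategy is not sketched because nothing here indicates how it is built.

```python
from typing import Set, Tuple


def conjunct(a: Set[int], b: Set[int]) -> Set[int]:
    """Assumed reading: keep only windows flagged by both detectors."""
    return a & b


def disjunct(a: Set[int], b: Set[int]) -> Set[int]:
    """Assumed reading: keep windows flagged by either detector."""
    return a | b


def precision_recall(detected: Set[int], truth: Set[int]) -> Tuple[float, float]:
    """Window-level precision and recall against a ground-truth set."""
    hits = len(detected & truth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Made-up window indices, for illustration only.
behavior = {2, 3, 7}      # windows flagged from behavior features
keyword = {3, 4, 7, 9}    # windows flagged from keyword features
truth = {3, 4, 7, 8}      # windows that truly contain an event burst

print("conjunct:", precision_recall(conjunct(behavior, keyword), truth))
print("disjunct:", precision_recall(disjunct(behavior, keyword), truth))
```
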

Table 3 Experimental results with keyword features only

Dataset	$P$	$R$	$F$
HD	$\textbf{0.7709}$	$\textbf{0.7692}$	$\textbf{0.7701}$
BA	$\textbf{0.6327}$	$\textbf{0.3667}$	$\textbf{0.4643}$

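Table 4 Extracted keywords of events $A$ and $B$

Time window	Keywords (Top 3)
2015-10-21 19:00	恒大 (Evergrande), 决赛 (final), 亚冠 (AFC Champions League), 广州 (Guangzhou)
2015-10-21 20:00	恒大 (Evergrande), 决赛 (final), 亚冠 (AFC Champions League), 广州 (Guangzhou)
2015-10-21 21:00	恒大 (Evergrande), 决赛 (final), 亚冠 (AFC Champions League), 进 (advance)
2015-10-22 19:00	恒大 (Evergrande), 英国 (UK), 峰会 (summit), 工商 (business)
2015-10-22 20:00	恒大 (Evergrande), 集团 (group), 英国 (UK), 峰会 (summit)
2015-10-22 21:00	恒大 (Evergrande), 英国 (UK), 峰会 (summit), 工商 (business)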
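
The keyword sets of events $A$ and $B$ share the term 恒大 but otherwise differ, which is what lets keyword features keep the two events apart. The sketch below is not the paper's algorithm. It assumes a simple hypothetical rule: a new event starts whenever the Jaccard similarity between the keyword sets of adjacent windows drops below a threshold. The names `jaccard`, `split_by_keywords`, and the threshold of 0.5 are illustrative choices; the keyword sets are the six rows of Table 4.

```python
from typing import List, Optional, Set, Tuple

Window = Tuple[str, Set[str]]  # (window label, extracted keywords)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_by_keywords(windows: List[Window], threshold: float = 0.5) -> List[List[str]]:
    """Group adjacent windows into events by keyword overlap.

    Hypothetical rule: start a new event when the similarity to the
    previous window's keywords falls below `threshold`.
    """
    events: List[List[str]] = []
    prev: Optional[Set[str]] = None
    for label, keywords in windows:
        if prev is None or jaccard(prev, keywords) < threshold:
            events.append([label])      # keywords changed: new event
        else:
            events[-1].append(label)    # keywords stable: same event
        prev = keywords
    return events


# Keyword sets taken from Table 4.
TABLE4: List[Window] = [
    ("2015-10-21 19", {"恒大", "决赛", "亚冠", "广州"}),
    ("2015-10-21 20", {"恒大", "决赛", "亚冠", "广州"}),
    ("2015-10-21 21", {"恒大", "决赛", "亚冠", "进"}),
    ("2015-10-22 19", {"恒大", "英国", "峰会", "工商"}),
    ("2015-10-22 20", {"恒大", "集团", "英国", "峰会"}),
    ("2015-10-22 21", {"恒大", "英国", "峰会", "工商"}),
]

for event in split_by_keywords(TABLE4):
    print(event)
```

With this threshold the three windows on 2015-10-21 fall into one group and the three on 2015-10-22 into another, even though 恒大 appears in every window.
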
