Abstract: Social networks are closely tied to daily life, and user behavior on them can be used to detect event-related bursts and thereby locate the time interval in which each event occurs. However, user behavior is easily affected by subjective and external factors, which sometimes produces latent event-related bursts that are hard to detect. To address the detection of latent event-related bursts in social networks, we build on burst detection with social behavior features by introducing keyword features and dynamically adjusting the candidate keywords of each time window. Binding different events to different keyword features avoids interference between events and from noise, so that latent event-related bursts can be identified more accurately. Experimental results show that the proposed method improves the performance of event-related burst detection in social networks compared with existing algorithms.
Table 1 The experimental results of different algorithms on dataset HD

| Method | Feature/Strategy | $P$ | $R$ | $F$ |
|---|---|---|---|---|
| Single | all | 0.9000 | 0.3846 | 0.5389 |
| | post | 0.8352 | 0.3462 | 0.4894 |
| | repost | $\textbf{0.9902}$ | $\textbf{0.5385}$ | $\textbf{0.6976}$ |
| | url | 0.6803 | 0.3846 | 0.4914 |
| | user | 0.6573 | 0.4615 | 0.5423 |
| Multi | post+repost+url | $\textbf{0.9525}$ | $\textbf{0.6923}$ | $\textbf{0.8018}$ |
| Comb | conjunct | 1.0000 | 0.5385 | 0.7000 |
| | disjunct | $\textbf{0.8256}$ | $\textbf{0.9231}$ | $\textbf{0.8716}$ |
| | hybrid | 0.9949 | 0.6923 | 0.8165 |

Table 2 The experimental results of different algorithms on dataset BA

| Method | Feature/Strategy | $P$ | $R$ | $F$ |
|---|---|---|---|---|
| Single | all | $\textbf{0.9662}$ | $\textbf{0.4000}$ | $\textbf{0.5658}$ |
| | post | 0.9740 | 0.2000 | 0.3319 |
| | repost | 0.8640 | 0.3000 | 0.4454 |
| | url | 0.2574 | 0.1333 | 0.1757 |
| | user | 0.7346 | 0.3333 | 0.4586 |
| Multi | post+repost+url | $\textbf{0.8787}$ | $\textbf{0.4667}$ | $\textbf{0.6096}$ |
| Comb | conjunct | 0.9554 | 0.2667 | 0.4170 |
| | disjunct | $\textbf{0.9030}$ | $\textbf{0.5333}$ | $\textbf{0.6706}$ |
| | hybrid | 0.8051 | 0.5667 | 0.6652 |

Table 3 The experimental results with only keyword features

| Dataset | $P$ | $R$ | $F$ |
|---|---|---|---|
| HD | $\textbf{0.7709}$ | $\textbf{0.7692}$ | $\textbf{0.7701}$ |
| BA | $\textbf{0.6327}$ | $\textbf{0.3667}$ | $\textbf{0.4643}$ |

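The $F$ column in the result tables appears to be the balanced F-measure, i.e. the harmonic mean of $P$ and $R$. The source does not state the formula, so this is an assumption, but the reported values are consistent with it. A minimal sketch that checks the two rows of Table 3 (the function name `f_measure` is hypothetical):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure).

    Assumption: this is the definition behind the F column; the source
    does not give the formula explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, F) exactly as reported in Table 3
reported = {
    "HD": (0.7709, 0.7692, 0.7701),
    "BA": (0.6327, 0.3667, 0.4643),
}

for dataset, (p, r, f) in reported.items():
    # Print the recomputed value next to the reported one for comparison
    print(dataset, round(f_measure(p, r), 4), f)
```

Because $P$ and $R$ are themselves rounded to four decimals, the recomputed value can differ from the reported $F$ in the last digit.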
Table 4 Extracted keywords of event $A$ and $B$
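
| Time window | Keywords (Top 3) |
|---|---|
| 2015-10-21 19:00 | 恒大 (Evergrande), 决赛 (final), 亚冠 (AFC Champions League), 广州 (Guangzhou) |
| 2015-10-21 20:00 | 恒大 (Evergrande), 决赛 (final), 亚冠 (AFC Champions League), 广州 (Guangzhou) |
| 2015-10-21 21:00 | 恒大 (Evergrande), 决赛 (final), 亚冠 (AFC Champions League), 进 (enter/advance) |
| 2015-10-22 19:00 | 恒大 (Evergrande), 英国 (UK), 峰会 (summit), 工商 (business) |
| 2015-10-22 20:00 | 恒大 (Evergrande), 集团 (group), 英国 (UK), 峰会 (summit) |
| 2015-10-22 21:00 | 恒大 (Evergrande), 英国 (UK), 峰会 (summit), 工商 (business) |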
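Table 4 illustrates the idea of re-selecting candidate keywords for each time window, so that two events sharing a term such as 恒大 are still separated by their other keywords. The sketch below shows one simple way to produce per-window keyword lists. It uses plain term frequency as a stand-in, which is an assumption and not necessarily the extraction method used in the paper; the function name `top_keywords_per_window` and the toy posts are hypothetical.

```python
from collections import Counter, defaultdict


def top_keywords_per_window(posts, k=3):
    """Return the k most frequent tokens for each time window.

    posts: iterable of (window_label, list_of_tokens) pairs.
    Assumption: raw frequency is used as the ranking score.
    """
    counts = defaultdict(Counter)
    for window, tokens in posts:
        counts[window].update(tokens)
    return {
        window: [token for token, _ in counter.most_common(k)]
        for window, counter in counts.items()
    }


# Toy, made-up posts (not data from the paper), already tokenized
posts = [
    ("2015-10-21 19:00", ["恒大", "决赛", "亚冠"]),
    ("2015-10-21 19:00", ["恒大", "决赛"]),
    ("2015-10-21 19:00", ["恒大"]),
    ("2015-10-22 19:00", ["恒大", "英国", "峰会"]),
    ("2015-10-22 19:00", ["英国", "峰会"]),
    ("2015-10-22 19:00", ["英国"]),
]

keywords = top_keywords_per_window(posts, k=3)
for window, words in keywords.items():
    print(window, words)
```

Recomputing the list for every window lets the keyword set drift with the conversation, which is what allows the same shared term to be bound to different events on different days.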
