Graph-based Features for Identifying Spammers in Microblog Networks
Abstract: As spammer strategies evolve, traditional detection methods based on user content and user behavior are becoming less effective at identifying new kinds of social network spammers. Spammers can change the content of their posts and their forwarding behavior to evade detection, but they cannot easily change how they are connected to normal users in the network, so the resulting relationship graph is relatively stable. Relationship features are therefore more robust and more accurate for spammer detection than content-based and behavior-based features. This paper proposes a method for identifying microblog spammer accounts based on features of the user relationship graph. In the experiments, a crawler collected network data from Sina Weibo; attribute features, time features, and relationship graph features were then extracted for each user; finally, three machine learning algorithms were used to classify the users. Simulation results show that adding the new features improves the accuracy and recall of spammer account identification by more than 5%, which verifies the effectiveness of relationship graph features for spammer detection.
Key words:
- Microblog network
- machine learning
- spammers
- graph-based feature
- classifier
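
The abstract does not say which relationship graph features are extracted, so the following Python sketch is only an illustration of what such features could look like. It assumes the crawled data is available as (follower, followee) pairs and computes four common per-user quantities: in-degree, out-degree, follow reciprocity, and a local clustering coefficient over the undirected neighbourhood. The function names `build_graph` and `graph_features`, the choice of features, and the sample edges are all assumptions made for illustration, not the paper's actual feature set.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(follow_edges):
    """Build adjacency sets from (follower, followee) pairs (hypothetical input format)."""
    following = defaultdict(set)  # user -> accounts the user follows
    followers = defaultdict(set)  # user -> accounts that follow the user
    for src, dst in follow_edges:
        if src == dst:
            continue  # ignore self-follows
        following[src].add(dst)
        followers[dst].add(src)
    return following, followers


def graph_features(user, following, followers):
    """Return illustrative relationship-graph features for one user."""
    out_nb = following.get(user, set())
    in_nb = followers.get(user, set())

    # Reciprocity: share of followed accounts that follow back.
    mutual = out_nb & in_nb
    reciprocity = len(mutual) / len(out_nb) if out_nb else 0.0

    # Local clustering: fraction of neighbour pairs linked in either direction.
    neighbours = out_nb | in_nb
    k = len(neighbours)
    links = sum(
        1
        for a, b in combinations(neighbours, 2)
        if b in following.get(a, ()) or a in following.get(b, ())
    )
    clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0

    return {
        "in_degree": len(in_nb),
        "out_degree": len(out_nb),
        "reciprocity": reciprocity,
        "clustering": clustering,
    }


if __name__ == "__main__":
    # Toy follow graph: "s" follows everyone but nobody follows back.
    edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b"),
             ("s", "a"), ("s", "b"), ("s", "c")]
    following, followers = build_graph(edges)
    for uid in ("a", "s"):
        print(uid, graph_features(uid, following, followers))
```

A feature dictionary of this kind could be concatenated with attribute and time features before being passed to a classifier. An account that follows many users but is followed back by none (zero reciprocity, zero in-degree) is the sort of pattern such features are meant to expose.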