A Study of Collaborative Filtering Mechanisms in E-mail Networks

Yang Zhen (杨震); Lai Yingxu (赖英旭); Duan Lijuan (段立娟); Li Yujian (李玉鑑); Xu Xin (许昕)

doi:10.3724/SP.J.1004.2012.00399

1. College of Computer Science, Beijing University of Technology, Beijing 100124

Publication history
- Received: 2011-02-22
- Revised: 2011-10-17
- Published: 2012-03-20

Spam Collaborative Filtering in Enron E-mail Network

1. College of Computer Sciences, Beijing University of Technology, Beijing 100124

Abstract

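Abstract: An exploration of a real e-mail network built from the Enron e-mail corpus reveals that the network is scale-free and has a limited small-world property. On this basis, a collaborative spam filtering mechanism is designed around the interaction strength between users: by adjusting the parameter λ, each user can decide whether to rely mainly on his or her own filter or on collaboration with other users. Even when the algorithm has not been sufficiently trained on a user's personal reading habits, it still filters well through network collaboration based on interaction strength. In addition, because the Enron dataset lacks labels, an improved EM (Expectation maximization) algorithm is used, under the assumption that the training set W and the test set T are independent and identically distributed, to minimize the risk function over W ∪ T; this yields good labels for the unlabeled samples. Experiments on real data show that, compared with single-machine filtering and ensemble filtering, collaborative filtering improves average filtering accuracy while remaining simple and easy to implement.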
- Text classification /
- E-mail filtering /
- E-mail network /
- Collaborative filtering
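The abstract does not give the exact weighting formula behind the λ trade-off. A minimal sketch follows. It rests on two assumptions: interaction strength is simply the number of e-mails two users have exchanged, and λ is the weight a user places on his or her own filter. Under those assumptions, the sketch blends the user's own spam probability with the strength-weighted average of the contacts' probabilities, s_u = λ p_u + (1 − λ) Σ_v w_uv p_v / Σ_v w_uv. The names `interaction_strength` and `collaborative_score` are hypothetical, and the toy data is invented for illustration.

```python
from collections import defaultdict


def interaction_strength(emails):
    """Hypothetical strength measure: the number of messages exchanged
    between each pair of users, counted in both directions."""
    strength = defaultdict(lambda: defaultdict(int))
    for sender, recipient in emails:
        if sender != recipient:
            strength[sender][recipient] += 1
            strength[recipient][sender] += 1
    return strength


def collaborative_score(user, local_scores, strength, lam):
    """Blend a user's own spam probability with the strength-weighted
    average of the probabilities assigned by the user's contacts.

    local_scores maps each user to the spam probability that user's own
    filter gives the same message; lam is the weight on the user's own
    filter (assumed convention: lam = 1 means "rely only on myself")."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    own = local_scores[user]
    contacts = {v: s for v, s in strength.get(user, {}).items() if v in local_scores}
    total = sum(contacts.values())
    if total == 0:
        return own  # no contact has scored the message: use the local filter
    peer = sum(s * local_scores[v] for v, s in contacts.items()) / total
    return lam * own + (1.0 - lam) * peer


# Toy example: alice exchanges mail with bob far more often than with carol.
emails = [("alice", "bob")] * 3 + [("alice", "carol"), ("bob", "carol")]
strength = interaction_strength(emails)
local_scores = {"alice": 0.2, "bob": 0.9, "carol": 0.5}
for lam in (1.0, 0.5, 0.0):
    print(lam, round(collaborative_score("alice", local_scores, strength, lam), 3))
```

With λ = 1 alice relies only on her own filter (score 0.2). As λ falls toward 0, her score moves toward 0.8, the average of her contacts' scores weighted by how often she corresponds with each of them. A user with no recorded contacts falls back to the local score. This matches the abstract's claim that filtering can still work before a user's own reading habits are well learned.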
Abstract: Social network analysis of the Enron corpus shows that the real e-mail network is scale-free and, to a limited degree, a small world. Based on this finding, a collaborative spam filtering method is designed around the interaction strength between users. By adjusting the parameter λ, each user can choose to filter spam mainly with his or her own filter, mainly through other users, or with a trade-off between the two. Even when a user's reading habits have not been sufficiently learned, the collaborative filtering method still performs well. Because the Enron corpus is unlabeled, we label it with an improved EM (Expectation maximization) algorithm that, under the assumption that the training set W and the test set T are i.i.d., minimizes the statistical risk over W ∪ T. Experimental results show that the collaborative filtering method is simple and effective, and that it steadily increases average accuracy compared with single-machine and ensemble filtering.
- Text classification /
- spam filtering /
- e-mail network /
- collaborative filtering
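The abstract does not describe how the EM algorithm was improved, so what follows is only a reference point. It is a minimal sketch of classic semi-supervised EM with a multinomial naive Bayes model. That model maximizes a smoothed likelihood of W ∪ T rather than the risk function mentioned in the abstract. Documents are assumed to be bag-of-words `Counter`s. `fit_nb`, `posterior`, and `em_label` are hypothetical helper names, and the toy messages are invented.

```python
import math
from collections import Counter


def fit_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: multinomial naive Bayes with Laplace smoothing, where each
    document contributes to each class in proportion to its weight."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, weight in zip(docs, weights):
        for c, p in weight.items():
            prior[c] += p
            for word, n in doc.items():
                counts[c][word] += p * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def posterior(doc, log_prior, log_lik):
    """E-step for one document: P(class | doc), normalised in log space."""
    scores = {
        c: log_prior[c] + sum(n * log_lik[c][w] for w, n in doc.items() if w in log_lik[c])
        for c in log_prior
    }
    top = max(scores.values())
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}


def em_label(labeled, labels, unlabeled, classes, iters=10):
    """Label the unlabeled set T using the labeled set W. W keeps its hard
    labels; T receives soft labels that are re-estimated every round.
    Fitting one model to W and T together relies on the assumption that
    both are drawn i.i.d. from the same distribution."""
    vocab = set()
    for doc in labeled + unlabeled:
        vocab.update(doc)
    hard = [{y: 1.0} for y in labels]
    model = fit_nb(labeled, hard, classes, vocab)  # initialise from W only
    for _ in range(iters):
        soft = [posterior(d, *model) for d in unlabeled]  # E-step on T
        model = fit_nb(labeled + unlabeled, hard + soft, classes, vocab)  # M-step on W and T
    return [max(p, key=p.get) for p in (posterior(d, *model) for d in unlabeled)]


W = [Counter("cheap pills buy now".split()), Counter("meeting agenda attached".split())]
W_labels = ["spam", "ham"]
T = [Counter("buy cheap pills".split()), Counter("agenda for meeting".split())]
print(em_label(W, W_labels, T, classes=["spam", "ham"]))
```

The authors' improved EM may differ in both its objective and its update rules. The sketch only illustrates the basic alternation between softly labeling T and refitting a single model on W ∪ T, which is what the i.i.d. assumption on W and T makes legitimate.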