基于因子图模型的动态图半监督聚类算法

张建朋; 裴雨龙; 刘聪; 李邵梅; 陈鸿昶

doi:10.16383/j.aas.c170363

基于因子图模型的动态图半监督聚类算法

doi: 10.16383/j.aas.c170363

张建朋^{1, 2,},
裴雨龙^2, ,,
刘聪^{2, 3,},
李邵梅^1,,
陈鸿昶^1,

1.
国家数字交换系统工程技术研究中心郑州 450002 中国
2.
埃因霍温理工大学埃因霍温 5600MB 荷兰
3.
山东科技大学计算机学院青岛 266590 中国

基金项目:

国家自然科学基金群体项目 61521003

国家重点研发计划项目 2016YFB0800101

详细信息

作者简介:
张建朋  荷兰埃因霍温理工大学博士研究生, 中国国家数字交换系统工程技术研究中心助理研究员.主要研究方向为数据流挖掘. E-mail:zhangjianpeng0309@gmail.com

刘聪  荷兰埃因霍温理工大学博士生.主要研究方向为流程挖掘, 软件工程. E-mail: liucongchina@163.com

李邵梅  中国国家数字交换系统工程技术研究中心副研究员.主要研究方向为图像处理与模式识别. E-mail: lishaomei@ndsc.com.cn

陈鸿昶  中国国家数字交换系统工程技术研究中心教授.主要研究方向为电信网信息关防. E-mail: chenhongchang@ndsc.com.cn

通讯作者:
裴玉龙荷兰埃因霍温理工大学博士生.主要研究方向为数据挖掘, 机器学习.本文通信作者. E-mail: feilong0309@sina.com

计量
- 文章访问数: 6901
- HTML全文浏览量: 3334
- PDF下载量: 307
- 被引次数: 0
出版历程
- 收稿日期: 2017-07-01
- 录用日期: 2018-05-04
- 刊出日期: 2020-04-24

A Semi-supervised Clustering Algorithm Based on Factor Graph Model for Dynamic Graphs

ZHANG Jian-Peng^{1, 2
,},
PEI Yu-Long^{2
, ,},
LIU Cong^{2, 3
,},
LI Shao-Mei^1
,,
CHEN Hong-Chang^1
,

1.
National Digital Switching System Engineering & Technological R & D Center, Zhengzhou 450002, China
2.
Eindhoven University of Technology, Eindhoven 5600 MB, the Netherlands
3.
Department of Computer Science, Shandong University of Science and Technology, Qingdao 266590, China

Funds:

the Foundation for Innovative Research Groups of the National Natural Science Foundation of China 61521003

National Key Research and Development Program of China 2016YFB0800101

More Information

Author Bio:
ZHANG Jian-Peng   Ph.D. candidate at Eindhoven University of Technology, Netherlands and lecturer at the National Digital Switching System Engineering & Technological R & D Center, China. His main research interest covers data stream mining

LIU Cong   Ph.D. candidate at Eindhoven University of Technology, Netherlands. His research interest covers process mining and software engineering

LI Shao-Mei   Associate professor at the National Digital Switching System Engineering & Technological R & D Center, China. Her research interest covers image process and pattern recognition

CHEN Hong-Chang   Professor at the National Digital Switching System Engineering & Technological R & D Center, China. His main research interest is telecommunication network protection

Corresponding author: PEI Yu-Long Ph.D. candidate at Eindhoven University of Technology, Netherlands. His research interest covers data mining and machine learning. Corresponding author of this paper

摘要

摘要: 针对动态图的聚类主要存在着两点不足:首先, 现有的经典聚类算法大多从静态图分析的角度出发, 无法对真实网络图持续演化的特性进行有效建模, 亟待对动态图的聚类算法展开研究, 通过对不同时刻图快照的聚类结构进行分析进而掌握图的动态演化情况.其次, 真实网络中可以预先获取图中部分节点的聚类标签, 如何将这些先验信息融入到动态图的聚类结构划分中, 从而向图中的未标记节点分配聚类标签也是本文需要解决的问题.为此, 本文提出进化因子图模型(Evolution factor graph model, EFGM)用于解决动态图节点的半监督聚类问题, 所提EFGM不仅可以捕获动态图的节点属性和边邻接属性, 还可以捕获节点的时间快照信息.本文对真实数据集进行实验验证, 实验结果表明EFGM算法将动态图与先验信息融合到一个统一的进化因子图框架中, 既使得聚类结果满足先验知识, 又契合动态图的整体演化规律, 有效验证了本文方法的有效性.
- 半监督聚类 /
- 进化因子图模型 /
- 特征提取 /
- 动态图
Abstract: There are two main deficiencies on clustering of dynamic graphs. Firstly, most of existing classical algorithms analyze such graphs from the perspective of static analysis. However, static analysis is not capable of modeling the continuous evolution of real-world networks. Therefore, it is a great need to research on clustering algorithms for dynamic graphs. Our goal is to capture the dynamic evolution characteristics by considering the clustering structure of multiple snapshots as a whole. Secondly, some clustering labels of some nodes in real-world graphs can be obtained in advance, thus how to integrate these priori information into clustering assignments of dynamic graphs and assign clustering labels to the unlabeled nodes of each snapshot should be resolved. In this paper, we propose an evolution factor graph model (EFGM) for the semi-supervised clustering of nodes in dynamic networks. EFGM is able to capture the node-attribute and edge-adjacency attribute of each node of the dynamic graphs, and also make full use of the snapshot information. We experiment with the real-world graphs and experimental results show that the EFGM integrates the prior knowledge and the dynamic graph into a unified framework (i.e., evolution factor graph model), which makes the clustering result satisfy the prior label information and conform to the overall evolution of dynamic graphs. It validates the effectiveness of the proposed approach.
- Semi-supervised clustering /
- evolution factor graph model (EFGM) /
- feature extraction /
- dynamic graphs
Recommended by Associate Editor ZHU Jun
注释:

1) 本文责任编委朱军

HTML全文

图 1 进化因子图模型示意图

Fig. 1 Diagram of the evolution factor graph model

下载: 全尺寸图片幻灯片

图 2 特征数目对NMI指标的影响

Fig. 2 The impact of the number of features on NMI score

下载: 全尺寸图片幻灯片

图 3 特征数目对概率误差的影响

Fig. 3 The impact of the number of features on RE score

下载: 全尺寸图片幻灯片

图 4 训练集所占比例对NMI指标的影响

Fig. 4 The impact of the training set percentage on NMI score

下载: 全尺寸图片幻灯片

图 5 训练集所占比例对概率误差的影响

Fig. 5 The impact of the training set percentage on RE score

下载: 全尺寸图片幻灯片

表 1 相关符号说明

Table 1 Description of symbols

符号	说明
$G_L $	部分标注网络
$V_L $	被标注的节点
$V_U $	未被标注的节点
$E$	图中的边集合
$W$	$W_{ij} $为节点$V_i $的第$j_{th} $个属性值
$f$	映射函数, 将每个节点$i$赋予相应的标签, 记为$f_i$
$\Omega $	部分标注动态网络

下载: 导出CSV

表 2 DBLP数据集的会议名称和聚类簇标签

Table 2 Conference names and their clustering labels of DBLP dataset

聚类簇标签	会议名称
AI & ML	IJCAI, AAAI, ICML, UAI, AISTATS
AL & TH	FOCS, STOC, SODA, COLT
CV	CVPR, ICCV, ECCV, BMVC
DB	EDBT, ICDE, PODS, SIGMOD, VLDB
DM	KDD, SDM, ICDM, PAKDD
IR	SIGIR, ECIR

下载: 导出CSV

表 3 DBLP会议论文网络的统计信息

Table 3 Statistics of DBLP conference network

年份	作者关系	关系数量
2001	3 074	5 743
2002	2 557	5 343
2003	3 836	7 700
2004	3 464	7 132
2005	5 198	11 171
2006	4 494	9 392
2007	7 294	15 708
2008	5 780	12 398
2009	6 405	14 321
2010	5 757	12 738

下载: 导出CSV

表 4 真实网络图的实验结果比较

Table 4 Comparison of results on real-world graphs

网络集	相应算法	NMI	RE
HEPCitation	EFGM	0.845	0.203
	FFGM	0.393	0.478
	CFGM	0.824	0.245
	MV	0.578	0.450
	SVM	0.502	0.423
DBLP	EFGM	0.885	0.196
	FFGM	0.493	0.280
	CFGM	0.814	0.235
	MV	0.678	0.350
	SVM	0.560	0.323

下载: 导出CSV

表 5 各个算法在真实网络数据集上的处理时间比较(秒)

Table 5 Comparison of the execution time on a real-world networks (s)

运行时间(s)	EFGM	FFGM	CFGM	MV	SVM
HEPCitation	282.8	269.2	272.6	220.3	394.4
DBLP	123.8	110.3	108.2	84	232.3

下载: 导出CSV

表 6 不同特征提取方法的实验结果比较

Table 6 Comparison of results on different feature extraction methods

特征提取方法		NMI	RE
ReFeX	EFGM	0.837	0.222
	FFGM	0.372	0.427
	CFGM	0.799	0.253
Node2vec	EFGM	0.852	0.193
	FFGM	0.402	0.392
	CFGM	0.819	0.235

下载: 导出CSV

参考文献(34)

[1]	黄立威, 李彩萍, 张海粟, 刘玉超, 李德毅, 刘艳博.一种基于因子图模型的半监督社区发现方法.自动化学报, 2016, 42(10): 1520-1531 doi: 10.16383/j.aas.2016.c150261 Huang Li-Wei, Li Cai-Ping, Zhang Hai-Su, Liu Yu-Chao, Li De-Yi, Liu Yan-Bo. A semi-supervised community detection method based on factor graph model. Acta Automatica Sinica, 2016, 42(10): 1520-1531 doi: 10.16383/j.aas.2016.c150261
[2]	Girvan M, Newman M E J. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99(12): 7821-7826 doi: 10.1073/pnas.122653799
[3]	Newman M E J. Fast algorithm for detecting community structure in networks. Physical Review E, 2004, 69(6): Article No. 066133 http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_cond-mat%2f0309508
[4]	Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 2007, 76(3): Article No. 036106 http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_0709.2938
[5]	Chakrabarti D, Kumar R, Tomkins A. Evolutionary clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA, USA: ACM, 2006. 554-560 http://www.researchgate.net/publication/221654105_Evolutionary_clustering
[6]	Pan S R, Zhu X Q, Zhang C Q, Yu P S. Graph stream classification using labeled and unlabeled graphs. In: Proceedings of the IEEE 29th International Conference on Data Engineering (ICDE). Brisbane, QLD, Australia: IEEE, 2013. 398-409 http://www.researchgate.net/publication/261345341_Graph_stream_classification_using_labeled_and_unlabeled_graphs
[7]	Zhao Y C, Wang G, Yu P S, Liu S B, Zhang S. Inferring social roles and statuses in social networks. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Chicago, USA: ACM, 2013. 695-703 http://www.researchgate.net/publication/266654052_Inferring_social_roles_and_statuses_in_social_networks
[8]	Choobdar S, Ribeiro P, Parthasarathy S, Silva F. Dynamic inference of social roles in information cascades. Data Mining and Knowledge Discovery, 2015, 29(5): 1152-1177 doi: 10.1007/s10618-015-0402-5
[9]	Grover A, Leskovec J. Node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: ACM, 2016. 855-864 https://www.researchgate.net/publication/305997704_node2vec_Scalable_Feature_Learning_for_Networks
[10]	常振超.在线社会网络社团检测关键技术研究[博士学位论文], 解放军信息工程大学, 中国, 2016 Chang Zhen-Chao. Research on Key Technologies of Community Detection in Online Social Networks[Ph.D. dissertation], Information Engineering University, China, 2016
[11]	Lin Y R, Chi Y, Zhu S, Sundaram H, Tseng B L. Analyzing communities and their evolutions in dynamic social networks. ACM Transactions on Knowledge Discovery from Data (TKDD), 2009, 3(2): Article No. 8 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=WFHYXW320184
[12]	Chi Y, Song X D, Zhou D Y, Hino K, Tseng B L. Evolutionary spectral clustering by incorporating temporal smoothness. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, California, USA: ACM, 2007. 153-162
[13]	Dinh T N, Nguyen N P, Thai M T. An adaptive approximation algorithm for community detection in dynamic scale-free networks. In: Proceedings of the 2013 IEEE International Conference on Computer Communications. Turin, Italy: IEEE, 2013. 55-59 http://www.researchgate.net/publication/261462402_An_adaptive_approximation_algorithm_for_community_detection_in_dynamic_scale-free_networks
[14]	Sun J M, Faloutsos C, Papadimitriou S, Yu P S. Graphscope: parameter-free mining of large time-evolving graphs. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, California, USA: ACM, 2007. 687-696 http://www.researchgate.net/publication/221654321_GraphScope_parameter-free_mining_of_large_time-evolving_graphs
[15]	肖杰斌, 张绍武.基于随机游走和增量相关节点的动态网络社团挖掘算法.电子与信息学报, 2013, 35(4): 977-981 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=dzkxxk201304036 Xiao Jie-Bin, Zhang Shao-Wu. An algorithm of integrating random walk and increment correlative vertexes for mining community of dynamic networks. Journal of Electronics & Information Technology, 2013, 35(4): 977-981 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=dzkxxk201304036
[16]	Ning H Z, Xu W, Chi Y, Gong Y H, Huang T S. Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recognition, 2010, 43(1): 113-127 http://cn.bing.com/academic/profile?id=46240821097459119fa011826a06c99a&encoded=0&v=paper_preview&mkt=zh-cn
[17]	Ma X K, Gao L, Yong X R, Fu L D. Semi-supervised clustering algorithm for community structure detection in complex networks. Physica A: Statistical Mechanics and its Applications, 2010, 389(1): 187-197 doi: 10.1016/j.physa.2009.09.018
[18]	Allahverdyan A E, Ver Steeg G, Galstyan A. Community detection with and without prior information. EPL (Europhysics Letters), 2010, 90(1): Article No. 18002 doi: 10.1209/0295-5075/90/18002
[19]	Eaton E, Mansbach R. A spin-glass model for semi-supervised community detection. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Ontario, Canada: AAAI, 2012. 900-906 https://www.researchgate.net/publication/268350911_A_Spin-Glass_Model_for_Semi-Supervised_Community_Detection
[20]	Liu D, Liu X, Wang W J, Bai H Y. Semi-supervised community detection based on discrete potential theory. Physica A: Statistical Mechanics and its Applications, 2014, 416: 173-182 doi: 10.1016/j.physa.2014.08.051
[21]	Yang L, Cao X C, Jin D, Wang X, Meng D. A unified semi-supervised community detection framework using latent space graph regularization. IEEE Transactions on Cybernetics, 2015, 45(10): 2585-2598 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=05d1e8c05c606c057da50daac18de1a3
[22]	Li L, Du M, Liu G, Hu X G, Wu G Q. Extremal optimization-based semi-supervised algorithm with conflict pairwise constraints for community detection. In: Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Beijing, China: IEEE, 2014. 180-187 http://www.researchgate.net/publication/286758921_Extremal_optimization-based_semi-supervised_algorithm_with_conflict_pairwise_constraints_for_community_detection
[23]	Li K, Guo S X, Du N, Gao J, Zhang A D. Learning, analyzing and predicting object roles on dynamic networks. In: Proceedings of IEEE 13th International Conference on Data Mining (ICDM). Dallas, TX, USA: IEEE, 2013. 428-437 http://www.researchgate.net/publication/269033111_Learning_Analyzing_and_Predicting_Object_Roles_on_Dynamic_Networks
[24]	Yao Y B, Holder L. Scalable SVM-based classification in dynamic graphs. In: Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM). Shenzhen, China: IEEE, 2014. 650-659 http://www.researchgate.net/publication/282237894_Scalable_SVM-Based_Classification_in_Dynamic_Graphs?ev=auth_pub
[25]	Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. Cambridge: MIT Press, 2009.
[26]	Tang W B, Zhuang H L, Tang J. Learning to infer social ties in large networks. In: Proceeding of the 2011 Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer, 2011. 381-397
[27]	Yang Z, Tang J, Li J Z, Yang W J. Social community analysis via a factor graph model. IEEE Intelligent Systems, 2011, 26(3): 58-65 doi: 10.1109/MIS.2010.55
[28]	Xu H, Yang Y J, Wang L W, Liu W H. Node classification in social network via a factor graph model. In: Proceedings of the 2013 Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin, Heidelberg: Springer, 2013. 213-224 https://www.researchgate.net/publication/273204869_Node_Classification_in_Social_Network_via_a_Factor_Graph_Model
[29]	Murphy K P, Weiss Y, Jordan M I. Loopy belief propagation for approximate inference: An empirical study. In: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Stockholm, Sweden: Morgan Kaufmann Publishers Inc., 1999. 467-475 http://www.researchgate.net/publication/235356658_Loopy_Belief
[30]	Mao Q, Wang L, Tsang I W, Sun Y J. Principal graph and structure learning based on reversed graph embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11): 2227-2241 doi: 10.1109/TPAMI.2016.2635657
[31]	Chen S H, Niu S F, Akoglu L, Kovačević J, Faloutsos C. Fast, Warped Graph Embedding: Unifying Framework and One-Click Algorithm. arXiv preprint arXiv: 1702.05764, 2017.
[32]	Shijia E, Jia S B, Xiang Y, Ji Z L. Knowledge graph embedding for link prediction and triplet classification. In: Proceedings of the 2016 Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data. Singapore: Springer, 2016. 228-232 https://www.researchgate.net/publication/310742316_Knowledge_Graph_Embedding_for_Link_Prediction_and_Triplet_Classification
[33]	Hu W M, Gao J, Xing J L, Zhang C, Maybank S. Semi-supervised tensor-based graph embedding learning and its application to visual discriminant tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(1): 172-188 doi: 10.1109/TPAMI.2016.2539944
[34]	Lueckenga J, Engel D, Green R. Weighted vote algorithm combination technique for anomaly based smart grid intrusion detection systems. In: Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN). Vancouver, BC, Canada: IEEE, 2016. 2738-2742 Weighted vote algorithm combination technique for anomaly based smart grid intrusion detection systems

施引文献

资源附件(0)

访问统计

图(5) / 表(6)

计量

文章访问数: 6901
HTML全文浏览量: 3334
PDF下载量: 307
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

基于因子图模型的动态图半监督聚类算法

doi: 10.16383/j.aas.c170363

通讯作者:
裴玉龙荷兰埃因霍温理工大学博士生.主要研究方向为数据挖掘, 机器学习.本文通信作者. E-mail: feilong0309@sina.com

计量

A Semi-supervised Clustering Algorithm Based on Factor Graph Model for Dynamic Graphs

Corresponding author: PEI Yu-Long Ph.D. candidate at Eindhoven University of Technology, Netherlands. His research interest covers data mining and machine learning. Corresponding author of this paper

计量

目录

留言板

基于因子图模型的动态图半监督聚类算法

doi: 10.16383/j.aas.c170363

通讯作者: 裴玉龙 荷兰埃因霍温理工大学博士生.主要研究方向为数据挖掘, 机器学习.本文通信作者. E-mail: feilong0309@sina.com

计量

出版历程

A Semi-supervised Clustering Algorithm Based on Factor Graph Model for Dynamic Graphs

Corresponding author: PEI Yu-Long Ph.D. candidate at Eindhoven University of Technology, Netherlands. His research interest covers data mining and machine learning. Corresponding author of this paper

计量

出版历程

目录

通讯作者:
裴玉龙荷兰埃因霍温理工大学博士生.主要研究方向为数据挖掘, 机器学习.本文通信作者. E-mail: feilong0309@sina.com