Graph Clustering Based on Flexibly Balanced Constraint
-
摘要: 现有的图聚类方法主要存在两方面的问题, 一是对各个类规模一致的假设, 在许多实际应用中并不成立. 二是在处理多类聚类问题时, 其所常借助的递归技术或启发式算法会影响聚类的性能. 为此, 本文提出一种基于灵活平衡约束的多类图聚类方法. 其能够覆盖从绝对平衡约束到无平衡约束的范围, 可同时处理类别规模一致和不一致的问题. 为有效求解新方法中的参数, 进一步提出一个紧松弛方法来使所提出的图聚类方法不仅易于求解, 且在处理多类聚类问题时不必依赖递归技术, 而能直接得到聚类结果. 文中也给出一种实现松弛图聚类的有效求解算法. 在合成数据和真实数据上的实验结果表明, 所提出的方法具有良好的性能.Abstract: There are two main problems in existing graph clustering methods. One is that they assume the size of each class should be balanced, which is inconsistent with many practical applications. The other is that they were proposed to deal with the multiclass clustering by using either recursive techniques or heuristics, which deteriorates the clustering performance. To solve these problems, we propose a multiclass graph clustering method based on flexible balance constraints in this paper. The proposed method can deal with the issue that the size of each class is both consistent and inconsistent. Moreover, a tight relaxation approach is presented for effectively solving the proposed method, which not only makes it easy to be solved, but also can directly obtain the clustering results without relying on recursion techniques when dealing with multi-class clustering problems. Meanwhile, we propose an efficient algorithm to solve the relaxed graph clustering model. Experimental results on synthetic and real datasets show that the proposed method achieves a superior performance.
-
Key words:
- Graph clustering /
- graph cut /
- balanced constraint /
- tight relaxation
-
表 1 真实数据集
Table 1 The statistics of real databases
数据集 样本数 类别数 数据类型 HIGHSCHOOL 60 5 社交 POLBOOKS 105 3 社交 FOOTBALL 115 12 社交 DUKE 44 2 医疗 CANCER 198 14 医疗 KHAN 83 4 基因 ROSETTA 300 5 基因 ORL 400 40 图像 UMIST 575 20 图像 ALPHADIGS 1404 36 图像 COIL20 1440 20 图像 SAVEE 480 7 语音 CASIA 1200 6 语音 IEMOCAP 9994 8 语音 MED 1033 31 文本 WEBKB4 4196 4 文本 REUTERS 8293 65 文本 YEAST 1484 10 生物 FAULTS 1941 7 工业 ECOLI 327 5 蛋白质 表 2 各图聚类方法的聚类精度(%)对比
Table 2 Comparison of clustering accuracies (%) between graph clustering methods
Dataset 1Spec[16] MTV[11] L21MC[19] BKCut[18] CCB[15] FBC (r) FMC (r) HIGHSCHOOL 81.67 80.00 70.00 81.67 86.67 71.67 (0.9) 89.67 (1.2) POLBOOKS 82.86 79.05 77.14 81.90 83.81 81.50 (1.4) 83.81 (1.3) FOOTBALL 86.96 92.17 24.35 93.04 86.09 92.11 (1.3) 93.04 (1.8) DUKE 52.27 70.45 63.64 52.27 52.27 70.45 (0.5) 70.45 (0.5) CANCER 34.34 54.04 52.02 54.55 48.48 53.54 (1.5) 54.55 (1.7) KHAN 57.83 49.40 54.22 51.81 57.83 50.28 (1.6) 57.83 (1.6) ROSETTA 76.67 76.67 76.67 76.67 76.67 76.67 (1.0) 77.33 (1.9) ORL 36.00 52.50 79.25 82.00 74.25 74.75 (1.5) 81.75 (1.0) UMIST 58.96 76.87 70.78 70.09 65.22 72.70 (1.3) 74.96 (1.3) ALPHADIGS 36.11 51.71 46.08 47.72 45.87 36.84 (2.0) 49.00 (2.0) COIL20 45.00 81.46 50.97 78.89 77.29 81.81 (1.0) 82.92 (1.2) SAVEE 25.21 32.50 31.46 31.46 29.17 31.02 (1.1) 33.24 (1.3) CASIA 22.42 32.92 28.50 28.08 28.58 26.00 (0.1) 35.54 (0.4) IEMOCAP 54.03 54.00 54.00 54.03 54.03 54.03 (1.9) 55.21 (1.5) MED 44.72 53.92 52.76 54.99 53.53 50.09 (1.4) 55.00 (1.4) WEBKB4 39.39 56.22 39.30 51.60 39.56 46.14 (0.5) 59.60 (0.7) REUTERS 52.83 73.34 65.69 61.02 69.38 67.58 (2.0) 73.53 (1.5) YEAST 51.35 53.97 49.26 53.71 51.95 44.41 (1.6) 54.85 (1.6) FAULTS 36.89 40.08 37.92 41.11 39.52 31.72 (1.8) 42.76 (1.7) ECOLI 79.82 77.67 77.67 82.87 77.98 71.43 (1.8) 82.87 (1.8) 表 3 本文方法与其它非图聚类方法的聚类精度(%)对比
Table 3 Comparison of clustering accuracies (%) between ours and other non-graph clustering methods
Dataset PNMF[44] NSC[45] PLSI[42] LSD[46] NMFR[47] CLR[43] FMC (r) HIGHSCHOOL 81.67 83.33 83.33 81.67 95.00 80.00 89.67 (1.2) POLBOOKS 78.10 80.95 78.10 78.10 79.05 80.95 83.81 (1.3) FOOTBALL 93.04 93.04 93.04 93.04 93.04 93.04 93.04 (1.8) DUKE 52.27 52.27 70.45 70.45 70.45 52.27 70.45 (0.5) CANCER 53.53 53.03 53.53 53.03 51.52 54.04 54.55 (1.7) KHAN 60.24 55.42 55.42 51.81 50.60 51.81 57.83 (1.6) ROSETTA 76.67 76.67 76.67 76.67 76.67 77.00 77.33 (1.9) ORL 81.50 82.25 82.50 81.25 82.50 80.00 81.75 (1.0) UMIST 64.00 68.00 68.52 68.35 71.83 70.26 74.96 (1.3) ALPHADIGS 44.94 49.43 48.65 49.36 50.78 36.47 49.00 (2.0) COIL20 75.21 81.25 80.69 75.00 82.85 85.21 82.92 (1.2) SAVEE 31.88 31.46 31.67 34.17 31.88 28.13 33.24 (1.3) CASIA 35.17 29.08 32.00 32.17 34.08 27.92 35.54 (0.4) IEMOCAP 54.00 54.06 54.00 54.00 54.00 54.02 55.21 (1.5) MED 53.92 53.73 54.50 54.60 55.86 49.18 55.00 (1.4) WEBKB4 39.06 39.77 48.90 50.81 63.32 39.56 59.60 (0.7) REUTERS 73.86 76.47 75.92 74.99 76.52 51.80 73.53 (1.5) YEAST 52.76 54.31 52.69 52.09 54.85 46.90 54.85 (1.6) FAULTS 39.26 40.08 40.03 40.18 38.17 41.37 42.76 (1.7) ECOLI 77.68 78.90 79.82 67.89 79.20 78.29 82.87 (1.8) 表 4 不同聚类方法得到的图分割函数值
Table 4 The values of different graph cut functions obtained by different clustering methods
图分割函数 1Spec[16] MTV[11] L21MC[19] BKCut[18] CCB[15] PLSI[42] LSD[46] NMFR[47] CLR[43] Ours FRCut0.5 12.64 (13.01) 13.99 (12.66) 15.57 (10.64) 12.03 (11.09) 18.63 (14.63) 14.43 (12.16) 14.70 (12.41) 14.27 (12.04) 13.69 (12.14) 9.71 (10.52) FRCut1.0 9.21 (9.64) 12.27 (11.22) 12.09 (8.27) 9.67 (9.31) 13.93 (11.28) 12.36 (10.35) 12.95 (10.83) 12.77 (10.90) 9.78 (9.08) 8.05 (8.18) FRCut2.0 7.79 (7.96) 11.89 (10.99) 10.90 (7.77) 8.37 (8.42) 12.18 (10.64) 11.88 (9.99) 12.62 (10.57) 12.50 (10.78) 8.18 (8.36) 6.72 (7.18) FNCut0.5 1.83 (1.48) 1.78 (1.42) 2.05 (1.25) 1.57 (1.25) 1.87 (1.46) 1.84 (1.35) 1.90 (1.38) 1.83 (1.37) 1.79 (1.48) 1.39 (1.21) FNCut1.0 1.35 (1.14) 1.53 (1.23) 1.56 (0.93) 1.29 (1.09) 1.41 (1.13) 1.54 (1.14) 1.63 (1.18) 1.60 (1.20) 1.26 (1.06) 1.14 (1.03) FNCut2.0 1.19 (1.00) 1.47 (1.20) 1.39 (0.87) 1.18 (1.03) 1.24 (1.07) 1.47 (1.10) 1.57 (1.14) 1.55 (1.17) 1.04 (0.95) 0.92 (0.83) -
[1] MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley symposium on mathematical statistics and probability. Oakland, California, USA, 1967. 281−297 [2] Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 4(17): 395−416 [3] Ding Shi-Fei, Cong Lin, Hu Qian-Kun, Jia Hong-Jie, Shi Zhong-Zhi. A multiway p-spectral clustering algorithm. Knowledge-Based Systems, 2019, 164: 371−377 [4] Liang Nai-Yao, Yang Zu-Yuan, Li Zhen-Ni, Xie Sheng-Li, Su Chun-Yi. Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowledge-Based Systems, to be published [5] 庞宁, 张继福, 秦啸. 一种基于多属性权重的分类数据子空间聚类算法. 自动化学报, 2018, 44(3): 517−532Pang Ning, Zhang Ji-Fu, Qin Xiao. A subspace clustering algorithm of categorical data using multiple attribute weights. Acta Automatica Sinica, 2018, 44(3): 517−532 [6] 尹明, 吴浩杨, 谢胜利, 杨其宇. 基于自注意力对抗的深度子空间聚类. 自动化学报, 2020, 46(x): 1−11Yin Ming, Wu Hao-Yang, Xie Sheng-Li, Yang Qi-Yu. Self-attention adversarial based deep subspace clustering. Acta Automatica Sinica, 2020, 46(x): 1−11 [7] Stoer M, Wagner P. A simple min-cut algorithm. Journal of the ACM, 1997, 44(4): 585−591 [8] Hagen L, Kahng A B. New spectral methods for ratio cut partitioning and clustering. IEEE transactions on computer-aided design of integrated circuits and systems, 1992, 11(9): 1074−1085 [9] Shi Jian-Bo, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888−905 [10] Chung F R, Graham F C. Spectral graph theory. Providence: American Mathematical Society, 1997. 36−45 [11] Bresson X, Laurent T, Uminsky D, Von Brecht J. Multiclass total variation clustering. In: Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA, 2013. 1421−1429 [12] Bresson X, Tai X C, Chan T F, Szlam A. Multiclass transductive learning based on $ \ell_1$ relaxations of cheeger cut and mumford-shah-potts model. Journal of mathematical imaging and vision, 2014, 49(1): 191−201[13] Hein M, Setzer S. Beyond spectral clustering-tight relaxations of balanced graph cuts. In: Proceedings of Advances in neural information processing systems. Granada, Spain, 2011. 2366−2374 [14] Bühler T, Hein M. Spectral clustering based on the graph p-laplacian. In: Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Quebec, Canada, 2009. 81−88 [15] Cahill N D, Hayes T L, Meinhold R T, Hamilton J F. Compassionately conservative balanced cuts for image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, USA, 2018. 1683−1691 [16] Hein M, $ {{\rm{B}}\mathop {'}\limits^{'}} {\rm{uhler}}$ T. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In: Proceedings of Advances in Neural Information Processing Systems. Vancouver, Canada, 2010. 847−855 [17] Nie Fei-Ping, Wang Hua, Deng Cheng, Gao Xin-Bo, Li Xue-Long, Huang Heng. New $ \ell_1$-norm relaxations and optimizations for graph clustering. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA, 2016. 1962−1968 [18] Rangapuram S S, Mudrakarta P K, Hein M. Tight continuous relaxation of the balanced k-cut problem. In: Proceedings of Advances in Neural Information Processing Systems. Montreal, Canada, 2014. 3131−3139 [19] Yang Xu, Deng Cheng, Liu Xiang-Long, Nie Fei-Ping. New $ \ell_{2, 1}$-norm relaxation of multi-way graph cut for clustering. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA, 2018. 4374−4381 [20] Ling Shu-Yang, Strohmer T. Certifying global optimality of graph cuts via semidefinite relaxation: A performance guarantee for spectral clustering. Foundations of Computational Mathematics, 2019, 20(1): 367−421 [21] Lovász L. Submodular functions and convexity. Mathematical Programming: The State of the Art. Springer, Berlin, Heidelberg, 1983. 235−257 [22] Bach F. Submodular functions: from discrete to continuous domains. Mathematical Programming, 2019, 175: 419−459 [23] Ji Sai, Xu Da-Chuan, Li Min, Wang Yi-Shui, Zhang Dong-Mei. Stochastic greedy algorithm is still good: maximizing submodular + supermodular functions. In: Proceedings of the 6th World Congress on Global Optimization. Metz, France, 2019. 488−497 [24] Chambolle A, Pock T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision, 2011, 40(1): 120−145 [25] Batagelj V, Mrvar A. Pajek datasets[Online], available: http://vlado.fmf.uni-lj.si/pub/networks/data/, 2006 [26] Krebs V. Books about US politics[Online], available: https://www.cise.ufl.edu/research/sparse/matrices/Newman/polbooks.html, 2014 [27] Chang C C, Lin C J. LIBSVM : a library for support vector machines[Online], available: http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2019 [28] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer-Verlag, 2009. 649−698 [29] Kim P, Tidor B. Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Research, 2003, 13(7): 1706−1718 [30] Samaria F S, Harter A C. Parameterisation of a stochastic model for human face identification. In: Proceedings of the IEEE Workshop on Applications of Computer Vision. Sarasota, Florida, USA, 1994. 138−142. [31] Graham D B, Allinson N M. Characterising virtual eigensignatures for general purpose face recognition. Face Recognition: From Theory to Applications. Berlin: Springer-Heidelberg, 1998. 446−456 [32] Roweis S. The binary alphadigits dataset[Online], available: https://cs.nyu.edu/~roweis/data.html, 2010 [33] Nene S A, Nayar S K, Murase H. Columbia object image library (COIL-20)[Online], available: http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php, 1996 [34] Haq S, Jackson P J B, Edge J. Audio-visual feature selection and reduction for emotion classification. In: Proceedings of the International Conference on Auditory-Visual Speech Processing. Moreton Island, Queensland, Australia, 2008. 185−190 [35] ChineseLDC. CASIA-Chinese emotional speech corpus[Online], available: http://www.chineseldc.org/, 2010 [36] Busso C, Bulut M, Lee C C, Kazemzadeh A, Mower E, Kim S, Chang J N, Lee S, Narayanan S S. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, 42(4): 335−359 [37] Dumais S, Berry W M. LSI: latent semantic indexing[Online], available: http://web.eecs.utk.edu/research/lsi/, 2006 [38] Srivastava S K, Singh S K, Suri J S. Effect of incremental feature enrichment on healthcare text classification system: A machine learning paradigm. Computer Methods and Programs in Biomedicine, 2019, 172: 35−51 [39] Dua D, Graff C. UCI machine learning repository[Online], available: http://archive.ics.uci.edu/ml, 2019 [40] Oyallon E, Zagoruyko S, Huang G, Komodakis N, Julien S L, Blaschko M, et al. Scattering Networks for Hybrid Representation Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2208−2221 [41] Schuller B, Batliner A, Bergler C, Messner E M, Hamilton A, Amiriparian S, et al. The INTERSPEECH 2020 computational paralinguistics challenge: Elderly emotion, breathing and masks. In: Proceedings of the 21th Annual Conference of the International Speech Communication Association. Shanghai, China, 2020. 1−5 [42] Zhou Dao-Xiang, Yang Dan, Zhang Xiao-Hong, Huang Sheng, Feng Shu. Discriminative probabilistic latent semantic analysis with application to single sample face recognition. Neural Processing Letters, 2019, 49(3): 1273−1298 [43] Nie Fei-Ping, Wang Xiao-Qian, Jordan M I, Huang Heng. The constrained Laplacian rank algorithm for graph-based clustering. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA, 2016. 1969−1976 [44] Yang Zhi-Rong, Oja E. Linear and nonlinear projective nonnegative matrix factorization. IEEE Transactions on Neural Networks, 2010, 21(5): 734−749 [45] Ding C, Li Tao, Jordan M I. Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. In: Proceedings of the Eighth IEEE International Conference on Data Mining. Las Vegas, Nevada, USA: IEEE, 2008. 183−192 [46] Arora R, Gupta M, Kapila A, Fazel M. Clustering by left-stochastic matrix factorization. In: Proceedings of the 28th International Conference on Machine Learning. Bellevue, Washington, USA, 2011. 761−768 [47] Yang Zhi-Rong, Hao Te-Le, Dikmen O, Chen Xi, Oja E. Clustering by nonnegative matrix factorization using graph random walk. In: Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA, 2012. 1079−1087 -

计量
- 文章访问数: 67
- HTML全文浏览量: 28
- 被引次数: 0