2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于灵活平衡约束的图聚类方法

罗辉 韩纪庆

罗辉, 韩纪庆. 基于灵活平衡约束的图聚类方法. 自动化学报, 2023, 49(4): 778−789 doi: 10.16383/j.aas.c200144
引用本文: 罗辉, 韩纪庆. 基于灵活平衡约束的图聚类方法. 自动化学报, 2023, 49(4): 778−789 doi: 10.16383/j.aas.c200144
Luo Hui, Han Ji-Qing. Graph clustering based on flexibly balanced constraint. Acta Automatica Sinica, 2023, 49(4): 778−789 doi: 10.16383/j.aas.c200144
Citation: Luo Hui, Han Ji-Qing. Graph clustering based on flexibly balanced constraint. Acta Automatica Sinica, 2023, 49(4): 778−789 doi: 10.16383/j.aas.c200144

基于灵活平衡约束的图聚类方法

doi: 10.16383/j.aas.c200144
基金项目: 国家自然科学基金(U1736210), 国家重点研发计划(2017YFB1002102)资助
详细信息
    作者简介:

    罗辉:哈尔滨工业大学计算机科学与技术学院博士研究生. 2014年获贵州大学通信与信息系统专业硕士学位. 主要研究方向为智能语音处理技术和模式识别. 本文通信作者.E-mail: luohui0216@163.com

    韩纪庆:哈尔滨工业大学计算机科学与技术学院教授. 1998年获哈尔滨工业大学计算机科学博士学位. 主要研究方向为智能语音处理技术和音频信息的智能处理. E-mail: jqhan@hit.edu.cn

Graph Clustering Based on Flexibly Balanced Constraint

Funds: Supported by National Natural Science Foundation of China (U1736210) and National Key Research and Development Program of China (2017YFB1002102)
More Information
    Author Bio:

    LUO Hui Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology. He received his master degree in communication and information systems from Guizhou University in 2014. His research interest covers intelligent speech processing technology and pattern recognition. Corresponding author of this paper

    HAN Ji-Qing Professor at the School of Computer Science and Technology, Harbin Institute of Technology. He received his Ph.D. degree in computer science from Harbin Institute of Technology in 1998. His research interest covers intelligent speech processing technology and intelligent processing of audio information

  • 摘要: 现有的图聚类方法主要存在两方面的问题, 一是对各个类规模一致的假设, 在许多实际应用中并不成立; 二是在处理多类聚类问题时, 其所常借助的递归技术或启发式算法会影响聚类的性能. 为此, 本文提出一种基于灵活平衡约束的多类图聚类方法. 其能够覆盖从绝对平衡约束到无平衡约束的范围, 可同时处理类别规模一致和不一致的问题. 为有效求解新方法中的参数, 进一步提出一个紧松弛方法来使所提出的图聚类方法不仅易于求解, 且在处理多类聚类问题时不必依赖递归技术, 而能直接得到聚类结果. 另外, 本文还给出一种实现松弛图聚类的有效求解算法. 在合成数据和真实数据上的实验结果表明, 所提出的方法具有良好的性能.
  • 图  1  函数$\Phi_p(v)$在不同参数$p$时的示意图[14]

    Fig.  1  Illustration of functions $\Phi_p(v)$ with different parameter $p$[14]

    图  2  函数$\Psi_r(v)$在不同参数$r$时的示意图

    Fig.  2  Illustration of functions $\Psi_r(v)$ with different parameter $r$

    图  3  函数$\widehat{W}$$\widehat{G}$的对比图

    Fig.  3  The comparisons of $\widehat{W}$ and $\widehat{G}$

    图  4  合成数据及其邻接图

    Fig.  4  Synthetic data and its adjacency graph

    图  5  Ratio形式的图聚类结果

    Fig.  5  Results of Ratio graph clustering methods

    图  6  Normalized形式的图聚类结果

    Fig.  6  Results of Normalized graph clustering methods

    图  7  聚类结果的统计分析

    Fig.  7  Statistical analysis of clustering results

    表  1  真实数据集

    Table  1  The statistics of real databases

    数据集样本数类别数数据类型
    HIGHSCHOOL605社交
    POLBOOKS1053社交
    FOOTBALL11512社交
    DUKE442医疗
    CANCER19814医疗
    KHAN834基因
    ROSETTA3005基因
    ORL40040图像
    UMIST57520图像
    ALPHADIGS140436图像
    COIL20144020图像
    SAVEE4807语音
    CASIA12006语音
    IEMOCAP99948语音
    MED103331文本
    WEBKB441964文本
    REUTERS829365文本
    YEAST148410生物
    FAULTS19417 工业
    ECOLI3275蛋白质
    下载: 导出CSV

    表  2  各图聚类方法的聚类精度(%)对比

    Table  2  Comparison of clustering accuracies (%) between graph clustering methods

    数据集1Spec[16]MTV[11]L21MC[19]BKCut[18]CCB[15]FBC (r)FMC (r)
    HIGHSCHOOL81.6780.0070.0081.6786.6771.67 (0.9)89.67 (1.2)
    POLBOOKS82.8679.0577.1481.9083.8181.50 (1.4)83.81 (1.3)
    FOOTBALL86.9692.1724.3593.0486.0992.11 (1.3)93.04 (1.8)
    DUKE52.2770.4563.6452.2752.2770.45 (0.5)70.45 (0.5)
    CANCER34.3454.0452.0254.5548.4853.54 (1.5)54.55 (1.7)
    KHAN57.8349.4054.2251.8157.8350.28 (1.6)57.83 (1.6)
    ROSETTA76.6776.6776.6776.6776.6776.67 (1.0)77.33 (1.9)
    ORL36.0052.5079.2582.0074.2574.75 (1.5)81.75 (1.0)
    UMIST58.9676.8770.7870.0965.2272.70 (1.3)74.96 (1.3)
    ALPHADIGS36.1151.7146.0847.7245.8736.84 (2.0)49.00 (2.0)
    COIL2045.0081.4650.9778.8977.2981.81 (1.0)82.92 (1.2)
    SAVEE25.2132.5031.4631.4629.1731.02 (1.1)33.24 (1.3)
    CASIA22.4232.9228.5028.0828.5826.00 (0.1)35.54 (0.4)
    IEMOCAP54.0354.0054.0054.0354.0354.03 (1.9)55.21 (1.5)
    MED44.7253.9252.7654.9953.5350.09 (1.4)55.00 (1.4)
    WEBKB439.3956.2239.3051.6039.5646.14 (0.5)59.60 (0.7)
    REUTERS52.8373.3465.6961.0269.3867.58 (2.0)73.53 (1.5)
    YEAST51.3553.9749.2653.7151.9544.41 (1.6)54.85 (1.6)
    FAULTS36.8940.0837.9241.1139.5231.72 (1.8)42.76 (1.7)
    ECOLI79.8277.6777.6782.8777.9871.43 (1.8)82.87 (1.8)
    下载: 导出CSV

    表  3  本文方法与其他非图聚类方法的聚类精度(%)对比

    Table  3  Comparison of clustering accuracies (%) between ours and other non-graph clustering methods

    数据集PNMF[44]NSC[45]PLSI[42]LSD[46]NMFR[47]CLR[43]FMC (r)
    HIGHSCHOOL81.6783.3383.3381.6795.0080.0089.67 (1.2)
    POLBOOKS78.1080.9578.1078.1079.0580.9583.81 (1.3)
    FOOTBALL93.0493.0493.0493.0493.0493.0493.04 (1.8)
    DUKE52.2752.2770.4570.4570.4552.2770.45 (0.5)
    CANCER53.5353.0353.5353.0351.5254.0454.55 (1.7)
    KHAN60.2455.4255.4251.8150.6051.8157.83 (1.6)
    ROSETTA76.6776.6776.6776.6776.6777.0077.33 (1.9)
    ORL81.5082.2582.5081.2582.5080.0081.75 (1.0)
    UMIST64.0068.0068.5268.3571.8370.2674.96 (1.3)
    ALPHADIGS44.9449.4348.6549.3650.7836.4749.00 (2.0)
    COIL2075.2181.2580.6975.0082.8585.2182.92 (1.2)
    SAVEE31.8831.4631.6734.1731.8828.1333.24 (1.3)
    CASIA35.1729.0832.0032.1734.0827.9235.54 (0.4)
    IEMOCAP54.0054.0654.0054.0054.0054.0255.21 (1.5)
    MED53.9253.7354.5054.6055.8649.1855.00 (1.4)
    WEBKB439.0639.7748.9050.8163.3239.5659.60 (0.7)
    REUTERS73.8676.4775.9274.9976.5251.8073.53 (1.5)
    YEAST52.7654.3152.6952.0954.8546.9054.85 (1.6)
    FAULTS39.2640.0840.0340.1838.1741.3742.76 (1.7)
    ECOLI77.6878.9079.8267.8979.2078.2982.87 (1.8)
    下载: 导出CSV

    表  4  不同聚类方法得到的图分割函数值

    Table  4  The values of different graph cut functions obtained by different clustering methods

    图分割函数1Spec[16]MTV[11]L21MC[19]BKCut[18]CCB[15]PLSI[42]LSD[46]NMFR[47]CLR[43]本文方法
    FRCut0.512.64 (13.01)13.99 (12.66)15.57 (10.64)12.03 (11.09)18.63 (14.63)14.43 (12.16)14.70 (12.41)14.27 (12.04)13.69 (12.14)9.71 (10.52)
    FRCut1.09.21 (9.64)12.27 (11.22)12.09 (8.27)9.67 (9.31)13.93 (11.28)12.36 (10.35)12.95 (10.83)12.77 (10.90)9.78 (9.08)8.05 (8.18)
    FRCut2.07.79 (7.96)11.89 (10.99)10.90 (7.77)8.37 (8.42)12.18 (10.64)11.88 (9.99)12.62 (10.57)12.50 (10.78)8.18 (8.36)6.72 (7.18)
    FNCut0.51.83 (1.48)1.78 (1.42)2.05 (1.25)1.57 (1.25)1.87 (1.46)1.84 (1.35)1.90 (1.38)1.83 (1.37)1.79 (1.48)1.39 (1.21)
    FNCut1.01.35 (1.14)1.53 (1.23)1.56 (0.93)1.29 (1.09)1.41 (1.13)1.54 (1.14)1.63 (1.18)1.60 (1.20)1.26 (1.06)1.14 (1.03)
    FNCut2.01.19 (1.00)1.47 (1.20)1.39 (0.87)1.18 (1.03)1.24 (1.07)1.47 (1.10)1.57 (1.14)1.55 (1.17)1.04 (0.95)0.92 (0.83)
    下载: 导出CSV
  • [1] MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA: University of California Press, 1967. 281−297
    [2] Von Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 17(4): 395-416 doi: 10.1007/s11222-007-9033-z
    [3] Ding S F, Cong L, Hu Q K, Jia H J, Shi Z Z. A multiway p-spectral clustering algorithm. Knowledge-Based Systems, 2019, 164: 371-377 doi: 10.1016/j.knosys.2018.11.007
    [4] Liang N Y, Yang Z Y, Li Z N, Xie S L, Su C Y. Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowledge-Based Systems, 2020, 190: Article No. 105185 doi: 10.1016/j.knosys.2019.105185
    [5] 庞宁, 张继福, 秦啸. 一种基于多属性权重的分类数据子空间聚类算法. 自动化学报, 2018, 44(3): 517-532

    Pang Ning, Zhang Ji-Fu, Qin Xiao. A subspace clustering algorithm of categorical data using multiple attribute weights. Acta Automatica Sinica, 2018, 44(3): 517-532
    [6] 尹明, 吴浩杨, 谢胜利, 杨其宇. 基于自注意力对抗的深度子空间聚类. 自动化学报, 2022, 48(1): 271-281 doi: 10.16383/j.aas.c200302

    Yin Ming, Wu Hao-Yang, Xie Sheng-Li, Yang Qi-Yu. Self-attention adversarial based deep subspace clustering. Acta Automatica Sinica, 2022, 48(1): 271-281 doi: 10.16383/j.aas.c200302
    [7] Stoer M, Wagner F. A simple min-cut algorithm. Journal of the ACM, 1997, 44(4): 585-591 doi: 10.1145/263867.263872
    [8] Hagen L, Kahng A B. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1992, 11(9): 1074-1085 doi: 10.1109/43.159993
    [9] Shi J B, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905 doi: 10.1109/34.868688
    [10] Chung F R K. Spectral Graph Theory. Providence: American Mathematical Society, 1997. 36−45
    [11] Bresson X, Laurent T, Uminsky D, Von Brecht J H. Multiclass total variation clustering. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc., 2013. 1421−1429
    [12] Bresson X, Tai X C, Chan T F, Szlam A. Multi-class transductive learning based on 1 relaxations of cheeger cut and Mumford-shah-Potts model. Journal of Mathematical Imaging and Vision, 2014, 49(1): 191-201 doi: 10.1007/s10851-013-0452-5
    [13] Hein M, Setzer S. Beyond spectral clustering-tight relaxations of balanced graph cuts. In: Proceedings of the 24th International Conference on Neural Information Processing Systems. Granada, Spain: Curran Associates Inc., 2011. 2366−2374
    [14] Bühler T, Hein M. Spectral clustering based on the graph p-laplacian. In: Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Canada: ACM, 2009. 81−88
    [15] Cahill N D, Hayes T L, Meinhold R T, Hamilton J F. Compassionately conservative balanced cuts for image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1683−1691
    [16] Hein M, Bühler T. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2010. 847−855
    [17] Nie F P, Wang H, Deng C, Gao X B, Li X L, Huang H. New 1-norm relaxations and optimizations for graph clustering. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI Press, 2016. 1962−1968
    [18] Rangapuram S S, Mudrakarta P K, Hein M. Tight continuous relaxation of the balanced k-cut problem. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2014. 3131−3139
    [19] Yang X, Deng C, Liu X L, Nie F P. New 2,1-norm relaxation of multi-way graph cut for clustering. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence and the 30th Innovative Applications of Artificial Intelligence Conference and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, USA: AAAI Press, 2018. 4374−4381
    [20] Ling S Y, Strohmer T. Certifying global optimality of graph cuts via semidefinite relaxation: A performance guarantee for spectral clustering. Foundations of Computational Mathematics, 2020, 20(3): 367-421 doi: 10.1007/s10208-019-09421-3
    [21] Lovász L. Submodular functions and convexity. Mathematical Programming the State of the Art. Berlin, Heidelberg: Springer, 1983. 235−257
    [22] Bach F. Submodular functions: From discrete to continuous domains. Mathematical Programming, 2019, 175(1-2): 419-459 doi: 10.1007/s10107-018-1248-6
    [23] Ji S, Xu D C, Li M, Wang Y S, Zhang D M. Stochastic greedy algorithm is still good: Maximizing submodular + supermodular functions. In: Proceedings of the 6th World Congress on Global Optimization. Metz, France: Springer, 2019. 488−497
    [24] Chambolle A, Pock T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 2011, 40(1): 120-145 doi: 10.1007/s10851-010-0251-1
    [25] Batagelj V, Mrvar A. Pajek datasets [Online], available: http://vlado.fmf.uni-lj.si/pub/networks/data/, February 24, 2007
    [26] Krebs V. Books about US politics [Online], available: https://www.cise.ufl.edu/research/sparse/matrices/Newman/polbooks.html, March 12, 2014
    [27] Chang C C, Lin C J. LIBSVM — A library for support vector machines [Online], available: http://www.csie.ntu.edu.tw/~cjlin/libsvm, September 11, 2019
    [28] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer, 2009. 649−698
    [29] Kim P M, Tidor B. Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Research, 2003, 13(7): 1706-1718 doi: 10.1101/gr.903503
    [30] Samaria F S, Harter A C. Parameterisation of a stochastic model for human face identification. In: Proceedings of the IEEE Workshop on Applications of Computer Vision. Sarasota, USA: IEEE, 1994. 138−142.
    [31] Graham D B, Allinson N M. Characterising virtual eigensignatures for general purpose face recognition. Face Recognition: From Theory to Applications. Berlin: Springer, 1998. 446−456
    [32] Roweis S. Binary alphadigits [Online], available: https://cs.nyu.edu/~roweis/data.html, January 12, 2010
    [33] Nene S A, Nayar S K, Murase H. Columbia object image library (COIL-20) [Online], available: http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php, June 29, 2016
    [34] Haq S, Jackson P J B, Edge J. Audio-visual feature selection and reduction for emotion classification. In: Proceedings of the International Conference on Auditory-Visual Speech Processing. Moreton Island, Australia: AVSP, 2008. 185−190
    [35] 中文语言资源联盟. CASIA 汉语情感语料库 [Online], available: http://www.chineseldc.org/, 2010-10-09

    Chinese LDC. CASIA-Chinese emotional speech corpus [Online], available: http://www.chineseldc.org/, October 9, 2010
    [36] Busso C, Bulut M, Lee C C, Kazemzadeh A, Mower E, Kim S, et al. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, 42(4): 335-359 doi: 10.1007/s10579-008-9076-6
    [37] Berry W M, Dumais S. LSI: Latent semantic indexing [Online], available: http://web.eecs.utk.edu/research/lsi/, November 21, 2008
    [38] Srivastava S K, Singh S K, Suri J S. Effect of incremental feature enrichment on healthcare text classification system: A machine learning paradigm. Computer Methods and Programs in Biomedicine, 2019, 172: 35-51 doi: 10.1016/j.cmpb.2019.01.011
    [39] Dua D, Graff C. UCI machine learning repository [Online], available: http://archive.ics.uci.edu/ml, September 24, 2018
    [40] Oyallon E, Zagoruyko S, Huang G, Komodakis N, Julien S L, Blaschko M, et al. Scattering networks for hybrid representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2208-2221 doi: 10.1109/TPAMI.2018.2855738
    [41] Schuller B W, Batliner A, Bergler C, Messner E M, Hamilton A, Amiriparian S, et al. The INTERSPEECH 2020 computational paralinguistics challenge: Elderly emotion, breathing & masks. In: Proceedings of the 21st Annual Conference of the International Speech Communication Association. Shanghai, China: ISCA, 2020. 2042−2046
    [42] Zhou D X, Yang D, Zhang X H, Huang S, Feng S. Discriminative probabilistic latent semantic analysis with application to single sample face recognition. Neural Processing Letters, 2019, 49(3): 1273-1298 doi: 10.1007/s11063-018-9852-2
    [43] Nie F P, Wang X Q, Jordan M I, Huang H. The constrained Laplacian rank algorithm for graph-based clustering. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI Press, 2016. 1969−1976
    [44] Yang Z R, Oja E. Linear and nonlinear projective nonnegative matrix factorization. IEEE Transactions on Neural Networks, 2010, 21(5): 734-749 doi: 10.1109/TNN.2010.2041361
    [45] Ding C, Li T, Jordan M I. Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. In: Proceedings of the 8th IEEE International Conference on Data Mining. Pisa, Italy: IEEE, 2008. 183−192
    [46] Arora R, Gupta M R, Kapila A, Fazel M. Clustering by left-stochastic matrix factorization. In: Proceedings of the 28th International Conference on International Conference on Machine Learning. Bellevue, USA: Omnipress, 2011. 761−768
    [47] Yang Z R, Hao T L, Dikmen O, Chen X, Oja E. Clustering by nonnegative matrix factorization using graph random walk. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc., 2012. 1079−1087
  • 加载中
图(7) / 表(4)
计量
  • 文章访问数:  1095
  • HTML全文浏览量:  316
  • PDF下载量:  156
  • 被引次数: 0
出版历程
  • 收稿日期:  2020-03-18
  • 录用日期:  2020-08-14
  • 网络出版日期:  2020-12-17
  • 刊出日期:  2023-04-20

目录

    /

    返回文章
    返回