2.793

2018影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于灵活平衡约束的图聚类方法

罗辉 韩纪庆

罗辉, 韩纪庆. 基于灵活平衡约束的图聚类方法. 自动化学报, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c200144
引用本文: 罗辉, 韩纪庆. 基于灵活平衡约束的图聚类方法. 自动化学报, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c200144
Luo Hui, Han Ji-Qing. Graph clustering based on flexibly balanced constraint. Acta Automatica Sinica, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c200144
Citation: Luo Hui, Han Ji-Qing. Graph clustering based on flexibly balanced constraint. Acta Automatica Sinica, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c200144

基于灵活平衡约束的图聚类方法

doi: 10.16383/j.aas.c200144
基金项目: 国家自然科学基金(U1736210)资助, 国家重点研发计划项目(2017YFB1002102)资助
详细信息
    作者简介:

    罗辉:哈尔滨工业大学计算机科学与技术学院博士研究生. 2014年获贵州大学通信与信息系统专业硕士学位. 主要研究方向为智能语音处理技术和模式识别. E-mail: luohui0216@163.com

    韩纪庆:哈尔滨工业大学计算机科学与技术学院教授, 博士生导师. 1998年获哈尔滨工业大学计算机科学博士学位. 主要研究方向为智能语音处理技术和音频信息的智能处理. E-mail: jqhan@hit.edu.cn

Graph Clustering Based on Flexibly Balanced Constraint

Funds: Supported by National Science Foundation of China (U1736210), National Key Research and Development Program of China (2017YFB1002102)
  • 摘要: 现有的图聚类方法主要存在两方面的问题, 一是对各个类规模一致的假设, 在许多实际应用中并不成立. 二是在处理多类聚类问题时, 其所常借助的递归技术或启发式算法会影响聚类的性能. 为此, 本文提出一种基于灵活平衡约束的多类图聚类方法. 其能够覆盖从绝对平衡约束到无平衡约束的范围, 可同时处理类别规模一致和不一致的问题. 为有效求解新方法中的参数, 进一步提出一个紧松弛方法来使所提出的图聚类方法不仅易于求解, 且在处理多类聚类问题时不必依赖递归技术, 而能直接得到聚类结果. 文中也给出一种实现松弛图聚类的有效求解算法. 在合成数据和真实数据上的实验结果表明, 所提出的方法具有良好的性能.
  • 图  1  函数$\varPhi_p(v)$在不同参数$p$时的示意图[14]

    Fig.  1  Illustration of functions $\varPhi_p(v)$ with different parameter $p$[14]

    图  2  函数$\varPsi_r(v)$在不同参数$r$时的示意图

    Fig.  2  Illustration of functions $\varPsi_r(v)$ with different parameter $r$

    图  3  函数$\widehat{W}$$\widehat{G}$的对比图

    Fig.  3  The comparisons of $\widehat{W}$ and $\widehat{G}$

    图  4  合成数据及其邻接图

    Fig.  4  Synthetic data and its adjacency graph

    图  5  Ratio形式的图聚类结果

    Fig.  5  Results of Ratio graph clustering methods

    图  6  Normalized形式的图聚类结果

    Fig.  6  Results of Normalized graph clustering methods

    图  7  聚类结果的统计分析

    Fig.  7  Statistical analysis of clustering results

    表  1  真实数据集

    Table  1  The statistics of real databases

    数据集样本数类别数数据类型
    HIGHSCHOOL605社交
    POLBOOKS1053社交
    FOOTBALL11512社交
    DUKE442医疗
    CANCER19814医疗
    KHAN834基因
    ROSETTA3005基因
    ORL40040图像
    UMIST57520图像
    ALPHADIGS140436图像
    COIL20144020图像
    SAVEE4807语音
    CASIA12006语音
    IEMOCAP99948语音
    MED103331文本
    WEBKB441964文本
    REUTERS829365文本
    YEAST148410生物
    FAULTS19417 工业
    ECOLI3275蛋白质
    下载: 导出CSV

    表  2  各图聚类方法的聚类精度(%)对比

    Table  2  Comparison of clustering accuracies (%) between graph clustering methods

    Dataset1Spec[16]MTV[11]L21MC[19]BKCut[18]CCB[15]FBC (r)FMC (r)
    HIGHSCHOOL81.6780.0070.0081.6786.6771.67 (0.9)89.67 (1.2)
    POLBOOKS82.8679.0577.1481.9083.8181.50 (1.4)83.81 (1.3)
    FOOTBALL86.9692.1724.3593.0486.0992.11 (1.3)93.04 (1.8)
    DUKE52.2770.4563.6452.2752.2770.45 (0.5)70.45 (0.5)
    CANCER34.3454.0452.0254.5548.4853.54 (1.5)54.55 (1.7)
    KHAN57.8349.4054.2251.8157.8350.28 (1.6)57.83 (1.6)
    ROSETTA76.6776.6776.6776.6776.6776.67 (1.0)77.33 (1.9)
    ORL36.0052.5079.2582.0074.2574.75 (1.5)81.75 (1.0)
    UMIST58.9676.8770.7870.0965.2272.70 (1.3)74.96 (1.3)
    ALPHADIGS36.1151.7146.0847.7245.8736.84 (2.0)49.00 (2.0)
    COIL2045.0081.4650.9778.8977.2981.81 (1.0)82.92 (1.2)
    SAVEE25.2132.5031.4631.4629.1731.02 (1.1)33.24 (1.3)
    CASIA22.4232.9228.5028.0828.5826.00 (0.1)35.54 (0.4)
    IEMOCAP54.0354.0054.0054.0354.0354.03 (1.9)55.21 (1.5)
    MED44.7253.9252.7654.9953.5350.09 (1.4)55.00 (1.4)
    WEBKB439.3956.2239.3051.6039.5646.14 (0.5)59.60 (0.7)
    REUTERS52.8373.3465.6961.0269.3867.58 (2.0)73.53 (1.5)
    YEAST51.3553.9749.2653.7151.9544.41 (1.6)54.85 (1.6)
    FAULTS36.8940.0837.9241.1139.5231.72 (1.8)42.76 (1.7)
    ECOLI79.8277.6777.6782.8777.9871.43 (1.8)82.87 (1.8)
    下载: 导出CSV

    表  3  本文方法与其它非图聚类方法的聚类精度(%)对比

    Table  3  Comparison of clustering accuracies (%) between ours and other non-graph clustering methods

    DatasetPNMF[44]NSC[45]PLSI[42]LSD[46]NMFR[47]CLR[43]FMC (r)
    HIGHSCHOOL81.6783.3383.3381.6795.0080.0089.67 (1.2)
    POLBOOKS78.1080.9578.1078.1079.0580.9583.81 (1.3)
    FOOTBALL93.0493.0493.0493.0493.0493.0493.04 (1.8)
    DUKE52.2752.2770.4570.4570.4552.2770.45 (0.5)
    CANCER53.5353.0353.5353.0351.5254.0454.55 (1.7)
    KHAN60.2455.4255.4251.8150.6051.8157.83 (1.6)
    ROSETTA76.6776.6776.6776.6776.6777.0077.33 (1.9)
    ORL81.5082.2582.5081.2582.5080.0081.75 (1.0)
    UMIST64.0068.0068.5268.3571.8370.2674.96 (1.3)
    ALPHADIGS44.9449.4348.6549.3650.7836.4749.00 (2.0)
    COIL2075.2181.2580.6975.0082.8585.2182.92 (1.2)
    SAVEE31.8831.4631.6734.1731.8828.1333.24 (1.3)
    CASIA35.1729.0832.0032.1734.0827.9235.54 (0.4)
    IEMOCAP54.0054.0654.0054.0054.0054.0255.21 (1.5)
    MED53.9253.7354.5054.6055.8649.1855.00 (1.4)
    WEBKB439.0639.7748.9050.8163.3239.5659.60 (0.7)
    REUTERS73.8676.4775.9274.9976.5251.8073.53 (1.5)
    YEAST52.7654.3152.6952.0954.8546.9054.85 (1.6)
    FAULTS39.2640.0840.0340.1838.1741.3742.76 (1.7)
    ECOLI77.6878.9079.8267.8979.2078.2982.87 (1.8)
    下载: 导出CSV

    表  4  不同聚类方法得到的图分割函数值

    Table  4  The values of different graph cut functions obtained by different clustering methods

    图分割函数1Spec[16]MTV[11]L21MC[19]BKCut[18]CCB[15]PLSI[42]LSD[46]NMFR[47]CLR[43]Ours
    FRCut0.512.64 (13.01)13.99 (12.66)15.57 (10.64)12.03 (11.09)18.63 (14.63)14.43 (12.16)14.70 (12.41)14.27 (12.04)13.69 (12.14)9.71 (10.52)
    FRCut1.09.21 (9.64)12.27 (11.22)12.09 (8.27)9.67 (9.31)13.93 (11.28)12.36 (10.35)12.95 (10.83)12.77 (10.90)9.78 (9.08)8.05 (8.18)
    FRCut2.07.79 (7.96)11.89 (10.99)10.90 (7.77)8.37 (8.42)12.18 (10.64)11.88 (9.99)12.62 (10.57)12.50 (10.78)8.18 (8.36)6.72 (7.18)
    FNCut0.51.83 (1.48)1.78 (1.42)2.05 (1.25)1.57 (1.25)1.87 (1.46)1.84 (1.35)1.90 (1.38)1.83 (1.37)1.79 (1.48)1.39 (1.21)
    FNCut1.01.35 (1.14)1.53 (1.23)1.56 (0.93)1.29 (1.09)1.41 (1.13)1.54 (1.14)1.63 (1.18)1.60 (1.20)1.26 (1.06)1.14 (1.03)
    FNCut2.01.19 (1.00)1.47 (1.20)1.39 (0.87)1.18 (1.03)1.24 (1.07)1.47 (1.10)1.57 (1.14)1.55 (1.17)1.04 (0.95)0.92 (0.83)
    下载: 导出CSV
  • [1] MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley symposium on mathematical statistics and probability. Oakland, California, USA, 1967. 281−297
    [2] Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 4(17): 395−416
    [3] Ding Shi-Fei, Cong Lin, Hu Qian-Kun, Jia Hong-Jie, Shi Zhong-Zhi. A multiway p-spectral clustering algorithm. Knowledge-Based Systems, 2019, 164: 371−377
    [4] Liang Nai-Yao, Yang Zu-Yuan, Li Zhen-Ni, Xie Sheng-Li, Su Chun-Yi. Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowledge-Based Systems, to be published
    [5] 庞宁, 张继福, 秦啸. 一种基于多属性权重的分类数据子空间聚类算法. 自动化学报, 2018, 44(3): 517−532

    Pang Ning, Zhang Ji-Fu, Qin Xiao. A subspace clustering algorithm of categorical data using multiple attribute weights. Acta Automatica Sinica, 2018, 44(3): 517−532
    [6] 尹明, 吴浩杨, 谢胜利, 杨其宇. 基于自注意力对抗的深度子空间聚类. 自动化学报, 2020, 46(x): 1−11

    Yin Ming, Wu Hao-Yang, Xie Sheng-Li, Yang Qi-Yu. Self-attention adversarial based deep subspace clustering. Acta Automatica Sinica, 2020, 46(x): 1−11
    [7] Stoer M, Wagner P. A simple min-cut algorithm. Journal of the ACM, 1997, 44(4): 585−591
    [8] Hagen L, Kahng A B. New spectral methods for ratio cut partitioning and clustering. IEEE transactions on computer-aided design of integrated circuits and systems, 1992, 11(9): 1074−1085
    [9] Shi Jian-Bo, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888−905
    [10] Chung F R, Graham F C. Spectral graph theory. Providence: American Mathematical Society, 1997. 36−45
    [11] Bresson X, Laurent T, Uminsky D, Von Brecht J. Multiclass total variation clustering. In: Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA, 2013. 1421−1429
    [12] Bresson X, Tai X C, Chan T F, Szlam A. Multiclass transductive learning based on $ \ell_1$ relaxations of cheeger cut and mumford-shah-potts model. Journal of mathematical imaging and vision, 2014, 49(1): 191−201
    [13] Hein M, Setzer S. Beyond spectral clustering-tight relaxations of balanced graph cuts. In: Proceedings of Advances in neural information processing systems. Granada, Spain, 2011. 2366−2374
    [14] Bühler T, Hein M. Spectral clustering based on the graph p-laplacian. In: Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Quebec, Canada, 2009. 81−88
    [15] Cahill N D, Hayes T L, Meinhold R T, Hamilton J F. Compassionately conservative balanced cuts for image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, Utah, USA, 2018. 1683−1691
    [16] Hein M, $ {{\rm{B}}\mathop {'}\limits^{'}} {\rm{uhler}}$ T. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In: Proceedings of Advances in Neural Information Processing Systems. Vancouver, Canada, 2010. 847−855
    [17] Nie Fei-Ping, Wang Hua, Deng Cheng, Gao Xin-Bo, Li Xue-Long, Huang Heng. New $ \ell_1$-norm relaxations and optimizations for graph clustering. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA, 2016. 1962−1968
    [18] Rangapuram S S, Mudrakarta P K, Hein M. Tight continuous relaxation of the balanced k-cut problem. In: Proceedings of Advances in Neural Information Processing Systems. Montreal, Canada, 2014. 3131−3139
    [19] Yang Xu, Deng Cheng, Liu Xiang-Long, Nie Fei-Ping. New $ \ell_{2, 1}$-norm relaxation of multi-way graph cut for clustering. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA, 2018. 4374−4381
    [20] Ling Shu-Yang, Strohmer T. Certifying global optimality of graph cuts via semidefinite relaxation: A performance guarantee for spectral clustering. Foundations of Computational Mathematics, 2019, 20(1): 367−421
    [21] Lovász L. Submodular functions and convexity. Mathematical Programming: The State of the Art. Springer, Berlin, Heidelberg, 1983. 235−257
    [22] Bach F. Submodular functions: from discrete to continuous domains. Mathematical Programming, 2019, 175: 419−459
    [23] Ji Sai, Xu Da-Chuan, Li Min, Wang Yi-Shui, Zhang Dong-Mei. Stochastic greedy algorithm is still good: maximizing submodular + supermodular functions. In: Proceedings of the 6th World Congress on Global Optimization. Metz, France, 2019. 488−497
    [24] Chambolle A, Pock T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision, 2011, 40(1): 120−145
    [25] Batagelj V, Mrvar A. Pajek datasets[Online], available: http://vlado.fmf.uni-lj.si/pub/networks/data/, 2006
    [26] Krebs V. Books about US politics[Online], available: https://www.cise.ufl.edu/research/sparse/matrices/Newman/polbooks.html, 2014
    [27] Chang C C, Lin C J. LIBSVM : a library for support vector machines[Online], available: http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2019
    [28] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer-Verlag, 2009. 649−698
    [29] Kim P, Tidor B. Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Research, 2003, 13(7): 1706−1718
    [30] Samaria F S, Harter A C. Parameterisation of a stochastic model for human face identification. In: Proceedings of the IEEE Workshop on Applications of Computer Vision. Sarasota, Florida, USA, 1994. 138−142.
    [31] Graham D B, Allinson N M. Characterising virtual eigensignatures for general purpose face recognition. Face Recognition: From Theory to Applications. Berlin: Springer-Heidelberg, 1998. 446−456
    [32] Roweis S. The binary alphadigits dataset[Online], available: https://cs.nyu.edu/~roweis/data.html, 2010
    [33] Nene S A, Nayar S K, Murase H. Columbia object image library (COIL-20)[Online], available: http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php, 1996
    [34] Haq S, Jackson P J B, Edge J. Audio-visual feature selection and reduction for emotion classification. In: Proceedings of the International Conference on Auditory-Visual Speech Processing. Moreton Island, Queensland, Australia, 2008. 185−190
    [35] ChineseLDC. CASIA-Chinese emotional speech corpus[Online], available: http://www.chineseldc.org/, 2010
    [36] Busso C, Bulut M, Lee C C, Kazemzadeh A, Mower E, Kim S, Chang J N, Lee S, Narayanan S S. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, 42(4): 335−359
    [37] Dumais S, Berry W M. LSI: latent semantic indexing[Online], available: http://web.eecs.utk.edu/research/lsi/, 2006
    [38] Srivastava S K, Singh S K, Suri J S. Effect of incremental feature enrichment on healthcare text classification system: A machine learning paradigm. Computer Methods and Programs in Biomedicine, 2019, 172: 35−51
    [39] Dua D, Graff C. UCI machine learning repository[Online], available: http://archive.ics.uci.edu/ml, 2019
    [40] Oyallon E, Zagoruyko S, Huang G, Komodakis N, Julien S L, Blaschko M, et al. Scattering Networks for Hybrid Representation Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2208−2221
    [41] Schuller B, Batliner A, Bergler C, Messner E M, Hamilton A, Amiriparian S, et al. The INTERSPEECH 2020 computational paralinguistics challenge: Elderly emotion, breathing and masks. In: Proceedings of the 21th Annual Conference of the International Speech Communication Association. Shanghai, China, 2020. 1−5
    [42] Zhou Dao-Xiang, Yang Dan, Zhang Xiao-Hong, Huang Sheng, Feng Shu. Discriminative probabilistic latent semantic analysis with application to single sample face recognition. Neural Processing Letters, 2019, 49(3): 1273−1298
    [43] Nie Fei-Ping, Wang Xiao-Qian, Jordan M I, Huang Heng. The constrained Laplacian rank algorithm for graph-based clustering. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix, Arizona, USA, 2016. 1969−1976
    [44] Yang Zhi-Rong, Oja E. Linear and nonlinear projective nonnegative matrix factorization. IEEE Transactions on Neural Networks, 2010, 21(5): 734−749
    [45] Ding C, Li Tao, Jordan M I. Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. In: Proceedings of the Eighth IEEE International Conference on Data Mining. Las Vegas, Nevada, USA: IEEE, 2008. 183−192
    [46] Arora R, Gupta M, Kapila A, Fazel M. Clustering by left-stochastic matrix factorization. In: Proceedings of the 28th International Conference on Machine Learning. Bellevue, Washington, USA, 2011. 761−768
    [47] Yang Zhi-Rong, Hao Te-Le, Dikmen O, Chen Xi, Oja E. Clustering by nonnegative matrix factorization using graph random walk. In: Proceedings of Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA, 2012. 1079−1087
  • 加载中
计量
  • 文章访问数:  67
  • HTML全文浏览量:  28
  • 被引次数: 0
出版历程
  • 收稿日期:  2020-03-18
  • 录用日期:  2020-08-14
  • 网络出版日期:  2020-12-17

目录

    /

    返回文章
    返回