Clustering Boundary Pattern Discovery for High Dimensional Space Based on Matrix Model

LI Xiang-Li, CAO Xiao-Feng, QIU Bao-Zhi

Citation: LI Xiang-Li, CAO Xiao-Feng, QIU Bao-Zhi. Clustering Boundary Pattern Discovery for High Dimensional Space Base on Matrix Model. ACTA AUTOMATICA SINICA, 2017, 43(11): 1962-1972. doi: 10.16383/j.aas.2017.c160443

doi: 10.16383/j.aas.2017.c160443
Funds:

Basic and Advanced Technology Research Project of Henan Province 152300410191

More Information
    Author Bio:

    LI Xiang-Li   Associate professor at the School of Information Engineering, Zhengzhou University. Her research interests cover computer networks and data mining. E-mail: iexlli@zzu.edu.cn

    QIU Bao-Zhi   Professor at the School of Information Engineering, Zhengzhou University. His research interests cover databases, advanced intelligent systems, and data mining. E-mail: iebzqiu@zzu.edu.cn

    Corresponding author:

    CAO Xiao-Feng   Master student at the School of Information Engineering, Zhengzhou University. His research interests cover pattern recognition and data mining. Corresponding author of this paper. E-mail: 18739920964@163.com
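  • Abstract: Manifold learning seeks a suitable embedding that maps a high-dimensional space to a low-dimensional one, but the mapped subspace may still have a relatively high dimension, which makes data mining tasks in high-dimensional spaces hard to solve. This paper builds a simple matrix model that judges how symmetric the k-nearest-neighbor space of a data point is about that point, uses the resulting symmetry rate to extract boundaries, and proposes a high-dimensional clustering boundary detection technique based on this model (Clustering boundary detection based on matrix model, MMC). The model is simple to construct, direct, and easy to understand and use. Theoretical analysis and experiments on synthetic and real data sets show that the MMC algorithm effectively detects clustering boundaries in both low-dimensional and high-dimensional spaces.
    1)  Editorial board member responsible for this paper: ZHANG Jun-Ping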
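The abstract names the idea (a matrix built from a point's k nearest neighbors, scored by how symmetric the neighbors are about the point) but does not give the construction. The sketch below is therefore only one plausible reading, not the authors' MMC definition: the function `knn_symmetry_rate`, the per-dimension balance score, and the cut-off `0.5` are all assumptions introduced for illustration. How such a cut-off relates to the paper's $\varepsilon_1$ or $\varepsilon_2$ is not stated on this page.

```python
import numpy as np

def knn_symmetry_rate(X, k):
    """Hypothetical symmetry score in [0, 1] for every row of X.

    For each point, the offsets to its k nearest neighbors form a k x d
    matrix. In each dimension we compare how many neighbors lie on the
    positive side with how many lie on the negative side. Interior points
    tend to be balanced (score near 1); boundary points are one-sided
    (score near 0). This is an illustrative stand-in, not the MMC formula.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via the dot-product identity.
    sq = (X ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist2, np.inf)               # a point is not its own neighbor
    nn = np.argsort(dist2, axis=1, kind="stable")[:, :k]
    offsets = X[nn] - X[:, None, :]               # shape (n, k, d)
    pos = (offsets > 0).sum(axis=1)               # neighbors on the + side, per dimension
    neg = (offsets < 0).sum(axis=1)               # neighbors on the - side, per dimension
    larger = np.maximum(pos, neg)
    smaller = np.minimum(pos, neg)
    # Balance is 1 when both sides match (or no neighbor moves in that
    # dimension) and 0 when every moving neighbor is on the same side.
    balance = np.divide(smaller, larger, out=np.ones(larger.shape), where=larger > 0)
    return balance.mean(axis=1)

# Toy data: a 5 x 5 integer grid treated as a single cluster.
grid = np.array([(i, j) for i in range(5) for j in range(5)])
rate = knn_symmetry_rate(grid, k=4)
boundary = rate < 0.5                             # assumed cut-off, chosen for the toy data
```

On this toy grid the center point has perfectly balanced neighbors and the corner point has all neighbors on one side, so a low score flags the corner as boundary. The dense distance matrix keeps the example short; it costs O(n^2) memory and would need a neighbor index for large data sets.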
  • Fig. 1  The best clustering boundary detection results of different algorithms on synthetic data sets

    Fig. 2  The clustering boundary and cluster center objects of the handwritten digit "3"

    Fig. 3  Face cluster FC1

    Fig. 4  The boundary detection result of MMC when 4 boundaries are marked on FC1 ($k=5$, $\varepsilon_1=0.8462$)

    Fig. 5  The boundary detection result of MMC when 8 boundaries are marked on FC1 ($k=5$, $\varepsilon_1=0.6923$)

    Fig. 6  The boundary detection result of MMC when 8 boundaries are marked on FC2 ($k=5$, $\varepsilon_1=0.7850$)

    Fig. 7  The boundary detection result of MMC on the first sequence ($k=5$, $\varepsilon_1=0.9427$)

    Fig. 8  Change of F-measure with the input $k$ on different data sets

    Fig. 9  Change of F-measure with the input $\varepsilon_1$ on different data sets

    Fig. 10  Change of F-measure with the input $\varepsilon_2$ on different data sets

    Table 1  Preprocessing methods

    Dataset         Total samples   Dimensions   Preprocessing method
    Mnist           10 000          28           3)
    Colon           62              2 000        1)
    Prostate        102             10 509       2)
    Pointing data   2 790           384          3)

    Table 2  The clustering boundary detection results of different algorithms on different data sets

    Dataset    Dimension   Algorithm   True boundaries   Detected boundaries   Correctly detected   Precision   Recall   F-measure
    DS1        2           BAND        640               823                   556                  0.6756      0.8688   0.7601
    DS1        2           BORDER      640               723                   540                  0.7469      0.8438   0.7924
    DS1        2           BRINK       640               667                   520                  0.7795      0.8125   0.7957
    DS1        2           BRIM        640               680                   536                  0.7882      0.8375   0.8121
    DS1        2           BERGE       640               662                   532                  0.8036      0.8313   0.8172
    DS1        2           MMC         640               630                   576                  0.9143      0.9000   0.9071
    DS2        2           BAND        538               749                   454                  0.6061      0.8439   0.7055
    DS2        2           BORDER      538               669                   445                  0.6366      0.8271   0.7195
    DS2        2           BRINK       538               499                   438                  0.8778      0.8141   0.8447
    DS2        2           BRIM        538               562                   466                  0.8292      0.8661   0.8472
    DS2        2           BERGE       538               553                   472                  0.8535      0.8773   0.8652
    DS2        2           MMC         538               549                   503                  0.9162      0.9349   0.9255
    DS3        2           BAND        1 077             1 629                 961                  0.5899      0.8923   0.7103
    DS3        2           BORDER      1 077             1 252                 831                  0.6637      0.7716   0.7136
    DS3        2           BRINK       1 077             1 540                 914                  0.5935      0.8478   0.6985
    DS3        2           BRIM        1 077             1 188                 935                  0.7870      0.8682   0.8256
    DS3        2           BERGE       1 077             1 138                 942                  0.8278      0.8747   0.8506
    DS3        2           MMC         1 077             1 016                 968                  0.9528      0.8988   0.9250
    DS4        2           BAND        1 204             1 944                 1 056                0.5432      0.8771   0.6709
    DS4        2           BORDER      1 204             1 802                 1 089                0.6043      0.9045   0.7246
    DS4        2           BRINK       1 204             1 817                 1 003                0.5520      0.8331   0.6640
    DS4        2           BRIM        1 204             1 355                 1 062                0.7838      0.8821   0.8300
    DS4        2           BERGE       1 204             1 246                 1 123                0.9013      0.9327   0.9167
    DS4        2           MMC         1 204             1 228                 1 138                0.9267      0.9452   0.9359
    Biomed     4           BAND        30                26                    22                   0.8462      0.7333   0.7857
    Biomed     4           BORDER      30                26                    23                   0.8846      0.7667   0.8214
    Biomed     4           BRINK       30                36                    30                   0.8333      1.0000   0.9089
    Biomed     4           BERGE       30                26                    24                   0.9231      0.8000   0.8572
    Biomed     4           MMC         30                30                    28                   0.9333      0.9333   0.9333
    Cancer     10          BAND        37                37                    25                   0.6757      0.6757   0.6757
    Cancer     10          BORDER      37                37                    28                   0.7568      0.7568   0.7568
    Cancer     10          BRINK       37                37                    29                   0.7837      0.7837   0.7837
    Cancer     10          BERGE       37                37                    28                   0.7568      0.7568   0.7568
    Cancer     10          MMC         37                38                    34                   0.8947      0.9189   0.9067
    Colon      2 000       BAND        7                 6                     5                    0.8333      0.7143   0.7692
    Colon      2 000       BORDER      7                 7                     7                    1.0000      1.0000   1.0000
    Colon      2 000       BRINK       7                 6                     5                    0.8333      0.7143   0.7692
    Colon      2 000       BERGE       7                 6                     5                    0.8333      0.7143   0.7692
    Colon      2 000       MMC         7                 7                     7                    1.0000      1.0000   1.0000
    Prostate   10 509      BAND        18                17                    16                   0.9412      0.8889   0.9143
    Prostate   10 509      BORDER      18                19                    18                   0.9474      1.0000   0.9730
    Prostate   10 509      BRINK       18                17                    16                   0.9412      0.8889   0.9143
    Prostate   10 509      BERGE       18                17                    16                   0.9412      0.8889   0.9143
    Prostate   10 509      MMC         18                18                    18                   1.0000      1.0000   1.0000
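The three score columns of Table 2 appear to follow the usual definitions: precision is the number of correctly detected boundary points divided by the number detected, recall is the number correct divided by the number of true boundary points, and F-measure is their harmonic mean. This page does not state the formulas, so treat these definitions as an assumption; the helper name `boundary_scores` is illustrative. A minimal check against the MMC row for DS1:

```python
def boundary_scores(true_count, detected_count, correct_count):
    """Precision, recall and F-measure from boundary-point counts.

    Assumes the standard definitions; the page itself does not spell them out.
    """
    precision = correct_count / detected_count
    recall = correct_count / true_count
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# MMC on DS1 in Table 2: 640 true, 630 detected, 576 correct.
p, r, f = boundary_scores(640, 630, 576)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.9143 0.9000 0.9071
```

The printed values match the tabulated 0.9143, 0.9000 and 0.9071 for that row.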
Publication history
  • Received:  2016-05-31
  • Accepted:  2016-11-16
  • Published:  2017-11-20