
Research on Density-based Clustering Algorithm for Mixed Data with Determine Cluster Centers Automatically

CHEN Jin-Yin, HE Hui-Hao

Citation: CHEN Jin-Yin, HE Hui-Hao. Research on Density-based Clustering Algorithm for Mixed Data with Determine Cluster Centers Automatically. ACTA AUTOMATICA SINICA, 2015, 41(10): 1798-1813. doi: 10.16383/j.aas.2015.c150062

Funds: Supported by Natural Science Foundation of Zhejiang Province (Y14F020092), Natural Science Foundation of Ningbo City (2013A610070)

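    About the authors:

    HE Hui-Hao  Master student at the College of Information, Zhejiang University of Technology. Main research interests: data mining and its applications, cluster analysis. E-mail: hhh zjut@163.com

    Corresponding author:

    CHEN Jin-Yin  Ph.D., associate professor at the College of Information Engineering, Zhejiang University of Technology. Main research interests: intelligent computing, optimization computing, network security. Corresponding author of this paper. E-mail: chenjinyin@zjut.edu.cn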


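  • Abstract: Mixed-attribute data are widespread, yet most existing clustering algorithms for such data suffer from low clustering quality and strong dependence on their parameters. They also cannot determine the number of clusters and the cluster centers accurately and automatically. To address these problems, this paper proposes a density-based clustering algorithm for mixed-attribute data that determines the cluster centers automatically. The algorithm first analyzes the characteristics of the data and places them in one of three types: numeric-dominant, categorical-dominant or balanced. It then selects a distance measure suited to each type. The density and distance of every point in the data set are computed and plotted. An in-depth analysis of this plot yields the following pattern: points with high density that are also relatively far from any point of higher density are the most likely cluster centers. Singular points are then identified with a linear regression model and residual analysis. The paper argues theoretically that these singular points are exactly the cluster centers, so the cluster centers can be determined automatically. Particle swarm optimization (PSO) is used to search for the optimal value of the parameter dc. Given dc, the algorithm can compute the density of any data object and its minimum distance to points of higher density. The center of each cluster is fixed by the automatic center-determination method, and every remaining point is assigned to the cluster of its nearest neighbor of higher density. Finally, the proposed algorithm is compared with several existing mixed-attribute clustering algorithms on multiple data sets, and the results show that it achieves higher clustering quality.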
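The pipeline in the abstract (density, distance to denser points, regression-based center detection, then assignment to the nearest denser neighbor) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It does not model the paper's type-dependent distance measures (numeric-dominant, categorical-dominant, balanced). In their place it uses a single hypothetical mixed distance: the Euclidean distance on numeric attributes plus w times the fraction of mismatched categorical attributes. The density rho is assumed to be the count of neighbors closer than dc, and the densest point takes delta as its largest distance to any point. Both follow the convention of Rodriguez and Laio's density-peaks clustering. The regression is assumed to be ordinary least squares of delta on rho. A point counts as a center when its residual exceeds k times the residual standard error, and k = 2 is a hypothetical default. dc is passed in directly rather than searched with PSO. The names `mixed_distance`, `density_peaks`, `w` and `k` are illustrative, not from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (numeric attributes, categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Record, b: Record, w: float = 1.0) -> float:
    """Stand-in mixed distance (assumption, not the paper's measure):
    Euclidean distance on numeric attributes plus w times the fraction
    of categorical attributes that differ."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0], b[0])))
    pairs = list(zip(a[1], b[1]))
    mismatch = sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0
    return num + w * mismatch


def density_peaks(data: List[Record], dc: float, k: float = 2.0, w: float = 1.0):
    """Return (centers, labels); dc is taken as given (no PSO search here)."""
    n = len(data)
    d = [[mixed_distance(data[i], data[j], w) for j in range(n)] for i in range(n)]

    # rho: number of other points closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]

    # Visit points from densest to sparsest; sorted() is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # nearest point visited earlier, i.e. of higher (or tied) density
    delta[order[0]] = max(d[order[0]])  # densest point: largest distance to any point
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda t: d[i][t])
        delta[i], parent[i] = d[i][j], j

    # Least-squares line delta ~ a + b * rho, then a one-sided residual test.
    mx, my = sum(rho) / n, sum(delta) / n
    sxx = sum((x - mx) ** 2 for x in rho)
    b = sum((x - mx) * (y - my) for x, y in zip(rho, delta)) / sxx if sxx else 0.0
    a = my - b * mx
    res = [y - (a + b * x) for x, y in zip(rho, delta)]
    s = math.sqrt(sum(r * r for r in res) / max(n - 2, 1))
    centers = [i for i in order if res[i] > k * s]  # singular points
    if order[0] not in centers:  # the densest point has no denser parent
        centers.insert(0, order[0])

    # Each remaining point joins its parent's cluster; parents are labelled first.
    labels = [-1] * n
    for cid, c in enumerate(centers):
        labels[c] = cid
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


if __name__ == "__main__":
    # Toy data: two 1-D numeric clusters far apart, each with its own category.
    offsets = [-1.0, -0.7, -0.45, -0.25, -0.1, 0.0, 0.1, 0.25, 0.45, 0.7, 1.0]
    data = [((x,), ("a",)) for x in offsets] + [((100.0 + x,), ("b",)) for x in offsets]
    centers, labels = density_peaks(data, dc=0.5)
    print(centers)  # [5, 16]
    print(labels)   # eleven 0s followed by eleven 1s
```

On this toy data only the two cluster midpoints have residuals above 2s, so each point ends up in its own group. On real data the single-pass rule is sensitive to the number of centers, because the centers themselves inflate s. In this sketch, k is therefore best treated as a tuning knob.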
Publication history
  • Received: 2015-02-03
  • Revised: 2015-07-14
  • Published: 2015-10-20
