摘要:
相似度聚类方法(Similarity-based clustering method, SCM)因其简单易实现和具有鲁棒性而广受关注. 但由于内含相似度聚类算法 (Similarity clustering algorithm, SCA)的高时间复杂度和凝聚型层次聚类 (Agglomerative hierarchical clustering, AHC) 的高空间复杂度, SCM不适用大数据集场合. 本文首先发现了SCM和核密度估计问题的本质联系, 并以此入手, 通过快速压缩集密度估计器(Fast reduced set density estimator, FRSDE)和基于图的松弛聚类(Graph-based relaxed clustering, GRC)算法提出了快速自适应相似度聚类方法(Fast adaptive similarity-based clustering method, FASCM). 相比于原SCM, 该方法的主要优点是: 1)其总体渐近时间复杂度与样本容量呈线性关系; 2) 不依赖于人工经验的干预, 具有了自适应性. 由此, FASCM适用于大数据集环境. 该方法的有效性在图像分割应用中进行了验证.
Abstract:
Similarity-based clustering method (SCM) has received much attention because it is robust and can be implemented simply and easily. However, because of its high time complexity of the embedded similarity clustering algorithm (SCA) and high space complexity of the embedded agglomerative hierarchical clustering (AHC ), SCM is impractical for large data sets. In this paper, the relationship is revealed between SCM and the kernel density estimation of samples, a novel fast adaptive similarity-based clustering method (FASCM) is accordingly proposed by adopting fast reduced set density estimator (FRSDE) and graph-based relaxed clustering (GRC). The distinctive advantages of FMSSC over MSSC exist in: 1) its asymptotic linear time complexity with the data size; 2) independent on artificial experience and its adaptability. Thus, FASCM is practical for large datasets. Its effectiveness has also been demonstrated in image segmentation examples.