Density Clustering Based on the Border-peeling Using Space Vector Decomposition

ZHANG Rui-Lin^1, ZHENG Hai-Yang^1, MIAO Zhen-Guo^1, WANG Hong-Peng^{1, 2, 3}

doi: 10.16383/j.aas.c220208

1. School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518071
2. Peng Cheng Laboratory, Shenzhen 518000
3. Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Shenzhen 518000

Funds: Supported by the Basic Research Project of the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005) and the Shenzhen Fundamental Research Fund (JCYJ20210324132212030)

Author Bio:
ZHANG Rui-Lin　Ph.D. candidate at the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen. His research interest covers deep learning, computer vision, and data mining. E-mail: zzurlz@163.com

ZHENG Hai-Yang　Master student at the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen. His main research interest is deep learning. E-mail: 21S151085@stu.hit.edu.cn

MIAO Zhen-Guo　Master student at the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen. His main research interest is deep learning. E-mail: 20S051017@stu.hit.edu.cn

WANG Hong-Peng　Professor at the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen. His research interest covers computer vision, intelligent robots, and artificial intelligence. Corresponding author of this paper. E-mail: wanghp@hit.edu.cn

Publication history
- Received: 2022-03-21
- Accepted: 2022-10-14
- Published online: 2023-04-17
- Issue published: 2023-06-20

Abstract: Border points, an essential part of clustering, play a key role in guiding clustering convergence and improving pattern recognition ability. Border-peeling clustering, with BP (border-peeling clustering) as its latest representative, uses latent boundary information to ensure the spatial isolation of cluster core regions, which makes the cluster backbone more representative and resolves the boundary membership problem. Nevertheless, existing border-peeling methods are still constrained by incomplete discriminant features, a single discrimination pattern, and nested iterations. To this end, this paper proposes CBPVD (density clustering based on the border-peeling using space vector decomposition). Working in both the projection subspace and the original data space, CBPVD strengthens the fine-grained features of border points from two perspectives, the sparsity (compactness) and the directional skewness (symmetry) of the distribution, and then reversely establishes the cluster backbone through active border peeling and guides boundary membership. CBPVD is compared with six state-of-the-art methods on synthetic, UCI, and video/image datasets. Experimental results on 40 datasets and theoretical analysis from 4 perspectives show that CBPVD performs well overall in high-dimensional clustering and boundary pattern recognition.

Keywords: clustering, space vector decomposition, border-peeling, projection subspace, high dimension, density
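The abstract names two cues for spotting border points: how sparse a point's neighbourhood is, and how lopsided (skewed) its neighbours are around it. The sketch below is only an illustration of that idea and is not the CBPVD formulation, which this page does not give. The functions `border_scores` and `peel` are hypothetical names, the product of mean kNN distance and mean unit-vector length is an assumed scoring rule, and treating `tau` as the fraction of points to peel is likewise an assumption (Table 1 lists $k$ and $\tau$ as CBPVD's parameters but does not define them here).

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force)."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def border_scores(points, k):
    """Assumed score: neighbourhood sparsity times directional skewness."""
    scores = []
    for i, p in enumerate(points):
        nbrs = knn(points, i, k)
        dists = [math.dist(p, points[j]) for j in nbrs]
        sparsity = sum(dists) / k  # mean distance to the k neighbours
        # Sum the unit vectors pointing at each neighbour. Inside a cluster
        # they roughly cancel; on the border they all point inwards.
        acc = [0.0] * len(p)
        for j, d in zip(nbrs, dists):
            if d > 0:
                for a in range(len(p)):
                    acc[a] += (points[j][a] - p[a]) / d
        skewness = math.hypot(*acc) / k
        scores.append(sparsity * skewness)
    return scores


def peel(points, k, tau):
    """Split indices into (core, border); tau is an assumed peel fraction."""
    scores = border_scores(points, k)
    n_border = int(round(tau * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    border = set(ranked[:n_border])
    core = [i for i in range(len(points)) if i not in border]
    return core, sorted(border)


# Usage on a 5 x 5 grid: the four corners get the highest scores.
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
core, border = peel(grid, k=4, tau=0.2)
```

The remaining core points could then be linked into cluster backbones and the peeled points attached to the nearest backbone, which is the general border-peeling pattern the abstract describes.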

Fig. 1 Illustration of the boundary confidence calculation

Fig. 2 Flowchart of the CBPVD algorithm

Fig. 3 Visualized clustering results of the algorithms on synthetic datasets

Fig. 4 Clustering results of CBPVD on Olivetti faces

Fig. 5 Running time test

Fig. 6 Boundary information identified on the MNIST handwritten digit dataset

Fig. 7 Nemenyi test results

Fig. 8 Robustness analysis

Table 1 Hyperparameter configuration

Algorithm	Parameter setting
K-means	$k$ = the actual number of clusters
DPC	$dc\in [0.1,20]$
SNN-DPC	$k\in [3,70]$
GB-DPC	$dc\in [0.1,20] $
EC	$\sigma\in [0.1,20]$ or $\sigma\in [100,300]$
BP	$k\in[3,70], b\in[0.1,0.5], \epsilon\in[0.1,0.5], T\in[100,120],C=2$
CBPVD	$k\in[3,70], \tau\in[0.1,0.4]$


Table 2 Basic information of datasets

Dataset	Size	Dimension	Clusters	Characteristics
Compound	399	2	6	Multi-density, Multi-scale
R15	600	2	15	Micro, Adjoining
Flame	240	2	2	Overlapping
Parabolic	2000	2	2	Cross-winding, Multi-density
Jain	373	2	2	Cross-winding, Multi-density
4k2-far	400	2	4	Noise, Convex
D31	3100	2	31	Multiple-Micro cluster
Aggregation	788	2	7	Bridging
Spiral	240	2	3	Manifold
Heart disease	303	13	2	UCI, Clinical medicine
Hepatitis	155	19	2	UCI, Clinical medicine
German Credit	1000	20	2	UCI, Financial
Voting	435	16	2	UCI, Political election
Credit Approval	690	15	2	UCI, Credit record
Bank	4521	16	2	UCI, Financial credit
Sonar	208	60	2	UCI, Geology exploration
Zoo	101	16	7	UCI, Biological species
Parkinson	195	22	2	UCI, Clinical medicine
Post	90	8	3	UCI, Postoperative recovery
Spectheart	267	22	2	UCI, Clinical medicine
Wine	178	13	3	UCI, Wine ingredients
Ionosphere	351	34	2	UCI, Atmospheric structure
WDBC	569	30	2	UCI, Cancer
Optical Recognition	5620	64	10	OCR, Handwritten Digits
Olivetti Face	400	10304	40	Face, High-dimensional
You-Tube Faces	10000	10000	41	Video stream, Face
RNA-seq	801	20531	5	Gene expression, Nonlinear
REUTERS	10000	10000	4	Word, News, Text
G2-20	2048	2	2	Noise-20%
G2-30	2048	2	2	Noise-30%
G2-40	2048	2	2	Noise-40%
Size500	500	2	5	Gaussian
Size2500	2500	2	5	Gaussian
Size5000	5000	2	5	Gaussian
Size10000	10000	2	5	Gaussian
Dim128	1024	128	16	High-dimensional
Dim256	1024	256	16	High-dimensional
Dim512	1024	512	16	High-dimensional
Dim1024	1024	1024	16	High-dimensional
MNIST	10000	784	10	OCR, High-dimensional


Table 3 Performance comparison of algorithms on synthetic datasets

Dataset	Algorithm	Parameter	ACC	Purity	JC	ARI	FMI
4k2-far	K-means	$k$= 4	1	1	0.13	1	1
	DPC	$dc$= 0.2168	1	1	1	1	1
	GB-DPC	$dc$= 0.5	1	1	0.26	1	1
	SNN-DPC	$k$= 10	1	1	1	1	1
	EC	$\sigma$= 1	1	1	1	1	1
	BP	—	0.98	0.99	0.01	0.97	0.98
	CBPVD	10, 0.1	1	1	1	1	1
Aggregation	K-means	$k$= 7	0.78	0.94	0	0.76	0.81
	DPC	$k$= 7, $dc$= 2.5	0.91	0.95	0.22	0.84	0.87
	GB-DPC	$dc$= 2.5	0.64	0.99	0.09	0.57	0.68
	SNN-DPC	$k$= 40	0.98	0.98	0	0.96	0.97
	EC	$\sigma$= 5.5	1	1	0	1	1
	BP	—	1	0.95	0.72	0.99	0.99
	CBPVD	16, 0.24	1	1	1	1	1
Compound	K-means	$k$= 6	0.63	0.83	0.23	0.53	0.63
	DPC	$dc$= 1.25	0.64	0.83	0.15	0.54	0.64
	GB-DPC	$dc$= 1.8	0.68	0.83	0.23	0.54	0.64
	SNN-DPC	$k$= 12	0.76	0.84	0.24	0.63	0.74
	EC	$\sigma$= 5.8	0.68	0.86	0.68	0.59	0.69
	BP	—	0.77	0.91	0.77	0.65	0.73
	CBPVD	9, 0.08	0.90	0.91	0.13	0.94	0.96
Flame	K-means	$k$= 2	0.83	0.83	0.83	0.43	0.73
	DPC	$dc$= 0.93	0.84	0.84	0.16	0.45	0.74
	GB-DPC	$dc$= 2	0.99	0.99	0.99	0.97	0.98
	SNN-DPC	$k$= 5	0.99	0.99	0.01	0.95	0.98
	EC	$\sigma$= 5.4	0.80	0.93	0.14	0.51	0.74
	BP	—	0.98	0.99	0.65	0.96	0.98
	CBPVD	3, 0.11	1	1	1	1	1
Spiral	K-means	$k$= 3	0.35	0.35	0.33	−0.01	0.33
	DPC	$dc$= 1.74	0.49	0.49	0.35	0.06	0.38
	GB-DPC	$dc$= 2.95	0.44	0.44	0.36	0.02	0.35
	SNN-DPC	$k$= 10	1	1	0	1	1
	EC	$\sigma$= 10	0.34	0.34	0.32	0	0.58
	BP	—	0.50	0.56	0.50	0.17	0.49
	CBPVD	5, 0.32	1	1	1	1	1
Jain	K-means	$k$= 2	0.79	0.79	0.21	0.32	0.70
	DPC	$dc$= 1.35	0.86	0.86	0.86	0.52	0.79
	GB-DPC	$dc$= 1.35	0.35	0.94	0.18	0.15	0.44
	SNN-DPC	$k$= 10	0.86	0.86	0.14	0.52	0.79
	EC	$\sigma$= 7.65	0.79	0.86	0.19	0.51	0.78
	BP	—	0.42	0.98	0.09	0.23	0.53
	CBPVD	13, 0.16	1	1	0	1	1
R15	K-means	$k$= 15	0.81	0.86	0.03	0.80	0.81
	DPC	$dc$= 0.95	0.99	0.99	0	0.98	0.98
	GB-DPC	$dc$= 0.2	0.99	0.99	0.07	0.99	0.99
	SNN-DPC	$k$= 15	0.99	0.99	0.99	0.99	0.99
	EC	$\sigma$= 1.45	0.98	0.98	0.98	0.97	0.97
	BP	—	0.99	0.99	0	0.99	0.99
	CBPVD	9, 0.13	1	1	1	1	1
Parabolic	K-means	$k$= 2	0.81	0.81	0.81	0.39	0.69
	DPC	$dc$= 1.5	0.82	0.82	0.82	0.41	0.71
	GB-DPC	$dc$= 0.5	0.94	0.94	0.06	0.77	0.89
	SNN-DPC	$k$= 9	0.95	0.95	0.95	0.81	0.91
	EC	$\sigma$= 3.05	0.73	0.73	0.73	0.21	0.66
	BP	—	0.19	0.98	0.03	0.13	0.36
	CBPVD	33, 0.27	1	1	1	1	1
D31	K-means	$k$= 31	0.88	0.91	0	0.87	0.87
	DPC	$dc$= 1.8	0.97	0.97	0	0.94	0.94
	GB-DPC	$dc$= 4	0.46	0.46	0.02	0.32	0.45
	SNN-DPC	$k$= 40	0.97	0.97	0	0.94	0.94
	EC	$\sigma$= 4	0.91	0.91	0.06	0.88	0.89
	BP	—	0.94	0.95	0	0.90	0.91
	CBPVD	13, 0.15	0.97	0.97	0.07	0.94	0.94
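
The Purity, JC, ARI, and FMI columns are external validity indices computed from ground-truth and predicted labels. As a rough guide to what they measure, the sketch below implements their standard textbook pair-counting definitions in plain Python. The function names are illustrative, and whether the paper uses exactly these formulations is an assumption, since this page does not define the metrics. ACC is omitted because it additionally needs an optimal one-to-one matching between predicted and true labels.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def purity(truth, pred):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(truth, pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(truth)


def pair_counts(truth, pred):
    """Count point pairs by agreement: (tp, fp, fn, tn)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def jaccard(truth, pred):
    """Jaccard coefficient (JC) over point pairs."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    return tp / (tp + fp + fn)


def fmi(truth, pred):
    """Fowlkes-Mallows index: geometric mean of pair precision and recall."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    return tp / sqrt((tp + fp) * (tp + fn))


def ari(truth, pred):
    """Adjusted Rand index in its pair-counting form."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    expected = (tp + fp) * (tp + fn) / total
    max_index = ((tp + fp) + (tp + fn)) / 2
    return (tp - expected) / (max_index - expected)


# Usage: one point of class 0 is assigned to the wrong cluster.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(purity(truth, pred), jaccard(truth, pred), fmi(truth, pred), ari(truth, pred))
```

Purity, JC, and FMI lie in [0, 1], while ARI can be slightly negative when the agreement is worse than chance, which is consistent with the small negative ARI entries in the tables.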


Table 4 Performance comparison of algorithms on 16 real-world datasets

Dataset	Algorithm	Parameter	ACC	Purity	JC	ARI	FMI
Heart disease	K-means	$k$= 2	0.57	0.57	0.57	0.02	0.52
	DPC	$dc$= 19.4424	0.55	0.55	0.45	0.01	0.51
	GB-DPC	$dc$= 19.4424	0.54	0.54	0.54	0	0.71
	SNN-DPC	$k$= 65	0.59	0.59	0.41	0.03	0.54
	EC	$\sigma$= 100	0.54	0.54	0.46	−0.001	0.71
	BP	—	0.53	0.54	0.47	−0.002	0.68
	CBPVD	26, 0.27	0.68	0.68	0.32	0.12	0.77
Hepatitis	K-means	$k$= 2	0.66	0.84	0.66	−0.02	0.67
	DPC	$dc$= 1	0.63	0.84	0.01	−0.11	0.61
	GB-DPC	$dc$= 10.2	0.73	0.70	0.28	−0.01	0.72
	SNN-DPC	$k$= 45	0.70	0.84	0.30	−0.07	0.71
	EC	$\sigma$= 5.8	0.01	1	0.01	0	0.01
	BP	—	0.83	0.84	0.83	−0.02	0.84
	CBPVD	10, 0.2	0.84	0.84	0.76	0	0.85
German	K-means	$k$= 2	0.67	0.70	0.33	0.05	0.66
	DPC	$dc$= 53.9814	0.61	0.70	0.61	0.03	0.58
	GB-DPC	$dc$= 53.9814	0.61	0.70	0.61	0.03	0.58
	SNN-DPC	$k$= 30	0.62	0.70	0.39	0.01	0.61
	EC	$\sigma$= 100	0.15	0.72	0.01	0.01	0.20
	BP	—	0.14	0.70	0.07	0.001	0.20
	CBPVD	4, 0.39	0.83	0.83	0.83	0.43	0.74
Voting	K-means	$k$= 2	0.51	0.61	0.51	−0.002	0.51
	DPC	$dc$= 1	0.81	0.81	0.19	0.39	0.7
	GB-DPC	$dc$= 1.7	0.87	0.87	0.87	0.54	0.78
	SNN-DPC	$k$= 60	0.88	0.88	0.12	0.57	0.79
	EC	$\sigma$= 2	0.75	0.89	0.75	0.42	0.68
	BP	—	0.86	0.91	0.05	0.59	0.79
	CBPVD	66, 0.33	0.88	0.88	0.12	0.68	0.79
Credit	K-means	$k$= 2	0.55	0.55	0.45	0.003	0.71
	DPC	$dc$= 1	0.68	0.68	0.68	0.13	0.60
	GB-DPC	$dc$= 7	0.55	0.55	0.45	0	0.71
	SNN-DPC	$k$= 50	0.61	0.61	0.61	0.05	0.53
	EC	$\sigma$= 800	0.56	0.59	0	0.02	0.68
	BP	—	0.33	0.69	0.26	0.06	0.35
	CBPVD	31, 0.33	0.85	0.85	0.85	0.49	0.74
Bank	K-means	$k$= 2	0.82	0.88	0.11	−0.002	0.82
	DPC	$dc$= 2.39	0.64	0.88	0.14	0.04	0.65
	GB-DPC	$dc$= 10	0.76	0.74	0.24	−0.02	0.76
	SNN-DPC	$k$= 3	0.81	0.88	0.81	0.01	0.81
	EC	$\sigma$= 300	0.82	0.82	0	0.02	0.82
	BP	—	0.24	0.88	0.09	0.01	0.29
	CBPVD	24, 0.2	0.88	0.88	0.12	0	0.89
Sonar	K-means	$k$= 2	0.54	0.54	0.34	0.50	0.50
	DPC	$dc$= 2.82	0.58	0.58	0.42	0.02	0.66
	GB-DPC	$dc$= 1.4	0.51	0.53	0.51	−0.004	0.51
	SNN-DPC	$k$= 19	0.50	0.53	0.50	−0.01	0.51
	EC	$\sigma$= 1.6	0.54	0.57	0.07	0.01	0.66
	BP	—	0.51	0.53	0.51	−0.004	0.68
	CBPVD	9, 0.66	0.66	0.66	0.66	0.10	0.60
ZOO	K-means	$k$= 7	0.76	0.84	0.62	0.6	0.69
	DPC	$dc$= 2.4	0.70	0.79	0.36	0.59	0.68
	GB-DPC	$dc$= 3.6	0.66	0.75	0.03	0.48	0.60
	SNN-DPC	$k$= 5	0.56	0.56	0.12	0.31	0.53
	EC	$\sigma$= 2.3	0.80	0.81	0.08	0.65	0.73
	BP	—	0.59	0.59	0.23	0.4	0.62
	CBPVD	10, 0.15	0.86	0.86	0.01	0.93	0.94
Parkinson	K-means	$k$= 2	0.72	0.75	0.28	0	0.74
	DPC	$dc$= 1.3	0.66	0.75	0.34	0.05	0.63
	GB-DPC	$dc$= 3	0.71	0.71	0.29	−0.05	0.75
	SNN-DPC	$k$= 80	0.72	0.75	0.28	0.11	0.69
	EC	$\sigma$= 135	0.70	0.75	0.7	0.14	0.66
	BP	—	0.19	0.98	0.03	0.13	0.36
	CBPVD	13, 0.16	0.82	0.82	0.82	0.25	0.81
POST	K-means	$k$= 3	0.43	0.71	0.43	−0.002	0.45
	DPC	$dc$= 1	0.53	0.71	0.53	−0.01	0.52
	GB-DPC	$dc$= 2.7	0.61	0.71	0.38	−0.03	0.62
	SNN-DPC	$k$= 60	0.61	0.71	0.61	0.02	0.60
	EC	$\sigma$= 6	0.70	0.72	0.05	0.04	0.74
	BP	—	0.62	0.72	0.09	0.04	0.61
	CBPVD	10, 0.01	0.79	0.79	0.79	0.25	0.78
Spectheart	K-means	$k$= 2	0.64	0.92	0.64	−0.05	0.69
	DPC	$dc$= 1.4142	0.52	0.92	0.48	−0.01	0.65
	GB-DPC	$dc$= 1.1	0.52	0.92	0.08	0	0.92
	SNN-DPC	$k$= 80	0.87	0.92	0.13	0.11	0.87
	EC	$\sigma$= 4	0.92	0.92	0.08	0	0.92
	BP	—	0.91	0.92	0.91	−0.01	0.91
	CBPVD	15, 0.26	0.92	0.92	0.08	0	0.92
Wine	K-means	$k$= 4	0.66	0.70	0.11	0.32	0.54
	DPC	$dc$= 0.5	0.55	0.58	0.43	0.15	0.57
	GB-DPC	$dc$= 5.6	0.60	0.71	0.35	0.27	0.50
	SNN-DPC	$k$= 3	0.62	0.66	0.51	0.34	0.63
	EC	$\sigma$= 250	0.66	0.66	0.66	0.37	0.66
	BP	—	0.68	0.71	0.21	0.34	0.56
	CBPVD	4, 0.03	0.91	0.95	0.75	0.8	0.87
Ionosphere	K-means	$k$= 2	0.71	0.71	0.71	0.18	0.61
	DPC	$dc$= 3.7	0.65	0.65	0.35	0.02	0.73
	GB-DPC	$dc$= 3.7	0.65	0.65	0.35	0.02	0.73
	SNN-DPC	$k$= 34	0.67	0.67	0.67	0.11	0.57
	EC	$\sigma$= 5	0.65	0.67	0	0.05	0.73
	BP	—	0.80	0.80	0.80	0.34	0.76
	CBPVD	6, 0.51	0.83	0.83	0.87	0.42	0.77
WDBC	K-means	$k$= 2	0.74	0.89	0.22	0.54	0.76
	DPC	$dc$= 5	0.67	0.67	0.67	0.10	0.60
	GB-DPC	$dc$= 3.9	0.63	0.63	0.63	0	0.73
	SNN-DPC	$k$= 3	0.81	0.81	0.19	0.36	0.75
	EC	$\sigma$= 350	0.82	0.87	0	0.49	0.78
	BP	—	0.44	0.88	0.12	0.25	0.52
	CBPVD	3, 0.6	0.95	0.95	0.05	0.81	0.91
RNA-seq	K-means	$k$= 5	0.75	0.75	0.17	0.72	0.79
	DPC	$dc$= 159.6	0.70	0.73	0.39	0.62	0.76
	GB-DPC	$dc$= 159.6	0.73	0.73	0.54	0.63	0.77
	SNN-DPC	$k$= 30	0.73	0.73	0.001	0.51	0.71
	EC	$\sigma$= 240	0.38	0.38	0.17	0	0.49
	BP	—	0.78	0.74	0.002	0.63	0.72
	CBPVD	10, 0.4	0.996	0.996	0.81	0.99	0.99
REUTERS	K-means	$k$= 4	0.50	0.58	0.22	0.15	0.41
	DPC	$dc$= 3.5	0.43	0.43	0.28	0.10	0.46
	GB-DPC	$dc$= 3.5	0.35	0.55	0	0.14	0.41
	SNN-DPC	$k$= 40	0.49	0.50	0.49	0.24	0.54
	EC	$\sigma$= 300	0.40	0.40	0.40	0	0.55
	BP	—	0.39	0.41	0.38	0.01	0.50
	CBPVD	20, 0.1	0.61	0.61	0.61	0.23	0.47


Table 5 Performance comparison of algorithms on image datasets

Dataset	Algorithm	Parameter	ACC	Purity	JC	ARI	FMI
Olivetti	K-means	$k$= 40	0.64	0.67	0.01	0.517	0.54
	DPC	$dc$= 0.922	0.59	0.65	0.02	0.523	0.56
	GB-DPC	$dc$= 0.65	0.65	0.73	0.05	0.577	0.59
	SNN-DPC	$k$= 40	0.66	0.74	0	0.585	0.61
	EC	$\sigma$= 3700	0.44	0.58	0.02	0.22	0.32
	BP	—	0.03	0.03	0.03	0	0.15
	CBPVD	4, 0.14	0.75	0.78	0	0.646	0.68
Optical	K-means	$k$= 10	0.71	0.73	0.04	0.58	0.63
	DPC	$dc$= 1.1	0.60	0.62	0.09	0.475	0.56
	GB-DPC	$dc$= 10.5	0.61	0.62	0.02	0.468	0.56
	SNN-DPC	$k$= 10	0.71	0.73	0.20	0.629	0.69
	EC	$\sigma$= 30	0.69	0.69	0.17	0.596	0.67
	BP	—	0.80	0.85	0	0.717	0.75
	CBPVD	4, 0.45	0.93	0.95	0.30	0.889	0.90
You-Tube Faces	K-means	$k$= 41	0.52	0.63	0.02	0.51	0.53
	DPC	$dc$= 6.5	0.53	0.62	0.02	0.48	0.51
	GB-DPC	$dc$= 6.5	0.31	0.31	0	0.25	0.35
	SNN-DPC	$k$= 59	0.57	0.69	0.03	0.47	0.50
	EC	$\sigma$= 100	0.51	0.56	0.01	0.40	0.46
	BP	—	0.52	0.62	0.04	0.19	0.32
	CBPVD	20, 0.1	0.66	0.88	0.01	0.62	0.64


Table 6 The time complexity of algorithms

Algorithm	Time complexity
DBSCAN	$\text{O}(n^2)$
DPC	$\text{O}(n^2)$
GB-DPC	$\text{O}(n\log_2n)$
SNN-DPC	$\text{O}(n^2)$
DPC-RDE	$\text{O}(n^2)$
RA-Clust	$\text{O}(n\sqrt{n})$
EC	$\text{O}(n^2)$
BP	$\text{O}(n^2)$
CBPVD	$\text{O}(n^2)$

