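Abstract (translated from the Chinese):
To address the problem of clustering arbitrarily shaped clusters in spatial data streams, we propose an online density-based clustering algorithm for spatial data streams (On-line density-based clustering algorithm for spatial data stream, OLDStream). The algorithm clusters incremental spatial data on top of the previous clustering result, performing a local clustering update only for each newly added spatial point and those points in its neighborhood that satisfy the core-point condition. This lowers the time complexity of the clustering update and enables online clustering of spatial data streams. OLDStream processes large-scale spatial data streams quickly, provides global clustering results with arbitrarily shaped clusters in real time, is insensitive to the input order of the data stream, and can detect isolated points. Comprehensive experiments on real and synthetic data verify the algorithm's clustering quality, efficiency, and scalability. A statistical analysis of the experimental results shows that only 4% of the spatial points incur the worst-case running time, and that the average clustering time per spatial point is about 0.033 ms.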
Abstract:
We propose an efficient online density-based clustering algorithm for spatial data streams (On-line density-based clustering algorithm for spatial data stream, OLDStream), designed to discover clusters in a spatial data stream online. In OLDStream, a clustering update processes only the new spatial point and those of its neighboring points that satisfy the core-point condition, and the overall clustering result can be accessed instantaneously. The algorithm offers several advantages: high scalability for online processing of large-scale incremental spatial data, the ability to discover arbitrarily shaped clusters over the whole data set instantaneously, insensitivity to the input order of the data stream, and the ability to detect all isolated points. We evaluated the effectiveness, efficiency, and scalability of the algorithm on real data and on large synthetic data sets generated with Matlab and Thomas Brinkhoff's network-based generator. The experimental results demonstrate that the algorithm clusters new points quickly and efficiently on the basis of the previously clustered points. Statistics of the results show that only 4% of the points incur the worst-case running time, and the average running time is about 0.033 ms per point.
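The local-update idea described above can be illustrated with a minimal sketch. This is not the OLDStream implementation, whose details the abstract does not give; it is a generic insertion-only, DBSCAN-style update written under several assumptions: the class name `IncrementalDensityClusterer`, the parameters `eps` and `min_pts`, the brute-force neighbor search, and the union-find bookkeeping are all hypothetical choices made for illustration. It shows only that, when a point arrives, the points whose core status can change are the new point and its neighbors, so merging can be restricted to them.

```python
from math import dist  # Python 3.8+


class IncrementalDensityClusterer:
    """Illustrative insertion-only density clustering (not the paper's code)."""

    def __init__(self, eps, min_pts):
        self.eps = eps          # neighborhood radius (assumed parameter)
        self.min_pts = min_pts  # core threshold, counting the point itself
        self.points = []
        self.neighbors = []     # neighbors[i]: indices within eps of i, incl. i
        self.parent = []        # union-find forest over point indices

    def _find(self, i):
        # Root lookup with path halving.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Keep the smaller index as root so cluster ids are deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)

    def _is_core(self, i):
        return len(self.neighbors[i]) >= self.min_pts

    def insert(self, p):
        """Add point p and update clusters locally; returns the index of p."""
        idx = len(self.points)
        # Brute-force range query; a spatial index would replace this in practice.
        near = [j for j, q in enumerate(self.points) if dist(p, q) <= self.eps]
        self.points.append(p)
        self.neighbors.append(near + [idx])
        self.parent.append(idx)
        for j in near:
            self.neighbors[j].append(idx)
        # Only p and its neighbors can have become core, so only they are
        # revisited: each core point among them is merged with its core neighbors.
        for j in near + [idx]:
            if self._is_core(j):
                for k in self.neighbors[j]:
                    if self._is_core(k):
                        self._union(j, k)
        return idx

    def label(self, i):
        """Cluster id of point i, or -1 if it is currently isolated (noise)."""
        if self._is_core(i):
            return self._find(i)
        for k in self.neighbors[i]:
            if self._is_core(k):  # border point: adopt a core neighbor's cluster
                return self._find(k)
        return -1


clusterer = IncrementalDensityClusterer(eps=1.5, min_pts=3)
stream = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
ids = [clusterer.insert(p) for p in stream]
labels = [clusterer.label(i) for i in ids]
print(labels)
```

Because core points are merged through a union-find structure, the grouping of core points in this sketch does not depend on arrival order; the assignment of border points to one of several adjacent clusters may, which is a simplification of the sketch rather than a statement about OLDStream.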