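Abstract:
The result set of a skyline query consists of all objects in a dataset that are not ``dominated'' by any other object. In recent years, its promising applications in online services, decision support, and real-time monitoring have made the skyline query a research hotspot in data management and data mining. In practice, users usually expect to obtain skyline results quickly and progressively. Meanwhile, the continuous, massive, and high-dimensional nature of stream data makes mining a sparse skyline set, with a controlled loss of query quality, a valuable and highly challenging problem. This paper first proposes a novel concept, the sparse skyline (Sparse-skyline), which uses one skyline object to represent all skyline objects in its ε-neighborhood. It then presents two online algorithms that adaptively adjust query quality according to the correlation between data dimensions. Finally, theoretical analysis and experimental results show that, compared with existing skyline mining algorithms, the proposed methods achieve good performance and efficiency and are better suited to data stream applications.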
Abstract:
The result set of a skyline query consists of the objects in a dataset that are not ``dominated'' by any other object. In recent years, skyline queries have become a hot research topic because of their potential applications in online services, decision-making, and real-time monitoring. In real applications, users usually want to obtain the skyline set quickly and progressively. However, because stream data are continuous, large in volume, and high-dimensional, mining the sparse skyline set over a data stream with a controlled loss of query quality is a more meaningful and challenging problem. In this paper, we first propose a novel concept, called the sparse skyline, in which a single skyline object represents its neighboring skyline objects within distance ε (an acceptable difference). Then, we develop two algorithms that use the correlation coefficient between data dimensions to adaptively adjust the quality of the sparse skyline query. Finally, theoretical analysis and experimental results show that the proposed methods are more efficient and effective than existing skyline computing algorithms and are well suited to data stream applications.
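To make the dominance relation and the idea of an ε-representative concrete, here is a minimal sketch on a small static point set. It is not the paper's algorithm: it assumes smaller values are better in every dimension, measures "ε-distance" with Euclidean distance, and picks representatives with a hypothetical greedy rule (`sparse_skyline`). It ignores both streaming and the correlation-based quality adaptation described above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension.
    a dominates b if a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.
    (A point never dominates itself, so no self-check is needed.)"""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def sparse_skyline(points: List[Point], eps: float) -> List[Point]:
    """Hypothetical greedy sparsification: scan skyline points in sorted order
    and keep a point as a representative only if it is farther than eps
    (Euclidean) from every representative chosen so far. Every skipped
    skyline point therefore lies within eps of some representative."""
    reps: List[Point] = []
    for p in sorted(skyline(points)):
        if all(math.dist(p, r) > eps for r in reps):
            reps.append(p)
    return reps

pts = [(1, 9), (1.2, 8.7), (3, 5), (5, 3), (5.1, 2.9), (9, 1), (6, 6), (8, 8)]
print(skyline(pts))              # six points; (6, 6) and (8, 8) are dominated
print(sparse_skyline(pts, 0.5))  # [(1, 9), (3, 5), (5, 3), (9, 1)]
```

With ε = 0.5, the near-duplicates (1.2, 8.7) and (5.1, 2.9) are absorbed by the representatives (1, 9) and (5, 3), so the user sees four objects instead of six. A larger ε yields a sparser answer at the cost of more lost detail.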