Discovering Maximal Frequent Itemsequences Based on Suboperators of Itemsequence Sets and Data Partitioning
Abstract: Discovering frequent itemsets or itemsequences is an important phase in mining association rules. This paper presents two new algorithms for discovering maximal frequent itemsequences, Dfis and Dfisp. Dfis is based on the theory of suboperators of itemsequence sets and requires only one pass over the database. Dfisp improves on Dfis by introducing data partitioning, which improves memory utilization and thereby strengthens its ability to handle large databases; it requires two passes over the database. Experimental results demonstrate the performance and advantages of both algorithms and show that, with a suitable number of data partitions, Dfisp keeps memory usage within an acceptable range.
Key words:
- data mining
- association rules
- itemsequences
- suboperators