An Incremental Learning Vector Quantization Algorithm Based on Pattern Density and Classification Error Ratio
Abstract: As a simple and mature classification method, the K-nearest neighbor (KNN) algorithm has been widely applied in fields such as data mining and pattern recognition. However, it faces serious challenges such as heavy computational load, high memory consumption, and intolerable runtime when the processed dataset is large. To address these problems, building on the single-layer competitive learning of the incremental learning vector quantization (ILVQ) network, we propose a new incremental learning vector quantization method that incorporates pattern density and classification error ratio. By adopting a series of new competitive learning strategies, the proposed method quickly obtains an incremental prototype set from the original training set by adaptively learning, inserting, merging, splitting, and deleting representative points, thereby achieving high reduction efficiency on large-scale datasets while guaranteeing classification accuracy. In addition, we improve the classical nearest neighbor classification algorithm by incorporating the pattern density and classification error ratio of the final prototype neighborhood set into the classification decision criterion. The proposed method generates an effective representative prototype set after a single-pass scan of the training dataset, and hence has strong generality. Experimental results show that, compared with its counterpart algorithms, the method not only maintains and even improves classification accuracy and reduction ratio, but also offers rapid prototype acquisition and classification.
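To make the decision criterion concrete, the following is a minimal, hypothetical Python sketch of a single-pass prototype learner in the spirit described above: each prototype tracks a pattern density (how many training samples it absorbed) and a classification error ratio (the fraction of absorbed samples whose label disagreed), and both statistics weight the nearest-prototype decision. The insertion threshold, update rule, and scoring formula are illustrative assumptions; the paper's merge, split, and delete operations are omitted for brevity.

```python
import numpy as np

class DensityErrorNN:
    """Illustrative sketch (not the authors' exact algorithm): a single-pass
    prototype classifier whose decision is weighted by pattern density and
    classification error ratio."""

    def __init__(self, insert_threshold=1.0):
        # Distance beyond which a sample spawns a new prototype (assumed rule).
        self.insert_threshold = insert_threshold
        self.protos = []   # list of (vector, label)
        self.density = []  # samples absorbed per prototype
        self.errors = []   # mislabeled samples absorbed per prototype

    def partial_fit(self, x, y):
        # Single-pass incremental update: absorb the sample into its nearest
        # prototype, or insert a new prototype if it lies too far away.
        x = np.asarray(x, dtype=float)
        if self.protos:
            d = [np.linalg.norm(x - p) for p, _ in self.protos]
            i = int(np.argmin(d))
            if d[i] <= self.insert_threshold:
                p, lbl = self.protos[i]
                self.density[i] += 1
                if lbl != y:
                    self.errors[i] += 1  # accumulate classification errors
                else:
                    # LVQ-style step: move the prototype toward the sample.
                    p += (x - p) / self.density[i]
                return
        self.protos.append((x.copy(), y))
        self.density.append(1)
        self.errors.append(0)

    def predict(self, x):
        # Decision score combines density (reliability), error ratio
        # (trustworthiness), and distance: higher score wins.
        x = np.asarray(x, dtype=float)
        best, best_score = None, -1.0
        for (p, lbl), n, e in zip(self.protos, self.density, self.errors):
            score = n * (1 - e / n) / (1.0 + np.linalg.norm(x - p))
            if score > best_score:
                best, best_score = lbl, score
        return best
```

In this sketch, a dense prototype with few absorbed errors dominates nearby sparse or error-prone prototypes, which mirrors the idea of folding density and error ratio into the neighbor judgment rather than using distance alone.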
Key words:
- Learning vector quantization /
- incremental learning /
- classification error ratio /
- pattern density /
- merge /
- split