An Incremental Learning Vector Quantization Algorithm Based on Pattern Density and Classification Error Ratio
Abstract: As a simple and mature classification method, the K-nearest neighbor (KNN) algorithm has been widely applied in fields such as data mining and pattern recognition. However, it faces serious challenges such as heavy computational load, high memory consumption, and intolerable runtime when the processed dataset is large. To address these problems, building on the single-layer competitive learning of the incremental learning vector quantization (ILVQ) network, we propose a new incremental learning vector quantization method that incorporates pattern density and classification error ratio. By adopting a series of new competitive learning strategies, the proposed method quickly obtains an incremental prototype set from the original training set by adaptively learning, inserting, merging, splitting, and deleting representative points, thereby achieving high reduction efficiency on large-scale datasets while guaranteeing classification accuracy. In addition, we improve the classical nearest neighbor classification algorithm by incorporating the pattern density and classification error ratio of the final prototype neighborhood set into the classification decision criterion. The proposed method generates an effective representative prototype set after a single-pass scan of the training dataset, and hence has strong generality. Experimental results show that, compared with its counterpart algorithms, the method not only maintains and even improves classification accuracy and reduction ratio, but also offers rapid prototype acquisition and classification.
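To make the decision criterion concrete, the following is a minimal, hypothetical Python sketch of a single-pass prototype learner in the spirit described above: each prototype tracks a pattern density (how many training samples it absorbed) and a classification error ratio (the fraction of absorbed samples whose label disagreed), and both statistics weight the nearest-prototype decision. The insertion threshold, update rule, and scoring formula are illustrative assumptions; the paper's merge, split, and delete operations are omitted for brevity.

```python
import numpy as np

class DensityErrorNN:
    """Illustrative sketch (not the authors' exact algorithm): a single-pass
    prototype classifier whose decision is weighted by pattern density and
    classification error ratio."""

    def __init__(self, insert_threshold=1.0):
        # Distance beyond which a sample spawns a new prototype (assumed rule).
        self.insert_threshold = insert_threshold
        self.protos = []   # list of (vector, label)
        self.density = []  # samples absorbed per prototype
        self.errors = []   # mislabeled samples absorbed per prototype

    def partial_fit(self, x, y):
        # Single-pass incremental update: absorb the sample into its nearest
        # prototype, or insert a new prototype if it lies too far away.
        x = np.asarray(x, dtype=float)
        if self.protos:
            d = [np.linalg.norm(x - p) for p, _ in self.protos]
            i = int(np.argmin(d))
            if d[i] <= self.insert_threshold:
                p, lbl = self.protos[i]
                self.density[i] += 1
                if lbl != y:
                    self.errors[i] += 1  # accumulate classification errors
                else:
                    # LVQ-style step: move the prototype toward the sample.
                    p += (x - p) / self.density[i]
                return
        self.protos.append((x.copy(), y))
        self.density.append(1)
        self.errors.append(0)

    def predict(self, x):
        # Decision score combines density (reliability), error ratio
        # (trustworthiness), and distance: higher score wins.
        x = np.asarray(x, dtype=float)
        best, best_score = None, -1.0
        for (p, lbl), n, e in zip(self.protos, self.density, self.errors):
            score = n * (1 - e / n) / (1.0 + np.linalg.norm(x - p))
            if score > best_score:
                best, best_score = lbl, score
        return best
```

In this sketch, a dense prototype with few absorbed errors dominates nearby sparse or error-prone prototypes, which mirrors the idea of folding density and error ratio into the neighbor judgment rather than using distance alone.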
Key words:
- Learning vector quantization /
- incremental learning /
- classification error ratio /
- pattern density /
- merge /
- split