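Summary:
Most image classification algorithms need a large number of training samples to train the classifier model. In practice, labeling large numbers of samples is tedious and time-consuming. For some special images, such as synthetic aperture radar (SAR) images, interpreting the content is very difficult, so the number of labeled samples that can be obtained is very limited. This paper introduces active learning based on the best and second-best labels (best vs second-best, BvSB) and constrained self-training (CST) into an image classification algorithm built on a support vector machine (SVM) classifier, and proposes a new image classification method. BvSB active learning mines the samples most valuable to the current classifier model for manual labeling, and CST semi-supervised learning further exploits the large number of unlabeled samples in the sample set, so that good classification performance is obtained at a small labeling cost. The new method is compared with random sample selection, entropy-based uncertainty sampling active learning, and BvSB active learning. Experimental results on classification problems for 3 optical image sets and 1 SAR image set show that the new method effectively reduces the number of manually labeled samples needed for classifier training and achieves higher accuracy and better robustness.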
Abstract:
Most image classification methods require adequate labeled training samples to train classifier models. In real-world applications, labeling samples is often time-consuming and expensive, especially for some special images, e.g. synthetic aperture radar (SAR) images, so the number of labeled samples is usually limited. In this study, we propose a novel image classification method based on SVMs that incorporates best vs second-best (BvSB) active learning and constrained self-training (CST). In this method, BvSB active learning is used to find the examples that are most valuable to the current classifier model and submit them for manual labeling, while CST is used to exploit useful information from the examples that remain in the unlabeled dataset. With this new method, satisfactory classification performance can be achieved while the human labeling load stays low. We demonstrate results on 3 optical image datasets and a SAR image dataset. To reach similar classification accuracy, the proposed method needs substantially fewer human-labeled samples than random selection, entropy-based active learning, and BvSB active learning, and it has little computational overhead and good robustness.
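As a rough illustration of the BvSB selection step mentioned above, the following Python sketch ranks unlabeled samples by the gap between their highest and second-highest estimated class probabilities and returns the most ambiguous ones. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names `bvsb_margin` and `select_bvsb`, the toy probability matrix, and the assumption that per-class probability estimates are already available from the classifier are all illustrative. The CST step is not shown, since its constraints are not described here.

```python
import numpy as np

def bvsb_margin(probs):
    """Return p(best class) - p(second-best class) for each sample (row)."""
    probs = np.asarray(probs, dtype=float)
    # Sort each row in ascending order; the last two columns hold the
    # two largest class probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def select_bvsb(probs, k):
    """Return indices of the k samples with the smallest BvSB margin."""
    margins = bvsb_margin(probs)
    # A small margin means the classifier can barely separate its two
    # top candidate labels, so the sample is a good labeling query.
    return np.argsort(margins, kind="stable")[:k]

# Hypothetical class-probability estimates for four unlabeled samples
# (assumed to come from the current classifier; values are made up).
probs = np.array([
    [0.50, 0.45, 0.05],
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
    [0.60, 0.20, 0.20],
])

queries = select_bvsb(probs, k=2)
print(queries)
```

In an active learning loop, the returned indices would be sent for manual labeling, the labeled samples added to the training set, and the classifier retrained before the next round of selection.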