Abstract: In few-shot classification, the number of training samples per class is very limited: samples of the same class are sparsely distributed in the feature space, and the boundaries between different classes are blurred. This paper proposes a new feature-transformation network that handles few-shot classification with a metric-based approach. The algorithm maps samples into the feature space through an embedding function and computes the feature residual between each input sample and its class center; a feature transformation function learns the residual between class centers and same-class samples, pulling samples toward their class centers in the feature space, and then updates the positions of the class centers so that the distances between them increase. Furthermore, cosine similarity and Euclidean distance are fused into a new metric, and a metric function is designed to jointly express the metric distance at each local position of the feature map, so that the network optimizes both the angle and the Euclidean distance between sample features during training. Results on datasets commonly used for few-shot classification show that the model achieves excellent performance and generalizes well.
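As a rough illustration of the residual idea described above, here is a minimal NumPy sketch. The linear pull with a fixed step `alpha` is our assumption standing in for the paper's learned feature transformation function; function and variable names are illustrative, not the paper's.

```python
import numpy as np

def transform_step(features, labels, alpha=0.5):
    # Pull each embedded sample toward its class center along the
    # center-to-sample residual, then recompute the centers.
    # The linear update with step `alpha` is an illustrative stand-in
    # for the learned transformation function.
    classes = np.unique(labels)
    centers = {c: features[labels == c].mean(axis=0) for c in classes}
    moved = features.copy()
    for i, lab in enumerate(labels):
        residual = centers[lab] - features[i]      # residual to the class center
        moved[i] = features[i] + alpha * residual  # move toward the center
    # class centers recomputed from the transformed features
    new_centers = {c: moved[labels == c].mean(axis=0) for c in classes}
    return moved, new_centers
```

Note that a uniform pull of every sample toward its own class mean leaves the mean itself unchanged while shrinking the within-class scatter, which matches the stated goal of making same-class samples less sparse in the feature space.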
Key words:
- feature transformation /
- metric based /
- few-shot learning /
- residual learning
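The fused metric from the abstract can be sketched as follows, assuming one concrete weighting: the per-position squared Euclidean distance weighted by (1 − cosine similarity), summed jointly over all local positions of the feature map. The exact weighting used by the paper may differ; this is a sketch of the general idea.

```python
import numpy as np

def fused_metric(query, center, eps=1e-8):
    # query, center: (H, W, C) feature maps of a query sample and a class center.
    q = query.reshape(-1, query.shape[-1])   # (H*W, C) local descriptors
    c = center.reshape(-1, center.shape[-1])
    # per-position cosine similarity between local descriptors
    cos = np.sum(q * c, axis=1) / (
        np.linalg.norm(q, axis=1) * np.linalg.norm(c, axis=1) + eps)
    # per-position squared Euclidean distance
    eu = np.sum((q - c) ** 2, axis=1)
    # (1 - cos) acts as a weight on the Euclidean term, so both the angle
    # and the distance between features shrink as the metric is minimized
    return float(np.sum((1.0 - cos) * eu))
```

Because the angular and Euclidean terms multiply inside one sum, a gradient step on this quantity adjusts both at once, which is the joint-optimization property the abstract claims for the metric function.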
Table 1 Embedding functions and important structures of the network models
Table 2 Comparison of mean accuracy (%) with existing algorithms on the Omniglot dataset

| Model | 5-way 1-shot | 5-way 5-shot | 20-way 1-shot | 20-way 5-shot |
| --- | --- | --- | --- | --- |
| MN[11] | 98.1 | 98.9 | 93.8 | 98.5 |
| ProtoNet[12] | 98.8 | 99.7 | 96.0 | 98.9 |
| SN[13] | 97.3 | 98.4 | 88.2 | 97.0 |
| RN[14] | 99.6 ± 0.2 | 99.8 ± 0.1 | 97.6 ± 0.2 | 99.1 ± 0.1 |
| SM[15] | 98.4 | 99.6 | 95.0 | 98.6 |
| MetaNet[16] | 98.95 | — | 97.0 | — |
| MANN[17] | 82.8 | 94.9 | — | — |
| MAML[18] | 98.7 ± 0.4 | 99.9 ± 0.1 | 95.8 ± 0.3 | 98.9 ± 0.2 |
| MMNet[26] | 99.28 ± 0.08 | 99.77 ± 0.04 | 97.16 ± 0.1 | 98.93 ± 0.05 |
| MBFTN | 99.7 ± 0.1 | 99.9 ± 0.1 | 98.3 ± 0.1 | 99.5 ± 0.1 |
Table 3 Comparison of mean accuracy (%) with existing algorithms on the miniImageNet dataset
| Model | 5-way 1-shot | 5-way 5-shot |
| --- | --- | --- |
| MN[11] | 43.4 ± 0.78 | 51.09 ± 0.71 |
| ML-LSTM[11] | 43.56 ± 0.84 | 55.31 ± 0.73 |
| ProtoNet[12] | 49.42 ± 0.78 | 68.20 ± 0.66 |
| RN[14] | 50.44 ± 0.82 | 65.32 ± 0.70 |
| MetaNet[16] | 49.21 ± 0.96 | — |
| MAML[18] | 48.70 ± 1.84 | 63.11 ± 0.92 |
| EGNN[22] | — | 66.85 |
| EGNN+Transduction[22] | — | 76.37 |
| DN4[24] | 51.24 ± 0.74 | 71.02 ± 0.64 |
| DC[25] | 62.53 ± 0.19 | 78.95 ± 0.13 |
| DC+IMP[25] | — | 79.77 ± 0.19 |
| MMNet[26] | 53.37 ± 0.08 | 66.97 ± 0.09 |
| PredictNet[27] | 54.53 ± 0.40 | 67.87 ± 0.20 |
| DynamicNet[28] | 56.20 ± 0.86 | 72.81 ± 0.62 |
| MN-FCE[29] | 43.44 ± 0.77 | 60.60 ± 0.71 |
| MetaOptNet[30] | 60.64 ± 0.61 | 78.63 ± 0.46 |
| MBFTN | 59.86 ± 0.91 | 75.96 ± 0.82 |
| MBFTN-R12 | 61.33 ± 0.21 | 79.59 ± 0.47 |
Table 4 Comparison of mean accuracy (%) with existing algorithms on the CUB-200, CIFAR-FS, and tieredImageNet datasets
| Model | CUB-200 1-shot | CUB-200 5-shot | CIFAR-FS 1-shot | CIFAR-FS 5-shot | tieredImageNet 1-shot | tieredImageNet 5-shot |
| --- | --- | --- | --- | --- | --- | --- |
| MN[11] | 61.16 ± 0.89 | 72.86 ± 0.70 | — | — | — | — |
| ProtoNet[12] | 51.31 ± 0.91 | 70.77 ± 0.69 | 55.5 ± 0.7 | 72.0 ± 0.6 | 53.31 ± 0.89 | 72.69 ± 0.74 |
| RN[14] | 62.45 ± 0.98 | 76.11 ± 0.69 | 55.0 ± 1.0 | 69.3 ± 0.8 | 54.48 ± 0.93 | 71.32 ± 0.78 |
| MAML[18] | 55.92 ± 0.95 | 72.09 ± 0.76 | 58.9 ± 1.9 | 71.5 ± 1.0 | 51.67 ± 1.81 | 70.30 ± 1.75 |
| EGNN[22] | — | — | — | — | 63.52 ± 0.52 | 80.24 ± 0.49 |
| DN4[24] | 53.15 ± 0.84 | 81.90 ± 0.60 | — | — | — | — |
| MetaOptNet[30] | — | — | 72.0 ± 0.7 | 84.2 ± 0.5 | 65.99 ± 0.72 | 81.56 ± 0.53 |
| MBFTN-R12 | 69.58 ± 0.36 | 85.46 ± 0.79 | 70.3 ± 0.5 | 82.6 ± 0.3 | 62.14 ± 0.63 | 81.74 ± 0.33 |

All results are 5-way classification.
Table 5 Ablation study of our model
| Model | 5-way 1-shot | 5-way 5-shot |
| --- | --- | --- |
| ProtoNet-4C | 49.42 ± 0.78 | 68.20 ± 0.66 |
| ProtoNet-8C | 51.18 ± 0.73 | 70.23 ± 0.46 |
| ProtoNet-Trans-4C | 53.47 ± 0.46 | 71.33 ± 0.23 |
| ProtoNet-M-4C | 56.54 ± 0.57 | 73.46 ± 0.53 |
| ProtoNet-VLAD-4C | 52.46 ± 0.67 | 70.83 ± 0.62 |
| Trans*-M-4C | 59.86 ± 0.91 | 67.86 ± 0.56 |
| Cosine similarity only | 54.62 ± 0.57 | 72.58 ± 0.38 |
| Euclidean distance only | 55.66 ± 0.67 | 73.34 ± 0.74 |
| ProtoNet-Trans-M-4C | 59.86 ± 0.91 | 75.96 ± 0.82 |
[1] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 1-9
[2] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 770-778
[3] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS). Lake Tahoe, USA: NIPS, 2012. 1106-1114
[4] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR, 2015.
[5] Liu Ying, Lei Yan-Bo, Fan Jiu-Lun, Wang Fu-Ping, Gong Yan-Chao, Tian Qi. Survey on image classification technology based on small sample learning. Acta Automatica Sinica, 2021, 47(2): 297-315. doi: 10.16383/j.aas.c190720
[6] Miller E G, Matsakis N E, Viola P A. Learning from one example through shared densities on transforms. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Hilton Head Island, USA: IEEE, 2000. 464-471
[7] Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594-611
[8] Lake B M, Salakhutdinov R, Gross J, Tenenbaum J B. One shot learning of simple visual concepts. In: Proceedings of the 33rd Annual Meeting of the Cognitive Science Society (CogSci). Boston, USA: CogSci, 2011. 2568-2573
[9] Lake B M, Salakhutdinov R, Tenenbaum J B. Human-level concept learning through probabilistic program induction. Science, 2015, 350(6266): 1332-1338
[10] Edwards H, Storkey A J. Towards a neural statistician. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017.
[11] Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc., 2016. 3637-3645
[12] Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc., 2017. 4080-4090
[13] Koch G, Zemel R, Salakhutdinov R. Siamese neural networks for one-shot image recognition. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR, 2015.
[14] Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S, Hospedales T M. Learning to compare: Relation network for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1199-1208
[15] Kaiser Ł, Nachum O, Roy A, Bengio S. Learning to remember rare events. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017.
[16] Munkhdalai T, Yu H. Meta networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. Sydney, Australia: JMLR.org, 2017. 2554-2563
[17] Santoro A, Bartunov S, Botvinick M, Wierstra D, Lillicrap T. Meta-learning with memory-augmented neural networks. In: Proceedings of the 33rd International Conference on Machine Learning. New York City, USA: PMLR, 2016. 1842-1850
[18] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. Sydney, Australia: JMLR.org, 2017. 1126-1135
[19] Arandjelovic R, Gronat P, Torii A, Pajdla T, Sivic J. NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 5297-5307
[20] Jégou H, Douze M, Schmid C, Pérez P. Aggregating local descriptors into a compact image representation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE, 2010. 3304-3311
[21] Bertinetto L, Henriques J F, Torr P H, Vedaldi A. Meta-learning with differentiable closed-form solvers. In: Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: ICLR, 2019.
[22] Kim J, Kim T, Kim S, Yoo C D. Edge-labeling graph neural network for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 11-20
[23] Yue Z Q, Zhang H W, Sun Q R, Hua X S. Interventional few-shot learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2020. Article No. 230
[24] Li W B, Wang L, Xu J L, Huo J, Gao Y, Luo J B. Revisiting local descriptor based image-to-class measure for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 7253-7260
[25] Lifchitz Y, Avrithis Y, Picard S, Bursuc A. Dense classification and implanting for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 9250-9259
[26] Cai Q, Pan Y W, Yao T, Yan C G, Mei T. Memory matching networks for one-shot image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4080-4088
[27] Qiao S Y, Liu C X, Shen W, Yuille A L. Few-shot image recognition by predicting parameters from activations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 7229-7238
[28] Gidaris S, Komodakis N. Dynamic few-shot visual learning without forgetting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4367-4375
[29] Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017.
[30] Lee K, Maji S, Ravichandran A, Soatto S. Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 10649-10657