基于特征变换和度量网络的小样本学习算法

王多瑞; 杜杨; 董兰芳; 胡卫明; 李兵

doi:10.16383/j.aas.c210903

基于特征变换和度量网络的小样本学习算法

doi: 10.16383/j.aas.c210903

王多瑞^{1, 2,},
杜杨^{2, 3},
董兰芳^1,,
胡卫明^2,,
李兵²

1.
中国科学技术大学合肥 230026
2.
中国科学院自动化研究所模式识别国家重点实验室北京100190
3.
阿里巴巴科技(北京)有限公司北京 100016

基金项目: 国家重点研发计划(2018AAA0102802), 国家自然科学基金(62036011, 62192782, 61721004), 中国科学院前沿科学重点研究计划(QYZDJ-SSW-JSC040)资助

详细信息

作者简介:
王多瑞：2021年获得中国科学技术大学硕士学位. 主要研究方向为小样本学习, 目标检测.E-mail: wangduor@mail.ustc.edu.cn

杜杨：2019年获得中国科学院自动化研究所博士学位. 主要研究方向为行为识别, 医学图像处理.E-mail: jingzhou.dy@alibaba-inc.com

董兰芳：中国科学技术大学副教授. 1994年获得中国科学技术大学硕士学位. 主要研究方向为图像与视频智能分析, 知识图谱与对话系统, 数值模拟与三维重建.E-mail: lfdong@ustc.edu.cn

胡卫明：中国科学院自动化研究所研究员. 1998年获得浙江大学博士学位. 主要研究方向为视觉运动分析, 网络不良信息识别和网络入侵检测. 本文通信作者.E-mail: wmhu@nlpr.ia.ac.cn

李兵：中国科学院自动化研究所研究员. 2009年获得北京交通大学博士学位. 主要研究方向为网络内容安全, 智能图像信号处理.E-mail: bing.li@ia.ac.cn

计量
- 文章访问数: 1108
- HTML全文浏览量: 585
- PDF下载量: 290
- 被引次数: 0
出版历程
- 收稿日期: 2021-09-20
- 录用日期: 2021-12-11
- 网络出版日期: 2023-09-11
- 刊出日期: 2024-07-23

Feature Transformation and Metric Networks for Few-shot Learning

WANG Duo-Rui^{1, 2
,},
DU Yang^{2, 3},
DONG Lan-Fang^1
,,
HU Wei-Ming^2
,,
LI Bing²

1.
University of Science and Technology of China, Hefei 230026
2.
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
3.
Alibaba (Beijing) Group, Beijing 100016

Funds: Supported by National Key Research and Development Program of China (2018AAA0102802), National Natural Science Foundation of China (62036011, 62192782, 61721004), and Key Research Program of Frontier Sciences of Chinese Academy of Sciences (QYZDJ-SSW-JSC040)

More Information

Author Bio:
WANG Duo-Rui　He received his master degree from University of Science and Technology of China in 2021. His research interest covers few-shot learning and object detection

DU Yang　He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences. His research interest covers action recognition and medical image processing

DONG Lan-Fang　Associate professor at University of Science and Technology of China. She received her master degree from University of Science and Technology of China in 1994. Her research interest covers image and video intelligent analysis, knowledge mapping and dialogue systems, and numerical simulation and 3D reconstruction

HU Wei-Ming　Professor at the Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree from Zhejiang University in 1998. His research interest covers visual motion analysis, recognition of web objectionable information, and network intrusion detection. Corresponding author of this paper

LI Bing　Professor at the Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree from Beijing Jiaotong University in 2009. His research interest covers the web content security and intelligent image signal process

摘要

摘要: 在小样本分类任务中, 每个类别可供训练的样本数量非常有限. 因此在特征空间中同类样本分布稀疏, 异类样本间边界模糊. 提出一种新的基于特征变换和度量网络(Feature transformation and metric networks, FTMN)的小样本学习算法用于小样本分类任务. 算法通过嵌入函数将样本映射到特征空间, 并计算输入该样本与所属类别中心的特征残差. 构造一个特征变换函数对该残差进行学习, 使特征空间内的样本特征经过该函数后向同类样本中心靠拢. 利用变换后的样本特征更新类别中心, 使各类别中心间的距离增大. 算法进一步构造了一种新的度量函数, 对样本特征中每个局部特征点的度量距离进行联合表达, 该函数能够同时对样本特征间的夹角和欧氏距离进行优化. 算法在小样本分类任务常用数据集上的优秀表现证明了算法的有效性和泛化性.
- 特征变换 /
- 度量学习 /
- 小样本学习 /
- 残差学习
Abstract: For few-shot classification, training samples for each class are highly limited. Consequently, samples from the same class tend to distribute sparsely while boundaries between different classes are indistinct in the feature space. Therefore, a novel few-shot learning algorithm based on feature transformation and metric networks (FTMN) is proposed for few-shot learning. The algorithm maps samples to the feature space through an embedding function and calculates the residual between the input features and their class center. A feature transformation function is then constructed to learn from the residual, enabling input features to move closer to their class center after transformation. The transformed features are used to update the class centers, increasing the distance between centers of different classes. Furthermore, the algorithm introduces a novel metric function that jointly expresses the metric distances of each point within the features. The metric function simultaneously optimizes both cosine similarity and Euclidean distance. The performance of the algorithm on commonly used datasets for few-shot classification validates its effectiveness and generalization ability.
- Feature transformation /
- metric learning /
- few-shot learning /
- residual learning

HTML全文

图 1 特征变换和度量网络模型

Fig. 1 Model of feature transformation and metric networks

下载: 全尺寸图片幻灯片

图 2 网络中关键函数的结构

Fig. 2 Structure of important functions of networks

下载: 全尺寸图片幻灯片

表 1 网络模型的嵌入函数与重要结构

Table 1 Embedding function and important structures of networks

模型名称	嵌入函数	重要结构
MN	4层卷积网络	注意力长短时记忆网络
ProtoNet^[12]	4层卷积网络	“原型”概念、使用欧氏距离进行度量
RN	4层卷积网络	卷积神经网络作为度量函数
EGNN	4层卷积网络	边标签预测节点类别
EGNN + Transduction^[22]	ResNet-12	边标签预测节点类别、转导和标签传递
DN4^[24]	ResNet-12	局部描述子、图像与类别间的相似性度量
DC^[25]	4层卷积网络	稠密分类
DC + IMP^[25]	4层卷积网络	稠密分类、神经网络迁移
FTMN	4层卷积网络	特征变换模块、特征度量模块
FTMN-R12	ResNet-12	特征变换模块、特征度量模块

下载: 导出CSV

表 2 在Omniglot数据集上的小样本分类性能(%)

Table 2 Few-shot classification performance on Omniglot dataset (%)

模型	5-类		20-类
模型	1-样本	5-样本	1-样本	5-样本
MN	98.1	98.9	93.8	98.5
ProtoNet^[12]	98.8	99.7	96.0	98.9
SN	97.3	98.4	88.2	97.0
RN	99.6 ± 0.2	99.8 ± 0.1	97.6 ± 0.2	99.1 ± 0.1
SM^[15]	98.4	99.6	95.0	98.6
MetaNet^[16]	98.95	—	97.00	—
MANN^[17]	82.8	94.9	—	—
MAML^[18]	98.7 ± 0.4	99.9 ± 0.1	95.8 ± 0.3	98.9 ± 0.2
MMNet^[26]	99.28 ± 0.08	99.77 ± 0.04	97.16 ± 0.10	98.93 ± 0.05
FTMN	99.7 ± 0.1	99.9 ± 0.1	98.3 ± 0.1	99.5 ± 0.1

下载: 导出CSV

表 3 在miniImageNet数据集上的小样本分类性能 (%)

Table 3 Few-shot classification performance on miniImageNet dataset (%)

模型	5-类
模型	1-样本	5-样本
MN	43.40 ± 0.78	51.09 ± 0.71
ML-LSTM^[11]	43.56 ± 0.84	55.31 ± 0.73
ProtoNet^[12]	49.42 ± 0.78	68.20 ± 0.66
RN	50.44 ± 0.82	65.32 ± 0.70
MetaNet^[16]	49.21 ± 0.96	—
MAML^[18]	48.70 ± 1.84	63.11 ± 0.92
EGNN	—	66.85
EGNN + Transduction^[22]	—	76.37
DN4^[24]	51.24 ± 0.74	71.02 ± 0.64
DC^[25]	62.53 ± 0.19	78.95 ± 0.13
DC + IMP^[25]	—	79.77 ± 0.19
MMNet^[26]	53.37 ± 0.08	66.97 ± 0.09
PredictNet^[27]	54.53 ± 0.40	67.87 ± 0.20
DynamicNet^[28]	56.20 ± 0.86	72.81 ± 0.62
MN-FCE^[29]	43.44 ± 0.77	60.60 ± 0.71
MetaOptNet^[30]	60.64 ± 0.61	78.63 ± 0.46
FTMN	59.86 ± 0.91	75.96 ± 0.82
FTMN-R12	61.33 ± 0.21	79.59 ± 0.47

下载: 导出CSV

表 4 在CUB-200、CIFAR-FS和tieredImageNet数据集上的小样本分类性能(%)

Table 4 Few-shot classification performance on CUB-200, CIFAR-FS and tieredImageNet datasets (%)

模型	CUB-200 5-类		CIFAR-FS 5-类		tieredImageNet 5-类
模型	1-样本	5-样本	1-样本	5-样本	1-样本	5-样本
MN	61.16 ± 0.89	72.86 ± 0.70	—	—	—	—
ProtoNet^[12]	51.31 ± 0.91	70.77 ± 0.69	55.5 ± 0.7	72.0 ± 0.6	53.31 ± 0.89	72.69 ± 0.74
RN	62.45 ± 0.98	76.11 ± 0.69	55.0 ± 1.0	69.3 ± 0.8	54.48 ± 0.93	71.32 ± 0.78
MAML^[18]	55.92 ± 0.95	72.09 ± 0.76	58.9 ± 1.9	71.5 ± 1.0	51.67 ± 1.81	70.30 ± 1.75
EGNN	—	—	—	—	63.52 ± 0.52	80.24 ± 0.49
DN4^[24]	53.15 ± 0.84	81.90 ± 0.60	—	—	—	—
MetaOptNet^[30]	—	—	72.0 ± 0.7	84.2 ± 0.5	65.99 ± 0.72	81.56 ± 0.53
FTMN-R12	69.58 ± 0.36	85.46 ± 0.79	70.3 ± 0.5	82.6 ± 0.3	62.14 ± 0.63	81.74 ± 0.33

下载: 导出CSV

表 5 消融实验结果 (%)

Table 5 Results of ablation study (%)

模型	5-类
模型	1-样本	5-样本
ProtoNet-4C	49.42 ± 0.78	68.20 ± 0.66
ProtoNet-8C	51.18 ± 0.73	70.23 ± 0.46
ProtoNet-Trans-4C	53.47 ± 0.46	71.33 ± 0.23
ProtoNet-M-4C	56.54 ± 0.57	73.46 ± 0.53
ProtoNet-VLAD-4C	52.46 ± 0.67	70.83 ± 0.62
Trans*-M-4C	59.86 ± 0.91	67.86 ± 0.56
仅使用余弦相似度	54.62 ± 0.57	72.58 ± 0.38
仅使用欧氏距离	55.66 ± 0.67	73.34 ± 0.74
FTMN	59.86 ± 0.91	75.96 ± 0.82

下载: 导出CSV

参考文献(30)

[1]	Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1−9
[2]	He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770−778
[3]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, USA: NIPS, 2012. 1106−1114
[4]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR, 2015.
[5]	刘颖, 雷研博, 范九伦, 王富平, 公衍超, 田奇. 基于小样本学习的图像分类技术综述. 自动化学报, 2021, 47(2): 297−315 doi: 10.16383/j.aas.c190720 Liu Ying, Lei Yan-Bo, Fan Jiu-Lun, Wang Fu-Ping, Gong Yan-Chao, Tian Qi. Survey on image classification technology based on small sample learning. Acta Automatica Sinica, 2021, 47(2): 297−315 doi: 10.16383/j.aas.c190720
[6]	Miller E G, Matsakis N E, Viola P A. Learning from one example through shared densities on transforms. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hilton Head Island, USA: IEEE, 2000. 464−471
[7]	Li F F, Fergus R, Perona P. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594−611
[8]	Lake B M, Salakhutdinov R, Gross J, Tenenbaum J B. One shot learning of simple visual concepts. In: Proceedings of the 33rd Annual Meeting of the Cognitive Science Society. Boston, USA: CogSci, 2011. 2568−2573
[9]	Lake B M, Salakhutdinov R, Tenenbaum J B. Human-level concept learning through probabilistic program induction. Science, 2015, 350(11): 1332−1338
[10]	Edwards H, Storkey A J. Towards a neural statistician. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017.
[11]	Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: 2016. 3637−3645
[12]	Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. In: Proceedings of the 31th International Conference on Neural Information Processing Systems. Long Beach, USA: 2017. 4080−4090
[13]	Koch G, Zemel R, Salakhutdinov R. Siamese neural networks for one-shot image recognition. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR, 2015.
[14]	Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S, Hospedales T M. Learning to compare: Relation network for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1199−1208
[15]	Kaiser L, Nachum O, Roy A, Bengio S. Learning to remember rare events. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017.
[16]	Munkhdalai T, Yu H. Meta networks. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org, 2017. 2554−2563
[17]	Santoro A, Bartunov S, Botvinick M, Wierstra D, Lillicrap T. Meta-learning with memory-augmented neural networks. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: PMLR, 2016. 1842−1850
[18]	Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org, 2017. 1126−1135
[19]	Arandjelovic R, Gronat P, Torii A, Pajdla T, Sivic J. Net-VLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 5297−5307
[20]	Jégou H, Douze M, Schmid C, Pérez P. Aggregating local descriptors into a compact image representation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE, 2010. 3304−3311
[21]	Bertinetto L, Henriques J F, Torr P H, Vedaldi A. Meta-learning with differentiable closed-form solvers. In: Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: ICLR, 2019.
[22]	Kim J, Kim T, Kim S, Yoo C D. Edge-labeling graph neural network for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 11−20
[23]	Yue Z Q, Zhang H W, Sun Q R, Hua X S. Interventional few-shot learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Incorporated, 2020. Article No. 230
[24]	Li W B, Wang L, Xu J L, Huo J, Gao Y, Luo J B. Revisiting local descriptor based image-to-class measure for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 7253−7260
[25]	Lifchitz Y, Avrithis Y, Picard S, Bursuc A. Dense classification and implanting for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 9250−9259
[26]	Cai Q, Pan Y W, Yao T, Yan C G, Mei T. Memory matching networks for one-shot image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4080−4088
[27]	Qiao S Y, Liu C X, Shen W, Yuille A L. Few-shot image recognition by predicting parameters from activations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 7229−7238
[28]	Gidaris S, Komodakis N. Dynamic few-shot visual learning without forgetting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4367−4375
[29]	Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017.
[30]	Lee K, Maji S, Ravichandran A, Soatto S. Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/ CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 10649−10657