Topology-guided Adversarial Deep Mutual Learning for Knowledge Distillation

Lai Xuan, Qu Yan-Yun, Xie Yuan, Pei Yu-Long

Citation: Lai Xuan, Qu Yan-Yun, Xie Yuan, Pei Yu-Long. Topology-guided adversarial deep mutual learning for knowledge distillation. Acta Automatica Sinica, 2023, 49(1): 102−110. doi: 10.16383/j.aas.c200665

doi: 10.16383/j.aas.c200665
Funds: Supported by National Natural Science Foundation of China (61876161, 61772524, 61671397, U1065252, 61772440) and the Shanghai Artificial Intelligence Science and Technology Support Project of the Shanghai Science and Technology Commission (21511100700)
More Information
    Author Bio:

    LAI Xuan  Master student at the School of Informatics, Xiamen University. His research interest covers computer vision and image processing. E-mail: laixuan@stu.xmu.edu.cn

    QU Yan-Yun  Professor at the School of Informatics, Xiamen University. Her research interest covers pattern recognition, computer vision, and machine learning. Corresponding author of this paper. E-mail: yyqu@xmu.edu.cn

    XIE Yuan  Professor at the School of Computer Science and Technology, East China Normal University. His research interest covers pattern recognition, computer vision, and machine learning. E-mail: yxie@cs.ecnu.edu.cn

    PEI Yu-Long  Master student at the School of Informatics, Xiamen University. His research interest covers computer vision and image processing. E-mail: 23020181154279@stu.xmu.edu.cn


  • Abstract: Mutual-learning-based knowledge distillation methods suffer from two shortcomings: the models attend only to the discrepancy between the output distributions of the teacher network and the student network, without considering other constraints, and they rely only on outcome-oriented supervision while lacking process-oriented supervision. To address these issues, this paper proposes a topology-guided adversarial deep mutual learning method for knowledge distillation (TADML). The teacher and student networks are trained simultaneously and guide each other's learning; besides the discrepancy between the class distributions output by the networks, a measure of the topological difference between their intermediate features is designed. Adversarial training is adopted during optimization to further improve the discriminability of the teacher and student networks. Experimental results on the classification datasets CIFAR10, CIFAR100, and Tiny-ImageNet and on the person re-identification dataset Market1501 demonstrate the effectiveness of TADML, which achieves the best performance among comparable model-compression methods.
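    To make the co-training objective described in the abstract concrete, the sketch below shows one plausible form of the mutual-learning step: each network receives a supervised cross-entropy term (the L_S of Table 1) plus a symmetric Jensen-Shannon agreement term on the peers' class distributions (the L_JS of Table 1). The function names, the weight alpha, and the exact loss assembly are illustrative assumptions, not the authors' released implementation; the discriminator and the topology term are sketched separately after Tables 2 and 4.

    ```python
    import torch.nn.functional as F

    def js_divergence(logits_a, logits_b):
        """Symmetric Jensen-Shannon divergence between two predicted class distributions."""
        p = F.softmax(logits_a, dim=1)
        q = F.softmax(logits_b, dim=1)
        m = 0.5 * (p + q)
        # F.kl_div(log_input, target) computes KL(target || input)
        return 0.5 * (F.kl_div(m.log(), p, reduction="batchmean")
                      + F.kl_div(m.log(), q, reduction="batchmean"))

    def mutual_learning_step(net1, net2, images, labels, alpha=1.0):
        """One co-training step: supervised cross-entropy (L_S) for each network
        plus the shared peer-agreement term (L_JS)."""
        logits1, logits2 = net1(images), net2(images)
        l_js = js_divergence(logits1, logits2)
        loss1 = F.cross_entropy(logits1, labels) + alpha * l_js
        loss2 = F.cross_entropy(logits2, labels) + alpha * l_js
        return loss1, loss2
    ```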
  • Fig. 1  The framework of the proposed method

    Fig. 2  The structure of the discriminator

    Table 1  Comparison of classification performance with different loss functions (%)

    Loss composition              CIFAR10    CIFAR100
    L_S                           92.90      70.47
    L_S + L_JS                    93.18      71.70
    L_S + L_JS + L_adv            93.52      72.75
    L_S + L_1 + L_adv             93.04      71.97
    L_S + L_2 + L_adv             93.26      72.02
    L_S + L_1 + L_JS + L_adv      92.87      71.63
    L_S + L_2 + L_JS + L_adv      92.38      70.90
    L_S + L_JS + L_adv + L_T      93.05      71.81

    Table 2  Comparison of classification performance with different discriminator structures (%)

    Structure                     CIFAR100
    256fc-256fc                   71.57
    500fc-500fc                   72.09
    100fc-100fc-100fc             72.33
    128fc-256fc-128fc             72.51
    64fc-128fc-256fc-128fc        72.28
    128fc-256fc-256fc-128fc       72.23
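    The structure strings in Table 2 read as chains of fully connected layers, e.g. 128fc-256fc-128fc for hidden widths 128, 256, and 128. A minimal sketch of how such a spec could be expanded into a binary real/fake discriminator is shown below; the build_discriminator helper, the ReLU activations, and the single-logit output are illustrative assumptions rather than the paper's exact configuration.

    ```python
    import torch.nn as nn

    def build_discriminator(spec: str, in_dim: int) -> nn.Sequential:
        """Expand a spec such as '128fc-256fc-128fc' into an MLP discriminator.

        Each 'Nfc' token becomes a Linear layer with N units followed by ReLU;
        a final Linear layer outputs one logit (teacher-like vs. student-like).
        """
        widths = [int(token[:-2]) for token in spec.split("-")]  # strip the 'fc' suffix
        layers, prev = [], in_dim
        for width in widths:
            layers += [nn.Linear(prev, width), nn.ReLU(inplace=True)]
            prev = width
        layers.append(nn.Linear(prev, 1))
        return nn.Sequential(*layers)

    # Example: the best-performing structure in Table 2, fed 100-dimensional inputs.
    discriminator = build_discriminator("128fc-256fc-128fc", in_dim=100)
    ```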

    Table 3  Comparison of classification performance with different discriminator inputs (%)

    Input constraint     CIFAR100
    Conv4                72.33
    FC                   72.51
    Conv4 + FC           72.07
    FC + DAE             71.97
    FC + Label           72.35
    FC + Avgfc           71.20

    Table 4  Comparison of classification performance with different sampling strategies (%)

    Network structure    Vanilla    Random    K = 2    K = 4    K = 8    K = 16    K = 32    K = 64
    ResNet32             71.14      72.12     31.07    60.69    72.43    72.84     72.50     71.99
    ResNet110            74.31      74.59     22.64    52.33    74.59    75.18     75.01     74.59
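    Table 4 varies K, the number of instances sampled when the topological relation between the two networks' intermediate features is compared. The sketch below shows one way such a topology-consistency term could be computed from K sampled features by matching pairwise-distance matrices; the random sampling, the mean normalization, and the mean-squared penalty are assumptions for illustration, not the paper's exact formulation of the topological difference.

    ```python
    import torch
    import torch.nn.functional as F

    def topology_consistency_loss(feat1, feat2, k=8):
        """Match the pairwise-distance structure of K sampled intermediate features
        from the two peer networks.

        feat1, feat2: (N, D1) and (N, D2) feature batches aligned by sample index.
        """
        idx = torch.randperm(feat1.size(0))[:k]        # sample K instances from the batch
        d1 = torch.cdist(feat1[idx], feat1[idx], p=2)  # K x K distance matrix, network 1
        d2 = torch.cdist(feat2[idx], feat2[idx], p=2)  # K x K distance matrix, network 2
        d1 = d1 / (d1.mean() + 1e-8)                   # normalize scale so the two
        d2 = d2 / (d2.mean() + 1e-8)                   # networks are directly comparable
        return F.mse_loss(d1, d2)
    ```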

    Table 5  Comparison of classification performance with different network structures (%)

    Network structure          Original network     DML [13]           ADML               TADML
    Net 1       Net 2          Net 1     Net 2      Net 1     Net 2    Net 1     Net 2    Net 1     Net 2
    ResNet32    ResNet32       70.47     70.47      71.86     71.89    72.85     72.89    73.07     73.13
    ResNet32    ResNet110      70.47     73.12      71.62     74.08    72.66     74.18    73.14     74.86
    ResNet110   ResNet110      73.12     73.12      74.59     74.55    75.08     75.10    75.52     75.71
    WRN-10-4    WRN-10-4       72.65     72.65      73.06     73.01    73.77     73.75    73.97     74.08
    WRN-10-4    WRN-28-10      72.65     80.77      73.58     81.11    74.61     81.43    75.11     82.13

    Table 6  Comparison of person re-identification mAP with different network structures (%)

    Network structure             Original network     DML [13]           ADML               TADML
    Net 1         Net 2           Net 1     Net 2      Net 1     Net 2    Net 1     Net 2    Net 1     Net 2
    InceptionV1   MobileNetV1     65.26     46.07      65.34     52.87    65.60     53.22    66.03     53.91
    MobileNetV1   MobileNetV1     46.07     46.07      52.95     51.26    53.42     53.27    53.84     53.65

    Table 7  Experimental results of the proposed algorithm and other compression algorithms

    Algorithm              Params (MB)    CIFAR10 (%)    CIFAR100 (%)    Tiny-ImageNet (%)
    ResNet20               0.27           91.42          66.63           54.45
    ResNet164              2.6            93.43          72.24           61.55
    Yim et al. [10]        0.27           88.70          63.33           -
    SNN-MIMIC [22]         0.27           90.93          67.21           -
    KD [8]                 0.27           91.12          66.66           57.65
    FitNet [9]             0.27           91.41          64.96           55.59
    Quantization [20]      0.27           91.13          -               -
    BinaryConnect [21]     15.20          91.73          -               -
    ANC [23]               0.27           91.92          67.55           58.17
    TSANC [24]             0.27           92.17          67.43           58.20
    KSANC [24]             0.27           92.68          68.58           59.77
    DML [13]               0.27           91.82          69.47           57.91
    ADML                   0.27           92.23          69.60           59.00
    TADML                  0.27           93.05          70.81           60.11
  • [1] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA: IEEE, 2016. 770−778
    [2] Zhang X Y, Zhou X Y, Lin M X, Sun J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA: IEEE, 2018. 6848−6856
    [3] Guo Y W, Yao A B, Zhao H, Chen Y R. Network sketching: Exploiting binary structure in deep CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA: IEEE, 2017. 4040−4048
    [4] Tai C, Xiao T, Wang X G, E W N. Convolutional neural networks with low-rank regularization. In: Proceedings of the 4th International Conference on Learning Representations, San Juan, Puerto Rico, 2016
    [5] Chen W, Wilson J T, Tyree S, Weinberger K Q, Chen Y X. Compressing neural networks with the hashing trick. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France: 2015. 37: 2285−2294
    [6] Denton E L, Zaremba W, Bruna J, LeCun Y, Fergus R. Exploiting linear structure within convolutional networks for efficient evaluation. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Montreal, Canada: 2014. 1269−1277
    [7] Li Z, Hoiem D. Learning without forgetting. In: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Netherlands: 2016. 614−629
    [8] Hinton G E, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint, 2015, arXiv: 1503.02531
    [9] Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: Hints for thin deep nets. In: Proceedings of the 3rd International Conference on Learning Representations, San Diego, USA, 2015
    [10] Yim J, Joo D, Bae J H, Kim J. A Gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA: IEEE, 2017. 7130−7138
    [11] Peng B Y, Jin X, Li D S, Zhou S F, Wu Y C, Liu J H, et al. Correlation congruence for knowledge distillation. In: Proceedings of the IEEE International Conference on Computer Vision, Seoul, South Korea: IEEE, 2019. 5006−5015
    [12] Park W, Kim D, Lu Y, Cho M. Relational knowledge distillation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, USA: IEEE, 2019. 3967−3976
    [13] Zhang Y, Xiang T, Hospedales T M, Lu H C. Deep mutual learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA: IEEE, 2018. 4320−4328
    [14] Batra T, Parikh D. Cooperative learning with visual attributes. arXiv preprint, 2017, arXiv: 1705.05512
    [15] Zhang H, Goodfellow I J, Metaxas D N, Odena A. Self-attention generative adversarial networks. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA: 2019. 7354−7363
    [16] Zagoruyko S, Komodakis N. Wide residual networks. In: Proceedings of the British Machine Vision Conference, York, UK: 2016. 1−12
    [17] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009
    [18] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint, 2014, arXiv: 1411.1784
    [19] Shu C Y, Li P, Xie Y, Qu Y Y, Kong H. Knowledge squeezed adversarial network compression. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA: 2020. 11370−11377
    [20] Zhu C Z, Han S, Mao H Z, Dally W J. Trained ternary quantization. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France, 2017.
    [21] Courbariaux M, Bengio Y, David J P. Binaryconnect: Training deep neural networks with binary weights during propagations. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems. Montreal, Canada: 2015. 3123− 3131
    [22] Ba J, Caruana R. Do deep nets really need to be deep? In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Montreal, Canada: 2014. 2654−2662
    [23] Belagiannis V, Farshad A, Galasso F. Adversarial network compression. In: Proceedings of the European Conference on Computer Vision, Munich, Germany: 2018. 11132: 431−449
    [24] Xu Z, Hsu Y C, Huang J W. Training student networks for acceleration with conditional adversarial networks. In: Proceedings of the British Machine Vision Conference, Newcastle, UK: 2018. 61
Publication history
  • Received: 2020-08-18
  • Accepted: 2020-12-23
  • Available online: 2021-01-19
  • Published: 2023-01-07
