Topology-guided Adversarial Deep Mutual Learning for Knowledge Distillation

Lai Xuan, Qu Yan-Yun, Xie Yuan, Pei Yu-Long

Citation: Lai Xuan, Qu Yan-Yun, Xie Yuan, Pei Yu-Long. Topology-guided adversarial deep mutual learning for knowledge distillation. Acta Automatica Sinica, 2021, x(x): 1−9. doi: 10.16383/j.aas.200665


doi: 10.16383/j.aas.200665
Funds: Supported by National Natural Science Foundation of China (61876161, 61772524, 61671397, U1065252, 61772440)
Author biographies:

    Lai Xuan: Master's student at the School of Informatics, Xiamen University. Research interests: computer vision and image processing. E-mail: laixuan@stu.xmu.edu.cn

    Qu Yan-Yun: Professor at the School of Informatics, Xiamen University. Research interests: pattern recognition, computer vision, and machine learning. E-mail: yyqu@xmu.edu.cn

    Xie Yuan: Professor at the School of Computer Science and Technology, East China Normal University. Research interests: pattern recognition, computer vision, and machine learning. E-mail: yxie@cs.ecnu.edu.cn

    Pei Yu-Long: Master's student at the School of Informatics, Xiamen University. Research interests: computer vision and image processing. E-mail: 23020181154279@stu.xmu.edu.cn

  • Abstract: Mutual-learning-based knowledge distillation has two shortcomings: the model considers only the difference between the output distributions of the teacher and student networks, without further constraints, and the supervision is purely outcome-oriented, with no process-oriented supervision. To address these issues, this paper proposes topology-guided adversarial deep mutual learning (TADML) for knowledge distillation. The teacher and student networks are trained simultaneously and guide each other's learning; in addition to the difference between the class distributions output by the two networks, a topological difference measure on their intermediate features is designed. Adversarial training is adopted to further improve the discriminability of the teacher and student networks. Experimental results on the classification datasets CIFAR10, CIFAR100 and Tiny-ImageNet and on the person re-identification dataset Market1501 demonstrate the effectiveness of TADML, which achieves the best performance among comparable model-compression methods.
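The two distillation terms described in the abstract (an output-level divergence between the networks' class distributions, and a topological difference on intermediate features) can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation; the choice of a mean-normalized pairwise-distance matrix as the "topology" of a feature batch is an assumption borrowed from relational distillation.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / t)
    return e / e.sum(axis=-1, keepdims=True)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two batches of class distributions,
    averaged over the batch. Symmetric, unlike plain KL."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)
    return np.mean(0.5 * kl(p, m) + 0.5 * kl(q, m))

def topology_loss(f1, f2):
    """Compare the pairwise-distance (topology) structure of two feature
    batches; normalizing by the mean distance makes the term scale-invariant."""
    def pdist(f):
        d = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)
        return d / (d.mean() + 1e-12)
    return np.mean(np.abs(pdist(f1) - pdist(f2)))
```

In a mutual-learning setup both networks would add these terms to their own supervised loss, each treating the other's outputs and features as the target.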
  • Fig. 1  The framework of the proposed algorithm

    Fig. 2  The structure of the discriminator

    Table 1  Effect of loss composition on classification accuracy (%)

    | Loss composition | CIFAR10 | CIFAR100 |
    | ---------------- | ------- | -------- |
    | LS               | 92.90   | 70.47    |
    | LS+JS            | 93.18   | 71.70    |
    | LS+JS+Ladv       | 93.52   | 72.75    |
    | LS+L1+Ladv       | 93.04   | 71.97    |
    | LS+L2+Ladv       | 93.26   | 72.02    |
    | LS+L1+JS+Ladv    | 92.87   | 71.63    |
    | LS+L2+JS+Ladv    | 92.38   | 70.90    |
    | LS+JS+Ladv+LT    | 93.05   | 71.81    |

    Table 2  Effect of discriminator structure on classification accuracy (%)

    | Structure               | CIFAR100 |
    | ----------------------- | -------- |
    | 256fc-256fc             | 71.57    |
    | 500fc-500fc             | 72.09    |
    | 100fc-100fc-100fc       | 72.33    |
    | 128fc-256fc-128fc       | 72.51    |
    | 64fc-128fc-256fc-128fc  | 72.28    |
    | 128fc-256fc-256fc-128fc | 72.23    |
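The "fc" strings in Table 2 describe the widths of the fully connected discriminator layers. A minimal NumPy sketch of how such a spec could be parsed into an MLP discriminator follows; the parsing convention, He-style initialization, and final 1-unit sigmoid output are our assumptions, since the paper lists only the specs.

```python
import numpy as np

def build_discriminator(spec, in_dim):
    """Parse a spec such as '128fc-256fc-128fc' into a list of weight
    matrices for a fully connected discriminator ending in one output unit."""
    widths = [int(s.rstrip('fc')) for s in spec.split('-')]
    dims = [in_dim] + widths + [1]
    rng = np.random.default_rng(0)
    # He-style initialization for the ReLU hidden layers (assumption).
    return [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
            for a, b in zip(dims[:-1], dims[1:])]

def forward(weights, x):
    """ReLU MLP forward pass ending in a sigmoid real/fake score."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return 1.0 / (1.0 + np.exp(-(x @ weights[-1])))
```

For example, `build_discriminator('128fc-256fc-128fc', 64)` yields weight shapes (64, 128), (128, 256), (256, 128), (128, 1), matching the best-performing three-layer structure in Table 2.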

    Table 3  Effect of discriminator input on classification accuracy (%)

    | Input constraint | CIFAR100 |
    | ---------------- | -------- |
    | conv4            | 72.33    |
    | fc               | 72.51    |
    | conv4+fc         | 72.07    |
    | fc+DAE           | 71.97    |
    | fc+label         | 72.35    |
    | fc+avgfc         | 71.20    |

    Table 4  Effect of sampling number on classification accuracy (%)

    | Network   | Vanilla | Random | K=2   | K=4   | K=8   | K=16  | K=32  | K=64  |
    | --------- | ------- | ------ | ----- | ----- | ----- | ----- | ----- | ----- |
    | ResNet32  | 71.14   | 72.12  | 31.07 | 60.69 | 72.43 | 72.84 | 72.50 | 71.99 |
    | ResNet110 | 74.31   | 74.59  | 22.64 | 52.33 | 74.59 | 75.18 | 75.01 | 74.59 |

    Table 5  Effect of network structure on classification accuracy (%)

    | Network 1 | Network 2 | Original Net 1 | Original Net 2 | DML[13] Net 1 | DML[13] Net 2 | ADML Net 1 | ADML Net 2 | TADML Net 1 | TADML Net 2 |
    | --------- | --------- | -------------- | -------------- | ------------- | ------------- | ---------- | ---------- | ----------- | ----------- |
    | ResNet32  | ResNet32  | 70.47 | 70.47 | 71.86 | 71.89 | 72.85 | 72.89 | 73.07 | 73.13 |
    | ResNet32  | ResNet110 | 70.47 | 73.12 | 71.62 | 74.08 | 72.66 | 74.18 | 73.14 | 74.86 |
    | ResNet110 | ResNet110 | 73.12 | 73.12 | 74.59 | 74.55 | 75.08 | 75.10 | 75.52 | 75.71 |
    | WRN-10-4  | WRN-10-4  | 72.65 | 72.65 | 73.06 | 73.01 | 73.77 | 73.75 | 73.97 | 74.08 |
    | WRN-10-4  | WRN-28-10 | 72.65 | 80.77 | 73.58 | 81.11 | 74.61 | 81.43 | 75.11 | 82.13 |

    Table 6  Effect of network structure on person re-identification mAP (%)

    | Network 1   | Network 2   | Original Net 1 | Original Net 2 | DML[13] Net 1 | DML[13] Net 2 | ADML Net 1 | ADML Net 2 | TADML Net 1 | TADML Net 2 |
    | ----------- | ----------- | -------------- | -------------- | ------------- | ------------- | ---------- | ---------- | ----------- | ----------- |
    | InceptionV1 | MobileNetV1 | 65.26 | 46.07 | 65.34 | 52.87 | 65.60 | 53.22 | 66.03 | 53.91 |
    | MobileNetV1 | MobileNetV1 | 46.07 | 46.07 | 52.95 | 51.26 | 53.42 | 53.27 | 53.84 | 53.65 |

    Table 7  Results of the proposed algorithm and other compression algorithms (%)

    | Method             | #Params | CIFAR10 | CIFAR100 | Tiny-ImageNet |
    | ------------------ | ------- | ------- | -------- | ------------- |
    | ResNet20           | 0.27M   | 91.42   | 66.63    | 54.45         |
    | ResNet164          | 2.6M    | 93.43   | 72.24    | 61.55         |
    | Yim[10]            | 0.27M   | 88.70   | 63.33    | ---           |
    | L2-Ba[23]          | 0.27M   | 90.93   | 67.21    | ---           |
    | KD[8]              | 0.27M   | 91.12   | 66.66    | 57.65         |
    | FitNet[9]          | 0.27M   | 91.41   | 64.96    | 55.59         |
    | Quantization[21]   | 0.27M   | 91.13   | ---      | ---           |
    | Binary Connect[22] | 15.20M  | 91.73   | ---      | ---           |
    | ANC[24]            | 0.27M   | 91.92   | 67.55    | 58.17         |
    | TSANC[25]          | 0.27M   | 92.17   | 67.43    | 58.20         |
    | KSANC[25]          | 0.27M   | 92.68   | 68.58    | 59.77         |
    | DML[13]            | 0.27M   | 91.82   | 69.47    | 57.91         |
    | ADML               | 0.27M   | 92.23   | 69.60    | 59.00         |
    | TADML              | 0.27M   | 93.05   | 70.81    | 60.11         |
  • [1] He K M, Zhang X Y, Ren S Q and Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA: IEEE, 2016.770−778.
    [2] Zhang X Y, Zhou X Y, Lin M X and Sun J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake, USA: IEEE, 2018.6848−6856.
    [3] Guo Y W, Yao A B, Zhao H and Chen Y R. Network Sketching: Exploiting Binary Structure in Deep CNNs. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA: IEEE, 2017.4040−4048.
    [4] Tai C, Xiao T, Wang X G and E W N. Convolutional neural networks with low-rank regularization. In: Proceedings of the 4th International Conference on Learning Representations, San Juan, Puerto Rico, 2016.
    [5] Chen W, Wilson J T, Tyree S, Weinberger K Q and Chen Y X. Compressing Neural Networks with the Hashing Trick. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France: ACM, 2015. 37: 2285−2294.
    [6] Denton E L, Zaremba W, Bruna J, LeCun Y and Fergus R. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Montreal, Quebec, Canada: MIT Press, 2014.1269−1277.
    [7] Li Z, Hoiem D. Learning without Forgetting. In: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands: Springer Verlag, 2016.614−629.
    [8] Hinton G E, Vinyals O and Dean J. Distilling the knowledge in a neural network. arXiv preprint, arXiv: 1503.02531, 2015.
    [9] Romero A, Ballas N, Kahou S E, Chassang A, Gatta C and Bengio Y. Fitnets: Hints for thin deep nets. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, CA, USA, 2015.
    [10] Yim J, Joo D, Bae J H and Kim J. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA: IEEE, 2017.7130−7138.
    [11] Peng B Y, Jin X, Li D S, Zhou S F, Wu Y C, Liu J H, Zhang Z N and Liu Y. Correlation Congruence for Knowledge Distillation. In: Proceedings of the 2019 IEEE International Conference on Computer Vision, Seoul, Korea (South): IEEE, 2019. 5006−5015.
    [12] Park W, Kim D, Lu Y and Cho M. Relational Knowledge Distillation. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA: IEEE, 2019.3967−3976.
    [13] Zhang Y, Xiang T, Hospedales T M, Lu H C. Deep Mutual Learning. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA: IEEE, 2018.4320−4328.
    [14] Batra T and Parikh D. Cooperative Learning with Visual Attributes. arXiv preprint, arXiv: 1705.05512, 2017.
    [15] Zhang H, Goodfellow I J, Metaxas D N and Odena A. Self-Attention Generative Adversarial Networks. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA: ACM, 2019. 97: 7354−7363.
    [16] He K M, Zhang X Y, Ren S Q and Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA: IEEE, 2016.770−778.
    [17] Zagoruyko S and Komodakis N. Wide residual networks. In: Proceedings of British Machine Vision Conference, York, UK: Springer, 2016.
    [18] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.
    [19] Mirza M and Osindero S. Conditional Generative Adversarial Nets. arXiv preprint. arXiv: 1411.1784, 2014.
    [20] Shu C Y, Li P, Xie Y, Qu Y Y and Kong H. Knowledge Squeezed Adversarial Network Compression. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA: AAAI, 2020. 11370−11377.
    [21] Zhu C Z, Han S, Mao H Z and Dally W J. Trained ternary quantization. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France, 2017.
    [22] Courbariaux M, Bengio Y and David J P. Binaryconnect: Training deep neural networks with binary weights during propagations. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems. Montreal, Quebec, Canada: MIT Press, 2015.3123−3131.
    [23] Ba J and Caruana R. Do deep nets really need to be deep?. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Montreal, Quebec, Canada: MIT Press, 2014.2654−2662.
    [24] Belagiannis V, Farshad A and Galasso F. Adversarial network compression. In: Proceedings of the 2018 European Conference on Computer Vision, Munich, Germany: Springer, 2018. 11132: 431−449.
    [25] Xu Z, Hsu Y C and Huang J. Training student networks for acceleration with conditional adversarial networks. In: Proceedings of British Machine Vision Conference, Newcastle, UK: Springer, 2018. 61.
Publication history
  • Received: 2020-08-18
  • Accepted: 2020-12-23
  • Published online: 2021-01-19
