A Dynamic Quantization Coding Based Deep Neural Network Compression Method

RAO Chuan, CHEN Jing-Ying, XU Ru-Yi, LIU Le-Yuan

引用本文: 饶川, 陈靓影, 徐如意, 刘乐元. 一种基于动态量化编码的深度神经网络压缩方法. 自动化学报, 2019, 45(10): 1960-1968. doi: 10.16383/j.aas.c180554
Citation: RAO Chuan, CHEN Jing-Ying, XU Ru-Yi, LIU Le-Yuan. A Dynamic Quantization Coding Based Deep Neural Network Compression Method. ACTA AUTOMATICA SINICA, 2019, 45(10): 1960-1968. doi: 10.16383/j.aas.c180554

doi: 10.16383/j.aas.c180554

Funds: 

Natural Science Foundation of Hubei Province 2017CFB504

Foundation for Innovative Research Groups of Hubei Province 2017CFA007

National Natural Science Foundation of China 61807014

National Key Research and Development Program of China 2018YFB1004504

Fundamental Research Funds for the Central Universities CCNU19Z02002

China Postdoctoral Science Foundation 2018M632889

National Natural Science Foundation of China 61702208

More Information
    Author Bio:

    RAO Chuan  Master student at the National Engineering Research Center for E-Learning, Central China Normal University. He received his bachelor degree from the School of Computer and Information Engineering, Hubei University in 2015. His research interest covers deep neural network compression and acceleration. E-mail: raoguoc@163.com

    XU Ru-Yi  Algorithm engineer at the National Engineering Research Center for E-Learning, Central China Normal University. He received his bachelor degree from Wuhan University of Science and Technology in 2008 and his master degree from Huazhong University of Science and Technology in 2016. His research interest covers computer vision and multimedia applications. E-mail: 86798653@qq.com

    LIU Le-Yuan  Associate professor at the National Engineering Research Center for E-Learning, Central China Normal University. His research interest covers computer vision, pattern recognition, and multimodal human-computer interaction. E-mail: lyliu@mail.ccnu.edu.cn

    Corresponding author: CHEN Jing-Ying  Professor at the National Engineering Research Center for E-Learning, Central China Normal University. She received her Ph.D. degree from the School of Computer Engineering, Nanyang Technological University, Singapore in 2001. Her research interest covers image processing, computer vision, pattern recognition, and multimedia applications. Corresponding author of this paper. E-mail: chenjy@mail.ccnu.edu.cn
  • Abstract: In recent years, deep neural networks (DNN) have stood out among machine learning methods and attracted wide interest and attention. However, mainstream DNN models contain millions of parameters and consume large amounts of computation and storage resources, which makes them hard to deploy on mobile and embedded devices such as phones. To address this problem, this paper proposes a deep neural network compression method based on dynamic quantization coding (DQC). Unlike existing methods that use static quantization coding (SQC), the proposed method updates the quantization codebook during training, so that the codebook reduces as much as possible the quantization error introduced by large weights. Extensive comparative experiments show that the proposed method outperforms existing compression methods based on static coding. (An illustrative code sketch of the training-time codebook update is given below.)
    1)  Recommended by Associate Editor HU Qing-Hua (胡清华)
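The abstract describes the core mechanism only in prose, so here is a minimal Python/NumPy sketch of the general idea under stated assumptions: weights are mapped to a small codebook of 2^b code words, and the codebook is re-fitted (here with a simple k-means-style mean update) as the full-precision weights change during training, instead of staying fixed as in static quantization coding. The function names (init_codebook, quantize, update_codebook), the uniform initialization, and the mean update rule are illustrative assumptions, not the paper's exact DQC procedure.

```python
# Toy illustration of training-time codebook updates (NOT the paper's exact DQC algorithm).
import numpy as np

def init_codebook(weights: np.ndarray, bit_width: int) -> np.ndarray:
    """Spread 2**bit_width code words uniformly over the current weight range (assumption)."""
    return np.linspace(weights.min(), weights.max(), 2 ** bit_width)

def quantize(weights: np.ndarray, codebook: np.ndarray):
    """Assign each weight to its nearest code word; return indices and quantized weights."""
    idx = np.abs(weights[:, None] - codebook[None, :]).argmin(axis=1)
    return idx, codebook[idx]

def update_codebook(weights: np.ndarray, idx: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Re-fit each code word to the mean of the weights assigned to it (k-means-style refresh),
    so the codebook tracks the evolving weight distribution instead of staying fixed."""
    new_codebook = codebook.copy()
    for k in range(len(codebook)):
        members = weights[idx == k]
        if members.size:                      # keep the old code word if nothing maps to it
            new_codebook[k] = members.mean()
    return new_codebook

# Toy "training" loop: the full-precision weights drift (simulated by noise) and the
# codebook is refreshed every step; static quantization coding (SQC) would call
# init_codebook once and never update it.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=1000)           # stand-in for one layer's weights
codebook = init_codebook(w, bit_width=3)      # 3-bit codebook, as in the paper's tables
for step in range(5):
    w += rng.normal(0.0, 0.01, size=w.shape)  # stand-in for a gradient update
    idx, w_q = quantize(w, codebook)
    codebook = update_codebook(w, idx, codebook)
    print(f"step {step}: mean |w - w_q| = {np.abs(w - w_q).mean():.4f}")
```

In this sketch, skipping the update_codebook call after initialization corresponds to a static codebook, while the dynamic variant keeps the code words centered on where the weights currently are; this is the contrast the SQC/DQC comparisons below illustrate.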
  • Fig. 1  Quantization rules for network weights

    Fig. 2  Training process of the dynamic quantization coding compression method

    Fig. 3  Quantization performance of SQC and DQC without 0 in the codebook

    Fig. 4  Quantization performance of SQC and DQC with 0 in the codebook

    Table 1  Quantization performance of LeNet under Softmax-loss

    Bit width   Codebook without 0   Codebook with 0
    3           99.29 %              99.22 %
    4           99.30 %              99.25 %
    5           99.35 %              99.32 %

    Table 2  Quantization performance of LeNet under Softmax-loss + L1

    Bit width   Codebook without 0   Codebook with 0
    3           98.69 %              99.25 %
    4           99.09 %              99.25 %
    5           99.14 %              99.27 %

    Table 3  Quantization performance of LeNet under Softmax-loss + L2

    Bit width   Codebook without 0   Codebook with 0
    3           99.26 %              99.29 %
    4           99.29 %              99.28 %
    5           99.36 %              99.28 %

    Table 4  Quantization performance of ResNet-20 under different codebooks

    Bit width   Codebook without 0   Codebook with 0
    3           90.07 %              90.78 %
    4           91.71 %              91.91 %
    5           92.63 %              92.82 %

    Table 5  Quantization performance of ResNet-32 under different codebooks

    Bit width   Codebook without 0   Codebook with 0
    3           91.44 %              92.11 %
    4           92.53 %              92.36 %
    5           92.87 %              92.33 %

    Table 6  Quantization performance of ResNet-44 under different codebooks

    Bit width   Codebook without 0   Codebook with 0
    3           92.68 %              92.53 %
    4           93.14 %              93.37 %
    5           93.28 %              93.14 %

    Table 7  Quantization performance of ResNet-56 under different codebooks

    Bit width   Codebook without 0   Codebook with 0
    3           92.72 %              92.69 %
    4           93.54 %              93.39 %
    5           93.21 %              93.24 %

    Table 8  Quantization performance with a fixed codebook (SQC)

    Network     3-bit codebook         3-bit codebook         3-bit codebook
                SQC w/o 0   SQC w/ 0   SQC w/o 0   SQC w/ 0   SQC w/o 0   SQC w/ 0
    ResNet-20   92.72 %     92.69 %    92.72 %     92.69 %    92.72 %     92.69 %
    ResNet-32   93.54 %     93.39 %    92.72 %     92.69 %    92.72 %     92.69 %
    ResNet-44   93.21 %     93.24 %    92.72 %     92.69 %    92.72 %     92.69 %
    ResNet-56   93.21 %     93.24 %    92.72 %     92.69 %    92.72 %     92.69 %

    Table 9  Comparison of Deep compression and DQC

    Compression method   Bit width   Accuracy
    Deep compression     5           99.20 %
    DQC                  5           99.70 %

    Table 10  Accuracy of INQ and DQC on CIFAR-10 with 5-bit quantization

    Network     INQ       DQC (codebook without 0)   DQC (codebook with 0)
    ResNet-20   91.01 %   92.63 %                    92.82 %
    ResNet-32   91.78 %   92.87 %                    92.33 %
    ResNet-44   92.30 %   93.28 %                    93.14 %
    ResNet-56   92.29 %   93.21 %                    93.24 %
  • [1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc, 2012. 1097-1105
    [2] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3): 211-252
    [3] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint, arXiv: 1409.1556, 2014.
    [4] Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 1-9
    [5] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 770-778
    [6] He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks. In: Proceedings of the European Conference on Computer Vision (ECCV). Springer International Publishing, 2016. 630-645
    [7] Gong Y C, Liu L, Yang M, Bourdev L. Compressing deep convolutional networks using vector quantization. arXiv preprint, arXiv: 1412.6115v1, 2014.
    [8] Chen W, Wilson J T, Tyree S, et al. Compressing neural networks with the hashing trick. In: Proceedings of the 32nd International Conference on Machine Learning (ICML). Lille, France, 2015. 2285-2294
    [9] Han S, Mao H, Dally W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint, arXiv: 1510.00149, 2015.
    [10] Courbariaux M, Bengio Y, David J P. BinaryConnect: training deep neural networks with binary weights during propagations. arXiv preprint, arXiv: 1511.00363, 2015.
    [11] Courbariaux M, Hubara I, Soudry D, et al. Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint, arXiv: 1602.02830, 2016.
    [12] Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In: Proceedings of the European Conference on Computer Vision. Springer, Cham, 2016. 525-542
    [13] Li Z, Ni B, Zhang W, Yang X, Gao W. Performance Guaranteed Network Acceleration via High-Order Residual Quantization. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017. 2603-2611
    [14] Li F, Zhang B, Liu B. Ternary weight networks. arXiv preprint, arXiv: 1605.04711, 2016.
    [15] Zhu C Z, Han S, Mao H Z, Dally W J. Trained ternary quantization. arXiv preprint, arXiv: 1612.01064, 2016.
    [16] Cai Z, He X, Sun J, Vasconcelos N. Deep Learning with Low Precision by Half-Wave Gaussian Quantization. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). arXiv preprint, arXiv: 1702.00953, 2017.
    [17] Zhou A J, Yao A B, Guo Y W, Xu L, Chen Y R. Incremental network quantization: towards lossless CNNs with low-precision weights. arXiv preprint, arXiv: 1702.03044, 2017.
    [18] Han S, Pool J, Tran J, Dally W J. Learning both weights and connections for efficient neural networks. arXiv preprint, arXiv: 1506.02626, 2015.
    [19] Anwar S, Sung W Y. Coarse pruning of convolutional neural networks with random masks. In: Proceedings of the International Conference on Learning Representations (ICLR). 2017. 134-145
    [20] Li H, Kadav A, Durdanovic I, Samet H, Graf H P. Pruning filters for efficient ConvNets. In: Proceedings of the International Conference on Learning Representations (ICLR). 2017. 34-42
    [21] Luo J H, Wu J, Lin W. ThiNet: a filter level pruning method for deep neural network compression. arXiv preprint, arXiv: 1707.06342, 2017.
    [22] Hu H, Peng R, Tai Y W, Tang C K. Network trimming: a data-driven neuron pruning approach towards efficient deep architectures. In: Proceedings of the International Conference on Learning Representations (ICLR). 2017. 214-222
    [23] Luo J H, Wu J. An entropy-based pruning method for CNN compression. arXiv preprint, arXiv: 1706.05791, 2017.
    [24] Yang T, Chen Y, Sze V. Designing energy-efficient convolutional neural networks using energy-aware pruning. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. 6071-6079
    [25] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint, arXiv: 1503.02531, 2015.
    [26] Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: hints for thin deep nets. In: Proceedings of the International Conference on Learning Representations (ICLR). 2015. 124-133
    [27] Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. arXiv preprint, arXiv: 1612.03928, 2016.
    [28] Zhang X, Zou J, He K, et al. Accelerating very deep convolutional networks for classification and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10):1943-1955 doi: 10.1109/TPAMI.2015.2502579
    [29] Lebedev V, Ganin Y, Rakhuba M, et al. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint, arXiv: 1412.6553, 2014.
    [30] Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint, arXiv: 1602.07360, 2016.
    [31] Howard A G, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint, arXiv: 1704.04861, 2017.
    [32] Zhang X, Zhou X, Lin M, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv preprint, arXiv: 1707.01083, 2017.
    [33] Bengio Y, Léonard N, Courville A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint, arXiv: 1308.3432, 2013.
    [34] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324
    [35] Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009.
Publication history
  • Received: 2018-08-17
  • Accepted: 2019-01-02
  • Published: 2019-10-20
