Fine-grained Classification of Car Models Using Fg-CarNet Convolutional Neural Network
Abstract: Car model recognition has important applications in intelligent transportation systems and in the investigation of vehicle-related criminal cases. Fine-grained classification of car models is difficult because there are many models and some of them differ only slightly. To address this problem, a multi-branch, multi-dimension feature-fusion convolutional neural network (CNN) model, Fg-CarNet (Convolutional neural networks for car fine-grained classification), is proposed, using car frontal-face images as the data source. Based on the feature distribution of frontal-face images, Fg-CarNet divides each image into an upper part and a lower part, extracts features from the two parts in parallel, and then fuses the features produced by the middle layers of the network along two dimensions to obtain more discriminative features and a stronger feature representation. By using small convolution kernels and global average pooling, the classification accuracy is improved while the number of network parameters is reduced. Experiments on the CompCars dataset show that Fg-CarNet achieves the highest recognition accuracy with the smallest model size among the compared networks, i.e., the best classification performance.
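As a concrete illustration of the input handling described above, the following minimal PyTorch sketch splits a frontal-face image into an upper half and a lower half, which the two parallel branches (UpNet and DownNet in Table 1) then process separately. The equal-height split, the N x C x H x W tensor layout, and the 256$\times$256 input size are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (assumptions noted above) of the upper/lower split that feeds
# the two parallel branches of Fg-CarNet.
import torch

def split_frontal_face(batch: torch.Tensor):
    """Split a batch of frontal-face images (N x C x H x W) into upper/lower halves."""
    half = batch.shape[2] // 2
    return batch[:, :, :half, :], batch[:, :, half:, :]

images = torch.randn(4, 3, 256, 256)       # four RGB frontal-face crops (assumed size)
upper, lower = split_frontal_face(images)  # each half: 4 x 3 x 128 x 256
```

The two halves then pass through separate convolutional branches whose intermediate features are fused along two dimensions (interpreted here as a spatial and a channel-wise concatenation); the full layer configuration is listed in Table 1, and a topology sketch follows it.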
Table 1 Structural parameters of the Fg-CarNet model
Sub-network | Layer | Type | Kernel size/stride | Pooling type | Pooling size/stride | Output size (depth$\times$length$\times$height)
UpNet | 1 | Convolution/BN | 5$\times$5/2 | Max pooling | 3$\times$3/2 | 64$\times$64$\times$32
UpNet | 2 | Convolution/BN | 3$\times$3/1 | Max pooling | 3$\times$3/2 | 96$\times$32$\times$16
UpNet | 3 | Convolution/BN | 3$\times$3/1 | Max pooling | 3$\times$3/2 | 128$\times$16$\times$8
UpNet | 4 | Convolution/BN | 3$\times$3/1 | Max pooling | 3$\times$3/2 | 128$\times$8$\times$4
DownNet | 1 | Convolution | 5$\times$5/2 | $-$ | $-$ | 64$\times$128$\times$64
DownNet | 2 | Convolution/BN | 1$\times$1/1 | Max pooling | 3$\times$3/2 | 64$\times$64$\times$32
DownNet | 3 | Convolution | 3$\times$3/1 | $-$ | $-$ | 96$\times$64$\times$32
DownNet | 4 | Convolution/BN | 1$\times$1/1 | Max pooling | 3$\times$3/2 | 96$\times$32$\times$16
DownNet | 5 | Convolution | 3$\times$3/1 | $-$ | $-$ | 128$\times$32$\times$16
DownNet | 6 | Convolution/BN | 1$\times$1/1 | Max pooling | 3$\times$3/2 | 128$\times$16$\times$8
DownNet | 7 | Convolution | 3$\times$3/1 | $-$ | $-$ | 128$\times$16$\times$8
DownNet | 8 | Convolution/BN | 1$\times$1/1 | Max pooling | 3$\times$3/2 | 128$\times$8$\times$4
FusionNet | 1 | Concat | $-$ | $-$ | $-$ | 96$\times$32$\times$32
FusionNet | 2 | Convolution | 3$\times$3/2 | Max pooling | 2$\times$2/2 | 128$\times$8$\times$8
FusionNet | 3 | Concat | $-$ | $-$ | $-$ | 128$\times$8$\times$8
FusionNet | 4 | Concat | $-$ | $-$ | $-$ | 256$\times$8$\times$8
FusionNet | 5 | Convolution/BN | 3$\times$3/1 | Max pooling | 3$\times$3/2 | 256$\times$4$\times$4
FusionNet | 6 | Convolution/Drop | 1$\times$1/1 | $-$ | $-$ | 281$\times$4$\times$4
FusionNet | 7 | Convolution/Drop | 1$\times$1/1 | $-$ | $-$ | 281$\times$4$\times$4
FusionNet | 8 | Convolution | 1$\times$1/1 | Global pooling | $-$ | 281$\times$1$\times$1
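To make the table concrete, here is a hedged PyTorch re-implementation sketch of the topology it describes; it is not the authors' implementation. Assumptions: each image half is a 3$\times$128$\times$256 tensor, ReLU follows every convolution, padding is chosen so the intermediate shapes match the listed output sizes, FusionNet's Concat layers 1 and 3 stack the corresponding UpNet and DownNet feature maps along the height axis while Concat layer 4 merges along channels, and 281 is the number of car-model classes.

```python
# Hedged PyTorch sketch of the Fg-CarNet topology in Table 1 (not the authors' code).
import torch
import torch.nn as nn


def conv(cin, cout, k, s):
    """Plain convolution + ReLU (no batch normalization, no pooling)."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2),
                         nn.ReLU(inplace=True))


def conv_bn_pool(cin, cout, k, s):
    """Convolution/BN + ReLU followed by 3x3/2 max pooling (Table 1 rows)."""
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
                         nn.MaxPool2d(3, stride=2, padding=1))


class FgCarNet(nn.Module):
    """Two-branch CNN with multi-dimension feature fusion, following Table 1."""

    def __init__(self, num_classes=281):
        super().__init__()
        # UpNet layers 1-4: Convolution/BN + 3x3/2 max pooling.
        self.up1_2 = nn.Sequential(conv_bn_pool(3, 64, 5, 2),
                                   conv_bn_pool(64, 96, 3, 1))    # mid feature, 96x16x32 (C x H x W)
        self.up3_4 = nn.Sequential(conv_bn_pool(96, 128, 3, 1),
                                   conv_bn_pool(128, 128, 3, 1))  # 128x4x8
        # DownNet layers 1-8: 5x5/3x3 convolutions alternating with 1x1 Conv/BN + pooling.
        self.down1_4 = nn.Sequential(conv(3, 64, 5, 2),
                                     conv_bn_pool(64, 64, 1, 1),
                                     conv(64, 96, 3, 1),
                                     conv_bn_pool(96, 96, 1, 1))  # mid feature, 96x16x32
        self.down5_8 = nn.Sequential(conv(96, 128, 3, 1),
                                     conv_bn_pool(128, 128, 1, 1),
                                     conv(128, 128, 3, 1),
                                     conv_bn_pool(128, 128, 1, 1))  # 128x4x8
        # FusionNet layer 2: 3x3/2 convolution + 2x2/2 pooling on the fused mid features.
        self.fuse_mid = nn.Sequential(nn.Conv2d(96, 128, 3, stride=2, padding=1),
                                      nn.ReLU(inplace=True),
                                      nn.MaxPool2d(2, stride=2))
        # FusionNet layers 5-8: Conv/BN + pooling, two Convolution/Drop 1x1 layers,
        # and a final 1x1 convolution with global average pooling.
        self.head = nn.Sequential(
            conv_bn_pool(256, 256, 3, 1),
            nn.Conv2d(256, num_classes, 1), nn.ReLU(inplace=True), nn.Dropout2d(0.5),
            nn.Conv2d(num_classes, num_classes, 1), nn.ReLU(inplace=True), nn.Dropout2d(0.5),
            nn.Conv2d(num_classes, num_classes, 1),
            nn.AdaptiveAvgPool2d(1))

    def forward(self, upper, lower):
        u_mid = self.up1_2(upper)      # UpNet mid-level features
        d_mid = self.down1_4(lower)    # DownNet mid-level features
        u_out = self.up3_4(u_mid)
        d_out = self.down5_8(d_mid)
        # FusionNet 1-2: stack mid features along the height axis, then convolve/pool.
        mid = self.fuse_mid(torch.cat([u_mid, d_mid], dim=2))
        # FusionNet 3: stack the deepest branch features along the height axis.
        deep = torch.cat([u_out, d_out], dim=2)
        # FusionNet 4-8: merge along channels and classify.
        logits = self.head(torch.cat([mid, deep], dim=1))
        return logits.flatten(1)


net = FgCarNet()
upper = torch.randn(1, 3, 128, 256)   # upper half of an (assumed) 256x256 frontal image
lower = torch.randn(1, 3, 128, 256)
print(net(upper, lower).shape)        # torch.Size([1, 281])
```

With these choices the intermediate shapes reproduce the output sizes in Table 1 (up to the depth$\times$length$\times$height ordering), and the 1$\times$1 convolutions plus global average pooling in the last rows replace fully connected layers, consistent with the small parameter footprint reported in Table 3.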
Table 2 Recognition rate of different CNN models using different classifiers on CompCars
Classifier | AlexNet (%) | GoogLeNet (%) | NIN (%) | Fg-CarNet (%)
Naive Bayes | 91.10 | 96.95 | 86.06 | 98.42
KNN | 93.41 | 98.35 | 92.78 | 98.78
Logistic regression | 96.08 | 98.39 | 95.91 | 98.76
Random forest | 82.96 | 95.61 | 74.60 | 93.52
SVM | 96.02 | 98.33 | 96.23 | 98.78
Softmax | 97.73 | 98.50 | 96.51 | 98.89
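Table 2 implies a two-stage evaluation protocol: features are extracted from each trained CNN and handed to a conventional classifier (the Softmax row presumably being the network's own output layer). The sketch below is a hedged illustration of that protocol with scikit-learn; the random stand-in features and all hyperparameters are placeholders, not settings from the paper.

```python
# Hedged sketch of the Table 2 comparison: train conventional classifiers on
# features extracted from a CNN.  The features below are random stand-ins; in
# practice they would be pooled Fg-CarNet (or AlexNet/GoogLeNet/NIN) activations.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
train_feats, train_labels = rng.normal(size=(200, 281)), rng.integers(0, 5, 200)
test_feats, test_labels = rng.normal(size=(50, 281)), rng.integers(0, 5, 50)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),              # placeholder k
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "SVM": LinearSVC(),
}

for name, clf in classifiers.items():
    clf.fit(train_feats, train_labels)                 # fit on CNN features
    print(name, clf.score(test_feats, test_labels))    # top-1 accuracy
```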
Table 3 Parameter size of each CNN model
CNN model | Parameter size (MB)
AlexNet | 232.1
GoogLeNet | 44.7
NIN | 12.8
Fg-CarNet | 6.3
Table 4 Reported results of related works
Table 5 Performance comparison of block fusion
Model | Accuracy 1 (%) | Accuracy 2 (%)
Fg-CarNet-Up | 93.37 | 89.78
Fg-CarNet-Down | 97.38 | 93.82
Fg-CarNet-Whole | 98.02 | 97.84
Fg-CarNet | 98.89 | 98.27
Table 6 Recognition results for different basic unit combinations
Model No. | Unit No. (1 2 3 4) | Accuracy 1
1 | √ √ | 0.98906
2 | √ √ | 0.98789
3 | √ √ | 0.98843
4 | √ √ √ | 0.98835
5 | √ √ √ | 0.98901
6 | √ √ √ | 0.98882
7 | √ √ √ √ | 0.98835