Abstract: To avoid the influence of human factors on facial expression feature extraction, this paper adopts a convolutional neural network for facial expression recognition. Traditional expression recognition methods require complicated manual feature extraction, whereas a convolutional neural network dispenses with the hand-crafted feature extraction step. The classical LeNet-5 convolutional neural network achieves a good recognition rate on handwritten digit datasets but a low recognition rate on facial expressions. This paper proposes an improved LeNet-5 convolutional neural network for facial expression recognition that combines the low-level and high-level features extracted by the network to construct the classifier. The method reaches an average accuracy of 94.37% on the JAFFE expression dataset and 83.74% on the CK+ dataset (Table 5).

Recommended by Associate Editor Hu Qing-Hua
Table 1 Connection between Layer 2 and Layer 3 of the LeNet-5 network
Rows 1 to 6 index the 6 input feature maps and columns 1 to 16 index the 16 output feature maps. A check mark (√) denotes a connection; each of the 6 input feature maps is connected to 10 of the 16 output feature maps.
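For Table 1, the following is a minimal sketch of one connection scheme that is consistent with the count of 10 check marks per row. It assumes the partial-connection pattern of the original LeNet-5 (LeCun et al., 1998); the check-mark positions are that assumption, and only the per-row count is taken from the table. The name `lenet5_connections` is illustrative.

```python
N_IN = 6  # number of input feature maps (rows of Table 1)


def lenet5_connections():
    """Return, for each of the 16 output maps, the set of 0-based input maps it reads.

    Assumption: the pattern of the original LeNet-5 (LeCun et al., 1998).
    """
    columns = []
    # Output maps 1-6: three cyclically contiguous input maps.
    columns += [{(start + k) % N_IN for k in range(3)} for start in range(N_IN)]
    # Output maps 7-12: four cyclically contiguous input maps.
    columns += [{(start + k) % N_IN for k in range(4)} for start in range(N_IN)]
    # Output maps 13-15: four non-contiguous input maps.
    columns += [{0, 1, 3, 4}, {1, 2, 4, 5}, {0, 2, 3, 5}]
    # Output map 16: all six input maps.
    columns.append(set(range(N_IN)))
    return columns


columns = lenet5_connections()
# Number of output maps fed by each input map (one value per table row).
row_counts = [sum(row in col for col in columns) for row in range(N_IN)]

for row in range(N_IN):
    marks = " ".join("√" if row in col else "." for col in columns)
    print(f"{row + 1}  {marks}")
print("connections per input map:", row_counts)
```

Under this assumed pattern every input map feeds exactly 10 output maps, which agrees with the 10 check marks per row in Table 1.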
Table 2 Convolutional network parameters
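| Layer | Input size | Kernel size | Pooling region | Stride | Output size |
|---|---|---|---|---|---|
| Input | 32 × 32 | 5 × 5 | - | 1 | 28 × 28 |
| Layer 1 | 6 @ 28 × 28 | - | 2 × 2 | 2 | 6 @ 14 × 14 |
| Layer 2 | 6 @ 14 × 14 | 5 × 5 | - | 1 | 10 × 10 |
| Layer 3 | 16 @ 10 × 10 | - | 2 × 2 | 2 | 16 @ 5 × 5 |
| Layer 4 | 16 @ 5 × 5 | 5 × 5 | - | 1 | 120 @ 1 × 1 |
| Layer 5 | 120 @ 1 × 1 | - | - | - | 1 × 84 |
| Layer 6 | 1 × 1660 | - | - | - | 1 × 7 |
| Output | 1 × 7 | - | - | - | - |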
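The sizes in Table 2 can be checked with the usual no-padding formula, output = (input − window) / stride + 1. The sketch below also tests one reading of the Layer 6 input length of 1660: that it is the concatenation of the Layer 1 output, the Layer 3 output, and the 84-dimensional Layer 5 output. That reading is an assumption based on the abstract's description of combining low-level and high-level features; the helper name `out_size` is illustrative.

```python
def out_size(size, window, stride):
    """Side length after a convolution or pooling window with no padding."""
    return (size - window) // stride + 1


# (row of Table 2, window size, stride); pooling rows use the pooling region.
steps = [
    ("Input", 5, 1),
    ("Layer 1", 2, 2),
    ("Layer 2", 5, 1),
    ("Layer 3", 2, 2),
    ("Layer 4", 5, 1),
]

side = 32
sides = {}
for name, window, stride in steps:
    side = out_size(side, window, stride)
    sides[name] = side
    print(f"{name}: output {side} x {side}")

# Assumed cross connection: flatten and concatenate three feature levels.
low_level = 6 * sides["Layer 1"] ** 2    # 6 @ 14 x 14
mid_level = 16 * sides["Layer 3"] ** 2   # 16 @ 5 x 5
high_level = 84                          # Layer 5 output, 1 x 84
concat_len = low_level + mid_level + high_level
print("concatenated feature length:", concat_len)
```

The computed side lengths (28, 14, 10, 5, 1) match the output sizes in Table 2, and the assumed concatenation gives 1176 + 400 + 84 = 1660, the Layer 6 input length.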
Table 3 Classification accuracy of different expressions in JAFFE expression dataset (%)
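| | Angry | Disgust | Fear | Happy | Neutral | Sad | Surprise | Overall |
|---|---|---|---|---|---|---|---|---|
| Test set 1 | 100 | 80 | 100 | 100 | 100 | 90.91 | 88.89 | 94.37 |
| Test set 2 | 100 | 90 | 90 | 81.82 | 100 | 100 | 100 | 92.96 |
| Test set 3 | 100 | 100 | 81.82 | 90.91 | 100 | 100 | 100 | 95.77 |
| Overall | 100 | 89.66 | 90.63 | 90.63 | 100 | 96.77 | 96.55 | 94.37 |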
Table 4 Classification accuracy of different expressions in CK+ dataset (%)
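| | Angry | Disgust | Fear | Happy | Neutral | Sad | Surprise | Overall |
|---|---|---|---|---|---|---|---|---|
| Test set 1 | 88.89 | 94.44 | 80 | 92.86 | 70.83 | 96 | 93.94 | 88.89 |
| Test set 2 | 70.37 | 77.78 | 80 | 96.30 | 68 | 84 | 96.97 | 82.32 |
| Test set 3 | 77.78 | 85.71 | 84.62 | 100 | 64 | 72 | 93.94 | 83.33 |
| Test set 4 | 62.96 | 94.29 | 88 | 89.29 | 60 | 80 | 87.88 | 80.81 |
| Test set 5 | 81.48 | 85.71 | 72 | 92.86 | 64 | 79.17 | 100 | 83.33 |
| Overall | 76.30 | 87.59 | 80.92 | 94.26 | 65.37 | 82.23 | 94.55 | 83.74 |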
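The "Overall" row of Table 4 appears to be the unweighted mean over the five test sets. The sketch below checks that reading against the reported values; the variable names are illustrative, and the averaging rule is inferred from the numbers, not stated in the table.

```python
# Per-test-set accuracies (%) from Table 4, one list per column.
folds = {
    "angry":    [88.89, 70.37, 77.78, 62.96, 81.48],
    "disgust":  [94.44, 77.78, 85.71, 94.29, 85.71],
    "fear":     [80.00, 80.00, 84.62, 88.00, 72.00],
    "happy":    [92.86, 96.30, 100.00, 89.29, 92.86],
    "neutral":  [70.83, 68.00, 64.00, 60.00, 64.00],
    "sad":      [96.00, 84.00, 72.00, 80.00, 79.17],
    "surprise": [93.94, 96.97, 93.94, 87.88, 100.00],
    "overall":  [88.89, 82.32, 83.33, 80.81, 83.33],
}

# "Overall" row as reported in Table 4.
reported = {
    "angry": 76.30, "disgust": 87.59, "fear": 80.92, "happy": 94.26,
    "neutral": 65.37, "sad": 82.23, "surprise": 94.55, "overall": 83.74,
}

# Unweighted mean over the five test sets for each column.
means = {name: sum(values) / len(values) for name, values in folds.items()}

for name, mean in means.items():
    print(f"{name:9s} mean={mean:6.2f} reported={reported[name]:6.2f}")
```

The same rule does not reproduce every entry of Table 3 (the mean of 80, 90, and 100 for disgust is 90, not 89.66), so the JAFFE per-class figures are presumably pooled over samples instead.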
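Table 5 Classification accuracy of the network with and without cross connections (%)
| Method | Number of parameters | Average accuracy on JAFFE | Average accuracy on CK+ |
|---|---|---|---|
| LeNet-5 | 14,444 | 62.44 | 32.32 |
| Proposed method | 25,476 | 94.37 | 83.74 |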
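One reading consistent with Tables 2 and 5, offered as an assumption and not as the authors' stated accounting: the additional parameters of the proposed network are the extra weights of the final 7-way classifier, whose input grows from 84 to 1660 values under cross connection. The constant names are illustrative.

```python
LENET5_PARAMS = 14444     # Table 5, LeNet-5
PROPOSED_PARAMS = 25476   # Table 5, proposed method
N_CLASSES = 7             # Table 2, output size 1 x 7
PLAIN_INPUT = 84          # Table 2, Layer 5 output length
CROSS_INPUT = 1660        # Table 2, Layer 6 input length

# Difference in parameter counts reported in Table 5.
extra_params = PROPOSED_PARAMS - LENET5_PARAMS

# Extra classifier weights if only the classifier input is widened
# (assumption: bias terms and all other layers are unchanged).
extra_weights = (CROSS_INPUT - PLAIN_INPUT) * N_CLASSES

print(extra_params, extra_weights)  # 11032 11032
```

Both quantities equal 11,032, so under this assumption the accuracy gain in Table 5 comes with roughly 76% more parameters, all of them in the classifier layer.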