基于ROI-KNN卷积神经网络的面部表情识别

孙晓; 潘汀; 任福继

doi:10.16383/j.aas.2016.c150638

基于ROI-KNN卷积神经网络的面部表情识别

doi: 10.16383/j.aas.2016.c150638 cstr: 32138.14.j.aas.2016.c150638

孙晓^1, ,,
潘汀^1,,
任福继^1,2,

1.
合肥工业大学计算机与信息学院合肥 230009 中国
2.
德岛大学智能信息工学部德岛 7708500 日本

基金项目:

安徽省自然科学基金 1508085QF119

合肥工业大学2015年国家省级大学生创新训练计划项目( 2015cxcys109

国家自然科学基金重点项目 61432004

中国博士后科学基金 2015M580532

模式识别国家重点实验室开放课题 NLPR201407345

详细信息

作者简介:
潘汀合肥工业大学计算机与信息学院本科生. 主要研究方向为深度学习, 贝叶斯学习理论及其在计算机视觉与自然语言处理方面的应用

任福继合肥工业大学计算机与信息学院情感计算研究所教授, 德岛大学教授. 主要研究方向为人工智能, 情感计算, 自然语言处理, 机器学习与人机交互. E-mail:ren@is.tokushima-u.ac.jp

通讯作者:
孙晓合肥工业大学计算机与信息学院情感计算研究所副教授. 主要研究方向为自然语言处理与情感计算, 机器学习与人机交互. 本文通信作者. E-mail:sunx@hfut.edu.cn

中图分类号:
计量
- 文章访问数: 3860
- HTML全文浏览量: 1529
- PDF下载量: 2719
- 被引次数: 0
出版历程
- 收稿日期: 2015-10-12
- 录用日期: 2016-04-01
- 刊出日期: 2016-06-20

Facial Expression Recognition Using ROI-KNN Deep Convolutional Neural Networks

SUN Xiao^{1
, ,},
PAN Ting^1
,,
REN Fu-Ji^{1,2
,}

1.
School of Computer and Information, Hefei University of Technology, Hefei 230009, China;
2.
Department of Information Science and Intelligent Systems, Faculty of Engineering, Tokushima University, Tokushima 7708500, Japan

Funds:

the Natural Science Foundation of Anhui Province 1508085QF119

National Training Program of Innovation and Entrepreneurship for HFUT Undergraduates 2015cxcys109

Key Program of National Natural Foundation Science of China 61432004

China Postdoctoral Science Foundation 2015M580532

Open Project Program of the National Laboratory of Pattern Recognition NLPR201407345

More Information

Author Bio:
PAN Ting Bachelor student at the School of Computer Science and Infor- mation, Hefei University of Technology. His research interest covers the theory of deep learning and Bayesian learning, and corresponding applications in com- puter vision and natural language processing

REN Fu-Ji Professor at the Insti- tute of A®ective Computing, Hefei Uni- versity of Technology and Tokushima University. His research interest coves arti¯cial intelligent, a®ective computing, natural language processing, machine learning, and human-machine interaction.

Corresponding author: SUN Xiao Associate professor at the Institute of A®ective Computing, Hefei University of Technology. His re- search interest covers natural language processing, a®ective computing, machine learning and human-machine interac- tion. Corresponding author of this paper. E-mail:sunx@hfut.edu.cn

摘要

摘要: 深度神经网络已经被证明在图像、语音、文本领域具有挖掘数据深层潜在的分布式表达特征的能力. 通过在多个面部情感数据集上训练深度卷积神经网络和深度稀疏校正神经网络两种深度学习模型, 对深度神经网络在面部情感分类领域的应用作了对比评估. 进而, 引入了面部结构先验知识, 结合感兴趣区域(Region of interest, ROI)和K最近邻算法(K-nearest neighbors, KNN), 提出一种快速、简易的针对面部表情分类的深度学习训练改进方案——ROI-KNN, 该训练方案降低了由于面部表情训练数据过少而导致深度神经网络模型泛化能力不佳的问题, 提高了深度学习在面部表情分类中的鲁棒性, 同时, 显著地降低了测试错误率.
- 卷积神经网络 /
- 面部情感识别 /
- 模型泛化 /
- 先验知识
Abstract: Deep neural networks have been proved to be able to mine distributed representation of data including image, speech and text. By building two models of deep convolutional neural networks and deep sparse rectifier neural networks on facial expression dataset, we make contrastive evaluations in facial expression recognition system with deep neural networks. Additionally, combining region of interest (ROI) and K-nearest neighbors (KNN), we propose a fast and simple improved method called "ROI-KNN" for facial expression classification, which relieves the poor generalization of deep neural networks due to lacking of data and decreases the testing error rate apparently and generally. The proposed method also improves the robustness of deep learning in facial expression classification.
- Convolution neural networks /
- facial expression recognition /
- model generalization /
- prior knowledge

HTML全文

图 1 CK+与Wild数据集样例

Fig. 1 Samples from CK+ and Wild

下载: 全尺寸图片幻灯片

图 2 输入空间的流形面

Fig. 2 Manifold side of input space

下载: 全尺寸图片幻灯片

图 3 卷积神经网络的局部块状连接与基本结构

Fig. 3 Local connection and structure of convolutional neural network (CNN)

下载: 全尺寸图片幻灯片

图 4 不同激活函数的函数图像 (图片源自Glorot^[11]

Fig. 4 Graphs for different activation functions from Glorot^[11]

下载: 全尺寸图片幻灯片

图 5 深度卷积神经网络的结构

？表示不确定超参数,有多种优选方案.

Fig. 5 Structure of DNN

? represents uncertain parameters with many candidate solutions.

下载: 全尺寸图片幻灯片

图 6 深度稀疏校正网络的结构

Fig. 6 Structure of deep sparse rectifier net

下载: 全尺寸图片幻灯片

图 7 9个ROI区域(切割、翻转、遮盖、中心聚焦

Fig. 7 Nine ROI regions (cut,flip,cover,center focus

下载: 全尺寸图片幻灯片

表 1 ROI辅助评估的测试集错误率 (%)

Table 1 Test set error rate of ROI auxiliary (%)

	中性	高兴	悲伤	惊讶	愤怒	整体
CNN-64	4.7	32.7	54.3	33	40.3	33.3
CNN-64^*	5.6	36.3	59.3	20.0	31.7	30.6
CNN-96^*	5.0	36.7	53.3	20.7	24.7	28.6
CNN-128	3.3	32.0	51.0	27.0	37.7	30.2
CNN-128^*	3.0	31.0	55.7	18.7	24.3	26.6
DNN-1000	3.0	37.7	65.3	38.3	36.7	36.2
DNN-1000^*	2.3	39.0	52.0	30.0	31.7	31.0
DNN-2000*	2.0	43.3	55.0	24.7	32.7	31.5

下载: 导出CSV

表 2 旋转生成样本评估的测试集错误率 (%)

Table 2 Test set error rate of rotating generated sample(%)

	中性	高兴	悲伤	惊讶	愤怒	整体
CNN-128	3.3	32.0	51.0	27.0	37.7	30.2
CNN-128^*	4.7	41.3	52.7	32.7	35.0	33.2
CNN-128+	3.0	37.0	51.7	15.7	24.0	26.3
CNN-128^	0.0	30.0	54.0	13.0	26.7	24.7
DNN-1000	3.0	37.7	65.3	38.3	36.7	36.2
DNN-1000^*	1.3	39.7	62.0	37.3	42.0	36.5
DNN-1000+	2.3	41.3	57.0	30.0	35.7	33.3
DNN-1000^	1.3	43.0	67.7	31.0	33.7	35.3

下载: 导出CSV

表 3 ROI-KNN辅助评估的测试集错误率 (%)

Table 3 Test set error rate with ROI-KNN (%)

	中性	高兴	悲伤	惊讶	愤怒	整体
CNN-64	5.6	36.3	59.3	20.0	31.7	30.6
CNN-64*	1.0	29.7	56.0	17.0	30.0	26.7
CNN-96	5.0	36.7	53.3	20.7	24.7	28.6
CNN-96*	0.3	26.0	56.3	16.0	26.7	25.8
CNN-128	3.0	31.0	55.7	18.7	24.3	26.6
CNN-128*	0.6	22.7	57.0	12.0	26.3	23.7
DNN-1000	2.3	39.0	52.0	30.0	31.7	31.0
DNN-1000*	0.3	37.3	61.0	31.7	31.0	32.2
DNN-2000	2.0	43.3	55.0	24.7	32.7	31.5
DNN-2000*	0.3	40.0	68.0	26.3	33.3	33.6

下载: 导出CSV

表 4 在JAFFE上的模型对比

Table 4 Comparisons on JAFFE

来源模型	方法	错误率(%)
Kumbhar 等^[20]	Image feature	30»40
Lekshmi 等^[21]	SVM	13.1
Zhao 等^[22]	PCA and NMF	6.28
Zhi 等^[23]	2D-DLPP	4.09
Lee 等^[24]	RDA	3.3
本文	ROI-KNN+CNN 2.81

下载: 导出CSV

参考文献(25)

[1]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25. Lake Tahoe, Nevada, USA: Curran Associates, Inc., 2012. 1097-1105
[2]	Lopes A T, de Aguiar E, Oliveira-Santos T. A facial expression recognition system using convolutional networks. In: Proceedings of the 28th SIBGRAPI Conference on Graphics, Patterns and Images. Salvador: IEEE, 2015. 273-280
[3]	Lucey P, Cohn J F, Kanade T, Saragih J, Ambadar Z, Matthews I. The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). San Francisco, CA: IEEE, 2010. 94-101
[4]	Bishop C M. Pattern Recognition and Machine Learning. New York: Springer, 2007.
[5]	Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning. Hanover, MA, USA: Now Publishers Inc., 2009. 1-127
[6]	LeCun Y, Boser B, Denker J S, Howard R E, Hubbard W, Jackel L D, Henderson D. Handwritten digit recognition with a back-propagation network. In: Proceedings of Advances in Neural Information Processing Systems 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1990. 396-404
[7]	Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980, 36(4): 193-202
[8]	Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533-536
[9]	LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324
[10]	Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA: IEEE, 2015. 1-9
[11]	Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). Fort Lauderdale, FL, USA, 2011, 15: 315-323
[12]	Barron A R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 1993, 39(3): 930-945
[13]	Hubel D H, Wiesel T N, LeVay S. Visual-field representation in layer IV C of monkey striate cortex. In: Proceedings of the 4th Annual Meeting, Society for Neuroscience. St. Louis, US, 1974. 264
[14]	Dayan P, Abott L F. Theoretical Neuroscience. Cambridge: MIT Press, 2001.
[15]	Attwell D, Laughlin S B. An energy budget for signaling in the grey matter of the brain. Journal of Cerebral Blood Flow and Metabolism, 2001, 21(10): 1133-1145
[16]	Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580, 2012.
[17]	Darwin C. On the Origin of Species. London: John Murray, Albemarle Street, 1859.
[18]	Xavier G, Yoshua B. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010). Chia Laguna Resort, Sardinia, Italy, 2010, 9: 249-256
[19]	Sun Y, Wang X, Tang X. Deep learning face representation from predicting 10000 classes. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH: IEEE, 2014. 1891-1898
[20]	Kumbhar M, Jadhav A, Patil M. Facial expression recognition based on image feature. International Journal of Computer and Communication Engineering, 2012, 1(2): 117-119
[21]	Lekshmi V P, Sasikumar M. Analysis of facial expression using Gabor and SVM. International Journal of Recent Trends in Engineering, 2009, 1(2): 47-50
[22]	Zhao L H, Zhuang G B, Xu X H. Facial expression recognition based on PCA and NMF. In: Proceedings of the 7th World Congress on Intelligent Control and Automation. Chongqing, China: IEEE, 2008. 6826-6829
[23]	Zhi R C, Ruan Q Q. Facial expression recognition based on two-dimensional discriminant locality preserving projections. Neurocomputing, 2008, 71(7-9): 1730-1734
[24]	Lee C C, Huang S S, Shih C Y. Facial affect recognition using regularized discriminant analysis-based algorithms. EURASIP Journal on Advances in Signal Processing, 2010, article ID 596842(doi: 10.1155/2010/596842)
[25]	Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I J, Bergeron A, Bouchard N, Warde-Farley D, Bengio Y. Theano: new features and speed improvements. In: Conference on Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsuper Vised Feature Learning. Lake Tahoe, US, 2012.