Video-EEG Based Collaborative Emotion Recognition Using LSTM and Information-Attention

Liu Jia-Min, Su Yuan-Qi, Wei Ping, Liu Yue-Hu

Citation: Liu Jia-Min, Su Yuan-Qi, Wei Ping, Liu Yue-Hu. Video-EEG based collaborative emotion recognition using LSTM and information-attention. Acta Automatica Sinica, 2020, 46(10): 2137-2147 doi: 10.16383/j.aas.c180107

doi: 10.16383/j.aas.c180107
Funds: National Natural Science Foundation of China (91520301)

More Information
    Author Bio:

    LIU Jia-Min  Master student at Xi'an Jiaotong University. Her research interests cover human-computer interaction, multi-modal emotion recognition, and reinforcement learning. E-mail: ljm.168@stu.xjtu.edu.cn

    WEI Ping  Associate professor at Xi'an Jiaotong University. His research interests cover computer vision, machine learning, and computational cognition. E-mail: pingwei@xjtu.edu.cn

    LIU Yue-Hu  Professor at Xi'an Jiaotong University. His research interests cover computer vision, human-computer interaction, augmented reality, and simulation testing. E-mail: liuyh@mail.xjtu.edu.cn

    Corresponding author:

    SU Yuan-Qi  Lecturer at Xi'an Jiaotong University. His research interests cover image processing, computer vision, and computer graphics. Corresponding author of this paper. E-mail: yuanqisu@mail.xjtu.edu.cn

  • Recommended by Associate Editor ZHANG Dao-Qiang
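  • Abstract: Emotion recognition based on the interactive collaboration of video and EEG signals is an important and challenging research problem in human-computer interaction. This paper proposes a video-EEG collaborative emotion recognition model based on long short-term memory (LSTM) networks and attention mechanisms. The model takes as input the facial video and EEG signals recorded while participants watch emotion-eliciting videos, and outputs the recognized emotion of each participant. At every time step, the model extracts convolutional neural network (CNN) features from the facial video together with the corresponding EEG features, fuses them with an LSTM, and predicts the key emotion signal frame for the next time step, until the emotion recognition result is computed at the final time step. During this process, a spatial band attention mechanism computes the importance of the ${\alpha}$, ${\beta}$, and ${\theta}$ waves of the EEG signal, so that the key spatial information in the EEG is used more effectively; a temporal attention mechanism predicts the key signal frame at the next time step, so that the key temporal information in the emotion data is used more effectively. The method was tested on two representative datasets, MAHNOB-HCI and DEAP, reaching classification rates of 73.1 % (arousal) and 74.5 % (valence) on MAHNOB-HCI and 85.8 % (arousal) and 84.3 % (valence) on DEAP. The experimental results indicate that this work offers an effective solution to video-EEG collaborative emotion recognition.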
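The two attention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of a softmax over per-band scores, and the fixed "jump" policy standing in for the learned temporal attention are all assumptions made for illustration. In the paper's model the scores and the next-frame prediction would come from the LSTM state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def band_attention(band_features, band_scores):
    """Weight per-band EEG feature vectors by softmax importance.

    band_features: dict of band name -> feature vector (list of floats)
    band_scores:   dict of band name -> unnormalized relevance score
                   (hypothetical input; a trained model would derive it
                   from the recurrent state)
    Returns (weights per band, fused feature vector).
    """
    names = sorted(band_features)
    weights = softmax([band_scores[n] for n in names])
    dim = len(band_features[names[0]])
    fused = [0.0] * dim
    for name, w in zip(names, weights):
        for i, value in enumerate(band_features[name]):
            fused[i] += w * value
    return dict(zip(names, weights)), fused


def attend_over_time(num_frames, next_step_fn, start=0):
    """Visit only the frames selected by a temporal policy.

    next_step_fn(t) returns how many frames to jump ahead from frame t.
    It is a stand-in for the learned temporal attention.
    """
    visited = []
    t = start
    while t < num_frames:
        visited.append(t)
        t += max(1, next_step_fn(t))  # always move forward by at least one frame
    return visited


# Toy inputs: two-dimensional features for the three bands, equal scores.
features = {"alpha": [1.0, 0.0], "beta": [0.0, 1.0], "theta": [1.0, 1.0]}
scores = {"alpha": 0.0, "beta": 0.0, "theta": 0.0}
weights, fused = band_attention(features, scores)
frames = attend_over_time(10, lambda t: 3)
print(weights, fused, frames)
```

With equal scores every band receives the same weight, so the fused vector is the plain average of the three band vectors; unequal scores shift the fused features toward the more relevant band.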
  • Fig. 1  The two-dimensional representation of emotion

    Fig. 2  The overall architecture of the video-EEG collaborative emotion recognition model

    Fig. 3  The process of bi-modal feature extraction from facial video and EEG signals

    Fig. 4  The feature maps of three convolution layers on Frame 78 of the facial video

    Fig. 5  The visualization of EEG signals (From top to bottom: video frames; the visualization of the corresponding EEG signals; the visualization of the ${\alpha}$ wave; the visualization of the ${\beta}$ wave; the visualization of the ${\theta}$ wave. From left to right: the 31st, 78th, 90th, 98th, and 118th frames of the emotion data)

    Fig. 6  The process of emotion recognition based on LSTM and attention mechanism

    Fig. 7  The visualization of results of the proposed model on the MAHNOB-HCI dataset (From top to bottom: facial video from three groups of emotion data. From left to right: emotion data; the ground truth and the results of the proposed model)

    Fig. 8  The presentation of the band attention weights on EEG signals and the temporal attention policy for a "nervous" man with high arousal

    Fig. 9  The presentation of the band attention weights on EEG signals and the temporal attention policy for a "nervous" man with low arousal

    Table 1  Three-class division of arousal and valence, with rating ranges

    Class     Arousal    Valence
    Low       1~4.5      1~4.5
    Medium    4.5~5.5    4.5~5.5
    High      5.5~9      5.5~9
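The ranges in Table 1 can be turned into a small labeling function. The sketch below is an illustration, not code from the paper. The table lists 4.5 and 5.5 in two classes each, so the choice to assign both boundary ratings to Medium is an assumption, as is the function name `rating_to_class`.

```python
def rating_to_class(rating, low_max=4.5, high_min=5.5):
    """Map a 1-9 arousal or valence rating to Low / Medium / High.

    Assumption: the boundary ratings 4.5 and 5.5 are assigned to Medium,
    because Table 1 does not say which side they belong to.
    """
    if not 1 <= rating <= 9:
        raise ValueError("rating must lie in [1, 9]")
    if rating < low_max:
        return "Low"
    if rating <= high_min:
        return "Medium"
    return "High"


ratings = [2.0, 4.5, 5.0, 5.5, 7.3]
labels = [rating_to_class(r) for r in ratings]
print(labels)
```

The same function serves both dimensions because Table 1 uses identical ranges for arousal and valence.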

    Table 2  The recognition results of different methods on the MAHNOB-HCI and DEAP datasets ("-": value not listed)

    Method                                  Arousal CR (%)   Arousal F1-score   Valence CR (%)   Valence F1-score
    Baseline [15] (MAHNOB-HCI)              67.7             0.620              ${\bf{76.1}}$    ${\bf{0.740}}$
    Koelstra et al. [10] (MAHNOB-HCI)       72.5             0.709              73.0             0.718
    Huang et al. [11] (MAHNOB-HCI)          63.2             -                  66.3             -
    VGG-16 + proposed model (MAHNOB-HCI)    ${\bf{73.1}}$    ${\bf{0.723}}$     74.5             0.730
    VGG-16 + proposed model (DEAP)          ${\bf{85.8}}$    -                  ${\bf{84.3}}$    -
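The tables report a classification rate (CR) and an F1-score for each dimension. As a reading aid, the sketch below shows one common way to compute both for a three-class problem. The page does not state how the F1-score is averaged over classes, so the macro average used here is an assumption, and the label sequences are made-up toy data rather than results from the paper.

```python
def classification_rate(y_true, y_pred):
    """Percentage of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with the three classes of Table 1.
classes = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "High", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "High", "High", "Low"]
cr = classification_rate(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, classes)
print(f"CR = {cr:.1f} %, macro F1 = {f1:.3f}")
```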

    Table 3  The classification rate and F1-score of ablation studies on the MAHNOB-HCI dataset

    Model                Arousal CR (%)   Arousal F1-score   Valence CR (%)   Valence F1-score
    w/o band and temp    66.4             0.650              68.9             0.678
    w/o band             70.9             0.690              73.0             0.711
    w/o temporal         69.7             0.680              70.4             0.695
    vis-EEG-LSTM         ${\bf{73.1}}$    ${\bf{0.723}}$     ${\bf{74.5}}$    ${\bf{0.730}}$

    Table 4  The classification rate and F1-score of ablation studies on the DEAP dataset

    Model                Arousal CR (%)   Arousal F1-score   Valence CR (%)   Valence F1-score
    w/o band and temp    79.1             0.774              78.5             0.770
    w/o band             83.1             0.816              82.5             0.809
    w/o temporal         78.1             0.754              81.4             0.805
    vis-EEG-LSTM         ${\bf{85.8}}$    ${\bf{0.837}}$     ${\bf{84.3}}$    ${\bf{0.831}}$
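The ablation rows in Tables 3 and 4 are easier to compare as percentage-point gains of the full model (vis-EEG-LSTM) over each reduced variant. The short script below only re-does that subtraction on the classification rates copied from the two tables. The dictionary layout and the function name are illustrative choices.

```python
# Classification rates (%) copied from Tables 3 and 4: (arousal, valence).
CR = {
    "MAHNOB-HCI": {
        "w/o band and temp": (66.4, 68.9),
        "w/o band": (70.9, 73.0),
        "w/o temporal": (69.7, 70.4),
        "vis-EEG-LSTM": (73.1, 74.5),
    },
    "DEAP": {
        "w/o band and temp": (79.1, 78.5),
        "w/o band": (83.1, 82.5),
        "w/o temporal": (78.1, 81.4),
        "vis-EEG-LSTM": (85.8, 84.3),
    },
}


def gains_over_ablations(table, full="vis-EEG-LSTM"):
    """Return full-model CR minus ablation CR, in percentage points."""
    full_arousal, full_valence = table[full]
    return {
        name: (round(full_arousal - a, 1), round(full_valence - v, 1))
        for name, (a, v) in table.items()
        if name != full
    }


for dataset, table in CR.items():
    for name, (d_arousal, d_valence) in gains_over_ablations(table).items():
        print(f"{dataset:11s} {name:18s} arousal +{d_arousal:.1f}  valence +{d_valence:.1f}")
```

For example, on DEAP the full model's arousal classification rate is 7.7 points above the variant without temporal attention and 2.7 points above the variant without band attention.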

    Table 5  The classification rate and F1-score of uni-modal and bi-modal emotion recognition on the MAHNOB-HCI dataset

    Input                       Arousal CR (%)   Arousal F1-score   Valence CR (%)   Valence F1-score
    Facial video                70.8             0.691              72.9             0.711
    EEG signal                  69.9             0.673              73.3             0.720
    Facial video + EEG signal   ${\bf{73.1}}$    ${\bf{0.723}}$     ${\bf{74.5}}$    ${\bf{0.730}}$

    Table 6  The classification rate and F1-score of uni-modal and bi-modal emotion recognition on the DEAP dataset

    Input                       Arousal CR (%)   Arousal F1-score   Valence CR (%)   Valence F1-score
    Facial video                67.1             0.653              66.3             0.650
    EEG signal                  84.7             0.815              83.4             0.819
    Facial video + EEG signal   ${\bf{85.8}}$    ${\bf{0.837}}$     ${\bf{84.3}}$    ${\bf{0.831}}$
  • [1] Bynion T M, Feldner M T. Self-Assessment Manikin. Berlin: Springer International Publishing, 2017. 1-3
    [2] Lin J C, Wu C H, Wei W L. Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition. IEEE Transactions on Multimedia, 2012, 14(1): 142-156 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=267b730d55360c483c4723906b231f35
    [3] Jiang D, Cui Y, Zhang X, Fan P, Gonzalez I, Sahli H. Audio visual emotion recognition based on triple-stream dynamic Bayesian network models. In: Proceedings of the 2011 International Conference on Affective Computing and Intelligent Interaction. Berlin, GER: Springer-Verlag, 2011. 609-618
    [4] Xie Z, Guan L. Multimodal information fusion of audio emotion recognition based on kernel entropy component analysis. International Journal of Semantic Computing, 2013, 7(1): 25-42 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=10.1142_S1793351X13400023
    [5] Khorrami P, Le Paine T, Brady K. How deep neural networks can improve emotion recognition on video data. In: Proceedings of the 2016 IEEE International Conference on Image Processing. New York, USA: IEEE, 2016. 619-623
    [6] Liu J, Su Y, Liu Y. Multi-modal emotion recognition with temporal-band attention based on LSTM-RNN. In: Proceedings of the 2017 Pacific Rim Conference on Multimedia. Berlin, GER: Springer, 2017. 194-204
    [7] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 2012 Annual Conference on Neural Information Processing Systems. Massachusetts, USA: MIT Press, 2012. 1097-1105
    [8] Sak H, Senior A, Beaufays F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv preprint arXiv: 1402.1128, 2014.
    [9] He L, Jiang D, Yang L, Pei E, Wu P, Sahli H. Multimodal affective dimension prediction using deep bidirectional long short-term memory recurrent neural networks. In: Proceedings of the 2015 International Workshop on Audio/visual Emotion Challenge. New York, USA: ACM, 2015. 73-80
    [10] Koelstra S, Patras I. Fusion of facial expressions and EEG for implicit affective tagging. Image and Vision Computing, 2013, 31(2): 164-174 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=04f03f0fd646221b3872240005017d1c
    [11] Huang X, Kortelainen J, Zhao G, Li X, Moilanen A, Seppanen T, Pietikainen M. Multi-modal emotion analysis from facial expressions and electroencephalogram. Computer Vision and Image Understanding, 2016, 147: 114-124 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=c0dd5236bcdae70bcbb065ddb2279f4a
    [12] Zhalehpour S, Akhtar Z, Erdem C E. Multimodal emotion recognition with automatic peak frame selection. In: Proceedings of the 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications. New York, USA: IEEE, 2014. 116-121
    [13] Xu K, Ba J L, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R S, Bengio Y. Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of the 2015 International Conference on Machine Learning. New York, USA: ACM, 2015. 2048-2057
    [14] Liu Chang, Liu Qin-Rang. Using reinforce learning to train multi attention model. Acta Automatica Sinica, 2017, 43(9): 1563-1570 (in Chinese) doi: 10.16383/j.aas.2017.c160643
    [15] Soleymani M, Lichtenauer J, Pun T, Pantic M. A multi-modal affective database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 2012, 3(1): 42-55 http://dl.acm.org/citation.cfm?id=2197062
    [16] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proceedings of the 2015 Advances in Neural Information Processing Systems. Massachusetts, USA: MIT Press, 2015. 91-99
    [17] Mowla M R, Ng S C, Zilany M S A, Paramesran R. Artifacts-matched blind source separation and wavelet transform for multichannel EEG denoising. Biomedical Signal Processing and Control, 2015, 22(3): 111-118 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=facdf66f1c48f19f20fba5c0f305d929
    [18] Bashivan P, Rish I, Yeasin M, Codella N. Learning representations from EEG with deep recurrent-convolutional neural networks. In: Proceedings of the 2016 International Conference on Learning Representation. San Juan, Puerto Rico: ICLR, 2016.
    [19] Anzai Y. Pattern Recognition and Machine Learning. Elsevier, 2012.
    [20] Lei T, Barzilay R, Jaakkola T. Rationalizing neural predictions. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. British Columbia, Canada: ACL, 2016. 107-117
    [21] Yu A W, Lee H, Le Q V. Learning to skim text. arXiv preprint arXiv: 1704.06877, 2017.
    [22] Rubinstein R Y, Kroese D P. Simulation and the Monte Carlo Method. John Wiley & Sons, 2008. 167-168 http://dl.acm.org/citation.cfm?id=539488
    [23] Koelstra S, Muhl C, Soleymani M, Lee S, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I. DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 2012, 3(1): 18-31 http://ieeexplore.ieee.org/document/5871728/
    [24] Kingma D P, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv: 1412.6980, 2014.
    [25] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556, 2014.
Publication history
  • Received:  2018-02-26
  • Accepted:  2018-10-06
  • Published:  2020-10-29
