
Domestic Activity Recognition Based on Attention Capsule Network

WANG Jin-Jia, JI Shao-Nan, CUI Lin, XIA Jing, YANG Qian

Citation: WANG Jin-Jia, JI Shao-Nan, CUI Lin, XIA Jing, YANG Qian. Domestic Activity Recognition Based on Attention Capsule Network. ACTA AUTOMATICA SINICA, 2019, 45(11): 2199-2204. doi: 10.16383/j.aas.c180721


doi: 10.16383/j.aas.c180721

Funds: 

The First Batch of "Top Young Talents in Hebei Province" [2013]17

National Natural Science Foundation of China 61473339

Basic Research Cooperation Projects of Beijing, Tianjin and Hebei F2019203583

More Information
    Author Bio:

    JI Shao-Nan  Master student at the School of Information Science and Engineering, Yanshan University. His research interest covers signal and information processing. E-mail: jsn1533915375@163.com

    CUI Lin  Master student at the School of Information Science and Engineering, Yanshan University. Her research interest is information processing. E-mail: 15733598690@163.com

    XIA Jing  Master student at the School of Information Science and Engineering, Yanshan University. Her research interest is signal processing. E-mail: xiajing_527@sina.com

    YANG Qian  Master student at the School of Information Science and Engineering, Yanshan University. Her research interest is pattern recognition. E-mail: yqlhp@sina.cn

    Corresponding author: WANG Jin-Jia  Professor at the School of Information Science and Engineering, Yanshan University. His research interest covers signal processing and pattern recognition. Corresponding author of this paper. E-mail: wjj@ysu.edu.cn
  • Abstract: This paper proposes a new attention capsule network framework that recognizes domestic activities from audio recordings. The capsule network selects the representative frequency bands of each sound event through a dynamic routing algorithm. To strengthen it further, we add an attention mechanism to the capsule network, which weights the time frames so that the important ones receive more attention. To evaluate our method, we test it on the dataset of Task 5 of the DCASE (Detection and Classification of Acoustic Scenes and Events) 2018 Challenge. The results show an average F1 score of 92.1 %, which outperforms the F1 scores of several baseline methods.
    Recommended by Associate Editor WU Jian-Xin
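
    The two mechanisms named in the abstract can be sketched compactly in code. The first sketch below is a minimal PyTorch rendering of routing-by-agreement as introduced by Sabour et al.; the tensor shapes, the three routing iterations, and all names are illustrative assumptions, not the paper's verified configuration.

        import torch
        import torch.nn.functional as F

        def squash(s, dim=-1, eps=1e-8):
            # Squashing nonlinearity: keeps the vector's direction,
            # maps its length into [0, 1).
            sq = (s ** 2).sum(dim=dim, keepdim=True)
            return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

        def dynamic_routing(u_hat, n_iter=3):
            # u_hat: prediction vectors, shape (batch, n_in, n_out, dim_out)
            b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
            for _ in range(n_iter):
                c = F.softmax(b, dim=2)                    # coupling coefficients
                s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # weighted sum over input capsules
                v = squash(s)                              # output capsules (batch, n_out, dim_out)
                b = b + (u_hat * v.unsqueeze(1)).sum(-1)   # reward agreeing predictions
            return v

    The length of each output capsule vector then serves as a class activation, which is how routing can emphasize the frequency bands most representative of each sound event. The second sketch shows one plausible realization of the attention weighting of time frames that the abstract describes: a learned scalar score per frame, normalized over time and used for pooling. This is an assumed form, not the paper's exact layer.

        import torch
        import torch.nn as nn

        class AttentionPooling(nn.Module):
            # Pools frame-level features (batch, time, dim)
            # into a clip-level feature (batch, dim).
            def __init__(self, dim):
                super().__init__()
                self.score = nn.Linear(dim, 1)   # scalar relevance score per frame

            def forward(self, x):
                w = torch.softmax(self.score(x), dim=1)  # weights sum to 1 over time
                return (w * x).sum(dim=1)                # important frames dominate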
  • Fig. 1  Conceptual diagram of capsule routing

    Fig. 2  Attention capsule network model

    Fig. 3  2D floor plan of the combined kitchen and living room with the sensor nodes used

    Fig. 4  Log-mel spectrograms of the various activities
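
    The log-mel features shown in Fig. 4 can be computed along these lines. A minimal sketch assuming librosa; the sample rate, number of mel bands, FFT size, and hop length are placeholder values, since this excerpt does not state the paper's settings.

        import librosa

        def logmel(path, sr=16000, n_mels=40, n_fft=1024, hop=512):
            # Log-mel spectrogram of one audio segment; parameter values are placeholders.
            y, _ = librosa.load(path, sr=sr, mono=True)
            mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                                 hop_length=hop, n_mels=n_mels)
            return librosa.power_to_db(mel)  # shape: (n_mels, n_frames)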

    Table 1  Numbers of audio segments in the development and evaluation sets

    Activity           Development set   Evaluation set
    Absence            18 860            21 112
    Cooking             5 124             4 221
    Dishwashing         1 424             1 477
    Eating              2 308             2 100
    Other               2 060             1 960
    Social activity     4 944             3 815
    Vacuum cleaning       972               868
    Watching TV        18 648            21 116
    Working            18 644            16 302
    Total              72 984            71 971

    Table 2  F1 scores (%) of each model on the development set

    Activity          Baseline   GCRNN   GCRNN-att   Caps    Caps-att
    Absence               85.4    85.8        86.9    87.5       91.3
    Cooking               95.1    93.7        96.9    93.8       95.8
    Dishwashing           76.7    78.3        81.1    67.3       82.7
    Eating                83.6    83.3        87.8    82.8       90.5
    Other                 44.8    39.1        41.5    38.0       55.4
    Social activity       93.9    84.7        98.8    89.8       96.8
    Vacuum cleaning       99.3    99.9       100.0    99.5       99.6
    Watching TV           99.6    98.7        99.8   100.0       99.9
    Working               82.0    84.1        84.4    84.3       87.6
    Average               84.5    86.9        87.8    87.3       92.1
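
    DCASE 2018 Task 5 ranks systems by the F1 score macro-averaged over the nine activity classes. A minimal scikit-learn sketch with toy labels; the integer encoding 0-8 for the nine activities is an assumption.

        from sklearn.metrics import f1_score

        y_true = [0, 1, 1, 2, 8, 8]  # toy ground-truth segment labels
        y_pred = [0, 1, 2, 2, 8, 7]  # toy predictions
        print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean of per-class F1

    Note that the "Average" row above is the authors' reported score, so it need not equal a simple average of the printed per-class columns.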

    Table 3  F1 scores (%) of each model on the evaluation set

    Model       F1 score
    Baseline        85.0
    GCRNN           86.5
    GCRNN-att       86.9
    Caps            86.6
    Caps-att        88.8
Publication History
  • Received:  2018-11-12
  • Accepted:  2019-04-15
  • Published:  2019-11-20
