ZHOU Feng-Yu, YIN Jian-Qin, YANG Yang, ZHANG Hai-Ting, YUAN Xian-Feng

ZHOU Feng-Yu, YIN Jian-Qin, YANG Yang, ZHANG Hai-Ting, YUAN Xian-Feng. Online Recognition of Human Actions Based on Temporal Deep Belief Neural Network. Acta Automatica Sinica, 2016, 42(7): 1030-1039. doi: 10.16383/j.aas.2016.c150629









Online Recognition of Human Actions Based on Temporal Deep Belief Neural Network


National Natural Science Foundation of China 61203341

Key Program of Natural Science Foundation of Shandong Province ZR2015QZ08

National Natural Science Foundation of China 61375084

    Professor at the School of Control Science and Engineering, Shandong University. He received his Ph.D. degree from Tianjin University in 2008. His main research interest is intelligent robot technology.

    Lecturer at the School of Information Science and Technology, Shandong University. He received his Ph.D. degree from the School of Information Science and Technology, Shandong University in 2009. His research interests cover image processing and object tracking.

    Master's student at the School of Control Science and Engineering, Shandong University. She received her bachelor's degree from Shandong University in 2011. Her research interests cover deep learning and image processing.

    Ph.D. candidate at the School of Control Science and Engineering, Shandong University. He received his bachelor's degree from Shandong University in 2011. His research interests cover machine learning and service robots.

    Corresponding author: YIN Jian-Qin. Associate professor at the School of Information Science and Technology, Jinan University. She received her Ph.D. degree from the School of Control Science and Engineering, Shandong University in 2013. Her research interests cover image processing and machine learning.
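  • Abstract: Online human action recognition is the ultimate goal of human action recognition. However, because segmenting action sequences remains an open problem, most existing methods recognize actions only in pre-segmented sequences and do not address the online case. To address this problem, this paper proposes a temporal deep belief network (TDBN) model that can perform online human action recognition. The model makes full use of the contextual information provided by the preceding and following frames of an action sequence, which overcomes the limitation that existing deep belief network models can only recognize static images. It greatly improves recognition accuracy, and because it requires no manual segmentation of the action sequence, recognition can start at any moment while an action is in progress. This achieves online action recognition in the true sense and lays a sound theoretical foundation for practical applications.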
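    The idea of recognizing from any point in an unsegmented stream can be illustrated with a sliding window over past frames. The following is a minimal sketch, not the authors' implementation: `online_labels`, `toy_classify`, the `order` parameter, and the use of preceding frames only are all assumptions made for illustration, and the toy classifier merely stands in for a trained model.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence, Tuple

Frame = Sequence[float]  # one skeleton frame, e.g. flattened joint coordinates


def online_labels(
    frames: Iterable[Frame],
    classify: Callable[[Frame, List[Frame]], str],
    order: int = 3,
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, label) once `order` past frames are available.

    `classify` is a hypothetical stand-in for a trained model; it receives
    the current frame plus the `order` frames that precede it.
    """
    history: Deque[Frame] = deque(maxlen=order)  # keeps only the newest frames
    for index, frame in enumerate(frames):
        if len(history) == order:
            yield index, classify(frame, list(history))
        history.append(frame)


def toy_classify(current: Frame, past: List[Frame]) -> str:
    """Toy rule (assumption): compare the first coordinate with its past mean."""
    past_mean = sum(f[0] for f in past) / len(past)
    return "rising" if current[0] > past_mean else "not rising"


stream = [[0.0], [1.0], [2.0], [3.0], [2.0], [1.0]]
# Start from an arbitrary point in the stream; no segmentation step is needed.
results = list(online_labels(stream[1:], toy_classify, order=2))
print(results)
```

    Because the window holds only the most recent `order` frames, the loop never needs to know where an action begins or ends; it emits a label as soon as enough context has accumulated.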
  • Fig. 1  The structure of the conditional restricted Boltzmann machine (CRBM)

    Fig. 2  The structure of the temporal deep belief network

    Fig. 3  Illustration of the skeleton joints in the MIT dataset

    Fig. 4  Flowchart of the CRBM learning process

    Fig. 5  Flowchart of the global weight fine-tuning

    Fig. 6  Recognition results on the MIT dataset

    Fig. 7  Confusion matrix for the MIT dataset

    Fig. 8  Illustration of the weight distribution of the CRBM

    Fig. 9  Illustration of the actions in the MSR Action 3D dataset

    Fig. 10  Illustration of the skeleton joints in the MSR Action 3D dataset

    Fig. 11  Confusion matrix for $AS1_{2}$ of the MSR Action 3D dataset

    Table 1  Recognition results on whole sequences in Test 1 and Test 2 (%)

    Method            $AS1_{1}$  $AS2_{1}$  $AS3_{1}$  $AS1_{2}$  $AS2_{2}$  $AS3_{2}$
    Ours-CRBM         92.23      89.46      92.05      95.62      93.42      95.67
    Ours-TDBN         96.67      92.81      96.68      99.33      97.44      99.87
    Li et al.[2]      89.5       89.0       96.3       93.4       92.9       96.3
    Xia et al.[19]    98.47      96.67      93.47      98.61      97.92      94.93
    Yang et al.[3]    97.3       92.2       98.0       98.7       94.7       98.7

    Table 2  Comparison between our method and other methods in Test 3 (%)

    Method                    $AS1_{1}$  $AS2_{1}$  $AS3_{1}$  Average
    Li et al.[2]              72.9       71.9       79.2       74.7
    Chen et al.[21]           96.2       83.2       92.0       90.47
    Gowayyed et al.[22]       92.39      90.18      91.43      91.26
    Vemulapalli et al.[23]    95.29      83.87      98.22      92.46
    Du et al.[13]             93.33      94.64      95.50      94.49
    TDBN                      97.01      94.22      98.34      96.52

    Table 3  Recognition results using the first 5 frames (%)

    Method            $AS1_{1}$  $AS2_{1}$  $AS3_{1}$  $AS1_{2}$  $AS2_{2}$  $AS3_{2}$
    Ours              79.84      79.35      82.93      90.78      92.76      94.66
    Yang et al.[3]    67±1       67±1       74±1       77±1       75±1       82±1

    Table 4  All recognition results (%)

    Subset       1 frame  5 frames  Whole action
    $AS1_{1}$    77.55    79.84     96.67
    $AS2_{1}$    78.01    79.35     92.81
    $AS3_{1}$    81.60    82.93     96.68
    Average      79.05    80.71     95.39
    $AS1_{2}$    89.74    90.78     99.33
    $AS2_{2}$    90.78    92.76     97.44
    $AS3_{2}$    93.00    94.66     99.87
    Average      91.17    92.73     98.88

    Table 5  Recognition time with different orders n (ms)

    Action                 n=0    n=1    n=2    n=3    n=4    n=5    n=6
    Horizontal arm wave    4.45   9.78   12.56  14.61  17.21  19.45  23.51
    Hammer                 3.67   9.89   11.89  14.39  17.12  19.78  22.13
    Forward punch          3.79   10.03  12.54  14.48  17.49  20.01  22.56
    High throw             3.96   9.92   12.48  14.68  17.73  19.21  22.67
    Hand clap              4.13   9.99   12.49  14.63  17.62  19.84  22.78
    Bend                   4.78   9.79   12.34  14.61  17.94  19.47  21.87
    Tennis serve           4.56   9.67   12.52  14.65  17.56  19.49  22.46
    Pickup and throw       3.71   9.97   12.67  14.51  17.83  19.92  22.81

    Table 6  Recognition rates with different orders n (%)

    Action                 n=0    n=1    n=2    n=3    n=4    n=5    n=6
    Horizontal arm wave    81.56  85.39  89.12  90.03  91.45  89.97  87.68
    Hammer                 82.56  86.48  85.34  87.50  86.84  88.10  87.98
    Forward punch          73.67  76.45  78.78  79.19  77.87  78.16  78.45
    High throw             72.78  73.46  76.98  79.92  79.23  78.89  76.75
    Hand clap              87.65  93.78  98.34  98.65  97.85  96.12  96.23
    Bend                   80.13  81.35  84.56  86.43  86.72  85.97  83.85
    Tennis serve           88.74  91.67  92.89  93.67  93.35  92.89  92.54
    Pickup and throw       83.81  86.34  86.94  88.34  87.13  87.67  97.56
    Average                81.36  84.37  86.62  87.97  87.56  87.22  87.63
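
    The Average row of the recognition-rate table can be recomputed from the per-action rows, which also shows which order gives the highest mean rate. The snippet below is a small arithmetic sketch; the variable names are chosen here for illustration, and the numbers are copied from the table as published.

```python
# Recognition rates (%) per action; list positions correspond to orders n = 0..6.
rates = {
    "Horizontal arm wave": [81.56, 85.39, 89.12, 90.03, 91.45, 89.97, 87.68],
    "Hammer": [82.56, 86.48, 85.34, 87.50, 86.84, 88.10, 87.98],
    "Forward punch": [73.67, 76.45, 78.78, 79.19, 77.87, 78.16, 78.45],
    "High throw": [72.78, 73.46, 76.98, 79.92, 79.23, 78.89, 76.75],
    "Hand clap": [87.65, 93.78, 98.34, 98.65, 97.85, 96.12, 96.23],
    "Bend": [80.13, 81.35, 84.56, 86.43, 86.72, 85.97, 83.85],
    "Tennis serve": [88.74, 91.67, 92.89, 93.67, 93.35, 92.89, 92.54],
    "Pickup and throw": [83.81, 86.34, 86.94, 88.34, 87.13, 87.67, 97.56],
}

num_orders = 7

# Mean rate over all actions for each order n.
averages = [
    sum(row[n] for row in rates.values()) / len(rates)
    for n in range(num_orders)
]

# Order with the highest mean recognition rate.
best_order = max(range(num_orders), key=lambda n: averages[n])

for n, avg in enumerate(averages):
    print(f"n={n}: {avg:.2f}")
print("best order:", best_order)
```

    The recomputed means agree with the published Average row to within 0.01, and the highest mean falls at n = 3. Read together with the timing table, this suggests that orders above 3 add recognition time without improving the average rate.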
