Abstract: Vision-based human action quality assessment uses computer vision techniques to automatically analyze how well an individual performs a movement and to provide a corresponding quality evaluation. It has become an active research topic at the intersection of sports science and artificial intelligence, with theoretical significance and practical value in competitive sports, athlete selection, fitness training, and rehabilitation. This article reviews the technologies involved from three aspects: data acquisition and annotation, motion feature representation, and motion quality assessment. It categorizes the relevant methods and compares their performance on three datasets: AQA-7, JIGSAWS, and EPIC-Skills 2018. Finally, potential future research directions are discussed.
Key words:
- Motion quality assessment
- computer vision
- data acquisition
- feature representation
- loss function
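Table 1 Main tasks and open problems in the different stages of vision-based action quality assessment

| Stage | Main task | Open problems |
| --- | --- | --- |
| Motion data acquisition | Collect and record motion-related data (RGB, depth maps, skeleton sequences) with vision sensors | How should the data modality be chosen for a given application scenario? How can the quality of expert scores be ensured? |
| Motion feature representation | Combine static image information and human motion information to design discriminative feature vectors that describe the motion process | How can strongly discriminative motion features be learned from the characteristics of the assessment task itself, so that the subtle differences between performers executing the same action are extracted and represented effectively? |
| Motion quality assessment | Design a feature mapping that associates the extracted features with the corresponding score, grade, or ranking target | How should the loss function account for annotation uncertainty (such as score differences among experts) and for score differences between instances of the same action? |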
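The last question in Table 1, how a loss can reflect annotation uncertainty, can be illustrated with a soft label. The sketch below is a minimal illustration in the spirit of score distribution learning (the idea behind USDL[12]), not the implementation of any cited method: a scalar judge score is spread into a Gaussian-shaped distribution over score bins, and a predicted distribution is compared against it with KL divergence. The bin grid, `sigma`, and all function names are assumptions made for this example.

```python
import math


def gaussian_soft_label(score, bins, sigma):
    """Spread one scalar score into a normalized distribution over score bins."""
    weights = [math.exp(-0.5 * ((b - score) / sigma) ** 2) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions on the same bins."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# Hypothetical score grid 0.0, 0.5, ..., 10.0 and a hypothetical judge score of 7.5.
bins = [i * 0.5 for i in range(21)]
target = gaussian_soft_label(7.5, bins, sigma=0.5)

# An uninformed prediction (uniform over bins) is penalized,
# while a prediction equal to the soft label is not.
uniform = [1.0 / len(bins)] * len(bins)
loss_vs_uniform = kl_divergence(target, uniform)
loss_vs_self = kl_divergence(target, target)
```

A larger `sigma` encodes less trust in the single annotated score; how it should be set for a given dataset is left open here.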
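Table 2 Overview of mainstream action quality assessment datasets

| Dataset | Action classes | Samples (subjects) | Annotation type | Application scenario | Data modality | Year |
| --- | --- | --- | --- | --- | --- | --- |
| Heian Shodan[25] | 1 | 14 | Grade | Fitness training | 3D skeleton | 2003 |
| FINA09 Dive[26] | 1 | 68 | Score | Sports events | RGB video | 2010 |
| MIT-Dive[8] | 1 | 159 | Score, feedback | Sports events | RGB video | 2014 |
| MIT-Skate[8] | 1 | 150 | Score | Sports events | RGB video | 2014 |
| SPHERE-Staircase2014[10] | 1 | 48 | Grade | Rehabilitation | 3D skeleton | 2014 |
| JIGSAWS[9] | 3 | 103 | Grade | Skill training | RGB video, kinematic data | 2014 |
| SPHERE-Walking2015[16] | 1 | 40 | Grade | Rehabilitation | 3D skeleton | 2016 |
| SPHERE-SitStand2015[16] | 1 | 109 | Grade | Rehabilitation | 3D skeleton | 2016 |
| LAM Exercise Dataset[23] | 5 | 125 | Grade | Rehabilitation | 3D skeleton | 2016 |
| First-Person Basketball[27] | 1 | 48 | Ranking | Fitness training | RGB video | 2016 |
| UNLV-Dive[28] | 1 | 370 | Score | Sports events | RGB video | 2017 |
| UNLV-Vault[28] | 1 | 176 | Score | Sports events | RGB video | 2017 |
| UI-PRMD[20] | 10 | 100 | Grade | Rehabilitation | 3D skeleton | 2018 |
| EPIC-Skills 2018[24] | 4 | 216 | Ranking | Skill training | RGB video | 2018 |
| Infant Grasp[29] | 1 | 94 | Ranking | Skill training | RGB video | 2019 |
| AQA-7[30] | 7 | 1189 | Score | Sports events | RGB video | 2019 |
| MTL-AQA[31] | 1 | 1412 | Score | Sports events | RGB video | 2019 |
| FSD-10[32] | 10 | 1484 | Score | Sports events | RGB video | 2019 |
| Fis-V[33] | 1 | 500 | Score | Sports events | RGB video | 2019 |
| BEST 2019[32] | 5 | 500 | Ranking | Skill training | RGB video | 2019 |
| KIMORE[22] | 5 | 78 | Score | Rehabilitation | RGB, depth video, 3D skeleton | 2019 |
| TASD-2 (SyncDiving-3m)[34] | 1 | 238 | Score | Sports events | RGB video | 2020 |
| TASD-2 (SyncDiving-10m)[34] | 1 | 368 | Score | Sports events | RGB video | 2020 |
| RG[35] | 4 | 1000 | Score | Sports events | RGB video | 2020 |
| QMAR[36] | 6 | 38 | Grade | Rehabilitation | RGB video | 2020 |
| PISA[37] | 1 | 992 | Grade | Skill training | RGB video, audio | 2021 |
| FR-FS[38] | 1 | 417 | Score | Sports events | RGB video | 2021 |
| SMART[39] | 8 | 640 | Score | Sports events, fitness training | RGB video | 2021 |
| Fitness-AQA[40] | 3 | 1000 | Feedback | Fitness training | RGB video | 2022 |
| Finediving[41] | 1 | 3000 | Score | Sports events | RGB video | 2022 |
| LOGO[42] | 1 | 200 | Score | Sports events | RGB video | 2022 |
| RFSJ[43] | 23 | 1304 | Score | Sports events | RGB video | 2023 |
| FineFS[44] | 2 | 1167 | Score | Sports events | RGB video, skeleton data | 2023 |
| AGF-Olympics[45] | 1 | 500 | Score | Sports events | RGB video, skeleton data | 2024 |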
Table 3 Advantages and disadvantages of the two types of motion feature representation methods
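Table 4 Advantages and disadvantages of RGB-based deep motion feature methods

| Method category | Advantages | Disadvantages |
| --- | --- | --- |
| CNN-based action feature representation methods[12, 24, 28, 30−33, 48, 54, 59] | Simple and easy to implement | Cannot fully capture the complexity of action features |
| Siamese-network-based action feature representation learning methods[24, 62−64] | Well suited to modeling subtle differences between actions | Higher computational complexity; requires constructing effective sample pairs |
| Temporal-segmentation-based action feature representation learning methods[44, 48, 59, 65−68] | Reduces noise interference; better captures action details and changes | Requires additional segmentation annotations; inaccurate segment boundaries strongly affect performance |
| Attention-based action feature representation learning methods[29, 32−35, 38, 41, 43−44, 68−72] | Good adaptivity; strong ability to capture important features; relatively good interpretability | High computational complexity and large memory consumption |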
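Table 4 notes that Siamese-style methods depend on constructing effective sample pairs. The following is a minimal sketch of that idea under stated assumptions: pairs are formed from samples whose annotated scores differ, one shared scoring function is applied to both members of a pair, and a margin ranking loss penalizes pairs ordered incorrectly. The toy linear scorer, the feature vectors, the `min_gap` filter, and the margin are all hypothetical and stand in for a learned network; none of this is taken from a specific cited method.

```python
from itertools import combinations


def make_pairs(samples, min_gap=0.0):
    """Return (better, worse) index pairs from (feature, score) samples.

    Pairs whose score gap is at most min_gap are skipped as uninformative.
    """
    pairs = []
    for i, j in combinations(range(len(samples)), 2):
        gap = samples[i][1] - samples[j][1]
        if abs(gap) <= min_gap:
            continue
        pairs.append((i, j) if gap > 0 else (j, i))
    return pairs


def margin_ranking_loss(pred_better, pred_worse, margin=1.0):
    """Zero when the better sample outscores the worse one by at least margin."""
    return max(0.0, margin - (pred_better - pred_worse))


def score(feature, weights=(1.0, -1.0)):
    """Toy shared scorer (both Siamese branches use the same weights)."""
    return sum(w * x for w, x in zip(weights, feature))


# Hypothetical (feature, annotated score) samples.
samples = [
    ([0.9, 0.1], 8.5),
    ([0.4, 0.3], 6.0),
    ([0.5, 0.2], 6.0),
    ([0.1, 0.8], 3.0),
]
pairs = make_pairs(samples)
total_loss = sum(
    margin_ranking_loss(score(samples[b][0]), score(samples[w][0]), margin=0.5)
    for b, w in pairs
)
```

The two samples with equal scores form no pair, which is one simple way to avoid training on pairs that carry no ordering information.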
Table 5 Advantages and disadvantages of skeleton-based deep motion feature methods
Table 6 Performance comparison of different methods on the sports scoring dataset AQA-7

| Method | Diving | Gym Vault | Skiing | Snowboard | Sync. 3m | Sync. 10m | AQA-7 | Traditional/Deep | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pose+DCT+SVR[8] | 0.53 | 0.10 | - | - | - | - | - | Traditional | 2014 |
| C3D+SVR[28] | 0.7902 | 0.6824 | 0.5209 | 0.4006 | 0.5937 | 0.9120 | 0.6937 | Deep | 2017 |
| C3D+LSTM[28] | 0.6047 | 0.5636 | 0.4593 | 0.5029 | 0.7912 | 0.6927 | 0.6165 | Deep | 2017 |
| All-action C3D+LSTM[30] | 0.6177 | 0.6746 | 0.4955 | 0.3648 | 0.8410 | 0.7343 | 0.6478 | Deep | 2017 |
| Li et al.[11] | 0.8009 | 0.7028 | - | - | - | - | - | Deep | 2018 |
| S3D[59] | - | 0.8600 | - | - | - | - | - | Deep | 2018 |
| C3D-AVG-MTL[30] | 0.8808 | - | - | - | - | - | - | Deep | 2019 |
| JRG[49] | 0.7630 | 0.7358 | 0.6006 | 0.5405 | 0.9013 | 0.9254 | 0.7849 | Deep | 2019 |
| USDL[12] | 0.8099 | 0.7570 | 0.6538 | 0.7109 | 0.9166 | 0.8878 | 0.8102 | Deep | 2020 |
| DML[62] | 0.6900 | 0.4400 | - | - | - | - | - | Deep | 2020 |
| AIM[36] | 0.7419 | 0.7296 | 0.5890 | 0.4960 | 0.9298 | 0.9043 | 0.7789 | Deep | 2020 |
| CoRe[63] | 0.8824 | 0.7746 | 0.7115 | 0.6624 | 0.9442 | 0.9078 | 0.8401 | Deep | 2021 |
| Lei et al.[69] | 0.8649 | 0.7858 | - | - | - | - | - | Deep | 2021 |
| EAGLE-EYE[99] | 0.8331 | 0.7411 | 0.6635 | 0.6447 | 0.9143 | 0.9158 | 0.8140 | Deep | 2021 |
| TSA-Net[38] | 0.8379 | 0.8004 | 0.6657 | 0.6962 | 0.9493 | 0.9334 | 0.8476 | Deep | 2021 |
| Adaptive[98] | 0.8306 | 0.7593 | 0.7208 | 0.6940 | 0.9588 | 0.9298 | 0.8500 | Deep | 2021 |
| PCLN[64] | 0.8697 | 0.8759 | 0.7754 | 0.5778 | 0.9629 | 0.9541 | 0.8795 | Deep | 2022 |
| TPT[70] | 0.8969 | 0.8043 | 0.7336 | 0.6965 | 0.9456 | 0.9545 | 0.8715 | Deep | 2022 |
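The AQA-7 column of Table 6 aggregates the six per-class values into one number. The sketch below shows how such values could be computed, under two assumptions that the table itself does not state: that each per-class value is a Spearman rank correlation between predicted and annotated scores (the metric Table 7 labels SRC), and that the aggregate is a Fisher z average rather than a plain arithmetic mean. The USDL[12] row is consistent with the second assumption. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, annotated):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(annotated))


def fisher_mean(correlations):
    """Average correlations in Fisher z space, then map back with tanh."""
    z = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z) / len(z))


# Per-class values of the USDL[12] row in Table 6.
usdl_per_class = [0.8099, 0.7570, 0.6538, 0.7109, 0.9166, 0.8878]
usdl_average = fisher_mean(usdl_per_class)  # close to the 0.8102 listed in Table 6
```

Averaging in z space weights high correlations more strongly than an arithmetic mean does, so the two aggregates should not be mixed when comparing rows.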
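Table 7 Performance comparison of different methods on JIGSAWS

| Method | Data modality | Evaluation method | Skill-level classes | Cross-validation | Metric | SU | KT | NP | Year |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| k-NN[111] | Motion features | GRS | 2 classes | LOSO | Accuracy | 0.897 | - | 0.821 | 2018 |
| | | | | LOUO | Accuracy | 0.719 | - | 0.729 | 2018 |
| LR[111] | Motion features | GRS | 2 classes | LOSO | Accuracy | 0.899 | - | 0.823 | 2018 |
| | | | | LOUO | Accuracy | 0.744 | - | 0.702 | 2018 |
| SVM[111] | Motion features | GRS | 2 classes | LOSO | Accuracy | 0.754 | - | 0.754 | 2018 |
| | | | | LOUO | Accuracy | 0.798 | - | 0.779 | 2018 |
| SMT[112] | Motion features | Self-proclaimed | 3 classes | LOSO | Accuracy | 0.990 | 0.996 | 0.999 | 2018 |
| | | | | LOUO | Accuracy | 0.353 | 0.323 | 0.571 | 2018 |
| DCT[112] | Motion features | Self-proclaimed | 3 classes | LOSO | Accuracy | 1.00 | 0.997 | 0.999 | 2018 |
| | | | | LOUO | Accuracy | 0.647 | 0.548 | 0.357 | 2018 |
| DFT[112] | Motion features | Self-proclaimed | 3 classes | LOSO | Accuracy | 1.00 | 0.999 | 0.999 | 2018 |
| | | | | LOUO | Accuracy | 0.647 | 0.516 | 0.464 | 2018 |
| ApEn[112] | Motion features | Self-proclaimed | 3 classes | LOSO | Accuracy | 1.00 | 0.999 | 1.00 | 2018 |
| | | | | LOUO | Accuracy | 0.882 | 0.774 | 0.857 | 2018 |
| CNN[103] | Motion features | Self-proclaimed | 3 classes | LOSO | Accuracy | 0.934 | 0.898 | 0.849 | 2018 |
| CNN[103] | Motion features | GRS | 3 classes | LOSO | Accuracy | 0.925 | 0.954 | 0.913 | 2018 |
| CNN[106] | Motion features | Self-proclaimed | 3 classes | LOSO | Micro F1 | 1.00 | 0.921 | 1.00 | 2018 |
| | | | | | Macro F1 | 1.00 | 0.932 | 1.00 | 2018 |
| Forestier et al.[113] | Motion features | GRS | 3 classes | LOSO | Micro F1 | 0.897 | 0.611 | 0.963 | 2018 |
| | | | | | Macro F1 | 0.867 | 0.533 | 0.958 | 2018 |
| S3D[59] | Video data | GRS | 3 classes | LOSO | SRC | 0.68 | 0.64 | 0.57 | 2018 |
| | | | | LOUO | SRC | 0.03 | 0.14 | 0.35 | 2018 |
| FCN[100] | Motion features | Self-proclaimed | 3 classes | LOSO | Micro F1 | 1.00 | 0.921 | 1.00 | 2019 |
| | | | | | Macro F1 | 1.00 | 0.932 | 1.00 | 2019 |
| 3D ConvNet (RGB)[104] | Video data | Self-proclaimed | 3 classes | LOSO | Accuracy | 1.00 | 0.958 | 0.964 | 2019 |
| 3D ConvNet (OF)[104] | Video data | Self-proclaimed | 3 classes | LOSO | Accuracy | 1.00 | 0.951 | 1.00 | 2019 |
| JRG[49] | Video data | GRS | 3 classes | LOUO | SRC | 0.35 | 0.19 | 0.67 | 2019 |
| USDL[12] | Video data | GRS | 3 classes | 4-fold cross validation | SRC | 0.71 | 0.71 | 0.69 | 2020 |
| AIM[34] | Video data, motion features | GRS | 3 classes | LOUO | SRC | 0.45 | 0.61 | 0.34 | 2020 |
| MTL-VF (ResNet)[114] | Video data | GRS | 3 classes | LOSO | SRC | 0.79 | 0.63 | 0.73 | 2020 |
| | | | | LOUO | SRC | 0.68 | 0.72 | 0.48 | 2020 |
| MTL-VF (C3D)[114] | Video data | GRS | 3 classes | LOSO | SRC | 0.77 | 0.89 | 0.75 | 2020 |
| | | | | LOUO | SRC | 0.69 | 0.83 | 0.86 | 2020 |
| CoRe[63] | Video data | GRS | 3 classes | 4-fold cross validation | SRC | 0.84 | 0.86 | 0.86 | 2021 |
| VTPE[107] | Video data, motion features | GRS | 3 classes | LOUO | SRC | 0.45 | 0.59 | 0.65 | 2021 |
| | | | | 4-fold cross validation | SRC | 0.83 | 0.82 | 0.76 | 2021 |
| ViSA[108] | Video data | GRS | 3 classes | LOSO | SRC | 0.84 | 0.92 | 0.93 | 2022 |
| | | | | LOUO | SRC | 0.72 | 0.76 | 0.90 | 2022 |
| | | | | 4-fold cross validation | SRC | 0.79 | 0.84 | 0.86 | 2022 |
| Gao et al.[109] | Video data, motion features | GRS | 3 classes | LOUO | SRC | 0.60 | 0.69 | 0.66 | 2023 |
| | | | | 4-fold cross validation | SRC | 0.83 | 0.95 | 0.83 | 2023 |
| Contra-Sformer[110] | Video data | GRS | 3 classes | LOSO | SRC | 0.86 | 0.89 | 0.71 | 2023 |
| | | | | LOUO | SRC | 0.65 | 0.69 | 0.71 | 2023 |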
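Table 7 reports results under LOSO and LOUO cross-validation, and the two schemes can give very different numbers for the same method (for example, SMT[112] on SU: 0.990 under LOSO versus 0.353 under LOUO). A minimal sketch of how the two splits differ, assuming the usual reading of the abbreviations: leave-one-supertrial-out holds out the i-th trial of every user, while leave-one-user-out holds out every trial of one user. The user labels, the number of users and trials, and the `(user, trial)` sample layout are illustrative assumptions, not a loader for the actual dataset.

```python
def leave_one_group_out(samples, key):
    """Yield (held_out_value, train, test); test holds all samples sharing one key value."""
    for value in sorted({key(s) for s in samples}):
        test = [s for s in samples if key(s) == value]
        train = [s for s in samples if key(s) != value]
        yield value, train, test


# Hypothetical layout: 8 users, 5 trials each, one (user, trial) tuple per recording.
samples = [(user, trial) for user in "BCDEFGHI" for trial in range(1, 6)]

# LOUO: group by user, so each test fold contains one user never seen in training.
louo_folds = list(leave_one_group_out(samples, key=lambda s: s[0]))

# LOSO: group by trial index, so every test user also appears in the training fold.
loso_folds = list(leave_one_group_out(samples, key=lambda s: s[1]))
```

Because LOSO training folds contain other trials from the same users as the test fold, it measures generalization to new trials, whereas LOUO measures generalization to unseen performers.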
Table 8 Action evaluation performance of various methods on EPIC-Skills 2018
[1] Zhu Yu, Zhao Jiang-Kun, Wang Yi-Ning, Zheng Bing-Bing. A review of human action recognition based on deep learning. Acta Automatica Sinica, 2016, 42(6): 848−857
[2] Lei Q, Du J X, Zhang H B, Ye S, Chen D S. A survey of vision-based human action evaluation methods. Sensors, 2019, 19(19): 4129−4155 doi: 10.3390/s19194129
[3] Ahad M A R, Antar A D, Shahid O. Vision-based action understanding for assistive healthcare: A short review. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Long Beach, CA, USA: IEEE, 2019. 1−11
[4] Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, 2018, 2018
[5] Zheng Tai-Xiong, Huang Shuai, Li Yong-Fu, Feng Ming-Chi. Key techniques for vision based 3D reconstruction: A review. Acta Automatica Sinica, 2020, 46(4): 631−652
[6] Lin Jing-Dong, Wu Xin-Yi, Chai Yi, Yin Hong-Peng. Structure optimization of convolutional neural networks: A survey. Acta Automatica Sinica, 2020, 46(1): 24−37
[7] Zhang Chong-Sheng, Chen Jie, Li Qi-Long, Deng Bin-Quan, Wang Jie, Chen Cheng-Gong. Deep contrastive learning: A survey. Acta Automatica Sinica, 2023, 49(1): 15−39
[8] Pirsiavash H, Vondrick C, Torralba A. Assessing the quality of actions. In: Proceedings of European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 556−571
[9] Gao Y, Vedula S S, Reiley C E, Ahmidi N, Varadarajan B, Lin H C, et al. JHU-ISI gesture and skill assessment working set (JIGSAWS): A surgical activity dataset for human motion modeling. In: Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention Workshops. Boston, MA, USA: Springer, 2014. 3−12
[10] Paiement A, Tao L, Hannuna S, Camplani M, Damen D, Mirmehdi M. Online quality assessment of human movement from skeleton data. In: Proceedings of the British Machine Vision Conference. Nottingham, UK: BMVA, 2014. 153−166
[11] Li Y, Chai X, Chen X. End-to-end learning for action quality assessment. In: Proceedings of the Pacific Rim Conference on Multimedia. Hefei, China: Springer, 2018. 125−134
[12] Tang Y, Ni Z, Zhou J H, Zhang D Y, Lu J W, Wu Y, et al. Uncertainty-aware score distribution learning for action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 9839−9848
[13] Xu J, Yin S, Zhao G, Wang Z, Peng Y. FineParser: A fine-grained spatio-temporal action parser for human-centric action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2024. 14628−14637
[14] Morgulev E, Azar O H, Lidor R. Sports analytics and the big-data era. International Journal of Data Science and Analytics, 2018, 5: 213−222
[15] Butepage J, Black M J, Kragic D, Kjellström H. Deep representation learning for human motion prediction and classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 6158−6166
[16] Tao L, Paiement A, Damen D, Mirmehdi M, Hannuna S, Camplani M, et al. A comparative study of pose representation and dynamics modelling for online motion quality assessment. Computer Vision and Image Understanding, 2016, 148: 136−152 doi: 10.1016/j.cviu.2015.11.016
[17] Khalid S, Goldenberg M, Grantcharov T, Taati B, Rudzicz F. Evaluation of deep learning models for identifying surgical actions and measuring performance. JAMA Network Open, 2020, 3(3): e201664−e201664 doi: 10.1001/jamanetworkopen.2020.1664
[18] Qiu Y, Wang J, Jin Z, Chen H, Zhang M, Guo L. Pose-guided matching based on deep learning for assessing quality of action on rehabilitation training. Biomedical Signal Processing and Control, 2022, 72: 103323−103333
[19] Niewiadomski R, Kolykhalova K, Piana S, Alborno P, Volpe G, Camurri A. Analysis of movement quality in full-body physical activities. ACM Transactions on Interactive Intelligent Systems, 2019, 9(1): 1−20
[20] Vakanski A, Jun H P, Paul D, Baker R. A dataset of human body movements for physical rehabilitation exercises. Data, 2018, 3(1): 2 doi: 10.3390/data3010002
[21] Alexiadis D S, Kelly P, Daras P, O'Connor N E, Boubekeur T, Moussa M B. Evaluating a dancer's performance using Kinect-based skeleton tracking. In: Proceedings of the ACM International Conference on Multimedia. Scottsdale, AZ, USA: ACM, 2011. 659−662
[22] Capecci M, Ceravolo M G, Ferracuti F, Iarlori S, Monteriu A, Romeo L, et al. The KIMORE dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2019, 27(7): 1436−1448 doi: 10.1109/TNSRE.2019.2923060
[23] Parmar P, Morris B T. Measuring the quality of exercises. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Orlando, FL, USA: IEEE, 2016. 2241−2244
[24] Doughty H, Damen D, Mayol-Cuevas W. Who's better? Who's best? Pairwise deep ranking for skill determination. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018. 6057−6066
[25] Ilg W, Mezger J, Giese M. Estimation of skill levels in sports based on hierarchical spatio-temporal correspondences. In: Proceedings of the Joint Pattern Recognition Symposium. Springer, 2003. 523−531
[26] Wnuk K, Soatto S. Analyzing diving: A dataset for judging action quality. In: Proceedings of the Asian Conference on Computer Vision. Queenstown, New Zealand: Springer, 2010. 266−276
[27] Bertasius G, Soo Park H, Yu S X, Shi J. Am I a baller? Basketball performance assessment from first-person videos. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2177−2185
[28] Parmar P, Tran Morris B. Learning to score Olympic events. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI, USA: IEEE, 2017. 20−28
[29] Li Z, Huang Y, Cai M, Sato Y. Manipulation-skill assessment from videos with spatial attention network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Seoul, Korea: IEEE, 2019.
[30] Parmar P, Morris B. Action quality assessment across multiple actions. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Waikoloa Village, HI, USA: IEEE, 2019. 1468−1476
[31] Parmar P, Morris B T. What and how well you performed? A multitask learning approach to action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 304−313
[32] Doughty H, Mayol-Cuevas W, Damen D. The pros and cons: Rank-aware temporal attention for skill determination in long videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 7862−7871
[33] Xu C, Fu Y, Zhang B, Jiang Y G, Xue X. Learning to score figure skating sport videos. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 30(12): 4578−4590
[34] Gao J, Zheng W S, Pan J H, et al. An asymmetric modeling for action assessment. In: Proceedings of the European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 222−238
[35] Zeng L A, Hong F T, Zheng W S, et al. Hybrid dynamic-static context-aware attention network for action assessment in long videos. In: Proceedings of the ACM International Conference on Multimedia. Seattle, WA, USA: ACM, 2020. 2526−2534
[36] Sardari F, Paiement A, Hannuna S, Mirmehdi M. VI-Net: View-invariant quality of human movement assessment. Sensors, 2020, 20(18): 5258−5263 doi: 10.3390/s20185258
[37] Parmar P, Reddy J, Morris B. Piano skills assessment. In: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP). Tampere, Finland: IEEE, 2021. 1−5
[38] Wang S, Yang D, Zhai P, Chen C, Zhang L. TSA-Net: Tube self-attention network for action quality assessment. In: Proceedings of the International Conference on Multimedia. Chengdu, China: ACM, 2021. 4902−4910
[39] Chen X, Pang A, Yang W, Ma Y, Xu L, Yu J. SportsCap: Monocular 3D human motion capture and fine-grained understanding in challenging sports videos. International Journal of Computer Vision, 2021, 129: 2846−2864 doi: 10.1007/s11263-021-01486-4
[40] Parmar P, Gharat A, Rhodin H. Domain knowledge-informed self-supervised representations for workout form assessment. In: Proceedings of the European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 105−123
[41] Xu J, Rao Y, Yu X, Chen G, Zhou J, Lu J. FineDiving: A fine-grained dataset for procedure-aware action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022. 2949−2958
[42] Zhang S, Dai W, Wang S, Shen X, Lu J, Zhou J, et al. LOGO: A long-form video dataset for group action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023. 2405−2414
[43] Liu Y, Cheng X, Ikenaga T. A figure skating jumping dataset for replay-guided action quality assessment. In: Proceedings of the ACM International Conference on Multimedia. Ottawa, Canada: ACM, 2023. 2437−2445
[44] Ji Y, Ye L, Huang H, Mao L, Zhou Y, Gao L. Localization-assisted uncertainty score disentanglement network for action quality assessment. In: Proceedings of the ACM International Conference on Multimedia. Ottawa, Canada: ACM, 2023. 8590−8597
[45] Zahan S, Hassan G M, Mian A. Learning sparse temporal video mapping for action quality assessment in floor gymnastics. IEEE Transactions on Instrumentation and Measurement, 2024, 73: 1−11
[46] Ahmidi N, Tao L, Sefati S, Gao Y, Lea C, Haro B, et al. A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Transactions on Biomedical Engineering, 2017, 64(9): 2025−2041
[47] Liao Y, Vakanski A, Xian M. A deep learning framework for assessing physical rehabilitation exercises. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2020, 28(2): 468−477 doi: 10.1109/TNSRE.2020.2966249
[48] Li Y, Chai X, Chen X. ScoringNet: Learning key fragment for action quality assessment with ranking loss in skilled sports. In: Proceedings of the Asian Conference on Computer Vision. Perth, Western Australia: Springer, 2018. 149−164
[49] Pan J H, Gao J, Zheng W S. Action assessment by joint relation graphs. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019. 6331−6340
[50] Lei Q, Zhang H B, Du J X, Hsiao T, Chen C. Learning effective skeletal representations on RGB video for fine-grained human action quality assessment. Electronics, 2020, 9(4): 568−587 doi: 10.3390/electronics9040568
[51] Gordon A S. Automated video assessment of human performance. In: Proceedings of AI-ED. Washington, DC: 1995. 10−15
[52] Venkataraman V, Vlachos I, Turaga P. Dynamical regularity for action analysis. In: Proceedings of the British Machine Vision Conference. Swansea, UK: BMVA, 2015. 67−78
[53] Zia A, Sharma Y, Bettadapura V, Sarin E L, Ploetz T, Clements M, et al. Automated video-based assessment of surgical skills for training and evaluation in medical schools. International Journal of Computer Assisted Radiology and Surgery, 2016, 11: 1623−1636 doi: 10.1007/s11548-016-1468-2
[54] Parmar P. On action quality assessment. Reno, USA: The University of Nevada, 2019.
[55] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos. In: Proceedings of the Advances in Neural Information Processing Systems. Cambridge, MA, USA: ACM, 2014. 27−35
[56] Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 4489−4497
[57] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 6299−6308
[58] Qiu Z, Yao T, Mei T. Learning spatio-temporal representation with pseudo-3D residual networks. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 5533−5541
[59] Xiang X, Tian Y, Reiter A, et al. S3D: Stacking segmental P3D for action quality assessment. In: Proceedings of the IEEE International Conference on Image Processing. Athens, Greece: IEEE, 2018. 928−932
[60] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: Proceedings of the International Conference on Learning Representations. San Juan, Puerto Rico: 2016. 928−932
[61] Bromley J, Bentz J W, Bottou L, Guyon I. Signature verification using a "siamese" time delay neural network. International Journal of Pattern Recognition and Artificial Intelligence, 1993, 7(04): 669−688 doi: 10.1142/S0218001493000339
[62] Jain H, Harit G, Sharma A. Action quality assessment using siamese network-based deep metric learning. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 31(6): 2260−2273
[63] Yu X, Rao Y, Zhao W, Lu J, Zhou J. Group-aware contrastive regression for action quality assessment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021. 7919−7928
[64] Li M, Zhang H B, Lei Q, Fan Z, Liu J, Du J. Pairwise contrastive learning network for action quality assessment. In: Proceedings of the European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 457−473
[65] Dong L J, Zhang H B, Shi Q, Lei Q, Du J, Gao S. Learning and fusing multiple hidden substages for action quality assessment. Knowledge-Based Systems, 2021, 229: 107388
[66] Lea C, Flynn M D, Vidal R, Reiter A, Hager G. Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 156−165
[67] Liu L, Zhai P, Zheng D, Fang Y. Multi-stage action quality assessment method. In: Proceedings of the International Conference on Control, Robotics and Intelligent System. Guangzhou, China: ACM, 2023. 116−122
[68] Gedamu K, Ji Y, Yang Y, Shao J, Shen H. Fine-grained spatio-temporal parsing network for action quality assessment. IEEE Transactions on Image Processing, 2023, 32: 6386−6400 doi: 10.1109/TIP.2023.3331212
[69] Lei Q, Zhang H, Du J. Temporal attention learning for action quality assessment in sports video. Signal, Image and Video Processing, 2021, 15: 1575−1583 doi: 10.1007/s11760-021-01890-w
[70] Bai Y, Zhou D, Zhang S, Wang J, Ding E, Guan Y, et al. Action quality assessment with temporal parsing transformer. In: Proceedings of the European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 422−438
[71] Xu A, Zeng L A, Zheng W S. Likert scoring with grade decoupling for long-term action assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022. 3232−3241
[72] Du Z, He D, Wang X, Wang Q. Learning semantics-guided representations for scoring figure skating. IEEE Transactions on Multimedia, 2023, 26: 4987−4997
[73] Yan S, Xiong Y, Lin D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans, LA, USA: AAAI, 2018. 7444−7452
[74] Gao X, Hu W, Tang J, Liu J. Optimized skeleton-based action recognition via sparsified graph regression. In: Proceedings of the ACM International Conference on Multimedia. Nice, France: ACM, 2019. 601−610
[75] Patrona F, Chatzitofis A, Zarpalas D, Daras P. Motion analysis: Action detection, recognition and evaluation based on motion capture data. Pattern Recognition, 2018, 76: 612−622 doi: 10.1016/j.patcog.2017.12.007
[76] Microsoft Development Team. Azure Kinect DK depth camera. Microsoft Azure Documentation [Online], available: https://docs.microsoft.com/en-us/azure/kinect-dk/, 2019.
[77] Yang Y, Ramanan D. Articulated pose estimation with flexible mixtures-of-parts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE, 2011. 1385−1392
[78] Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 32(9): 1627−1645
[79] Tian Y, Sukthankar R, Shah M. Spatiotemporal deformable part models for action detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013. 2642−2649
[80] Cao Z, Simon T, Wei S E, Sheikh Y. Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 7291−7299
[81] Fang H S, Xie S, Tai Y W, Lu C. RMPE: Regional multi-person pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2334−2343
[82] He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2961−2969
[83] Shotton J, Fitzgibbon A, Cook M, Sharp T, Finocchio M, Moore R, et al. Real-time human pose recognition in parts from single depth images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO, USA: IEEE, 2011. 1297−1304
[84] OpenNI User Guide. OpenNI Organization, November 2010. Last viewed, 2011, 18: 15
[85] Rhodin H, Sporri J, Katircioglu I, Constantin V, Meyer F, Müller E, Salzmann M, et al. Learning monocular 3D human pose estimation from multi-view images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018. 8437−8446
[86] Dong J, Jiang W, Huang Q, Bao H, Zhou X. Fast and robust multi-person 3D pose estimation from multiple views. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 7792−7801
[87] Çeliktutan O, Akgul C B, Wolf C, Sankur B. Graph-based analysis of physical exercise actions. In: Proceedings of the International Workshop on Multimedia Indexing and Information Retrieval for Healthcare. Barcelona, Spain: ACM, 2013. 23−32
[88] Liu J, Wang G, Hu P, Duan L, Kot A. Global context-aware attention LSTM networks for 3D action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 1647−1656
[89] Lee I, Kim D, Kang S, Lee S. Ensemble deep learning for skeleton-based action recognition using temporal sliding LSTM networks. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 1012−1020
[90] Li C, Zhong Q, Xie D, Pu S. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. In: Proceedings of the International Joint Conference on Artificial Intelligence. Stockholm, Sweden: Morgan Kaufmann, 2018. 786−792
[91] Li Y, Xia R, Liu X, Huang Q. Learning shape-motion representations from geometric algebra spatio-temporal model for skeleton-based action recognition. In: Proceedings of the IEEE International Conference on Multimedia and Expo. Shanghai, China: IEEE, 2019. 1066−1071
[92] Li M, Chen S, Chen X, Zhang Y, Wang Y, Tian Q. Actional-structural graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 3595−3603
[93] Shi L, Zhang Y, Cheng J, Lu H. Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 12026−12035
[94] Bruce X, Liu Y, Chan K C. Skeleton-based detection of abnormalities in human actions using graph convolutional networks. In: Proceedings of the International Conference on Transdisciplinary AI (TransAI). Irvine, CA, USA: IEEE, 2020. 131−137
[95] Chowdhury S H, Al Amin M, Rahman A M, Amin M A, Ali A A. Assessment of rehabilitation exercises from depth sensor data. In: Proceedings of the International Conference on Computer and Information Technology. Dhaka, Bangladesh: IEEE, 2021. 1−7
[96] Deb S, Islam M F, Rahman S, Rahman S. Graph convolutional networks for assessment of physical rehabilitation exercises. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2022, 30: 410−419
[97] Li H, Lei Q, Zhang H, Du J, Gao S. Skeleton-based deep pose feature learning for action quality assessment on figure skating videos. Journal of Visual Communication and Image Representation, 2022, 89: 103625 doi: 10.1016/j.jvcir.2022.103625
[98] Pan J H, Gao J, Zheng W S. Adaptive action assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(12): 8779−8795
[99] Nekoui M, Cruz F O T, Cheng L. Eagle-eye: Extreme-pose action grader using detail bird's-eye view. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa, HI, USA: IEEE, 2021. 394−402
[100] Fawaz H I, Forestier G, Weber J, Idoumghar L, Muller P A. Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks. International Journal of Computer Assisted Radiology and Surgery, 2019, 14(9): 1611−1617 doi: 10.1007/s11548-019-02039-4
[101] Roditakis K, Makris A, Argyros A. Towards improved and interpretable action quality assessment with self-supervised alignment. In: Proceedings of the PErvasive Technologies Related to Assistive Environments Conference. Corfu, Greece: IEEE, 2021. 507−513
[102] Li M Z, Zhang H B, Dong L J, Lei Q, Du J X. Gaussian guided frame sequence encoder network for action quality assessment. Complex & Intelligent Systems, 2023, 9(2): 1963−1974
[103] Wang Z, Fey A M. Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. International Journal of Computer Assisted Radiology and Surgery, 2018, 13(12): 1959−1970 doi: 10.1007/s11548-018-1860-1
[104] Funke I, Mees S T, Weitz J, Speidel S. Video-based surgical skill assessment using 3D convolutional neural networks. International Journal of Computer Assisted Radiology and Surgery, 2019, 14(7): 1217−1225 doi: 10.1007/s11548-019-01995-1
[105] Wang Z, Fey A M. SATR-DL: Improving surgical skill assessment and task recognition in robot-assisted surgery with deep neural networks. In: Proceedings of the IEEE Engineering in Medicine and Biology Society. Honolulu, HI, USA: IEEE, 2018. 1793−1796
[106] Fawaz H I, Forestier G, Weber J, Idoumghar L, Muller P A. Evaluating surgical skills from kinematic data using convolutional neural networks. In: Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention. Granada, Spain: Springer, 2018. 214−221
[107] Liu D, Li Q, Jiang T, Wang Y, Miao R, Shan F, et al. Towards unified surgical skill assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA: IEEE, 2021. 9522−9531
[108] Li Z, Gu L, Wang W, Nakamura R, Sato Y. Surgical skill assessment via video semantic aggregation. In: Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Singapore: Springer, 2022. 410−420
[109] Gao J, Pan J H, Zhang S J, Zheng W S. Automatic modelling for interactive action assessment. International Journal of Computer Vision, 2023, 131(3): 659−679 doi: 10.1007/s11263-022-01695-5
[110] Anastasiou D, Jin Y, Stoyanov D, Mazomenos E. Keep your eye on the best: Contrastive regression transformer for skill assessment in robotic surgery. IEEE Robotics and Automation Letters, 2023, 8(3): 1755−1762
[111] Fard M J, Ameri S, Darin Ellis R, Chinnam R B, Pandya A K, Klein M D. Automated robot-assisted surgical skill evaluation: Predictive analytics approach. International Journal of Medical Robotics and Computer Assisted Surgery, 2018, 14(1): e1850 doi: 10.1002/rcs.1850
[112] Zia A, Essa I. Automated surgical skill assessment in RMIS training. International Journal of Computer Assisted Radiology and Surgery, 2018, 13(5): 731−739 doi: 10.1007/s11548-018-1735-5
[113] Forestier G, Petitjean F, Senin P, Despinoy F, Huaulmé A, Fawaz H I, et al. Surgical motion analysis using discriminative interpretable patterns. Artificial Intelligence in Medicine, 2018, 91: 3−11 doi: 10.1016/j.artmed.2018.08.002
[114] Wang T, Wang Y, Li M. Towards accurate and interpretable surgical skill assessment: A video-based method incorporating recognized surgical gestures and skill levels. In: Proceedings of the Medical Image Computing and Computer Assisted Intervention. Lima, Peru: Springer, 2020. 668−678
[115] Okamoto L, Parmar P. Hierarchical neurosymbolic approach for comprehensive and explainable action quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2024. 3204−3213