Abstract: Human body pose is an important semantic cue for action recognition, while CNNs extract highly discriminative deep features from images. This paper extracts pose features from local image regions and deep features from the whole image, and explores their complementary roles in action recognition. A pose representation is first introduced in which the pose of each body part is represented by the detection scores of a set of poselets that describe the pose variability of that part. To suppress detection errors, a part-based model is designed as the detection context for each poselet. To train the CNN on data sets with a very limited number of images, pre-training and fine-tuning are used. Experiments on two data sets show that concatenating the proposed pose features with the deep features substantially improves action recognition performance.
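As a rough illustration of this pose representation (a sketch under assumptions, not the paper's implementation), the snippet below builds a pose feature by concatenating, for each body part, the maximum detection scores of that part's poselets inside the person region. The part names, the number of poselets per part, the score-map size, and the random scores are all hypothetical placeholders.

```python
# Minimal, runnable sketch of the pose representation described in the abstract.
# All names and sizes (4 parts, 30 poselets per part, 32x32 score maps, random
# scores) are illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

parts = ["head", "torso", "arms", "legs"]    # hypothetical body-part grouping
poselets_per_part = 30                       # hypothetical number of poselets

def pose_feature(score_maps: dict) -> np.ndarray:
    """Concatenate, part by part, the maximum detection score of each poselet
    within the person region into one pose feature vector."""
    feature = []
    for part in parts:
        # score_maps[part]: (poselets_per_part, H, W) poselet detection scores
        feature.append(score_maps[part].max(axis=(1, 2)))
    return np.concatenate(feature)           # shape: (len(parts) * poselets_per_part,)

# Random placeholders standing in for real poselet detector responses, which in
# the paper are computed with a part-based model serving as detection context.
score_maps = {p: rng.standard_normal((poselets_per_part, 32, 32)) for p in parts}
print(pose_feature(score_maps).shape)        # (120,)
```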
Key words:
- Action recognition
- pose feature
- poselet
- deep feature
1) Recommended by Associate Editor Lai Jian-Huang
Table 1 Action recognition precision on the still-image data set (%)
Table 2 Action recognition precision on the video-frame data set (%)
Table 3 Precision comparison of the pose feature, CNN, and their combination on the still-image data set (%)

| Method | Dancing | Golfing | Running | Sitting | Walking | Average |
|---|---|---|---|---|---|---|
| Pose feature | 69 | 65 | 74 | 65 | 59 | 66.4 |
| CNN | 72.4 | 76.4 | 70.2 | 73.9 | 65.6 | 71.7 |
| Pose feature + C5 | 79.4 | 68.8 | 79.9 | 77.4 | 72.0 | 75.5 |
| Pose feature + F6 | 78.5 | 70.2 | 77.9 | 78.3 | 74.5 | 75.8 |
| Pose feature + F7 | 75.3 | 68.9 | 76.2 | 79.3 | 72.4 | 74.4 |
Table 4 Precision comparison of the pose feature, CNN, and their combination on the video-frame data set (%)
| Method | Dancing | Golfing | Running | Sitting | Walking | Average |
|---|---|---|---|---|---|---|
| Pose feature | 52.1 | 91.5 | 83.6 | 39.4 | 51.4 | 63.6 |
| CNN | 62.2 | 58.9 | 76.2 | 63.9 | 58.9 | 64.0 |
| Pose feature + C5 | 63.4 | 65.7 | 82.3 | 61.5 | 65.5 | 67.6 |
| Pose feature + F6 | 69.8 | 64.6 | 84.5 | 64.3 | 66.3 | 69.9 |
| Pose feature + F7 | 67.5 | 64.1 | 82.5 | 63.7 | 65.8 | 68.7 |
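The C5, F6, and F7 rows above presumably denote features taken from the last convolutional layer and the two fully connected layers of the CNN. The sketch below shows, under stated assumptions, how such layer activations could be extracted from a pre-trained AlexNet (used here only as a stand-in for the paper's network) and concatenated with a pose vector before training a linear classifier. The pose vectors, image tensors, labels, and the torchvision/scikit-learn calls are illustrative choices, not the authors' pipeline.

```python
# Minimal sketch (assumptions, not the authors' code) of fusing a pose feature
# with C5 / F6 / F7 activations of an ImageNet-pretrained AlexNet and training
# a linear classifier on the concatenated vector.
import numpy as np
import torch
from torchvision import models
from sklearn.svm import LinearSVC

cnn = models.alexnet(weights="IMAGENET1K_V1").eval()   # stand-in pre-trained CNN

def cnn_layer_features(images: torch.Tensor, layer: str) -> np.ndarray:
    """Return flattened C5, F6, or F7 activations for a batch of 224x224 crops."""
    with torch.no_grad():
        c5 = cnn.avgpool(cnn.features(images)).flatten(1)        # conv5, pooled
        if layer == "C5":
            return c5.numpy()
        f6 = cnn.classifier[2](cnn.classifier[1](c5))            # fc6 + ReLU
        if layer == "F6":
            return f6.numpy()
        return cnn.classifier[5](cnn.classifier[4](f6)).numpy()  # fc7 + ReLU

images = torch.randn(8, 3, 224, 224)      # placeholder normalized person crops
pose_feat = np.random.rand(8, 120)        # placeholder 120-D pose vectors
labels = np.arange(8) % 5                 # placeholder labels for 5 action classes

fused = np.hstack([pose_feat, cnn_layer_features(images, "F6")])
clf = LinearSVC().fit(fused, labels)      # linear classifier on the fused feature
print(clf.predict(fused[:2]))
```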