Multi-scale Contextual Image Labeling

Zhou Quan; Wang Lei; Zhou Liang; Zheng Bao-Yu

doi:10.3724/SP.J.1004.2014.02944

cstr: 32138.14.SP.J.1004.2014.02944

Zhou Quan^1,
Wang Lei^1,2,
Zhou Liang^1,
Zheng Bao-Yu^1

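1. Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Ministry of Education), Nanjing University of Posts and Telecommunications, Nanjing 210003;
2. National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096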

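Funds:

Supported by the National Natural Science Foundation of China (61201165, 61271240, 61201164), the Specialized Research Fund for the Doctoral Program of Higher Education (20113223120002), the China Postdoctoral Science Foundation (2013M531392), a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions, the Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (2011D05), and the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY210072, NY213067)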

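Details

Author biography:
Wang Lei, associate professor at the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Ph.D. Main research interest: wireless communication. E-mail: wanglei@njupt.edu.cn

Corresponding author:
Zhou Quan, lecturer at the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Ph.D. Main research interests: image/video processing, computer vision, machine learning, and pattern recognition. Corresponding author of this paper. E-mail: quan.zhou@njupt.edu.cn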

Publication history
- Received: 2013-12-03
- Revised: 2014-05-23
- Published: 2014-12-20


Abstract

This paper proposes a novel algorithm for automatic semantic image labeling that combines low-level local features with high-level contextual cues in a hierarchical (multiple) segmentation framework. The main insight is that recognizing a larger image region provides strong evidence for classifying the smaller regions it contains. The algorithm first classifies the image regions produced at each segmentation level, using a series of learned discriminative models based on bag of features, and then applies Bayes' theorem to fuse the per-level classification results through linear weighting, which yields a semantic labeling of the whole image. The multiple segmentation framework provides a robust representation that allows a wide variety of cues to contribute to the confidence in each semantic label. In experiments on the benchmark dataset, the algorithm achieves the best labeling accuracy and the fastest labeling speed among the existing image labeling methods it is compared with.

Keywords:
- Image labeling and understanding
- Image segmentation
- Feature selection
- Classification
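
The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact fusion formula or how the weights are obtained, so the sketch assumes that each pixel inherits the class posterior of the region containing it at every segmentation level, and that the fused posterior is a normalized weighted sum over levels. The function names `region_to_pixel_posteriors`, `fuse_levels`, and `demo`, as well as all numbers, are hypothetical.

```python
import numpy as np


def region_to_pixel_posteriors(segmentation, region_posteriors):
    """Give every pixel the class posterior of its region (assumed scheme).

    segmentation: (H, W) integer array of region ids at one level.
    region_posteriors: (R, C) array, one class distribution per region.
    Returns an (H, W, C) array.
    """
    seg = np.asarray(segmentation)
    probs = np.asarray(region_posteriors, dtype=float)
    return probs[seg]  # integer-array indexing broadcasts rows to pixels


def fuse_levels(pixel_posteriors_per_level, level_weights):
    """Linearly weight per-level posteriors and pick the best class per pixel.

    The weights are assumed to be given; they are normalized to sum to 1.
    Returns (labels, fused) with shapes (H, W) and (H, W, C).
    """
    weights = np.asarray(level_weights, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(pixel_posteriors_per_level, axis=0)  # (L, H, W, C)
    fused = np.tensordot(weights, stacked, axes=1)          # (H, W, C)
    return fused.argmax(axis=-1), fused


def demo():
    """Toy 2x2 image, two classes, two segmentation levels (made-up values)."""
    coarse_seg = np.zeros((2, 2), dtype=int)   # one large region
    coarse_probs = [[0.7, 0.3]]
    fine_seg = np.array([[0, 0], [1, 1]])      # two smaller regions
    fine_probs = [[0.6, 0.4], [0.45, 0.55]]
    levels = [
        region_to_pixel_posteriors(coarse_seg, coarse_probs),
        region_to_pixel_posteriors(fine_seg, fine_probs),
    ]
    return fuse_levels(levels, [0.5, 0.5])
```

In this toy case the bottom fine region on its own slightly favors class 1 (0.55), but the confident coarse region shifts the fused posterior to class 0, which mirrors the idea that a larger region's classification supports the labeling of the smaller regions it contains.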
