A Survey on Fine-grained Image Categorization Using Deep Convolutional Features

LUO Jian-Hao, WU Jian-Xin

Citation: LUO Jian-Hao, WU Jian-Xin. A Survey on Fine-grained Image Categorization Using Deep Convolutional Features. Acta Automatica Sinica, 2017, 43(8): 1306-1318. doi: 10.16383/j.aas.2017.c160425

Funds: National Natural Science Foundation of China 61422203

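Author information

    LUO Jian-Hao    Ph. D. candidate in the Department of Computer Science and Technology, Nanjing University. He received his bachelor's degree from the College of Computer Science and Technology, Jilin University in 2015. His research interests cover computer vision and machine learning. E-mail: luojh@lamda.nju.edu.cn

    Corresponding author: WU Jian-Xin    Professor in the Department of Computer Science and Technology, Nanjing University. He received his bachelor's and master's degrees from the Department of Computer Science and Technology, Nanjing University in 1999 and 2002, respectively. In 2009, he received his Ph. D. degree from the Georgia Institute of Technology, USA. He was an assistant professor at the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests cover computer vision and machine learning. Corresponding author of this paper. E-mail: wujx2001@nju.edu.cn

    Responsible editor for this paper: WANG Liang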


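Abstract: Fine-grained image categorization is a highly challenging research topic in computer vision. Its goal is to recognize subcategories, for example to distinguish between different species of birds. Because the differences between subcategories are subtle while the variation within a subcategory is large, traditional classification algorithms have had to rely on large amounts of manual annotation. In recent years, with the development of deep learning, deep convolutional neural networks have brought new opportunities to fine-grained image categorization, and the many algorithms proposed on top of deep convolutional features have driven rapid progress in the field. This paper first introduces the current state of fine-grained image categorization algorithms, starting from the definition of the problem and its research significance. It then compares and analyzes the differences between algorithms from the two perspectives of strong supervision and weak supervision, and compares their performance on commonly used datasets. Finally, we summarize these algorithms and discuss possible future research directions in the field and the challenges they face.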
    Fig. 1  Illustration of fine-grained categorization (sampled from the CUB200-2011 dataset[1])

    Fig. 2  Illustration of fine-grained datasets (the images are sampled from different categories)

    Fig. 3  The framework of convolutional neural networks

    Fig. 4  Part R-CNN system overview[43]

    Fig. 5  Pose normalized CNN system overview[48]

    Fig. 6  System overview[12]

    Fig. 7  Illustration of Bilinear CNN[13]

    Table 1  Performance of different algorithms on CUB200-2011[1] (BBox refers to bounding box annotations; Parts refers to part annotations)

    | Algorithm | BBox (train) | Parts (train) | BBox (test) | Parts (test) | Brief description | Accuracy (%) |
    |---|---|---|---|---|---|---|
    | CUB[1] | | | | | SIFT + BoW + SVM | 10.3 |
    | CUB[1] | | | | | SIFT + BoW + SVM | 17.3 |
    | POOF[26] | | | | | POOF + SVM | 56.8 |
    | POOF[26] | | | | | POOF + SVM | 73.3 |
    | Alignment[31] | | | | | Fisher + SVM | 62.7 |
    | Symbiotic[30] | | | | | Fisher + SVM | 61 |
    | DeCAF[25] | | | | | Alex-Net + Logistic Regression | 61 |
    | Part R-CNN[43] | | | | | Alex-Net + Fine-Tune + SVM | 73.9 |
    | Pose Normalized CNN[48] | | | | | Alex-Net + Fine-Tune + SVM | 75.7 |
    | Pose Normalized CNN[48] | | | | | Alex-Net + Fine-Tune + SVM | 85.4 |
    | Two-level Attention[56] | | | | | Alex-Net | 69.7 |
    | Two-level Attention[56] | | | | | VGG16-Net | 77.9 |
    | Zhang et al.[12] | | | | | VGG16-Net + Fine-Tune + SVM | 79.3 |
    | Constellations[58] | | | | | VGG19-Net + Fine-Tune + Flip + SVM | 81 |
    | Bilinear CNN[13] | | | | | VGG19-Net/VGG-M + Flip | 84.1 |
    | Spatial Transformer Net[55] | | | | | Inception[62] + Flip | 84.1 |
    [1] Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 Dataset, Technical Report CNS-TR-2011-001, California Institute of Technology, Pasadena, CA, USA, 2011
    [2] Bosch A, Zisserman A, Muñoz X. Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(4):712-727 doi: 10.1109/TPAMI.2007.70716
    [3] Wu J X, Rehg J M. CENTRIST:a visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8):1489-1501 doi: 10.1109/TPAMI.2010.224
    [4] Gehler P, Nowozin S. On feature combination for multiclass object classification. In:Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 221-228
    [5] Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y. What is the best multi-stage architecture for object recognition? In:Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 2146-2153
    [6] Wright J, Yang A Y, Ganesh A, Sastry S S, Ma Y. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2):210-227 doi: 10.1109/TPAMI.2008.79
    [7] Li Xiao-Li, Da Fei-Peng. A rapid method for 3D face recognition based on rejection algorithm. Acta Automatica Sinica, 2010, 36(1):153-158 http://www.aas.net.cn/CN/abstract/abstract13642.shtml
    [8] Khosla A, Jayadevaprakash N, Yao B P, Li F F. Novel dataset for fine-grained image categorization. In:Proceedings of the 1st Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Colorado Springs, USA:IEEE, 2011.
    [9] Nilsback M E, Zisserman A. Automated flower classification over a large number of classes. In:Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing. Bhubaneswar, India:IEEE, 2008. 722-729
    [10] Krause J, Stark M, Deng J, Li F F. 3D object representations for fine-grained categorization. In:Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops (ICCVW). Sydney, Australia:IEEE, 2013. 554-561
    [11] Maji S, Rahtu E, Kannala J, Blaschko M, Vedaldi A. Fine-grained visual classification of aircraft[Online], available:https://arxiv.org/abs/1306.5151, June 21, 2013
    [12] Zhang Y, Wei X S, Wu J X, Cai J F, Lu J B, Nguyen V A, Do M N. Weakly supervised fine-grained categorization with part-based image representation. IEEE Transactions on Image Processing, 2016, 25(4):1713-1725 doi: 10.1109/TIP.2016.2531289
    [13] Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition. In:Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 1449-1457
    [14] Zhang Lin-Bo, Wang Chun-Heng, Xiao Bai-Hua, Shao Yun-Xue. Image representation using bag-of-phrases. Acta Automatica Sinica, 2012, 38(1):46-54 http://www.aas.net.cn/CN/abstract/abstract17634.shtml
    [15] Yu Wang-Sheng, Tian Xiao-Hua, Hou Zhi-Qiang. A new image feature descriptor based on region edge statistical. Chinese Journal of Computers, 2014, 37(6):1398-1410 http://www.cnki.com.cn/Article/CJFDTOTAL-JSJX201406018.htm
    [16] Yan Xue-Jun, Zhao Chun-Xia, Yuan Xia. 2DPCA-SIFT:an efficient local feature descriptor. Acta Automatica Sinica, 2014, 40(4):675-682 http://www.aas.net.cn/CN/abstract/abstract18333.shtml
    [17] Lowe D G. Object recognition from local scale-invariant features. In:Proceedings of the 7th IEEE International Conference on Computer Vision. Kerkyra, Greece:IEEE, 1999. 1150-1157
    [18] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In:Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA:IEEE, 2005. 886-893
    [19] Jégou H, Douze M, Schmid C, Pérez P. Aggregating local descriptors into a compact image representation. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA:IEEE, 2010. 3304-3311
    [20] Perronnin F, Dance C. Fisher kernels on visual vocabularies for image categorization. In:Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA:IEEE, 2007. 1-8
    [21] Sánchez J, Perronnin F, Mensink T, Verbeek J. Image classification with the Fisher vector:theory and practice. International Journal of Computer Vision, 2013, 105(3):222-245 doi: 10.1007/s11263-013-0636-x
    [22] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In:Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA:MIT Press, 2012. 1097-1105
    [23] Gao Ying-Ying, Zhu Wei-Bin. Deep neural networks with visible intermediate layers. Acta Automatica Sinica, 2015, 41(9):1627-1637 http://www.aas.net.cn/CN/abstract/abstract18736.shtml
    [24] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553):436-444 doi: 10.1038/nature14539
    [25] Donahue J, Jia Y Q, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T. DeCAF:a deep convolutional activation feature for generic visual recognition. In:Proceedings of the 31st International Conference on Machine Learning. Beijing, China:ACM, 2014. 647-655
    [26] Berg T, Belhumeur P N. POOF:part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation. In:Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, USA:IEEE, 2013. 955-962
    [27] Perronnin F, Sánchez J, Mensink T. Improving the Fisher kernel for large-scale image classification. In:Proceedings of the 11th European Conference on Computer Vision. Berlin Heidelberg, Germany:Springer, 2010. 143-156
    [28] Bo L, Ren X, Fox D. Kernel descriptors for visual recognition. In:Proceedings of the 24th Annual Conference on Neural Information Processing Systems. Vancouver, Canada:MIT Press, 2010. 244-252
    [29] Branson S, Van Horn G, Wah C, Perona P, Belongie S. The ignorant led by the blind:a hybrid human-machine vision system for fine-grained categorization. International Journal of Computer Vision, 2014, 108(1-2):3-29 doi: 10.1007/s11263-014-0698-4
    [30] Chai Y N, Lempitsky V, Zisserman A. Symbiotic segmentation and part localization for fine-grained categorization. In:Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV). Sydney, Australia:IEEE, 2013. 321-328
    [31] Gavves E, Fernando B, Snoek C G M, Smeulders A W M, Tuytelaars T. Fine-grained categorization by alignments. In:Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV). Sydney, Australia:IEEE, 2013. 1713-1720
    [32] Yao B P, Bradski G, Li F F. A codebook-free and annotation-free approach for fine-grained image categorization. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, USA:IEEE, 2012. 3466-3473
    [33] Yang S L, Bo L F, Wang J, Shapiro L. Unsupervised template learning for fine-grained object recognition. In:Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, USA:MIT Press, 2012. 3122-3130
    [34] Branson S, Wah C, Schroff F, Babenko B, Welinder P, Perona P, Belongie S. Visual recognition with humans in the loop. In:Proceedings of the 11th European Conference on Computer Vision. Berlin Heidelberg, Germany:Springer, 2010. 438-451
    [35] Wah C, Branson S, Perona P, Belongie S. Multiclass recognition and part localization with humans in the loop. In:Proceedings of the 13th IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain:IEEE, 2011. 2524-2531
    [36] LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, Jackel L D. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1(4):541-551 doi: 10.1162/neco.1989.1.4.541
    [37] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11):2278-2324 doi: 10.1109/5.726791
    [38] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In:Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland:Springer, 2014. 818-833
    [39] Gong Y C, Wang L W, Guo R Q, Lazebnik S. Multi-scale orderless pooling of deep convolutional activation features. In:Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland:Springer, 2014. 392-407
    [40] Cimpoi M, Maji S, Vedaldi A. Deep filter banks for texture recognition and segmentation. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA:IEEE, 2015. 3828-3836
    [41] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[Online], available:https://arxiv.org/abs/1409.1556, April 10, 2015
    [42] Welinder P, Branson S, Mita T, Wah C, Schroff F, Belongie S, Perona P. Caltech-UCSD Birds 200, Technical Report CNS-TR-2010-001, California Institute of Technology, Pasadena, CA, USA, 2010
    [43] Zhang N, Donahue J, Girshick R, Darrell T. Part-based R-CNNs for fine-grained category detection. In:Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland:Springer, 2014. 834-849
    [44] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, USA:IEEE, 2014. 580-587
    [45] Viola P, Jones M J. Robust real-time face detection. International Journal of Computer Vision, 2004, 57(2):137-154 doi: 10.1023/B:VISI.0000013087.49260.fb
    [46] Wu J X, Liu N N, Geyer C, Rehg J M. C^4:a real-time object detection framework. IEEE Transactions on Image Processing, 2013, 22(10):4096-4107 doi: 10.1109/TIP.2013.2270111
    [47] Uijlings J R R, van de Sande K E A, Gevers T, Smeulders A W M. Selective search for object recognition. International Journal of Computer Vision, 2013, 104(2):154-171 doi: 10.1007/s11263-013-0620-5
    [48] Branson S, Van Horn G, Belongie S, Perona P. Bird species categorization using pose normalized deep convolutional nets[Online], available:https://arxiv.org/abs/1406.2952, June 11, 2014
    [49] Branson S, Beijbom O, Belongie S. Efficient large-scale structured learning. In:Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, USA:IEEE, 2013. 1806-1813
    [50] Krause J, Jin H L, Yang J C, Li F F. Fine-grained recognition without part annotations. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA:IEEE, 2015. 5546-5555
    [51] Guillaumin M, Küttel D, Ferrari V. ImageNet auto-annotation with segmentation propagation. International Journal of Computer Vision, 2014, 110(3):328-348 doi: 10.1007/s11263-014-0713-9
    [52] Kuettel D, Guillaumin M, Ferrari V. Segmentation propagation in ImageNet. In:Proceedings of the 12th European Conference on Computer Vision. Berlin Heidelberg, Germany:Springer, 2012. 459-473
    [53] Lin D, Shen X Y, Lu C W, Jia J Y. Deep LAC:deep localization, alignment and classification for fine-grained recognition. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA:IEEE, 2015. 1666-1674
    [54] Xu Z, Huang S L, Zhang Y, Tao D C. Augmenting strong supervision using web data for fine-grained categorization. In:Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 2524-2532
    [55] Jaderberg M, Simonyan K, Zisserman A, Kavukcuoglu K. Spatial transformer networks. In:Proceedings of the 29th Annual Conference on Neural Information Processing Systems. Montreal, Canada:MIT Press, 2015. 2017-2025
    [56] Xiao T J, Xu Y C, Yang K Y, Zhang J X, Peng Y X, Zhang Z. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA:IEEE, 2015. 842-850
    [57] Zhang Y, Wu J X, Cai J F. Compact representation for image classification:to choose or to compress. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, USA:IEEE, 2014. 907-914
    [58] Simon M, Rodner E. Neural activation constellations:unsupervised part model discovery with convolutional networks. In:Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 1143-1151
    [59] Simon M, Rodner E, Denzler J. Part detector discovery in deep convolutional neural networks. In:Proceedings of the 12th Asian Conference on Computer Vision. Singapore:Springer, 2014. 162-177
    [60] Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks:visualising image classification models and saliency maps[Online], available:https://arxiv.org/abs/1312.6034, April 19, 2014
    [61] Wang D Q, Shen Z Q, Shao J, Zhang W, Xue X Y, Zhang Z. Multiple granularity descriptors for fine-grained categorization. In:Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 2399-2406
    [62] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA:IEEE, 2015. 1-9
    [63] Hall D, Perona P. Fine-grained classification of pedestrians in video:benchmark and state of the art. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA:IEEE, 2015. 5482-5491
    [64] Liu Y, Zhang D S, Lu G J, Ma W Y. A survey of content-based image retrieval with high-level semantics. Pattern Recognition, 2007, 40(1):262-282 doi: 10.1016/j.patcog.2006.04.045
    [65] Datta R, Joshi D, Li J, Wang J Z. Image retrieval:ideas, influences, and trends of the new age. ACM Computing Surveys, 2008, 40(2):Article No.5 http://dl.acm.org/citation.cfm?id=1348248
    [66] Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9):1627-1645 doi: 10.1109/TPAMI.2009.167
    [67] Wei X S, Luo J H, Wu J X. Selective convolutional descriptor aggregation for fine-grained image retrieval. IEEE Transactions on Image Processing, 2017, 26(6):2868-2881 doi: 10.1109/TIP.2017.2688133
    [68] Xie L X, Wang J D, Zhang B, Tian Q. Fine-grained image search. IEEE Transactions on Multimedia, 2015, 17(5):636-647 doi: 10.1109/TMM.2015.2408566
Publication history
  • Received:  2016-05-25
  • Accepted:  2017-02-03
  • Published:  2017-08-20
