Abstract: The histogram quantization used in the bag-of-visual-words (BOVW) model introduces a large quantization error in image retrieval. To address this shortcoming, an image retrieval algorithm based on sparse coding is proposed. Most image features lie on nonlinear manifolds, so traditional sparse coding, which measures them in a vector space, inevitably yields inaccurate sparse representations. To account for the manifold structure of the image feature space, symmetric positive definite matrices are chosen as feature descriptors, and these form a Riemannian manifold. A kernel method maps the Riemannian manifold into a reproducing kernel Hilbert space, which turns sparse coding on the nonlinear manifold into a linear sparse coding problem and gives a more accurate sparse representation of the image. Experiments are carried out on the Corel1000 and Caltech101 datasets. Compared with existing image retrieval algorithms, the proposed algorithm improves retrieval accuracy and achieves better retrieval performance.

1) Responsible editorial board member for this paper: Jia Yunde
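As a rough illustration of the pipeline the abstract describes, the sketch below builds a region covariance descriptor (one common symmetric positive definite descriptor), evaluates a Gaussian kernel on the log-Euclidean distance between two such matrices, and solves the resulting kernel sparse coding problem with a plain ISTA loop. The abstract does not state which per-pixel features, kernel, or solver the authors use, so the kernel choice, the function names, and the parameters `eps`, `gamma`, `lam`, and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def region_covariance(features, eps=1e-6):
    """Covariance of per-pixel feature vectors (n_pixels x d).

    A small multiple of the identity is added so the result is strictly
    positive definite (assumed regularization, not from the source).
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(mat):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_kernel(x, y, gamma=0.5):
    """Gaussian kernel on the log-Euclidean distance between SPD matrices.

    This kernel is an assumed choice; the abstract only says "kernel method".
    """
    diff = spd_log(x) - spd_log(y)
    return np.exp(-gamma * np.linalg.norm(diff, "fro") ** 2)


def kernel_sparse_code(k_dd, k_dx, lam=0.1, n_iter=200):
    """ISTA for  min_a  a^T K_dd a - 2 a^T k_dx + lam * ||a||_1.

    k_dd: (n, n) kernel matrix between dictionary atoms.
    k_dx: (n,) kernel values between each atom and the query.
    """
    # Step size 1/L, where L = 2 * lambda_max(K_dd) bounds the gradient's
    # Lipschitz constant for the smooth part of the objective.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(k_dd)[-1])
    a = np.zeros_like(k_dx)
    for _ in range(n_iter):
        grad = 2.0 * (k_dd @ a - k_dx)
        z = a - step * grad
        # Soft-thresholding is the proximal step for the L1 penalty.
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return a
```

For a query descriptor `x` and dictionary atoms `d_1, ..., d_n`, the inputs would be `k_dd[i, j] = log_euclidean_kernel(d_i, d_j)` and `k_dx[i] = log_euclidean_kernel(d_i, x)`. The point of the kernel form is that the solver only ever touches kernel values, so the optimization itself is an ordinary linear sparse coding problem even though the descriptors live on a manifold.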
Table 1 Comparison of MAP values of different algorithms on the Corel1000 dataset

Algorithm    MAP (%)    Error
n-Grams      42.31      ±0.0729
LTrPs        54.25      ±0.0533
RMSC         54.25      ±0.0468

Table 2 Comparison of MAP values of different algorithms on the Caltech101 dataset

Algorithm    MAP (%)    Error
n-Grams      28.32      ±0.0898
LTrPs        43.81      ±0.0732
RMSC         51.31      ±0.0539

Table 3 Image categories of the Caltech101 dataset

Classes 1-17     Classes 18-34      Classes 35-51       Classes 52-68    Classes 69-85   Classes 86-101
1 faces          18 camera          35 dragonfly        52 ibis          69 okapi        86 stapler
2 faces_easy     19 cannon          36 electric_guitar  53 inline_skate  70 pagoda       87 starfish
3 leopards       20 car_side        37 elephant         54 joshua_tree   71 panda        88 stegosaurus
4 motorbikes     21 ceiling_fan     38 emu              55 kangaroo      72 pigeon       89 stop_sign
5 accordion      22 cellphone       39 euphonium        56 ketch         73 pizza        90 strawberry
6 airplanes      23 chair           40 ewer             57 lamp          74 platypus     91 sunflower
7 anchor         24 chandelier      41 ferry            58 laptop        75 pyramid      92 tick
8 ant            25 cougar_body     42 flamingo         59 llama         76 revolver     93 trilobite
9 barrel         26 cougar_face     43 flamingo_head    60 lobster       77 rhino        94 umbrella
10 bass          27 crab            44 garfield         61 lotus         78 rooster      95 watch
11 beaver        28 crayfish        45 gerenuk          62 mandolin      79 saxophone    96 water_lilly
12 binocular     29 crocodile       46 gramophone       63 mayfly        80 schooner     97 wheelchair
13 bonsai        30 crocodile_head  47 grand_piano      64 menorah       81 scissors     98 wild_cat
14 brain         31 cup             48 hawksbill        65 metronome     82 scorpion     99 windsor_chair
15 brontosaurus  32 dalmatian       49 headphone        66 minaret       83 sea_horse    100 wrench
16 buddha        33 dollar_bill     50 hedgehog         67 nautilus      84 snoopy       101 yin_yang
17 butterfly     34 dolphin         51 helicopter       68 octopus       85 soccer_ball  -
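Tables 1 and 2 report mean average precision (MAP). The sketch below shows one common way to compute it from ranked retrieval results, where each query is represented by a list of 0/1 relevance flags in ranked order. This section does not give the exact formula the authors used, so the normalization (averaging precision over the relevant items that appear in the ranked list) and the function names are assumptions.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: iterable of 0/1 flags, best-ranked result first.
    Precision is sampled at every rank that holds a relevant item and the
    samples are averaged (assumed normalization; some definitions divide by
    the total number of relevant items in the collection instead).
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps)
```

For example, a query whose ranked results are relevant at positions 1 and 3 has precision 1/1 and 2/3 at those ranks, so its average precision is about 0.833. Multiplying the mean over all queries by 100 gives a percentage comparable in form to the MAP (%) column.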