Texture and Shape Feature Fusion Based Sketch Recognition
Abstract: Humans have a strong ability to recognize hand-drawn sketches. However, sketch classification remains challenging for state-of-the-art models due to the sparse lines and limited details of sketches. Previous deep neural networks treat sketches as general images and ignore the differences in shape representation across categories. In this paper, we address this problem with an end-to-end hand-drawn sketch recognition model, named the Dual-Model Fusion Network (DMF-Net), which captures both texture and shape information of sketches via a mutual learning strategy. Specifically, our model is composed of two branches: one automatically extracts texture features from an image-based representation, i.e., the raw sketches, and the other obtains shape information from a graph-based representation, i.e., point-based sketches. Moreover, we propose a visual attention consistency loss that measures the consistency of the attention heat-maps produced by the two branches, which encourages both representations to concentrate on the same discriminative regions. Finally, DMF-Net is optimized by combining a classification loss, a category consistency loss, and the attention consistency loss. We conduct extensive experiments on two challenging datasets, TU-Berlin and Sketchy, for sketch classification. DMF-Net significantly outperforms the baselines and achieves new state-of-the-art performance.
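The combined objective described above (classification loss plus category consistency loss plus visual attention consistency loss) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the loss weights `w_cc`/`w_ac`, the symmetric-KL form of the category consistency term, and the L2 form of the attention consistency term are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def kl_div(p, q):
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12)), axis=-1).mean()

def dmf_loss(tex_logits, shape_logits, tex_attn, shape_attn, labels,
             w_cc=1.0, w_ac=1.0):
    """Combined objective sketch: classification + category consistency
    + visual attention consistency (weights w_cc, w_ac are assumptions)."""
    # Classification loss for both branches
    l_cls = cross_entropy(tex_logits, labels) + cross_entropy(shape_logits, labels)
    # Category consistency: symmetric KL between the branches' predictions
    p, q = softmax(tex_logits), softmax(shape_logits)
    l_cc = 0.5 * (kl_div(p, q) + kl_div(q, p))
    # Attention consistency: L2 distance between normalized saliency maps
    def norm(a):
        a = a.reshape(a.shape[0], -1)
        return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    l_ac = np.mean(np.sum((norm(tex_attn) - norm(shape_attn)) ** 2, axis=1))
    return l_cls + w_cc * l_cc + w_ac * l_ac
```

When the two branches agree exactly, the consistency terms vanish and only the classification losses remain, which is the intended fixed point of the mutual learning strategy.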
Key words: sketch classification, attention mechanism, mutual learning strategy
Table 1 Comparison of sketch classification accuracy with different algorithms on the TU-Berlin dataset

| Method | 8 | 40 | 64 | 72 |
|---|---|---|---|---|
| Eitz (Knn hard) | 22% | 33% | 36% | 38% |
| Eitz (Knn soft) | 26% | 39% | 43% | 44% |
| Eitz (SVM hard) | 32% | 48% | 53% | 53% |
| Eitz (SVM soft) | 33% | 50% | 55% | 55% |
| FV size 16 | 39% | 56% | 61% | 62% |
| FV size 16 (SP) | 44% | 60% | 65% | 66% |
| FV size 24 | 41% | 60% | 64% | 65% |
| FV size 24 (SP) | 43% | 62% | 67% | 68% |
| SketchPoint | 50% | 68% | 71% | 74% |
| AlexNet | 55% | 70% | 74% | 75% |
| NIN | 51% | 70% | 75% | 75% |
| VGGNet | 54% | 67% | 75% | 76% |
| GoogLeNet | 52% | 69% | 76% | 77% |
| Sketch-a-Net | 58% | 73% | 77% | 78% |
| SketchNet | 58% | 74% | 77% | 80% |
| Cousin Network | 59% | 75% | 78% | 80% |
| Hybrid CNN | 57% | 75% | 80% | 81% |
| LN | 58% | 76% | 82% | 82% |
| SSDA | 59% | 76% | 82% | 84% |
| DMF-Net | 60% | 77% | 85% | 86% |
Table 2 Architecture design analysis for sketch classification on TU-Berlin and Sketchy

| Method | TU-Berlin | Sketchy |
|---|---|---|
| BN | 82.71% | 85.75% |
| BN+GC | 83.93% | 86.49% |
| BN+AC | 84.12% | 87.07% |
| BN+CC | 84.75% | 87.36% |
| BN+GC+CC | 85.47% | 87.64% |
| BN+AC+CC | 85.51% | 87.71% |
| BN+GC+AC+CC | 86.12% | 88.01% |
Table 3 Classification accuracy results using two-branch neural networks

| Method | TU-Berlin | Sketchy |
|---|---|---|
| Texture network | 81.05% | 83.18% |
| Shape network | 70.87% | 70.43% |
| Base network | 82.71% | 85.75% |
Table 4 Classification accuracy results using given feature levels

| Feature levels | TU-Berlin | Sketchy |
|---|---|---|
| {4} | 85.83% | 87.23% |
| {3,4} | 86.01% | 87.87% |
| {2,3,4} | 86.06% | 87.93% |
| {1,2,3,4} | 86.12% | 88.01% |
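Table 4 shows accuracy rising as features from more levels are fused. A minimal sketch of such multi-level fusion is given below; the mapping from level index to feature array and the choice of global max pooling before concatenation are assumptions for illustration.

```python
import numpy as np

def fuse_levels(features, levels):
    """Concatenate globally max-pooled features from the chosen levels.

    `features` maps a level index to a (num_points, channels) array;
    `levels` (e.g. {1, 2, 3, 4}) selects which stages feed the classifier.
    """
    pooled = [features[l].max(axis=0) for l in sorted(levels)]
    return np.concatenate(pooled)
```

Using more levels simply lengthens the fused descriptor handed to the final classifier, which is consistent with the monotone trend from {4} to {1,2,3,4} in Table 4.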
Table 5 Classification accuracy on the TU-Berlin dataset using two sampling strategies

| Sampling strategy | Accuracy |
|---|---|
| Uniform sampling | 86.12% |
| Random sampling | 86.09% |
Table 6 Effects of the number of sampled points on classification accuracy

| Number of points | 32 | 64 | 128 | 256 | 512 | 600 | 750 | 1024 | 1200 | 1300 |
|---|---|---|---|---|---|---|---|---|---|---|
| TU-Berlin | 81.87% | 82.75% | 83.23% | 84.34% | 85.42% | 85.75% | 86.00% | 86.12% | 86.13% | 86.08% |
| Sketchy | 83.37% | 84.35% | 84.83% | 85.90% | 87.36% | 87.50% | 88.00% | 88.01% | 88.04% | 88.01% |
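The two sampling strategies compared above can be sketched as follows, assuming each sketch is stored as an ordered sequence of 2-D points: uniform sampling takes evenly spaced indices along the drawing order, while random sampling draws indices at random. The function name and exact details are hypothetical, not the paper's implementation.

```python
import numpy as np

def sample_points(points, n, strategy="uniform", rng=None):
    """Sample n 2-D points from a sketch's ordered point sequence.

    'uniform' takes evenly spaced indices along the drawing order;
    'random' draws indices uniformly at random (sorted to keep order).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    m = len(points)
    if strategy == "uniform":
        idx = np.linspace(0, m - 1, n).round().astype(int)
    else:
        # sample with replacement only when asking for more points than exist
        idx = np.sort(rng.choice(m, size=n, replace=m < n))
    return points[idx]
```

Table 5 suggests the choice matters little (86.12% vs. 86.09%), while Table 6 shows accuracy saturating around 1024 points, the cost/accuracy trade-off the shape branch relies on.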
