Abstract: Oracle character images fall into two types: scanned (rubbing) images and handprinted images. Scanned images are the original rubbings taken from tortoise shells, animal bones, and similar carriers, and they are noisy. Handprinted images are clean, high-resolution images written by hand by experts. Scanned samples are hard to obtain, whereas handprinted samples are comparatively easy to collect. To improve the recognition of scanned oracle characters, we propose a method based on cross-modal deep metric learning that takes advantage of the handprinted samples. The method learns a shared feature space from handprinted and scanned samples, and scanned characters are then recognized by nearest neighbor classification in that space. Experimental results show that, on scanned oracle character recognition, the proposed cross-modal method clearly outperforms single-modal methods and can also recognize new categories of scanned characters incrementally.
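The classification step described in the abstract can be illustrated with a minimal sketch. It assumes that some trained network has already mapped both handprinted and scanned images into the shared feature space; the class name `SharedSpaceNN`, its methods, and the toy 2-D embeddings are hypothetical and are not taken from the paper. The metric learning itself is not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SharedSpaceNN:
    """Nearest neighbor classifier over a gallery of embedded samples.

    Hypothetical helper: the gallery is assumed to hold embeddings of
    handprinted samples, and queries are embeddings of scanned samples.
    """

    def __init__(self):
        self.gallery = []  # list of (unit-length embedding, label) pairs

    def add_class(self, label, embeddings):
        """Register a category by storing its embeddings; no retraining."""
        for emb in embeddings:
            self.gallery.append((l2_normalize(emb), label))

    def predict(self, query):
        """Return the label of the gallery embedding closest to the query."""
        if not self.gallery:
            raise ValueError("gallery is empty")
        q = l2_normalize(query)
        _, label = min(self.gallery, key=lambda item: squared_distance(q, item[0]))
        return label


# Toy 2-D embeddings standing in for the output of a trained network.
clf = SharedSpaceNN()
clf.add_class("A", [[1.0, 0.0], [0.9, 0.1]])
clf.add_class("B", [[0.0, 1.0]])
first = clf.predict([0.8, 0.2])    # query lies closest to category "A"

# A new category is added incrementally by extending the gallery only.
clf.add_class("C", [[-1.0, 0.0]])
second = clf.predict([-0.9, 0.1])  # query lies closest to category "C"
```

Because prediction depends only on the stored gallery, adding a category amounts to appending its embeddings. This is one plausible reading of how nearest neighbor classification supports incremental recognition, under the assumption that the embedding network generalizes to the new category.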
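Table 1 Effects of different image scales on recognition performance

Image size    Recognition rate (%)
32×32         76.80
64×64         82.10
128×128       83.40

Table 2 Comparison of different recognition methods on scanned oracle characters

Method                                      Recognition rate (%)
Single-modal nearest neighbor               74.14
Single-modal CNN                            84.40
Cross-modal nearest neighbor                82.10
CNN with fused cross-modal information      86.70

Table 3 Recognition performance on new categories of scanned oracle characters

Feature learning method                                     Cross-modal nearest neighbor accuracy (%)
Metric learning + domain adaptation                         43.67
Metric learning + domain adaptation + feature correction    62.10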