Oracle Character Recognition Based on Cross-Modal Deep Metric Learning
-
Abstract: Oracle character images fall into two types: scanned images, which are raw rubbings taken from carriers such as tortoise shells and animal bones and are noisy, and handprinted images, which are clean, high-resolution copies written by experts. Scanned samples are hard to collect, whereas handprinted samples are comparatively easy to obtain. To improve the recognition of scanned oracle characters, we propose a method based on cross-modal deep metric learning that takes advantage of the handprinted samples. The method learns a feature space shared by the handprinted and scanned modalities and recognizes scanned characters by nearest neighbor classification in that space. Experimental results show that, on scanned oracle character recognition, the proposed cross-modal method clearly outperforms single-modal methods and can also recognize new character categories incrementally.
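The nearest neighbor step described in the abstract can be illustrated with a minimal sketch. It assumes that some network (not shown here) has already mapped handprinted and scanned images into the shared feature space, so the inputs are plain embedding vectors. The class name `SharedSpaceNearestNeighbor`, the use of cosine similarity, and the toy 2-D vectors are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


class SharedSpaceNearestNeighbor:
    """Nearest neighbor classifier over embeddings in a shared feature space.

    Hypothetical helper for illustration; the embedding network is assumed
    to exist elsewhere and is not part of this sketch.
    """

    def __init__(self):
        self.gallery = None  # (n, d) array of unit-length reference embeddings
        self.labels = []     # class label for each gallery row

    @staticmethod
    def _normalize(x):
        # Scale each row to unit length so a dot product is cosine similarity.
        x = np.atleast_2d(np.asarray(x, dtype=np.float64))
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.maximum(norms, 1e-12)

    def add(self, embeddings, labels):
        # Append reference embeddings (e.g. from handprinted samples).
        # New categories are added here without retraining anything.
        emb = self._normalize(embeddings)
        if self.gallery is None:
            self.gallery = emb
        else:
            self.gallery = np.vstack([self.gallery, emb])
        self.labels.extend(labels)

    def predict(self, queries):
        # Return the label of the most similar gallery embedding per query.
        sims = self._normalize(queries) @ self.gallery.T
        return [self.labels[i] for i in sims.argmax(axis=1)]


clf = SharedSpaceNearestNeighbor()
clf.add([[1.0, 0.0], [0.0, 1.0]], ["A", "B"])  # toy reference embeddings
print(clf.predict([[0.9, 0.1]]))               # toy embedding of a scanned query

clf.add([[-1.0, 0.0]], ["C"])                  # a new category arrives later
print(clf.predict([[-0.8, 0.1]]))
```

Because classification only compares a query against stored reference embeddings, supporting a new category in this sketch amounts to one more `add` call, which is one way to read the incremental recognition mentioned in the abstract.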
-
Table 1 Effects of different image scales
Image size    Recognition rate (%)
32×32         76.80
64×64         82.10
128×128       83.40

Table 2 Comparison of different oracle character recognition methods
Method                                      Recognition rate (%)
Single-modal nearest neighbor               74.14
Single-modal CNN                            84.40
Cross-modal nearest neighbor                82.10
CNN fused with cross-modal information      86.70

Table 3 Recognition performance of new oracle characters
Feature learning method                                     Cross-modal nearest neighbor accuracy (%)
Metric learning + domain adaptation                         43.67
Metric learning + domain adaptation + feature correction    62.10
-