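Oracle Character Recognition Based on Cross-Modal Deep Metric Learning

ZHANG Yi-Kang^{1, 2}, ZHANG Heng^1, LIU Yong-Ge^4, LIU Cheng-Lin^{1, 2, 3}

doi: 10.16383/j.aas.c200443

1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing 100190
4. Anyang Normal University, Anyang 455099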

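Funds: Supported by the Major Project for New Generation AI (2018AAA0100400), the National Natural Science Foundation of China (61936003, 61721004), and the Open Project of the Key Laboratory of Oracle Information Processing, Ministry of Education, Anyang Normal University (KFKT2018001)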

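Author Bio:
ZHANG Yi-Kang: Master's student at the University of Chinese Academy of Sciences. He received his bachelor's degree from China Agricultural University in 2016. His main research interest is character recognition. E-mail: yikang.zhang@nlpr.ia.ac.cn

ZHANG Heng: Associate professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor's degree from the University of Science and Technology of China in 2007 and his Ph.D. degree from the University of Chinese Academy of Sciences in 2013. His main research interest is document image analysis and recognition. E-mail: heng.zhang@ia.ac.cn

LIU Yong-Ge: Professor at Anyang Normal University. He received his master's degree from Northwestern Polytechnical University in 2000. From 2012 to 2013, he was a visiting scholar at the University of California, Los Angeles. His research interests include oracle bone inscription information processing and multimedia analysis. E-mail: ay_liuyongge@163.com

LIU Cheng-Lin: Professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include image processing, pattern recognition, machine learning, character recognition, and document analysis. Corresponding author of this paper. E-mail: liucl@nlpr.ia.ac.cn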

Publication history
- Received: 2020-06-22
- Accepted: 2020-10-19
- Published online: 2021-02-26
- Issue date: 2021-04-23


Abstract: Oracle character images fall into two types: scanned images, which are the original rubbings taken from tortoise shells and animal bones and are noisy, and handprinted images, which are clean, high-resolution copies written by experts. Scanned samples are hard to obtain, whereas handprinted samples are comparatively easy to collect. To improve the recognition of scanned oracle characters, we propose a method based on cross-modal deep metric learning that takes advantage of the handprinted samples. Handprinted and scanned characters are modeled in a shared feature space, and scanned characters are then recognized by nearest neighbor classification in that space. Experimental results show that, on scanned oracle character recognition, the proposed cross-modal method clearly outperforms single-modal methods and can also recognize new character categories incrementally.

Key words:
- oracle character recognition
- deep metric learning
- nearest neighbor classification
- cross-modal learning
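
The recognition step described in the abstract, nearest neighbor classification in a shared feature space, can be illustrated with a minimal sketch. It assumes that embeddings have already been produced by some trained encoder, which is not shown. The class name `NearestNeighborGallery`, the Euclidean distance on L2-normalized vectors, the toy 2-D embeddings, and the labels are all hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class NearestNeighborGallery:
    """Reference embeddings (e.g. handprinted characters) with class labels.

    Hypothetical helper: the paper's actual encoder and distance are not
    specified here, so plain Euclidean distance is an assumption.
    """

    def __init__(self, dim):
        self.embeddings = np.empty((0, dim), dtype=np.float64)
        self.labels = []

    def add(self, embeddings, labels):
        """Append reference embeddings; no retraining is involved."""
        embeddings = l2_normalize(np.asarray(embeddings, dtype=np.float64))
        self.embeddings = np.vstack([self.embeddings, embeddings])
        self.labels.extend(labels)

    def classify(self, queries):
        """Return the label of the closest reference for each query row."""
        queries = l2_normalize(np.asarray(queries, dtype=np.float64))
        # Pairwise squared Euclidean distances, shape (n_queries, n_refs).
        diff = queries[:, None, :] - self.embeddings[None, :, :]
        dist = (diff ** 2).sum(axis=-1)
        return [self.labels[i] for i in dist.argmin(axis=1)]


# Toy data: two known classes, then a third class added later.
gallery = NearestNeighborGallery(dim=2)
gallery.add([[1.0, 0.0], [0.0, 1.0]], ["A", "B"])

queries = [[0.9, 0.2], [0.1, 0.8], [-1.0, -0.9]]
before = gallery.classify(queries)

# A new category is supported by adding its reference embedding only.
gallery.add([[-1.0, -1.0]], ["C"])
after = gallery.classify(queries)

print(before)
print(after)
```

In this sketch the third query can only be assigned to one of the existing classes until a reference for class "C" is added, which is one plausible reading of how a nearest neighbor classifier extends to new categories without retraining.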


Fig. 1 Oracle character images in different modalities

Fig. 2 Incomplete glyphs and heavy noise in scanned oracle character images

Fig. 3 Scanned oracle character recognition based on cross-modal deep metric learning

Fig. 4 Structure of the encoder that embeds oracle character images

Fig. 5 Learning objective of the triplet loss function

Fig. 6 Network structure of the discriminator

Fig. 7 Schematic of feature distributions at different stages

Fig. 8 Visualization of oracle character features from different modalities

Fig. 9 Distribution of sample counts over the 241 categories of scanned oracle characters

Fig. 10 Examples of within-class samples of scanned oracle characters; each column belongs to the same class

Fig. 11 Examples of nearest neighbor pairs after domain adaptation

Fig. 12 Relationship between confidence threshold and recognition accuracy

Fig. 13 Relationship between the number of categories in nearest neighbor retrieval and recognition accuracy

Fig. 14 Different glyph structures of the same character category

Table 1 Effect of image size on recognition performance

| Image size | Recognition rate (%) |
| --- | --- |
| 32×32 | 76.80 |
| 64×64 | 82.10 |
| 128×128 | 83.40 |

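Table 2 Comparison of classification accuracy of different methods on scanned oracle characters

| Method | Recognition rate (%) |
| --- | --- |
| Single-modal nearest neighbor | 74.14 |
| Single-modal CNN | 84.40 |
| Cross-modal nearest neighbor | 82.10 |
| CNN fused with cross-modal information | 86.70 |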

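Table 3 Recognition of new categories of scanned oracle characters

| Feature learning method | Cross-modal nearest neighbor classification accuracy (%) |
| --- | --- |
| Metric learning + domain adaptation | 43.67 |
| Metric learning + domain adaptation + feature correction | 62.10 |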
