Oracle Character Recognition Based on Cross-Modal Deep Metric Learning

Zhang Yi-Kang, Zhang Heng, Liu Yong-Ge, Liu Cheng-Lin

Citation: Zhang Yi-Kang, Zhang Heng, Liu Yong-Ge, Liu Cheng-Lin. Oracle character recognition based on cross-modal deep metric learning. Acta Automatica Sinica, 2021, 47(4): 791−800 doi: 10.16383/j.aas.c200443

doi: 10.16383/j.aas.c200443
Funds: Supported by the Major Project for New Generation AI (2018AAA0100400), the National Natural Science Foundation of China (61936003, 61721004), and the Open Project of the Key Laboratory of Oracle Information Processing, Ministry of Education, Anyang Normal University (KFKT2018001)
More Information
    Author Bio:

    ZHANG Yi-Kang  Master's student at the University of Chinese Academy of Sciences. He received his bachelor's degree from China Agricultural University in 2016. His main research interest is character recognition. E-mail: yikang.zhang@nlpr.ia.ac.cn

    ZHANG Heng  Associate professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor's degree from the University of Science and Technology of China in 2007 and his Ph.D. degree from the University of Chinese Academy of Sciences in 2013. His research interest is document image analysis and recognition. E-mail: heng.zhang@ia.ac.cn

    LIU Yong-Ge  Professor at Anyang Normal University, China. He received his master's degree from Northwestern Polytechnical University, China in 2000. From 2012 to 2013, he was a visiting scholar at the University of California, Los Angeles, USA. His research interest covers oracle bone inscription information processing and multimedia analysis. E-mail: ay_liuyongge@163.com

    LIU Cheng-Lin  Professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interest covers pattern recognition, image processing, neural networks, machine learning, and especially their applications to document analysis and recognition. Corresponding author of this paper. E-mail: liucl@nlpr.ia.ac.cn

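  • Abstract: Oracle character images fall into two categories: rubbing images and handprinted (facsimile) images. Rubbing images are the original rubbings taken from carriers such as tortoise shells and animal bones; handprinted images are high-definition images written by hand by experts. Rubbing samples are hard to obtain, whereas handprinted samples are comparatively easy to obtain. To improve the recognition of rubbing oracle characters, this paper proposes an oracle character recognition method based on cross-modal deep metric learning: handprinted and rubbing characters are modeled in a shared feature space and classified by nearest neighbor, which enables cross-modal recognition of rubbing characters. Experimental results show that, on rubbing oracle character recognition, the proposed cross-modal learning method clearly outperforms single-modal methods and can also recognize rubbing characters of new classes incrementally.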
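The nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. It assumes that trained encoders (not shown here) have already mapped handprinted and rubbing images into the same feature space; the 2-D vectors, the class labels, the helper names `euclidean_distance` and `classify_by_nearest_neighbor`, and the choice of Euclidean distance are all illustrative assumptions rather than details taken from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_by_nearest_neighbor(query, gallery):
    """Return (label, distance) of the gallery entry closest to the query.

    gallery is a list of (label, embedding) pairs, here standing in for
    handprinted-character embeddings in the shared feature space.
    """
    if not gallery:
        raise ValueError("gallery must contain at least one entry")
    label, embedding = min(
        gallery, key=lambda item: euclidean_distance(query, item[1])
    )
    return label, euclidean_distance(query, embedding)


# Hypothetical handprinted embeddings (one per class) and a rubbing query.
gallery = [("sun", [0.0, 1.0]), ("moon", [1.0, 0.0])]
rubbing_query = [0.1, 0.8]
label, dist = classify_by_nearest_neighbor(rubbing_query, gallery)

# Incremental recognition: a new class is supported by adding its
# handprinted embedding to the gallery, without retraining anything.
gallery.append(("rain", [-1.0, -1.0]))
new_label, _ = classify_by_nearest_neighbor([-0.9, -1.2], gallery)
```

Because classification is a lookup against the gallery, adding a class only requires a handprinted sample of it, which is consistent with the abstract's remark that new rubbing classes can be recognized incrementally.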
  • Fig.  1  Oracle character images of different modalities

    Fig.  2  Incomplete glyphs and heavy noise in rubbing oracle character images

    Fig.  3  Rubbing oracle character recognition based on cross-modal deep metric learning

    Fig.  4  Structure of the oracle character encoder

    Fig.  5  Learning objective of the triplet loss function

    Fig.  6  Network structure of the discriminator

    Fig.  7  Feature distributions at different stages

    Fig.  8  Visualization of oracle character features of different modalities

    Fig.  9  Distribution of sample counts over the 241 classes of rubbing oracle characters

    Fig.  10  Intra-class samples of rubbing oracle characters; each column belongs to one class

    Fig.  11  Nearest-neighbor pairs after domain adaptation

    Fig.  12  Relationship between confidence threshold and recognition accuracy

    Fig.  13  Relationship between the number of classes in nearest-neighbor retrieval and recognition accuracy

    Fig.  14  Multiple glyph structures of the same class

    Table  1  Effects of different image scales

    Image size    Recognition rate (%)
    32×32         76.80
    64×64         82.10
    128×128       83.40

    Table  2  Comparison of classification accuracy on rubbing oracle characters

    Method                                    Recognition rate (%)
    Single-modal nearest neighbor             74.14
    Single-modal CNN                          84.40
    Cross-modal nearest neighbor              82.10
    CNN fused with cross-modal information    86.70

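    Table  3  Recognition performance on rubbing oracle characters of new classes

    Feature learning method                                    Cross-modal nearest-neighbor classification accuracy (%)
    Metric learning + domain adaptation                        43.67
    Metric learning + domain adaptation + feature correction   62.10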
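The triplet loss named in the caption of Fig. 5 can be sketched as follows. This page does not state the exact form used in the paper, so the sketch assumes the widely used hinge form on squared Euclidean distances; the function names, the margin value of 0.2, and the pairing of a rubbing anchor with handprinted positive and negative samples are illustrative assumptions.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-form triplet loss (assumed form, hypothetical margin).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least the margin.
    """
    gap = squared_euclidean(anchor, positive) - squared_euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical embeddings: a rubbing anchor, a handprinted sample of the
# same class (positive), and handprinted samples of another class.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
far_negative = [1.0, 0.0]    # already separated by more than the margin
near_negative = [0.2, 0.0]   # too close to the anchor, so it is penalized

satisfied = triplet_loss(anchor, positive, far_negative)
violated = triplet_loss(anchor, positive, near_negative)
```

In this toy case the first triplet contributes no loss, while the second contributes a positive loss that would push the negative sample away from the anchor during training.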
Publication history
  • Received:  2020-06-22
  • Accepted:  2020-10-19
  • Published online:  2021-02-26
  • Issue date:  2021-04-23
