Oracle Character Recognition Based on Cross-Modal Deep Metric Learning

Zhang Yi-Kang, Zhang Heng, Liu Yong-Ge, Liu Cheng-Lin

Citation: Zhang Yi-Kang, Zhang Heng, Liu Yong-Ge, Liu Cheng-Lin. Oracle character recognition based on cross-modal deep metric learning. Acta Automatica Sinica, 2021, x(x): 1−10 doi: 10.16383/j.aas.c200443

doi: 10.16383/j.aas.c200443
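Funds: Supported by the Major Project for New Generation AI (2018AAA0100400), the National Natural Science Foundation of China (61936003, 61721004), and the Open Project of the Ministry of Education Key Laboratory of Oracle Information Processing, Anyang Normal University (KFKT2018001)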
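Details
    About the authors:

    Zhang Yi-Kang: Master's student at the University of Chinese Academy of Sciences. Received a bachelor's degree from China Agricultural University. Research interest: character recognition. Email: yikang.zhang@nlpr.ia.ac.cn

    Zhang Heng: Associate researcher at the Institute of Automation, Chinese Academy of Sciences. Received a bachelor's degree from the University of Science and Technology of China in 2007 and a Ph.D. from the University of Chinese Academy of Sciences in 2013. Research interests: document image analysis and recognition. Email: heng.zhang@ia.ac.cn

    Liu Yong-Ge: Professor at Anyang Normal University and leader of a Ministry of Education innovation team. Received a master's degree from Northwestern Polytechnical University. Visiting scholar at the University of California, Los Angeles, from 2012 to 2013. Current research interests include oracle bone inscription information processing and multimedia analysis. Email: ay_liuyongge@163.com

    Liu Cheng-Lin: Director of the National Laboratory of Pattern Recognition, professor and doctoral supervisor, and vice dean of the School of Artificial Intelligence, University of Chinese Academy of Sciences. Research interests include image processing, pattern recognition, machine learning, character recognition, and document analysis. Has published more than 300 papers in domestic and international journals and conferences. Email: liucl@nlpr.ia.ac.cn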

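  • Abstract: Oracle character images fall into two categories: rubbing images and handprinted images. Rubbing oracle character images are the original rubbings taken from carriers such as tortoise shells and animal bones, while handprinted oracle character images are high-definition images written by hand by experts. Rubbing samples are hard to obtain, whereas handprinted samples are relatively easy to obtain. To improve the recognition of rubbing oracle characters, this paper proposes an oracle character recognition method based on cross-modal deep metric learning. By modeling a shared feature space for handprinted and rubbing oracle characters and applying nearest-neighbor classification, the method achieves cross-modal recognition of rubbing oracle characters. Experimental results show that, on the rubbing oracle character recognition task, the proposed cross-modal learning method clearly outperforms single-modal methods, and that it can also incrementally recognize rubbing oracle characters from new categories.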
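
As a rough illustration of the idea in the abstract, the sketch below pairs a triplet-style metric-learning loss with nearest-neighbor classification in a shared feature space. It is not the authors' implementation: the encoder is omitted, and the toy 2-D embeddings, the labels, the margin value, and the use of plain Euclidean distance are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin value is an assumption).

    The loss is zero once the negative sample is at least `margin`
    farther from the anchor than the positive sample is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def nearest_neighbor_label(query, gallery):
    """Return the label of the gallery embedding closest to `query`.

    `gallery` is a list of (label, embedding) pairs, for example
    embeddings of handprinted characters; `query` stands for the
    embedding of a rubbing image in the same feature space.
    """
    label, _ = min(gallery, key=lambda item: euclidean(query, item[1]))
    return label


# Toy embeddings standing in for the output of a trained encoder
# (hypothetical labels and values).
gallery = [("sun", [0.0, 0.0]), ("moon", [3.0, 4.0])]
predicted = nearest_neighbor_label([2.5, 3.5], gallery)

# A new category is handled by adding its embedding to the gallery;
# no classifier layer has to be retrained in this sketch.
gallery.append(("rain", [10.0, 0.0]))
predicted_new = nearest_neighbor_label([9.0, 0.5], gallery)
```

Because classification here is a lookup against gallery embeddings, extending the gallery is enough to cover an unseen category, which is one plausible reading of the incremental recognition mentioned in the abstract.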
  • Fig. 1  Oracle character images in different modalities

    Fig. 2  Incomplete glyphs and heavy noise in rubbing oracle characters

    Fig. 3  Rubbing oracle character recognition based on cross-modal deep metric learning

    Fig. 4  Structure of the oracle character encoder

    Fig. 5  Learning objective of the triplet loss function

    Fig. 6  Network structure of the discriminator

    Fig. 7  Feature distributions at different stages

    Fig. 8  Visualization of oracle character features in different modalities

    Fig. 9  Distribution of sample counts across the 241 categories of rubbing oracle characters

    Fig. 10  Examples of rubbing oracle character samples within a class; each column belongs to the same class

    Fig. 11  Nearest-neighbor pairs after domain adaptation

    Fig. 12  Recognition accuracy versus confidence threshold

    Fig. 13  Recognition accuracy versus number of categories in nearest-neighbor retrieval

    Fig. 14  Different glyph structures of the same character

    Table 1  Effects of different image scales

    Image size    Recognition rate (%)
    32×32         76.80
    64×64         82.10
    128×128       83.40

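    Table 2  Comparison of classification accuracy of different methods on rubbing oracle characters

    Method                                    Recognition rate (%)
    Single-modal nearest neighbor             74.14
    Single-modal CNN                          84.40
    Cross-modal nearest neighbor              82.10
    CNN fused with cross-modal information    86.70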

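    Table 3  Recognition performance on new categories of rubbing oracle characters

    Feature learning method                                     Cross-modal nearest-neighbor accuracy (%)
    Metric learning + domain adaptation                         43.67
    Metric learning + domain adaptation + feature correction    62.10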
Publication history
  • Received: 2020-06-22
  • Accepted: 2020-10-19
  • Published online: 2021-02-26
