
Open-set text recognition via part-based similarity

Liu Chang, Yang Chun, Yin Xu-Cheng

Liu Chang, Yang Chun, Yin Xu-Cheng. Open-set text recognition via part-based similarity. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c230545


doi: 10.16383/j.aas.c230545


Funds: Supported by National Science and Technology Major Project (2020AAA0109701), National Science Fund for Distinguished Young Scholars (62125601), National Natural Science Foundation of China (62076024)
More Information
    Author Bio:

    LIU Chang Ph.D. at University of Science and Technology Beijing. He received his Ph.D. degree from University of Science and Technology Beijing in 2024. His research interests cover text detection, few-shot learning, and text recognition. E-mail: lasercat@gmx.us

    YANG Chun Lecturer at University of Science and Technology Beijing. He received his Ph.D. degree from University of Science and Technology Beijing in 2018. His current research interests include pattern recognition, classifier ensemble, and document analysis and recognition. He is the corresponding author of this paper. E-mail: chunyang@ustb.edu.cn

    YIN Xu-Cheng Professor at University of Science and Technology Beijing. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2006. His current research interests include computer vision, document analysis, information retrieval, NLP, and AI accelerator technology. E-mail: xuchengyin@ustb.edu.cn

  • Abstract: Open-set text recognition (OSTR) is a newly proposed task that aims to address language-model bias as well as the recognition and rejection of novel characters in open-environment text recognition applications. Recent OSTR methods mitigate language-model bias by decoupling contextual information from visual information. However, these methods tend to overlook the importance of fine-grained visual details of characters. Since contextual information is biased, local detail information becomes all the more important for distinguishing visually similar characters. This paper proposes an open-set text recognition framework based on adaptive character part representations, building an OSTR method that measures the similarity of local character structures; by explicitly modeling different character parts, it improves the modeling of local detail features. Unlike radical-based methods, the proposed framework adopts a data-driven part design, making it language-independent and capable of cross-lingual generalized recognition. In addition, we propose a locality-constraint regularization term that stabilizes training. Extensive comparative experiments show that the proposed method performs well on both open-set and conventional closed-set text recognition tasks.
    1) Code, models, and documentation are available at: https://github.com/lancercat/OAPR
    2) The typeface of a mathematical symbol indicates its type: $ \text{X} $ denotes a function, $ \mathbf{X} $ a set, $ X $ a variable, $ \boldsymbol{X} $ an array, and $ x $ a single number.
    3) Specifically, traditional Chinese characters with no corresponding data are removed from the baseline's training; in some previous works, such characters were always treated as negative samples for lack of data, which hurt performance. Note that this work actually uses strictly less information during training, so the comparison remains fair.
    4) Note that the regions occupied by characters in the feature space may overlap.
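As a rough illustration of the recognition-by-similarity idea in the abstract, the sketch below matches a character's feature vector against prototypes of seen classes and rejects it as novel when no prototype is similar enough. This is a toy construction under our own assumptions (the names `classify_or_reject`, the cosine scoring, and the fixed threshold are hypothetical), not the paper's actual implementation:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def classify_or_reject(char_feat, prototypes, threshold=0.5):
    """Match a character feature against known-class prototypes;
    return the best label, or None (reject as novel) when no
    prototype is similar enough."""
    scores = {label: cosine(char_feat, proto) for label, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Toy prototypes for two "seen" characters.
protos = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
print(classify_or_reject(np.array([0.9, 0.1]), protos))    # close to "a"
print(classify_or_reject(np.array([-1.0, -1.0]), protos))  # rejected -> None
```

In the paper's framework the compared features would be part-based rather than whole-character, but the open-set accept/reject decision works on the same principle.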
  • Fig. 1  The whole-character-based baseline (a) tends to confuse visually similar characters; this issue is alleviated to some extent by the proposed adaptive-part-representation-based open-set text recognition framework (b)

    Fig. 2  An illustration of the open-set text recognition task[24]

    Fig. 3  The proposed adaptive part representation based open-set text recognition framework (OAPR)

    Fig. 4  The proposed part attention line module (PALM), which serializes character image features into per-time-step character part features
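The attention-based serialization described in the Fig. 4 caption can be sketched as follows. This is a minimal toy construction under our own assumptions (the shapes, the learned query bank, and the softmax pooling are hypothetical simplifications), not the paper's PALM:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def serialize_parts(feat_map, query_bank):
    """Turn a flattened (N, C) image feature map into T part features,
    one per time step, via softmax attention pooling.

    feat_map:   (N, C) flattened spatial features
    query_bank: (T, C) one learned query per time step / part
    returns:    (T, C) attention-pooled part features
    """
    attn = softmax(query_bank @ feat_map.T, axis=-1)  # (T, N) attention maps
    return attn @ feat_map                            # (T, C) pooled parts

rng = np.random.default_rng(0)
feat = rng.normal(size=(32 * 8, 64))   # e.g. a 32x8 spatial grid, 64 channels
queries = rng.normal(size=(4, 64))     # 4 part queries
parts = serialize_parts(feat, queries)
print(parts.shape)  # (4, 64)
```

Each row of `attn` is a normalized spatial attention map, so each output row is a convex combination of spatial features, i.e. one "part" feature per time step.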

    Fig. 5  The proposed part attention character module (PACM)

    Fig. 6  The proposed part similarity recognition module (PSRM)

    Fig. 7  Detailed results of each individual run in the ablative studies

    Fig. 8  Comparison between the baseline method and the proposed OAPR framework. In each result clip, the top row shows the ground truth (white: characters seen during training; yellow: novel characters) and the bottom row shows the prediction (green: correctly recognized characters, matched by longest common subsequence; red: wrong predictions)

    Fig. 9  Sample results on the Japanese testing set (GZSL split). The top row shows the ground truth (white: seen characters; yellow: novel characters); the bottom row shows the prediction (green: correct, matched by longest common subsequence; red: wrong)

    Fig. 10  Sample recognition results on Korean. The top row shows the ground truth (white: seen characters; yellow: novel characters); the bottom row shows the prediction (green: correct, matched by longest common subsequence; red: wrong; purple blocks: characters rejected by the model)

    Fig. 11  Sample results on the closed-set benchmark. The top row shows the ground truth (white: seen characters; yellow: novel characters); the bottom row shows the prediction (green: correct, matched by longest common subsequence; red: wrong; purple blocks: characters rejected by the model)

    Table 1  Ablative studies

    | Method | Adaptive part representation | Locality constraint | Avg LA ↑ | Gap LA ↓ |
    |---|---|---|---|---|
    | Ours | ✓ | ✓ | 39.61 | 4.91 |
    | Adaptive part representation only | ✓ | | 38.91 | 6.54 |
    | Whole-character features | | | 34.04 | 2.27 |

    Table 2  Performance on open-set text recognition benchmarks

    | Split | $ \mathbf{C}_{test}^k $ | $ \mathbf{C}_{test}^u $ | Name | Venue | LA(%) | Recall(%) | Precision(%) | F-measure(%) |
    |---|---|---|---|---|---|---|---|---|
    | GZSL | Shared Kanji, Unique Kanji, Kana, Latin | $ \emptyset $ | OSOCR-Large[8] | PR'2023 | 30.83 | | | |
    | | | | OpenCCD[9] | CVPR'2022 | 36.57 | | | |
    | | | | OpenCCD-Large[9] | CVPR'2022 | 41.31 | | | |
    | | | | Ours | | 39.61 | | | |
    | | | | Ours-Large | | 40.91 | | | |
    | OSR | Shared Kanji, Latin | Unique Kanji, Kana | OSOCR-Large[8] | PR'2023 | 74.35 | 11.27 | 98.28 | 20.23 |
    | | | | OpenCCD-Large*[9] | CVPR'2022 | 84.76 | 30.63 | 98.90 | 46.78 |
    | | | | Ours | | 73.56 | 64.30 | 96.21 | 76.66 |
    | | | | Ours-Large | | 77.15 | 60.59 | 96.80 | 74.52 |
    | GOSR | Shared Kanji, Unique Kanji, Latin | Kana | OSOCR-Large[8] | PR'2023 | 56.03 | 3.03 | 63.52 | 5.78 |
    | | | | OpenCCD-Large*[9] | CVPR'2022 | 68.29 | 3.47 | 86.11 | 6.68 |
    | | | | Ours | | 65.07 | 54.12 | 82.52 | 64.65 |
    | | | | Ours-Large | | 67.40 | 47.64 | 82.99 | 60.53 |
    | OSTR | Shared Kanji, Unique Kanji | Kana, Latin | OSOCR-Large[8] | PR'2023 | 58.57 | 24.46 | 93.78 | 38.80 |
    | | | | OpenCCD-Large*[9] | CVPR'2022 | 69.82 | 35.95 | 97.03 | 52.47 |
    | | | | Ours | | 68.20 | 81.04 | 89.86 | 85.07 |
    | | | | Ours-Large | | 69.87 | 75.97 | 91.18 | 82.88 |

    Note: * marks results not reported in the original paper, obtained from the original authors' code repository and released models.
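Assuming the table's F-measure column is the standard harmonic mean of precision and recall (a reading consistent with, e.g., the OSOCR-Large GOSR row; other rows may be aggregated over multiple runs and so deviate slightly), the column can be reproduced with a small helper:

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall, both given in percent.
    return 2 * precision * recall / (precision + recall)

# OSOCR-Large on the GOSR split: P = 63.52, R = 3.03
print(round(f_measure(63.52, 3.03), 2))  # 5.78
```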

    Table 3  Performance on closed-set benchmarks[23] and single-batch inference speed (results for other methods are taken from the respective papers)

    | Method | Venue | IIIT5K | CUTE | SVT | IC03 | IC13 | GPU | TFlops | FPS |
    |---|---|---|---|---|---|---|---|---|---|
    | CA-FCN*[22] | AAAI'19 | 92.0 | 79.9 | 82.1 | – | 91.4 | Titan XP | 12 | 45 |
    | Comb.Best[23] | ICCV'19 | 87.9 | 74.0 | 87.5 | 94.4 | 92.3 | Tesla P40 | 12 | 36 |
    | PREN[47] | CVPR'21 | 92.1 | 81.3 | 92.0 | 94.9 | 94.7 | Tesla V100 | 14 | 44 |
    | JVSR[48] | ICCV'21 | 95.2 | 89.7 | 92.2 | – | 95.5 | RTX 2080Ti | 13.6 | 38 |
    | ABINet[49] | T-PAMI'23 | 96.2 | 89.2 | 93.5 | 97.4 | 95.7 | V100 | 14 | 29.4 |
    | CRNN[21, 23] | T-PAMI'17 | 82.9 | 65.5 | 81.6 | 92.6 | 89.2 | Tesla P40 | 12 | 227 |
    | Rosetta[23, 50] | KDD'18 | 84.3 | 69.2 | 84.7 | 92.9 | 89.0 | Tesla P40 | 12 | 212 |
    | ViTSTR[51] | ICDAR'21 | 88.4 | 81.3 | 87.7 | 94.3 | 92.4 | RTX 2080Ti | 13.6 | 102 |
    | GLaLT-Big-Aug[52] | TNNLS'23 | 90.4 | 77.1 | 90.0 | 95.2 | 95.3 | – | – | 62.1 |
    | Ours-Large | | 89.06 | 77.77 | 80.68 | 89.61 | 87.98 | Tesla P40 | 12 | 85.70 |
  • [1] Li Wen-Ying, Cao Bin, Cao Chun-Shui, Huang Yong-Zhen. A deep learning based method for bronze inscription recognition. Acta Automatica Sinica, 2018, 44(11): 2023−2030
    [2] Zheng T L, Chen Z N, Huang B C, Zhang W, Jiang Y G. MRN: Multiplexed routing network for incremental multilingual text recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE, 2023. 18598−18607
    [3] Ma Si-Liang, Xu Yong. Calligraphy character recognition method driven by stacked model. Acta Automatica Sinica, 2024, 50(5): 947−957
    [4] Zhang Yi-Kang, Zhang Heng, Liu Yong-Ge, Liu Cheng-Lin. Oracle character recognition based on cross-modal deep metric learning. Acta Automatica Sinica, 2021, 47(4): 791−800
    [5] Zhang C H, Gupta A, Zisserman A. Adaptive text recognition through visual matching. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 51−67
    [6] Souibgui M A, Fornés A, Kessentini Y, Megyesi B. Few shots are all you need: A progressive learning approach for low resource handwritten text recognition. Pattern Recognition Letters, 2022, 160: 43−49 doi: 10.1016/j.patrec.2022.06.003
    [7] Kordon F, Weichselbaumer N, Herz R, Mossman S, Potten E, Seuret M, et al. Classification of incunable glyphs and out-of-distribution detection with joint energy-based models. International Journal on Document Analysis and Recognition, 2023, 26(3): 223−240 doi: 10.1007/s10032-023-00442-x
    [8] Liu C, Yang C, Qin H B, Zhu X B, Liu C L, Yin X C. Towards open-set text recognition via label-to-prototype learning. Pattern Recognition, 2023, 134: Article No. 109109 doi: 10.1016/j.patcog.2022.109109
    [9] Liu C, Yang C, Yin X C. Open-set text recognition via character-context decoupling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 4513−4522
    [10] Liu C, Yang C, Yin X C. Open-set text recognition via shape-awareness visual reconstruction. In: Proceedings of the 17th International Conference on Document Analysis and Recognition. San José, USA: Springer, 2023. 89−105
    [11] Yu H Y, Chen J Y, Li B, Ma J, Guan M N, Xu X X, et al. Benchmarking Chinese text recognition: Datasets, baselines, and an empirical study. arXiv: 2112.15093, 2021
    [12] Wan Z Y, Zhang J L, Zhang L, Luo J B, Yao C. On vocabulary reliance in scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11422−11431
    [13] Zhang J Y, Liu C, Yang C. SAN: Structure-aware network for complex and long-tailed Chinese text recognition. In: Proceedings of the 17th International Conference on Document Analysis and Recognition. San José, USA: Springer, 2023. 244−258
    [14] Yao C, Bai X, Shi B G, Liu W Y. Strokelets: A learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 4042−4049
    [15] Seok J H, Kim J H. Scene text recognition using a Hough forest implicit shape model and semi-Markov conditional random fields. Pattern Recognition, 2015, 48(11): 3584−3599 doi: 10.1016/j.patcog.2015.05.004
    [16] Li B C, Tang X, Qi X B, Chen Y H, Xiao R. Hamming OCR: A locality sensitive hashing neural network for scene text recognition. arXiv: 2009.10874, 2020
    [17] Wang T, Xie Z, Li Z, et al. Radical aggregation network for few-shot offline handwritten Chinese character recognition. Pattern Recognition Letters, 2019, 125: 821−827 doi: 10.1016/j.patrec.2019.08.005
    [18] Cao Z, Lu J, Cui S, Zhang C S. Zero-shot handwritten Chinese character recognition with hierarchical decomposition embedding. Pattern Recognition, 2020, 107: Article No. 107488 doi: 10.1016/j.patcog.2020.107488
    [19] Chen J Y, Li B, Xue X Y. Zero-shot Chinese character recognition with stroke-level decomposition. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence. Montreal, Canada: IJCAI.org, 2021. 615−621
    [20] Zu X Y, Yu H Y, Li B, Xue X Y. Chinese character recognition with augmented character profile matching. In: Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM, 2022. 6094−6102
    [21] Shi B G, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(11): 2298−2304 doi: 10.1109/TPAMI.2016.2646371
    [22] Liao M H, Zhang J, Wan Z Y, Xie F M, Liang J J, Lyu P Y, et al. Scene text recognition from two-dimensional perspective. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, The 31st Innovative Applications of Artificial Intelligence Conference, The 9th AAAI Symposium on Educational Advances in Artificial Intelligence. Honolulu, USA: AAAI Press, 2019. 8714−8721
    [23] Baek J, Kim G, Lee J, Park S, Han D, Yun S, et al. What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 4714−4722
    [24] Yang Chun, Liu Chang, Fang Zhi-Yu, Han Zheng, Liu Cheng-Lin, Yin Xu-Cheng. Open set text recognition technology. Journal of Image and Graphics, 2023, 28(6): 1767−1791 doi: 10.11834/jig.230018
    [25] He J, Chen J N, Lin M X, Yu Q H, Yuille A. Compositor: Bottom-up clustering and compositing for robust part and object segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 11259−11268
    [26] Pourpanah F, Abdar M, Luo Y X, Zhou X L, Wang R, Lim C P, et al. A review of generalized zero-shot learning methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4051−4070
    [27] Zhang J S, Du J, Dai L R. Radical analysis network for learning hierarchies of Chinese characters. Pattern Recognition, 2020, 103: Article No. 107305 doi: 10.1016/j.patcog.2020.107305
    [28] He S, Schomaker L. Open set Chinese character recognition using multi-typed attributes. arXiv: 1808.08993, 2018
    [29] Huang Y H, Jin L W, Peng D Z. Zero-shot Chinese text recognition via matching class embedding. In: Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer, 2021. 127−141
    [30] Wang W C, Zhang J S, Du J, Wang Z R, Zhu Y X. DenseRAN for offline handwritten Chinese character recognition. In: Proceedings of the 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). Niagara Falls, USA: IEEE, 2018. 104−109
    [31] Chen S, Zhao Q. Divide and conquer: Answering questions with object factorization and compositional reasoning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 6736−6745
    [32] Geng Z G, Wang C Y, Wei Y X, Liu Z, Li H Q, Hu H. Human pose as compositional tokens. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 660−671
    [33] Zhang H, Li F, Liu S L, Zhang L, Su H, Zhu J, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection. In: Proceedings of the 11th International Conference on Learning Representations. Kigali, Rwanda: OpenReview.net, 2023.
    [34] Chng C K, Liu Y L, Sun Y P, Ng C C, Luo C J, Ni Z H, et al. ICDAR2019 robust reading challenge on arbitrary-shaped text-RRC-ArT. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR). Sydney, Australia: IEEE, 2019. 1571−1576
    [35] Sun Y P, Ni Z H, Chng C K, Liu Y L, Luo C J, Ng C C, et al. ICDAR 2019 competition on large-scale street view text with partial labeling-RRC-LSVT. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR). Sydney, Australia: IEEE, 2019. 1557−1562
    [36] Yuan T L, Zhu Z, Xu K, Li C J, Mu T J, Hu S M. A large Chinese text dataset in the wild. Journal of Computer Science and Technology, 2019, 34(3): 509−521 doi: 10.1007/s11390-019-1923-y
    [37] Shi B G, Yao C, Liao M H, Yang M K, Xu P, Cui L Y, et al. ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Kyoto, Japan: IEEE, 2017. 1429−1434
    [38] Nayef N, Patel Y, Busta M, Chowdhury P N, Karatzas D, Khlif W, et al. ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition-RRC-MLT-2019. In: Proceedings of the International Conference on Document Analysis and Recognition (ICDAR). Sydney, Australia: IEEE, 2019. 1582−1587
    [39] Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Synthetic data and artificial neural networks for natural scene text recognition. In: Proceedings of the Deep Learning Workshop, Neural Information Processing Systems, 2014
    [40] Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE Computer Society, 2016. 2315−2324
    [41] Mishra A, Alahari K, Jawahar C V. Scene text recognition using higher order language priors. In: Proceedings of the British Machine Vision Conference. Surrey, UK: BMVA Press, 2012. 1−11
    [42] Risnumawan A, Shivakumara P, Chan C S, Tan C L. A robust arbitrary text detection system for natural scene images. Expert Systems With Applications, 2014, 41(18): 8027−8048 doi: 10.1016/j.eswa.2014.07.008
    [43] Wang K, Babenko B, Belongie S. End-to-end scene text recognition. In: Proceedings of the IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE Computer Society, 2011. 1457−1464
    [44] Lucas S M, Panaretos A, Sosa L, Tang A, Wong S, Young R, et al. ICDAR 2003 robust reading competitions: Entries, results, and future directions. International Journal of Document Analysis and Recognition, 2005, 7(2-3): 105−122 doi: 10.1007/s10032-004-0134-3
    [45] Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda L G I, Mestre S R, et al. ICDAR 2013 robust reading competition. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, USA: IEEE Computer Society, 2013. 1484−1493
    [46] Geng C X, Huang S J, Chen S C. Recent advances in open set recognition: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3614−3631 doi: 10.1109/TPAMI.2020.2981604
    [47] Yan R J, Peng L R, Xiao S Y, Yao G. Primitive representation learning for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 284−293
    [48] Bhunia A K, Sain A, Kumar A, Ghose S, Chowdhury P N, Song Y Z. Joint visual semantic reasoning: Multi-stage decoder for text recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 14920−14929
    [49] Fang S C, Mao Z D, Xie H T, Wang Y X, Yan C G, Zhang Y D. ABINet++: Autonomous, bidirectional and iterative language modeling for scene text spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 7123−7141 doi: 10.1109/TPAMI.2022.3223908
    [50] Borisyuk F, Gordo A, Sivakumar V. Rosetta: Large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, UK: ACM, 2018. 71−79
    [51] Atienza R. Vision transformer for fast and efficient scene text recognition. In: Proceedings of the 16th International Conference on Document Analysis and Recognition. Lausanne, Switzerland: Springer, 2021. 319−334
    [52] Zhang H, Luo G Y, Kang J, Huang S, Wang X, Wang F Y. GLaLT: Global-local attention-augmented light transformer for scene text recognition. IEEE Transactions on Neural Networks and Learning Systems, to be published, DOI: 10.1109/TNNLS.2023.3239696
    [53] Fang S C, Xie H T, Wang Y X, Mao Z D, Zhang Y D. Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 7098−7107
Publication history
  • Received: 2023-09-04
  • Accepted: 2024-04-19
  • Published online: 2024-07-11
