结合感受野增强和全卷积网络的场景文字检测方法

李晓玉 宋永红 余涛

引用本文: 李晓玉, 宋永红, 余涛. 结合感受野增强和全卷积网络的场景文字检测方法. 自动化学报, 2022, 48(3): 797−807 doi: 10.16383/j.aas.c190376
Citation: Li Xiao-Yu, Song Yong-Hong, Yu Tao. Text detection in natural scene images based on enhanced receptive field and fully convolution network. Acta Automatica Sinica, 2022, 48(3): 797−807 doi: 10.16383/j.aas.c190376

结合感受野增强和全卷积网络的场景文字检测方法

doi: 10.16383/j.aas.c190376
基金项目: 陕西省自然科学基础研究计划 (2018JM6104), 国家重点研究开发计划 (017YFB1301101)资助
详细信息
    作者简介:

    李晓玉:西安交通大学软件学院硕士研究生. 主要研究方向为自然场景文字检测技术. E-mail: 18155760591@163.com

    宋永红:西安交通大学人工智能学院研究员. 主要研究方向为图像与视频内容理解, 智能软件开发. 本文通信作者. E-mail: songyh@xjtu.edu.cn

    余涛:西安交通大学软件学院硕士研究生. 2018年获得西安交通大学软件学院学士学位. 主要研究方向为自然场景文字检测技术. E-mail: yyttmonster@outlook.com

Text Detection in Natural Scene Images Based on Enhanced Receptive Field and Fully Convolution Network

Funds: Supported by Natural Science Basic Research Program of Shaanxi (2018JM6104) and National Key Research and Development Program of China (017YFB1301101)
More Information
    Author Bio:

    LI Xiao-Yu Master student at the School of Software Engineering, Xi'an Jiaotong University. Her research interest covers text detection in natural scenes

    SONG Yong-Hong Professor at the College of Artificial Intelligence, Xi'an Jiaotong University. Her research interest covers image and video content understanding, and intelligent software development. Corresponding author of this paper

    YU Tao Master student at the School of Software Engineering, Xi'an Jiaotong University. He received his bachelor degree from Xi'an Jiaotong University in 2018. His research interest covers text detection in natural scenes

  • 摘要: 自然场景图像质量易受光照及采集设备的影响, 且其背景复杂, 图像中文字颜色、尺度、排列方向多变, 因此, 自然场景文字检测具有很大的挑战性. 本文提出一种基于全卷积网络的端对端文字检测器, 集中精力在网络结构和损失函数的设计, 通过设计感受野模块并引入 Focal loss、GIoU loss 进行像素点分类和文字包围框回归, 从而获得更加稳定且准确的多方向文字检测器. 实验结果表明本文方法与现有先进方法相比, 无论是在多方向场景文字数据集还是水平场景文字数据集均取得了具有可比性的成绩.

    Abstract: The quality of natural scene images is easily degraded by illumination and the capture device; backgrounds are cluttered, and the color, scale, and orientation of text vary widely, which makes scene text detection highly challenging. This paper presents an end-to-end text detector based on a fully convolutional network, focusing on the design of the network structure and the loss functions: a receptive field enhancement module is designed, and focal loss and GIoU loss are introduced for pixel classification and text bounding-box regression, yielding a more stable and accurate multi-oriented text detector. Experiments show that the proposed method achieves results comparable to existing state-of-the-art methods on both multi-oriented and horizontal scene text datasets.
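The two loss terms named in the abstract are standard building blocks, so a minimal sketch can make the objective concrete. The Python code below (not the authors' implementation) combines focal loss [10] for per-pixel text/non-text classification of the score map with GIoU loss [11] for axis-aligned bounding-box regression; the tensor layouts, the hyper-parameters alpha, gamma and lambda_geo, and the (x1, y1, x2, y2) box format are illustrative assumptions.

```python
# Hedged sketch of the training objective described in the abstract:
# focal loss [10] on a per-pixel text/non-text score map plus GIoU loss [11]
# on box regression. Shapes, hyper-parameters, and the axis-aligned box
# format are assumptions for illustration, not the paper's exact setup.
import torch

def focal_loss(pred_logits, target, alpha=0.25, gamma=2.0):
    """Binary focal loss; `target` holds 0/1 labels per pixel."""
    p = torch.sigmoid(pred_logits)
    pt = torch.where(target == 1, p, 1 - p)            # prob. of the true class
    alpha_t = torch.where(target == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return (-alpha_t * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))).mean()

def giou_loss(pred, gt):
    """GIoU loss for boxes given as (x1, y1, x2, y2) with shape (N, 4)."""
    ix1 = torch.max(pred[:, 0], gt[:, 0]); iy1 = torch.max(pred[:, 1], gt[:, 1])
    ix2 = torch.min(pred[:, 2], gt[:, 2]); iy2 = torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / union.clamp(min=1e-6)
    # Smallest enclosing box C; its extra area separates the cases of Fig. 6.
    cx1 = torch.min(pred[:, 0], gt[:, 0]); cy1 = torch.min(pred[:, 1], gt[:, 1])
    cx2 = torch.max(pred[:, 2], gt[:, 2]); cy2 = torch.max(pred[:, 3], gt[:, 3])
    c_area = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=1e-6)
    giou = iou - (c_area - union) / c_area
    return (1 - giou).mean()

def detection_loss(score_logits, score_gt, boxes_pred, boxes_gt, lambda_geo=1.0):
    """Total loss = pixel classification term + weighted box regression term."""
    return focal_loss(score_logits, score_gt) + lambda_geo * giou_loss(boxes_pred, boxes_gt)
```

Focal loss down-weights the many easy background pixels, which matters because text occupies only a small fraction of the score map, while GIoU keeps a useful gradient even when predicted and ground-truth boxes barely overlap, the situations contrasted in Fig. 6.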
  • 图  1  本文方法检测流程图

    Fig.  1  Flow chart of our detection method

    图  2  本文方法网络结构图

    Fig.  2  Structure of our network

    图  3  离心率与感受野的关系图

    Fig.  3  Relationship between eccentricity and receptive field size

    图  4  感受野增强模块

    Fig.  4  Receptive field block

    图  5  不同膨胀因子的空洞卷积

    Fig.  5  Dilated convolution with different dilation rates (a minimal code sketch follows Table 5)

    图  6  三种IoU相等的情况[11]

    Fig.  6  Three situations with the same IoU[11]

    图  7  各种方法在ICDAR2015测试集检测结果比较

    Fig.  7  Qualitative comparison on ICDAR2015 dataset

    图  8  本文方法在各个数据集上检测结果比较

    Fig.  8  Comparison of detection results on different datasets

    图  9  本文方法检测失败的一些场景图像

    Fig.  9  Some scene images where our method fails

    表  1  ICDAR2015测试集检测结果对比

    Table  1  Comparison of detection results on the ICDAR2015 test set

    Method | Recall (R) | Precision (P) | F-measure
    CNN MSER[22] | 0.34 | 0.35 | 0.35
    Islam et al.[25] | 0.64 | 0.78 | 0.70
    AJOU[26] | 0.47 | 0.47 | 0.47
    NJU[22] | 0.36 | 0.70 | 0.48
    StradVision2[22] | 0.37 | 0.77 | 0.50
    Zhang et al.[23] | 0.43 | 0.71 | 0.54
    Tian et al.[27] | 0.52 | 0.74 | 0.61
    Yao et al.[28] | 0.59 | 0.72 | 0.65
    Liu et al.[29] | 0.682 | 0.732 | 0.706
    Shi et al.[24] | 0.768 | 0.731 | 0.750
    EAST PVANET[15] | 0.7135 | 0.8086 | 0.7571
    EAST PVANET2x[15] | 0.735 | 0.836 | 0.782
    EAST PVANET2x MS[15] | 0.783 | 0.833 | 0.807
    TextBoxes++[30] | 0.767 | 0.872 | 0.817
    RRD[8] | 0.79 | 0.8569 | 0.822
    TextSnake[6] | 0.804 | 0.849 | 0.826
    TextBoxes++ MS[30] | 0.785 | 0.878 | 0.829
    Lv et al.[7] | 0.895 | 0.797 | 0.843
    Ours (this paper) | 0.789 | 0.854 | 0.82

    表  2  MSRA-TD500测试集检测结果对比

    Table  2  Comparison of detection results on the MSRA-TD500 test set

    Method | Recall (R) | Precision (P) | F-measure
    Epshtein et al.[31] | 0.25 | 0.25 | 0.25
    TD-ICDAR[21] | 0.52 | 0.53 | 0.50
    Zhang et al.[23] | 0.43 | 0.71 | 0.54
    TD-Mixture[21] | 0.63 | 0.63 | 0.60
    Yao et al.[28] | 0.59 | 0.72 | 0.65
    Kang et al.[32] | 0.62 | 0.71 | 0.66
    Yin et al.[33] | 0.62 | 0.81 | 0.71
    EAST PVANET[15] | 0.6713 | 0.8356 | 0.7445
    EAST PVANET2x[15] | 0.6743 | 0.8728 | 0.7608
    TextSnake[6] | 0.739 | 0.832 | 0.783
    Ours (this paper) | 0.689 | 0.925 | 0.79

    表  3  ICDAR2013测试集检测结果对比

    Table  3  Comparison of detection results on the ICDAR2013 test set

    Method | Recall (R) | Precision (P) | F-measure
    FASText[34] | 0.69 | 0.84 | 0.77
    MMser[35] | 0.70 | 0.86 | 0.77
    Lu et al.[36] | 0.70 | 0.89 | 0.78
    TextFlow[37] | 0.76 | 0.85 | 0.80
    TextBoxes[38] | 0.74 | 0.86 | 0.80
    TextBoxes++[30] | 0.74 | 0.86 | 0.80
    RRD[8] | 0.75 | 0.88 | 0.81
    He et al.[39] | 0.73 | 0.93 | 0.82
    FCN[23] | 0.78 | 0.88 | 0.83
    Qin et al.[40] | 0.79 | 0.89 | 0.83
    Tian et al.[41] | 0.84 | 0.84 | 0.84
    TextBoxes MS[38] | 0.83 | 0.88 | 0.85
    Lv et al.[7] | 0.933 | 0.794 | 0.858
    TextBoxes++ MS[30] | 0.84 | 0.91 | 0.88
    EAST PVANET2x[15] | 0.8267 | 0.9264 | 0.8737
    Tang et al.[42] | 0.87 | 0.92 | 0.90
    Ours (this paper) | 0.858 | 0.931 | 0.893

    表  4  多种文字检测方法在ICDAR2015上的精度和速度对比结果

    Table  4  Comparison of accuracy and speed on ICDAR2015 dataset

    Method | Test image size (pixels) | Device | Frame rate (frame/s) | F-measure
    Zhang et al.[23] | MS | TitanX | 0.476 | 0.54
    Tian et al.[27] | ss-600 | GPU | 7.14 | 0.61
    Yao et al.[28] | 480p | K40m | 1.61 | 0.65
    Shi et al.[24] | 768 × 768 | TitanX | 8.9 | 0.750
    EAST PVANET[15] | 720p | TitanX | 16.8 | 0.757
    EAST PVANET2x[15] | 720p | TitanX | 13.2 | 0.782
    TextBoxes++[30] | 1024 × 1024 | TitanX | 11.6 | 0.817
    RRD[8] | 1024 × 1024 | TitanX | 6.5 | 0.822
    TextSnake[6] | 1280 × 768 | TitanX | 1.1 | 0.826
    TextBoxes++ MS[30] | MS | TitanX | 2.3 | 0.829
    Lv et al.[7] | 512 × 512 | TitanX | 1 | 0.843
    Ours (this paper) | 720p | TitanX | 12.5 | 0.82

    表  5  本文方法各组件在ICDAR2015数据集上的作用效果

    Table  5  Effectiveness of various designs on ICDAR2015 dataset

    ResNet50 | Receptive field enhancement module | Focal loss | GIoU loss | Recall (R) | Precision (P) | F-measure
    × | × | × | × | 0.735 | 0.836 | 0.782
    √ | × | × | × | 0.764 | 0.833 | 0.797
    √ | √ | × | × | 0.766 | 0.845 | 0.802
    √ | √ | √ | × | 0.776 | 0.853 | 0.813
    √ | √ | √ | √ | 0.789 | 0.854 | 0.82
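Figures 4 and 5 describe the receptive field enhancement module in terms of dilated convolutions. The sketch below shows one generic way to build such a block from parallel 3 × 3 convolutions with different dilation rates, in the spirit of RFB [16]; the branch width, the dilation rates (1, 3, 5), and the residual 1 × 1 fusion are illustrative assumptions rather than the exact configuration used in this paper.

```python
# Hedged sketch of a receptive-field enhancement block (cf. Figs. 4-5 and
# RFB [16]): parallel 3x3 dilated convolutions are concatenated and fused.
# Branch widths and dilation rates are assumptions for illustration only.
import torch
import torch.nn as nn

class ReceptiveFieldBlock(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 3, 5)):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
                # padding = dilation keeps the spatial resolution unchanged
                nn.Conv2d(branch_ch, branch_ch, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(branch_ch * len(dilations), out_ch, kernel_size=1)
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(self.fuse(out) + self.shortcut(x))   # residual fusion

# Example: a 64-channel feature map keeps its 128 x 128 resolution.
if __name__ == "__main__":
    rfb = ReceptiveFieldBlock(64, 96)
    print(rfb(torch.randn(1, 64, 128, 128)).shape)   # torch.Size([1, 96, 128, 128])
```

Because a 3 × 3 kernel with dilation d covers a (2d + 1) × (2d + 1) window, concatenating branches with increasing d enlarges the receptive field without extra downsampling, loosely mirroring the eccentricity-dependent receptive fields sketched in Fig. 3.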
  • [1] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, et al. SSD: Single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 21−37
    [2] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proceedings of the 2015 Advances in Neural Information Processing Systems. NIPS, 2015. 91−99
    [3] He W H, Zhang X Y, Yin F, Liu C L. Deep direct regression for multi-oriented scene text detection. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 745−753
    [4] Deng D, Liu H F, Li X L, Cai D. Pixellink: Detecting scene text via instance segmentation. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, LA, USA: AAAI, 2018. 6773−6780
    [5] Yan S, Feng W, Zhao P, Liu C L. Progressive scale expansion network with octave convolution for arbitrary shape scene text detection. In: Proceedings of the 2019 Asian Conference on Pattern Recognition. Springer, Cham, 2019. 663−676
    [6] Long S B, Ruan J Q, Zhang W J, He X, Wu W H, Yao C. TextSnake: A flexible representation for detecting text of arbitrary shapes. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 19−35
    [7] Lv P Y, Yao C, Wu W H, Yan S C, Bai X. Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 7553−7563
    [8] Liao M H, Zhu Z, Shi B G, Xia G S, Bai X. Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 5909−5918
    [9] Lyu P Y, Liao M H, Yao C, Wu W H, Bai X. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 71−88
    [10] Lin T Y, Goyal P, Girshick R, He K M, Dollar P. Focal loss for dense object detection. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2999−3007
    [11] Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 658−666
    [12] Lin T Y, Dollar P, Girshick R, He K M, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 936−944
    [13] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60(6): 84-90 doi: 10.1145/3065386
    [14] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 770−778
    [15] Zhou X Y, Yao C, Wen H, Wang Y Z, Zhou S C, He W R, et al. EAST: An efficient and accurate scene text detector. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 2642−2651
    [16] Liu S T, Huang D, Wang Y H. Receptive field block net for accurate and fast object detection. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 404−419
    [17] Szegedy C, Ioffe S, Vanhoucke V, Alemi A A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017. 4278−4284
    [18] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv: 1511.07122, 2015.
    [19] Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda L G I, Mestre S R, et al. ICDAR 2013 robust reading competition. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, USA: IEEE, 2013. 1484−1493
    [20] Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, et al. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis, Tunisia: IEEE, 2015. 1156−1160
    [21] Yao C, Bai X, Liu W Y, Ma Y, Tu Z W. Detecting texts of arbitrary orientations in natural images. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012. 1083−1090
    [22] Yao C, Bai X, Liu W Y. A unified framework for multioriented text detection and recognition. IEEE Transactions on Image Processing, 2014, 23(11): 4737-4749 doi: 10.1109/TIP.2014.2353813
    [23] Zhang Z, Zhang C Q, Shen W, Yao C, Liu W Y, Bai X. Multi-oriented text detection with fully convolutional networks. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4159−4167
    [24] Shi B G, Bai X, Belongie S. Detecting oriented text in natural images by linking segments. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 3482−3490
    [25] Islam M R, Mondal C, Azam M K, Islam A S M. Text detection and recognition using enhanced MSER detection and a novel OCR technique. In: Proceedings of the 5th International Conference on Informatics, Electronics and Vision (ICIEV). Dhaka, Bangladesh: IEEE, 2016. 15−20
    [26] Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 2315−2324
    [27] Tian Z, Huang W L, He T, He P, Qiao Y. Detecting text in natural image with connectionist text proposal network. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 56−72
    [28] Yao C, Bai X, Sang N, Zhou X Y, Zhou S C, Cao Z M. Scene text detection via holistic, multi-channel prediction. arXiv: 1606.09002, 2016.
    [29] Liu Y L, Jin L W. Deep matching prior network: Toward tighter multi-oriented text detection. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017. 3454−3461
    [30] Liao M H, Shi B G, Bai X. TextBoxes++: A single-shot oriented scene text detector. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690 doi: 10.1109/TIP.2018.2825107
    [31] Epshtein B, Ofek E, Wexler Y. Detecting text in natural scenes with stroke width transform. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, USA: IEEE, 2010. 2963−2970
    [32] Kang L, Li Y, Doermann D. Orientation robust text line detection in natural images. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014. 4034−4041
    [33] Yin X C, Pei W Y, Zhang J, Hao H W. Multi-orientation scene text detection with adaptive clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1930-1937 doi: 10.1109/TPAMI.2014.2388210
    [34] Buta M, Neumann L, Matas J. FASText: Efficient unconstrained scene text detector. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1206−1214
    [35] Zamberletti A, Noce L, Gallo I. Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions. In: Proceedings of the 2014 Asian Conference on Computer Vision. Singapore: Springer, 2015. 91−105
    [36] Lu S J, Chen T, Tian S X, Lim J H, Tan C L. Scene text extraction based on edges and support vector regression. International Journal on Document Analysis and Recognition (IJDAR), 2015, 18(2): 125-135 doi: 10.1007/s10032-015-0237-z
    [37] Tian S X, Pan Y F, Huang C, Lu S J, Yu K, Tan C L. Text flow: A unified text detection system in natural scene images. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 4651−4659
    [38] Liao M H, Shi B G, Bai X, Wang X G, Liu W Y. Textboxes: A fast text detector with a single deep neural network. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017. 4161−4167
    [39] He T, Huang W L, Qiao Y, Yao J. Text-attentional convolutional neural network for scene text detection. IEEE Transactions on Image Processing, 2016, 25(6): 2529-2541 doi: 10.1109/TIP.2016.2547588
    [40] Qin S Y, Manduchi R. A fast and robust text spotter. In: Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision. Lake Placid, USA: IEEE, 2016. 1−8
    [41] Tian C N, Xia Y, Zhang X N, Gao X B. Natural scene text detection with MC-MR candidate extraction and coarse-to-fine filtering. Neurocomputing, 2017, 260: 112-122 doi: 10.1016/j.neucom.2017.03.078
    [42] Tang Y B, Wu X Q. Scene text detection and segmentation based on cascaded convolution neural networks. IEEE Transactions on Image Processing, 2017, 26(3): 1509-1520 doi: 10.1109/TIP.2017.2656474
    [43] 李文英, 曹斌, 曹春水, 黄永祯. 一种基于深度学习的青铜器铭文识别方法. 自动化学报, 2018, 44(11): 2023-2030

    Li Wen-Ying, Cao Bin, Cao Chun-Shui, Huang Yong-Zhen. A deep learning based method for bronze inscription recognition. Acta Automatica Sinica, 2018, 44(11): 2023-2030
    [44] 王润民, 桑农, 丁丁, 陈杰, 叶齐祥, 高常鑫, 等. 自然场景图像中的文本检测综述. 自动化学报, 2018, 44(12): 2113-2141

    Wang Run-Min, Sang Nong, Ding Ding, Chen Jie, Ye Qi-Xiang, Gao Chang-Xin, et al. Text detection in natural scene image: A survey. Acta Automatica Sinica, 2018, 44(12): 2113-2141
    [45] 金连文, 钟卓耀, 杨钊, 杨维信, 谢泽澄, 孙俊. 深度学习在手写汉字识别中的应用综述. 自动化学报, 2016, 42(8): 1125-1141

    Jin Lian-Wen, Zhong Zhuo-Yao, Yang Zhao, Yang Wei-Xin, Xie Ze-Cheng, Sun Jun. Applications of deep learning for handwritten Chinese character recognition: A review. Acta Automatica Sinica, 2016, 42(8): 1125-1141
    [46] Wang W H, Xie E Z, Li X, Hou W B, Lu T, Yu G, Shao S. Shape robust text detection with progressive scale expansion network. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 9328−9337
Publication history
  • Received: 2019-05-16
  • Accepted: 2019-08-22
  • Published online: 2022-02-17
  • Issue published: 2022-03-25
