Self-distillation via Entropy Transfer for Scene Text Detection

CHEN Jian-Wei, YANG Fan, LAI Yong-Xuan

Citation: Chen Jian-Wei, Yang Fan, Lai Yong-Xuan. Self-distillation via entropy transfer for scene text detection. Acta Automatica Sinica, 2024, 50(11): 1−12 doi: 10.16383/j.aas.c210598


doi: 10.16383/j.aas.c210598
Funds: Supported by National Key Research and Development Program of China (2021ZD0112600), National Natural Science Foundation of China (62173282, 61872154), Natural Science Foundation of Guangdong Province (2021A1515011578), and Shenzhen Fundamental Research Program (JCYJ20190809161603551)
More Information
    Author Bio:

    CHEN Jian-Wei  Master student at the School of Aerospace Engineering, Xiamen University. His research interests include computer vision and image processing. E-mail: jianweichen@stu.xmu.edu.cn

    YANG Fan  Associate professor at the School of Aerospace Engineering, Xiamen University. His research interests include machine learning, data mining, and bioinformatics. Corresponding author of this paper. E-mail: yang@xmu.edu.cn

    LAI Yong-Xuan  Professor at the School of Informatics, Xiamen University. His research interests include big data analysis and management, intelligent transportation systems, deep learning, and vehicular networks. E-mail: laiyx@xmu.edu.cn

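  • Abstract: State-of-the-art scene text detection methods are mostly based on fully convolutional semantic segmentation networks, which use pixel-level classification results to detect text of arbitrary shape effectively. Their main drawbacks are large model size, long inference time, and high memory consumption, which limit their deployment in practical applications. This paper proposes a self-distillation training method based on entropy transfer (Self-distillation via entropy transfer, SDET). It takes the information entropy of the segmentation map (SM) output by the deep layers of a text detection network as the knowledge to be transferred, and feeds this entropy back to the shallow layers through an auxiliary network. Unlike knowledge distillation (KD), which relies on a teacher network, SDET adds only one auxiliary network during the training stage and achieves teacher-free self-distillation (SD) at a small extra training cost. Experimental results on multiple standard scene text detection datasets show that SDET improves the recall and F1 score of the baseline text detection networks significantly more than other distillation methods.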
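The knowledge being transferred is the per-pixel information entropy of the segmentation map. As a rough sketch, assuming the segmentation map holds a text probability p in [0, 1] for each pixel and that entropy means the binary Shannon entropy in bits, an entropy map could be computed as below. The function names, the clamping constant, and the mean-squared-error transfer loss are hypothetical illustrations and are not taken from the paper.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a text / non-text probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so that log2(0) never occurs
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_map(seg_map):
    """Apply binary_entropy to every pixel of a 2-D segmentation map."""
    return [[binary_entropy(p) for p in row] for row in seg_map]


def entropy_transfer_loss(shallow_seg_map, deep_seg_map):
    """Hypothetical loss: mean squared gap between the two entropy maps."""
    h_shallow = entropy_map(shallow_seg_map)
    h_deep = entropy_map(deep_seg_map)
    diffs = [
        (a - b) ** 2
        for row_s, row_d in zip(h_shallow, h_deep)
        for a, b in zip(row_s, row_d)
    ]
    return sum(diffs) / len(diffs)


# Toy 2x2 maps (assumed values): the deep output is confident on most pixels,
# the shallow (auxiliary) output is less so.
deep = [[0.99, 0.50], [0.01, 0.90]]
shallow = [[0.80, 0.50], [0.20, 0.60]]

print(entropy_map(deep))
print(entropy_transfer_loss(shallow, deep))
```

Pixels with probability near 0.5 get entropy close to 1 bit, while confident pixels get entropy close to 0, so the entropy map highlights where the detector is uncertain.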
  • Fig. 1  Segmentation map and entropy map visualization of differentiable binarization text detection network

    Fig. 2  Comparison of different knowledge distillation methods

    Fig. 3  SDET training framework

    Fig. 4  The three types of auxiliary networks

    Fig. 5  Comparison of detection results between SDET and baseline models ((a) Ground-truth; (b) Detection results of baseline models; (c) Detection results of models trained with SDET)

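    Table 1  The impact of different auxiliary classifiers on SDET (%)

    | Model | Method | ICDAR2013 P | ICDAR2013 R | ICDAR2013 F | ICDAR2015 P | ICDAR2015 R | ICDAR2015 F |
    |---|---|---|---|---|---|---|---|
    | MV3-EAST | Baseline | 81.7 | 64.4 | 72.0 | 80.9 | 75.4 | 78.0 |
    | MV3-EAST | Type A | 78.8 | 65.9 | 71.8 | 78.8 | 76.3 | 77.5 |
    | MV3-EAST | Type B | 84.4 | 66.5 | 74.4 | 81.3 | 77.0 | 79.1 |
    | MV3-EAST | Type C | 81.4 | 67.4 | 73.7 | 78.9 | 77.7 | 78.3 |
    | MV3-DB | Baseline | 83.7 | 66.0 | 73.8 | 87.1 | 71.8 | 78.7 |
    | MV3-DB | Type A | 84.1 | 68.8 | 75.7 | 86.5 | 73.9 | 79.7 |
    | MV3-DB | Type B | 81.1 | 67.3 | 73.6 | 87.8 | 71.7 | 78.9 |
    | MV3-DB | Type C | 84.9 | 67.9 | 75.4 | 87.8 | 73.0 | 79.7 |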

    Table 2  The impact of different feature pyramid positions on type B (%)

    | Method | Feature map size (pixels) | P | R | F |
    |---|---|---|---|---|
    | Baseline | - | 80.9 | 75.4 | 78.0 |
    | P0 | 16 × 16 | 79.1 | 75.8 | 77.4 |
    | P1 | 32 × 32 | 79.5 | 76.5 | 78.0 |
    | P2 | 64 × 64 | 80.7 | 77.4 | 79.0 |
    | P3 | 128 × 128 | 81.3 | 77.0 | 79.1 |

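    Table 3  Experimental results of knowledge distillation of MV3-DB on different datasets (%)

    | Method | ICDAR2013 P | ICDAR2013 R | ICDAR2013 F | TD500 P | TD500 R | TD500 F | TD-TR P | TD-TR R | TD-TR F | ICDAR2015 P | ICDAR2015 R | ICDAR2015 F | Total-text P | Total-text R | Total-text F | CASIA-10K P | CASIA-10K R | CASIA-10K F |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Baseline | 83.7 | 66.0 | 73.8 | 78.7 | 71.4 | 74.9 | 83.6 | 74.4 | 78.7 | 87.1 | 71.8 | 78.7 | 87.2 | 66.9 | 75.7 | 88.1 | 51.9 | 65.3 |
    | ST | 82.5 | 65.8 | 73.2 | 77.0 | 73.0 | 74.9 | 84.6 | 73.5 | 78.7 | 85.4 | 72.2 | 78.2 | 87.4 | 65.3 | 74.8 | 88.8 | 49.4 | 63.5 |
    | KA | 82.5 | 66.8 | 73.8 | 79.5 | 71.3 | 75.2 | 86.3 | 72.5 | 78.8 | 85.0 | 73.3 | 78.7 | 85.9 | 66.8 | 75.2 | 87.8 | 51.4 | 64.8 |
    | FitNets | 84.7 | 65.4 | 73.8 | 78.6 | 73.3 | 75.8 | 85.3 | 74.0 | 79.2 | 85.3 | 73.3 | 78.8 | 87.4 | 67.5 | 76.2 | 88.0 | 52.3 | 65.6 |
    | SKD | 82.4 | 68.8 | 75.0 | 81.2 | 70.6 | 75.5 | 84.8 | 74.5 | 79.3 | 87.4 | 71.6 | 78.7 | 87.4 | 67.0 | 75.9 | 88.6 | 51.6 | 65.2 |
    | SD | 83.5 | 67.8 | 74.8 | 79.4 | 72.2 | 75.6 | 85.0 | 74.0 | 79.1 | 85.1 | 73.0 | 78.6 | 87.0 | 67.6 | 76.1 | 87.1 | 52.0 | 65.1 |
    | SAD | 82.8 | 66.7 | 73.9 | 78.7 | 72.3 | 75.4 | 87.3 | 72.0 | 78.9 | 86.7 | 72.7 | 79.1 | 86.5 | 67.1 | 75.6 | 88.4 | 50.7 | 64.4 |
    | Proposed method | 84.1 | 68.8 | 75.7 | 80.6 | 72.2 | 76.2 | 85.6 | 74.6 | 79.7 | 86.5 | 73.9 | 79.7 | 87.5 | 68.4 | 76.8 | 87.4 | 53.4 | 66.3 |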
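To read Table 3 at a glance, the F-measure gain of the proposed method over the MV3-DB baseline can be tabulated per dataset. The short Python sketch below only restates numbers from the table; the variable names are illustrative.

```python
# F-measure (%) of MV3-DB from Table 3: (baseline, proposed method) per dataset.
f_scores = {
    "ICDAR2013": (73.8, 75.7),
    "TD500": (74.9, 76.2),
    "TD-TR": (78.7, 79.7),
    "ICDAR2015": (78.7, 79.7),
    "Total-text": (75.7, 76.8),
    "CASIA-10K": (65.3, 66.3),
}

# Absolute gain in percentage points, rounded to the table's one-decimal precision.
f_gain = {name: round(ours - base, 1) for name, (base, ours) in f_scores.items()}

for name, gain in f_gain.items():
    print(f"{name}: +{gain:.1f}")
```

With these numbers the gain ranges from 1.0 to 1.9 percentage points, with the largest gain on ICDAR2013.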

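    Table 4  Experimental results of knowledge distillation of MV3-EAST on different datasets (%)

    | Method | ICDAR2013 P | ICDAR2013 R | ICDAR2013 F | ICDAR2015 P | ICDAR2015 R | ICDAR2015 F | CASIA-10K P | CASIA-10K R | CASIA-10K F |
    |---|---|---|---|---|---|---|---|---|---|
    | Baseline | 81.7 | 64.4 | 72.0 | 80.9 | 75.4 | 78.0 | 66.1 | 64.9 | 65.5 |
    | ST | 77.8 | 64.9 | 70.8 | 80.9 | 75.1 | 77.9 | 64.7 | 65.1 | 64.9 |
    | KA | 78.6 | 64.0 | 70.5 | 78.2 | 76.4 | 77.3 | 67.7 | 63.0 | 65.3 |
    | FitNets | 82.4 | 65.8 | 73.2 | 78.0 | 77.8 | 77.9 | 65.4 | 64.2 | 64.8 |
    | SKD | 79.5 | 66.3 | 72.3 | 81.9 | 75.6 | 78.6 | 66.6 | 64.7 | 65.6 |
    | SD | 80.2 | 63.8 | 71.1 | 79.6 | 74.7 | 77.1 | 66.2 | 63.5 | 64.8 |
    | SAD | 81.4 | 65.6 | 72.6 | 80.2 | 76.5 | 78.3 | 65.7 | 64.1 | 64.9 |
    | Proposed method | 84.4 | 66.5 | 74.4 | 81.3 | 77.0 | 79.1 | 70.8 | 63.0 | 66.7 |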

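    Table 5  Comparison of SDET and DSN on different datasets (%)

    | Method | ICDAR2013 P | ICDAR2013 R | ICDAR2013 F | TD500 P | TD500 R | TD500 F | TD-TR P | TD-TR R | TD-TR F | ICDAR2015 P | ICDAR2015 R | ICDAR2015 F | Total-text P | Total-text R | Total-text F | CASIA-10K P | CASIA-10K R | CASIA-10K F |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Baseline | 83.7 | 66.0 | 73.8 | 78.7 | 71.4 | 74.9 | 83.6 | 74.4 | 78.7 | 87.1 | 71.8 | 78.7 | 87.2 | 66.9 | 75.7 | 88.1 | 51.9 | 65.3 |
    | DSN | 84.4 | 68.0 | 75.3 | 79.7 | 71.5 | 75.4 | 86.4 | 72.2 | 78.7 | 85.8 | 73.4 | 79.1 | 86.1 | 67.9 | 75.9 | 87.9 | 52.3 | 65.6 |
    | Proposed method | 84.1 | 68.8 | 75.7 | 80.6 | 72.2 | 76.2 | 85.6 | 74.6 | 79.7 | 86.5 | 73.9 | 79.7 | 87.5 | 68.4 | 76.8 | 87.4 | 53.4 | 66.3 |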

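    Table 6  The effect of SDET on improving ResNet50-DB on different datasets (%)

    | Method | ICDAR2013 P | ICDAR2013 R | ICDAR2013 F | TD500 P | TD500 R | TD500 F | TD-TR P | TD-TR R | TD-TR F | ICDAR2015 P | ICDAR2015 R | ICDAR2015 F | Total-text P | Total-text R | Total-text F | CASIA-10K P | CASIA-10K R | CASIA-10K F |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Baseline | 86.3 | 72.9 | 79.0 | 84.1 | 75.9 | 79.8 | 87.3 | 80.4 | 83.7 | 90.3 | 80.1 | 84.9 | 87.7 | 79.4 | 83.3 | 90.1 | 64.7 | 75.3 |
    | Proposed method | 82.7 | 77.2 | 79.9 | 79.9 | 81.5 | 80.7 | 87.2 | 83.0 | 85.0 | 90.3 | 82.1 | 86.0 | 87.4 | 81.8 | 84.5 | 86.0 | 68.7 | 76.4 |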
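The P, R, and F columns in the tables above are read here as precision, recall, and F-measure. As a minimal sketch, assuming F is the usual harmonic mean of P and R (this page does not state the formula), the following Python function recomputes F from the reported P and R. The function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2.0 * precision * recall / (precision + recall)


# MV3-DB on ICDAR2013 (Table 3): baseline, then the proposed method.
baseline_f = f_measure(83.7, 66.0)
proposed_f = f_measure(84.1, 68.8)
print(f"{baseline_f:.1f} {proposed_f:.1f}")  # prints "73.8 75.7"
```

Both values match the F column for those rows. For some other rows the recomputed value can differ from the table by 0.1, which is expected if the published P and R were rounded before F was reported.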
Publication history
  • Received:  2021-06-29
  • Accepted:  2022-02-10
  • Published online:  2023-10-12
