Dehazeformer: Nonhomogeneous Image Dehazing With Collaborative Global-local Network
Keywords:
- Image dehazing
- Convolutional neural network
- Transformer
- Feature fusion
- Sparse self-attention
Abstract: In recent years, image dehazing methods based on convolutional neural networks (CNNs) have made remarkable progress on synthetic datasets. In real scenes, however, haze is unevenly distributed, and the local receptive field of the convolution operation struggles to capture contextual guidance information, which leads to the loss of global structure information. Image dehazing in real scenes therefore remains very challenging. The Transformer can capture long-range semantic dependencies, which helps guide the reconstruction of global structure information, but the high computational complexity of the standard Transformer structure hinders its application in image restoration. To address these problems, this paper proposes Dehazeformer, a dual-branch collaborative nonhomogeneous image dehazing network composed of a Transformer and a convolutional neural network. The Transformer branch extracts global structure information, and sparse self-attention modules (SSM) are designed to reduce its computational complexity. The CNN branch captures local information to recover texture details. Experiments on real nonhomogeneous haze scenes show that the proposed method achieves excellent performance in both objective evaluation and subjective visual quality.
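The abstract does not say how the sparse self-attention modules sparsify attention. Purely as an illustration of the general idea, the sketch below keeps only the `top_k` largest scores per query before the softmax, the explicit selection scheme described in reference [22]. The function name, the shapes, and the choice of top-k selection are assumptions made for this example and are not taken from Dehazeformer. Note also that this simple version still computes the full score matrix, so it shows the sparsity pattern rather than any saving in computation.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Single-head attention that keeps the top_k scores per query.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v). Illustrative sketch only.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (n_q, n_k)
    # The k-th largest score in each row is the keep threshold.
    # Exact ties at the threshold would keep more than top_k entries.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)   # drop the rest
    masked = masked - masked.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(masked)                            # exp(-inf) == 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 3))
out, w = topk_sparse_attention(q, k, v, top_k=2)
```

Each row of `w` has exactly two non-zero entries here, so every query aggregates values from only its two best-matching keys.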
Table 1 Quantitative comparison of ablation experiments on the NH-HAZE21 test dataset

| Method | HTL | SSM | SRB | TRB | PSNR (dB) $\uparrow$ | SSIM $\uparrow$ | LPIPS $\downarrow$ |
|---|---|---|---|---|---|---|---|
| Baseline | × | × | $\checkmark$ | × | 21.24 | 0.8339 | 0.1849 |
| Baseline+HTL | $\checkmark$ | × | $\checkmark$ | × | 21.88 | 0.8428 | 0.1766 |
| Baseline+SSM | × | $\checkmark$ | $\checkmark$ | × | 21.54 | 0.8361 | 0.1837 |
| SRB | $\checkmark$ | $\checkmark$ | $\checkmark$ | × | 22.08 | 0.8458 | 0.1752 |
| TRB | × | × | × | $\checkmark$ | 21.62 | 0.8566 | 0.1740 |
| Ours | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | 22.44 | 0.8631 | 0.1597 |
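For readers unfamiliar with the PSNR column, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX² / MSE). It assumes images stored as floating-point arrays in [0, 1]; the evaluation protocol actually used for the tables (value range, colour space, cropping) is not stated in this section, so the helper `psnr` and its `max_value` default are assumptions for illustration.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

clean = np.zeros((4, 4, 3))
degraded = np.full((4, 4, 3), 0.1)   # constant error of 0.1 per pixel
value = psnr(clean, degraded)        # MSE = 0.01, so about 20 dB
```

Because the scale is logarithmic, the 1.2 dB gap between the baseline (21.24) and the full model (22.44) corresponds to roughly a 24% reduction in mean squared error.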
Table 2 Quantitative comparison of ablation experiments between hybrid Transformer layer and Twin Transformer layer on the NH-HAZE21 test dataset

| Method | PSNR (dB) $\uparrow$ | SSIM $\uparrow$ | LPIPS $\downarrow$ |
|---|---|---|---|
| Proposed method + positional embedding encoding | 22.33 | 0.8611 | 0.1613 |
| Twin Transformer layer | 22.29 | 0.8611 | 0.1613 |
| Hybrid Transformer layer | 22.44 | 0.8631 | 0.1597 |
Table 3 Quantitative comparison with mainstream dehazing methods on the NH-HAZE20 and NH-HAZE21 test datasets (Note: — indicates that the method does not provide source code)

| Method | NH-HAZE20 PSNR (dB) $\uparrow$ | NH-HAZE20 SSIM $\uparrow$ | NH-HAZE20 LPIPS $\downarrow$ | NH-HAZE21 PSNR (dB) $\uparrow$ | NH-HAZE21 SSIM $\uparrow$ | NH-HAZE21 LPIPS $\downarrow$ | NH-HAZE average PSNR (dB) $\uparrow$ | NH-HAZE average SSIM $\uparrow$ | NH-HAZE average LPIPS $\downarrow$ |
|---|---|---|---|---|---|---|---|---|---|
| DCP[25] | 11.64 | 0.4533 | 0.5365 | 11.57 | 0.6278 | 0.4486 | 11.61 | 0.5674 | 0.4926 |
| CAP[26] | 11.54 | 0.4188 | 0.5724 | 11.56 | 0.5848 | 0.4865 | 11.55 | 0.5018 | 0.5295 |
| AOD-Net[27] | 13.44 | 0.4130 | — | 15.20 | 0.6413 | 0.3103 | 14.32 | 0.5272 | — |
| GridDehaze[28] | 17.63 | 0.6668 | 0.3046 | 20.08 | 0.8134 | 0.2332 | 18.86 | 0.7401 | 0.2689 |
| FFA-Net[4] | 17.44 | 0.6543 | 0.3340 | 20.51 | 0.8139 | 0.2315 | 18.98 | 0.7341 | 0.2828 |
| MSBDN[2] | 19.01 | 0.7033 | 0.2858 | 20.89 | 0.8207 | 0.2393 | 19.95 | 0.7620 | 0.2626 |
| KDDN[3] | 17.25 | 0.6602 | 0.3121 | 20.64 | 0.8156 | 0.2170 | 18.95 | 0.7379 | 0.2646 |
| AECR[5] | 18.58 | 0.6575 | 0.2809 | 20.81 | 0.8269 | 0.1865 | 19.70 | 0.7422 | 0.2337 |
| MPSHAN[15] | 18.13 | 0.6410 | — | 18.97 | 0.7810 | — | 18.55 | 0.7110 | — |
| TransWeather[29] | 19.60 | 0.6990 | 0.2699 | 21.72 | 0.8368 | 0.1972 | 20.66 | 0.7679 | 0.2336 |
| Res2Net+RCAN[8] | 21.44 | 0.7040 | — | 21.66 | 0.8430 | — | 21.55 | 0.7735 | — |
| DB-CGAN[16] | 18.29 | 0.6330 | — | 19.33 | 0.7910 | — | 18.81 | 0.7120 | — |
| FADehaze[30] | 17.44 | 0.6300 | — | 20.50 | 0.8400 | — | 18.97 | 0.7350 | — |
| BiN-Flow[31] | 18.63 | 0.6340 | — | — | — | — | — | — | — |
| PFONet[32] | 20.09 | 0.6583 | — | — | — | — | — | — | — |
| SDD[33] | — | — | — | 22.15 | 0.8350 | — | — | — | — |
| TUSR-Net[34] | 21.96 | 0.7254 | — | — | — | — | — | — | — |
| ITBdehaze[11] | 21.44 | 0.7100 | — | 21.67 | 0.8380 | — | 21.56 | 0.7740 | — |
| Ours | 22.16 | 0.7345 | 0.2501 | 22.44 | 0.8631 | 0.1597 | 22.30 | 0.7988 | 0.2049 |
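The "NH-HAZE average" columns in Table 3 appear to be the arithmetic mean of the NH-HAZE20 and NH-HAZE21 scores, rounded to the printed precision. That reading is an inference from the numbers, not something stated in the text. The sketch below checks it for two rows; the helper `mean_matches` is hypothetical and written only for this example.

```python
def mean_matches(score_20, score_21, reported, decimals):
    """True if `reported` agrees with the mean of the two scores
    to within half a unit of the last printed digit."""
    mean = (score_20 + score_21) / 2
    # 1e-9 absorbs floating-point error at the rounding boundary.
    return abs(mean - reported) <= 0.5 * 10 ** (-decimals) + 1e-9

# (NH-HAZE20, NH-HAZE21, reported average, printed decimals)
# in the order PSNR, SSIM, LPIPS, copied from Table 3.
rows = {
    "Ours": [(22.16, 22.44, 22.30, 2),
             (0.7345, 0.8631, 0.7988, 4),
             (0.2501, 0.1597, 0.2049, 4)],
    "DCP":  [(11.64, 11.57, 11.61, 2),
             (0.4533, 0.6278, 0.5674, 4),
             (0.5365, 0.4486, 0.4926, 4)],
}

checks = {name: [mean_matches(*m) for m in metrics]
          for name, metrics in rows.items()}
```

Under this assumption, all three averages for the proposed method are consistent with the per-dataset scores. The DCP SSIM average (0.5674) is not the mean of 0.4533 and 0.6278 (which is about 0.5406), so one of those three values may have been transcribed incorrectly.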
Table 4 Quantitative comparison with winning schemes of the nonhomogeneous image dehazing challenge in NTIRE 2021

| Method | PSNR (dB) $\uparrow$ | SSIM $\uparrow$ |
|---|---|---|
| DWT dehaze | 21.99 | 0.8560 |
| Mac dehaze | 21.66 | 0.8430 |
| Bilibili AI & FDU | 21.24 | 0.7882 |
| VIP UNIST | 21.17 | 0.8360 |
| Buaa colab | 20.13 | 0.8034 |
| Ours | 22.44 | 0.8631 |
[1] Nayar S K, Narasimhan S G. Vision in bad weather. In: Proceedings of the Seventh IEEE/CVF International Conference on Computer Vision (ICCV). Kerkyra, Greece: IEEE, 1999. 820–827
[2] Dong H, Pan J S, Xiang L, Hu Z, Zhang X Y, Wang F, et al. Multi-scale boosted dehazing network with dense feature fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 2154–2164
[3] Hong M, Xie Y, Li C H, Qu Y Y. Distilling image dehazing with heterogeneous task imitation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 3459–3468
[4] Qin X, Wang Z L, Bai Y C, Xie X D, Jia H Z. FFA-Net: Feature fusion attention network for single image dehazing. In: Proceedings of the Thirty-fourth AAAI Conference on Artificial Intelligence. New York, USA: AAAI Press, 2020. 11908–11915
[5] Wu H Y, Qu Y Y, Lin S H, Zhou J, Qiao R Z, Zhang Z Z, et al. Contrastive learning for compact single image dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 10546–10555
[6] Liu J, Wu H Y, Xie Y, Qu Y Y, Ma L Z. Trident dehazing network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, USA: IEEE, 2020. 1732–1741
[7] Wu H Y, Liu J, Xie Y, Qu Y Y, Ma L Z. Knowledge transfer dehazing network for nonhomogeneous dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, USA: IEEE, 2020. 1975–1983
[8] Yu Y K, Liu H, Fu M H, Chen J, Wang X Y, Wang K Y. A two-branch neural network for non-homogeneous dehazing via ensemble learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE, 2021. 193–202
[9] Fu M H, Liu H, Yu Y K, Chen J, Wang K Y. DW-GAN: A discrete wavelet transform GAN for nonhomogeneous dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE, 2021. 203–212
[10] Guo C L, Yan Q X, Anwar S, Cong R M, Ren W Q, Li C Y. Image dehazing Transformer with transmission-aware 3D position embedding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New Orleans, USA: IEEE, 2022. 5802–5810
[11] Liu Y Y, Liu H, Li L Y, Wu Z J, Chen J. A data-centric solution to nonhomogeneous dehazing via vision Transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Vancouver, Canada: IEEE, 2023. 1406–1415
[12] Das S D, Dutta S. Fast deep multi-patch hierarchical network for nonhomogeneous image dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, USA: IEEE, 2020. 1994–2001
[13] Metwaly K, Li X L, Guo T T, Monga V. Nonlocal channel attention for nonhomogeneous image dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, USA: IEEE, 2020. 1842–1851
[14] Ye T, Chen E K, Huang X R, Chen P. Efficient re-parameterization residual attention network for nonhomogeneous image dehazing. arXiv preprint arXiv: 2109.05479, 2021.
[15] Yang Kun, Zhang Juan, Fang Zhi-Jun. Multi-patch and multi-scale hierarchical aggregation network for fast nonhomogeneous image dehazing. Computer Science, 2021, 48(11): 250–257 doi: 10.11896/jsjkx.200900058
[16] Zhu Li-An, Zhang Hong. Nonhomogeneous image dehazing based on dual-branch conditional generative adversarial network. Journal of Computer Applications, 2023, 43(2): 567–574
[17] Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, et al. Swin Transformer: Hierarchical vision Transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 9992–10002
[18] Zhang Y L, Li K P, Li K, Wang L C, Zhong B N, Fu Y. Image super-resolution using very deep residual channel attention networks. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 294–310
[19] Han K, Xiao A, Wu E H, Guo J Y, Xu C J, Wang Y H. Transformer in Transformer. arXiv preprint arXiv: 2103.00112, 2021.
[20] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv: 2010.11929, 2021.
[21] Guo R H, Niu D T, Qu L, Li Z B. SOTR: Segmenting objects with Transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 7137–7146
[22] Zhao G X, Lin J Y, Zhang Z Y, Ren X C, Su Q, Sun X. Explicit sparse Transformer: Concentrated attention through explicit selection. arXiv preprint arXiv: 1912.11637, 2019.
[23] Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 694–711
[24] Lin T Y, Dollár P, Girshick R, He K M, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 936–944
[25] He K M, Sun J, Tang X O. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(12): 2341–2353 doi: 10.1109/TPAMI.2010.168
[26] Zhu Q S, Mai J M, Shao L. A fast single image haze removal algorithm using color attenuation prior. IEEE Transactions on Image Processing, 2015, 24(11): 3522–3533 doi: 10.1109/TIP.2015.2446191
[27] Li B Y, Peng X L, Wang Z Y, Xu J Z, Feng D. AOD-Net: All-in-one dehazing network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 4780–4788
[28] Liu X H, Ma Y R, Shi Z H, Chen J. GridDehazeNet: Attention-based multi-scale network for image dehazing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, South Korea: IEEE, 2019. 7313–7322
[29] Valanarasu J M J, Yasarla R, Patel V M. TransWeather: Transformer-based restoration of images degraded by adverse weather conditions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 2343–2353
[30] Wu Zheng-Ping, Cheng Jie-Ying, Lei Bang-Jun, Zhao Jun-Chen. Fast nonhomogeneous image dehazing algorithm based on feature attention. Foreign Electronic Measurement Technology, 2023, 42(9): 9–18
[31] Wu Y Q, Tao D P, Zhan Y B, Zhang C Y. BiN-Flow: Bidirectional normalizing flow for robust image dehazing. IEEE Transactions on Image Processing, 2022, 31: 6635–6648 doi: 10.1109/TIP.2022.3214093
[32] Li S S, Zhou Y, Ren W Q, Xiang W. PFONet: A progressive feedback optimization network for lightweight single image dehazing. IEEE Transactions on Image Processing, 2023, 32: 6558–6569 doi: 10.1109/TIP.2023.3333564
[33] Kim G, Kwon J. Self-parameter distillation dehazing. IEEE Transactions on Image Processing, 2022, 32: 631–642
[34] Song X B, Zhou D F, Li W, Dai Y C, Shen Z L, Zhang L J, et al. TUSR-Net: Triple unfolding single image dehazing with self-regularization and dual feature to pixel attention. IEEE Transactions on Image Processing, 2023, 32: 1231–1244 doi: 10.1109/TIP.2023.3234701
[35] Chen T Y, Fu J H, Jiang W T, Gao C, Liu S. SRKTDN: Applying super resolution method to dehazing task. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE, 2021. 487–496
[36] Jo E, Sim J Y. Multi-scale selective residual learning for non-homogeneous dehazing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, USA: IEEE, 2021. 507–515