2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

多维注意力特征聚合立体匹配算法

张亚茹 孔雅婷 刘彬

张亚茹, 孔雅婷, 刘彬. 多维注意力特征聚合立体匹配算法. 自动化学报, 2022, 48(7): 1805−1815 doi: 10.16383/j.aas.c200778
引用本文: 张亚茹, 孔雅婷, 刘彬. 多维注意力特征聚合立体匹配算法. 自动化学报, 2022, 48(7): 1805−1815 doi: 10.16383/j.aas.c200778
Zhang Ya-Ru, Kong Ya-Ting, Liu Bin. Multi-dimensional attention feature aggregation stereo matching algorithm. Acta Automatica Sinica, 2022, 48(7): 1805−1815 doi: 10.16383/j.aas.c200778
Citation: Zhang Ya-Ru, Kong Ya-Ting, Liu Bin. Multi-dimensional attention feature aggregation stereo matching algorithm. Acta Automatica Sinica, 2022, 48(7): 1805−1815 doi: 10.16383/j.aas.c200778

多维注意力特征聚合立体匹配算法

doi: 10.16383/j.aas.c200778
基金项目: 河北省自然科学基金 (F2019203320)资助
详细信息
    作者简介:

    张亚茹:燕山大学博士研究生. 主要研究方向为人工智能, 计算机视觉和计算机图形学. E-mail: yrzhang1014@163.com

    孔雅婷:燕山大学硕士研究生. 主要研究方向为人工智能, 计算机视觉和计算机图形学. E-mail: kongyt10@163.com

    刘彬:燕山大学信息科学与工程学院教授. 主要研究方向为人工智能和计算机视觉. 本文通信作者.E-mail: liubin@ysu.edu.cn

Multi-dimensional Attention Feature Aggregation Stereo Matching Algorithm

Funds: Supported by Natural Science Foundation of Hebei Province (F2019203320)
More Information
    Author Bio:

    ZHANG Ya-Ru Ph.D. candidate at the School of Information Science and Engineering, Yanshan University. Her research interest covers artificial intelligence, computer vision, and computer graphics

    KONG Ya-Ting Master student at the School of Electrical Engineering, Yanshan University. Her research interest covers artificial intelligence, computer vision, and computer graphics

    LIU Bin Professor at the School of Information Science and Engineering, Yanshan University. His research interest covers artificial intelligence and computer vision. Corresponding author of this paper

  • 摘要: 现有基于深度学习的立体匹配算法在学习推理过程中缺乏有效信息交互, 而特征提取和代价聚合两个子模块的特征维度存在差异, 导致注意力方法在立体匹配网络中应用较少、方式单一. 针对上述问题, 本文提出了一种多维注意力特征聚合立体匹配算法. 设计2D注意力残差模块, 通过在原始残差网络中引入无降维自适应2D注意力残差单元, 局部跨通道交互并提取显著信息, 为匹配代价计算提供丰富有效的特征. 构建3D注意力沙漏聚合模块, 以堆叠沙漏结构为骨干设计3D注意力沙漏单元, 捕获多尺度几何上下文信息, 进一步扩展多维注意力机制, 自适应聚合和重新校准来自不同网络深度的代价体. 在三大标准数据集上进行评估, 并与相关算法对比, 实验结果表明所提算法具有更高的预测视差精度, 且在无遮挡的显著对象上效果更佳.
  • 图  1  算法网络结构图

    Fig.  1  Architecture overview of proposed algorithm

    图  2  2D注意力残差单元结构图

    Fig.  2  2D attention residual unit architecture

    图  3  联合代价体结构图

    Fig.  3  Combined cost volume architecture

    图  4  3D注意力沙漏聚合模块结构图

    Fig.  4  3D attention hourglass aggregation module architecture

    图  5  3D注意力沙漏单元结构图

    Fig.  5  3D attention hourglass unit architecture

    图  6  损失函数权重对网络的影响

    Fig.  6  The influence of the weight of loss function on network performance

    图  7  SceneFlow视差估计结果

    Fig.  7  Results of disparity estimation on SceneFlow dataset

    图  8  KITTI2015视差估计结果

    Fig.  8  Results of disparity estimation on KITTI2015 dataset

    图  9  KITTI2012视差估计结果

    Fig.  9  Results of disparity estimation on KITTI2012 dataset

    表  1  2D注意力残差单元和联合代价体的参数设置(D表示最大视差, 默认步长为1)

    Table  1  Parameter setting of the 2D attention residual unit and combined cost volume (D represents the maximum disparity. The default stride is 1)

    层级名称层级设置输出维度
    ${ { { {{F} }_{\rm{l}}} } / { { {{F} }_{\rm{r}}} } }$卷积核尺寸, 通道数, 步长H×W×3
    2D 注意力残差模块
    Conv0_1$3 \times 3,32,$ 步长 = 2${1 / 2}H \times {1 / 2}W \times 32$
    Conv0_2$3 \times 3,32,$${1 / 2}H \times {1 / 2}W \times 32$
    Conv0_3$3 \times 3,32,$${1 / 2}H \times {1 / 2}W \times 32$
    Conv1_x$\left[ \begin{aligned} 3 \times 3,32 \\ 3 \times 3,32 \end{aligned} \right] \times 3$${1 / 2}H \times {1 / 2}W \times 32$
    Conv2_x$\left[ \begin{aligned} 3 \times 3,32 \\ 3 \times 3,32 \end{aligned} \right] \times 16$, 步长 = 2${1 / 4}H \times {1 / 4}W \times 64$
    Conv3_x$\left[ \begin{aligned} 3 \times 3,32 \\ 3 \times 3,32 \end{aligned} \right] \times 3$${1 / 4}H \times {1 / 4}W \times 128$
    Conv4_x$\left[ \begin{aligned} 3 \times 3,32 \\ 3 \times 3,32 \end{aligned} \right] \times 3$${1 / 4}H \times {1 / 4}W \times 128$
    ${ {{F} }_{\rm{l}}}$/${ {{F} }_{\rm{r}}}$级联: Conv2_x, Conv3_x, Conv4_x${1 / 4}H \times {1 / 4}W \times 320$
    联合代价体
    ${ {{F} }_{{\rm{gc}}} }$${1 / 4}D \times {1 / 4}H \times {1 / 4}W \times 40$
    ${\tilde {{F} }_{\rm{l}}}$/${\tilde {{F} }_{\rm{r}}}$$\left[ \begin{aligned} 3 \times 3,128 \\ 1 \times 1,{\rm{ } }12 \end{aligned} \right]$${1 / 4}H \times {1 / 4}W \times 12$
    ${ {{F} }_{{\rm{cat}}} }$${1 / 4}D \times {1 / 4}H \times {1 / 4}W \times 24$
    ${ {{F} }_{{\rm{com}}} }$级联: ${ {{F} }_{{\rm{gc}}} }$, ${ {{F} }_{{\rm{cat}}} }$${1 / 4}D \times {1 / 4}H \times {1 / 4}W \times 64$
    下载: 导出CSV

    表  2  2D注意力残差模块在不同设置下的性能评估

    Table  2  Performance evaluation of 2D attention residual module with different settings

    网络设置KITTI2015
    2D 注意力单元> 1 px (%)> 2 px (%)> 3 px (%)EPE (px)
    13.63.491.790.631
    最大池化 + 降维12.93.201.690.623
    平均池化 + 降维12.73.261.640.620
    $ \checkmark$12.43.121.610.615
    下载: 导出CSV

    表  3  联合代价体和3D注意力沙漏聚合模块在不同设置下的性能评估

    Table  3  Evaluation of 3D attention hourglass aggregation module and combined cost volume with different settings

    网络设置KITTI2012 KITTI2015
    联合代价体3D 注意力单元EPE (px)D1-all (%) EPE (px)D1-all (%)
    3D 最大池化3D 平均池化
    $ \checkmark$0.8042.57 0.6151.94
    $ \checkmark$$ \checkmark$0.7222.36 0.6101.70
    $ \checkmark$$ \checkmark$0.7032.33 0.6071.68
    PSMNet[17]$ \checkmark$$ \checkmark$0.8672.65 0.6522.03
    $ \checkmark$$ \checkmark$$ \checkmark$0.6542.13 0.5891.43
    下载: 导出CSV

    表  4  不同算法在SceneFlow数据集上的性能评估

    Table  4  Performance evaluation of different methods on the SceneFlow dataset

    算法EPE (px)
    本文算法0.71
    Gwc-Net[24]0.765
    PSMNet[17]1.09
    MCA-Net[29]1.30
    CRL[35]1.32
    GC-Net[21]2.51
    下载: 导出CSV

    表  5  不同算法在KITTI2015上的性能评估 (%)

    Table  5  Performance evaluation of different methods on the KITTI2015 dataset (%)

    算法AllNoc
    D1-bgD1-fgD1-allD1-bgD1-fgD1-all
    DispNetC[20]4.324.414.344.113.724.05
    MC-CNN-art[36]2.898.883.882.487.643.33
    CRL[35]2.483.592.672.323.122.45
    PDSNet[37]2.294.052.582.093.682.36
    GC-Net[21]2.216.162.872.025.582.61
    PSMNet[17]1.864.622.321.714.312.14
    本文算法1.724.532.301.644.082.06
    下载: 导出CSV

    表  6  不同算法在KITTI2012上的性能评估 (%)

    Table  6  Performance evaluation of different methods on the KITTI2012 dataset (%)

    算法> 2 px> 3 px> 5 px平均误差
    NocAllNocAllNocAllNocAll
    DispNetC[20]7.388.114.114.652.052.390.91.0
    MC-CNN-acrt[36]3.905.452.433.631.642.390.70.9
    GC-Net[21]2.713.461.772.301.121.460.60.7
    SegStereo[11]2.663.191.682.031.001.210.50.6
    PSMNet[17]2.443.011.491.890.901.150.50.6
    本文算法3.013.601.461.730.810.900.50.6
    下载: 导出CSV
  • [1] Feng D, Rosenbaum L, Dietmayer K. Towards safe autonomous driving: capture uncertainty in the deep neural network for lidar 3D vehicle detection. In: Proceedings of the 21st International Conference on Intelligent Transportation Systems. Maui, HI, USA: IEEE, 2018. 3266−3273
    [2] Schmid K, Tomic T, Ruess F, Hirschmüller H, Suppa M. Stereo vision based indoor/outdoor navigation for flying robots. In: Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE, 2013. 3955−3962
    [3] 李培玄, 刘鹏飞, 曹飞道, 赵怀慈. 自适应权值的跨尺度立体匹配算法. 光学学报, 2018, 38(12): 248-253.

    Li Pei-Xuan, Liu Peng-Fei, Cao Fei-Dao, Zhao Huai-Ci. Weight-adaptive cross-scale algorithm for stereo matching. Acta Optica Sinica, 2018, 38(12): 248-253.
    [4] 韩先君, 刘艳丽, 杨红雨. 多元线性回归引导的立体匹配算法. 计算机辅助设计与图形学学报, 2019, 31(1): 84-93.

    Han Xian-Jun, Liu Yan-Li, Yang Hong-Yu. A stereo matching algorithm guided by multiple linear regression. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(1): 84-93.
    [5] Zagoruyko S, Komodakis N. Learning to compare image patches via convolutional neural networks. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 4353−4361
    [6] Luo W, Schwing A G, Urtasun R. Efficient deep learning for stereo matching. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 5695−5703
    [7] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
    [8] Mayer N, Ilg E, Hausser P, Fischer P, Cremers D, Dosovitskiy A, Brox T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 4040−4048
    [9] Song X, Zhao X, Fang L, Hu H. Edgestereo: An effective multi-task learning network for stereo matching and edge detection. International Journal of Computer Vision, 2020, 128(4): 910-930.
    [10] Song X, Zhao X, Hu H W, Fang L J. Edgestereo: A context integrated residual pyramid network for stereo matching. In: Proceedings of the 14th Asian Conference on Computer Vision. Springer, Cham, 2018. 11365: 20−35
    [11] Yang G R, Zhao H S, Shi J P, Deng Z D, Jia J Y. Segstereo: Exploiting semantic information for disparity estimation. In: Proceedings of the 15th European Conference on Computer Vision. Springer Verlag: 2018. 11211: 660−676
    [12] Zhang J, Skinner K A, Vasudevan R, Johnson-Roberson M. Dispsegnet: leveraging semantics for end-to-end learning of disparity estimation from stereo imagery. IEEE Robotics and Automation Letters, 2019, 4(2): 1162-1169.
    [13] Jie Z Q, Wang P F, Ling Y G, Zhao B, Wei Y C, Feng J S, Liu W. Left-right comparative recurrent model for stereo matching. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.3838−3846
    [14] Liang Z F, Feng Y L, Guo Y L, Liu H Z, Chen W, Qiao L B, Zhou L, Zhang J F. Learning for disparity estimation through feature constancy. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2811−2820
    [15] 程鸣洋, 盖绍彦, 达飞鹏. 基于注意力机制的立体匹配网络研究. 光学学报, 2020, 40(14): 144-152.

    Cheng Ming-Yang, Gai Shao-Yan, Da Fei-Peng. A stereo-matching neural network based on attention mechanism. Acta Optica Sinica, 2020, 40(14): 144-152.
    [16] 王玉锋, 王宏伟, 于光, 杨明权, 袁昱纬, 全吉成. 基于三维卷积神经网络的立体匹配算法. 光学学报, 2019, 39(11): 227-234.

    Wang Yu-Feng, Wang Hong-Wei, Yu Guang, Yang Ming-Quan, Yuan Yu-Wei, Quan Ji-Cheng. Stereo matching based on 3D convolutional neural network. Acta Optica Sinica, 2019, 39(11): 227-234.
    [17] Chang J R, Chen Y S. Pyramid stereo matching network. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 5410−5418
    [18] Zhu Z D, He M Y, Dai Y C, Rao Z B, Li B. Multi-scale cross-form pyramid network for stereo matching. In: Proceedings of the 14th IEEE Conference on Industrial Electronics and Applications. Xi'an, China: IEEE, 2019. 1789−1794
    [19] Zhang L, Wang Q H, Lu H H, Zhao Y. End-to-end learning of multi-scale convolutional neural network for stereo matching. In: Proceedings of Asian Conference on Machine Learning. 2018. 81−96
    [20] Mayer N, Ilg E, Hausser P, Fischer P. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 4040−4048
    [21] Kendall A, Martirosyan H, Dasgupta S, Henry P, Kennedy R, Bachrach A, Bry A. End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 66−75
    [22] Lu H H, Xu H, Zhang L, Ma Y B, Zhao Y. Cascaded multi-scale and multi-dimension convolutional neural network for stereo matching. In: Proceedings of the 2018 IEEE Visual Communications and Image Processing. Taichung, China: IEEE, 2018. 1−4
    [23] Rao Z B, He M Y, Dai Y C, Zhu Z D, Li B, He R J. MSDC-Net: Multi-scale dense and contextual networks for automated disparity map for stereo matching. arXiv PreprintarXiv: 1904.12658, 2019.
    [24] Guo X Y, Yang K, Yang W K, Wang X G, Li H S. Group-wise correlation stereo network. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 3273−3282
    [25] Liu M Y, Yin H J. Cross attention network for semantic segmentation. In: Proceedings of the 2019 IEEE International Conference on Image Processing. Taipei, China: IEEE, 2019. 2434−2438
    [26] 王亚珅, 黄河燕, 冯冲, 周强. 基于注意力机制的概念化句嵌入研究. 自动化学报, 2020, 46(7): 1390-1400.

    Wang Ya-Shen, Huang He-Yan, Feng Chong, Zhou Qiang. Conceptual sentence embeddings based on attention mechanism. Acta Automatica Sinica, 2020, 46(7): 1390- 1400.
    [27] Kim J H, Choi J H, Cheon M, Lee J S. RAM: Residual attention module for single image super-resolution. arXiv PreprintarXiv: 1811.12043, 2018.
    [28] Jeon S, Kim S, Sohn K. Convolutional feature pyramid fusion via attention network. In: Proceedings of the 2017 IEEE International Conference on Image Processing. Beijing, China: IEEE, 2017. 1007−1011
    [29] Sang H, Wang Q, Zhao Y. Multi-scale context attention network for stereo matching. IEEE Access, 2019, 7: 15152-15161.
    [30] Zhang G, Zhu D, Shi W, et al. Multi-dimensional residual dense attention network for stereo matching. IEEE Access, 2019, 7: 51681-51690.
    [31] Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019, 42(8). 2011-2023.
    [32] Wang Q L, Wu B G, Zhu P F, Li P H, Zuo W M, Hu Q H. ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 11531−11539
    [33] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 3061−3070
    [34] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012. 3354−3361
    [35] Pang J H, Sun W X, Ren J S, Yang C X, Yan Q. Cascade residual learning: A two-stage convolutional neural network for stereo matching. In: Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE, 2017. 878−886
    [36] Žbontar J, Lecun Y. Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 2016, 17.
    [37] Tulyakov S, Ivanov A, Fleuret F. Practical deep stereo (PDS): Toward applications-friendly deep stereo matching. In: Proceedings of the 32nd Conference on Neural Information Processing Systems. Montreal, Canada: 2018.
  • 加载中
图(9) / 表(6)
计量
  • 文章访问数:  1788
  • HTML全文浏览量:  677
  • PDF下载量:  214
  • 被引次数: 0
出版历程
  • 收稿日期:  2020-09-23
  • 网络出版日期:  2020-12-17
  • 刊出日期:  2022-07-01

目录

    /

    返回文章
    返回