
Multi-dimensional Attention Feature Aggregation Stereo Matching Algorithm

Zhang Ya-Ru, Kong Ya-Ting, Liu Bin

Citation: Zhang Ya-Ru, Kong Ya-Ting, Liu Bin. Multi-dimensional attention feature aggregation stereo matching algorithm. Acta Automatica Sinica, 2020, 46(x): 1−11 doi: 10.16383/j.aas.c200778

doi: 10.16383/j.aas.c200778
Funds: Supported by the Natural Science Foundation of Hebei Province (F2019203320)

Author information:

    Zhang Ya-Ru: Ph.D. candidate at Yanshan University. Research interests: artificial intelligence, computer vision, and computer graphics. E-mail: zyaru@stumail.ysu.edu.cn

    Kong Ya-Ting: Master's student at Yanshan University. Research interests: artificial intelligence, computer vision, and computer graphics. Corresponding author of this paper. E-mail: kongyt10@163.com

    Liu Bin: Professor and doctoral supervisor, School of Information Science and Engineering, Yanshan University. Research interests: artificial intelligence and computer vision. E-mail: liubin@ysu.edu.cn

  • Abstract: Existing deep-learning-based stereo matching algorithms lack effective information interaction during learning and inference, and the feature dimensionalities of the feature extraction and cost aggregation sub-modules differ; as a result, attention mechanisms have seen little, and only limited, use in stereo matching networks. To address these problems, this paper proposes a multi-dimensional attention feature aggregation stereo matching algorithm. First, a two-dimensional (2D) attention residual module is designed: an adaptive 2D attention residual unit without dimensionality reduction is introduced into the original residual network, performing local cross-channel interaction and extracting salient information to provide rich, effective features for matching cost computation. Second, a three-dimensional (3D) attention hourglass aggregation module is constructed: with a stacked hourglass structure as its backbone, a 3D attention hourglass unit captures multi-scale geometric context, further extending the multi-dimensional attention mechanism to adaptively aggregate and recalibrate cost volumes from different network depths. Evaluations on three standard benchmark datasets and comparisons with related algorithms show that the proposed method predicts disparity more accurately and performs especially well on non-occluded salient objects.
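The 2D attention residual unit described above applies channel attention without dimensionality reduction through local cross-channel interaction. As a rough illustration only, here is a minimal PyTorch sketch assuming an ECA-style design (cf. [32]) wrapped in a residual block; the class names, kernel size, and layer layout are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Attention2D(nn.Module):
    """No-dimensionality-reduction channel attention with local
    cross-channel interaction (ECA-style sketch, assumed design)."""
    def __init__(self, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)               # global spatial pooling
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)  # local cross-channel interaction
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (N, C, H, W) -> channel descriptor (N, C, 1, 1)
        y = self.avg_pool(x)
        # 1D convolution across the channel axis; no reduction bottleneck
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        return x * self.sigmoid(y)                            # recalibrate features

class AttentionResidualUnit2D(nn.Module):
    """Residual block with the attention unit on its main branch."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.att = Attention2D()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.att(self.body(x)))
```

Avoiding a squeeze-and-excitation bottleneck keeps a one-to-one correspondence between channels and their attention weights, which is what Table 2 below contrasts against the pooling-plus-reduction variants.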
  • Fig. 1  Architecture overview of the proposed algorithm

    Fig. 2  2D attention residual unit architecture

    Fig. 3  Combined cost volume architecture

    Fig. 4  3D attention hourglass aggregation module architecture

    Fig. 5  3D attention hourglass unit architecture

    Fig. 6  The influence of loss-function weights on network performance (a weighted-loss sketch follows this list)

    Fig. 7  Disparity estimation results on the SceneFlow dataset

    Fig. 8  Disparity estimation results on the KITTI 2015 dataset

    Fig. 9  Disparity estimation results on the KITTI 2012 dataset
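Fig. 6 studies how the weights on losses from different network depths affect performance. As a sketch of what such a weighted multi-output loss typically looks like in PSMNet-style networks [17] — the three-output structure and the weight values here are assumptions, not the paper's tuned settings:

```python
import torch.nn.functional as F

def multi_output_loss(disp_preds, disp_gt, mask, weights=(0.5, 0.7, 1.0)):
    """Weighted smooth L1 loss over the disparity maps predicted at
    different depths of the stacked hourglass (cf. Fig. 6).
    The weights are placeholders; the paper tunes them experimentally."""
    # mask selects valid ground-truth pixels (e.g., disp_gt < max_disp)
    return sum(w * F.smooth_l1_loss(pred[mask], disp_gt[mask])
               for w, pred in zip(weights, disp_preds))
```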

    Table 1  Parameter settings of the 2D attention residual unit and the combined cost volume. D denotes the maximum disparity; the default stride is 1.

    Layer           | Setting (kernel size, channels, stride)  | Output size
    Input F_l / F_r | −                                        | H × W × 3
    — 2D attention residual module —
    Conv0_1         | 3×3, 32, stride 2                        | 1/2H × 1/2W × 32
    Conv0_2         | 3×3, 32                                  | 1/2H × 1/2W × 32
    Conv0_3         | 3×3, 32                                  | 1/2H × 1/2W × 32
    Conv1_x         | [3×3, 32; 3×3, 32] × 3                   | 1/2H × 1/2W × 32
    Conv2_x         | [3×3, 64; 3×3, 64] × 16, stride 2        | 1/4H × 1/4W × 64
    Conv3_x         | [3×3, 128; 3×3, 128] × 3                 | 1/4H × 1/4W × 128
    Conv4_x         | [3×3, 128; 3×3, 128] × 3                 | 1/4H × 1/4W × 128
    F_l / F_r       | concatenate Conv2_x, Conv3_x, Conv4_x    | 1/4H × 1/4W × 320
    — Combined cost volume —
    F_gc            | group-wise correlation                   | 1/4D × 1/4H × 1/4W × 40
    F̃_l / F̃_r       | [3×3, 128; 1×1, 12]                      | 1/4H × 1/4W × 12
    F_cat           | concatenation volume                     | 1/4D × 1/4H × 1/4W × 24
    F_com           | concatenate F_gc, F_cat                  | 1/4D × 1/4H × 1/4W × 64
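Table 1's combined cost volume concatenates a 40-group correlation volume F_gc with a 24-channel concatenation volume F_cat built from compressed 12-channel features. A hedged PyTorch sketch of this construction, assuming a Gwc-Net-style group-wise correlation [24]; the function and variable names are illustrative, not the paper's:

```python
import torch

def build_combined_volume(fl, fr, fl_c, fr_c, max_disp, num_groups=40):
    """Sketch of the combined cost volume in Table 1.
    fl, fr: (N, 320, H/4, W/4) concatenated left/right features;
    fl_c, fr_c: (N, 12, H/4, W/4) compressed features."""
    n, c, h, w = fl.shape
    d = max_disp // 4                      # disparity levels at 1/4 resolution
    cpg = c // num_groups                  # channels per correlation group
    gc = fl.new_zeros(n, num_groups, d, h, w)         # F_gc: group-wise correlation
    cat = fl.new_zeros(n, 2 * fl_c.size(1), d, h, w)  # F_cat: concatenation volume
    for i in range(d):
        if i > 0:
            l, r = fl[..., i:], fr[..., :-i]
            lc, rc = fl_c[..., i:], fr_c[..., :-i]
        else:
            l, r, lc, rc = fl, fr, fl_c, fr_c
        # mean correlation within each channel group at disparity i
        corr = (l * r).view(n, num_groups, cpg, h, w - i).mean(dim=2)
        gc[:, :, i, :, i:] = corr
        cat[:, :, i, :, i:] = torch.cat((lc, rc), dim=1)
    # F_com: concatenate along the feature dimension -> 40 + 24 = 64 channels
    return torch.cat((gc, cat), dim=1)
```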

    Table 2  Performance evaluation of the 2D attention residual module under different settings (KITTI 2015)

    2D attention unit              | >1px/% | >2px/% | >3px/% | EPE/px
    None                           | 13.6   | 3.49   | 1.79   | 0.631
    Max pooling + dim. reduction   | 12.9   | 3.20   | 1.69   | 0.623
    Avg pooling + dim. reduction   | 12.7   | 3.26   | 1.64   | 0.620
    Proposed (no reduction)        | 12.4   | 3.12   | 1.61   | 0.615

    Table 3  Evaluation of the combined cost volume and the 3D attention hourglass aggregation module under different settings

    Cost volume   | 3D max pooling | 3D avg pooling | KITTI 2012 EPE/px | KITTI 2012 D1-all/% | KITTI 2015 EPE/px | KITTI 2015 D1-all/%
    Combined      |                |                | 0.804 | 2.57 | 0.615 | 1.94
    Combined      | ✓              |                | 0.722 | 2.36 | 0.610 | 1.70
    Combined      |                | ✓              | 0.703 | 2.33 | 0.607 | 1.68
    PSMNet[17]    | ✓              | ✓              | 0.867 | 2.65 | 0.652 | 2.03
    Combined      | ✓              | ✓              | 0.654 | 2.13 | 0.589 | 1.43
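Table 3 ablates 3D max pooling and 3D average pooling inside the 3D attention unit, with both branches enabled in the full model. A minimal sketch of such a unit, assuming the two pooled descriptors share a no-reduction 1D convolution in the spirit of the 2D unit; this fusion detail is an assumption, not the paper's verified design:

```python
import torch
import torch.nn as nn

class Attention3D(nn.Module):
    """Sketch of a 3D attention unit combining 3D max and average
    pooling (the two branches ablated in Table 3)."""
    def __init__(self, k_size=3):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool3d(1)
        self.avg_pool = nn.AdaptiveAvgPool3d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def _channel_weights(self, y):
        # y: (N, C, 1, 1, 1) -> (N, C, 1, 1, 1), no dimensionality reduction
        y = y.flatten(2).transpose(-1, -2)          # (N, 1, C)
        y = self.conv(y).transpose(-1, -2)          # (N, C, 1)
        return y.view(y.size(0), -1, 1, 1, 1)

    def forward(self, x):
        # x: (N, C, D, H, W) cost-volume features
        w = self._channel_weights(self.max_pool(x)) + \
            self._channel_weights(self.avg_pool(x))
        return x * self.sigmoid(w)                  # recalibrate the cost volume
```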

    Table 4  Performance evaluation of different methods on the SceneFlow dataset

    Method | Ours | Gwc-Net[24] | PSMNet[17] | MCA-Net[29] | CRL[35] | GC-Net[21]
    EPE/px | 0.71 | 0.765       | 1.09       | 1.30        | 1.32    | 2.51

    Table 5  Performance evaluation of different methods on the KITTI 2015 dataset (%)

    Method          | All: D1-bg | D1-fg | D1-all | Noc: D1-bg | D1-fg | D1-all
    DispNetC[20]    | 4.32 | 4.41 | 4.34 | 4.11 | 3.72 | 4.05
    MC-CNN-acrt[36] | 2.89 | 8.88 | 3.88 | 2.48 | 7.64 | 3.33
    CRL[35]         | 2.48 | 3.59 | 2.67 | 2.32 | 3.12 | 2.45
    PDSNet[37]      | 2.29 | 4.05 | 2.58 | 2.09 | 3.68 | 2.36
    GC-Net[21]      | 2.21 | 6.16 | 2.87 | 2.02 | 5.58 | 2.61
    PSMNet[17]      | 1.86 | 4.62 | 2.32 | 1.71 | 4.31 | 2.14
    Ours            | 1.72 | 4.53 | 2.30 | 1.64 | 4.08 | 2.06

    Table 6  Performance evaluation of different methods on the KITTI 2012 dataset

    Method          | >2px Noc/% | >2px All/% | >3px Noc/% | >3px All/% | >5px Noc/% | >5px All/% | Mean error Noc/px | Mean error All/px
    DispNetC[20]    | 7.38 | 8.11 | 4.11 | 4.65 | 2.05 | 2.39 | 0.9 | 1.0
    MC-CNN-acrt[36] | 3.90 | 5.45 | 2.43 | 3.63 | 1.64 | 2.39 | 0.7 | 0.9
    GC-Net[21]      | 2.71 | 3.46 | 1.77 | 2.30 | 1.12 | 1.46 | 0.6 | 0.7
    SegStereo[11]   | 2.66 | 3.19 | 1.68 | 2.03 | 1.00 | 1.21 | 0.5 | 0.6
    PSMNet[17]      | 2.44 | 3.01 | 1.49 | 1.89 | 0.90 | 1.15 | 0.5 | 0.6
    Ours            | 3.01 | 3.60 | 1.46 | 1.73 | 0.81 | 0.90 | 0.5 | 0.6
  • [1] Feng D, Rosenbaum L, Dietmayer K. Towards safe autonomous driving: capture uncertainty in the deep neural network for lidar 3D vehicle detection. In: Proceedings of 2018 21st International Conference on Intelligent Transportation Systems. Maui, HI, USA: IEEE, 2018. 3266−3273.
    [2] Schmid K, Tomic T, Ruess F, Hirschmüller H, Suppa M. Stereo vision based indoor/outdoor navigation for flying robots. In: Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE, 2013. 3955−3962.
    [3] Li Pei-Xuan, Liu Peng-Fei, Cao Fei-Dao, Zhao Huai-Ci. Weight-adaptive cross-scale algorithm for stereo matching. Acta Optica Sinica, 2018, 38(12): 248−253
    [4] Han Xian-Jun, Liu Yan-Li, Yang Hong-Yu. A stereo matching algorithm guided by multiple linear regression. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(1): 84−93
    [5] Zagoruyko S, Komodakis N. Learning to compare image patches via convolutional neural networks. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 4353−4361.
    [6] Luo W, Schwing A G, Urtasun R. Efficient deep learning for stereo matching. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 5695−5703.
    [7] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640−651
    [8] Mayer N, Ilg E, Hausser P, Fischer P, Cremers D, Dosovitskiy A, Brox T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 4040−4048.
    [9] Song X, Zhao X, Fang L, Hu H. Edgestereo: An effective multi-task learning network for stereo matching and edge detection. International Journal of Computer Vision, 2020, 128(4): 910−930
    [10] Song X, Zhao X, Hu H, Fang L. Edgestereo: a context integrated residual pyramid network for stereo matching. In: Proceedings of the 14th Asian Conference on Computer Vision. Springer, Cham, 2018. 11365: 20−35.
    [11] Yang G, Zhao H, Shi J, Deng Z, Jia J. Segstereo: exploiting semantic information for disparity estimation. In: Proceedings of the 15th European Conference on Computer Vision. Springer Verlag: 2018. 11211: 660−676.
    [12] Zhang J, Skinner K A, Vasudevan R, Johnson-Roberson M. Dispsegnet: leveraging semantics for end-to-end learning of disparity estimation from stereo imagery. IEEE Robotics and Automation Letters, 2019, 4(2): 1162−1169
    [13] Jie Z, Wang P, Ling Y, et al. Left-right comparative recurrent model for stereo matching. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 3838−3846.
    [14] Liang Z, Feng Y, Guo Y, et al. Learning for disparity estimation through feature constancy. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2811−2820.
    [15] Cheng Ming-Yang, Gai Shao-Yan, Da Fei-Peng. A stereo-matching neural network based on attention mechanism. Acta Optica Sinica, 2020, 40(14): 144−152
    [16] Wang Yu-Feng, Wang Hong-Wei, Yu Guang, Yang Ming-Quan, Yuan Yu-Wei, Quan Ji-Cheng. Stereo matching based on 3D convolutional neural network. Acta Optica Sinica, 2019, 39(11): 227−234
    [17] Chang J R, Chen Y S. Pyramid stereo matching network. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 5410−5418.
    [18] Zhu Z, He M, Dai Y, Rao Z, Li B. Multi-scale cross-form pyramid network for stereo matching. In: Proceedings of 2019 14th IEEE Conference on Industrial Electronics and Applications. Xi'an, China: IEEE, 2019. 1789−1794.
    [19] Zhang L, Wang Q, Lu H, Zhao Y. End-to-end learning of multi-scale convolutional neural network for stereo matching. In: Proceedings of the Asian Conference on Machine Learning. 2018. 81−96.
    [20] Mayer N, Ilg E, Hausser P, Fischer P. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 4040−4048.
    [21] Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 66−75.
    [22] Lu H, Xu H, Zhang L, Zhao Y. Cascaded multi-scale and multi-dimension convolutional neural network for stereo matching. In: Proceedings of 2018 IEEE Visual Communications and Image Processing. Taichung, Taiwan, China: IEEE, 2018. 1−4.
    [23] Rao Z, He M, Dai Y, et al. MSDC-Net: multi-scale dense and contextual networks for automated disparity map for stereo matching. arXiv preprint arXiv:1904.12658, 2019.
    [24] Guo X, Yang K, Yang W, Wang X, Li H. Group-wise correlation stereo network. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 3273−3282.
    [25] Liu M, Yin H. Cross attention network for semantic segmentation. In: Proceedings of 2019 IEEE International Conference on Image Processing. Taipei, Taiwan, China: IEEE, 2019. 2434−2438.
    [26] Wang Ya-Shen, Huang He-Yan, Feng Chong, Zhou Qiang. Conceptual sentence embeddings based on attention mechanism. Acta Automatica Sinica, 2020, 46(7): 1390−1400
    [27] Kim J H, Choi J H, Cheon M, et al. RAM: residual attention module for single image super-resolution. arXiv preprint arXiv:1811.12043, 2018.
    [28] Jeon S, Kim S, Sohn K. Convolutional feature pyramid fusion via attention network. In: Proceedings of the 2017 IEEE International Conference on Image Processing. Beijing, China: IEEE, 2017. 1007−1011.
    [29] Sang H, Wang Q, Zhao Y. Multi-scale context attention network for stereo matching. IEEE Access, 2019, 7: 15152−15161
    [30] Zhang G, Zhu D, Shi W, et al. Multi-dimensional residual dense attention network for stereo matching. IEEE Access, 2019, 7: 51681−51690
    [31] Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 42(8): 2011−2023
    [32] Wang Q, Wu B, Zhu P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 11531−11539.
    [33] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 3061−3070.
    [34] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012. 3354−3361.
    [35] Pang J, Sun W, Ren J, Yang C, Yan Q. Cascade residual learning: a two-stage convolutional neural network for stereo matching. In: Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy: IEEE, 2017. 878−886.
    [36] Žbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 2016, 17: 1−32
    [37] Tulyakov S, Ivanov A, Fleuret F. Practical deep stereo (PDS): toward applications-friendly deep stereo matching. In: Proceedings of the 32nd Conference on Neural Information Processing Systems. Montreal, Canada: 2018. 31.
Publication history
  • Received: 2020-09-23
  • Revised: 2020-12-01
  • Published online: 2020-12-17
