
Review on Visual Odometry for Mobile Robots

DING Wen-Dong, XU De, LIU Xi-Long, ZHANG Da-Peng, CHEN Tian

Citation: DING Wen-Dong, XU De, LIU Xi-Long, ZHANG Da-Peng, CHEN Tian. Review on Visual Odometry for Mobile Robots. ACTA AUTOMATICA SINICA, 2018, 44(3): 385-400. doi: 10.16383/j.aas.2018.c170107


doi: 10.16383/j.aas.2018.c170107



Funds: 

National Natural Science Foundation of China 61503376

the Project of Development in Tianjin for Scientific Research Institutes Supported by Tianjin Government 16PTYJGX00050

Beijing Natural Science Foundation 4161002

National Natural Science Foundation of China 51405485

National Natural Science Foundation of China 61673383

National Natural Science Foundation of China 51405486

More Information
    Author Bio:

     DING Wen-Dong  Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor degree in electronic science and technology from Wuhan University of Technology in 2013. His research interest covers visual localization and measurement. E-mail: dingwendong2013@ia.ac.cn

     LIU Xi-Long  Associate professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor degree from Beijing Jiaotong University in 2009, and his Ph. D. degree from the Institute of Automation, Chinese Academy of Sciences in 2014. His research interest covers image processing, pattern recognition, visual measurement, and visual scene cognition. E-mail: xilong.liu@ia.ac.cn

     ZHANG Da-Peng  Associate professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor and master degrees from Hebei University of Science and Technology in 2003 and 2006, and his Ph. D. degree from Beijing University of Aeronautics and Astronautics in 2011. His research interest covers robot vision measurement and medical robots. E-mail: dapeng.zhang@ia.ac.cn

     CHEN Tian  Master student at the Institute of Automation, Chinese Academy of Sciences. She received her bachelor degree from Beijing University of Posts and Telecommunications in 2016. Her research interest covers visual localization and 3D reconstruction. E-mail: chentian2016@ia.ac.cn

    Corresponding author: XU De  Professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor degree and master degree from Shandong University of Technology in 1985 and 1990, and received his Ph. D. degree from Zhejiang University in 2001. His research interest covers robot vision measurement, visual servoing, and microvisual technology. Corresponding author of this paper. E-mail: de.xu@ia.ac.cn
  • Abstract: Localization is an essential component of mobile robot navigation, and vision plays an increasingly important role in it. This paper first gives a mathematical formulation of visual localization, then reviews representative visual odometry (VO) methods grouped by their data association schemes, and discusses approaches for improving the robustness of visual odometry. In addition, the paper discusses the role of semantic analysis in visual localization and how deep neural networks can be used for visual localization. Finally, it summarizes the open problems and future directions of visual localization.
    1)  Recommended by Associate Editor HOU Zeng-Guang
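
As a rough sketch of that mathematical description (standard VO notation, not necessarily the survey's own symbols): at each frame $k$, visual odometry estimates the relative camera motion that minimizes a robust cost over image measurements, and chains these motions into a trajectory,

$$T_{k,k-1}=\mathop{\arg\min}_{T\in SE(3)}\;\sum_i\rho\big(e_i(T)\big),\qquad T_k=T_{k,k-1}\,T_{k-1}$$

where $e_i(T)$ is a per-measurement residual (reprojection or photometric, see Table 1) and $\rho(\cdot)$ is a robust cost function (see Table 3).
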
  • Table 1  Comparison between direct methods and indirect methods

    Aspect | Direct methods | Indirect methods
    Objective function | minimize photometric error | minimize reprojection error
    Advantage 1 | uses all the information in the image | handles large inter-frame motion
    Advantage 2 | incremental inter-frame computation reduces per-frame cost | accurate; efficient for motion and structure estimation
    Drawback 1 | limited to small inter-frame motion | slow (feature description, etc.)
    Drawback 2 | dense optimization over motion and structure is time-consuming | needs robust estimation such as RANSAC
    Typical failure | changes in scene illumination | weakly textured regions
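
In symbols (a hedged sketch in common notation, not the survey's exact formulation): indirect methods minimize the reprojection error of matched landmarks, while direct methods minimize the photometric error of warped pixel intensities,

$$E_{\rm indirect}(T)=\sum_i\left\|\,{\bf u}_i-\pi(T{\bf X}_i)\right\|^2,\qquad E_{\rm direct}(T)=\sum_i\left(I_k\big(\pi(T\,\pi^{-1}({\bf u}_i,d_i))\big)-I_{k-1}({\bf u}_i)\right)^2$$

where $\pi$ is the camera projection, ${\bf X}_i$ a 3D landmark, and ${\bf u}_i$ a pixel with (inverse) depth $d_i$ in frame $k-1$.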

    Table 2  Commonly used motion model priors

    Motion model assumption | Methods
    Constant-velocity model | ORB-SLAM, PTAM, DPPTAM, DSO
    Zero inter-frame pose change | DPPTAM, SVO, LSD-SLAM
    Inter-frame affine transformation | DT-SLAM
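
As an illustration of the constant-velocity prior in the first row, here is a minimal sketch (our own helper around 4x4 homogeneous world-to-camera matrices in NumPy, not any listed system's actual code) of the pose prediction used to initialize frame tracking:

    import numpy as np

    def predict_pose_const_velocity(T_km2, T_km1):
        """Constant-velocity prediction of the pose of frame k.

        T_km2, T_km1: 4x4 homogeneous world-to-camera poses of frames k-2 and k-1.
        The last inter-frame motion T_km1 * inv(T_km2) is assumed to repeat once more.
        """
        delta = T_km1 @ np.linalg.inv(T_km2)  # motion from frame k-2 to k-1
        return delta @ T_km1                  # predicted pose of frame k

    # Example: constant translation of 0.1 along x per frame
    T0, T1 = np.eye(4), np.eye(4)
    T1[0, 3] = 0.1
    print(predict_pose_const_velocity(T0, T1)[0, 3])  # -> 0.2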

    Table 3  Commonly used robust estimators

    Type | $\rho(x)$
    $\ell_2$ | $\dfrac{x^2}{2}$
    $\ell_1$ | $|x|$
    $\ell_1-\ell_2$ | $2\left(\sqrt{1+\dfrac{x^2}{2}}-1\right)$
    $\ell_p$ | $\dfrac{|x|^p}{p}$
    Huber | $\begin{cases} \dfrac{x^2}{2}, &\text{if } |x|\leq c\\ c\left(|x|- \dfrac{c}{2}\right), &\text{if } |x| > c \end{cases}$
    Cauchy | $\dfrac{c^2}{2}\ln\left(1+\left( \dfrac{x}{c}\right)^2\right)$
    Tukey | $\begin{cases} \dfrac{c^2}{6}\left(1-\left[1-\left( \dfrac{x}{c}\right)^2\right]^3\right), &\text{if } |x|\leq c \\ \dfrac{c^2}{6}, &\text{if } |x| > c \end{cases}$
    t-distribution | $\dfrac{\nu+1}{\nu+\left( \dfrac{r}{\sigma}\right)^2}$
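
A robust cost $\rho$ from Table 3 is typically minimized by iteratively reweighted least squares (IRLS)[59], where each residual receives the weight $w(x)=\rho'(x)/x$. Below is a minimal sketch for a linear residual model (illustrative only, not any particular VO system's solver):

    import numpy as np

    def huber_weight(r, c=1.345):
        """IRLS weight w(r) = rho'(r)/r for the Huber cost of Table 3."""
        a = np.abs(r)
        return np.where(a <= c, 1.0, c / a)

    def irls(A, b, weight=huber_weight, iters=10):
        """Minimize sum_i rho(a_i^T x - b_i) by iteratively reweighted least squares."""
        x = np.linalg.lstsq(A, b, rcond=None)[0]          # ordinary LS start
        for _ in range(iters):
            w = weight(A @ x - b)                         # per-residual weights
            sw = np.sqrt(w)                               # sqrt-weights for lstsq
            x = np.linalg.lstsq(A * sw[:, None], sw * b, rcond=None)[0]
        return x

    # Example: line fit with gross outliers
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 100)
    y = 2.0 * t + 1.0 + 0.01 * rng.standard_normal(100)
    y[::10] += 5.0                                        # 10 gross outliers
    A = np.stack([t, np.ones_like(t)], axis=1)
    print(irls(A, y))  # noticeably closer to [2, 1] than the plain LS start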

    Table 4  Robust objective functions used in VO systems

    VO system | Objective function
    PTAM | Tukey biweight
    DSO | Huber
    DT-SLAM | Cauchy distribution
    DPPTAM | reweighted Tukey
    DTAM | weighted Huber norm

    Table 5  Commonly used depth map models

    VO system | Depth model
    SVO | Gaussian + uniform mixture model
    DSO | Gaussian model
    DT-SLAM | piecewise epipolar constraint [61]
    DPPTAM (semi-dense) | consistency assumption
    DPPTAM (dense) | energy function 1
    DTAM | energy function 2
    LSD-SLAM | energy function 3
    1 Photometric error + image-space smoothness + planar patch assumption.
    2 Photometric error with image-space smoothness (regularization).
    3 Photometric error plus an inter-keyframe penalty.
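
For the Gaussian models in the first two rows, the per-pixel (inverse) depth estimate is refined by fusing each new measurement into the running distribution. A minimal Gaussian-fusion sketch (a product of two Gaussians; illustrative, not SVO's exact Gaussian + uniform mixture update):

    def fuse_gaussian_depth(mu, var, z, var_z):
        """Fuse measurement N(z, var_z) into the per-pixel estimate N(mu, var)."""
        k = var / (var + var_z)       # Kalman-style gain
        return mu + k * (z - mu), (1.0 - k) * var

    # Example: consistent measurements pull the mean toward ~0.6 and shrink the variance
    mu, var = 0.5, 1.0
    for z in (0.62, 0.58, 0.60):
        mu, var = fuse_gaussian_depth(mu, var, z, var_z=0.05)
    print(mu, var)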

    Table 6  Comparison of learning-based localization methods

    Localization system | Objective function | Input | Output | Network type | Target problem
    LSM$^{1}$[79] | SFA$^{2}$ | 2 image frames | pose | CNN | motion estimation
    PoseNet | pose error (7) | single RGB image | pose | GoogLeNet$^{3}$ | relocalization
    3D-R2N2$^{5}$[72] | voxel cross-entropy$^{4}$ | single/multiple images | reconstructed shape | CNN + LSTM | 3D reconstruction
    LST$^{6}$[74] | pose error | IMU | pose | LSTM | data fusion
    MatchNet | similarity cross-entropy$^{7}$ | 2 image frames | match score | CNN + FC | image patch matching
    GVNN | photometric error | current/reference images | pose | CNN + SE3$^{8}$ | visual odometry
    HomographyNet | image point error | 2 image frames | homography matrix | CNN | homography estimation
    SFM-Net[80] | camera motion error | RGBD images | camera motion, 3D point cloud | fully convolutional | camera motion estimation and 3D reconstruction
    SE3-Net[81] | object motion error | point cloud | object motion | conv + deconv | rigid-body motion
    $^{1}$ Learning to see by moving.
    $^{2}$ SFA uses a dense pixel-matching objective (loss) function; see [82].
    $^{3}$ GoogLeNet is a 22-layer CNN commonly used for classification and recognition.
    $^{4}$ A voxel is the 3D analogue of a (2D) pixel: it has 3D coordinates and stores the color value at the corresponding spatial location; see [83].
    $^{5}$ 3D-R2N2 (3D recurrent reconstruction neural network).
    $^{6}$ Learning to fuse.
    $^{7}$ A softmax layer follows the fully connected layers, so the output is a 0/1 value; the FC input is the concatenated feature-point pair, and the objective is the cross-entropy of the softmax output.
    $^{8}$ Besides the SE3 layer, there are also projection and back-projection layers, among others.
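
As an example of the pose-error objectives above, the PoseNet loss [71] combines a position term with a normalized-quaternion orientation term weighted by a scale factor beta (a sketch of the published form; NumPy stands in for the training framework):

    import numpy as np

    def posenet_loss(x_pred, q_pred, x_gt, q_gt, beta=500.0):
        """Pose loss ||x_gt - x_pred|| + beta * ||q_gt - q_pred/|q_pred||.

        x_*: 3D positions; q_*: quaternions. The predicted quaternion is
        normalized because the raw network output is unconstrained.
        """
        q_pred = q_pred / np.linalg.norm(q_pred)
        return np.linalg.norm(x_gt - x_pred) + beta * np.linalg.norm(q_gt - q_pred)

Here beta trades position accuracy against orientation accuracy and is tuned per scene.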

    Table 7  Commonly used tool libraries for visual localization

    Category | Libraries
    Optimization | Eigen, g2o[21], Ceres[98], GTSAM[99], iSAM[100], SLAM++[101]
    Spatial transformations | Eigen, ROS TF, OpenCV Transform, Sophus
    Calibration | OpenCV Calib, Kalibr, MATLAB Calibration Toolbox
    Features | OpenCV Feature, VLFeat[102]
    Visualization | PCL Visualization, Pangolin, rviz
    SFM | Bundler[95], openMVG[96], Multiple View Geometry MATLAB toolbox[97]
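
These libraries combine into a minimal two-frame indirect VO step. A sketch using OpenCV (file names and intrinsics are placeholders; KLT tracking, cf. [26], feeds a RANSAC essential-matrix estimate):

    import cv2
    import numpy as np

    # Two consecutive grayscale frames (placeholder file names)
    img1 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    K = np.array([[718.856, 0.0, 607.1928],   # example intrinsics (KITTI-like)
                  [0.0, 718.856, 185.2157],
                  [0.0, 0.0, 1.0]])

    # Detect corners in frame 0 and track them into frame 1 (KLT)
    p1 = cv2.goodFeaturesToTrack(img1, maxCorners=2000, qualityLevel=0.01, minDistance=7)
    p2, status, _ = cv2.calcOpticalFlowPyrLK(img1, img2, p1, None)
    p1, p2 = p1[status.ravel() == 1], p2[status.ravel() == 1]

    # Robustly estimate the essential matrix (RANSAC) and recover R, t (up to scale)
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    print("rotation:\n", R, "\nunit translation:\n", t.ravel())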

    Table 8  Commonly used datasets for VO evaluation

    Name | Released | Data type | Camera type | Ground truth | Other sensors | Reference
    KITTI VO | 2012 | png | stereo | GPS | laser | [103]
    TUM-Monocular | 2012 | jpg | monocular | | | [25]
    TUM-RGBD | 2012 | png + depth | RGBD | | | [104]
    ICL-NUIM | 2014 | png | stereo | $^{4}$ | | [105]
    EuRoC MAV | 2016 | ROS$^{1}$ + ASL$^{2}$ | stereo | VICON$^{3}$ | IMU | [106]
    Scene Flow | 2016 | png | stereo | $^{4}$ | | [107]
    COLD$^{5}$ | 2009 | JPEG | omnidirectional$^{6}$ | | laser$^{7}$ | [108]
    NYU Depth | 2011/2012 | png + depth | RGBD | | | [109], [110]
    PASCAL 3D+ | 2014 | JPEG | monocular | $^{4}$ | | [111]
    $^1$ A ROS bag recording; with ROS the data in the file can be played back as messages.
    $^2$ ASL is a custom format defined by this dataset.
    $^3$ Besides VICON there are also a laser tracker and a 3D structure scan: Vicon motion capture system (6D pose), Leica MS50 laser tracker (3D position), Leica MS50 3D structure scan.
    $^4$ Synthetic dataset, so ground truth is available.
    $^5$ An extended COLD dataset is available at http://www.pronobis.pro/data/cold-stockholm
    $^6$ The platform carries both a conventional camera and an omnidirectional camera.
    $^7$ Besides the laser scanner there is also a wheel odometer (encoder).
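
Evaluation on these datasets usually reports trajectory error against the ground truth. Below is a minimal sketch of the absolute trajectory error (translation RMSE), assuming KITTI-style 3x4 [R|t] pose rows and already-aligned, synchronized trajectories:

    import numpy as np

    def load_kitti_poses(path):
        """Parse a KITTI pose file: one row-major 3x4 [R|t] matrix per line."""
        return [np.array(line.split(), dtype=float).reshape(3, 4)
                for line in open(path)]

    def ate_rmse(gt_poses, est_poses):
        """Translation RMSE between two aligned, synchronized trajectories."""
        t_gt = np.array([P[:, 3] for P in gt_poses])
        t_est = np.array([P[:, 3] for P in est_poses])
        return float(np.sqrt(np.mean(np.sum((t_gt - t_est) ** 2, axis=1))))
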
  • [1] Burri M, Oleynikova H, Achtelik M W, Siegwart R. Realtime visual-inertial mapping, re-localization and planning onboard MAVs in unknown environments. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE, 2015. 1872-1878
    [2] Dunkley O, Engel J, Sturm J, Cremers D. Visual-inertial navigation for a camera-equipped 25g Nano-quadrotor. In: Proceedings of IROS2014 Aerial Open Source Robotics Workshop. Chicago, USA: IEEE, 2014. 1-2
    [3] Pinto L, Gupta A. Supersizing self-supervision: learning to grasp from 50K tries and 700 robot hours. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE, 2016. 3406-3413
    [4] Ai-Chang M, Bresina J, Charest L, Chase A, Hsu J C J, Jonsson A, Kanefsky B, Morris P, Rajan K, Yglesias J, Chafin B G, Dias W C, Maldague P F. MAPGEN: mixed-initiative planning and scheduling for the mars exploration rover mission. IEEE Intelligent Systems, 2004, 19(1): 8-12 doi: 10.1109/MIS.2004.1265878
    [5] Slaughter D C, Giles D K, Downey D. Autonomous robotic weed control systems: a review. Computers and Electronics in Agriculture, 2008, 61(1): 63-78 doi: 10.1016/j.compag.2007.05.008
    [6] Kamegawa T, Yamasaki T, Igarashi H, Matsuno F. Development of the snake-like rescue robot "kohga". In: Proceedings of the 2004 IEEE International Conference on Robotics and Automation. New Orleans, LA, USA: IEEE, 2004. 5081-5086
    [7] Olson E. AprilTag: a robust and flexible visual fiducial system. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China: IEEE, 2011. 3400-3407
    [8] Kikkeri H, Parent G, Jalobeanu M, Birchfield S. An inexpensive method for evaluating the localization performance of a mobile robot navigation system. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 4100-4107
    [9] Scaramuzza D, Fraundorfer F. Visual odometry: Part I: the first 30 years and fundamentals. IEEE Robotics and Automation Magazine, 2011, 18(4): 80-92 doi: 10.1109/MRA.2011.943233
    [10] Fraundorfer F, Scaramuzza D. Visual odometry: Part II: matching, robustness, optimization, and applications. IEEE Robotics and Automation Magazine, 2012, 19(2): 78-90 doi: 10.1109/MRA.2012.2182810
    [11] Hesch J A, Roumeliotis S I. A direct least-squares (DLS) method for PnP. In: Proceedings of the 2011 International Conference on Computer Vision (ICCV). Barcelona, Spain: IEEE, 2011. 383-390
    [12] Craighead J, Murphy R, Burke J, Goldiez B. A survey of commercial and open source unmanned vehicle simulators. In: Proceedings of the 2007 IEEE International Conference on Robotics and Automation. Roma, Italy: IEEE, 2007. 852-857
    [13] Faessler M, Mueggler E, Schwabe K, Scaramuzza D. A monocular pose estimation system based on infrared LEDs. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 907-913
    [14] Meier L, Tanskanen P, Heng L, Lee G H, Fraundorfer F, Pollefeys M. PIXHAWK: a micro aerial vehicle design for autonomous flight using onboard computer vision. Autonomous Robots, 2012, 33(1-2): 21-39 doi: 10.1007/s10514-012-9281-4
    [15] Lee G H, Achtelik M, Fraundorfer F, Pollefeys M, Siegwart R. A benchmarking tool for MAV visual pose estimation. In: Proceedings of the 11th International Conference on Control Automation Robotics and Vision (ICARCV). Singapore, Singapore: IEEE, 2010. 1541-1546
    [16] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR). Nara, Japan: IEEE, 2007. 225-234
    [17] Leutenegger S, Lynen S, Bosse M, Siegwart R, Furgale P. Keyframe-based visual-inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 2015, 34(3): 314-334 doi: 10.1177/0278364914554813
    [18] Yang Z F, Shen S J. Monocular visual-inertial state estimation with online initialization and camera-IMU extrinsic calibration. IEEE Transactions on Automation Science and Engineering, 2017, 14(1): 39-51 doi: 10.1109/TASE.2016.2550621
    [19] Shen S J, Michael N, Kumar V. Tightly-coupled monocular visual-inertial fusion for autonomous flight of rotorcraft MAVs. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA). Seattle, WA, USA: IEEE, 2015. 5303-5310
    [20] Concha A, Loianno G, Kumar V, Civera J. Visual-inertial direct SLAM. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE, 2016. 1331-1338
    [21] Kümmerle R, Grisetti G, Strasdat H, Konolige K, Burgard W. G2o: a general framework for graph optimization. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China: IEEE, 2011. 3607-3613
    [22] Forster C, Pizzoli M, Scaramuzza D. SVO: fast semi-direct monocular visual odometry. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 15-22
    [23] Newcombe R A, Lovegrove S J, Davison A J. DTAM: dense tracking and mapping in real-time. In: Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain: IEEE, 2011. 2320-2327
    [24] Engel J, Koltun V, Cremers D. Direct sparse odometry. arXiv:1607.02565, 2016.
    [25] Engel J, Usenko V, Cremers D. A photometrically calibrated benchmark for monocular visual odometry. arXiv:1607.02555, 2016.
    [26] Lucas B D, Kanade T. An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th International Joint Conference on Artificial Intelligence. Vancouver, BC, Canada: ACM, 1981. 674-679
    [27] Baker S, Matthews I. Lucas-Kanade 20 years on: a unifying framework. International Journal of Computer Vision, 2004, 56(3): 221-255 doi: 10.1023/B:VISI.0000011205.11775.fd
    [28] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR). Nara, Japan: IEEE, 2007. 225-234
    [29] Concha A, Civera J. DPPTAM: dense piecewise planar tracking and mapping from a monocular sequence. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany: IEEE, 2015. 5686-5693
    [30] Engel J, Sturm J, Cremers D. Semi-dense visual odometry for a monocular camera. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013. 1449-1456
    [31] Engel J, Schöps T, Cremers D. LSD-SLAM: large-scale direct monocular SLAM. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 834-849
    [32] Rublee E, Rabaud V, Konolige K, Bradski G. ORB: an efficient alternative to SIFT or SURF. In: Proceedings of the 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011. 2564-2571
    [33] Rosten E, Porter R, Drummond T. Faster and better: a machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(1): 105-119 doi: 10.1109/TPAMI.2008.275
    [34] Leutenegger S, Chli M, Siegwart R Y. BRISK: binary robust invariant scalable keypoints. In: Proceedings of the 2011 International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011. 2548-2555
    [35] Bay H, Tuytelaars T, Van Gool L. SURF: speeded up robust features. In: Proceedings of the 9th European Conference on Computer Vision. Graz, Austria: Springer, 2006. 404-417
    [36] Mur-Artal R, Montiel J M M, Tardós J D. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163 doi: 10.1109/TRO.2015.2463671
    [37] Herrera C D, Kim K, Kannala J, Pulli K, Heikkilä J. DT-SLAM: deferred triangulation for robust SLAM. In: Proceedings of the 2nd International Conference on 3D Vision (3DV). Tokyo, Japan: IEEE, 2014. 609-616
    [38] Yang S C, Scherer S. Direct monocular odometry using points and lines. arXiv:1703.06380, 2017.
    [39] Lu Y, Song D Z. Robust RGB-D odometry using point and line features. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 3934-3942
    [40] Gomez-Ojeda R, Gonzalez-Jimenez J. Robust stereo visual odometry through a probabilistic combination of points and line segments. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE, 2016. 2521-2526
    [41] Zhang L L, Koch R. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. Journal of Visual Communication and Image Representation, 2013, 24(7): 794-805 doi: 10.1016/j.jvcir.2013.05.006
    [42] Zhou H Z, Zou D P, Pei L, Ying R D, Liu P L, Yu W X. StructSLAM: visual SLAM with building structure lines. IEEE Transactions on Vehicular Technology, 2015, 64(4): 1364-1375 doi: 10.1109/TVT.2015.2388780
    [43] Zhang G X, Suh I H. Building a partial 3D line-based map using a monocular SLAM. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China: IEEE, 2011. 1497-1502
    [44] Toldo R, Fusiello A. Robust multiple structures estimation with J-linkage. In: Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer, 2008. 537-547
    [45] Camposeco F, Pollefeys M. Using vanishing points to improve visual-inertial odometry. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA). Seattle, WA, USA: IEEE, 2015. 5219-5225
    [46] Gräter J, Schwarze T, Lauer M. Robust scale estimation for monocular visual odometry using structure from motion and vanishing points. In: Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV). Seoul, South Korea: IEEE, 2015. 475-480
    [47] Karpenko A, Jacobs D, Baek J, Levoy M. Digital Video Stabilization and Rolling Shutter Correction Using Gyroscopes, Stanford University Computer Science Technical Report, CTSR 2011-03, Stanford University, USA, 2011.
    [48] Forssén P E, Ringaby E. Rectifying rolling shutter video from hand-held devices. In: Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA: IEEE, 2010. 507-514
    [49] Kerl C, Stüeckler J, Cremers D. Dense continuous-time tracking and mapping with rolling shutter RGB-D cameras. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2264-2272
    [50] Pertile M, Chiodini S, Giubilato R, Debei S. Effect of rolling shutter on visual odometry systems suitable for planetary exploration. In: Proceedings of the 2016 IEEE Metrology for Aerospace (MetroAeroSpace). Florence, Italy: IEEE, 2016. 598-603
    [51] Kim J H, Cadena C, Reid I. Direct semi-dense SLAM for rolling shutter cameras. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA). Stockholm, Sweden: IEEE, 2016. 1308-1315
    [52] Guo C X, Kottas D G, DuToit R C, Ahmed A, Li R P, Roumeliotis S I. Efficient visual-inertial navigation using a rolling-shutter camera with inaccurate timestamps. In: Proceedings of the 2014 Robotics: Science and Systems. Berkeley, USA: University of California, 2014. 1-9
    [53] Dai Y C, Li H D, Kneip L. Rolling shutter camera relative pose: generalized epipolar geometry. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 4132-4140
    [54] Faugeras O D, Lustman F. Motion and structure from motion in a piecewise planar environment. International Journal of Pattern Recognition and Artificial Intelligence, 1988, 2(3): 485-508 doi: 10.1142/S0218001488000285
    [55] Tan W, Liu H M, Dong Z L, Zhang G F, Bao H J. Robust monocular SLAM in dynamic environments. In: Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Adelaide, SA, Australia: IEEE, 2013. 209-218
    [56] Lim H, Lim J, Kim H J. Real-time 6-DOF monocular visual SLAM in a large-scale environment. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 1532-1539
    [57] Davison A J, Reid I D, Molton N D, Stasse O. MonoSLAM: real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(6): 1052-1067
    [58] Özyesil O, Singer A. Robust camera location estimation by convex programming. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 2674-2683
    [59] Daubechies I, DeVore R, Fornasier M, Güntürk C S. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 2010, 63(1): 1-38 doi: 10.1002/cpa.v63:1
    [60] Sünderhauf N, Protzel P. Switchable constraints for robust pose graph SLAM. In: Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vilamoura, Portugal: IEEE, 2012. 1879-1884
    [61] Chum O, Werner T, Matas J. Epipolar geometry estimation via RANSAC benefits from the oriented epipolar constraint. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR). Cambridge, UK: IEEE, 2004. 112-115
    [62] Salas-Moreno R F, Newcombe R A, Strasdat H, Kelly P H J, Davison A J. SLAM++: simultaneous localisation and mapping at the level of objects. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA: IEEE, 2013. 1352-1359
    [63] Dharmasiri T, Lui V, Drummond T. Mo-SLAM: multi object SLAM with run-time object discovery through duplicates. In: Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Daejeon, South Korea: IEEE, 2016. 1214-1221
    [64] Choudhary S, Trevor A J B, Christensen H I, Dellaert F. SLAM with object discovery, modeling and mapping. In: Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Chicago, IL, USA: IEEE, 2014. 1018-1025
    [65] Dame A, Prisacariu V A, Ren C Y, Reid I. Dense reconstruction using 3D object shape priors. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA: IEEE, 2013. 1288-1295
    [66] Xiang Y, Fox D. DA-RNN: semantic mapping with data associated recurrent neural networks. arXiv:1703.03098, 2017.
    [67] Newcombe R A, Izadi S, Hilliges O, Molyneaux D, Kim D, Davison A J, Kohi P, Shotton J, Hodges S, Fitzgibbon A. KinectFusion: real-time dense surface mapping and tracking. In: Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Basel, Switzerland: IEEE, 2011. 127-136
    [68] McCormac J, Handa A, Davison A, Leutenegger S. SemanticFusion: dense 3D semantic mapping with convolutional neural networks. arXiv:1609.05130, 2016.
    [69] Vineet V, Miksik O, Lidegaard M, Nießner M, Golodetz S, Prisacariu V A, Kähler O, Murray D W, Izadi S, Pérez P, Torr P H S. Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA). Seattle, WA, USA: IEEE, 2015. 75-82
    [70] Zamir A R, Wekel T, Agrawal P, Wei C, Malik J, Savarese S. Generic 3D representation via pose estimation and matching. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, Netherlands: Springer, 2016. 535-553
    [71] Kendall A, Grimes M, Cipolla R. PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2938-2946
    [72] Choy C B, Xu D F, Gwak J, Chen K, Savarese S. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. arXiv:1604.00449, 2016.
    [73] Altwaijry H, Trulls E, Hays J, Fua P, Belongie S. Learning to match aerial images with deep attentive architectures. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 3539-3547
    [74] Rambach J R, Tewari A, Pagani A, Stricker D. Learning to fuse: a deep learning approach to visual-inertial camera pose estimation. In: Proceedings of the 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Merida, Mexico: IEEE, 2016. 71-76
    [75] Kar A, Tulsiani S, Carreira J, Malik J. Category-specific object reconstruction from a single image. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 1966-1974
    [76] Vicente S, Carreira J, Agapito L, Batista J. Reconstructing PASCAL VOC. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH, USA: IEEE, 2014. 41-48
    [77] Doumanoglou A, Kouskouridas R, Malassiotis S, Kim T K. Recovering 6D object pose and predicting next-best-view in the crowd. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 3583-3592
    [78] Tejani A, Tang D, Kouskouridas R, Kim T K. Latent-class hough forests for 3D object detection and pose estimation. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 462-477
    [79] Agrawal P, Carreira J, Malik J. Learning to see by moving. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 37-45
    [80] Vijayanarasimhan S, Ricco S, Schmid C, Sukthankar R, Fragkiadaki K. SfM-Net: learning of structure and motion from video. arXiv:1704.07804, 2017.
    [81] Byravan A, Fox D. SE3-Nets: learning rigid body motion using deep neural networks. arXiv:1606.02378, 2016.
    [82] Chopra S, Hadsell R, LeCun Y. Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Diego, CA, USA: IEEE, 2005. 539-546
    [83] Lengyel E S. Voxel-based Terrain for Real-time Virtual Simulations [Ph. D. dissertation], University of California, USA, 2010. 67-82
    [84] Wohlhart P, Lepetit V. Learning descriptors for object recognition and 3D pose estimation. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 3109-3118
    [85] Hazirbas C, Ma L N, Domokos C, Cremers D. FuseNet: incorporating depth into semantic segmentation via fusionbased CNN architecture. In: Proceedings of the 13th Asian Conference on Computer Vision. Taipei, China: Springer, 2016. 213-228
    [86] DeTone D, Malisiewicz T, Rabinovich A. Deep image homography estimation. arXiv:1606.03798, 2016.
    [87] Liu F Y, Shen C H, Lin G S. Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 5162-5170
    [88] Handa A, Bloesch M, Pǎtrǎucean V, Stent S, McCormac J, Davison A. gvnn: neural network library for geometric computer vision. In: Computer Vision - ECCV 2016 Workshops. Cham: Springer, 2016.
    [89] Jaderberg M, Simonyan K, Zisserman A, Kavukcuoglu K. Spatial transformer networks. In: Proceedings of the 2015 Advances in Neural Information Processing Systems. Montreal, Canada: Curran Associates, Inc., 2015. 2017-2025
    [90] Han X F, Leung T, Jia Y Q, Sukthankar R, Berg A C. MatchNet: unifying feature and metric learning for patch-based matching. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 3279-3286
    [91] Burgard W, Stachniss C, Grisetti G, Steder B, Kümmerle R, Dornhege C, Ruhnke M, Kleiner A, Tardós J D. A comparison of SLAM algorithms based on a graph of relations. In: Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. St. Louis, MO, USA: IEEE, 2009. 2089-2095
    [92] Kümmerle R, Steder B, Dornhege C, Ruhnke M, Grisetti G, Stachniss C, Kleiner A. On measuring the accuracy of SLAM algorithms. Autonomous Robots, 2009, 27(4): 387 -407 doi: 10.1007/s10514-009-9155-6
    [93] Kaehler A, Bradski G. Open source computer vision library [Online], available: https://github.com/itseez/opencv, February 2, 2018
    [94] Furgale P, Rehder J, Siegwart R. Unified temporal and spatial calibration for multi-sensor systems. In: Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Tokyo, Japan: IEEE, 2013. 1280-1286
    [95] Snavely N, Seitz S M, Szeliski R. Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics, 2006, 25(3): 835-846 doi: 10.1145/1141911
    [96] Moulon P, Monasse P, Marlet R. OpenMVG [Online], available: https://github.com/openMVG/openMVG, December 9, 2017
    [97] Capel D, Fitzgibbon A, Kovesi P, Werner T, Wexler Y, Zisserman A. MATLAB functions for multiple view geometry [Online], available: http://www.robots.ox.ac.uk/~vgg/hzbook/code, October 14, 2017
    [98] Agarwal S, Mierle K. Ceres solver [Online], available: http://ceres-solver.org, January 9, 2018
    [99] Dellaert F. Factor Graphs and GTSAM: a Hands-on Introduction, Technical Report, GT-RIM-CP&R-2012-002, Georgia Institute of Technology, USA, 2012
    [100] Kaess M, Ranganathan A, Dellaert F. iSAM: incremental smoothing and mapping. IEEE Transactions on Robotics, 2008, 24(6): 1365-1378 doi: 10.1109/TRO.2008.2006706
    [101] Polok L, Ila V, Solony M, Smrz P, Zemcik P. Incremental block cholesky factorization for nonlinear least squares in robotics. In: Proceedings of the 2013 Robotics: Science and Systems. Berlin, Germany: MIT Press, 2013. 1-7
    [102] Vedaldi A, Fulkerson B. VLFeat: an open and portable library of computer vision algorithms [Online], available: http://www.vlfeat.org/, November 5, 2017
    [103] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA: IEEE, 2012. 3354-3361
    [104] Sturm J, Engelhard N, Endres F, Burgard W, Cremers D. A benchmark for the evaluation of RGB-D SLAM systems. In: Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vilamoura, Portugal: IEEE, 2012. 573-580
    [105] Handa A, Whelan T, McDonald J, Davison A J. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 1524-1531
    [106] Burri M, Nikolic J, Gohl P, Schneider T, Rehder J, Omari S, Achtelik M W, Siegwart R. The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research, 2016, 35(10): 1157-1163 doi: 10.1177/0278364915620033
    [107] Mayer N, Ilg E, Häusser P, Fischer P, Cremers D, Dosovitskiy A, Brox T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 4040-4048
    [108] Pronobis A, Caputo B. COLD: the COsy localization database. The International Journal of Robotics Research, 2009, 28(5): 588-594 doi: 10.1177/0278364909103912
    [109] Silberman N, Hoiem D, Kohli P, Fergus R. Indoor segmentation and support inference from RGBD images. In: Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: ACM, 2012. 746-760
    [110] Silberman N, Fergus R. Indoor scene segmentation using a structured light sensor. In: Proceedings of the 2011 IEEE International Conference on Computer Vision Workshop. Barcelona, Spain: IEEE, 2011. 601-608
    [111] Xiang Y, Mottaghi R, Savarese S. Beyond PASCAL: a benchmark for 3D object detection in the wild. In: Proceedings of the 2014 IEEE Winter Conference on Applications of Computer Vision (WACV). Steamboat Springs, CO, USA: IEEE, 2014. 75-82
    [112] Nikolskiy V P, Stegailov V V, Vecher V S. Efficiency of the Tegra K1 and X1 systems-on-chip for classical molecular dynamics. In: Proceedings of the 2016 International Conference on High Performance Computing and Simulation (HPCS). Innsbruck, Austria: IEEE, 2016. 682-689
    [113] Pizzoli M, Forster C, Scaramuzza D. REMODE: probabilistic, monocular dense reconstruction in real time. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China: IEEE, 2014. 2609-2616
    [114] Faessler M, Fontana F, Forster C, Mueggler E, Pizzoli M, Scaramuzza D. Autonomous, vision-based flight and live dense 3D mapping with a quadrotor micro aerial vehicle. Journal of Field Robotics, 2016, 33(4): 431-450 doi: 10.1002/rob.2016.33.issue-4
    [115] Weiss S, Achtelik M W, Chli M, Siegwart R. Versatile distributed pose estimation and sensor self-calibration for an autonomous MAV. In: Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA). Saint Paul, MN, USA: IEEE, 2012. 31-38
    [116] Weiss S, Siegwart R. Real-time metric state estimation for modular vision-inertial systems. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China: IEEE, 2011. 4531-4537
    [117] Lynen S, Achtelik M W, Weiss S, Chli M, Siegwart R. A robust and modular multi-sensor fusion approach applied to MAV navigation. In: Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Tokyo, Japan: IEEE, 2013. 3923-3929
Publication history
  • Received: 2017-02-27
  • Accepted: 2017-09-07
  • Published: 2018-03-20
