Hierarchical Human-Robot Cooperative Control Based on GPR and DRL

Jin Zhe-Hao, Liu An-Dong, Yu Li

Citation: Jin Zhe-Hao, Liu An-Dong, Yu Li. Hierarchical human-robot cooperative control based on GPR and DRL. Acta Automatica Sinica, 2020, 46(x): 1−11 doi: 10.16383/j.aas.c190451

doi: 10.16383/j.aas.c190451
Funds: Supported by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1709213) and the National Natural Science Foundation of China (61973275)

Author biographies:

    Jin Zhe-Hao: Master's student at the College of Information Engineering, Zhejiang University of Technology. His main research interest is human-robot collaboration. E-mail: jzh839881963@163.com

    Liu An-Dong: Lecturer at the College of Information Engineering, Zhejiang University of Technology. His main research interests include model predictive control and networked control systems. E-mail: lad@zjut.edu.cn

    Yu Li: Professor at the College of Information Engineering, Zhejiang University of Technology. His main research interests include wireless sensor networks, networked control systems, and motion control. Corresponding author of this paper. E-mail: lyu@zjut.edu.cn

Abstract: This paper proposes a hierarchical human-robot collaboration (HRC) control method based on Gaussian process regression (GPR) and deep reinforcement learning (DRL), and evaluates its efficiency on a collaboratively controlled ball-and-beam system. The main contributions are: 1) with the system model unknown, a DRL algorithm is used to design an effective nonlinear suboptimal control policy, which serves as the top-level expected control policy guiding the HRC process and resolves the inability of traditional control methods to be applied directly to model-free HRC scenarios; 2) to counter the adverse effects of the human's unknown and stochastic control policy during HRC, GPR is used to fit the human control policy, building the robot's cognitive model of human control behavior; this weakens those adverse effects while increasing the robot's initiative in the collaboration, further improving collaboration efficiency; 3) the resulting cognitive model and the expected control policy are combined to design a control law for the robot's end-effector velocity, and comparative experiments verify the effectiveness of the proposed method.
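
The following minimal sketch illustrates contributions 2) and 3) under stated assumptions: a GPR model (scikit-learn) fit to human demonstrations of the ball-and-beam task, and a shared control command combining the prediction with a top-level expected policy. The state layout, the synthetic demonstration data, the expected_policy placeholder standing in for the trained DDPG network, and the subtractive sharing rule are all illustrative assumptions, not the paper's implementation.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)

    # Assumed demonstration data: states s = (ball position, ball velocity,
    # beam angle) logged while volunteers balanced the ball, and the human
    # control input u_h observed at each state (synthetic here).
    S = rng.uniform(-1.0, 1.0, size=(200, 3))
    u_h = -1.2 * S[:, 0] - 0.5 * S[:, 1] + 0.05 * rng.standard_normal(200)

    # RBF kernel plus white noise: a smooth mean policy with stochastic
    # human variability, used as the robot's cognitive model of human control.
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(S, u_h)

    def expected_policy(s):
        # Hypothetical stand-in for the top-level DRL (DDPG) policy:
        # a linear feedback law, not the paper's trained network.
        return -1.5 * s[0] - 0.8 * s[1]

    # At run time, query the cognitive model for the predicted human input
    # and let the robot supply the remainder of the expected control.
    # The subtractive sharing rule is an assumption for illustration only.
    s_now = np.array([0.3, -0.1, 0.02])
    u_pred, u_std = gpr.predict(s_now.reshape(1, -1), return_std=True)
    u_robot = expected_policy(s_now) - u_pred[0]
    print(f"human (predicted): {u_pred[0]:.3f} +/- {u_std[0]:.3f}; robot: {u_robot:.3f}")

The predictive standard deviation u_std comes for free with GPR; a robot could, for instance, rely less on the human prediction in regions of the state space where that uncertainty is large.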
Fig. 1  Schematic diagram of the HRC ball-and-beam system
Fig. 2  Schematic diagram of the hierarchical HRC structure
Fig. 3  Training curves of DDPG
Fig. 4  Control result of DDPG
Fig. 5  Comparison of control performance between DDPG and DQN
Fig. 6  Experimental environment of the HRC task
Fig. 7  Filtering results of the volunteers' ball-and-beam control data
Fig. 8  Selected trajectories of the volunteers' control processes
Fig. 9  Fitting result of the human-control-policy prediction model
Fig. 10  Experimental validation of the top-level expected control policy
Fig. 11  Experimental validation of the HRC control
Fig. 12  Prediction results of the human-control-policy prediction model

Publication history
  • Received: 2019-06-11
  • Accepted: 2019-12-06
