Performance Function-Guided Deep Reinforcement Learning Control for UAV Swarm

Wang Yao-Nan, Hua He-An, Zhang Hui, Zhong Hang, Fan Ye-Xin, Liang Hong-Tao, Chang Hao, Fang Yong-Chun

Citation: Wang Yao-Nan, Hua He-An, Zhang Hui, Zhong Hang, Fan Ye-Xin, Liang Hong-Tao, Chang Hao, Fang Yong-Chun. Performance function-guided deep reinforcement learning control for UAV swarm. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240519


doi: 10.16383/j.aas.c240519 cstr: 32138.14.j.aas.c240519
Funds: Supported by the National Key Research and Development Program of China (2021ZD0114503) and the National Natural Science Foundation of China (62403190, 62427813, 62433010)
Details
    Author Bios:

    WANG Yao-Nan  Academician of the Chinese Academy of Engineering and professor at the College of Electrical and Information Engineering, Hunan University. His research interests cover robotics, intelligent control, and image processing. E-mail: yaonan@hnu.edu.cn

    HUA He-An  Associate researcher at the College of Electrical and Information Engineering, Hunan University. His research interests cover intelligent planning, control, and swarming of aerial robots. E-mail: huahean@hnu.edu.cn

    ZHANG Hui  Professor at the School of Robotics, Hunan University. His research interests cover machine vision, image processing, and robot control. Corresponding author of this paper. E-mail: zhanghuihby@126.com

    ZHONG Hang  Associate professor at the School of Robotics, Hunan University. His research interests cover robot control, visual servoing, and path planning. E-mail: zhonghang@hnu.edu.cn

    FAN Ye-Xin  Postdoctoral fellow at the School of Robotics, Hunan University. Her research interests cover robot perception and control, deep reinforcement learning, and motion planning. E-mail: yexinfan@hnu.edu.cn

    LIANG Hong-Tao  PhD candidate at the College of Electrical and Information Engineering, Hunan University. His research interests cover swarm motion control and path planning for aerial robots. E-mail: lianghongtao1@hnu.edu.cn

    CHANG Hao  PhD candidate at the College of Electrical and Information Engineering, Hunan University. His research interests cover visual perception and path planning for aerial robots. E-mail: changhao@hnu.edu.cn

    FANG Yong-Chun  Professor at the Institute of Robotics and Automatic Information Systems, Nankai University. His research interests cover nonlinear control, visual servoing, control of underactuated systems, and AFM-based nanosystems. E-mail: fangyc@nankai.edu.cn


  • Abstract: A performance function-guided deep reinforcement learning control method is proposed for UAV swarm systems. By simultaneously evaluating the demonstration experience generated by the performance function and the exploratory actions of the learned policy, the method ensures efficient and reliable policy updates and achieves high-performance control of the UAV swarm. First, using a leader-follower framework, the swarm control problem is converted into a tracking problem, and a model-based tracking control method is proposed in which a performance function constrains the formation error within a prescribed range, realizing model-driven control of the UAV swarm. Next, to address the tendency of the performance function to fail under complex operating conditions, deep reinforcement learning is combined with the performance-function-driven method: the demonstration experience of the performance function assists the training of the reinforcement learning networks, and the simultaneous evaluation of exploratory and demonstration actions guarantees that the learned policy significantly outperforms the performance-function-driven controller, effectively improving the accuracy and robustness of UAV formation control. Experimental results show that the method significantly improves the control accuracy of the UAV swarm, achieving high-performance swarm control that balances robustness and flight precision.
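The prescribed-performance idea in the model-driven part can be illustrated with a common construction (the paper's exact function may differ): an exponentially decaying envelope $\rho(t)$ bounds the formation error, and a log-ratio (here, `atanh`) transformation maps the constrained error to an unconstrained variable whose magnitude explodes near the boundary, driving the controller to respect the envelope. The names `rho0`, `rho_inf`, and `decay` below are illustrative, not taken from the paper.

```python
import math

def envelope(t, rho0=1.0, rho_inf=0.1, decay=1.0):
    """Exponentially decaying performance bound rho(t) -> rho_inf."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf

def transformed_error(e, t):
    """Map a formation error constrained to |e| < rho(t) onto an
    unconstrained variable; atanh grows without bound as the error
    approaches the prescribed envelope."""
    z = e / envelope(t)
    return math.atanh(z)
```

A controller that keeps the transformed error bounded thereby keeps the raw formation error inside the prescribed envelope at all times.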
  • Fig. 1  Schematic diagram of the UAV leader-follower formation model

    Fig. 2  Performance function-guided deep reinforcement learning swarm control framework

    Fig. 3  Performance function-driven swarm control guidance policy framework

    Fig. 4  Training and testing framework for the UAV swarm control policy

    Fig. 5  Topology of the UAV swarm

    Fig. 6  Formation flight trajectories of the UAV swarm

    Fig. 7  Flight speed and error curves of the UAV swarm

    Fig. 8  Flight trajectory curves of the UAV swarm

    Fig. 9  Flight curves of the UAV swarm in the $x$, $y$, and $z$ directions

    Fig. 10  Flight errors of the UAV swarm in the $x$, $y$, and $z$ directions

    Fig. 11  Evaluation curves of the deep reinforcement learning policy and the guidance policy in the dual-critic framework

    Fig. 12  Flight trajectory curves of the UAV swarm

    Fig. 13  Error and evaluation curves of the UAV swarm

    Parameters
    UAV mass $ m_i $: $ m_1 = 1.6\ \text{kg},\; m_{2\text{-}5} = 1.0\ \text{kg} $
    UAV moment of inertia $ \mathbb{I}_i $: $ \text{diag}[0.01\ \ 0.01\ \ 0.01]\ \text{kg}\cdot \text{m}^2 $
    Gravitational acceleration: $ 9.8\ \text{m}/\text{s}^2 $
    Learning rates $ \lambda_{\alpha_{1,\;2,\;3}} $: $ 1\times 10^{-4},\; 2\times 10^{-4},\; 2\times 10^{-4} $
    Maximum training episodes $ M_\text{max} $: $ 100 $
    Training steps per episode $ N_\text{max} $: $ 500 $
    Replay buffer sizes $ \mathcal{B}_{1,\;2} $: $ 10000 $
    Mini-batch size $ N_m $: $ 128 $
    Discount factor $ \gamma $: $ 0.95 $
    Exploration and smoothing coefficients $ \sigma_{1,\;2} $: $ 0.1,\; 0.05 $
    Control policy interaction frequency: $ 100 $ Hz
    Guidance policy parameters $ k_{\varphi ij},\; \beta_{ij} $: $ 0.2,\; 0.3 $
    Auxiliary gain matrix $ K_{pi} $: $ \text{diag}[4\ 4\ 4] $
    Auxiliary gain matrix $ K_{Ri} $: $ \text{diag}[1.5\ 1.5\ 1.5] $
    Outer-loop control parameter $ K_{\zeta i} $: $ \text{diag}[2\ 2\ 2] $
    Inner-loop control parameter $ k_{\eta i} $: $ \text{diag}[1.5\ 1.5\ 1.5] $
    Inner-loop control parameter $ k_i $: $ \text{diag}[2\ 2\ 2] $
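The dual-critic evaluation of demonstration and exploratory actions described in the abstract can be sketched as follows. This is a simplified illustration under assumed interfaces, not the paper's implementation: `critic`, `policy_action`, and `guide_action` are hypothetical stand-ins for a learned action-value function, the reinforcement learning actor, and the performance-function-driven guidance controller.

```python
import random

def choose_action(state, policy_action, guide_action, critic, sigma=0.1):
    """Score both candidate actions with the critic and execute the
    better one: the actor's noisy exploratory action or the
    model-driven demonstration action."""
    # exploratory action: actor output perturbed by Gaussian noise
    a_rl = [a + random.gauss(0.0, sigma) for a in policy_action(state)]
    # demonstration action from the guidance controller
    a_pf = guide_action(state)
    # the critic Q(s, a) arbitrates; the chosen transition is stored
    # as exploration or demonstration experience accordingly
    if critic(state, a_rl) >= critic(state, a_pf):
        return a_rl, "exploration"
    return a_pf, "demonstration"
```

Because the demonstration action is only executed when the critic prefers it, the learned policy can never be locked below the guidance controller's performance: it inherits the demonstrations early in training and overtakes them once its own actions score higher.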
Publication history
  • Received: 2024-07-23
  • Accepted: 2024-11-11
  • Available online: 2024-11-22
