
Policy Optimization Method for Multi-UAV Cooperative Maritime Navigation Based on Causal Influence Detection

Liu Wen-Zhang, Lu Jian-Hua, Ren Lu, Sun Chang-Yin

Citation: Liu Wen-Zhang, Lu Jian-Hua, Ren Lu, Sun Chang-Yin. Policy optimization method for multi-UAV cooperative maritime navigation based on causal influence detection. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250691

doi: 10.16383/j.aas.c250691 cstr: 32138.14.j.aas.c250691


Funds: Supported by National Natural Science Foundation of China (62303009, 62495083, and 62236002) and Youth Research Project of Anhui Provincial Department of Education (2025AHGXZK40374)
More Information
    Author Bio:

    LIU Wen-Zhang Lecturer at the School of Artificial Intelligence, Anhui University. He received his Ph.D. degree in control science and engineering from Southeast University in 2022. His research interests include deep reinforcement learning, embodied intelligent systems, multi-agent reinforcement learning, and transfer reinforcement learning. E-mail: wzliu@ahu.edu.cn

    LU Jian-Hua Master's student at the School of Artificial Intelligence, Anhui University. His research interests include multi-agent reinforcement learning and multi-UAV cooperative control. E-mail: jhlu@stu.ahu.edu.cn

    REN Lu Associate professor at the School of Artificial Intelligence, Anhui University. She received her Ph.D. degree in control science and engineering from Southeast University in 2021. Her research interests include distributed cooperative control of autonomous unmanned systems, deep reinforcement learning, and multi-agent reinforcement learning. E-mail: penny_lu@ahu.edu.cn

    SUN Chang-Yin Professor at the School of Artificial Intelligence, Anhui University. He received his Ph.D. degree in electrical engineering from Southeast University in 2004. His research interests include intelligent control, flight control, pattern recognition, and optimization theory. Corresponding author of this paper. E-mail: cysun@seu.edu.cn

  • Abstract: Multi-UAV cooperative navigation is a key enabling technology for efficient cooperative maritime operations. In vast and dynamically unknown sea areas, however, limited sensing capability and autonomous decision-making mechanisms make the cooperative relationships among UAVs complex, and global information is difficult to obtain. In recent years, multi-agent reinforcement learning under the centralized-training-with-decentralized-execution paradigm has made remarkable progress in learning cooperative behaviors and has been widely applied to maritime cooperative navigation tasks. Yet because interactions between agents often arise only in specific situations, effectively improving cooperation efficiency and exploration capability remains a key challenge. To address this, a multi-agent proximal policy optimization method based on causal influence detection (CID-MAPPO) is proposed. Taking the causal influence between agents as its measuring criterion, the method introduces an intrinsic reward mechanism designed from cooperation rules and applies causal inference and conditional mutual information to detect the causal influence between agents' behaviors, thereby guiding agents to preferentially explore actions that have a positive influence on the global state and strengthening cooperation among the agents. Experimental results show that the proposed method delivers significant performance gains, with particularly high cooperation efficiency in maritime search-and-rescue tasks, verifying its effectiveness.
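As a concrete illustration of the mechanism described above, the following sketch scores the causal influence $C^i(s_t, s_{t+1})$ of UAV $i$'s action as the KL divergence between a learned transition model's prediction under the full joint action and the same prediction with $a_t^i$ marginalized out by Monte Carlo sampling, a pointwise form of conditional mutual information. The interfaces (`transition_model`, `policy_i`) and the Gaussian moment matching are our assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of a
# causal influence score C^i(s_t, s_{t+1}) for agent i: compare the
# predicted next-state distribution given the full joint action with the
# prediction after marginalizing out agent i's action.
import torch
import torch.distributions as td

def causal_influence(transition_model, policy_i, s_t, joint_action, i, K=128):
    """transition_model(s, a) -> td.Normal over s_{t+1} (assumed learned);
    policy_i(s) -> distribution over agent i's actions; K Monte Carlo
    samples, matching the sampling count in Table 2."""
    # Prediction conditioned on the full joint action a_t.
    p_full = transition_model(s_t, joint_action)

    # Marginalize a_t^i: resample agent i's action K times from its policy
    # and moment-match the resulting mixture of Gaussians.
    mix_mean, mix_sq = 0.0, 0.0
    for _ in range(K):
        a_cf = joint_action.clone()        # joint_action: [N, act_dim]
        a_cf[i] = policy_i(s_t).sample()   # counterfactual action for agent i
        p_cf = transition_model(s_t, a_cf)
        mix_mean = mix_mean + p_cf.mean / K
        mix_sq = mix_sq + (p_cf.variance + p_cf.mean ** 2) / K
    mix_var = (mix_sq - mix_mean ** 2).clamp_min(1e-6)
    p_marg = td.Normal(mix_mean, mix_var.sqrt())

    # Pointwise conditional-mutual-information score: D_KL(p_full || p_marg).
    return td.kl_divergence(p_full, p_marg).sum(-1)
```

In this framing, a large score means UAV $i$'s chosen action measurably shifts the predicted global next state; rewarding such steps is what drives the prioritized exploration described in the abstract.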
  • Fig. 1  Diagram of multi-UAV cooperative navigation

    Fig. 2  The body-fixed and inertial coordinate systems of a UAV

    Fig. 3  The causal graphical model of a one-step state transition with three agents

    Fig. 4  The causal graphical model without considering the action of agent $i$

    Fig. 5  Diagram of the $\beta$-VAE framework

    Fig. 6  Diagram of CID-MAPPO

    Fig. 7  Diagram of the experimental scenes

    Fig. 8  Learning curves of different algorithms under various scenarios

    Fig. 9  Diagram of trajectories in some experimental scenes

    Fig. 10  Results of the ablation study on hyper-parameters

    Table 1  Explanation of mathematical symbols

    Symbol  Description
    $ \mathcal{I}=\{1, 2, \cdots, N\} $  Set of UAVs
    $ s_t $  System state at time $ t $
    $ o_t^i $  Local observation of UAV $ i $ at time $ t $
    $ a_t^i $  Action of UAV $ i $ at time $ t $
    $ \pi^i $  Policy of UAV $ i $
    $ \boldsymbol{a}_t=[a_t^1, \cdots, a_t^N] $  Joint action
    $ \boldsymbol{a}_t \setminus a_t^i $  Joint action with UAV $ i $'s action removed
    $ P(s_{t+1} | s_t, \boldsymbol{a}_t) $  State transition probability
    $ \gamma $  Discount factor
    $ C^i(s_t, s_{t+1}) $  Causal influence measure of UAV $ i $
    $ D_\mathrm{KL}(\cdot \| \cdot) $  KL divergence
    $ \beta $  $ \beta $-VAE regularization coefficient
    $ r^i_t $  Reward of UAV $ i $ at time $ t $
    $ r^i_{\mathrm{cid}, t} $  Causal intrinsic reward of UAV $ i $ at time $ t $
    $ \lambda $  Intrinsic reward weight
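As a usage note on the symbols above, a minimal sketch of how the extrinsic reward $ r^i_t $, the causal intrinsic reward $ r^i_{\mathrm{cid}, t} $, and the weight $ \lambda $ might combine during training follows; treating the influence score directly as the intrinsic reward is our simplifying assumption, not the paper's exact rule.

```python
# Illustrative sketch (our assumption, not the paper's exact rule) of how
# the Table 1 quantities combine: the training reward for UAV i adds the
# causal intrinsic term r^i_{cid,t}, weighted by lambda (1.0 in Table 2).
# Here the influence score C^i(s_t, s_{t+1}) is used directly as r^i_{cid,t}.
def shaped_rewards(r_ext, influence, lam=1.0):
    """Per-UAV training rewards: r^i_t + lam * r^i_{cid,t}."""
    return [r + lam * c for r, c in zip(r_ext, influence)]

# Example with three UAVs at a single time step.
print(shaped_rewards(r_ext=[-1.0, -0.5, 0.25], influence=[0.25, 0.5, 0.0]))
# -> [-0.75, 0.0, 0.25]
```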

    Table 2  Experimental hyper-parameter settings

    Parameter  Value
    Learning rate $ \alpha $  0.0005
    VAE module learning rate $ \alpha_\mathrm{vae} $  0.0005
    Discount factor $ \gamma $  0.99
    Clipping coefficient $ \varepsilon $  0.2
    Activation function  ReLU
    Batch size  1024
    Replay buffer size  3200
    Reward weight $ \lambda $  1.0
    VAE parameter $ \beta $  0
    Monte Carlo sample count $ K $  128
    Actor network fully connected layer sizes  [64, 64, 64]
    Actor network RNN hidden layer size  64
    Critic network fully connected layer sizes  [64, 64]
    Critic network RNN hidden layer size  64
    $ \beta $-VAE encoder hidden layer sizes (state encoding)  [256, 128, 64]
    $ \beta $-VAE encoder hidden layer sizes (action encoding)  [64, 128]
    $ \beta $-VAE decoder hidden layer sizes  [64, 128, 256]
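For readers unfamiliar with the $\beta$-VAE module in Fig. 5, a minimal sketch in the standard form of Higgins et al. follows, using the state-encoder and decoder widths listed in Table 2. The action-encoding branch ([64, 128]) is omitted and all variable names are ours, so this illustrates the objective rather than the authors' implementation; note that with the reported $\beta = 0$ the KL term vanishes and the objective reduces to pure reconstruction.

```python
# Sketch of a standard beta-VAE objective (Higgins et al.): reconstruction
# loss plus a beta-weighted KL term to the unit-Gaussian prior. Layer widths
# follow Table 2's state encoder/decoder; everything else is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    def __init__(self, state_dim, latent_dim, beta=0.0):
        super().__init__()
        self.beta = beta  # Table 2 reports beta = 0 in the experiments
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def loss(self, s):
        h = self.encoder(s)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.decoder(z)
        rec_loss = F.mse_loss(recon, s)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return rec_loss + self.beta * kl  # beta-VAE objective
```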

    Table 3  Statistical results of cumulative rewards for different algorithms under various experimental configurations

    Scenario  CID-MAPPO  MAPPO  VDN  QMIX  MADDPG  PMIC
    Scenario 1, 3 UAVs  −131.89 ± 22.66  −155.67 ± 15.71  −225.21 ± 39.60  −194.05 ± 5.96  −225.02 ± 9.64  −147.94 ± 17.65
    Scenario 1, 5 UAVs  −342.43 ± 26.52  −377.48 ± 16.83  −442.86 ± 11.65  −548.82 ± 116.16  −425.96 ± 6.70  −371.49 ± 17.98
    Scenario 1, 7 UAVs  −590.88 ± 43.33  −616.58 ± 35.03  −673.14 ± 211.54  −916.50 ± 239.70  −673.55 ± 28.43  −610.09 ± 26.67
    Scenario 2, 3 UAVs  89.79 ± 10.45  65.35 ± 12.62  −43.16 ± 16.48  7.42 ± 32.91  −75.18 ± 29.76  74.46 ± 18.73
    Scenario 2, 5 UAVs  156.93 ± 16.15  128.10 ± 15.49  −117.25 ± 23.16  −49.41 ± 29.84  −120.12 ± 33.18  125.02 ± 21.03
    Scenario 2, 7 UAVs  259.88 ± 41.40  194.71 ± 23.77  −176.29 ± 26.21  −180.94 ± 31.54  −173.93 ± 24.84  228.79 ± 16.97
    Scenario 3, 3 UAVs  149.63 ± 12.43  71.39 ± 63.32  −73.31 ± 14.59  −55.00 ± 21.38  −106.13 ± 33.02  89.83 ± 48.67
    Scenario 3, 5 UAVs  279.15 ± 11.32  232.04 ± 23.27  −70.00 ± 29.73  55.69 ± 35.57  −141.55 ± 14.94  209.06 ± 28.05
    Scenario 3, 7 UAVs  332.24 ± 12.04  92.45 ± 41.34  −58.83 ± 35.12  −276.00 ± 76.42  −197.56 ± 27.56  115.33 ± 34.52
Publication History
  • Received: 2025-11-30
  • Accepted: 2026-03-16
  • Published online: 2026-04-22
