含动力学奖励的航天器编队深度强化学习控制

金伟成 陈提 胡海岩

引用本文: 金伟成, 陈提, 胡海岩. 含动力学奖励的航天器编队深度强化学习控制. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250202
Citation: Jin Wei-Cheng, Chen Ti, Hu Hai-Yan. Deep reinforcement learning control for spacecraft formation with dynamical reward. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250202

含动力学奖励的航天器编队深度强化学习控制

doi: 10.16383/j.aas.c250202 cstr: 32138.14.j.aas.c250202
基金项目: 国家重点研发计划(2022YFC2204800), 国家自然科学基金(12494562, 12472015)资助
    作者简介:

    金伟成:南京航空航天大学航空学院博士研究生. 2021年获得南京航空航天大学工程力学专业学士学位.主要研究方向为航天器集群的导航, 动力学与控制, 马尔科夫过程, 分布式系统控制. E-mail: jinweich@nuaa.edu.cn

    陈提:南京航空航天大学教授. 2017年获得南京航空航天大学动力学与控制专业博士学位.主要研究方向为在轨自主组装, 绳系卫星, 复杂结构的动力学与控制. E-mail: chenti@nuaa.edu.cn

    胡海岩:南京航空航天大学教授. 1988年获得南京航空航天大学固体力学专业博士学位.主要研究方向为柔性结构的时滞控制, 颤振主动抑制, 空间结构展开动力学. 本文通信作者. E-mail: hhyae@nuaa.edu.cn

Deep Reinforcement Learning Control for Spacecraft Formation With Dynamical Reward

Funds: Supported by National Key Research and Development Program of China (2022YFC2204800) and National Natural Science Foundation of China (12494562, 12472015)
    Author Bio:

    JIN Wei-Cheng Ph.D. candidate in dynamics and control at the College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics. He received his bachelor's degree in engineering mechanics from Nanjing University of Aeronautics and Astronautics in 2021. His research interest covers guidance, dynamics, and control of spacecraft swarms, Markov processes, and distributed control systems

    CHEN Ti Professor at Nanjing University of Aeronautics and Astronautics. He received his Ph.D. degree in dynamics and control from Nanjing University of Aeronautics and Astronautics in 2017. His research interest covers in-orbit autonomous assembly, tethered satellites, and the dynamics and control of complex structures

    HU Hai-Yan Professor at Nanjing University of Aeronautics and Astronautics. He received his Ph.D. degree in solid mechanics from Nanjing University of Aeronautics and Astronautics in 1988. His research interest covers the delayed control of flexible structures, the active flutter suppression of aircraft structures, and the deployment dynamics of space structures. Corresponding author of this paper

  • 摘要: 提出了一种航天器编队的深度强化学习控制方法. 该方法通过引入动力学奖励, 考虑轨迹的动力学可行性并优化燃料消耗量. 在训练环境中, 引入$J_{2}$摄动相对动力学模型, 基于近端策略优化算法, 将航天器的局部观测信息作为策略网络和评价网络的输入. 策略网络输出航天器的期望位置和速度, 结合动力学模型限制策略任意动作之间的转换控制, 使输出轨迹考虑动力学可行性. 评价网络基于局部观测信息估计由动力学模型限制的优势函数, 从而辅助策略网络更新参数. 进一步地, 以燃料消耗量的负数作为动力学奖励, 结合避撞和任务相关奖励后, 训练得到的策略网络在完成航天器编队任务的同时优化了燃料消耗.

    Abstract: A deep reinforcement learning control method for spacecraft formation is proposed. By introducing a dynamical reward, the method accounts for the dynamical feasibility of the trajectories and optimizes the fuel consumption. In the training environment, a $J_{2}$-perturbed relative dynamics model is introduced and, based on the proximal policy optimization algorithm, the local observations of each spacecraft are taken as the inputs of the policy network and the critic network. The policy network outputs the desired position and velocity of the spacecraft, and the dynamics model constrains the transition control between arbitrary actions of the policy, so that the output trajectories respect dynamical feasibility. The critic network estimates, from the local observations, the advantage function constrained by the dynamics model, thereby assisting the policy network in updating its parameters. Furthermore, with the negative of the fuel consumption taken as the dynamical reward and combined with collision-avoidance and task-related rewards, the trained policy network accomplishes the spacecraft formation task while optimizing the fuel consumption.
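    As a rough illustration of the reward composition described in the abstract, the sketch below combines a task-related reward, collision penalties, and a dynamical reward equal to the negative of the fuel spent in one step. It is a minimal sketch rather than the paper's implementation: the function name, the fuel proxy (the per-step delta-v), and the exact way the terms are summed are assumptions; only the constants $r_{\text{b}}$, $r_{\text{inter}}$, $\alpha_{1}$ and the role of $\alpha_{2}$ follow Table 1 and Table 2.

```python
import numpy as np

# Constants taken from Table 1; everything else in this sketch is an assumption.
R_BOUNDARY = -500.0   # r_b: reward for colliding with the boundary
R_INTER = -10.0       # r_inter: reward for a mutual collision
ALPHA_1 = 0.6         # alpha_1: weight of the task-related (desired-state) reward
ALPHA_2 = 1.0         # alpha_2: weight of the dynamical reward (varied in Table 2)

def step_reward(task_reward, hit_boundary, hit_neighbor, u, dt):
    """Combine task, collision, and dynamical rewards for one control step.

    task_reward  : scalar reward for approaching the desired formation slot
    hit_boundary : True if the spacecraft left the allowed region
    hit_neighbor : True if it collided with another spacecraft
    u            : control acceleration applied over the step, in m/s^2
    dt           : step length, in s
    """
    # Dynamical reward: negative of the fuel spent, proxied here by the
    # delta-v of the step (integral of the control acceleration magnitude).
    dynamical_reward = -np.linalg.norm(np.asarray(u), 1) * dt

    collision_reward = 0.0
    if hit_boundary:
        collision_reward += R_BOUNDARY
    if hit_neighbor:
        collision_reward += R_INTER

    return ALPHA_1 * task_reward + collision_reward + ALPHA_2 * dynamical_reward
```

    Consistent with Table 2, increasing the share $\alpha_{2}$ of this dynamical reward lowers the fuel consumption but also lowers the completion rate.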
  • 图  1  航天器自主编队示意图

    Fig.  1  Schematic diagram of autonomous shape formation of spacecraft

    图  2  部分可观马尔科夫博弈

    Fig.  2  Partially observable Markov games

    图  3  所提方法流程图

    Fig.  3  Flowchart of the proposed method

    图  4  训练环境渲染图

    Fig.  4  Rendering graphs of the training environment

    图  5  训练架构示意图

    Fig.  5  Schematic diagram of the training framework

    图  6  局部观测信息

    Fig.  6  Local observed information

    图  7  有无动力学对比图

    Fig.  7  Comparison with and without dynamics

    图  8  平均奖励随训练轮数变化图

    Fig.  8  Average reward over training epochs

    图  9  有无动力学奖励的轨迹对比

    Fig.  9  Comparison of trajectories with or without dynamical reward

    图  10  策略在不同场景下的渲染图

    Fig.  10  Rendering graphs in different scenarios

    表  1  部分超参数取值

    Table  1  The values of some hyperparameters

    Symbol | Meaning | Value
    $p$ | Number of parallel training environments | 128
    $\sigma $ | Weighting coefficient of the policy entropy | $\{3\times10^{-3},\;3\times10^{-5}\}$
    $\varepsilon,\; \delta $ | Clipping hyperparameters | 0.2
    $\gamma $ | Discount factor | 0.99
    ${{u}_{\min }}$ | Lower bound of the control acceleration | $-0.001$ m/s$^2$
    ${{u}_{\max }}$ | Upper bound of the control acceleration | $0.001$ m/s$^2$
    $\dim\left({\boldsymbol{l}}\right)$ | Number of low-power radar detection data | 90
    ${{r}_{\text{b}}}$ | Reward for colliding with the boundary | $-500$
    ${{r}_{\text{inter}}}$ | Reward for a mutual collision | $-10$
    ${{r}_{\text{base}}},\; {{r}_{\text{inc}}}$ | Hyperparameters of the desired reward | 1, 35
    ${{\alpha }_{1}}$ | Weighting coefficient of the desired reward | 0.6
    ${{l}_{r}}$ | Learning rate | $\{2\times10^{-4},\; 0\}$
    $a$ | Semi-major axis | 7100 km
    $\omega $ | Argument of perigee | $-20^{\circ }$
    $f$ | True anomaly | $20^{\circ }$
    $\Omega $ | Longitude of the ascending node | $0^{\circ }$
    $e$ | Eccentricity | 0.05
    $i_{o}$ | Orbital inclination | $15^{\circ }$
    $k_{p},\; k_{i},\; k_{d}$ | PID tracker gains | 1.0, 0.01, 2.0
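    Table 1 also lists PID tracker gains and control acceleration bounds, which suggest a low-level loop that tracks the desired position and velocity emitted by the policy. The following sketch is a generic PID tracker written under that assumption; it is not the paper's implementation, and only the gains $k_{p},\;k_{i},\;k_{d}$ and the saturation limits ${{u}_{\min }},\;{{u}_{\max }}$ come from Table 1.

```python
import numpy as np

K_P, K_I, K_D = 1.0, 0.01, 2.0   # PID tracker gains from Table 1
U_MIN, U_MAX = -0.001, 0.001     # control acceleration bounds from Table 1, m/s^2

class PIDTracker:
    """Tracks the desired relative position and velocity output by the policy."""

    def __init__(self, dt):
        self.dt = dt
        self.integral = np.zeros(3)  # accumulated position error

    def control(self, pos, vel, desired_pos, desired_vel):
        # The position error feeds the proportional and integral terms;
        # the velocity error plays the role of the derivative term.
        e_pos = np.asarray(desired_pos) - np.asarray(pos)
        e_vel = np.asarray(desired_vel) - np.asarray(vel)
        self.integral += e_pos * self.dt
        u = K_P * e_pos + K_I * self.integral + K_D * e_vel
        # Saturate to the actuator limits listed in Table 1.
        return np.clip(u, U_MIN, U_MAX)
```

    Calling control once per step with the current and desired states returns the saturated control acceleration for that step.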

    表  2  500轮后不同动力学奖励占比的结果

    Table  2  Results under different percentages of dynamical reward after 500 epochs

    $\alpha_{2} $ | Completion rate (%) | Average steps to complete | Fuel consumption rate (%) | Absolute fuel consumption
    - | 91.5 | 142.95 | - | -
    0 | 84.5 | 123.95 | 61.21 | 75.88
    0.5 | 69.5 | 125.42 | 58.17 | 72.96
    1.0 | 64.0 | 127.79 | 46.83 | 59.84
    1.5 | 0 | - | - | -
Publication history
  • Received: 2025-05-07
  • Accepted: 2025-08-28
  • Published online: 2025-09-19
