• Chinese Core Journals
  • EI
  • China Science and Technology Core Journals
  • Scopus
  • CSCD
  • INSPEC (Science Abstracts, UK)


Robust Adaptive Critic Control With Hybrid Iteration for Continuous-Time Systems

Wang Ding (王鼎), Liu Ao (刘奥), Qiao Jun-Fei (乔俊飞)

Citation: Wang Ding, Liu Ao, Qiao Jun-Fei. Robust adaptive critic control with hybrid iteration for continuous-time systems. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250327


doi: 10.16383/j.aas.c250327 cstr: 32138.14.j.aas.c250327
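Funds: Supported by National Natural Science Foundation of China (62473012, 62222301, 62021003), National Science and Technology Major Project of New Generation Artificial Intelligence (2021ZD0112302, 2021ZD0112301), and Beijing Natural Science Foundation (F251019).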
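    Author Bio:

    WANG Ding: Professor at the School of Information Science and Technology, Beijing University of Technology. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, in 2012. His research interests include reinforcement learning and intelligent control. Corresponding author of this paper. E-mail: dingwang@bjut.edu.cn

    LIU Ao: Ph.D. candidate at the School of Information Science and Technology, Beijing University of Technology. Her research interests include reinforcement learning and intelligent control. E-mail: liuao@emails.bjut.edu.cn

    QIAO Jun-Fei: Professor at the School of Information Science and Technology, Beijing University of Technology. His research interests include intelligent control of wastewater treatment processes and the structure design and optimization of neural networks. E-mail: adqiao@bjut.edu.cn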


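  • Abstract: For continuous-time nonlinear systems subject to disturbances, a robust control method that combines a hybrid iteration mechanism with an adaptive critic framework is designed. The traditional value iteration algorithm is improved to accelerate learning and relax its preset conditions. An adjustable parameter is introduced to guarantee the admissibility of the control policy during iteration, which relaxes the conditions for setting the acceleration factor. Drawing on the idea of generalized policy iteration, a novel hybrid iteration mechanism is constructed to obtain better convergence properties. Finally, two simulation examples verify the performance of the algorithm. Simulation results for a linear system show that the algorithm achieves high convergence accuracy. In the missile autopilot simulation, compared with the value iteration method, the proposed algorithm does not rely on an initial admissible control policy and increases the convergence speed by about 49%.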
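    The abstract describes a hybrid scheme that starts from value iteration, so that no initial admissible policy is needed, and then exploits a policy-iteration-type update for faster convergence. As a rough illustration of that general pattern only, the sketch below applies it to the simplest setting: a continuous-time linear-quadratic problem with no disturbance. It is not the authors' algorithm. The step size `eps`, the switching test (closed-loop matrix Hurwitz), the names `hybrid_iteration_lqr` and `is_hurwitz`, and the example matrices are hypothetical. The paper's acceleration factor, adjustable admissibility parameter, generalized policy iteration, neural-network critic, and disturbance handling are not reproduced. The sketch assumes NumPy and SciPy.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def is_hurwitz(M: np.ndarray) -> bool:
    """Return True if every eigenvalue of M has a strictly negative real part."""
    return bool(np.max(np.linalg.eigvals(M).real) < 0.0)


def hybrid_iteration_lqr(A, B, Q, R, eps=0.01, max_vi_steps=10_000,
                         max_pi_steps=50, tol=1e-10):
    """Hypothetical VI-then-PI sketch for the continuous-time LQR problem.

    Phase 1 (value iteration): explicit Euler steps on the Riccati differential
    equation starting from P = 0, so no initial admissible policy is required.
    Phase 2 (policy iteration): as soon as the greedy gain K = R^{-1} B^T P is
    stabilizing, switch to Kleinman's iteration (exact policy evaluation).
    Returns the value matrix P, the gain K and the step count of each phase.
    """
    R_inv = np.linalg.inv(R)
    P = np.zeros(A.shape)
    K = R_inv @ B.T @ P
    vi_steps = 0
    # Phase 1: value iteration until the greedy policy becomes admissible.
    while not is_hurwitz(A - B @ K):
        if vi_steps >= max_vi_steps:
            raise RuntimeError("value iteration did not reach an admissible policy")
        residual = A.T @ P + P @ A + Q - P @ B @ R_inv @ B.T @ P
        P = P + eps * residual  # eps: assumed step size, not the paper's factor
        K = R_inv @ B.T @ P
        vi_steps += 1

    # Phase 2: policy iteration from the admissible gain found above.
    pi_steps = 0
    while pi_steps < max_pi_steps:
        A_cl = A - B @ K
        # Policy evaluation: A_cl^T P_new + P_new A_cl = -(Q + K^T R K).
        P_new = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
        K = R_inv @ B.T @ P_new  # policy improvement
        pi_steps += 1
        converged = np.linalg.norm(P_new - P) < tol
        P = P_new
        if converged:
            break
    return P, K, vi_steps, pi_steps


# Example: an open-loop unstable system (eigenvalues of A are +1 and -1).
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K, n_vi, n_pi = hybrid_iteration_lqr(A, B, Q, R)
# For this example the stabilizing Riccati solution is known in closed form:
# P* = [[2 + sqrt(2), 1 + sqrt(2)], [1 + sqrt(2), 1 + sqrt(2)]].
print(P, K, n_vi, n_pi)
```

    For simplicity, phase 2 uses exact policy evaluation (a Lyapunov solve). A generalized-policy-iteration variant would instead run a finite number of evaluation sweeps per improvement step.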
  • Fig. 1  Schematic diagram of the iteration process of the hybrid iteration algorithm
    Fig. 2  Framework of the hybrid iteration algorithm
    Fig. 3  Norm of the critic neural network weights for the linear system
    Fig. 4  States of the linear system
    Fig. 5  Control input of the linear system
    Fig. 6  Norm of the critic neural network weights for the missile autopilot system
    Fig. 7  States of the missile autopilot system
    Fig. 8  Control input of the missile autopilot system
    Fig. 9  Disturbance law of the missile autopilot system

    Table 1  Comparison of performance indicators for the linear system

    Algorithm                          CPU time (s)    Iteration steps
    Hybrid iteration adaptive critic   11.1701         38
    Improved value iteration           27.4197         55

    Table 2  Parameters of the missile autopilot system

    Symbol                                    Parameter                    Value
    $\mathscr{M}/\mathrm{kg}$                 Mass                         4410
    $\mathscr{V}/(\mathrm{m\cdot s^{-1}})$    Velocity                     947.6
    $\mathscr{P}/(\mathrm{kg\cdot m^{2}})$    Pitch moment of inertia      247
    $\mathscr{Q}/(\mathrm{N\cdot m^{-2}})$    Dynamic pressure             293
    $S/\mathrm{m^{2}}$                        Reference area               0.04087
    $D/\mathrm{m}$                            Reference diameter           0.229
    $\mathscr{G}/(\mathrm{m\cdot s^{-2}})$    Gravitational acceleration   9.8
    $\mathscr{H}$                             Time constant                0.1

    Table 3  Comparison of performance indicators for the missile autopilot system

    Algorithm                          CPU time (s)    Iteration steps
    Hybrid iteration adaptive critic   318.0881        79
    Improved value iteration           668.8759        161
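    Tables 1 and 3 list the CPU time and iteration steps of both algorithms, so the relative savings follow from simple arithmetic. The short sketch below computes them. The dictionary layout and the function name `relative_reduction` are illustrative only. The sketch does not assume which of these measures underlies the convergence-speed figure quoted in the abstract.

```python
# Values transcribed from Tables 1 and 3: (CPU time in s, iteration steps).
RESULTS = {
    "Table 1, linear system": {"hybrid": (11.1701, 38), "improved_vi": (27.4197, 55)},
    "Table 3, missile autopilot": {"hybrid": (318.0881, 79), "improved_vi": (668.8759, 161)},
}


def relative_reduction(new: float, baseline: float) -> float:
    """Fraction by which `new` is smaller than `baseline` (0.25 means 25% smaller)."""
    return 1.0 - new / baseline


for name, rows in RESULTS.items():
    t_hybrid, n_hybrid = rows["hybrid"]
    t_vi, n_vi = rows["improved_vi"]
    print(f"{name}: CPU time reduced by {relative_reduction(t_hybrid, t_vi):.1%}, "
          f"iteration steps reduced by {relative_reduction(n_hybrid, n_vi):.1%}, "
          f"iteration ratio {n_hybrid / n_vi:.1%}")
```

    For Table 1 this gives a CPU-time reduction of about 59.3%, an iteration-step reduction of about 30.9%, and an iteration ratio of about 69.1%. For Table 3 the corresponding values are about 52.4%, 50.9%, and 49.1%.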
Publication history
  • Received: 2025-07-17
  • Accepted: 2025-10-14
  • Published online: 2025-12-28
