

Learning-based Output-feedback Control for Nonlinear Systems With Input Time-delay

Liu Si-Tong, Gao Wei-Nan, Jiang Zhong-Ping

Citation: Liu Si-Tong, Gao Wei-Nan, Jiang Zhong-Ping. Learning-based output-feedback control for nonlinear systems with input time-delay. Acta Automatica Sinica, 2025, 51(10): 1001−1009. doi: 10.16383/j.aas.c250101

doi: 10.16383/j.aas.c250101    cstr: 32138.14.j.aas.c250101


Funds: Supported by the National Key Research and Development Program of China (2024YFA1012702), the National Natural Science Foundation of China (62373090, 62521001), and the Liaoning Revitalization Talents Program (XLYC2403177)
    Author Bio:

    LIU Si-Tong  Master student at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his bachelor's degree in mechanical engineering from Northeastern University in 2023. His research interest covers adaptive dynamic programming, output feedback, optimal control, and reinforcement learning. E-mail: 2370761@stu.neu.edu.cn

    GAO Wei-Nan  Professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his Ph.D. degree from New York University, USA, in 2017. His research interest covers artificial intelligence, adaptive dynamic programming, optimal control, and output regulation. Corresponding author of this paper. E-mail: gaown@mail.neu.edu.cn

    JIANG Zhong-Ping  Foreign Member of the Academia Europaea (Academy of Europe), Institute Professor at New York University Tandon School of Engineering, IEEE Fellow, IFAC Fellow. He received his Ph.D. degree in automatic control and mathematics from the Ecole des Mines de Paris, France, in 1993. His research interest covers stability theory, robust/adaptive/distributed nonlinear control, robust adaptive dynamic programming, reinforcement learning, and their applications in information, mechanical, and biological systems. E-mail: zjiang@nyu.edu

Abstract: A new data-driven output-feedback control method is proposed for the direct adaptive optimal control problem of nonlinear systems with input time-delay. By fusing Q-learning with value iteration and policy iteration, the method requires no knowledge of the system dynamics during learning. Under a uniform observability condition, a scheme is developed to reconstruct the system state from output data and time-delayed input data, and value iteration and policy iteration are used to learn the adaptive optimal control policy. The method is applied to the control of the Van der Pol oscillator, a classical nonlinear system, and simulation results fully validate its effectiveness.
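To fix ideas before the figures, the two ingredients named in the abstract can be written compactly in generic discrete-time notation. The window length $N$, delay $d$, stage cost $r$, reconstruction map $\Phi$, and the index ranges below are illustrative assumptions rather than the paper's exact formulation; the policy-iteration variant replaces the minimization in the Bellman recursion by evaluation under a fixed policy.

```latex
% Under uniform observability, a window of past outputs and (delay-shifted)
% inputs determines the unmeasured state:
\[
  z_k = \begin{bmatrix} y_{k-N+1} & \cdots & y_k & u_{k-N-d+1} & \cdots & u_{k-1} \end{bmatrix}^{\top},
  \qquad x_k = \Phi(z_k).
\]
% Q-learning value iteration then runs on the reconstructed data alone,
% with stage cost r and greedy policy improvement:
\[
  Q_{j+1}(z_k, u_k) = r(y_k, u_k) + \min_{u}\, Q_j(z_{k+1}, u),
  \qquad u^{(j)}(z_k) = \operatorname*{arg\,min}_{u}\, Q_j(z_k, u).
\]
```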
Fig. 1  Design approach for output-feedback controllers for time-delay systems

Fig. 2  Variation of the weight norm with iterations

Fig. 3  Input-output trajectories of the closed-loop system under output feedback considering the time delay

Fig. 4  Input-output trajectories of the closed-loop system under output feedback without considering the time delay

Fig. 5  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_k$ and $u_{k-1}$)

Fig. 6  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_k$ and $u_{k-3}$)

Fig. 7  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_{k-1}$ and $u_{k-2}$)

Fig. 8  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_{k-2}$ and $u_{k-3}$)
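For context on the simulation figures above, the following is a minimal, self-contained sketch of the kind of plant they refer to: a forward-Euler discretization of the Van der Pol oscillator whose applied input lags the computed input by d sampling steps. Every numerical value, and the static output feedback driving the loop, is an assumption for illustration; this is not the paper's learned controller or its experiment settings.

```python
import numpy as np

# Forward-Euler Van der Pol oscillator with an input delay of d sampling
# steps (all parameter values below are hypothetical, for illustration only).
mu = 1.0      # damping parameter (assumed)
h = 0.01      # sampling step (assumed)
d = 5         # input delay, in sampling steps (assumed)
T = 2000      # number of simulation steps

x = np.array([1.0, 0.0])   # state: [position, velocity]
u_buf = np.zeros(d + 1)    # holds the current input and the last d inputs
outputs = []

for k in range(T):
    y = x[0]                       # measured output
    u = -0.5 * y                   # placeholder static output feedback
    u_buf = np.roll(u_buf, 1)      # age the buffer by one step
    u_buf[0] = u
    u_delayed = u_buf[-1]          # input computed d steps ago is applied now
    # Van der Pol dynamics driven by the delayed input
    dx = np.array([x[1],
                   mu * (1.0 - x[0] ** 2) * x[1] - x[0] + u_delayed])
    x = x + h * dx
    outputs.append(y)

print("output after %d steps: %.4f" % (T, outputs[-1]))
```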

Publication History
  • Received: 2025-03-13
  • Accepted: 2025-07-15
  • Published online: 2025-09-29
