

Learning-based Output-feedback Control for Nonlinear Systems With Input Time-delay

Liu Si-Tong, Gao Wei-Nan, Jiang Zhong-Ping

Citation: Liu Si-Tong, Gao Wei-Nan, Jiang Zhong-Ping. Learning-based output-feedback control for nonlinear systems with input time-delay. Acta Automatica Sinica, 2025, 51(10): 1001−1009. doi: 10.16383/j.aas.c250101

doi: 10.16383/j.aas.c250101    cstr: 32138.14.j.aas.c250101


Funds: Supported by the National Key Research and Development Program of China (2024YFA1012702), the National Natural Science Foundation of China (62373090, 62521001), and the Liaoning Revitalization Talents Program (XLYC2403177)
    Author Bio:

    LIU Si-Tong  Master student at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his bachelor's degree in mechanical engineering from Northeastern University in 2023. His research interest covers adaptive dynamic programming, output feedback, optimal control, and reinforcement learning. E-mail: 2370761@stu.neu.edu.cn

    GAO Wei-Nan  Professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his Ph.D. degree from New York University, USA, in 2017. His research interest covers artificial intelligence, adaptive dynamic programming, optimal control, and output regulation. Corresponding author of this paper. E-mail: gaown@mail.neu.edu.cn

    JIANG Zhong-Ping  Foreign Member of the Academia Europaea (Academy of Europe), Institute Professor at New York University Tandon School of Engineering, IEEE Fellow, IFAC Fellow. He received his Ph.D. degree in automatic control and mathematics from the Ecole des Mines de Paris, France, in 1993. His research interest covers stability theory, robust/adaptive/distributed nonlinear control, robust adaptive dynamic programming, reinforcement learning, and their applications in information, mechanical, and biological systems. E-mail: zjiang@nyu.edu

Abstract: A new data-driven output-feedback control method is proposed for the direct adaptive optimal control problem of nonlinear systems with input time-delay. By fusing Q-learning with value iteration and policy iteration, the method requires no knowledge of the system dynamics during learning. Under a uniform observability condition, a scheme is developed to reconstruct the system state from output data and time-delayed input data, and value iteration and policy iteration are used to learn the adaptive optimal control policy. The method is applied to the control of the Van der Pol oscillator, a classical nonlinear system, and simulation results fully validate its effectiveness.
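To fix ideas before the figures, the two ingredients named in the abstract can be written compactly in generic discrete-time notation. The window length $N$, delay $d$, stage cost $r$, reconstruction map $\Phi$, and the index ranges below are illustrative assumptions rather than the paper's exact formulation; the policy-iteration variant replaces the minimization in the Bellman recursion by evaluation under a fixed policy.

```latex
% Under uniform observability, a window of past outputs and (delay-shifted)
% inputs determines the unmeasured state:
\[
  z_k = \begin{bmatrix} y_{k-N+1} & \cdots & y_k & u_{k-N-d+1} & \cdots & u_{k-1} \end{bmatrix}^{\top},
  \qquad x_k = \Phi(z_k).
\]
% Q-learning value iteration then runs on the reconstructed data alone,
% with stage cost r and greedy policy improvement:
\[
  Q_{j+1}(z_k, u_k) = r(y_k, u_k) + \min_{u}\, Q_j(z_{k+1}, u),
  \qquad u^{(j)}(z_k) = \operatorname*{arg\,min}_{u}\, Q_j(z_k, u).
\]
```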
Fig. 1  Design approach for output-feedback controllers for time-delay systems

Fig. 2  Variation of the weight norm with iterations

Fig. 3  Input-output trajectories of the closed-loop system under output feedback considering the time delay

Fig. 4  Input-output trajectories of the closed-loop system under output feedback without considering the time delay

Fig. 5  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_k$ and $u_{k-1}$)

Fig. 6  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_k$ and $u_{k-3}$)

Fig. 7  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_{k-1}$ and $u_{k-2}$)

Fig. 8  Comparison of the $Q$ function before and after iteration (all parameters set to 0 except $u_{k-2}$ and $u_{k-3}$)
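For context on the simulation figures above, the following is a minimal, self-contained sketch of the kind of plant they refer to: a forward-Euler discretization of the Van der Pol oscillator whose applied input lags the computed input by d sampling steps. Every numerical value, and the static output feedback driving the loop, is an assumption for illustration; this is not the paper's learned controller or its experiment settings.

```python
import numpy as np

# Forward-Euler Van der Pol oscillator with an input delay of d sampling
# steps (all parameter values below are hypothetical, for illustration only).
mu = 1.0      # damping parameter (assumed)
h = 0.01      # sampling step (assumed)
d = 5         # input delay, in sampling steps (assumed)
T = 2000      # number of simulation steps

x = np.array([1.0, 0.0])   # state: [position, velocity]
u_buf = np.zeros(d + 1)    # holds the current input and the last d inputs
outputs = []

for k in range(T):
    y = x[0]                       # measured output
    u = -0.5 * y                   # placeholder static output feedback
    u_buf = np.roll(u_buf, 1)      # age the buffer by one step
    u_buf[0] = u
    u_delayed = u_buf[-1]          # input computed d steps ago is applied now
    # Van der Pol dynamics driven by the delayed input
    dx = np.array([x[1],
                   mu * (1.0 - x[0] ** 2) * x[1] - x[0] + u_delayed])
    x = x + h * dx
    outputs.append(y)

print("output after %d steps: %.4f" % (T, outputs[-1]))
```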

Publication History
  • Received: 2025-03-13
  • Accepted: 2025-07-15
  • Published online: 2025-09-29
