
数据驱动自适应评判控制研究进展

王鼎 赵明明 刘德荣 乔俊飞 宋世杰

李凤岐, 金佳玉, 杜学峰, 张鑫, 徐凤强, 王德广. 基于域对抗自适应学习的旋翼无人机姿态稳定方法. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240186
引用本文: 王鼎, 赵明明, 刘德荣, 乔俊飞, 宋世杰. 数据驱动自适应评判控制研究进展. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240706
Li Feng-Qi, Jin Jia-Yu, Du Xue-Feng, Zhang Xin, Xu Feng-Qiang, Wang De-Guang. Domain adversarial adaptive learning based attitude stabilization method for rotary wing unmanned aerial vehicles. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240186
Citation: Wang Ding, Zhao Ming-Ming, Liu De-Rong, Qiao Jun-Fei, Song Shi-Jie. Advances in data-driven adaptive critic control. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240706


doi: 10.16383/j.aas.c240706 cstr: 32138.14.j.aas.c240706
基金项目: 国家自然科学基金(62222301, 62473012, 62021003), 国家科技重大专项(2021ZD0112302, 2021ZD0112301), 北京市自然科学基金(JQ19013)资助
    作者简介:

    王鼎:北京工业大学信息科学技术学院教授. 2009年获得东北大学硕士学位, 2012年获得中国科学院自动化研究所博士学位. 主要研究方向为强化学习与智能控制. 本文通信作者. E-mail: dingwang@bjut.edu.cn

    赵明明:北京工业大学信息科学技术学院博士研究生. 主要研究方向为强化学习和智能控制. E-mail: zhaomm@emails.bjut.edu.cn

    刘德荣:南方科技大学自动化与智能制造学院教授. 主要研究方向为强化学习和智能控制. E-mail: liudr@sustech.edu.cn

    乔俊飞:北京工业大学信息科学技术学院教授. 主要研究方向为污水处理过程智能控制和神经网络结构设计与优化. E-mail: adqiao@bjut.edu.cn

    宋世杰:西南交通大学智慧城市与交通学院讲师. 主要研究方向为强化学习和智能控制. E-mail: shijie.song@swjtu.edu.cn

Advances in Data-Driven Adaptive Critic Control

Funds: Supported by National Natural Science Foundation of China (62222301, 62473012, 62021003), National Science and Technology Major Project (2021ZD0112302, 2021ZD0112301), and Beijing Natural Science Foundation (JQ19013)
    Author Bio:

    WANG Ding Professor at the School of Information Science and Technology, Beijing University of Technology. He received his master's degree from Northeastern University in 2009 and Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2012. His research interest covers reinforcement learning and intelligent control. Corresponding author of this paper

    ZHAO Ming-Ming Ph.D. candidate at the School of Information Science and Technology, Beijing University of Technology. His research interest covers reinforcement learning and intelligent control

    LIU De-Rong Professor at the School of Automation and Intelligent Manufacturing, Southern University of Science and Technology. His research interest covers reinforcement learning and intelligent control

    QIAO Jun-Fei Professor at the School of Information Science and Technology, Beijing University of Technology. His research interest covers intelligent control of wastewater treatment processes as well as structure design and optimization of neural networks

    SONG Shi-Jie Lecturer at the Institute of Smart City and Intelligent Transportation, Southwest Jiaotong University. His research interest covers reinforcement learning and intelligent control

  • 摘要: 最优控制与人工智能两个领域的融合发展产生了一类以执行-评判设计为主要思想的自适应动态规划(Adaptive dynamic programming, ADP)方法. 通过集成动态规划理论、强化学习机制、神经网络技术、函数优化算法, ADP在求解大规模复杂非线性系统的决策和调控问题上取得了重要进展. 然而, 实际系统的未知参数和不确定扰动经常导致难以建立精确的数学模型, 给最优控制器的设计构成了挑战. 近年来, 具有强大自学习和自适应能力的数据驱动ADP方法受到了广泛关注, 它能够在不依赖动态模型的情况下, 仅利用系统的输入输出数据为复杂非线性系统设计出稳定、安全、可靠的最优控制器, 符合智能自动化的发展潮流. 通过对数据驱动ADP方法的算法实现、理论特性、相关应用等方面进行梳理, 着重介绍了最新的研究进展, 包括在线Q学习、值迭代Q学习、策略迭代Q学习、加速Q学习、迁移Q学习、跟踪Q学习、安全Q学习、博弈Q学习, 并涵盖数据学习范式、稳定性、收敛性以及最优性的分析. 此外, 为了提高学习效率和控制性能, 设计了一些改进的评判机制和效用函数. 最后, 以污水处理过程为背景, 总结了数据驱动ADP方法在实际工业系统中的应用效果和存在问题, 并展望了一些未来值得研究的方向.
  • 旋翼无人机在海上作业时常面临复杂变化的海风环境. 由于无人机体积小、重量轻, 容易产生机身剧烈摇晃, 导致控制失效[1−4]. 尽管增加机身尺寸和提升配置可以增强抗风性能, 但这会增加载重、降低无人机的灵活性[5−6]. 因此, 在保证无人机轻便灵活的前提下提升抗风能力, 成为当前研究的重要方向. 部分研究集中于开发和优化控制算法来提升无人机的性能[7], 但在海上环境的应用中仍存在显著挑战. 首先, 这些算法难以适应海风的突变特性. 其次, 增强抗风能力通常以增加计算负载和降低机动性为代价[8]. 以上问题表明, 提高无人机在复杂海上任务中的稳定性和安全性十分重要.

    为实现无人机海上稳定飞行, 亟需解决现有算法难以有效响应风力变化且缺乏自适应调节的问题. 传统的气动力建模主要依赖明确的气动模型[9], 在动态环境中存在误差较大、迁移能力差的问题. 此外, 这类模型缺乏自适应控制策略. 为此, 文献[10−11]提出基于扰动观测器 (Disturbance observer-based, DOB) 和自抗扰控制 (Active disturbance rejection control, ADRC) 的算法, 以补偿系统扰动, 提高输入与输出之间的匹配度. 然而, 这些方法高度依赖于对扰动的精确估计, 估计误差可能导致控制性能下降. 此外, 为提升无人机的抗风能力, 多项研究工作致力于优化风感知、风阻抗调整算法和风场预测策略. Sziroczak 等[12] 提出轻量化风感知方案, 但其在复杂风场中的学习效果不佳. Xue 等[13] 结合深度强化学习和域随机化方法, 但在多变风场中的适应性有限. 此外, Song 等[14] 通过模型预测控制 (Model predictive control, MPC) 实时调整正则化参数, 提升无人机的抗风能力. 然而, 这些方法仍未充分解决风场变化的预测问题. 为此, Li 等[15] 提出 FlowDrone 模型, 通过热线传感器和风感知残差控制器进行风场估计, 但其泛化能力有限. 此外, Zha 等[16] 采用门控循环单元 (Gated recurrent unit, GRU) 和时间卷积网络 (Temporal convolutional network, TCN) 组合优化风场预测性能, 但其模型复杂且需精确调参. 综上所述, 现有研究虽在风场感知和抗风控制方面取得进展, 但在应对复杂海风环境时仍存在旋翼无人机控制算法适应性不足、计算复杂度较高和飞行姿态不稳定的问题. 因此, 亟需提升算法的适应性、优化风场预测性能, 并降低计算负担, 以确保无人机在复杂海上环境中的稳定飞行.

    为此, 本文提出一种具有抗风特性的域对抗自适应姿态稳定方法, 旨在提升无人机对复杂海上环境的适应能力, 确保飞行姿态稳定性. 风场概念图如图1所示. 该方法包含离线的对称式时序域对抗自适应学习算法 SymTAL 和在线风场预测模型 POP. SymTAL 利用域对抗学习捕捉气动共享特征和全面信息, 采用对称性网络层结合双向时序网络的设计, 有效减少网络层的复杂度, 提升学习效率和算法性能. 为提升计算效率, 引入深度学习加速技术及改进的 Adam 自适应优化器, 以降低损失率, 实现精准的飞行姿态控制. 此外, 通过正则化超参数调整优化对抗性框架, 进一步增强抗风能力. 在线风场预测模型 POP 通过变分模态分解技术 (Variational mode decomposition, VMD) 对风速信号进行分解, 以便准确捕捉风速特征. 模型池根据风速特征进行决策, 选择 GRU-TCN 或支持向量回归 (Support vector regression, SVR) 模型来建模风速信号, 从而提升预测准确性. 该方法增强无人机在不同风力条件下的适应性, 实现海风下的自适应稳定飞行.

    图 1  风场概念图
    Fig. 1  Wind field concept map

    本文的主要创新点:

    1) 提出一种具有抗风特性的自适应飞行控制方法: 将离线域对抗自适应学习算法 SymTAL 和在线风场预测模型 POP 相结合, 解决海面风力变化导致无人机不稳定的问题.

    2) 优化离线学习算法: SymTAL 在域对抗学习算法中引入对称性网络和双向长短期记忆网络 (Bi-directional long short-term memory, BiLSTM), 并通过深度学习加速技术改进 Adam 优化器, 优化对抗性框架, 提升算法性能和计算效率.

    3) 提升在线风场预测能力: POP 使用变分模态分解技术分解风速信号, 并根据特性选择适当的预测模型, 大幅度提高预测的精确度.

    为确保无人机在海风环境下稳定飞行, 研究者在不扩大体积和调整配置的情况下, 致力于提升控制算法的性能. 文献[17−18]表明, 对称性神经网络层结构在无人机几何自适应控制中展现优异性能, 特别在处理多级数据改善和抑制风扰动方面表现出色. Shi 等[19] 研究结合具有对称特性的深度神经网络的控制算法, 通过考虑风向的对称性, 明显减小跟踪误差. 除此之外, 域对抗学习被应用于提高无人机在不同风域中的性能. Michael 等[20] 提出的 Neural-fly 使用域对抗不变元学习减小跟踪误差, 实现精确的飞行控制, 使无人机能更好地适应复杂多变的风场环境. Sun 等[21] 研究引入具有对称性的网络层, 结合鲸鱼优化算法 (Whale optimization algorithm, WOA) 和多层感知机模型, 提高风速预测的精度, 从而控制无人机稳定飞行.

    Kim 等[22] 提出一种 MemN2N 架构, 引入 BiLSTM 以更好地捕捉时间特征. Ma 等[23] 研究基于堆叠长短期记忆网络 (Long short-term memory, LSTM) 的多输出迭代预测模型, 利用多输出迭代预测策略有效减少长期时间序列预测的误差积累和误差传播问题. Guo 等[24] 研究集成 BiLSTM 预测模型, 通过获得稳定的时间序列作为模型输入数据, 有效消除大范围特征值的影响, 从而提升算法性能.

    利用深度学习能够显著提高算法性能. Samadianfard 等[25] 提出基于深度学习的层次化自适应策略, 该策略通过优化大批量训练过程有效减少深度神经网络的训练数据需求; Dhakal 等[26] 开发 GSTuner 自适应调优框架, 可自动确定 GPU 上模型全局优化空间的最优参数设置, 有效加快任务速度. 此外, Filippini 等[27] 提出一种 GPU 服务系统, 利用深度学习训练方法得到最佳速度曲线, 最小化能耗并有效减小预测偏差. 此外, You 等[28] 使用的混合深度模型采用不同的编码器–解码器 LSTM, 展现优于其他学习模型的任务完成速度. 在此基础上, 改进的 Adam 自适应优化器能够进一步自动调整学习率. Zhang 等[29] 采用 Adam 优化算法来解决大规模数据和参数的优化问题, 可以针对不同参数设定不同的学习率, 以实现自适应优化. Liu 等[30] 将基于深度学习的轨迹映射网络 (Trajectory mapping network, TMN) 与 L2 正则参数调整相结合, 以提高规划效率与鲁棒轨迹跟踪能力. L2 正则化方式可有效降低损失函数值, 减小预测误差.

    海风环境是多变的, 为应对复杂的海风, 需要进一步提高算法的适应性, 深化对风场的理解. 当前, 无人机风场预测和建模的方法主要包括水平空速归零法、解析测风方法、航位推算法以及皮托−静压管测风法等. Ma 等[31] 通过改进低延迟 K-means 聚类算法优化无人机数据收集任务, 实现对不同风速的快速分类. Lv 等[32] 提出基于优化变分模式分解和非线性加权组合深度学习算法的风电功率预测方法, 显著提升预测精度. Dudukcu 等[33] 提出混合深度神经网络架构, 利用变分模态分解和长短期记忆技术, 减少误差传播并提升模型的泛化能力. Zhang 等[34] 提出循环神经网络 (Recurrent neural network, RNN) 架构的 TCN 来解决混沌时间序列预测问题, 应对环境复杂性和不可预测性. Zhang 等[35] 采用时空深度学习的方法开发 MT-DETrajGRU 模型, 可同时修正风速和风向. 针对不使用风传感器的四旋翼无人机如何进行风估计和修正的问题, Kobayashi 等[36] 提出新颖的 VMD-A-LSTM-SVR 模型以解决超前预测问题. N.L.C 等[37] 提供两种不同的旋翼无人机在线风估计方法, 能够仅使用少量数据来准确地预测风速分量. 以上风估计方法为风场预测研究提供基础, 而风场预测模型则为控制算法的优化提供支持.

    VMD 被广泛用于分解具有不同震动模式的信号[32]. 具体表达式如下:

    $$ \begin{aligned} s_{i}(t)=A_{i}(t) \cos \left(2 \pi f_{i} t+\phi_{i}(t)\right) \end{aligned} $$ (1)

    其中$ A_i(t) $是幅度; $ f_i $是中心频率; $ \phi_i(t) $是相位. 将 VMD 用于处理实时风速信号, 通过选择合适调谐参数和带宽参数, 对信号调谐分解, 进而产生一系列窄带信号. 通过对每个窄带信号进行带通滤波, 提取出该频带内的有效信息:

    $$ \begin{aligned} y(t)=F^{-1}\left[F[x(t)] \cdot F[b(t)]\right] \end{aligned} $$ (2)

    其中, $ F[x(t)] $和$ F[b(t)] $分别表示$ x(t) $和$ b(t) $的傅里叶变换; $ x(t) $是时域信号; $ b(t) $是带通滤波器. 在完成滤波处理之后, 采用迭代优化策略对每个风信号的窄带分量进行调整. 重复上述过程, 不断逼近最优解, 直至达到收敛状态. 最终, 通过将所有经过调谐分解的风信号分量累加重构出风速信号.

    $$ \begin{aligned} \hat{x}(t)=\sum_{i=1}^N x_i(t) \end{aligned} $$ (3)
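    以下给出式 (2)~(3) 所述频域带通滤波与分量累加重构的一个最小化示意代码 (Python, 非原文实现; 采样率与频带划分均为示例性假设):

```python
import numpy as np

def bandpass_reconstruct(x, bands, fs):
    """示意: 对信号 x 在频域做理想带通滤波 (对应式(2)),
    再将各窄带分量累加重构信号 (对应式(3)). bands 为示例性频带划分."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    modes = []
    for f_lo, f_hi in bands:
        mask = (freqs >= f_lo) & (freqs < f_hi)      # 理想带通滤波器的频率响应
        modes.append(np.fft.irfft(X * mask, n=len(x)))
    return modes, np.sum(modes, axis=0)              # 各窄带分量及重构信号

# 用法示例: 50 Hz 采样的模拟风速信号
fs = 50.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 0.2 * t) + 0.5 * np.sin(2 * np.pi * 2.0 * t)
modes, x_hat = bandpass_reconstruct(x, bands=[(0.0, 1.0), (1.0, 5.0)], fs=fs)
```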

    TCN 擅长捕捉长期依赖关系[16], 通过因果卷积处理风速序列, 从前序风速$ x_1,\; x_2,\; \cdots ,\; x_t $预测未来风速$ y_1,\; y_2,\; \cdots ,\; y_t $, 实现长期风速预测. 风速$ x_t $处的因果卷积为$ (F * X)_{x_t}=\sum_{k=1}^{K} f_k x_{t-K+k} $, 其中, $ K $为卷积核大小, $ X $为输入序列, 滤波器为$ F= \left(f_1,\; f_2\right) $. 计算未来风速公式如下:

    $$ \begin{aligned} y_t=f_1 x_{t-1}+f_2 x_t \end{aligned} $$ (4)

    因果卷积施加严格的时间约束, 其结构是单向的, 从而确保在每个时间点仅能访问当前及之前的风速信息, 未来数据则被屏蔽. 针对风速数据海量且非线性的特征, TCN 网络采用残差连接技术, 可以有效缓解梯度消失问题, 提升网络训练的稳健性.
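    下面给出式 (4) 因果卷积的一个最小化示意代码 (Python; 滤波器取值为假设的示例):

```python
import numpy as np

def causal_conv(x, f):
    """示意: 核大小为 K 的因果卷积 (对应式(4)),
    y_t 仅依赖 x_{t-K+1}, ..., x_t, 未来数据被屏蔽."""
    K = len(f)
    x_pad = np.concatenate([np.zeros(K - 1), x])      # 左侧补零保证因果性
    return np.array([np.dot(f, x_pad[t:t + K]) for t in range(len(x))])

# 用法示例: K = 2 时即 y_t = f_1 * x_{t-1} + f_2 * x_t
y = causal_conv(np.array([1.0, 2.0, 3.0, 4.0]), f=np.array([0.3, 0.7]))
```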

    GRU 是循环神经网络的一种高效变体[16]. GRU 的门控机制有效增强网络对时间序列中长期依赖关系的捕捉和记忆能力, 以解决风速中的梯度消失问题, 同时减少参数数量. 并且 GRU 能够在每个时间步根据当前输入和前一个隐藏状态来更新隐藏状态, 同时通过重置门和更新门来控制风速信息的流动. 这种结构使得 GRU 在处理风速时间序列数据时, 既能够捕捉到长期依赖关系, 又能够有效地传递信息, 与 LSTM 相比, GRU 在保持相当效果的同时, 训练更为简便, 显著提升训练效率.

    面对复杂海风环境下无人机飞行控制误差较大的挑战, 本研究实现一种具有抗风特性的自适应姿态稳定控制方法. 该方法包含融合域对抗学习、对称性网络和双向时序网络的无人机姿态稳定算法 (Symmetric temporal domain adversarial learning, SymTAL), 以及一个在线实时风场预测模型 (Partitioned online prediction model, POP). 整体架构示意图详见图2. 首先, 无人机通过 POP 模型实时处理风速数据: 使用 K-means 聚类和 VMD 技术对风速信号进行分解, 并根据风速信号特征从模型池中选取合适的模型, 即选择 TCN-GRU 模型捕捉长短期依赖关系, 或使用 SVR 处理非线性关系, 获取预测风速$ W_1 $. 同时, 传感器收集当前风况下无人机的状态数据, 包括速度、四元数和脉冲宽度调制 (Pulse width modulation, PWM), 并将上述数据传输至离线 SymTAL 算法中. SymTAL 将得到的共性基函数$ {Q} $与风相关线性系数$ {a} $相乘, 计算出风效应力. 基于此结果, 确定期望力, 并将其发送至无人机飞行控制器, 从而实现无人机在线自适应稳定飞行. 此外, 由于风速和阻力之间存在复杂的关系, 且不同海拔下风速动态变化, 无人机根据风速计算公式得到当前高度下的风速$ k_z $, 从而选择最优飞行区域.

    图 2  无人机抗风自适应总体框架图
    Fig. 2  Overall framework diagram of UAVs wind resistance adaptation
    3.1.1   无人机气动理论建模

    传统的气动力建模方法虽然在物理意义上表现出色, 但其在迁移性方面的不足及参数获取过程中存在误差的问题, 使其实时气动力 (矩) 预测变得困难.

    因此, 本研究提出基于对称式时序域对抗自适应学习的无人机控制算法. 该方法使用深度神经网络, 不仅具有强大的学习能力和自动提取物理特征能力, 并且通过域对抗自适应学习方法提升模型的迁移性和泛化能力. 这种设计不仅适用于不同风力条件, 且能支持实时气动力建模, 从而克服传统方法在多场景中的局限性和测量误差问题. 无人机动力学模型如下:

    $$ \begin{aligned} M(q) \ddot{q}+C(q,\; \dot{q}) \dot{q}+g(q)=u+f(q,\; \dot{q},\; w) \end{aligned} $$ (5)

    其中$ {q},\; \dot{q},\; \ddot{q} \in {\bf{R}}^{n} $表示$ n $维的位置、速度和加速度矢量; $ M(q) $为对称的正定惯性矩阵; $ C(q,\; \dot{q}) $是科里奥利矩阵; $ g(q) $是重力矢量; $ u \in {\bf{R}}^{n} $是控制力; $ f(q,\; \dot{q},\; w) $是未建模的动力学, 其中$ w \in {\bf{R}}^{n} $是一个未知的隐藏状态, 用于表示时序变化的潜在环境条件. 本文中$ w $是风速矢量. 根据多元函数的切比雪夫级数理论, 多元函数可分解为不同变量的多项式乘积之和. 文献[15]证明

    $$ \begin{aligned} f(q,\; \dot{q},\; w) \approx Q(q,\; \dot{q}) a(w) \end{aligned} $$ (6)

    其中$ {Q}(\cdot) $是所有风条件共享的基函数或表示函数, 用于捕获未建模动力学对机器人状态的依赖性; $ a(\omega) $是针对每个条件更新的一组线性系数. 对于任何解析函数$ f(q,\; \dot{q},\; w) $, 这样的分解$ Q(q,\; \dot{q}) a(w) $都存在. 四旋翼无人机的状态由全局位置$ p \in {\bf{R}}^{3} $、速度$ v \in {\bf{R}}^{3} $、姿态旋转矩阵$ R \in SO(3) $ (特殊正交群) 以及机体角速度$ \omega \in {\bf{R}}^{3} $给出. 四旋翼飞行器的动力学如下:

    $$ \begin{aligned} \left\{\begin{aligned} &\dot{p}=v \\ &m\dot{v}=m g+R f_u+f \end{aligned}\right. \end{aligned} $$ (7)
    $$ \begin{aligned} \left\{\begin{aligned} &\dot{R}=R S(\omega) \\& J\dot{\omega}=J \omega \times \omega+\tau_u \end{aligned}\right. \end{aligned} $$ (8)

    其中$ m $为质量; $ R $为旋转矩阵; $ J $为四旋翼的惯性矩阵; $ S(\cdot) $为斜对称映射; $ g $为重力矢量; $ f_{u}=[0,\; 0,\; T]^{T} $和$ \tau_u=[\tau_x,\; \tau_y,\; \tau_z]^{T} $为由标称模型预测的四旋翼总推力和机体力矩; $ f=\left[f_{x},\; f_{y},\; f_{z}\right]^{T} $是不同风条件下未建模空气动力效应所产生的力. 将上述无人机位置动态 (7)、(8) 与动力学模型 (5) 相结合, 其中$ M(q)=m I,\; C(q,\; \dot{q}) \equiv 0,\; u=R f_u $. 因此在位置控制中, 通过该方法计算期望的力$ u_d $, 再将其分解为期望的姿态$ R_d $和期望的推力$ T_d $, 并发送到无人机的飞行控制器中, 以控制无人机飞行. 除无人机速度$ v $外, 气动效应还取决于无人机姿态和旋翼转速, 因此基函数$ {Q} $的设计考虑这些因素: 算法中深度神经网络的输入状态$ {X} $是一个 11 维向量, 由 3 维的无人机速度$ v $、4 维的姿态四元数和 4 维的飞行控制电机速度指令 PWM 组成. 此外, 风效应力的三个分量$ f_x $、$ f_y $和$ f_z $高度相关并具有共同的特征, 因此风效应力$ f $可近似为

    $$ \begin{aligned} f \approx\left[\begin{array}{ccc} Q(X) & 0 & 0 \\ 0 & Q(X) & 0 \\ 0 & 0 & Q(X) \end{array}\right]\left[\begin{array}{l} a_x \\ a_y \\ a_z \end{array}\right] \end{aligned} $$ (9)

    其中$ a_{x},\; \; a_{y},\; \; a_{z} \in R^{4} $是风效应力的每个分量的线性系数.
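    按式 (9) 由共享基函数输出与线性系数计算风效应力的过程, 可用如下示意代码表示 (Python; 数值均为假设的示例):

```python
import numpy as np

def wind_effect_force(q_x, a_x, a_y, a_z):
    """示意: 式(9) — 共享基函数输出 Q(X) (4 维) 分别与
    a_x, a_y, a_z (各 4 维) 相乘, 得到风效应力 f 的三个分量."""
    return np.array([q_x @ a_x, q_x @ a_y, q_x @ a_z])

# 用法示例 (数值为假设值): q_x 为基函数网络输出, a_* 由在线自适应更新得到
q_x = np.array([0.2, -0.1, 0.4, 0.05])
f = wind_effect_force(q_x,
                      a_x=np.array([1.0, 0.5, -0.2, 0.1]),
                      a_y=np.array([0.3, -0.4, 0.2, 0.0]),
                      a_z=np.array([-0.1, 0.2, 0.6, 0.3]))
```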

    本文采用对称式时序域对抗自适应学习算法, 对共性基函数$ {Q} $进行离线提取, 并应用到实时风况$ \omega $条件下进行训练. 在此基础上, 将计算所得的期望力传输至无人机的飞行控制器中. 该算法不仅增强了迁移能力, 也确保了无人机姿态的稳定控制.

    3.1.2   无人机控制算法

    在这一节中, 我们将详细讨论上文提到的对称式时序域对抗自适应学习算法, 见算法 1. 与其他自适应学习算法不同, SymTAL 的目标是学习一个共性基函数$ {Q} $, 以确保在海面不同风况下都能保持其普适性. 该算法以通过域对抗学习提取气动共享特征的网络作为生成器, 并在网络结构中结合具有对称性的网络结构和双向长短期记忆网络 (BiLSTM), 以共享参数来提升速度和鲁棒性. 同时, 双向时序网络能够捕捉更全面的信息, 从而适应不同环境下的数据. 判别器则作为域分类器, 用于判别不同的风况类型, 从而在基函数提取中保持物理特征的不变性. 以下是算法的细节及优化方法.

    在给定的数据集下, 对抗自适应学习的目标是: 对任意风力条件$ \omega $, 都存在一个潜在变量$ a(\omega) $, 使得$ Q(x) a(\omega) $能够较好地近似真实值$ f(x,\; w) $, 即

    $$ \begin{aligned} \min_{Q,\; a_1,\; \cdots a_K} \sum_{k=1}^K \sum_{i=1}^{N_k}\left\|y_k^{(i)}-Q\left(x_k^{(i)}\right) a_k\right\|^2 \end{aligned} $$ (10)

    其中$ {y} $是气动力 (矩); $ {x} $是由 3 维的无人机速度$ {v} $、4 维的姿态四元数和 4 维的飞行控制电机速度指令 PWM 组成的输入向量, 也是深度神经网络 (Deep neural network, DNN) 的输入数据; $ a_{k} \in {\bf{R}}^{h} $是潜在的线性系数. 最佳权重$ a_{k} $特定于每种风力条件, 而最优的基函数$ {Q} $由所有风力条件共享. 在该算法中, 使用 DNN 来训练$ {Q} $: 只要拥有足够多的神经元, $ Q(x) a(\omega) $就能以任意精度逼近真实值$ f(x,\; w) $, 从而得到风效应力$ f $. 其中$ {Q} $的结构是一个深度神经网络, 具有输入层、对称网络层、BiLSTM 层和输出层. 输入层接收 11 维的输入数据; 对称网络层由两个对称的分支组成, 每个分支由一个包含 25 个神经元的全连接层和 ReLU 激活函数构成; BiLSTM 层使用 50 个 LSTM 单元. 输出层包含 4 个神经元, 对应于 4 维输出. 其中对称层通过共享权重和拼接操作来减少参数数量, 以此降低训练量, 进而提升训练的稳定性.
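    基于上述结构描述, 共性基函数 $ Q $ 的网络可以用如下示意代码表示 (PyTorch; 对称分支的具体连接方式原文未给出, 这里假设两个共享权重的分支分别作用于输入及其取反后拼接, 仅作参考):

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """共性基函数 Q 的示意实现: 11 维输入 -> 共享权重的对称全连接分支
    (25 神经元, ReLU) 拼接 -> BiLSTM (50 单元) -> 4 维输出."""
    def __init__(self, in_dim=11, branch_dim=25, lstm_units=50, out_dim=4):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(in_dim, branch_dim), nn.ReLU())
        self.bilstm = nn.LSTM(2 * branch_dim, lstm_units,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_units, out_dim)

    def forward(self, x):                      # x: (batch, seq_len, 11)
        sym = torch.cat([self.branch(x), self.branch(-x)], dim=-1)  # 对称分支共享权重 (镜像方式为假设)
        h, _ = self.bilstm(sym)                # 前向与后向隐藏状态拼接, 对应式(13)~(16)
        return self.out(h[:, -1, :])           # 取最后时间步, 得到 4 维基函数输出 Q(X)

q_net = QNet()
phi = q_net(torch.randn(8, 20, 11))            # 用法示例: batch=8, 序列长度=20
```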

    $ {w} $的影响会导致产生域偏移. 在真实的场景中, 风的影响也会导致飞行轨迹改变. 例如, 无人机顶风飞行时会产生俯仰倾斜, 并且其俯仰倾斜的平均程度取决于风的条件. 俯仰角是状态$ {x} $的一个分量, 整个状态$ x $中的域偏移会更加剧烈, 这给深度学习带来巨大挑战. DNN 可能记忆$ {x} $在不同风况下的分布, 使得动力学$ \{f\left(x,\; w_{1}\right),\; f\left(x,\; w_{2}\right),\; \cdots ,\; f(x,\; w_{k})\} $不是通过风的状况$ \left\{w_{1},\; w_{2},\; \cdots ,\; w_{k}\right\} $, 而是通过$ {x} $的分布来反映. 这样就导致过度拟合, 并且可能无法准确找到适合不同风况下无人机相对稳定的飞行姿态. 为解决上述问题, 受到 Ganin 等[38] 的工作启发, 本文提出对抗性优化框架.

    $$ \begin{split} &\max _h \min _{Q,\; a_1,\;\cdots,\;a_K} \sum_{j=1}^{K} \sum_{i=1}^{N_j}\left\|y_j^{(i)}-\text{BiLSTM}\left(\text{sym}\left(Q\left(x_{c j}^{(i)}\right)\right),\; a_{j}\right)\right\|^2 -\\ &\qquad\alpha \cdot \text{Loss}\left(R\left(Q\left(x_{c j}^{(i)}\right)\right),\; j\right) \end{split}$$ (11)

    其中, $ R $是另一个 DNN, 作为判别器来预测样本所属的风条件 (环境) 指数, 其网络包含三层: 输入层有 4 个神经元, 隐藏层有 126 个神经元, 输出层有 6 个神经元, 同样使用 ReLU 激活函数; $ {\text{Loss}}(\cdot) $是分类损失函数; $ \alpha $为控制正则化程度的超参数, 本文中$ \alpha $为 0.1; $ j $是风条件指数; 公式中的$ y^{(i)}_j $表示第$ j $个风况类别中第$ i $个样本的目标值; $ \text{sym}(Q(x_{c j}^{(i)})) $表示具有对称性的网络层结构, 具体解释在算法细节中; $ {\text{BiLSTM}}(\text{sym}(Q(x_{c j}^{(i)})),\; a_{ j}) $表示通过双向长短时记忆网络 (BiLSTM) 提取对称化特征. 直观上, $ R $和$ Q $是对抗学习和 GAN 思想的结合: $ Q $是生成器, $ {R} $是判别器. 生成器$ {Q} $最小化第一项$ \|f(x)- Q(x) a\|_{2}^{2} $, 试图生成与目标函数$ f(x) $相似的输出, 作为提取的基函数; 同时$ Q $希望$ R(Q(x)) $在分类任务中难以区分风况, 从而达到特征不变性, 便于在线风场预测模型传递不同风况时进行判别, 实现不同风况的迁移. 判别器$ R $则试图正确区分$ {Q} $生成的特征所对应的风况, 即使$ -\alpha \cdot \text{Loss}(R(Q(x_{c j}^{(i)})),\; j) $最大化. 该实验中, $ \text{{Loss}}(\cdot) $使用的是交叉熵损失, 具体如下:

    $$ \begin{split} &\text { CrossEntropy }\left(R\left(Q\left(x_{c j}^{(i)}\right)\right),\; j\right)=\\ &\qquad-\sum_c y_{c j} \log \left(R_c\left(Q\left(x_{c j}^{(i)}\right)\right)\right) \end{split}$$ (12)
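    式 (11)~(12) 的对抗目标可以用如下示意代码表示 (PyTorch; 判别器结构按正文取 4-126-6, 交替更新的组织方式为假设):

```python
import torch
import torch.nn as nn

R = nn.Sequential(nn.Linear(4, 126), nn.ReLU(), nn.Linear(126, 6))  # 判别器 R
ce = nn.CrossEntropyLoss()                     # 式(12)的交叉熵损失
alpha = 0.1                                    # 正则化超参数, 与正文一致

def generator_loss(phi, a_j, y, wind_label):
    """示意: 式(11)中生成器一侧的目标 — 最小化预测力误差,
    同时最大化判别器的分类损失以获得风况不变特征."""
    pred = phi @ a_j                           # Q(x) (batch,4) 与 a_j (4,3) 相乘得预测力
    fit = ((y - pred) ** 2).sum(dim=-1).mean()
    adv = ce(R(phi), wind_label)               # 判别器越难分类, 该项越大
    return fit - alpha * adv

def discriminator_loss(phi, wind_label):
    """示意: 判别器一侧的目标 — 正确区分不同风况."""
    return ce(R(phi.detach()), wind_label)

# 用法示例 (数据为随机示例): 交替更新生成器与判别器
phi = torch.randn(16, 4, requires_grad=True)   # 基函数网络输出 Q(x)
a_j = torch.randn(4, 3)                        # 当前风况的线性系数
y = torch.randn(16, 3)                         # 真实气动力
label = torch.randint(0, 6, (16,))             # 风况类别标签
g_loss, d_loss = generator_loss(phi, a_j, y, label), discriminator_loss(phi, label)
```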

    算法 1. 对称式时序域对抗自适应学习算法

    输入. 数据集 $ P=\{P_{f_{1}},\; \cdots,\; P_{f k}\} $; $ \alpha=0.001 $, $ 0< \eta \leq 1 $, $ \mu>0 $; 生成器 $ \text{Q} $, 判别器 $ \text{R} $.

    输出. 训练后的 Q 和 R.

    1) 从数据库中随机抽取样本集 $ P_{f k} $;

    2) 从样本集 $ P_{f k} $ 随机划分适应集 $ \text{D}^{\text{a}} $ 和训练集 $ \text{D} $;

    3) 使用最小二乘法求解

    $ \quad\quad a(\omega )=\arg\min_{a_k} \sum_{i\in D_a}\| y_{k}^{(i)}-Q\left( x_{k}^{(i)} \right) a_k\| ^2 $

    4) 当 $ \left\|a^*\right\|>\gamma,\; a^* \leftarrow \gamma \cdot \frac{a^*}{\left\|a^*\right\|} $;

    5) 训练生成器 $ \text{Q} $ 和 $ \text{R} $, 使用 BiLSTM 提取特征值

    $ \quad \quad lstm_{\text{feature}}=\operatorname{BiLSTM}\left(sym_{\text{features}},\; a_{j}\right) $

    6) 使用改进 Adam 优化器, 引入梯度的二阶矩估计;

    7) 用随机梯度下降 (SGD) 和谱归一化, 计算损失

    $ \quad \quad \max _{h} \min_{Q,\;\{a_j\}} \sum_{j} \sum_{i}\left\|\operatorname{BiLSTM}\left(\operatorname{sym}\left(Q\left(x_{c j}^{(i)}\right)\right),\; a_{j}\right)-f\left(x_{c j}^{(i)},\; c_{j}\right)\right\|^{2}-\alpha \cdot \operatorname{Loss}\left(R\left(Q\left(x_{c j}^{(i)}\right)\right),\; j\right) $

    8) 如果 $ \operatorname{rand}() \leq \eta $, 训练判别器 $ \text{R} $, 最小化 $ \sum_{i \in B}\operatorname{Loss}\left(R\left(Q\left(x_{k}^{(i)}\right)\right),\;j\right) $;

    9) 重复步骤 1) $ \sim $ 步骤 8) 直到损失收敛.
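    算法 1 中步骤 3)~4) 的自适应更新可用如下示意代码说明 (Python; $ \gamma $ 与数据均为假设的示例):

```python
import numpy as np

def adapt_coefficients(Phi, Y, gamma=1.0):
    """示意: 算法 1 步骤 3)~4) — 在适应集上用最小二乘求解线性系数 a(w),
    并在范数超过阈值 gamma 时做投影裁剪."""
    a, *_ = np.linalg.lstsq(Phi, Y, rcond=None)   # min ||Y - Phi a||^2
    norm = np.linalg.norm(a)
    if norm > gamma:
        a = gamma * a / norm                      # a* <- gamma * a* / ||a*||
    return a

# 用法示例: Phi 为适应集上的基函数输出 (N x 4), Y 为对应气动力分量 (N,)
Phi = np.random.randn(100, 4)
Y = Phi @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.01 * np.random.randn(100)
a_hat = adapt_coefficients(Phi, Y, gamma=1.0)
```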

    式 (12) 和算法 1 中, $ y_{c j} $为独热编码 (即标准基向量): 如果$ j=c $, 则$ y_{c j}=1 $, 否则为 0. 算法 1 在生成器中使用$ \text{BiLSTM}\left(\text{sym} \left(Q\left(x_{c j}^{(i)}\right)\right),\; a_{j}\right) $, 通过双向长短时记忆网络 (BiLSTM) 提取对称化特征, 具体公式如下:

    $$ \begin{split} \text{BiLSTM}&\left( \text{sym}\left( Q\left( x_{cj}^{(i)} \right) \right) ,\;a_j \right) =\\ &\qquad\text{Concat}\left( \overset{\rightarrow }{h}_t,\;\overset{\gets}{h}_t \right) \end{split} $$ (13)

    在时刻$ {t} $处, $ \vec{h}_{t} $表示向前传播, LSTM 隐藏状态的计算公式如下:

    $$ \begin{aligned} \begin{cases} \vec{i}_t =\sigma\left(\vec{W}_{i i} x_t+\vec{b}_{i i}+\vec{W}_{h i} \vec{h}_{t-1}+\vec{b}_{h i}\right) \\ \vec{f}_t =\sigma\left(\vec{W}_{i f} x_t+\vec{b}_{i f}+\vec{W}_{h f} \vec{h}_{t-1}+\vec{b}_{h f}\right) \\ \vec{g}_t =\tanh \left(\vec{W}_{i g} x_t+\vec{b}_{i g}+\vec{W}_{h g} \vec{h}_{t-1}+\vec{b}_{h g}\right) \\ \vec{o}_t =\sigma\left(\vec{W}_{i o} x_t+\vec{b}_{i o}+\vec{W}_{h o} \vec{h}_{t-1}+\vec{b}_{h o}\right) \\ \vec{c}_t =\vec{f}_t \odot \vec{c}_{t-1}+\vec{i}_t \odot \vec{g}_t \\ \vec{h}_t =\vec{o}_t \odot \tanh \left(\vec{c}_t\right) \end{cases} \end{aligned} $$ (14)

    其中, $ x_t $是当前时间步的输入向量; $ h_{t-1} $是上一时间步的隐藏状态; $ c_{t-1} $是上一时间步的记忆单元状态; $\vec{W}_{ii} $、$\vec{W}_{if} $、$\vec{W}_{ig} $、$ \vec{W}_{io} $和$ \vec{W}_{hi} $、$\vec{W}_{hf} $、$ \vec{W}_{hg} $、$\vec{W}_{ho} $分别是输入和隐藏状态的权重矩阵; $ \vec{b}_{ii} $、$\vec{b}_{if} $、$\vec{b}_{ig} $、$\vec{b}_{io} $和$ \vec{b}_{hi} $、$\vec{b}_{hf} $、$\vec{b}_{hg} $、$\vec{b}_{ho} $是对应的偏置项; $ \vec{i}_t $是输入门, $ \vec{f}_t $是遗忘门, $ \vec{g}_t $是候选记忆单元, $ \vec{o}_t $是输出门; $ \vec{c}_t $是当前时间步的记忆单元状态; $ \vec{h}_t $是当前时间步的隐藏状态; $ \tanh $为双曲正切激活函数; $ \sigma $表示 Sigmoid 函数; $ \odot $表示元素相乘. 向后 LSTM 更新方程如下:

    $$ \begin{aligned} \begin{cases} \overleftarrow{i}_t=\sigma\left(\overleftarrow{W}_{i i} x_t+\overleftarrow{b}_{i i}+\overleftarrow{W}_{h i} \overleftarrow{h}_{t-1}+\overleftarrow{b}_{h i}\right) \\ \overleftarrow{f}_t=\sigma\left(\overleftarrow{W}_{i f} x_t+\overleftarrow{b}_{i f}+\overleftarrow{W}_{h f} \overleftarrow{h}_{t-1}+\overleftarrow{b}_{h f}\right) \\ \overleftarrow{g}_t=\tanh \left(\overleftarrow{W}_{i g} x_t+\overleftarrow{b}_{i g}+\overleftarrow{W}_{h g} \overleftarrow{h}_{t-1}+\overleftarrow{b}_{h g}\right) \\ \overleftarrow{o}_t=\sigma\left(\overleftarrow{W}_{i o} x_t+\overleftarrow{b}_{i o}+\overleftarrow{W}_{h o} \overleftarrow{h}_{t-1}+\overleftarrow{b}_{h o}\right) \\ \overleftarrow{c}_t=\overleftarrow{f}_t \odot \overleftarrow{c}_{t-1}+\overleftarrow{i}_t \odot \overleftarrow{g}_t \\ \overleftarrow{h}_t=\overleftarrow{o}_t \odot \tanh \left(\overleftarrow{c}_t\right) \end{cases} \end{aligned} $$ (15)

    合并前后的隐藏状态, 使其连接成一个单一的隐藏状态.

    $$ \begin{aligned} h_t=\left[ \overset{\rightarrow }{h}_t;\overset{\gets}{h}_t \right] \end{aligned} $$ (16)

    训练中使用改进的 Adam 优化器, 其参数更新表示为:

    $$ \begin{aligned} \begin{cases} m_t=\beta_1 m_{t-1}+\left(1-\beta_1\right) g_t \\ v_t=\beta_2 v_{t-1}+\left(1-\beta_2\right) g_t^2 \\ \hat{m}_t=m_t /\left(1-\beta_1^t\right) \\ \hat{v}_t=v_t /\left(1-\beta_2^t\right) \\ \theta_{t+1}=\theta_t-\dfrac{\alpha}{\sqrt{\hat{v}_t+\varepsilon}} \hat{m}_t \end{cases} \end{aligned} $$ (17)

    其中, $ {m}_{t} $和$ {v}_{t} $分别是梯度的一阶和二阶矩估计; $ \beta_{1} $和$ \beta_{2} $是衰减率, 分别设置为 0.9 和 0.999; $ \beta_{1}^{t} $和$ \beta_{2}^{t} $是$ \beta_{1} $和$ \beta_{2} $的$ t $次方; $ g_{t} $是当前步骤的梯度; $ \hat{m}_t $和$ \hat{v}_t $是修正后的一阶和二阶矩估计; 学习率$ \alpha $控制参数更新的步长; $ \varepsilon $是一个很小的常数, 用于避免分母为零. 模型参数$ \theta_t $通过更新规则$ \theta_{t+1} = \theta_t - \alpha \hat{m}_t / \sqrt{\hat{v}_t + \varepsilon} $进行迭代优化. 通过修正 $ v_t $ 使其始终保持为历史的梯度平方最大值

    $$ \begin{aligned} v_t=\max \left(v_{t-1},\; g_t^2\right) \end{aligned} $$ (18)

    来减少学习率过大的问题, 从而提升算法的稳定性. 步骤 7) 通过调整正则化超参数, 并使用随机梯度下降和谱归一化, 旨在提升算法鲁棒性. 此外, 步骤 8) 在每次迭代中以概率$ \eta \leq 1 $更新判别器, 以提高算法的收敛性.
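    式 (17)~(18) 的改进 Adam 更新规则可用如下示意代码表示 (Python; 变量维数与梯度数值均为示例):

```python
import numpy as np

def improved_adam_step(theta, grad, state, lr=1e-3,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """示意: 式(17)的 Adam 更新, 并按式(18)保留历史梯度平方的最大值,
    以抑制学习率过大带来的震荡."""
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad ** 2
    state['v_max'] = np.maximum(state['v_max'], state['v'])       # 式(18)
    m_hat = state['m'] / (1 - beta1 ** state['t'])
    v_hat = state['v_max'] / (1 - beta2 ** state['t'])
    return theta - lr * m_hat / np.sqrt(v_hat + eps)               # 式(17)最后一行

# 用法示例
state = {'t': 0, 'm': np.zeros(4), 'v': np.zeros(4), 'v_max': np.zeros(4)}
theta = improved_adam_step(np.zeros(4), np.array([0.1, -0.2, 0.3, 0.0]), state)
```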

    鉴于复杂海风环境下离线训练模型难以涵盖所有可能的风况场景, 本文引入实时在线风场预测模型. 该模型能够动态预测风速, 并将预测风速数据传输至离线 SymTAL 算法中, 以更新算法中的参数$ w $. 通过这种方式, 确保无人机能够及时响应风速变化, 实现自适应稳定控制.

    3.2.1   在线风场预测模型

    本节详细解释上文提到的在线风场预测模型. 首先通过 K-means 算法进行数据预处理: 风的时间序列$ v_{{date }} $对应数据点$ x_{i} $, 将其分割成$ {K} $个簇, 每个簇的中心为$ c_{j} $, 簇内包含相似度较高的数据点, 并用下列公式计算数据点与簇中心之间的距离.

    $$ \begin{aligned} d\left(x_i,\; c_j\right)=\sqrt{\sum_{k=1}^n\left(x_{i,\; k}-c_{j,\; k}\right)^2} \end{aligned} $$ (19)

    其中$ {n} $是风的数据点$ x_{i} $的特征维度. 之后将数据点$ x_{i} $分配到离它最近的簇中心$ c_{j} $所属的簇.

    $$ \begin{aligned} \text{Cluster}\left(x_i\right)=\arg \min _j d\left(x_i,\; c_j\right) \end{aligned} $$ (20)
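    式 (19)~(20) 的距离计算与簇分配可用如下示意代码表示 (Python; 数据点与簇中心为示例值):

```python
import numpy as np

def assign_clusters(X, centers):
    """示意: 式(19)~(20) — 计算各风速数据点到簇中心的欧氏距离, 并就近分配."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)  # (N, K) 距离矩阵
    return np.argmin(d, axis=1)

# 用法示例: 将一维风速数据点分配到 K = 2 个簇
X = np.array([[5.1], [5.3], [9.8], [10.2]])
centers = np.array([[5.0], [10.0]])
labels = assign_clusters(X, centers)           # -> array([0, 0, 1, 1])
```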

    而后使用 VMD 对$ x_{i} $进行分解, 将信号分解成多个本征模态函数. 首先对输入风信号$ x(t) $进行周期性的镜像扩展, 形成镜像序列$ f(t) $:

    $$ \begin{aligned} f(t)= \begin{cases}x\left(t-\dfrac{T}{2}\right),\; & t \in\left[0,\; \dfrac{T}{2}\right) \\ x(t),\; & t \in\left[\dfrac{T}{2},\; \dfrac{3 T}{2}\right) \\ x\left(t+\dfrac{T}{2}\right),\; & t \in\left[\dfrac{3 T}{2},\; {2 {\; T}}\right)\end{cases} \end{aligned} $$ (21)

    其中$ {T} $是信号的长度. 对 VMD 进行初始化, 初始化平衡参数$ {\alpha} $和频谱镜像$ f_{{hat\ plus}} $:

    $$ \begin{aligned} \begin{cases} \alpha=\alpha \times 1_K \\ f_{{hat\ plus }}= { fftshift }(f f t(f)) \end{cases} \end{aligned} $$ (22)

    其中$ 1_{K} $是大小为$ {K} $的全一向量, 通过 VMD 的迭代调整模态和中心的频率, 不断优化能量函数:

    $$ \begin{aligned} \min _{u_1,\; \cdots\; ,\; u_k} \sum_{i=1}^K\left\|u_i\right\|_1+\alpha_i \sum_{i=1}^K \frac{\left\|u_i\right\|_2^2}{2} \end{aligned} $$ (23)

    其中$ u_{i} $是第$ {i} $个模态; $ \alpha_{i} $是平衡参数. 最后进行风场速度重构, 通过反傅里叶变换将模态的频谱转换回时域, 得到每个模态对应的风场速度信号:

    $$ \begin{aligned} u_i(t)=\text{Re}\left( { ifft }\left({ ifftshift }\left(u_{i,\; { hat }}(t)\right)\right)\right) \end{aligned} $$ (24)

    最后输出$ u_{i}(t) $、频谱$ u_{i,\; { hat }}(t) $以及估计的中心频率$ \omega_i $. $ u_i(t) $输入到模型池, 在模型池中通过风速信号的赫斯特指数 (Hurst exponent, HURST) 来选择模型. 由于连续风速具有强烈的连贯性, 通过时间序列的 R/S 分析法[39] 计算得到的 HURST 阈值取为 0.9. 当 HURST 指数小于 0.9 时, 进入 TCN-GRU 模型, 其输入表示为$ X=\left\{x_1,\; x_2,\; \cdots ,\; x_{seq_{len}}\right\} $, $ seq_{len} $代表输入序列的长度; 通过 TCN 捕捉风速时间序列的长程依赖关系, 输出$ H=\{H_{1},\; H_{2},\; \cdots ,\; H_{seq_{len}}\} $, 训练公式如下:

    $$ \begin{aligned} h_t=f\left(W_k \times X_{t-k}+b_k\right) \text {,\; } \end{aligned} $$ (25)

    其中$ W_k $是卷积核, 卷积核数量为 10 且大小为 2; $ b_k $是偏差; $ f $是激活函数. 卷积操作的参数通过训练学习短期依赖关系. 随后将卷积输出送入隐藏单元数量为 32 的 GRU, 返回每个时间步的输出, 公式如下:

    $$ \begin{aligned} g_t = \text{GRU}\left(h_t\right) \end{aligned} $$ (26)

    输出$ G=\left\{G_{1},\; G_{2},\;\; \cdots\; ,\; G_{s e q_{l e n}}\right\} $, 该模型使用三个全连接层, 并且每层都具有 ReLU 激活函数. 这些层的输出分别为:

    $$ \begin{aligned} \begin{cases} F C_1=\text{ReLU} \left(W_1 \times G+b_1\right) \\ F C_2=\text{ReLU} \left(W_2 \times F C_1+b_2\right) \\ F C_3=\text{ReLU} \left(W_3 \times F C_2+b_3\right) \end{cases} \end{aligned} $$ (27)

    其中$ W_{1},\; W_{2},\; W_{3} $是全连接层的权重矩阵; $ b_{1},\; b_{2},\; b_{3} $是偏差; ReLU 表示修正线性单元激活函数; 每一层的神经元数量分别为 16、8、4; $ FC_1 $、$ FC_2 $和$ FC_3 $分别表示三个全连接层的输出. 最后通过如下公式计算, 输出最终的风速序列结果$ final_{{result }} $

    $$ \begin{aligned} final_{{result }}=W_{ {out }} \times F C_3+b_{ {out }} \end{aligned} $$ (28)

    其中, $ W_{ {out }} $是输出层的矩阵权重; $ b_{ {out }} $是偏差.
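    式 (25)~(28) 描述的 TCN-GRU 预测分支可用如下示意代码表示 (PyTorch; 层数与单元数按正文取值, 其余细节为假设):

```python
import torch
import torch.nn as nn

class TCNGRUPredictor(nn.Module):
    """示意: 因果卷积 (10 个卷积核, 核大小 2) -> GRU (32 隐藏单元)
    -> 三层全连接 (16, 8, 4, ReLU) -> 1 维风速输出, 对应式(25)~(28)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 10, kernel_size=2)
        self.gru = nn.GRU(10, 32, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(32, 16), nn.ReLU(),
                                nn.Linear(16, 8), nn.ReLU(),
                                nn.Linear(8, 4), nn.ReLU(),
                                nn.Linear(4, 1))

    def forward(self, x):                              # x: (batch, seq_len, 1)
        z = nn.functional.pad(x.transpose(1, 2), (1, 0))   # 左侧补零实现因果卷积
        h = torch.relu(self.conv(z)).transpose(1, 2)       # (batch, seq_len, 10)
        g, _ = self.gru(h)
        return self.fc(g[:, -1, :])                    # 取最后时间步预测下一时刻风速

model = TCNGRUPredictor()
y_hat = model(torch.randn(4, 24, 1))                   # 用法示例: 24 步历史风速
```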

    当 HURST 指数大于 0.9 时, 进入支持向量回归 (Support vector regression, SVR) 模型, 并使用网格搜索方式选择超参数. 将$ u_{i}(t) $划分为$ {X} $和$ {Y} $数据集: $ X= \{ x_{1},\; x_{2},\; \cdots ,\; x_{seq_{len}} \} $是时间序列风速数据, $ Y=\{y_{1},\; y_{2},\; \cdots ,\; y_{seq_{len}}\} $, 其中$ y_{i} $是$ t_{i+1} $处的风速值, 即要预测的下一个时间点的风速. 定义一组候选的$ C $和$ \gamma $值, 对于每个组合执行以下步骤: 将数据集$ X $和$ Y $划分为训练集和测试集, 使用$ X $来训练 SVR, 其中核函数使用 RBF:

    $$ \begin{aligned} K\left(x,\; x^{\prime}\right)=\text{e}^{\left(-\frac{\left\|x-x^{\prime}\right\|^2}{2 \gamma^2}\right)} \end{aligned} $$ (29)

    其中$ K\left(x,\; x^{\prime}\right) $是核函数的值; $ x $和$ x^{\prime} $是输入空间中的两个样本点; $ \left\|x-x^{\prime}\right\| $是这两个样本点之间的欧氏距离; $ \gamma $是控制核宽度的参数, 初始设定$ \gamma $为 0.1. SVR 可以表述为:

    $$ \begin{aligned} \min _{\omega,\; b,\; \xi,\; \xi^*} \frac{1}{2}\|\omega\|^2+C \sum_{i=1}^N\left(\xi_i+\xi_i^*\right) \end{aligned} $$ (30)

    使得

    $$ \begin{aligned} \begin{cases} y_i-\langle\omega,\; \phi\left(x_i\right)\rangle-b \leq \varepsilon+\xi_i \\ \langle\omega,\; \phi\left(x_i\right)\rangle+b-y_i \leq \varepsilon+\xi_i^*,\; \xi_i,\; \xi_i^* \geq 0 \end{cases} \end{aligned} $$ (31)

    其中, $ y_i $表示第 $ i $个样本的真实标签(目标值), 是回归问题中需要拟合的目标; $ \omega $是模型的权重向量, 用于定义回归超平面的方向; $ \phi(x_i) $是将输入特征 $ x_i $映射到高维特征空间的函数, 通过核函数实现非线性映射; $ b $是模型的偏置项, 用于调整回归超平面的位置; $ \varepsilon $是误差带的宽度, 表示允许的误差范围; $ \xi_i $和 $ \xi_i^* $分别是第$ i $个样本的上界和下界松弛变量, 用于衡量样本点超出误差带的程度, 且均为非负值; $ \langle \omega,\; \phi(x_i) \rangle $表示权重向量 $ \omega $与映射后的特征向量 $ \phi(x_i) $的内积, 即模型对第 $ i $个样本的预测值.

    通过寻找$ C $和$ \gamma $的最优组合, 并使用$ Y $来评估模型性能, 从而得到最优模型. 最后对新的时间点$ t_{new} $, 使用学习到的最优模型进行预测:

    $$ \begin{aligned} \hat{y}_{\text {new }}=\langle\omega,\; \phi\left(t_{\text {new }}\right)\rangle+b \end{aligned} $$ (32)

    从而得到最终的风速序列结果$ final_{{result}} $. 上述流程见算法 2.
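    SVR 分支 (式 (29)~(32)) 结合网格搜索的流程可用如下示意代码表示 (Python/scikit-learn; 候选超参数与滑窗长度均为假设的示例):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def fit_svr_predictor(u, seq_len=12):
    """示意: 将风速分量 u 滑窗构造 (X, Y), 并用网格搜索选取
    RBF 核 SVR 的 C 与 gamma (对应式(29)~(31))."""
    X = np.array([u[i:i + seq_len] for i in range(len(u) - seq_len)])
    Y = u[seq_len:]                                    # 下一时刻风速作为回归目标
    grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1]}
    search = GridSearchCV(SVR(kernel='rbf'), grid,
                          cv=TimeSeriesSplit(n_splits=3),
                          scoring='neg_mean_squared_error')
    search.fit(X, Y)
    return search.best_estimator_

# 用法示例: 对某一 VMD 分量训练并按式(32)预测下一时刻风速
u = np.sin(np.linspace(0, 20, 300)) + 0.1 * np.random.randn(300)
svr = fit_svr_predictor(u)
next_wind = svr.predict(u[-12:].reshape(1, -1))
```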

    算法 2. 在线风场预测模型

    输入. 风的时间序列 $ v_{{date}} $, VMD 分解的模态数量 $ v m d_k $, 序列长度 $ s e q_{l e n} $.

    输出. 预测出的风的时间序列 $ final_{{result }} $.

    1) K-means 聚类: $ perform_{{kmeans}} \leftarrow v_{{date }} $;

    2) VMD 分解:

     $ function vmd(signal,\; \alpha,\; tau,\; K,\; DC,\; init,\;tol) $

    3) for $ i=1 $ to $ v m d_{k} $

      根据 R/S 分析法计算 Hurst 指数, 得到 $ hurst_{value} $;

    4) 判断 $ hurst_{value} $, 选择模型

    $\quad\quad if\ hurst_{{value}}> threshold: flag=1 $

    $\quad\quad\quad\quad model = function\ svr(\; ) $

    $\quad\quad else: flag=2 $

    $\quad\quad\quad\quad model = function\ {tcn}_{gru}(\; ) $

    5) 结果组合

    $ \quad\quad final_{{result }}= combine_{{r}}\left(date,\; predicted_{{date}}\right) $
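    算法 2 中基于 Hurst 指数的模型选择逻辑可用如下示意代码表示 (Python; 这里的 R/S 计算是简化版本, 实际 R/S 分析通常需在多个窗口长度上回归):

```python
import numpy as np

def hurst_rs(x):
    """示意: 简化的 R/S 法估计 Hurst 指数 (对整段序列只计算一次 R/S)."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                 # 去均值后的累积偏差
    rs = (y.max() - y.min()) / x.std()          # 极差与标准差之比 R/S
    return np.log(rs) / np.log(len(x))          # 由 (R/S) ~ n^H 粗略反推 H

def select_model(u, threshold=0.9):
    """示意: 算法 2 步骤 4) — Hurst 指数大于阈值时选择 SVR 分支,
    否则选择 TCN-GRU 分支 (与正文 3.2.1 节描述一致)."""
    return 'svr' if hurst_rs(u) > threshold else 'tcn_gru'

# 用法示例
u = np.cumsum(np.random.randn(500))             # 持续性较强的示例序列
branch = select_model(u)
```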

    为验证 SymTAL-POP 无人机飞行控制方法在抑制大风扰动方面的有效性, 设计以下实验, 实验包括以下内容: 1) SymTAL 离线部分实验, 评估其在遭受大风干扰时保持姿态稳定的能力; 2) POP 在线部分实验, 对风速预测算法进行验证, 评估其在实时预测风速变化方面的准确性; 3) 整体方法实验, 定量比较 SymTAL-POP 与非线性基线方法和两种最先进的自适应飞行控制方法的轨迹跟踪性能, 评估本文方法在大风中的控制能力.

    本实验的数据集由两部分组成: 离线数据和在线数据. 离线数据来源于文献[20], 涵盖无人机在 0 至$ 45\;\text{km} / \text{h} $范围内均匀且稳定的风速条件下的性能参数, 给定输入包括体轴速度$ v $ (单位: $ \text{m} / \text{s} $)、角速度$ w $ (单位: $ \text{rad} / \text{s} $) 以及飞控系统输出的电机速度 PWM 信号. 这些数据以$ 50\; \text{Hz} $的频率采集, 总计 36 000 个数据点. 在线数据则来源于美国南德克萨斯州近海的风能数据资源. 在此选取 2014 年 1 月 1 日至 2 月 3 日期间的数据作为训练集, 而 2 月 4 日至 2 月 11 日期间的数据则用作测试集. 在此期间, 每隔 1 小时进行一次数据采集和测试.

    为训练获得共性基函数的深度神经网络模型, 将离线数据集分割为适应集与测试集两部分. SymTAL 算法通过其对称的网络层结构和权重共享有效减少模型参数数量, 进而显著提升模型的收敛速度, 使得训练过程更为迅速且易于达到收敛状态, 从而更有效地适配训练数据.

    该实验旨在验证 SymTAL 算法中对称式对抗网络层结构的稳定性. 为此将算法 1 中的判别器 R 输出的联合对抗损失与 Neural-Fly 方法进行比较, 结果如图3所示. 使用交叉熵误差 (Cross-entropy loss) 作为性能评估指标, 以有效衡量模型性能.

    图 3  对抗损失 Loss_c
    Fig. 3  Against loss (Loss_c)

    实验结果显示, SymTAL 算法在训练初期迅速收敛, 前 300 轮即达到较低损失值 1.778, 并在后续训练中持续稳定下降, 至 1000 轮时损失值为 1.772. 相比之下, Neural-Fly 算法的损失值在训练的第 150 至 450 轮之间波动, 范围约为 1.797 到 1.779, 后期在 750 轮损失在 1.791 至 1.762 之间剧烈波动. 综上所述, SymTAL 算法在较少的迭代次数内便能学习到有效的控制策略, 并且随着训练轮数的增加, 其性能呈现稳定提升的趋势.

    为验证算法使用改进 Adam 优化器后, $ Q(x) a(\omega) $是否足够逼近真实的数值$ f(x,\; w) $, 以均方误差 (Mean squared error, MSE) 为衡量指标, 比较 SymTAL 与 Neural-Fly 方法的 Loss_f, 结果如图4所示.

    图 4  预测力损失Loss_f
    Fig. 4  Predict loss (Loss_f)

    在训练的起始阶段, SymTAL 与 Neural-fly 两种算法的损失值表现出相似的趋势. 对于 SymTAL 算法, 其生成器 Q 的损失在前 200 轮训练中迅速下降, 并最终稳定在 0.485. 而 Neural-fly 算法的损失在大约 170 轮训练后开始趋于稳定. 为深入比较这两种算法, 进一步分析它们的收敛值以及完成 1 000 轮训练所需的运行时间, 结果如表1所示.

    表 1  Loss_f 值分析表
    Table 1  Loss_f value analysis table
    方法 收敛轮次 (轮) 收敛值 运行时间 (s)
    Loss_f N-F 150 $ 0.8638 \pm 0.044 $ $ 149.2 \pm 1.37 $
    SymTAL 160 $ 0.4818 \pm 0.045 $ $ 116.6 \pm 1.15 $

    根据图4和表1的数据分析, 在训练前 150 轮, Loss_f 的损失值迅速降低, 随后下降速度逐渐放缓并趋于稳定. 在此期间, Neural-fly 算法展现较快的收敛速度. 然而, Neural-fly 算法的最终收敛值为 0.863 8, 而 SymTAL 算法的收敛值则为 0.481 8, 这意味着 SymTAL 算法在收敛阶段的性能提升大约$ 44 \% $. 这一结果证实改进的 Adam 优化器在训练初期采用较高学习率的策略是有效的. 随着训练轮数的增加, 学习率逐渐降低, 有效缓解梯度消失的问题. 此外, 尽管 Neural-fly 算法在收敛速度上领先, 但 SymTAL 算法由于采用深度学习加速技术, 在训练时间上的表现更出色, 其速度比 Neural-fly 快$ 21.8\% $. 综合各项指标, 改进后的 Adam 优化器不仅提升 SymTAL 算法的精度, 使其预测值更接近真实值, 而且在训练效率上也实现显著增强. 这些成果共同表明, SymTAL 算法在精确度和效率两个维度上均展现出明显的优势.

    为评估 SymTAL 算法中神经网络学习效果对损失的影响, 采用均方误差作为评价指标, 分别在不同的风速下对比平均输出损失、N-F 学习后损失以及 SymTAL 学习后损失, 如图5所示. 通过拟合曲线可以明显看出, 在不同的风速下, 未学习状态下的输出损失最高, 经过神经网络学习后损失均有减少. 特别值得注意的是, 在最高风速 45 km/h 的条件下, SymTAL 算法的均方误差降低至 2.8, 而 Neural-fly 算法的损失误差为 3.2, 相比之下, SymTAL 算法相较于 Neural-fly 算法误差降低 12.5%.

    图 5  抗风算法学习损失对比
    Fig. 5  Comparison of learning loss of anti-wind algorithms

    在海面上, 风场受到多种外界条件的影响, 如地形、大气压力等, 导致风速呈现出连续风、湍流风和间歇性风等不确定性特征. 为应对这些复杂情况, 本实验设计一个基于近海风场数据的预测模型. 该模型通过训练集数据进行训练, 旨在预测未来 200 小时内的风速状况, 如图6所示. 实验结果表明, 该模型在处理包含多种复杂特征的海面风速场景时表现出良好的适应性和较低的预测误差. 即便当风速在 70$ \; \sim\; $120 小时区间内发生突变, 且伴随着不规则的气流波动, 模型仍能维持有效的预测性能.

    图 6  风速预测结果
    Fig. 6  Wind speed prediction results

    为精确评估算法在风速预测方面的准确性, 本研究针对连续风、湍流风和间歇性风这三种不同的风速模式进行预测分析. 在评估风速预测精度时, 采用结合信号分解和深度学习技术的方法[40−43], 这一方法通过一系列误差度量指标对不同算法进行定量的评估. 具体的数据和分析结果见表2, 这些数据为该模型在不同风速条件下的预测性能提供科学的量化依据.

    表 2  连续风场算法性能比较
    Table 2  Performance comparison of continuous wind field algorithms
    评估指标 cnn-lstm[40] vmd-am-lstm vmd-cnn[41] vmd-cnn-lstm[42] vmd-gru[43] vmd-lstm vmd-tcn-lstm POP(ours)
    MSE Mean 0.0653 0.0654 0.0596 0.0618 0.0625 0.0545 0.0616 $ \boldsymbol{0.0519} $
    MSE SD 0.0490 0.0490 0.0400 0.0420 0.0460 0.0390 0.0390 $ \boldsymbol{0.0280} $
    MAE Mean 0.1999 0.2000 0.1914 0.1955 0.1954 0.1827 0.1954 $ \boldsymbol{0.1752} $
    MAE SD 0.0801 0.0804 0.0694 0.0727 0.0762 0.0673 0.0679 $ \boldsymbol{0.0508} $
    RMSE Mean 0.2434 0.2435 0.2346 0.2384 0.2388 0.2243 0.2391 $ \boldsymbol{0.2194} $
    RMSE SD 0.0780 0.0780 0.0670 0.0700 0.0740 0.0650 0.0660 $ \boldsymbol{0.0500} $
    MAPE Mean 66.9373 66.1924 61.2980 61.0475 63.0942 $ \boldsymbol{51.9177} $ 65.2285 56.0368
    MAPE SD 23.0140 22.0060 17.5740 17.0630 19.8450 15.6650 19.3100 $ \boldsymbol{9.1550} $
    MAXE Mean 0.6491 0.6416 0.6421 0.6444 0.6413 0.6418 0.6481 $ \boldsymbol{0.6106} $
    MAXE SD 0.1120 0.1100 0.1100 0.1090 0.1090 0.1090 0.1051 $ \boldsymbol{0.0960} $
    ARE Mean 0.6163 0.6441 0.6491 0.5558 0.6330 0.6340 0.6116 $ \boldsymbol{0.5406} $
    ARE SD 0.1810 0.2020 0.2110 0.0850 0.2020 0.2050 0.1540 $ \boldsymbol{0.0780} $

    均方误差 (MSE) 通过平方误差放大较大的预测偏差, 其中较低的 MSE 值意味着预测值与真实值之间的平均差异较小. POP 模型在风速预测中展现出的低 MSE 均值 (0.051 9) 表明其具有较高的预测精度, 且优于其他算法. 平均绝对误差 (Mean absolute error, MAE) 衡量的是预测值与真实值之间绝对差的平均值, 反映预测值与真实值的接近程度. POP 算法的 MAE 均值 (0.175 2) 不仅是所有算法中最低的, 而且其标准差与 vmd-lstm 算法相比有约 24% 的提升, 这进一步验证其预测结果的稳定性. 均方根误差 (Root mean square error, RMSE) 是误差平方的平均值的平方根, 提供对风速预测误差的一种直观度量. POP 算法的 RMSE 均值 (0.219 4) 最小, 说明其预测结果偏差最小, 进一步证实其精准的预测能力. 平均绝对百分比误差 (Mean absolute percentage error, MAPE) 指标以百分比形式表现预测误差相对于真实值的绝对大小. 尽管 vmd-lstm 算法在平均误差上表现最佳, 但 POP 算法在标准差上提升 41.6%, 表明其在预测精度相当的情况下, 具有更高的稳定性. 最大绝对误差 (Maximum absolute error, MAXE) 指标反映在最坏情况下的预测误差, POP 算法的低 MAXE 均值 (0.610 6) 表明即使在极端情况下, 预测误差也能得到有效控制. 平均相对误差 (Absolute relative error, ARE) 指标, 即平均绝对误差与真实值的比率, 衡量预测误差占真实值的比例, 体现模型的相对准确度. POP 算法在 ARE 上的表现也优于 vmd-tcn-lstm, 证明其对风速变化快速响应的能力. 平均值 (Mean) 是描述数据集中趋势的一种常用方法, 而低平均值表示预测结果能较好地捕捉风速变化规律, 从整体看 POP 的平均值低于其他预测模型. 标准差 (Standard deviation, SD) 是衡量预测值变异程度的关键指标, 低的标准差说明预测结果稳定. POP 算法在 MAE 和 RMSE 上的低标准差进一步证实其结果的稳定性和可靠性.

    综上所述, POP 算法在风速预测上的表现通过各项性能指标得到全面验证. 其不仅在平均误差上表现优异, 而且在误差波动和极端误差控制方面也展现出色的性能, 这充分证明 POP 算法在风速预测领域的优越性和稳健性.

    间歇风是指风速间歇性地增减, 即风速会突然增加然后又突然减小, 风速变化剧烈且频繁. 湍流风是指风速和风向在空间和时间上均呈现不规则的变动, 风速变化随机且无规律. 表3展示 POP 算法与其他对比方法在间歇性风场中的风速预测结果. 表4展示 POP 算法与其他对比方法在湍流风风场中的风速预测结果. 通过观察表格数据, 在极端风速条件下, POP 算法相比其他方法具有更高的预测精度. 在间歇风场中, 由于风速会突然降至接近零的情况, 导致 MAPE 和 ARE 的计算分母为零, 因此这两种误差评估指标在这种情况下不适用, 并未记录. 通过使用不同的风况和误差衡量指标来定量和全面地评估 POP 算法的预测准确性. 这些结果进一步验证在线风场预测模型 POP 算法的有效性和精确性, 表明 POP 算法在复杂多变的风场环境中具有优越的预测能力.

    表 3  间歇风风场算法性能比较
    Table 3  Performance comparison of intermittent wind field algorithms
    评估指标 cnn-lstm[40] vmd-am-lstm vmd-cnn[41] vmd-cnn-lstm[42] vmd-gru[43] vmd-lstm vmd-tcn-lstm POP(ours)
    MSE Mean 0.0482 0.0514 0.0521 0.0465 0.0530 0.0497 0.0514 $ \boldsymbol{0.0449} $
    MSE SD 0.0220 0.0270 0.0340 0.0180 0.0230 0.0230 0.0180 $ \boldsymbol{0.0140} $
    MAE Mean 0.1756 0.1723 0.1747 0.1712 0.1748 0.1703 0.1703 $ \boldsymbol{0.1574} $
    MAE SD 0.0461 0.0531 0.0569 0.0436 0.0448 0.0448 0.0375 $ \boldsymbol{0.0239} $
    RMSE Mean 0.2242 0.2185 0.2216 0.2191 0.2256 0.2154 0.2231 $ \boldsymbol{0.1944} $
    RMSE SD 0.0470 0.0500 0.0540 0.0440 0.0450 0.0450 0.0410 $ \boldsymbol{0.0150} $
    MAPE Mean 66.3108 73.1167 72.4697 68.6956 64.5798 72.0918 59.1044 $ \boldsymbol{58.5418} $
    MAPE SD 25.3310 32.7970 33.5990 26.1330 24.6470 30.2890 14.5280 $ \boldsymbol{12.1350} $
    MAXE Mean 0.7156 $ \boldsymbol{0.6843} $ 0.6926 0.7032 0.7355 0.7039 0.7537 0.6882
    MAXE SD 0.1130 0.0970 0.1060 0.1070 0.1080 0.1040 0.1014 $ \boldsymbol{0.0630} $
    ARE Mean 0.6843 0.7312 0.7343 0.6870 0.6744 0.7209 0.5910 $ \boldsymbol{0.5854} $
    ARE SD 0.2620 0.3280 0.3480 0.2610 0.2550 0.3030 0.1452 $ \boldsymbol{0.1210} $
    表 4  湍流风风场算法性能比较
    Table 4  Turbulent wind field algorithms
    评估指标 cnn-lstm[40] vmd-am-lstm vmd-cnn[41] vmd-cnn-lstm[42] vmd-gru[43] vmd-lstm vmd-tcn-lstm POP(ours)
    MSE Mean 0.2799 0.2974 0.2760 0.2741 0.2943 0.2966 0.2966 $ \boldsymbol{0.2494} $
    MSE SD 0.0950 0.0720 0.0740 0.0820 0.0720 0.0770 0.0770 $ \boldsymbol{0.0690} $
    MAE Mean 0.3963 0.4064 0.4072 0.3965 0.4083 0.4117 0.4038 $ \boldsymbol{0.3482} $
    MAE SD 0.0605 0.0581 0.0611 0.0618 0.0616 0.0622 0.0631 $ \boldsymbol{0.0125} $
    RMSE Mean 0.5215 0.5283 0.5334 0.5085 0.5346 0.5337 0.5337 $ \boldsymbol{0.4742} $
    RMSE SD 0.0890 0.0670 0.0720 0.0780 0.0720 0.0730 0.0680 $ \boldsymbol{0.0460} $
    MAPE Mean
    MAPE SD
    MAXE Mean 0.8268 0.8610 0.8482 0.8261 0.8760 0.8774 0.8700 $ \boldsymbol{0.7614} $
    MAXE SD 0.1420 0.1240 0.1310 0.1440 0.1230 0.1300 0.1080 $ \boldsymbol{0.0710} $
    ARE Mean
    ARE SD

    为验证多变海风环境下整体 SymTAL-POP 姿态稳定方法的性能优势, 本研究设计一个实验, 通过量化评估不同控制算法在 0 km/h、15 km/h、30 km/h 和 45 km/h 恒定风速下的飞行轨迹跟踪性能. 实验中, 无人机飞行时间为 2 min, 完成 6 圈闭环绕圈飞行. 本文选取非线性跟踪控制 N-T[44]、自适应非线性预测控制 N-MPC[45]、增量非线性动力学反演控制 INDI[46]、L1 自适应控制 L1-A[47] 以及本文 SymTAL-POP 控制算法 S-P 进行性能比较.

    首先对各算法的轨迹跟踪误差 RMS 进行分析, 如图7所示. 定量结果表明, 随着风速的增加, 各种控制算法所受的气动力变化更加明显, 其性能也更易受到风速的影响. 在所有测试的控制算法中, 随着风速增加、气动力变化增大, 非线性 N-T 算法性能大幅下降. L1-A、N-MPC 和 INDI 在风速低于 12 km/h 时相比 N-T 有所改善, 但是在风速达到 20 km/h 时, L1-A 抗风控制算法的 RMS 误差高达 17, 说明相较于 N-MPC 和 INDI 算法个别点误差较大, 飞行控制逐渐不稳定. 此时 S-P 算法在该风速下的误差仅为 4.5, 相比 N-MPC 算法提高 55%. 在最具挑战的 45 km/h 的风速下, S-P 算法的误差仍然保持在 10 以下. 此外, 测量得到该方法的执行时间和通信延迟在 15 至 30 ms 之间. 这证明本算法相较于其他算法在不同风况条件下仍能保持很好的稳定性, 同时实现快速响应.

    图 7  抗风算法跟踪误差
    Fig. 7  Wind resistance algorithm tracking error

    各种算法的平均位置误差 MEAN 如表5所示, 单位为厘米. 在 15 km/h 的风速下, N-MPC、INDI 和 S-P 相较其他算法平均位置误差较小; 但当风速达到 30 km/h 时, N-MPC 和 INDI 误差逐渐增大, 此时 S-P 的平均跟踪误差比 INDI 算法降低 15.6%. 当风速达到最大 45 km/h 时, S-P 平均跟踪误差仍低于 10 cm, 相较于其他算法稳定性较好. 轨迹跟踪实验中, 均方根误差 RMS 和平均位置误差 MEAN 都得出相同的结论: 自适应控制算法大大优于依赖积分控制的非线性跟踪控制, 而在自适应控制算法中 SymTAL-POP 算法优于其他最佳控制算法, 性能提升 23.5%.

    表 5  抗风算法的平均位置误差 (cm)
    Table 5  The average position error of wind resistance algorithm (cm)
    风速 (km/h) N-T N-MPC INDI L1-A S-P
    0 10.8 4.5 6.8 4.2 2.9
    15 13.6 7.6 8.1 11.1 4.1
    30 22.6 11.3 10.3 21.4 8.7
    45 34.7 16.7 12.6 28.6 8.9

    为便于分析, 如表6所示, 本文采用三个等级来评估算法的抗风性能: 优、中、差, 以误差值 9 cm 和 15 cm 作为区分这些等级的分界点[48]. 在无风条件下, 除 N-T 控制算法外, 所有算法的误差都低于 9 cm, 因此被评定为优等级. 然而, 随着风速增加到微风级别, N-T 算法的误差增加到 15.4 cm, 降级为差等级. L1-A 算法的性能也有所下降, 误差达到 12.1 cm, 被评定为中等级. 其他算法尽管风速增加, 但误差依然保持在 9 cm 以下, 显示出良好的控制能力. 当风速进一步增加到劲风水平时, 除本研究提出的 S-P 算法外, 其他算法的控制性能均有所下降. L1-A 算法的误差上升至 22.7 cm, 评级为差等级. N-MPC 和 INDI 算法的误差分别为 11.6 cm 和 10.9 cm, 评级为中等级. 在强风条件下, 所有算法的误差都超过 10 cm, 性能都降至中等水平以下. 但是, S-P 算法相比 INDI 算法误差低 23.6%, 这进一步验证本文算法在飞行控制方面的显著优势.

    表 6  抗风算法可控范围风速等级
    Table 6  Wind resistance algorithm controllable range wind speed level
    方法 无风 (0~1 级) 微风 (2~3 级) 劲风 (4~5 级) 强风 (>5 级)
    N-T
    N-MPC
    INDI
    L1-A
    S-P

    本文引入轨迹跟踪偏移误差阈值: 一旦飞行误差超过阈值, 则需调整飞行高度, 确保在稳定可控的风场下正常执行任务. 考虑到旋翼无人机的最大上升速率, 将阈值设定为 5 m. 根据海拔高度对风速的影响, 使用风速随高度变化的关系进行计算[49]:

    $$ \begin{aligned} V(z)=V_0\left(\frac{z}{z_0}\right)^\alpha \end{aligned} $$ (33)

    其中$ V(z) $是高度$ z $处的风速; $ V_0 $是参考高度$ z_0 $处的风速; $ z $是观测高度; $ \alpha $是幂指数, 描述风速随高度变化的速率. 已知的参数是$ \alpha=0.18 $, 并且通过在线风场数据计算得出参考高度$ z_0=50 $ m 处的平均风速$ V_0=8.23 $ m/s. 不同海拔高度的风速结果如表7所示. 结合 4.4 节实验中的轨迹偏移误差值与风速信息, 得出结论: 当飞行高度超过 60 m 时, 无人机无法稳定执行任务, 为确保稳定性, 必须降低飞行高度; 当飞行高度低于 60 m 且该高度的风速不超过 8.5 m/s 时, 无人机无需进行高度调整.
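    式 (33) 的风速幂律可用如下示意代码计算, 其结果与表 7 中的数值一致 (Python; 参数取正文给出的值):

```python
def wind_speed_at(z, v0=8.23, z0=50.0, alpha=0.18):
    """按式(33)计算高度 z (m) 处的风速 (m/s), 参数取自正文."""
    return v0 * (z / z0) ** alpha

# 用法示例: 复现表 7 中的若干高度
for z in (10, 60, 100, 120):
    print(z, round(wind_speed_at(z), 2))   # 约 6.16, 8.50, 9.32, 9.63
```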

    表 7  高度与风速的关系
    Table 7  Relationship between height and wind speed
    高度 $ (\text{m}) $ 风速 $ (\text{m} / \text{s}) $
    10 6.16
    20 6.97
    30 7.50
    50 8.23
    60 8.50
    70 8.74
    80 8.95
    90 9.14
    100 9.32
    110 9.48
    120 9.63

    本文介绍一种具有抗风特性的域对抗自适应飞行控制方法 SymTAL-POP, 结合离线对称式时序域对抗学习算法与在线风场预测技术, 提升无人机在复杂海上风况中的稳定性和适应性.

    论文的主要贡献包括:

    1) 提出 SymTAL-POP 无人机飞行控制方法, 其中包括离线域对抗自适应学习和在线风场预测两个关键部分, 旨在增强无人机在复杂海洋环境中的稳定性和抗风性能.

    2) 设计离线 SymTAL 算法, 结合域对抗学习、对称性网络和双向时序网络, 以提取气动共享特征并提升模型的学习能力.

    3) 设计在线风场预测模型 POP, 使用变分模态分解技术处理风场速度信号, 并根据风信号特性选择模型池中的 GRU-TCN 或 SVR 进行风速预测.

    4) 通过实验验证, SymTAL-POP 算法较其他算法在闭环轨迹跟踪性能上有更好的表现.

    未来将继续优化该方法, 重点研究变风速和变风向条件下如何实现旋翼无人机的姿态稳定控制, 并将该方法应用到实际无人机系统中.

  • 图  1  在线Q学习算法结构图

    Fig.  1  The architecture of the online Q-learning algorithm

    图  2  确定的值迭代Q学习算法结构图

    Fig.  2  The architecture of the deterministic value iteration-based Q-learning algorithm

    图  3  确定的策略迭代Q学习算法结构图

    Fig.  3  The architecture of the deterministic policy iteration-based Q-learning algorithm

  • [1] 张化光, 张欣, 罗艳红, 杨珺. 自适应动态规划综述. 自动化学报, 2013, 39(4): 303−311 doi: 10.1016/S1874-1029(13)60031-2

    Zhang Hua-Guang, Zhang Xin, Luo Yan-Hong, Yang Jun. An overview of research on adaptive dynamic programming. Acta Automatica Sinica, 2013, 39(4): 303−311 doi: 10.1016/S1874-1029(13)60031-2
    [2] Lewis F L, Vrabie D, Vamvoudakis K G. Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers. IEEE Control Systems Magazine, 2012, 32(6): 76−105 doi: 10.1109/MCS.2012.2214134
    [3] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
    [4] Werbos P J. Approximate dynamic programming for real-time control and neural modeling. In: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. New York, USA: Van Nostrand Reinhold, 1992.
    [5] 刘德荣, 李宏亮, 王鼎. 基于数据的自学习优化控制: 研究进展与展望. 自动化学报, 2013, 39(11): 1858−1870 doi: 10.3724/SP.J.1004.2013.01858

    Liu De-Rong, Li Hong-Liang, Wang Ding. Data-based self-learning optimal control: research progress and prospects. Acta Automatica Sinica, 2013, 39(11): 1858−1870 doi: 10.3724/SP.J.1004.2013.01858
    [6] Mao R Q, Cui R X, Chen C L P. Broad learning with reinforcement learning signal feedback: Theory and applications. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(7): 2952−2964 doi: 10.1109/TNNLS.2020.3047941
    [7] Shi Y X, Hu Q L, Li D Y, Lv M L. Adaptive optimal tracking control for spacecraft formation flying with event-triggered input. IEEE Transactions on Industrial Informatics, 2023, 19(5): 6418−6428 doi: 10.1109/TII.2022.3181067
    [8] 孙长银, 穆朝絮. 多智能体深度强化学习的若干关键科学问题. 自动化学报, 2020, 46(7): 1301−1312

    Sun Chang-Yin, Mu Chao-Xu. Important scientific problems of multi-agent deep reinforcement learning. Acta Automatica Sinica, 2020, 46(7): 1301−1312
    [9] Wei Q L, Liao Z H, Shi G. Generalized actor-critic learning optimal control in smart home energy management. IEEE Transactions on Industrial Informatics, 2021, 17(10): 6614−6623 doi: 10.1109/TII.2020.3042631
    [10] 王鼎, 赵明明, 哈明鸣, 乔俊飞. 基于折扣广义值迭代的智能最优跟踪及应用验证. 自动化学报, 2022, 48(1): 182−193

    Wang Ding, Zhao Ming-Ming, Ha Ming-Ming, Qiao Jun-Fei. Intelligent optimal tracking with application verifications via discounted generalized value iteration. Acta Automatica Sinica, 2022, 48(1): 182−193
    [11] Sun J Y, Dai J, Zhang H G, Yu S H, Xu S, Wang J J. Neural-network-based immune optimization regulation using adaptive dynamic programming. IEEE Transactions on Cybernetics, 2023, 53(3): 1944−1953 doi: 10.1109/TCYB.2022.3179302
    [12] Liu D R, Ha M M, Xue S. State of the art of adaptive dynamic programming and reinforcement learning. CAAI Artificial Intelligence Research, 2022, 1(2): 93−110 doi: 10.26599/AIR.2022.9150007
    [13] Wang D, Gao N, Liu D R, Li J N, Lewis F L. Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications. IEEE/CAA Journal of Automatica Sinica, 2024, 11(1): 18−36 doi: 10.1109/JAS.2023.123843
    [14] 王鼎, 赵明明, 哈明鸣, 任进. 智能控制与强化学习: 先进值迭代评判设计. 北京: 人民邮电出版社, 2024.

    Wang Ding, Zhao Ming-Ming, Ha Ming-Ming, Ren Jin. Intelligent Control and Reinforcement Learning: Advanced Value Iteration Critic Design. Beijing: Posts and Telecommunications Press, 2024.
    [15] 孙景亮, 刘春生. 基于自适应动态规划的导弹制导律研究综述. 自动化学报, 2017, 43(7): 1101−1113

    Sun Jing-Liang, Liu Chun-Sheng. An overview on the adaptive dynamic programming based missile guidance law. Acta Automatica Sinica, 2017, 43(7): 1101−1113
    [16] Zhao M M, Wang D, Qiao J F, Ha M M, Ren J. Advanced value iteration for discrete-time intelligent critic control: A survey. Artificial Intelligence Review, 2023, 56: 12315−12346 doi: 10.1007/s10462-023-10497-1
    [17] Wang D, Ha M M, Zhao M M. The intelligent critic framework for advanced optimal control. Artificial Intelligence Review, 2022, 55(1): 1−22 doi: 10.1007/s10462-021-10118-9
    [18] Al-Tamimi A, Lewis F L, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof. IEEE Transactions on Systems, Man, and Cybernetics–Part B: Cybernetics, 2008, 38(4): 943−949 doi: 10.1109/TSMCB.2008.926614
    [19] Li H L, Liu D R. Optimal control for discrete-time affine non-linear systems using general value iteration. IET Control Theory and Applications, 2012, 6(18): 2725−2736 doi: 10.1049/iet-cta.2011.0783
    [20] Wei Q L, Liu D R, Lin H Q. Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE Transactions on Cybernetics, 2016, 46(3): 840−853 doi: 10.1109/TCYB.2015.2492242
    [21] Wang D, Zhao M M, Ha M M, Qiao J F. Stability and admissibility analysis for zero-sum games under general value iteration formulation. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(11): 8707−8718 doi: 10.1109/TNNLS.2022.3152268
    [22] Wang D, Ren J, Ha M M, Qiao J F. System stability of learning-based linear optimal control with general discounted value iteration. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(9): 6504−6514 doi: 10.1109/TNNLS.2021.3137524
    [23] Heydari A. Stability analysis of optimal adaptive control under value iteration using a stabilizing initial policy. IEEE Transactions on Neural Networks and Learning Syetems, 2018, 29(9): 4522−4527 doi: 10.1109/TNNLS.2017.2755501
    [24] Wei Q L, Lewis F L, Liu D R, Song R Z, Lin H Q. Discrete-time local value iteration adaptive dynamic programming: Convergence analysis. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, 48(6): 875−891 doi: 10.1109/TSMC.2016.2623766
    [25] Zhao M M, Wang D, Ha M M, Qiao J F. Evolving and incremental value iteration schemes for nonlinear discrete-time zero-sum games. IEEE Transactions on Cybernetics, 2023, 53(7): 4487−4499 doi: 10.1109/TCYB.2022.3198078
    [26] Ha M M, Wang D, Liu D R. Neural-network-based discounted optimal control via an integrated value iteration with accuracy guarantee. Neural Networks, 2021, 144: 176−186 doi: 10.1016/j.neunet.2021.08.025
    [27] Luo B, Liu D R, Huang T W, Yang X, Ma H W. Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems. Information Sciences, 2017, 411: 66−83 doi: 10.1016/j.ins.2017.05.005
    [28] Wang D, Wang J Y, Zhao M M, Xin P, Qiao J F. Adaptive multi-step evaluation design with stability guarantee for discrete-time optimal learning control. IEEE/CAA Journal of Automatica Sinica, 2023, 10(9): 1797−1809 doi: 10.1109/JAS.2023.123684
    [29] Rao J, Wang J C, Xu J H, Zhao S W. Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces. Nonlinear Dynamics, 2023, 111: 20041−20053 doi: 10.1007/s11071-023-08909-6
    [30] Yu L Y, Liu W B, Liu Y R, Alsaadi F E. Learning-based T-sHDP(λ) for optimal control of a class of nonlinear discrete-time systems. International Journal of Robust and Nonlinear Control, 2022, 32(5): 2624−2643 doi: 10.1002/rnc.5847
    [31] Al-Dabooni S, Wunsch D. An improved N-step value gradient learning adaptive dynamic programming algorithm for online learning. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(4): 1155−1169 doi: 10.1109/TNNLS.2019.2919338
    [32] Wang J Y, Wang D, Li X, Qiao J F. Dichotomy value iteration with parallel learning design towards discrete-time zero-sum games. Neural Networks, 2023, 167: 751−762 doi: 10.1016/j.neunet.2023.09.009
    [33] Wei Q L, Wang L X, Lu J W, Wang F Y. Discrete-time self-learning parallel control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(1): 192−204 doi: 10.1109/TSMC.2020.2995646
    [34] Ha M M, Wang D, Liu D R. A novel value iteration scheme with adjustable convergence rate. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(10): 7430−7442 doi: 10.1109/TNNLS.2022.3143527
    [35] Ha M M, Wang D, Liu D R. Novel discounted adaptive critic control designs with accelerated learning formulation. IEEE Transactions on Cybernetics, 2024, 54(5): 3003−3016 doi: 10.1109/TCYB.2022.3233593
    [36] Wang D, Huang H M, Liu D R, Zhao M M, Qiao J F. Evolution-guided adaptive dynamic programming for nonlinear optimal control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(10): 6043−6054 doi: 10.1109/TSMC.2024.3417230
    [37] Liu D R, Wei Q L. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3): 621−634 doi: 10.1109/TNNLS.2013.2281663
    [38] Liu D R, Wei Q L. Generalized policy iteration adaptive dynamic programming for discrete-time nonlinear systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2015, 45(12): 1577−1591 doi: 10.1109/TSMC.2015.2417510
    [39] Liang M M, Wang D, Liu D R. Neuro-optimal control for discrete stochastic processes via a novel policy iteration algorithm. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(11): 3972−3985 doi: 10.1109/TSMC.2019.2907991
    [40] Luo B, Yang Y, Wu H N, Huang T W. Balancing value iteration and policy iteration for discrete-time control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(11): 3948−3958 doi: 10.1109/TSMC.2019.2898389
    [41] Li T, Wei Q L, Wang F Y. Multistep look-ahead policy iteration for optimal control of discrete-time nonlinear systems with isoperimetric constraints. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(3): 1414−1426 doi: 10.1109/TSMC.2023.3327492
    [42] Yang Y L, Kiumarsi B, Modares H, Xu C Z. Model-free λ-policy iteration for discrete-time linear quadratic regulation. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(2): 635−649 doi: 10.1109/TNNLS.2021.3098985
    [43] Huang H M, Wang D, Wang H, Wu J L, Zhao M M. Novel generalized policy iteration for efficient evolving control of nonlinear systems. Neurocomputing, 2024, 608: Article No. 128418 doi: 10.1016/j.neucom.2024.128418
    [44] Dierks T, Jagannathan S. Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(7): 1118−1129 doi: 10.1109/TNNLS.2012.2196708
    [45] Wang D, Xin P, Zhao M M, Qiao J F. Intelligent optimal control of constrained nonlinear systems via receding-horizon heuristic dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(1): 287−299 doi: 10.1109/TSMC.2023.3306338
    [46] Moghadam R, Natarajan P, Jagannathan S. Online optimal adaptive control of partially uncertain nonlinear discrete-time systems using multilayer neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(9): 4840−4850 doi: 10.1109/TNNLS.2021.3061414
    [47] Zhang H G, Qin C B, Jiang B, Luo Y H. Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems. IEEE Transactions on Cybernetics, 2014, 44(12): 2706−2718 doi: 10.1109/TCYB.2014.2313915
    [48] Ming Z Y, Zhang H G, Yan Y Q, Zhang J. Tracking control of discrete-time system with dynamic event-based adaptive dynamic programming. IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(8): 3570−3574
    [49] 罗彪, 欧阳志华, 易昕宁, 刘德荣. 基于自适应动态规划的移动机器人视觉伺服跟踪控制. 自动化学报, 2023, 49(11): 2286−2296

    Luo Biao, Ouyang Zhi-Hua, Yi Xin-Ning, Liu De-Rong. Adaptive dynamic programming based visual servoing tracking control for mobile robots. Acta Automatica Sinica, 2023, 49(11): 2286−2296
    [50] Ha M M, Wang D, Liu D R. Discounted iterative adaptive critic designs with novel stability analysis for tracking control. IEEE/CAA Journal of Automatica Sinica, 2022, 9(7): 1262−1272 doi: 10.1109/JAS.2022.105692
    [51] Dong L, Zhong X N, Sun C Y, He H B. Adaptive event-triggered control based on heuristic dynamic programming for nonlinear discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(7): 1594−1605 doi: 10.1109/TNNLS.2016.2541020
    [52] Wang D, Hu L Z, Zhao M M, Qiao J F. Dual event-triggered constrained control through adaptive critic for discrete-time zero-sum games. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(3): 1584−1595 doi: 10.1109/TSMC.2022.3201671
    [53] Yang X, Wang D. Reinforcement learning for robust dynamic event-driven constrained control. IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2024.3394251
    [54] 王鼎. 基于学习的鲁棒自适应评判控制研究进展. 自动化学报, 2019, 45(6): 1031−1043

    Wang Ding. Research progress on learning-based robust adaptive critic control. Acta Automatica Sinica, 2019, 45(6): 1031−1043
    [55] Ren H, Jiang B, Ma Y J. Zero-sum differential game-based fault-tolerant control for a class of affine nonlinear systems. IEEE Transactions on Cybernetics, 2024, 54(2): 1272−1282 doi: 10.1109/TCYB.2022.3215716
    [56] Zhang S C, Zhao B, Liu D R, Zhang Y W. Event-triggered decentralized integral sliding mode control for input-constrained nonlinear large-scale systems with actuator failures. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(3): 1914−1925 doi: 10.1109/TSMC.2023.3331150
    [57] Wei Q L, Zhu L, Song R Z, Zhang P J, Liu D R, Xiao J. Model-free adaptive optimal control for unknown nonlinear multiplayer nonzero-sum game. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(2): 879−892 doi: 10.1109/TNNLS.2020.3030127
    [58] Ye J, Bian Y G, Luo B, Hu M J, Xu B, Ding R. Costate-supplement ADP for model-free optimal control of discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 45−59 doi: 10.1109/TNNLS.2022.3172126
    [59] Li Y Q, Yang C Z, Hou Z S, Feng Y J, Yin C K. Data-driven approximate Q-learning stabilization with optimality error bound analysis. Automatica, 2019, 103: 435−442
    [60] Al-Dabooni S, Wunsch D C. Online model-free n-step HDP with stability analysis. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(4): 1255−1269 doi: 10.1109/TNNLS.2019.2919614
    [61] Ni Z, He H B, Zhong X N, Prokhorov D V. Model-free dual heuristic dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(8): 1834−1839 doi: 10.1109/TNNLS.2015.2424971
    [62] Wang D, Ha M M, Qiao J F. Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation. IEEE Transactions on Automatic Control, 2020, 65(3): 1272−1279 doi: 10.1109/TAC.2019.2926167
    [63] Wang D, Ha M M, Qiao J F. Data-driven iterative adaptive critic control toward an urban wastewater treatment plant. IEEE Transactions on Industrial Electronics, 2021, 68(8): 7362−7369 doi: 10.1109/TIE.2020.3001840
    [64] Wang D, Hu L Z, Zhao M M, Qiao J F. Adaptive critic for event-triggered unknown nonlinear optimal tracking design with wastewater treatment applications. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(9): 6276−6288 doi: 10.1109/TNNLS.2021.3135405
    [65] Zhu L, Wei Q L, Guo P. Synergetic learning neuro-control for unknown affine nonlinear systems with asymptotic stability guarantees. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(2): 3479−3489 doi: 10.1109/TNNLS.2023.3347663
    [66] Pang B, Jiang Z P. Adaptive optimal control of linear periodic systems: An off-policy value iteration approach. IEEE Transactions on Automatic Control, 2021, 66(2): 888−894 doi: 10.1109/TAC.2020.2987313
    [67] Xu Y S, Zhao Z G, Yin S. Performance optimization and fault-tolerance of highly dynamic systems via Q-learning with an incrementally attached controller gain system. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(11): 9128−9138 doi: 10.1109/TNNLS.2022.3155876
    [68] Yang X, Xu M M, Wei Q L. Adaptive dynamic programming for nonlinear-constrained H∞ control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(7): 4393−4403 doi: 10.1109/TSMC.2023.3247888
    [69] Werbos P. Neural networks for control and system identification. In: Proceedings of the 28th IEEE Conference on Decision and Control. 1989. 260−265
    [70] Prokhorov D V, Wunsch D C. Adaptive critic designs. IEEE Transactions on Neural Networks, 1997, 8(5): 997−1007 doi: 10.1109/72.623201
    [71] Watkins C. Learning from Delayed Rewards [Ph.D. dissertation], King's College, University of Cambridge, 1989.
    [72] Al-Tamimi A, Lewis F L, Abu-Khalaf M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control. Automatica, 2007, 43(3): 473−481 doi: 10.1016/j.automatica.2006.09.019
    [73] Kiumarsi B, Lewis F L, Modares H, Karimpour A, Naghibi-Sistani M. Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica, 2014, 50(4): 1167−1175 doi: 10.1016/j.automatica.2014.02.015
    [74] Jiang Y, Jiang Z P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica, 2012, 48(10): 2699−2704 doi: 10.1016/j.automatica.2012.06.096
    [75] Kiumarsi B, Lewis F L, Jiang Z P. H∞ control of linear discrete-time systems: Off-policy reinforcement learning. Automatica, 2017, 78: 144−152 doi: 10.1016/j.automatica.2016.12.009
    [76] Farjadnasab M, Babazadeh M. Model-free LQR design by Q-function learning. Automatica, 2022, 137: Article No. 110060 doi: 10.1016/j.automatica.2021.110060
    [77] Lopez V G, Alsalti M, Müller M A. Efficient off-policy Q-learning for data-based discrete-time LQR problems. IEEE Transactions on Automatic Control, 2023, 68(5): 2922−2933 doi: 10.1109/TAC.2023.3235967
    [78] Nguyen H, Dang H B, Dao P N. On-policy and off-policy Q-learning strategies for spacecraft systems: An approach for time-varying discrete-time without controllability assumption of augmented system. Aerospace Science and Technology, 2024, 146: Article No. 108972 doi: 10.1016/j.ast.2024.108972
    [79] Skach J, Kiumarsi B, Lewis F L, Straka O. Actor-critic off-policy learning for optimal control of multiple-model discrete-time systems. IEEE Transactions on Cybernetics, 2018, 48(1): 29−40 doi: 10.1109/TCYB.2016.2618926
    [80] Wen Y L, Zhang H G, Ren H, Zhang K. Off-policy based adaptive dynamic programming method for nonzero-sum games on discrete-time system. Journal of the Franklin Institute, 2020, 357(12): 8059−8081 doi: 10.1016/j.jfranklin.2020.05.038
    [81] Xu Y, Wu Z G. Data-efficient off-policy learning for distributed optimal tracking control of HMAS with unidentified exosystem dynamics. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3181−3190 doi: 10.1109/TNNLS.2022.3172130
    [82] Cui L L, Pang B, Jiang Z P. Learning-based adaptive optimal control of linear time-delay systems: A policy iteration approach. IEEE Transactions on Automatic Control, 2024, 69(1): 629−636 doi: 10.1109/TAC.2023.3273786
    [83] Amirparast A, Sani S K H. Off-policy reinforcement learning algorithm for robust optimal control of uncertain nonlinear systems. International Journal of Robust and Nonlinear Control, 2024, 34(8): 5419−5437 doi: 10.1002/rnc.7278
    [84] Qasem O, Gao W N, Vamvoudakis K G. Adaptive optimal control of continuous-time nonlinear affine systems via hybrid iteration. Automatica, 2023, 157: Article No. 111261 doi: 10.1016/j.automatica.2023.111261
    [85] Jiang H Y, Zhou B, Duan G R. Modified λ-policy iteration based adaptive dynamic programming for unknown discrete-time linear systems. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3291−3301 doi: 10.1109/TNNLS.2023.3244934
    [86] Zhao J G, Yang C Y, Gao W N, Park J H. Novel single-loop policy iteration for linear zero-sum games. Automatica, 2024, 163: Article No. 111551 doi: 10.1016/j.automatica.2024.111551
    [87] 肖振飞, 李金娜. 基于非策略Q学习方法的两个个体优化控制. 控制工程, 2022, 29(10): 1874−1880

    Xiao Zhen-Fei, Li Jin-Na. Two-player optimization control based on off-policy Q-learning algorithm. Control Engineering of China, 2022, 29(10): 1874−1880
    [88] Liu Y, Zhang H G, Yu R, Xing Z X. H∞ tracking control of discrete-time system with delays via data-based adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(11): 4078−4085 doi: 10.1109/TSMC.2019.2946397
    [89] Zhang H G, Liu Y, Xiao G Y, Jiang H. Data-based adaptive dynamic programming for a class of discrete-time systems with multiple delays. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(2): 432−441 doi: 10.1109/TSMC.2017.2758849
    [90] Tan X F, Li Y, Liu Y. Stochastic linear quadratic optimal tracking control for discrete-time systems with delays based on Q-learning algorithm. AIMS Mathematics, 2023, 8(5): 10249−10265 doi: 10.3934/math.2023519
    [91] Zhang L L, Zhang H G, Sun J Y, Yue X. ADP-based fault-tolerant control for multiagent systems with semi-Markovian jump parameters. IEEE Transactions on Cybernetics, 2024, 54(10): 5952−5962 doi: 10.1109/TCYB.2024.3411310
    [92] Li Y, Zhang H, Wang Z P, Huang C, Yan H C. Data-driven decentralized control for large-scale systems with sparsity and communication delays. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(9): 5614−5624 doi: 10.1109/TSMC.2023.3274292
    [93] Shen X Y, Li X J. Data-driven output-feedback LQ secure control for unknown cyber-physical systems against sparse actuator attacks. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021, 51(9): 5708−5720 doi: 10.1109/TSMC.2019.2957146
    [94] Qasem O, Davari M, Gao W N, Kirk D R, Chai T Y. Hybrid iteration ADP algorithm to solve cooperative, optimal output regulation problem for continuous-time, linear, multiagent systems: Theory and application in islanded modern microgrids with IBRs. IEEE Transactions on Industrial Electronics, 2024, 71(1): 834−845 doi: 10.1109/TIE.2023.3247734
    [95] Zhang H G, Liang H J, Wang Z S, Feng T. Optimal output regulation for heterogeneous multiagent systems via adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(1): 18−29 doi: 10.1109/TNNLS.2015.2499757
    [96] Wang W, Chen X. Model-free optimal containment control of multi-agent systems based on actor-critic framework. Neurocomputing, 2018, 314(7): 242−250
    [97] Cui L L, Wang S, Zhang J F, Zhang D S, Lai J, Zheng Y, Zhang Z Y, Jiang Z P. Learning-based balance control of wheel-legged robots. IEEE Robotics and Automation Letters, 2021, 6(4): 7667−7674 doi: 10.1109/LRA.2021.3100269
    [98] Liu T, Cui L L, Pang B, Jiang Z P. A unified framework for data-driven optimal control of connected vehicles in mixed traffic. IEEE Transactions on Intelligent Vehicles, 2023, 8(8): 4131−4145 doi: 10.1109/TIV.2023.3287131
    [99] Davari M, Gao W N, Aghazadeh A, Blaabjerg F, Lewis F L. An optimal synchronization control method of PLL utilizing adaptive dynamic programming to synchronize inverter-based resources with unbalanced, low-inertia, and very weak grids. IEEE Transactions on Automation Science and Engineering, 2025, 22: 24−42 doi: 10.1109/TASE.2023.3329479
    [100] Wang Z Y, Wang Y Q, Davari M, Blaabjerg F. An effective PQ-decoupling control scheme using adaptive dynamic programming approach to reducing oscillations of virtual synchronous generators for grid connection with different impedance types. IEEE Transactions on Industrial Electronics, 2024, 71(4): 3763−3775 doi: 10.1109/TIE.2023.3279564
    [101] Si J, Wang Y T. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 2001, 12(2): 264−276 doi: 10.1109/72.914523
    [102] Liu F, Sun J, Si J, Guo W T, Mei S W. A boundedness result for the direct heuristic dynamic programming. Neural Networks, 2012, 32: 229−235 doi: 10.1016/j.neunet.2012.02.005
    [103] Sokolov Y, Kozma R, Werbos L D, Werbos P J. Complete stability analysis of a heuristic approximate dynamic programming control design. Automatica, 2015, 59: 9−18 doi: 10.1016/j.automatica.2015.06.001
    [104] Malla N, Ni Z. A new history experience replay design for model-free adaptive dynamic programming. Neurocomputing, 2017, 266(29): 141−149
    [105] Luo B, Wu H N, Huang T W, Liu D R. Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica, 2014, 50(12): 3281−3290 doi: 10.1016/j.automatica.2014.10.056
    [106] Zhao D B, Xia Z P, Wang D. Model-free optimal control for affine nonlinear systems with convergence analysis. IEEE Transactions on Automation Science and Engineering, 2015, 12(4): 1461−1468 doi: 10.1109/TASE.2014.2348991
    [107] Xu J H, Wang J C, Rao J, Zhong Y J, Wu S Y, Sun Q F. Parallel cross entropy policy gradient adaptive dynamic programming for optimal tracking control of discrete-time nonlinear systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(6): 3809−3821 doi: 10.1109/TSMC.2024.3373456
    [108] Wei Q L, Lewis F L, Sun Q Y, Yan P F, Song R Z. Discrete-time deterministic Q-learning: A novel convergence analysis. IEEE Transactions on Cybernetics, 2017, 47(5): 1224−1237 doi: 10.1109/TCYB.2016.2542923
    [109] 王鼎, 王将宇, 乔俊飞. 融合自适应评判的随机系统数据驱动策略优化. 自动化学报, 2024, 50(5): 980−990

    Wang Ding, Wang Jiang-Yu, Qiao Jun-Fei. Data-driven policy optimization for stochastic systems involving adaptive critic. Acta Automatica Sinica, 2024, 50(5): 980−990
    [110] Qiao J F, Zhao M M, Wang D, Ha M M. Adjustable iterative Q-learning schemes for model-free optimal tracking control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(2): 1202−1213 doi: 10.1109/TSMC.2023.3324215
    [111] Ni Z, Malla N, Zhong X N. Prioritizing useful experience replay for heuristic dynamic programming-based learning systems. IEEE Transactions on Cybernetics, 2019, 49(11): 3911−3922 doi: 10.1109/TCYB.2018.2853582
    [112] Al-Dabooni S, Wunsch D. The boundedness conditions for model-free HDP(λ). IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(7): 1928−1942 doi: 10.1109/TNNLS.2018.2875870
    [113] Zhao Q T, Si J, Sun J. Online reinforcement learning control by direct heuristic dynamic programming: From time-driven to event-driven. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(8): 4139−4144 doi: 10.1109/TNNLS.2021.3053037
    [114] Wei Q L, Liao Z H, Song R Z, Zhang P J, Wang Z, Xiao J. Self-learning optimal control for ice-storage air conditioning systems via data-based adaptive dynamic programming. IEEE Transactions on Industrial Electronics, 2021, 68(4): 3599−3608 doi: 10.1109/TIE.2020.2978699
    [115] Zhao J, Wang T Y, Pedrycz W, Wang W. Granular prediction and dynamic scheduling based on adaptive dynamic programming for the blast furnace gas system. IEEE Transactions on Cybernetics, 2021, 51(4): 2201−2214 doi: 10.1109/TCYB.2019.2901268
    [116] Wang D, Li X, Zhao M M, Qiao J F. Adaptive critic control design with knowledge transfer for wastewater treatment applications. IEEE Transactions on Industrial Informatics, 2024, 20(2): 1488−1497 doi: 10.1109/TII.2023.3278875
    [117] Qiao J F, Zhao M M, Wang D, Li M H. Action-dependent heuristic dynamic programming with experience replay for wastewater treatment processes. IEEE Transactions on Industrial Informatics, 2024, 20(4): 6257−6265 doi: 10.1109/TII.2023.3344130
    [118] Luo B, Liu D R, Wu H N. Adaptive constrained optimal control design for data-based nonlinear discrete-time systems with critic-only structure. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(6): 2099−2111 doi: 10.1109/TNNLS.2017.2751018
    [119] Zhao M M, Wang D, Qiao J F. Stabilizing value iteration Q-learning for online evolving control of discrete-time nonlinear systems. Nonlinear Dynamics, 2024, 112: 9137−9153 doi: 10.1007/s11071-024-09524-9
    [120] Xiang Z R, Li P C, Zou W C, Ahn C K. Data-based optimal switching and control with admissibility guaranteed Q-learning. IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2024.3405739.
    [121] Li X F, Dong L, Xue L, Sun C Y. Hybrid reinforcement learning for optimal control of non-linear switching system. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(11): 9161−9170 doi: 10.1109/TNNLS.2022.3156287
    [122] Li J N, Chai T Y, Lewis F L, Ding Z T, Jiang Y. Off-policy interleaved Q-learning: Optimal control for affine nonlinear discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(5): 1308−1320 doi: 10.1109/TNNLS.2018.2861945
    [123] Song S J, Zhao M M, Gong D W, Zhu M L. Convergence and stability analysis of value iteration Q-learning under non-discounted cost for discrete-time optimal control. Neurocomputing, 2024, 606: Article No. 128370 doi: 10.1016/j.neucom.2024.128370
    [124] Song S J, Zhu M L, Dai X L, Gong D W. Model-free optimal tracking control of nonlinear input-affine discrete-time systems via an iterative deterministic Q-learning algorithm. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 999−1012 doi: 10.1109/TNNLS.2022.3178746
    [125] Wei Q L, Liu D R. A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems. Science China Information Sciences, 2015, 58(12): 1−15
    [126] Yan P F, Wang D, Li H L, Liu D R. Error bound analysis of Q-function for discounted optimal control problems with policy iteration. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(7): 1207−1216 doi: 10.1109/TSMC.2016.2563982
    [127] Wang W, Chen X, Fu H, Wu M. Model-free distributed consensus control based on actor-critic framework for discrete-time nonlinear multiagent systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(11): 4123−4134 doi: 10.1109/TSMC.2018.2883801
    [128] Luo B, Liu D R, Wu H N, Wang D, Lewis F L. Policy gradient adaptive dynamic programming for data-based optimal control. IEEE Transactions on Cybernetics, 2017, 47(10): 3341−3354 doi: 10.1109/TCYB.2016.2623859
    [129] Zhang Y W, Zhao B, Liu D R. Deterministic policy gradient adaptive dynamic programming for model-free optimal control. Neurocomputing, 2020, 387: 40−50 doi: 10.1016/j.neucom.2019.11.032
    [130] Xu J H, Wang J C, Rao J, Zhong Y J, Zhao S W. Twin deterministic policy gradient adaptive dynamic programming for optimal control of affine nonlinear discrete-time systems. International Journal of Control, Automation, and Systems, 2022, 20(9): 3098−3109 doi: 10.1007/s12555-021-0473-6
    [131] Xu J H, Wang J C, Rao J, Wu S Y, Zhong Y J. Adaptive dynamic programming for optimal control of discrete-time nonlinear systems with trajectory-based initial control policy. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(3): 1489−1501 doi: 10.1109/TSMC.2023.3327450
    [132] Lin M D, Zhao B. Policy optimization adaptive dynamic programming for optimal control of input-affine discrete-time nonlinear systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(7): 4339−4350 doi: 10.1109/TSMC.2023.3247466
    [133] Lin M D, Zhao B, Liu D R. Policy gradient adaptive critic designs for model-free optimal tracking control with experience replay. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(6): 3692−3703 doi: 10.1109/TSMC.2021.3071968
    [134] Luo B, Yang Y, Liu D R. Adaptive Q-learning for data-based optimal output regulation with experience replay. IEEE Transactions on Cybernetics, 2018, 48(12): 3337−3348 doi: 10.1109/TCYB.2018.2821369
    [135] Qasem O, Gutierrez H, Gao W N. Experimental validation of data-driven adaptive optimal control for continuous-time systems via hybrid iteration: An application to rotary inverted pendulum. IEEE Transactions on Industrial Electronics, 2024, 71(6): 6210−6220 doi: 10.1109/TIE.2023.3292873
    [136] 李满园, 罗飞, 顾春华, 罗勇军, 丁炜超. 基于自适应动量更新策略的Adams算法. 上海理工大学学报, 2023, 45(2): 112−119

    Li Man-Yuan, Luo Fei, Gu Chun-Hua, Luo Yong-Jun, Ding Wei-Chao. Adams algorithm based on adaptive momentum update strategy. Journal of University of Shanghai for Science and Technology, 2023, 45(2): 112−119
    [137] 姜志侠, 宋佳帅, 刘宇宁. 一种改进的自适应动量梯度下降算法. 华中科技大学学报(自然科学版), 2023, 51(5): 137−143

    Jiang Zhi-Xia, Song Jia-Shuai, Liu Yu-Ning. An improved adaptive momentum gradient descent algorithm. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2023, 51(5): 137−143
    [138] 姜文翰, 姜志侠, 孙雪莲. 一种修正学习率的梯度下降算法. 长春理工大学学报(自然科学版), 2023, 46(6): 112−120

    Jiang Wen-Han, Jiang Zhi-Xia, Sun Xue-Lian. A gradient descent algorithm with modified learning rate. Journal of Changchun University of Science and Technology (Natural Science Edition), 2023, 46(6): 112−120
    [139] Zhao B, Shi G, Liu D R. Event-triggered local control for nonlinear interconnected systems through particle swarm optimization-based adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(12): 7342−7353 doi: 10.1109/TSMC.2023.3298065
    [140] Zhang L J, Zhang K, Xie X P, Chadli M. Adaptive critic control with knowledge transfer for uncertain nonlinear dynamical systems: A reinforcement learning approach. IEEE Transactions on Automation Science and Engineering, doi: 10.1109/TASE.2024.3453926
    [141] Gao X, Si J, Huang H. Reinforcement learning control with knowledge shaping. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3156−3167 doi: 10.1109/TNNLS.2023.3243631
    [142] Gao X, Si J, Wen Y, Li M H, Huang H. Reinforcement learning control of robotic knee with human-in-the-loop by flexible policy iteration. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(10): 5873−5887 doi: 10.1109/TNNLS.2021.3071727
    [143] Guo W T, Liu F, Si J, He D W, Harley R, Mei S W. Online supplementary ADP learning controller design and application to power system frequency control with large-scale wind energy integration. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(8): 1748−1761 doi: 10.1109/TNNLS.2015.2431734
    [144] Zhao M M, Wang D, Ren J, Qiao J F. Integrated online Q-learning design for wastewater treatment processes. IEEE Transactions on Industrial Informatics, 2025, 21(2): 1833−1842 doi: 10.1109/TII.2024.3488790
    [145] Zhang H G, Wei Q L, Luo Y H. A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2008, 38(4): 937−942 doi: 10.1109/TSMCB.2008.920269
    [146] Song S J, Gong D W, Zhu M L, Zhao Y Y, Huang C. Data-driven optimal tracking control for discrete-time nonlinear systems with unknown dynamics using deterministic ADP. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(1): 1184−1198 doi: 10.1109/TNNLS.2023.3323142
    [147] Luo B, Liu D R, Huang T W, Wang D. Model-free optimal tracking control via critic-only Q-learning. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(10): 2134−2144 doi: 10.1109/TNNLS.2016.2585520
    [148] Li C, Ding J L, Lewis F L, Chai T Y. A novel adaptive dynamic programming based on tracking error for nonlinear discrete-time systems. Automatica, 2021, 129: Article No. 109687 doi: 10.1016/j.automatica.2021.109687
    [149] Wang D, Gao N, Ha M M, Zhao M M, Wu J L, Qiao J F. Intelligent-critic-based tracking control of discrete-time input-affine systems and approximation error analysis with application verification. IEEE Transactions on Cybernetics, 2024, 54(8): 4690−4701 doi: 10.1109/TCYB.2023.3312320
    [150] Liang Z T, Ha M M, Liu D R, Wang Y H. Stable approximate Q-learning under discounted cost for data-based adaptive tracking control. Neurocomputing, 2024, 568: Article No. 127048 doi: 10.1016/j.neucom.2023.127048
    [151] Wang Y, Wang D, Zhao M M, Liu A, Qiao J F. Adjustable iterative Q-learning for advanced neural tracking control with stability guarantee. Neurocomputing, 2024, 584: Article No. 127592 doi: 10.1016/j.neucom.2024.127592
    [152] Zhao M M, Wang D, Li M H, Gao N, Qiao J F. A new Q-function structure for model-free adaptive optimal tracking control with asymmetric constrained inputs. International Journal of Adaptive Control and Signal Processing, 2024, 38(5): 1561−1578 doi: 10.1002/acs.3761
    [153] Wang T, Wang Y J, Yang X B, Yang J. Further results on optimal tracking control for nonlinear systems with nonzero equilibrium via adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 1900−1910 doi: 10.1109/TNNLS.2021.3105646
    [154] Li D D, Dong J X. Approximate optimal robust tracking control based on state error and derivative without initial admissible input. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(2): 1059−1069 doi: 10.1109/TSMC.2023.3320653
    [155] Zhang H G, Luo Y H, Liu D R. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints. IEEE Transactions on Neural Networks, 2009, 20(9): 1490−1503 doi: 10.1109/TNN.2009.2027233
    [156] Marvi Z, Kiumarsi B. Reinforcement learning with safety and stability guarantees during exploration for linear systems. IEEE Open Journal of Control Systems, 2022, 1: 322−334 doi: 10.1109/OJCSYS.2022.3209945
    [157] Zanon M, Gros S. Safe reinforcement learning using robust MPC. IEEE Transactions on Automatic Control, 2021, 66(8): 3638−3652 doi: 10.1109/TAC.2020.3024161
    [158] Yang Y L, Vamvoudakis K G, Modares H, Yin Y X, Wunsch D C. Safe intermittent reinforcement learning with static and dynamic event generators. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(12): 5441−5455 doi: 10.1109/TNNLS.2020.2967871
    [159] Yazdani N M, Moghaddam R K, Kiumarsi B, Modares H. A safety-certified policy iteration algorithm for control of constrained nonlinear systems. IEEE Control Systems Letters, 2020, 4(3): 686−691 doi: 10.1109/LCSYS.2020.2990632
    [160] Yang Y L, Vamvoudakis K G, Modares H. Safe reinforcement learning for dynamical games. International Journal of Robust and Nonlinear Control, 2020, 30(9): 3521−3800 doi: 10.1002/rnc.4942
    [161] Song R Z, Liu L, Xia L N, Lewis F L. Online optimal event-triggered H∞ control for nonlinear systems with constrained state and input. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(1): 131−141 doi: 10.1109/TSMC.2022.3173275
    [162] Fan B, Yang Q M, Tang X Y, Sun Y X. Robust ADP design for continuous-time nonlinear systems with output constraints. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(6): 2127−2138 doi: 10.1109/TNNLS.2018.2806347
    [163] Liu S H, Liu L J, Yu Z. Safe reinforcement learning for affine nonlinear systems with state constraints and input saturation using control barrier functions. Neurocomputing, 2023, 518: 562−576 doi: 10.1016/j.neucom.2022.11.006
    [164] Farzanegan B, Jagannathan S. Continual reinforcement learning formulation for zero-sum game-based constrained optimal tracking. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(12): 7744−7757 doi: 10.1109/TSMC.2023.3299556
    [165] Marvi Z, Kiumarsi B. Safe reinforcement learning: A control barrier function optimization approach. International Journal of Robust and Nonlinear Control, 2021, 31(6): 1923−1940 doi: 10.1002/rnc.5132
    [166] Qin C B, Qiao X P, Wang J G, Zhang D H, Hou Y D, Hu S L. Barrier-critic adaptive robust control of nonzero-sum differential games for uncertain nonlinear systems with state constraints. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024, 54(1): 50−63 doi: 10.1109/TSMC.2023.3302656
    [167] Xu J H, Wang J C, Rao J, Zhong Y J, Wang H Y. Adaptive dynamic programming for optimal control of discrete-time nonlinear system with state constraints based on control barrier function. International Journal of Robust and Nonlinear Control, 2022, 32(6): 3408−3424
    [168] Jha M S, Kiumarsi B. Off-policy safe reinforcement learning for nonlinear discrete-time systems. Neurocomputing, 2024, 611: Article No. 128677
    [169] Zhang L Z, Xie L, Jiang Y, Li Z S, Liu X Q, Su H Y. Optimal control for constrained discrete-time nonlinear systems based on safe reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(1): 854−865 doi: 10.1109/TNNLS.2023.3326397
    [170] Cohen M H, Belta C. Safe exploration in model-based reinforcement learning using control barrier functions. Automatica, 2023, 147: Article No. 110684 doi: 10.1016/j.automatica.2022.110684
    [171] Liu S H, Liu L J, Yu Z. Fully cooperative games with state and input constraints using reinforcement learning based on control barrier functions. Asian Journal of Control, 2024, 26(2): 888−905 doi: 10.1002/asjc.3226
    [172] Zhao M M, Wang D, Song S J, Qiao J F. Safe Q-learning for data-driven nonlinear optimal control with asymmetric state constraints. IEEE/CAA Journal of Automatica Sinica, 2024, 11(12): 2408−2422 doi: 10.1109/JAS.2024.124509
    [173] Liu D R, Li H L, Wang D. Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm. Neurocomputing, 2013, 110: 92−100 doi: 10.1016/j.neucom.2012.11.021
    [174] Luo B, Yang Y, Liu D R. Policy iteration Q-learning for data-based two-player zero-sum game of linear discrete-time systems. IEEE Transactions on Cybernetics, 2021, 51(7): 3630−3640 doi: 10.1109/TCYB.2020.2970969
    [175] Zhong X N, He H B, Wang D, Ni Z. Model-free adaptive control for unknown nonlinear zero-sum differential game. IEEE Transactions on Cybernetics, 2018, 48(5): 1633−1646 doi: 10.1109/TCYB.2017.2712617
    [176] Wang Y, Wang D, Zhao M M, Liu N, Qiao J F. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate. Neural Networks, 2024, 175: Article No. 106274 doi: 10.1016/j.neunet.2024.106274
    [177] Zhang Y W, Zhao B, Liu D R, Zhang S C. Event-triggered control of discrete-time zero-sum games via deterministic policy gradient adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(8): 4823−4835 doi: 10.1109/TSMC.2021.3105663
    [178] Lin M D, Zhao B, Liu D R. Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics. Soft Computing, 2023, 27: 5781−5795 doi: 10.1007/s00500-023-07817-6
    [179] 王鼎, 赵慧玲, 李鑫. 基于多目标粒子群优化的污水处理系统自适应评判控制. 工程科学学报, 2024, 46(5): 908−917

    Wang Ding, Zhao Hui-Ling, Li Xin. Adaptive critic control for wastewater treatment systems based on multiobjective particle swarm optimization. Chinese Journal of Engineering, 2024, 46(5): 908−917
    [180] Yang Q M, Cao W W, Meng W C, Si J. Reinforcement-learning-based tracking control of waste water treatment process under realistic system conditions and control performance requirements. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(8): 5284−5294 doi: 10.1109/TSMC.2021.3122802
    [181] Yang R Y, Wang D, Qiao J F. Policy gradient adaptive critic design with dynamic prioritized experience replay for wastewater treatment process control. IEEE Transactions on Industrial Informatics, 2022, 18(5): 3150−3158 doi: 10.1109/TII.2021.3106402
    [182] Qiao J F, Yang R Y, Wang D. Offline data-driven adaptive critic design with variational inference for wastewater treatment process control. IEEE Transactions on Automation Science and Engineering, 2024, 21(4): 4987−4998 doi: 10.1109/TASE.2023.3305615
    [183] Sun B, Kampen E J V. Incremental model-based global dual heuristic programming with explicit analytical calculations applied to flight control. Engineering Applications of Artificial Intelligence, 2020, 89: Article No. 103425 doi: 10.1016/j.engappai.2019.103425
    [184] Zhou Y, Kampen E J V, Chu Q P. Incremental model based online heuristic dynamic programming for nonlinear adaptive tracking control with partial observability. Aerospace Science and Technology, 2020, 105: Article No. 106013 doi: 10.1016/j.ast.2020.106013
    [185] 赵振根, 程磊. 基于增量式Q学习的固定翼无人机跟踪控制性能优化. 控制与决策, 2024, 39(2): 391−400

    Zhao Zhen-Gen, Cheng Lei. Performance optimization for tracking control of fixed-wing UAV with incremental Q-learning. Control and Decision, 2024, 39(2): 391−400
    [186] Cao W W, Yang Q M, Meng W C, Xie S Z. Data-based robust adaptive dynamic programming for balancing control performance and energy consumption in wastewater treatment process. IEEE Transactions on Industrial Informatics, 2024, 20(4): 6622−6630 doi: 10.1109/TII.2023.3346468
    [187] Fu Y, Hong C W, Fu J, Chai T Y. Approximate optimal tracking control of nondifferentiable signals for a class of continuous-time nonlinear systems. IEEE Transactions on Cybernetics, 2022, 52(6): 4441−4450 doi: 10.1109/TCYB.2020.3027344
Publication history
  • Received:  2024-10-31
  • Accepted:  2025-01-17
  • Published online:  2025-03-24

目录

/

返回文章
返回