Intelligent Optimal Tracking With Application Verifications via Discounted Generalized Value Iteration
Abstract: Based on discounted generalized value iteration, this paper designs an intelligent algorithm to solve optimal tracking control problems for a class of complex nonlinear systems. With an appropriately chosen initial value, the iterative cost function converges to the optimal cost function in a monotonically decreasing manner. Building on this monotonically decreasing value iteration algorithm, we discuss the admissibility of the iterative tracking control law and the asymptotic stability of the error system under different discount factors. To facilitate implementation, a data-driven model network is established to learn the unknown system dynamics, and critic and action networks are constructed to approximate the iterative cost function and to compute the iterative tracking control law, respectively. Notably, a new termination criterion is developed to guarantee the effectiveness of the iterative tracking control law. The criterion contains two conditions: the first ensures the validity of the iterative tracking control law, which helps in evaluating the asymptotic stability of the error system; the second guarantees the near-optimality of the tracking control law. Finally, two application examples, one of which involves wastewater treatment, verify the feasibility and effectiveness of the proposed near-optimal tracking control method.
Manuscript received July 15, 2021; accepted November 2, 2021
Supported by Beijing Natural Science Foundation (JQ19013), National Natural Science Foundation of China (61773373, 61890930-5, 62021003), and National Key Research and Development Program of China (2021ZD0112302, 2021ZD0112301, 2018YFC1900800-5)
Recommended by Associate Editor LIU Yan-Jun
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124
2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124
3. Beijing Institute of Artificial Intelligence, Beijing 100124
4. Beijing Laboratory of Smart Environmental Protection, Beijing 100124
5. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083
Table 1 Parameter values for tracking control based on the generalized value iteration algorithm
| Symbol | $Q$ | $R$ | $\Lambda$ | $\gamma$ |
| --- | --- | --- | --- | --- |
| Example 1 | $I_2$ | $0.5I_2$ | $40I_2$ | 0.97 |
| Example 2 | $0.01I_2$ | $0.01I_2$ | $I_2$ | 0.98 |
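To make the roles of the parameters in Table 1 concrete, the following is a minimal sketch of discounted generalized value iteration, not the implementation used in the paper. It rests on several assumptions that the text above does not state: $Q$ and $R$ are taken as the weights of a quadratic utility $e^{\top}Qe + u^{\top}Ru$, $\Lambda$ defines the initial cost $V_0(e) = e^{\top}\Lambda e$, $\gamma$ is the discount factor, and the error system is a hypothetical known linear one, $e_{k+1} = Ae_k + Bu_k$, so that each iterative cost function stays quadratic and no neural networks are needed. The matrices `A` and `B` and the tolerance-based stopping rule are illustrative only; the stopping rule is a plain convergence check, not the two-condition termination criterion described in the abstract.

```python
import numpy as np


def discounted_gvi(A, B, Q, R, Lam, gamma, tol=1e-8, max_iter=1000):
    """Discounted generalized value iteration for a linear error system.

    Assumes e_{k+1} = A e_k + B u_k and utility e^T Q e + u^T R u, so every
    iterative cost function has the form V_i(e) = e^T P_i e.
    Returns the list [P_0, P_1, ...] and the last greedy feedback gain K.
    """
    P = Lam.copy()               # V_0(e) = e^T Lam e (assumed initial cost)
    history = [P.copy()]
    K = None
    for _ in range(max_iter):
        # Greedy control law u = -K e minimizing U(e, u) + gamma * V_i(e_next).
        K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
        A_cl = A - B @ K
        # Cost update: V_{i+1}(e) = U(e, -K e) + gamma * V_i(A_cl e).
        P_next = Q + K.T @ R @ K + gamma * A_cl.T @ P @ A_cl
        history.append(P_next.copy())
        # Illustrative stopping rule (assumption): small change in P.
        if np.max(np.abs(P_next - P)) < tol:
            break
        P = P_next
    return history, K


# Example 1 values from Table 1; A and B are hypothetical.
I2 = np.eye(2)
Q, R, Lam, gamma = I2, 0.5 * I2, 40.0 * I2, 0.97
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0, 0.0], [0.0, 0.5]])

history, K = discounted_gvi(A, B, Q, R, Lam, gamma)

# Smallest eigenvalue of P_i - P_{i+1}; non-negative means V_{i+1} <= V_i.
min_gap = min(
    np.linalg.eigvalsh((p - q + (p - q).T) / 2).min()
    for p, q in zip(history[:-1], history[1:])
)
print("iterations:", len(history) - 1)
print("monotonically decreasing:", min_gap > -1e-9)
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

With a large initial weight such as $\Lambda = 40I_2$, the first update already lies below $V_0$ for this hypothetical system, and the sequence then decreases toward its limit, which mirrors the monotonically decreasing convergence described in the abstract. For a nonlinear system the quadratic form no longer holds, which is where the model, critic, and action networks mentioned in the abstract come in.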