Nearly Optimal Control Scheme Using Adaptive Dynamic Programming Based on Generalized Fuzzy Hyperbolic Model
Abstract: An effective scheme is presented for designing nearly optimal control of continuous-time (C-T) nonlinear systems. For the first time, the generalized fuzzy hyperbolic model (GFHM) is used as an approximator for the solution of the Hamilton-Jacobi-Bellman (HJB) equation, i.e., the value function, and the approximate solution is then used to obtain the nearly optimal control. Only a single GFHM is needed to estimate the value function; it captures the mapping between the state and the value function. First, the design process of the nearly optimal control for continuous-time nonlinear systems is described. Then, stability conditions and a conservatism analysis are given, and the approximation errors are proven to be uniformly ultimately bounded (UUB). Finally, a numerical example illustrates the effectiveness of the method, and a second example, compared against an adaptive dynamic programming method based on dual neural-network models, demonstrates its advantages.
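For orientation, the problem the abstract refers to can be written out in its standard textbook form. This is only a sketch under the common assumption of an input-affine system with a quadratic control penalty; the symbols f, g, Q, R and the generic approximation \hat{V} are illustrative and are not taken from the paper itself.

```latex
% Assumed (illustrative) setting: input-affine dynamics and infinite-horizon cost
\dot{x} = f(x) + g(x)\,u, \qquad
V(x_0) = \int_0^{\infty} \bigl( Q(x) + u^{\mathsf T} R\, u \bigr)\, dt, \quad R \succ 0 .

% HJB equation for the optimal value function V^*
0 = \min_{u} \Bigl[ Q(x) + u^{\mathsf T} R\, u
      + \nabla V^{*}(x)^{\mathsf T} \bigl( f(x) + g(x)\,u \bigr) \Bigr] .

% Minimizing over u gives the optimal control
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\mathsf T}\, \nabla V^{*}(x) .

% Substituting u^* back yields the HJB equation in V^* alone
0 = Q(x) + \nabla V^{*}(x)^{\mathsf T} f(x)
    - \tfrac{1}{4}\, \nabla V^{*}(x)^{\mathsf T} g(x)\, R^{-1} g(x)^{\mathsf T}\, \nabla V^{*}(x) .

% With a single approximator \hat{V} \approx V^* (e.g., one GFHM),
% the nearly optimal control follows from the same formula
\hat{u}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\mathsf T}\, \nabla \hat{V}(x) .
```

In this form the control depends only on the gradient of the value function, which would explain why a single value-function approximator can suffice without a separate approximator for the control.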