An Overview of Research on Adaptive Dynamic Programming

ZHANG Hua-Guang, ZHANG Xin, LUO Yan-Hong, YANG Jun

Citation: ZHANG Hua-Guang, ZHANG Xin, LUO Yan-Hong, YANG Jun. An Overview of Research on Adaptive Dynamic Programming. Acta Automatica Sinica, 2013, 39(4): 303-311. doi: 10.3724/SP.J.1004.2013.00303

doi: 10.3724/SP.J.1004.2013.00303

Corresponding author: ZHANG Hua-Guang

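  • Abstract: Adaptive dynamic programming (ADP) is a recently emerged approximate optimal method in the field of optimal control and a current research focus in the international optimization community. ADP uses a function approximation structure to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation and obtains an approximate optimal control policy through offline iteration or online updating, which makes it an effective tool for optimal control problems of nonlinear systems. This paper introduces ADP from three perspectives: variations in its structure, the development of its algorithms, and its applications. It summarizes the current research results on ADP and gives an outlook on the open problems and future directions of this research field.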
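
To make the idea of "approximating the HJB solution by offline iteration" concrete, here is a minimal sketch of a value-iteration (heuristic dynamic programming style) loop. It is not taken from the paper. The scalar linear system, the cost weights, the sample states, the single quadratic basis function, and all function names are assumptions chosen so that the result can be checked against the known Riccati solution.

```python
import math


def greedy_gain(w, a, b, r):
    """Gain k such that u = -k*x minimizes r*u^2 + w*(a*x + b*u)^2."""
    return a * b * w / (r + b * b * w)


def hdp_value_iteration(a, b, q, r, samples, iterations=50):
    """Offline value iteration with the critic V_i(x) = w_i * x^2.

    Assumed (hypothetical) setup: dynamics x_next = a*x + b*u and
    stage cost q*x^2 + r*u^2. Returns the final weight and gain.
    """
    w = 0.0  # start from V_0(x) = 0
    for _ in range(iterations):
        k = greedy_gain(w, a, b, r)  # policy that is greedy for V_i
        num = 0.0
        den = 0.0
        for x in samples:
            u = -k * x
            x_next = a * x + b * u
            # Bellman target: stage cost plus current critic at next state
            target = q * x * x + r * u * u + w * x_next * x_next
            # accumulate the least-squares fit of target ~ w_new * x^2
            num += target * x * x
            den += x ** 4
        w = num / den
    return w, greedy_gain(w, a, b, r)


def riccati_solution(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation."""
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    samples = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    w, k = hdp_value_iteration(1.1, 1.0, 1.0, 1.0, samples)
    print("learned critic weight:", w)
    print("Riccati solution:     ", riccati_solution(1.1, 1.0, 1.0, 1.0))
    print("feedback gain:        ", k)
```

In this linear-quadratic special case the critic can represent the true value function exactly, so the fitted weight should approach the Riccati solution. For a nonlinear system the same loop would need a richer approximator (for example a neural network) and would only yield an approximate solution.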
Publication history
  • Received: 2012-07-19
  • Revised: 2012-10-29
  • Published: 2013-04-20
