

Data-based Self-learning Optimal Control: Research Progress and Prospects

LIU De-Rong  LI Hong-Liang  WANG Ding

刘德荣, 李宏亮, 王鼎. 基于数据的自学习优化控制: 研究进展与展望. 自动化学报, 2013, 39(11): 1858-1870. doi: 10.3724/SP.J.1004.2013.01858
Citation: LIU De-Rong, LI Hong-Liang, WANG Ding. Data-based Self-learning Optimal Control: Research Progress and Prospects. ACTA AUTOMATICA SINICA, 2013, 39(11): 1858-1870. doi: 10.3724/SP.J.1004.2013.01858

DOI: 10.3724/SP.J.1004.2013.01858

More Information
  • Author Biography:

    LIU De-Rong  Professor at the Institute of Automation, Chinese Academy of Sciences. His research interests include intelligent control theory and applications, adaptive dynamic programming, artificial neural networks, computational neuroscience, and power system operation and control. E-mail: derong.liu@ia.ac.cn

  • Fund Project:

    Supported by National Natural Science Foundation of China (61034002, 61233001, 61273140)

  • Abstract: Adaptive dynamic programming (ADP) overcomes the "curse of dimensionality" of classical dynamic programming and has become a recent research focus in both control theory and computational intelligence. ADP methods employ a function approximation structure to estimate the performance index function of a system, and then obtain a near-optimal control policy according to the principle of optimality. As an intelligent control method with learning and optimization capabilities, ADP has great potential for solving optimal control problems of complex nonlinear systems. This paper provides a comprehensive review of ADP, covering theoretical results, algorithm implementations, and applications, including the latest research progress, and concludes with an analysis and outlook of future research directions.
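The two-step loop summarized in the abstract — approximate the performance index with a parameterized structure, then improve the control law via the principle of optimality — can be illustrated with a minimal sketch. The scalar system, cost weights, and quadratic critic V(x) = p·x² below are illustrative assumptions, not taken from the paper; practical ADP schemes use neural networks as the approximation structure for nonlinear systems.

```python
# Minimal sketch of ADP-style value iteration (heuristic dynamic programming)
# for a scalar discrete-time linear system  x_{k+1} = a*x_k + b*u_k
# with quadratic stage cost  q*x^2 + r*u^2.
# The "function approximation structure" is the simplest possible critic,
# V(x) = p * x^2, so estimating the performance index reduces to iterating
# on the single parameter p.  (All numeric values are hypothetical.)

a, b = 0.9, 0.5      # assumed system dynamics
q, r = 1.0, 1.0      # assumed cost weights

p = 0.0              # critic initialization: V_0(x) = 0
for _ in range(100):
    # Policy improvement: u = argmin_u [ q x^2 + r u^2 + p (a x + b u)^2 ]
    # is linear in the state, u = -K x, with the feedback gain below.
    K = a * b * p / (r + b**2 * p)
    # Value update from the principle of optimality:
    # V_{i+1}(x) = min_u [ stage cost + V_i(x_next) ]  gives a new p.
    p = q + r * K**2 + p * (a - b * K)**2

# For this linear-quadratic case, the iteration converges to the solution
# of the discrete-time algebraic Riccati equation.
print(p, K)
```

For nonlinear systems the same alternation of critic update and policy improvement is carried out with neural-network approximators (actor-critic structures), which is the setting most of the surveyed work addresses.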
  • [1] Bellman R E. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957
    [2] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998
    [3] Doya K. Reinforcement learning in continuous time and space. Neural Computation, 2000, 12(1): 219-245
    [4] Murray J J, Cox C J, Lendaris G G, Saeks R. Adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 2002, 32(2): 140-153
    [5] Prokhorov D V, Wunsch D C. Adaptive critic designs. IEEE Transactions on Neural Networks, 1997, 8(5): 997-1007
    [6] Werbos P J. Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. New York: Van Nostrand, 1992
    [7] Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996
    [8] Lewis F L, Huang J, Parisini T, Prokhorov D V, Wunsch D C. Special issue on neural networks for feedback control systems. IEEE Transactions on Neural Networks, 2007, 18(4): 969-972
    [9] Lewis F L, Lendaris G, Liu D. Special issue on adaptive dynamic programming and reinforcement learning in feedback control. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 896-897
    [10] Ferrari S, Sarangapani J, Lewis F L. Special issue on approximate dynamic programming and reinforcement learning. Journal of Control Theory and Applications, 2011, 9(3): 309
    [11] White D A, Sofge D A. Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. New York: Van Nostrand, 1992
    [12] Si J, Barto A G, Powell W B, Wunsch D C. Handbook of Learning and Approximate Dynamic Programming. Piscataway, NJ: IEEE, 2004
    [13] Powell W B. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Hoboken, NJ: Wiley, 2007
    [14] Busoniu L, Babuska R, De Schutter B, Ernst D. Reinforcement Learning and Dynamic Programming Using Function Approximators. Boca Raton, FL: CRC Press, 2010
    [15] Lewis F L, Liu D R. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Hoboken, NJ: Wiley, 2013
    [16] Zhang H G, Liu D R, Luo Y H, Wang D. Adaptive Dynamic Programming for Control: Algorithms and Stability. London, UK: Springer, 2013
    [17] Xu Xin. Reinforcement Learning and Approximate Dynamic Programming. Beijing: Science Press, 2010(徐昕. 增强学习与近似动态规划. 北京: 科学出版社, 2010)
    [18] Lewis F L, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 2009, 9(3): 32-50
    [19] Wang F Y, Zhang H G, Liu D R. Adaptive dynamic programming: an introduction. IEEE Computational Intelligence Magazine, 2009, 4(2): 39-47
    [20] Lewis F L, Vrabie D, Vamvoudakis K. Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers. IEEE Control Systems Magazine, 2012, 32(6): 76-105
    [21] Xu Xin, Shen Dong, Gao Yan-Qing, Wang Kai. Learning control of dynamical systems based on Markov decision processes: research frontiers and outlooks. Acta Automatica Sinica, 2012, 38(5): 673-687(徐昕, 沈栋, 高岩青, 王凯. 基于马氏决策过程模型的动态系统学习控制: 研究前沿与展望. 自动化学报, 2012, 38(5): 673-687)
    [22] Zhang Hua-Guang, Zhang Xin, Luo Yan-Hong, Yang Jun. An overview of research on adaptive dynamic programming. Acta Automatica Sinica, 2013, 39(4): 303-311(张化光, 张欣, 罗艳红, 杨珺. 自适应动态规划综述. 自动化学报, 2013, 39(4): 303-311)
    [23] Geist M, Pietquin O. Algorithmic survey of parametric value function approximation. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(6): 845-867
    [24] Al-Tamimi A, Lewis F L, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 943-949
    [25] Dierks T, Thumati B T, Jagannathan S. Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Networks, 2009, 22(5-6): 851-860
    [26] Zhang H G, Wei Q L, Luo Y H. A novel infinite-time optimal tracking control scheme for a class of discrete-time non-linear systems via the greedy HDP iteration algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 937-942
    [27] Zhang H G, Luo Y H, Liu D R. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints. IEEE Transactions on Neural Networks, 2009, 20(9): 1490-1503
    [28] Zhang H G, Song R Z, Wei Q L, Zhang T Y. Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming. IEEE Transactions on Neural Networks, 2011, 22(12): 1851-1862
    [29] Wang F Y, Jin N, Liu D E, Wei Q L. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound. IEEE Transactions on Neural Networks, 2011, 22(1): 24-36
    [30] Heydari A, Balakrishnan S N. Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(1): 145-157
    [31] Wang D, Liu D R, Wei Q L, Zhao D B, Jin N. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 2012, 48(8): 1825-1832
    [32] Chen Z, Jagannathan S. Generalized Hamilton-Jacobi-Bellman formulation-based neural network control of affine nonlinear discrete time systems. IEEE Transactions on Neural Networks, 2008, 19(1): 90-106
    [33] Watkins C J C H, Dayan P. Q-Learning. Machine Learning, 1992, 8: 279-292
    [34] Liu D R, Wei Q L. Finite-approximation-error based optimal control approach for discrete-time nonlinear systems. IEEE Transactions on Cybernetics, 2013, 43(2): 779-789
    [35] Seong C Y, Widrow B. Neural dynamic optimization for control systems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2001, 31(4): 482-513
    [36] Lewis F L, Vamvoudakis K G. Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2011, 41(1): 14-25
    [37] He P A, Jagannathan S. Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, 37(2): 425-436
    [38] Yang L, Si J, Tsakalis K S, Rodriguez A A. Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(6): 1617-1622
    [39] Yang Q M, Jagannathan S. Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2012, 42(2): 377-390
    [40] Yang Q M, Vance J B, Jagannathan S. Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 994-1001
    [41] Dierks T, Jagannathan S. Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(7): 1118-1129
    [42] Kleinman D. On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control, 1968, 13(1): 114-115
    [43] Saridis G N, Lee C S. An approximation theory of optimal control for trainable manipulators. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9(3): 152-159
    [44] Wang F Y, Saridis G N. Suboptimal control for nonlinear stochastic systems. In: Proceedings of the 31st IEEE Conference on Decision and Control. Tucson, Arizona, USA: IEEE, 1992. 1856-1861
    [45] Saridis G N, Wang F Y. Suboptimal control for nonlinear stochastic systems. Control Theory and Advanced Technology, 1994, 10(4): 847-871
    [46] Wang F Y, Saridis G N. On successive approximation of optimal control of stochastic dynamic systems. Modeling Uncertainty: An Examination of Stochastic Theory, Methods, and Applications. Boston, MA: Kluwer, 2002. 333-386
    [47] Beard R W, Saridis G N, Wen J T. Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica, 1997, 33(12): 2159-2177
    [48] Abu-Khalaf M, Lewis F L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 2005, 41(5): 779-791
    [49] Cheng T, Lewis F L, Abu-Khalaf M. Fixed-final-time-constrained optimal control of nonlinear systems using neural network HJB approach. IEEE Transactions on Neural Networks, 2007, 18(6): 1725-1736
    [50] Cheng T, Lewis F L, Abu-Khalaf M. A neural network solution for fixed-final time optimal control of nonlinear systems. Automatica, 2007, 43(3): 482-490
    [51] Tassa Y, Erez T. Least squares solutions of the HJB equation with neural network value-function approximators. IEEE Transactions on Neural Networks, 2007, 18(4): 1031-1041
    [52] Hanselmann T, Noakes L, Zaknich A. Continuous time adaptive critics. IEEE Transactions on Neural Networks, 2007, 18(3): 631-647
    [53] Ferrari S, Steck J E, Chandramohan R. Adaptive feedback control by constrained approximate dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 982-987
    [54] Seiffertt J, Sanyal S, Wunsch D C. Hamilton-Jacobi-Bellman equations and approximate dynamic programming on time scales. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 918-923
    [55] Vrabie D, Pastravanu O, Abu-Khalaf M, Lewis F L. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 2009, 45(2): 477-484
    [56] Vrabie D, Lewis F L. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Networks, 2009, 22(3): 237-246
    [57] Vamvoudakis K G, Lewis F L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 2010, 46(5): 878-888.
    [58] Zhang H G, Cui L L, Zhang X, Luo Y H. Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Transactions on Neural Networks, 2011, 22(12): 2226-2236
    [59] Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K G, Lewis F L, Dixon, W E. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 2013, 49(1): 82-92
    [60] Mehta P, Meyn S. Q-learning and Pontryagin's minimum principle. In: Proceedings of the 48th IEEE Conference on Decision and Control. Shanghai, China: IEEE, 2009. 3598-3605
    [61] Jiang Y, Jiang Z P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica, 2012, 48(10): 2699-2704
    [62] Lee J Y, Park J B, Choi Y H. Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems. Automatica, 2012, 48(11): 2850-2859
    [63] Lee J Y, Park J B, Choi Y H. Integral reinforcement learning with explorations for continuous-time nonlinear systems. In: Proceedings of the 2012 IEEE World Congress on Computational Intelligence. Brisbane, Australia: IEEE, 2012. 1042-1047
    [64] Werbos P J. Advanced forecasting methods for global crisis warning and models of intelligence. General System Yearbook, 1977, 22: 25-38
    [65] Liu D R, Wang D, Zhao D B, Wei Q L, Jin N. Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Transactions on Automation Science and Engineering, 2012, 9(3): 628-634
    [66] Fairbank M, Alonso E, Prokhorov D. Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(10): 1671-1676
    [67] Si J, Wang Y T. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 2001, 12(2): 264-276
    [68] Padhi R, Unnikrishnan N, Wang X H, Balakrishnan S N. A single network adaptive critic architecture for optimal control synthesis for a class of nonlinear systems. Neural Networks, 2006, 19(10): 1648-1660
    [69] Ding J, Balakrishnan S N. Approximate dynamic programming solutions with a single network adaptive critic for a class of nonlinear systems. Journal of Control Theory and Applications, 2011, 9(3): 370-380
    [70] Ni Z, He H B, Wen J Y. Adaptive learning in tracking control based on the dual critic network design. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(6): 913-928
    [71] Jin N, Liu D R, Huang T, Pang Z Y. Discrete-time adaptive dynamic programming using wavelet basis function neural networks. In: Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning. Honolulu, HI: IEEE, 2007. 135-142
    [72] Deb A K, Jayadeva, Gopal M. SVM-based tree-type neural networks as a critic in adaptive critic designs for control. IEEE Transactions on Neural Networks, 2007, 18(4): 1016-1030
    [73] Eaton P H, Prokhorov D V, Wunsch D C II. Neurocontroller alternatives for fuzzy ball-and-beam systems with nonuniform nonlinear friction. IEEE Transactions on Neural Networks, 2000, 11(2): 432-435
    [74] Koprinkova-Hristova P, Oubbati M, Palm G. Adaptive critic design with echo state network. In: Proceedings of the 2010 IEEE International Conference on Systems Man and Cybernetics. Sofia, Bulgaria: IEEE, 2010. 1010-1015
    [75] Xu X, Hou Z S, Lian C Q, He H B. Online learning control using adaptive critic designs with sparse kernel machines. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(5): 762-775
    [76] Fu J, He H, Zhou X. Adaptive learning and control for MIMO system based on adaptive dynamic programming. IEEE Transactions on Neural Networks, 2011, 22(7): 1133-1148
    [77] Mohagheghi S, Venayagamoorthy G K, Harley R G. Fully evolvable optimal neurofuzzy controller using adaptive critic designs. IEEE Transactions on Fuzzy Systems, 2008, 16(6): 1450-1461
    [78] Kulkarni R V, Venayagamoorthy G K. Adaptive critics for dynamic optimization. Neural Networks, 2010, 23(5): 587-591
    [79] Kang Qi, Wang Lei, An Jing, Wu Qi-Di. Approximate dynamic programming based parameter optimization of particle swarm systems. Acta Automatica Sinica, 2010, 36(8): 1171-1181(康琦, 汪镭, 安静, 吴启迪. 基于近似动态规划的微粒群系统参数优化研究. 自动化学报, 2010, 36(8): 1171-1181)
    [80] Jiang Y, Jiang Z P. Robust adaptive dynamic programming with an application to power systems. IEEE Transactions on Neural Networks, 2013, 24(7): 1150-1156
    [81] Jiang Y, Jiang Z P. Approximate dynamic programming for optimal stationary control with control-dependent noise. IEEE Transactions on Neural Networks, 2011, 22(12): 2392-2398
    [82] Vrabie D, Lewis F L. Adaptive dynamic programming algorithm for finding online the equilibrium solution of the two-player zero-sum differential game. In: Proceedings of the 2010 International Joint Conference on Neural Networks. Barcelona, Spain: IEEE, 2010. 1-8
    [83] Vrabie D, Lewis F L. Adaptive dynamic programming for online solution of a zero-sum differential game. Journal of Control Theory and Applications, 2011, 9(3): 353-360
    [84] Wu H, Luo B. Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control. Information Sciences, 2013, 222(10): 472-485
    [85] Abu-Khalaf M, Lewis F L, Huang J. Policy iterations and the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation. IEEE Transactions on Automatic Control, 2006, 51(12): 1989-1995
    [86] Abu-Khalaf M, Lewis F L, Huang J. Neurodynamic programming and zero-sum games for constrained control systems. IEEE Transactions on Neural Networks, 2008, 19(7): 1243-1252
    [87] Zhang H G, Wei Q L, Liu D R. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica, 2011, 47(1): 207-214
    [88] Vamvoudakis K G, Lewis F L. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. International Journal of Robust and Nonlinear Control, 2012, 22(13): 1460-1483
    [89] Wu H N, Luo B. Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H_∞ control. IEEE Transactions on Neural Networks, 2012, 23(12): 1884-1895
    [90] Johnson M, Bhasin S, Dixon W E. Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference. Orlando, USA: IEEE, 2011. 142-147
    [91] Al-Tamimi A, Abu-Khalaf M, Lewis F L. Adaptive critic designs for discrete-time zero-sum games with application to H_∞ control. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, 37(1): 240-247
    [92] Al-Tamimi A, Lewis F L, Abu-Khalaf M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H_∞ control. Automatica, 2007, 43(3): 473-481
    [93] Kim J H, Lewis F L. Model-free H_∞ control design for unknown linear discrete-time systems via Q-learning with LMI. Automatica, 2010, 46(8): 1320-1326
    [94] Mehraeen S, Dierks T, Jagannathan S, Crow M L. Zero-sum two-player game theoretic formulation of affine nonlinear discrete-time systems using neural networks. In: Proceedings of the 2010 International Joint Conference on Neural Networks. Barcelona, Spain: IEEE, 2010. 1-8
    [95] Liu D R, Li H L, Wang D. H_∞ control of unknown discrete-time nonlinear systems with control constraints using adaptive dynamic programming. In: Proceedings of the 2012 International Joint Conference on Neural Networks. Brisbane, Australia: IEEE, 2012. 3056-3061
    [96] Vrabie D, Lewis F L. Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proceedings of the 49th IEEE Conference on Decision and Control. Atlanta, GA: IEEE, 2010. 3066-3071
    [97] Vamvoudakis K G, Lewis F L. Multi-player non-zero sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica, 2011, 47(8): 1556-1569
    [98] Vamvoudakis K G, Lewis F L, Hudas G R. Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality. Automatica, 2012, 48(8): 1598-1611
    [99] Zhang H G, Cui L L, Luo Y H. Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Transactions on Cybernetics, 2013, 43(1): 206-216
    [100] Vamvoudakis K G, Lewis F L, Johnson M, Dixon W E. Online learning algorithm for Stackelberg games in problems with hierarchy. In: Proceedings of the 51st IEEE Conference on Decision and Control. Maui, Hawaii, USA: IEEE, 2012. 1883-1889
    [101] Mehraeen S, Jagannathan S. Decentralized optimal control of a class of interconnected nonlinear discrete-time systems by using online Hamilton-Jacobi-Bellman formulation. IEEE Transactions on Neural Networks, 2011, 22(11): 1757-1769
    [102] Jiang Y, Jiang Z P. Robust adaptive dynamic programming for large-scale systems with an application to multimachine power systems. IEEE Transactions on Circuits and Systems II: Express Briefs, 2012, 59(10): 693-697
    [103] Xu H, Jagannathan S, Lewis F L. Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses. Automatica, 2012, 48(6): 1017-1030
    [104] Xu H, Jagannathan S. Stochastic optimal controller design for uncertain nonlinear networked control system via neuro dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(5): 471-484
    [105] Bertsekas D P. Dynamic programming and suboptimal control: a survey from ADP to MPC. European Journal of Control, 2005, 11(4-5): 310-334
    [106] Cox C J, Stepniewski S W, Jorgensen C C, Saeks R. On the design of a neural network autolander. International Journal of Robust and Nonlinear Control, 1999, 9: 1071-1096
    [107] Enns R, Si J. Helicopter trimming and tracking control using direct neural dynamic programming. IEEE Transactions on Neural Networks, 2003, 14(4): 929-939
    [108] Nodland D, Zargarzadeh H, Jagannathan S. Neural-network-based optimal adaptive output feedback control of a helicopter UAV. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(7): 1061-1073
    [109] Lin C. Adaptive critic autopilot design of bank-to-turn missiles using fuzzy basis function networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 35(2): 929-939
    [110] Han D, Balakrishnan S N. State-constrained agile missile control with adaptive-critic-based neural networks. IEEE Transactions on Control Systems Technology, 2002, 10(4): 481-489
    [111] Lin W S, Chang L H, Yang P C. Adaptive critic anti-slip control of wheeled autonomous robot. Automatica, 2008, 44(11): 2716-2723
    [112] Liu D R, Javaherian H, Kovalenko O, Huang T. Adaptive critic learning techniques for engine torque and air-fuel ratio control. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 988-993
    [113] Shih P, Kaul B C, Jagannathan S, Drallmeier J A. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(5): 1162-1179
    [114] Peter S, Brian C K, Jagannathan S, Drallmeier J A. Reinforcement-learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation. IEEE Transactions on Neural Networks, 2008, 19(8): 1369-1388
    [115] Park J W, Harley R G, Venayagamoorthy G K. Adaptive critic-based optimal neurocontrol for synchronous generators in a power system using MLP/RBF neural networks. IEEE Transactions on Industry Applications, 2003, 39(5): 1529-1540
    [116] Liu W X, Venayagamoorthy G K, Wunsch D C. A heuristic-dynamic-programming-based power system stabilizer for a turbogenerator in a single-machine power system. IEEE Transactions on Industry Applications, 2005, 41(5): 1377-1385
    [117] Mohagheghi S, Venayagamoorthy G K, Harley R G. Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system. IEEE Transactions on Power Systems, 2006, 21(4): 1744-1754
    [118] Mohagheghi S, Valle Y, Venayagamoorthy G K, Harley R G. A proportional-integrator type adaptive critic design-based neurocontroller for a static compensator in a multimachine power system. IEEE Transactions on Industrial Electronics, 2007, 54(1): 86-96
    [119] Ray S, Venayagamoorthy G K, Chaudhuri B, Majumder R. Comparison of adaptive critics and classical approaches based wide area controllers for a power system. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 1002-1007
    [120] Qiao W, Harley R G, Venayagamoorthy G K. Coordinated reactive power control of a large wind farm and a STATCOM using heuristic dynamic programming. IEEE Transactions on Energy Conversion, 2009, 24(2): 493-503
    [121] Liang J Q, Venayagamoorthy G K, Harley R G. Wide-area measurement based dynamic stochastic optimal power flow control for smart grids with high variability and uncertainty. IEEE Transactions on Smart Grid, 2012, 3(1): 59-69
    [122] Lu C, Si J N, Xie X R. Direct heuristic dynamic programming for damping oscillations in a large power system. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 1008-1013
    [123] Huang T, Liu D E. A self-learning scheme for residential energy system control and management. Neural Computing and Applications, 2013, 2(2): 259-269
    [124] Zhao Dong-Bin, Liu De-Rong, Yi Jian-Qiang. An overview on the adaptive dynamic programming based urban city traffic signal optimal control. Acta Automatica Sinica, 2009, 35(6): 676-681(赵冬斌, 刘德荣, 易建强. 基于自适应动态规划的城市交通信号优化控制方法综述. 自动化学报, 2009, 35(6): 676-681)
    [125] Lin W S, Sheu J W. Metro traffic regulation by adaptive optimal control. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 1064-1073
    [126] Lin W S, Sheu J W. Optimization of train regulation and energy usage of metro lines using an adaptive-optimal-control algorithm. IEEE Transactions on Automation Science and Engineering, 2011, 8(4): 855-864
    [127] Sheu J W, Lin W S. Adaptive optimal control for designing automatic train regulation for metro line. IEEE Transactions on Control Systems Technology, 2012, 20(5): 1319-1327
    [128] Sheu J W, Lin W S. Energy-saving automatic train regulation using dual heuristic programming. IEEE Transactions on Vehicular Technology, 2012, 61(4): 1503-1514
    [129] Zhao D B, Bai X R, Wang F Y, Xu J, Yu W. DHP method for ramp metering of freeway traffic. IEEE Transactions on Intelligent Transportation Systems, 2011, 12(4): 990-999
    [130] Cai C, Wong C K, Heydecker B G. Adaptive traffic signal control using approximate dynamic programming. Transportation Research Part C, 2009, 17(5): 456-474
    [131] Shervais S, Shannon T T, Lendaris G G. Intelligent supply chain management using adaptive critic learning. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 2003, 33(2): 235-244
    [132] Sun Z, Chen X, He Z Z. Adaptive critic design for energy minimization of portable video communication devices. IEEE Transactions on Circuits and Systems for Video Technology, 2010, 20(1): 37-37
    [133] Iftekharuddin K M. Transformation invariant on-line target recognition. IEEE Transactions on Neural Networks, 2011, 22(6): 906-918
    [134] Venayagamoorthy G K, Zha W. Comparison of nonuniform optimal quantizer designs for speech coding with adaptive critics and particle swarm. IEEE Transactions on Industry Applications, 2007, 43(1): 238-244
    [135] Lee J M, Lee J H. Approximate dynamic programming-based approaches for input-output data-driven control of nonlinear processes. Automatica, 2005, 41(7): 1281-1288
    [136] Lee J M, Lee J H. An approximate dynamic programming based approach to dual adaptive control. Journal of Process Control, 2009, 19(1): 85-864
    [137] Lee J M, Kaisare N S, Lee J H. Choice of approximator and design of penalty function for an approximate dynamic programming based control approach. Journal of Process Control, 2006, 16(2): 135-156
    [138] Lee J M, Lee J H. Value function-based approach to the scheduling of multiple controllers. Journal of Process Control, 2008, 18(6): 533-542
    [139] Govindhasamy J J, McLoone S F, Irwin G W. Second-order training of adaptive critics for online process control. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2005, 35(2): 381-385
    [140] Iyer M S, Wunsch D C. Dynamic re-optimization of a fed-batch fermentor using adaptive critic designs. IEEE Transactions on Neural Networks, 2001, 12(6): 1433-1444
    [141] Marbach P, Mihatsch O, Tsitsiklis J N. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE Journal on Selected Areas in Communications, 2000, 18(2): 197-208
    [142] Liu D, Zhang Y, Zhang H. A self-learning call admission control scheme for CDMA cellular networks. IEEE Transactions on Neural Networks, 2005, 16(5): 1219-1228
    [143] Williams J L, Fisher J W, Willsky A S. Approximate dynamic programming for communication-constrained sensor network management. IEEE Transactions on Signal Processing, 2007, 55(8): 3995-4003
Publication History
  • Received:  2013-06-28
  • Revised:  2013-09-02
  • Published:  2013-11-20
