[1]
|
Bellman R E. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
|
[2]
|
Werbos P J. Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control. New York: Van Nostrand Reinhold, 1992.
|
[3]
|
Lewis F L, Vrabie D, Vamvoudakis K G. Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers. IEEE Control Systems, 2012, 32(6): 76-105 doi: 10.1109/MCS.2012.2214134
|
[4]
|
张化光, 张欣, 罗艳红, 杨珺.自适应动态规划综述.自动化学报, 2013, 39(4): 303-311 doi: 10.1016/S1874-1029(13)60031-2Zhang Hua-Guang, Zhang Xin, Luo Yan-Hong, Yang Jun. An overview of research on adaptive dynamic programming. Acta Automatica Sinica, 2013, 39(4): 303-311 doi: 10.1016/S1874-1029(13)60031-2
|
[5]
|
刘德荣, 李宏亮, 王鼎.基于数据的自学习优化控制:研究进展与展望.自动化学报, 2013, 39(11): 1858-1870 doi: 10.3724/SP.J.1004.2013.01858Liu De-Rong, Li Hong-Liang, Wang Ding. Data-based self-learning optimal control: research progress and prospects. Acta Automatica Sinica, 2013, 39(11): 1858-1870 doi: 10.3724/SP.J.1004.2013.01858
|
[6]
|
Hou Z S, Wang Z. From model-based control to data-driven control: survey, classification and perspective. Information Sciences, 2013, 235: 3-35 doi: 10.1016/j.ins.2012.07.014
|
[7]
|
Prokhorov D V, Wunsch D C. Adaptive critic designs. IEEE Transactions on Neural Networks, 1997, 8(5): 997-1007 doi: 10.1109/72.623201
|
[8]
|
Sutton R S, Barto A G. Reinforcement Learning——An Introduction. Cambridge, MA: MIT Press, 1998.
|
[9]
|
Si J, Wang Y T. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 2001, 12(2): 264-276 doi: 10.1109/72.914523
|
[10]
|
王飞跃.平行控制:数据驱动的计算控制方法.自动化学报, 2013, 39(4): 293-302 http://www.aas.net.cn/CN/abstract/abstract17915.shtmlWang Fei-Yue. Parallel control: a method for data-driven and computational control. Acta Automatica Sinica, 2013, 39(4): 293-302 http://www.aas.net.cn/CN/abstract/abstract17915.shtml
|
[11]
|
Al-Tamimi A, Lewis F L, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Transactions on Systems, Man, Cybernetics, Part B, Cybernetics, 2008, 38(4): 943-949 doi: 10.1109/TSMCB.2008.926614
|
[12]
|
Zhang H G, Luo Y H, Liu D R. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints. IEEE Transactions on Neural Networks, 2009, 20(9): 1490-1503 doi: 10.1109/TNN.2009.2027233
|
[13]
|
Dierks T, Thumati B T, Jagannathan S. Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Networks, 2009, 22(5-6): 851-860 doi: 10.1016/j.neunet.2009.06.014
|
[14]
|
Wang F Y, Jin N, Liu D R, Wei Q L. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound. IEEE Transactions on Neural Networks, 2011, 22(1): 24-36 doi: 10.1109/TNN.2010.2076370
|
[15]
|
Liu D R, Wang D, Zhao D B, Wei Q L, Jin N. Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Transactions on Automation Science and Engineering, 2012, 9(3): 628-634 doi: 10.1109/TASE.2012.2198057
|
[16]
|
Wang D, Liu D R, Wei Q L, Zhao D B, Jin N. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 2012, 48(8): 1825-1832 doi: 10.1016/j.automatica.2012.05.049
|
[17]
|
Zhang H G, Qin C B, Luo Y H. Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming. IEEE Transactions on Automation Science and Engineering, 2014, 11(3): 839-849 doi: 10.1109/TASE.2014.2303139
|
[18]
|
Liu D R, Li H L, Wang D. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(6): 1323-1334 doi: 10.1109/TNNLS.2015.2402203
|
[19]
|
Zhong X N, Ni Z, He H B. A theoretical foundation of goal representation heuristic dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(12): 2513-2525 doi: 10.1109/TNNLS.2015.2490698
|
[20]
|
Heydari A, Balakrishnan S N. Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(1): 145-157 doi: 10.1109/TNNLS.2012.2227339
|
[21]
|
Jiang Y, Jiang Z P. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(5): 882-893 doi: 10.1109/TNNLS.2013.2294968
|
[22]
|
Na J, Herrmann G. Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous-time unknown nonlinear systems. IEEE/CAA Journal of Automatica Sinica, 2014, 1(4): 412-422 doi: 10.1109/JAS.2014.7004668
|
[23]
|
Liu D R, Yang X, Wang D, Wei Q L. Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Transactions on Cybernetics, 2015, 45(7): 1372-1385 doi: 10.1109/TCYB.2015.2417170
|
[24]
|
Luo B, Wu H N, Huang T W. Off-policy reinforcement learning for H∞ control design. IEEE Transactions on Cybernetics, 2015, 45(1): 65-76 doi: 10.1109/TCYB.2014.2319577
|
[25]
|
Mu C X, Ni Z, Sun C Y, He H B. Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(3): 584-598 doi: 10.1109/TNNLS.2016.2516948
|
[26]
|
Wang D, Liu D R, Zhang Q C, Zhao D B. Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, 46(11): 1544-1555 doi: 10.1109/TSMC.2015.2492941
|