An Overview on the Adaptive Dynamic Programming Based Missile Guidance Law

SUN Jing-Liang, LIU Chun-Sheng

Citation: SUN Jing-Liang, LIU Chun-Sheng. An Overview on the Adaptive Dynamic Programming Based Missile Guidance Law. ACTA AUTOMATICA SINICA, 2017, 43(7): 1101-1113. doi: 10.16383/j.aas.2017.c160735

doi: 10.16383/j.aas.2017.c160735

Funds: 

National Natural Science Foundation of China 61473147

Jiangsu Innovation Program for Graduate Education KYLX16_0376

Funding for Outstanding Doctoral Dissertations of Nanjing University of Aeronautics and Astronautics BCXJ16-02

More Information
    Author Bio:

      SUN Jing-Liang  Ph.D. candidate at the College of Automation Engineering, Nanjing University of Aeronautics and Astronautics. His research interest covers optimal control, differential game, and adaptive dynamic programming. E-mail: sunjingliangac@163.com

    Corresponding author: LIU Chun-Sheng  Professor at the College of Automation Engineering, Nanjing University of Aeronautics and Astronautics. Her research interest covers adaptive control, optimal control, fault diagnosis and fault-tolerant control, and their applications to aircraft. Corresponding author of this paper. E-mail: liuchsh@nuaa.edu.cn
  • Abstract: Adaptive dynamic programming (ADP), an approximate optimization method in the field of optimal control, is a powerful tool for solving the optimal control problems of complex nonlinear systems, and in recent years it has become a research focus in both control theory and computational intelligence. This paper reviews the theoretical progress of ADP algorithms and their applications in aerospace, analyzes several typical optimal guidance-law design methods, and surveys the current status and prospects of applying ADP to missile guidance-law design.
    1)  Recommended by Associate Editor WEI Qing-Lai
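
To make the policy-iteration idea that underlies many ADP schemes concrete, the following Python sketch runs model-based policy iteration (Hewer's algorithm) on a discrete-time linear-quadratic problem and checks the converged solution against the algebraic Riccati equation. It is only an illustrative toy, not the guidance-law algorithm of the paper: the double-integrator matrices, cost weights, and initial stabilizing gain are hypothetical values chosen for demonstration, and practical ADP methods replace the explicit Lyapunov solve with data-driven (neural-network or least-squares) policy evaluation.

```python
# Minimal policy-iteration sketch for a discrete-time LQR problem.
# Illustrative only: the plant below is a hypothetical double integrator,
# not a missile engagement model from the surveyed literature.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Hypothetical plant x_{k+1} = A x_k + B u_k and quadratic cost weights.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[1.0, 2.0]])  # initial stabilizing feedback gain, u = -K x

for i in range(50):
    # Policy evaluation: P solves P = (A-BK)' P (A-BK) + Q + K'RK
    Acl = A - B @ K
    Qcl = Q + K.T @ R @ K
    P = solve_discrete_lyapunov(Acl.T, Qcl)
    # Policy improvement: greedy gain w.r.t. the evaluated value function
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# Compare with the gain obtained directly from the Riccati equation.
P_are = solve_discrete_are(A, B, Q, R)
K_are = np.linalg.solve(R + B.T @ P_are @ B, B.T @ P_are @ A)
print("policy iterations:", i + 1)
print("gain gap to Riccati solution:", np.linalg.norm(K - K_are))
```

On this kind of example the iteration converges to the Riccati-optimal gain in a handful of policy updates; this fixed point is what critic/actor structures in ADP-based guidance designs approximate online, without requiring an explicit plant model.
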
Publication History
  • Received:  2016-10-25
  • Accepted:  2017-02-06
  • Published:  2017-07-20
