Data-driven Nonlinear Near-optimal Regulation Based on Iterative Neural Dynamic Programming

WANG Ding, MU Chao-Xu, LIU De-Rong

Citation: WANG Ding, MU Chao-Xu, LIU De-Rong. Data-driven Nonlinear Near-optimal Regulation Based on Iterative Neural Dynamic Programming. ACTA AUTOMATICA SINICA, 2017, 43(3): 366-375. doi: 10.16383/j.aas.2017.c160272


doi: 10.16383/j.aas.2017.c160272 cstr: 32138.14.j.aas.2017.c160272
Funds:

National Natural Science Foundation of China 61304086
National Natural Science Foundation of China 61533017
Tianjin Natural Science Foundation 14JCQNJC05400
National Natural Science Foundation of China 61273140
Research Fund of Tianjin Key Laboratory of Process Measurement and Control TKLPMC-201612
National Natural Science Foundation of China U1501251
National Natural Science Foundation of China 61411130160
National Natural Science Foundation of China 61304018
National Natural Science Foundation of China 61233001
Beijing Natural Science Foundation 4162065

More Information
    Author Bio:

    MU Chao-Xu  Associate professor at the School of Electrical and Information Engineering, Tianjin University. She received her Ph.D. degree in control science and engineering from Southeast University, Nanjing, China, in 2012. Her research interests cover nonlinear control and application, intelligent control and optimization, and smart grid. E-mail: cxmu@tju.edu.cn

    LIU De-Rong  Professor at University of Science and Technology Beijing. His research interests cover adaptive dynamic programming, computational intelligence, intelligent control and information processing, and modeling and control for complex industrial systems. E-mail: derong@ustb.edu.cn

    Corresponding author:

    WANG Ding  Associate professor at the Institute of Automation, Chinese Academy of Sciences. He received his master's degree in operations research and cybernetics from Northeastern University, Shenyang, China, and his Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009 and 2012, respectively. His research interests cover adaptive and learning systems, intelligent control, and neural networks. Corresponding author of this paper. E-mail: ding.wang@ia.ac.cn
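  • Abstract: Using the idea of data-driven control, an iterative neural dynamic programming approach is established for designing the near-optimal regulator of discrete-time nonlinear systems. An iterative adaptive dynamic programming algorithm for discrete-time general nonlinear systems is proposed, and its convergence and optimality are proved. By constructing three neural networks, the globalized dual heuristic programming technique and its detailed implementation are presented, where the action network is trained within the framework of neural dynamic programming. This novel architecture can approximate the cost function together with its derivative, while adaptively learning the near-optimal control law without depending on the system dynamics. Notably, it clearly improves on existing results for the iterative adaptive dynamic programming algorithm by relaxing the requirement for the control matrix or its neural network representation, which can promote the development of data-based optimization and control design for complex nonlinear systems. Two simulation experiments verify the effectiveness of the proposed data-driven optimal regulation approach.
    1)  Recommended by Associate Editor HOU Zhong-Sheng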
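    The abstract names an iterative adaptive dynamic programming algorithm but this page does not spell out its recursion. As a rough illustration only, the sketch below assumes the value-iteration form that is common in the adaptive dynamic programming literature, V_{i+1}(x) = min_u [U(x, u) + V_i(F(x, u))] with V_0 = 0, and applies it to a hypothetical scalar linear system x_{k+1} = a x_k + b u_k with utility U(x, u) = q x^2 + r u^2. In that special case V_i(x) = p_i x^2, so the iteration can be tracked exactly without any neural network. The function name and the values of a, b, q, r are illustrative assumptions and are not taken from the paper.

```python
def value_iteration_lq(a, b, q, r, iterations=200):
    """Value iteration for a scalar linear-quadratic problem (illustrative only).

    Iterates V_{i+1}(x) = min_u [q*x^2 + r*u^2 + V_i(a*x + b*u)] with V_0 = 0.
    Because V_i(x) = p_i * x^2 here, only the scalar p_i needs to be stored.
    """
    p = 0.0                      # V_0(x) = 0
    history = [p]
    for _ in range(iterations):
        # Setting the derivative w.r.t. u to zero gives u = -k * x with:
        k = a * b * p / (r + b * b * p)
        # Substituting the minimizing u back yields the next coefficient.
        p = q + a * a * p - a * b * p * k
        history.append(p)
    # Greedy feedback gain with respect to the final value function.
    gain = a * b * p / (r + b * b * p)
    return history, gain


# Hypothetical open-loop unstable plant (|a| > 1) with unit weights.
history, gain = value_iteration_lq(a=1.2, b=1.0, q=1.0, r=1.0)
print("final p:", history[-1], "feedback gain:", gain)
```

    In this toy case the sequence p_i is nondecreasing and settles at the solution of the scalar Riccati equation, and the resulting feedback u = -gain * x stabilizes the plant. According to the abstract, the paper's approach instead learns the cost function, its derivative, and the control law with neural networks from data, without requiring the system dynamics.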
  • Fig. 1  The architecture of the critic network
    Fig. 2  The architecture of iterative neural dynamic programming
    Fig. 3  The convergence process of the norm of the weight matrices
    Fig. 4  The convergence process of the cost function and its derivative
    Fig. 5  The system state trajectory x
    Fig. 6  The control input trajectory u
    Fig. 7  The convergence process of the norm of the weight matrices
    Fig. 8  The convergence process of the cost function and its derivative
    Fig. 9  The system state trajectory x and control input trajectory u

Publication History
  • Received:  2016-03-16
  • Accepted:  2016-05-17
  • Published:  2017-03-20
