
Data-Driven Optimal Output Regulation with Assured Convergence Rate

Jiang Yi, Fan Jia-Lu, Chai Tian-You

Citation: Jiang Yi, Fan Jia-Lu, Chai Tian-You. Data-driven optimal output regulation with assured convergence rate. Acta Automatica Sinica, 2021, 47(x): 1−12. doi: 10.16383/j.aas.c200932


Data-Driven Optimal Output Regulation with Assured Convergence Rate

Funds: Supported by the National Natural Science Foundation of China (61991400, 61991404, 61991403, 61533015), the Fundamental Research Funds for the Central Universities (N180804001), and the 2020 Science and Technology Major Project of Liaoning Province (2020JH1/10100008)
More Information
    Author Bio:

    JIANG Yi  Ph.D. at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received the Ph.D. degree in control theory and control engineering from Northeastern University in 2020. His research interests include industrial process operational control, networked control, adaptive dynamic programming, and reinforcement learning. E-mail: JY369356904@163.com

    FAN Jia-Lu  Associate professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. She received her Ph.D. degree from the Department of Control Science and Engineering, Zhejiang University, in 2011, and was a visiting scholar with the Pennsylvania State University during 2009−2010. Her research interests cover industrial process operational control, industrial wireless sensor networks, and mobile social networks. E-mail: jlfan@mail.neu.edu.cn

    CHAI Tian-You  Academician of the Chinese Academy of Engineering, professor at Northeastern University, IEEE Fellow, IFAC Fellow. He received his Ph.D. degree from Northeastern University in 1985. His research interests cover adaptive control, intelligent decoupling control, and the theory, methods, and technology of integrated automation of process industries. He is the corresponding author of this paper. E-mail: tychai@mail.neu.edu.cn

  • Abstract: For the output regulation problem of linear discrete-time systems subject to disturbances generated by an external system, this paper proposes data-driven optimal output regulation methods with an assured convergence rate, including a state-feedback algorithm for systems whose state can be measured online and an output-feedback algorithm for systems whose state cannot. First, the problem is decomposed into solving the output regulation equations and designing the feedback control law. Based on the solution of the output regulation equations, a convergence-rate parameter is introduced to formulate an optimal control problem whose solution yields an output regulator with the assured convergence rate. Then, using reinforcement learning, a value-iteration-based data-driven state-feedback controller is designed to learn the optimal state-feedback output regulator. For plants whose state cannot be measured online, the state is reconstructed from historical input-output data, and on this basis a value-iteration-based data-driven output-feedback controller is designed. Simulation results verify the effectiveness of the proposed methods.
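The value iteration at the core of both algorithms can be illustrated with a minimal model-based sketch. The snippet below (the system matrices, weights, and rate parameter are hypothetical, chosen only for illustration; the paper's actual algorithms learn the same gain from measured input-state or input-output data without knowing A or B) shows how scaling the dynamics by a convergence-rate parameter γ > 1 yields a feedback gain under which the regulation error decays at least as fast as γ^{−k}:

```python
import numpy as np

# Value iteration for the discrete-time LQR Riccati equation with a
# convergence-rate parameter gamma > 1 (model-based sketch; the data-driven
# versions estimate the same Bellman update from measurements).
A = np.array([[0.9, 0.2], [0.0, 0.8]])  # illustrative system matrices
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # state weighting
R = np.eye(1)                            # input weighting
gamma = 1.1                              # assured convergence rate

# Scaling (A, B) by gamma forces the optimal closed loop of the scaled
# system to be stable, hence the original closed loop decays like gamma**(-k).
Ag, Bg = gamma * A, gamma * B

P = np.zeros((2, 2))                     # value iteration starts from P0 = 0
for _ in range(500):
    K = np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)  # greedy gain
    P = Q + Ag.T @ P @ (Ag - Bg @ K)                       # Bellman update

K = np.linalg.solve(R + Bg.T @ P @ Bg, Bg.T @ P @ Ag)
# Spectral radius of the unscaled closed loop is below 1/gamma.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print(rho < 1 / gamma)
```

The substitution (A, B) → (γA, γB) is a standard device for imposing an exponential decay rate on the closed loop; the data-driven algorithms in the paper replace the explicit Riccati recursion above with value-iteration updates learned from data.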
  • Fig. 1  Trajectories of the output y(k) and the reference signal yd(k) via state feedback

    Fig. 2  Error trajectories of $ \Vert {P}_{j}-{P}^{*}\Vert $ and $ \Vert {K}_{j}-{K}^{*}\Vert $ via state feedback

    Fig. 3  Comparison of e(k) and ${\gamma ^{ - k}}e({k_0})$ via state feedback

    Fig. 4  Trajectories of the output y(k) and the reference signal yd(k) via output feedback

    Fig. 5  Error trajectories of $ \Vert {\overline{P}}_{j}-{\overline{P}}^{*}\Vert $ and $ \Vert {\overline{K}}_{j}-{\overline{K}}^{*}\Vert $ via output feedback

    Fig. 6  Comparison of e(k) and ${\gamma ^{ - k}}e({k_0})$ via output feedback

    Fig. 7  Comparison of simulation results

Publication History
  • Received: 2020-11-23
  • Accepted: 2021-01-27
  • Available online: 2021-03-02
