Data-driven Optimal Output Regulation With Assured Convergence Rate

Jiang Yi, Fan Jia-Lu, Chai Tian-You

Citation: Jiang Yi, Fan Jia-Lu, Chai Tian-You. Data-driven optimal output regulation with assured convergence rate. Acta Automatica Sinica, 2022, 48(4): 980−991. doi: 10.16383/j.aas.c200932

doi: 10.16383/j.aas.c200932

Funds: Supported by the National Natural Science Foundation of China (61991404, 61991403, 61991400, 61533015), the Fundamental Research Funds for the Central Universities (N180804001), and the 2020 Science and Technology Major Project of Liaoning Province (2020JH1/10100008)
More Information
    Author Bio:

    JIANG Yi He received his Ph.D. degree in control theory and control engineering from Northeastern University in 2020. His research interest covers industrial process operational control, networked control, adaptive dynamic programming, and reinforcement learning. E-mail: yjian22@cityu.edu.hk

    FAN Jia-Lu Associate professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. She received her Ph.D. degree in control science and engineering from Zhejiang University in 2011 and was a visiting scholar at Pennsylvania State University from 2009 to 2010. Her research interest covers industrial process operational control, industrial wireless sensor networks, and mobile social networks. E-mail: jlfan@mail.neu.edu.cn

    CHAI Tian-You Academician of the Chinese Academy of Engineering and professor at Northeastern University. He received his Ph.D. degree from Northeastern University in 1985. His research interest covers adaptive control, intelligent decoupling control, and the theory, methods, and technology of integrated automation of industrial processes. Corresponding author of this paper. E-mail: tychai@mail.neu.edu.cn

  • Abstract: For the output regulation problem of linear discrete-time systems subject to exosystem disturbances, this paper proposes data-driven optimal output regulation methods with an assured convergence rate, comprising a state-feedback algorithm for systems whose state can be measured online and an output-feedback algorithm for systems whose state cannot. First, the problem is decomposed into solving the output regulation (regulator) equations and designing a feedback control law. Based on the solution of the regulator equations and by introducing a convergence-rate parameter, an optimal control problem with an assured convergence rate is formulated, whose solution yields an output regulator with the guaranteed rate. Then, using reinforcement learning, a value-iteration-based data-driven state-feedback controller is designed to learn the optimal state-feedback output regulator. For plants whose state cannot be measured online, the state is reconstructed from historical input-output data, and a value-iteration-based data-driven output-feedback controller is designed on that basis. Simulation results verify the effectiveness of the proposed methods.
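As a rough illustration of the value-iteration step described in the abstract, the sketch below runs Q-function value iteration on simulated state-input data for a small stand-in system. It is a minimal sketch under stated assumptions, not the paper's algorithm: the system matrices, cost weights, data sizes, and the $\gamma^2$-scaled Bellman recursion (chosen here so that the learned gain enforces decay no slower than $\gamma^{-k}$, matching the envelopes in Figs. 3 and 6) are illustrative, and the regulator-equation solution and the output-feedback state reconstruction are omitted.

```python
import numpy as np

# Minimal sketch of value-iteration Q-learning for a guaranteed-rate feedback gain,
# assuming the state x(k) is measurable. All matrices and numbers below are
# illustrative stand-ins, not the paper's example system.

rng = np.random.default_rng(0)
n, m = 2, 1                                 # state / input dimensions (hypothetical)
A = np.array([[1.0, 0.1], [0.0, 0.9]])      # used only to simulate data; the learner never sees it
B = np.array([[0.0], [0.1]])
Qw, Rw = np.eye(n), np.eye(m)               # stage-cost weights
gamma = 1.05                                # rate parameter: target decay bound gamma**(-k)

# Exploratory data (x_k, u_k, x_{k+1}); in the data-driven setting these are measurements.
N = 200
X = rng.standard_normal((N, n))
U = rng.standard_normal((N, m))
Xn = X @ A.T + U @ B.T

# Q-function value iteration: with z = [x; u] and Q_j(x, u) = z^T H_j z,
#   Q_{j+1}(x, u) = x^T Qw x + u^T Rw u + gamma**2 * min_u' Q_j(x', u'),
# where the gamma**2 factor corresponds to stabilizing the scaled system
# (gamma*A, gamma*B), which enforces decay no slower than gamma**(-k).
H = np.zeros((n + m, n + m))
z = np.hstack([X, U])                                      # regressors [x_k; u_k]
Phi = np.einsum('ij,ik->ijk', z, z).reshape(N, -1)         # flattened outer products z z^T
stage = (np.einsum('ij,jk,ik->i', X, Qw, X)
         + np.einsum('ij,jk,ik->i', U, Rw, U))             # per-sample stage cost
for _ in range(200):
    # optimal value matrix P_j implied by H_j (Schur complement of the input block)
    P = H[:n, :n] - H[:n, n:] @ np.linalg.pinv(H[n:, n:]) @ H[n:, :n]
    target = stage + gamma**2 * np.einsum('ij,jk,ik->i', Xn, P, Xn)
    # least-squares fit of the quadratic Q-function to the Bellman targets
    H = np.linalg.lstsq(Phi, target, rcond=None)[0].reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                                    # symmetrize

K = np.linalg.solve(H[n:, n:], H[n:, :n])                  # learned gain, u(k) = -K x(k)
print("learned feedback gain K =", K)
```

In this sketch the model matrices are used only to generate the data triples; the least-squares update itself touches only the measured (x(k), u(k), x(k+1)) samples, which is what makes the iteration data-driven.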
  • Fig. 1  Trajectories of the output y(k) and the reference signal yd(k) via state feedback

    Fig. 2  Trajectories of the errors $ \Vert {P}_{j}-{P}^{*}\Vert $ and $ \Vert {K}_{j}-{K}^{*}\Vert $ via state feedback

    Fig. 3  Comparison of e(k) and ${\gamma ^{ - k}}e({k_0})$ via state feedback

    Fig. 4  Trajectories of the output y(k) and the reference signal yd(k) via output feedback

    Fig. 5  Trajectories of the errors $ \Vert {\overline{P}}_{j}-{\overline{P}}^{*}\Vert $ and $ \Vert {\overline{K}}_{j}-{\overline{K}}^{*}\Vert $ via output feedback

    Fig. 6  Comparison of e(k) and ${\gamma ^{ - k}}e({k_0})$ via output feedback

    Fig. 7  Comparison of simulation results
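Figures 3 and 6 compare the tracking error e(k) with the exponential envelope $\gamma^{-k}e(k_0)$, which is how the assured convergence rate is visualized. A minimal, hypothetical check of that bound (the error sequence and the value of gamma here are made-up stand-ins, not the paper's simulation data) could look like:

```python
import numpy as np

gamma = 1.05                                  # assumed convergence-rate parameter
k = np.arange(50)
e = 3.0 * (-0.9) ** k                         # stand-in error trajectory (illustrative),
                                              # not the paper's simulation output
envelope = gamma ** (-k) * abs(e[0])          # gamma^{-k} |e(k_0)|, as in Figs. 3 and 6

# The assured-rate property corresponds to |e(k)| staying inside the envelope.
print(bool(np.all(np.abs(e) <= envelope)))    # True for this stand-in trajectory
```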

Publication History
  • Received:  2020-11-23
  • Accepted:  2021-01-27
  • Published online:  2021-03-02
  • Issue published:  2022-04-13
