Optimal Output Regulation of Partially Linear Discrete-time Systems Using Reinforcement Learning

PANG Wen-Yan, FAN Jia-Lu, JIANG Yi, LEWIS Frank Leroy

Citation: Pang Wen-Yan, Fan Jia-Lu, Jiang Yi, Lewis Frank Leroy. Optimal output regulation of partially linear discrete-time systems using reinforcement learning. Acta Automatica Sinica, 2022, 48(9): 2242−2253 doi: 10.16383/j.aas.c190853

doi: 10.16383/j.aas.c190853

Funds: Supported by the National Natural Science Foundation of China (61533015, 61991404, 61991403) and the Liaoning Revitalization Talents Program (XLYC2007135)

Author Bio:

    PANG Wen-Yan  Master student at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. Her research interest covers industrial process operational control and reinforcement learning. E-mail: pangwy799@163.com

    FAN Jia-Lu  Associate professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. She received her Ph.D. degree from Zhejiang University in 2011. Her research interest covers industrial process operational control, industrial wireless sensor networks, and reinforcement learning. Corresponding author of this paper. E-mail: jlfan@mail.neu.edu.cn

    JIANG Yi  Postdoctoral fellow at City University of Hong Kong, China. He received his Ph.D. degree in control theory and control engineering from Northeastern University in 2020. His research interest covers industrial process operational control, networked control, adaptive dynamic programming, and reinforcement learning. E-mail: yjian22@cityu.edu.hk

    LEWIS Frank Leroy  Professor at the University of Texas at Arlington. His research interest covers feedback control, reinforcement learning, intelligent systems, cooperative control systems, and nonlinear systems. E-mail: lewis@uta.edu

Abstract: For the optimal output regulation problem of discrete-time partially linear systems subject to both linear external disturbances and nonlinear uncertainties, this paper proposes a reinforcement-learning-based data-driven control method that uses only online data. First, the problem is decomposed into a constrained static optimization problem and a dynamic programming problem: the former yields the solution of the regulator equations, and the latter determines the optimal feedback gain of the controller. The small-gain theorem is then used to prove stability of the optimal output regulation problem for discrete-time partially linear systems with nonlinear uncertainties. Because traditional control methods require accurate system model parameters to solve these two optimization problems, a data-driven off-policy update algorithm is proposed that finds the solution of the dynamic programming problem using only online data. Based on that solution, the optimal solution of the static optimization problem is then also obtained from online data. Finally, simulation results verify the effectiveness of the proposed method.
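The decomposition described in the abstract has a standard model-based counterpart, which the paper's data-driven algorithm recovers without model knowledge. As a point of reference only, the sketch below solves the two subproblems when the model is known, assuming the usual discrete-time linear output regulation setup $x(k+1)=Ax(k)+Bu(k)+Dv(k)$, $v(k+1)=Ev(k)$, $e(k)=Cx(k)+Fv(k)$; all matrix names, the weights $Q, R$, and the function names are illustrative assumptions, not the paper's notation.

```python
# Model-based reference for the two subproblems (illustrative sketch):
#  (i)  static problem: regulator equations  X E = A X + B U + D,  0 = C X + F
#  (ii) dynamic problem: optimal feedback gain K from a discrete Riccati equation
import numpy as np
from scipy.linalg import solve_discrete_are

def solve_regulator_equations(A, B, C, D, E, F):
    """Solve X E = A X + B U + D and 0 = C X + F for (X, U) via vectorization."""
    n, m = B.shape
    q = E.shape[0]
    p = C.shape[0]
    # vec(XE) = (E^T kron I_n) vec(X);  vec(AX) = (I_q kron A) vec(X), etc.
    top = np.hstack([np.kron(E.T, np.eye(n)) - np.kron(np.eye(q), A),
                     -np.kron(np.eye(q), B)])
    bot = np.hstack([np.kron(np.eye(q), C), np.zeros((p * q, m * q))])
    lhs = np.vstack([top, bot])
    rhs = np.concatenate([D.flatten(order='F'), -F.flatten(order='F')])
    sol, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    X = sol[:n * q].reshape((n, q), order='F')
    U = sol[n * q:].reshape((m, q), order='F')
    return X, U

def lqr_gain(A, B, Q, R):
    """Optimal feedback gain K = (R + B'PB)^{-1} B'PA from the DARE."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

With $(X, U)$ and $K$ in hand, the standard output-regulation controller is $u(k) = Uv(k) - K(x(k) - Xv(k))$; the paper's method obtains the same quantities from online data instead of $(A, B, C, D, E, F)$.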
Fig. 1  Trajectories of the system output, the reference, and the tracking error

Fig. 2  The control input trajectory

Fig. 3  The system disturbance

Fig. 4  Convergence of $P$ and $K$ during the learning phase

Fig. 5  The error-system state trajectory

Fig. 6  Simulation results of comparison experiment 1

Fig. 7  Simulation results of comparison experiment 2
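Fig. 4 tracks the convergence of $P$ and $K$ during learning. For orientation, the loop being approximated is a policy iteration of the Hewer type; a minimal model-based sketch follows, with the caveat that it assumes knowledge of $(A, B)$ and a stabilizing initial gain K0, whereas the paper's off-policy algorithm carries out the evaluation/improvement cycle from online data only.

```python
# Model-based policy iteration (Hewer-type) sketch: alternate policy
# evaluation (a discrete Lyapunov equation) with policy improvement
# until P, and hence K, converge -- the behavior plotted in Fig. 4.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration(A, B, Q, R, K0, tol=1e-8, max_iter=100):
    K = K0                      # K0 must stabilize A - B @ K0
    P_prev = np.zeros_like(A)
    for _ in range(max_iter):
        Ak = A - B @ K
        # Evaluation: P solves Ak^T P Ak - P + Q + K^T R K = 0
        P = solve_discrete_lyapunov(Ak.T, Q + K.T @ R @ K)
        # Improvement: greedy gain for the current value matrix P
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        if np.max(np.abs(P - P_prev)) < tol:
            break
        P_prev = P
    return P, K
```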

Table 1  Performance indices of the comparison experiments

    $220<k<280$          IAE            RMSE
    Proposed method      1.8330×10⁻⁶    3.6653×10⁻⁸
    Comparison method    8.2293         0.1349
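For reference, the metrics in Table 1 admit the standard discrete-time definitions $\mathrm{IAE}=\sum_k |e(k)|$ and $\mathrm{RMSE}=\sqrt{\tfrac{1}{N}\sum_k e(k)^2}$ over the evaluation window $220<k<280$. The sketch below computes them under that assumption; the paper's exact window convention and any sampling-time weighting may differ.

```python
import numpy as np

def tracking_metrics(e, k_min=220, k_max=280):
    """IAE and RMSE of the tracking error e (indexed by step k) on k_min < k < k_max."""
    window = np.asarray(e, dtype=float)[k_min + 1:k_max]  # samples with k_min < k < k_max
    return np.sum(np.abs(window)), np.sqrt(np.mean(window ** 2))
```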
Publication History
  • Received: 2019-12-16
  • Accepted: 2020-04-07
  • Available online: 2021-01-20
  • Published in issue: 2022-09-16
