A Model-Free H∞ Control Method Based on Off-Policy With Output Data Feedback

Li Zhen, Fan Jia-Lu, Jiang Yi, Chai Tian-You

Li Zhen, Fan Jia-Lu, Jiang Yi, Chai Tian-You. A model-free H∞ control method based on off-policy with output data feedback. Acta Automatica Sinica, 2021, 47(9): 2182−2193 doi: 10.16383/j.aas.c190499

doi: 10.16383/j.aas.c190499
Funding: Supported by the National Natural Science Foundation of China (61533015, 61304028) and the Liaoning Revitalization Talents Program (XLYC2007135)

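    Li Zhen: Master's student at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. Research interests include industrial process operational control and reinforcement learning. E-mail: alilili0131@gmail.com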

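    Fan Jia-Lu: Associate professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. Received a Ph.D. degree from Zhejiang University in 2011 (jointly trained with the Pennsylvania State University, USA). Research interests include industrial process operational control, industrial wireless sensor networks, and mobile social networks. Corresponding author of this paper. E-mail: jlfan@mail.neu.edu.cn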

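    Jiang Yi: Ph.D. candidate at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. Received a master's degree in control theory and control engineering from Northeastern University in 2016. Research interests include industrial process operational control, networked control, adaptive dynamic programming, and reinforcement learning. E-mail: JY369356904@163.com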

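    Chai Tian-You: Academician of the Chinese Academy of Engineering, professor at Northeastern University, IEEE Fellow, IFAC Fellow. Received a Ph.D. degree from Northeastern University in 1985. Research interests include adaptive control, intelligent decoupling control, and integrated automation theory, methods, and technology for the process industries. E-mail: tychai@mail.neu.edu.cn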

A Model-Free H∞ Control Method Based on Off-Policy With Output Data Feedback

Funds: Supported by the National Natural Science Foundation of China (61533015, 61304028) and the Liaoning Revitalization Talents Program (XLYC2007135)
    Author Bio:

    LI Zhen Master's student at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. His research interests cover industrial process operational control and reinforcement learning.

    FAN Jia-Lu Associate professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. She received her Ph.D. degree in control science and engineering from Zhejiang University in 2011. She was a visiting scholar at the Pennsylvania State University from 2009 to 2010. Her research interests cover industrial process operational control and reinforcement learning. Corresponding author of this paper.

    JIANG Yi Ph.D. candidate at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his master's degree in control theory and engineering from Northeastern University in 2016. His research interests cover industrial process operational control, networked control, adaptive dynamic programming, and reinforcement learning.

    CHAI Tian-You Academician of the Chinese Academy of Engineering, professor at Northeastern University, IEEE Fellow, IFAC Fellow. He received his Ph.D. degree from Northeastern University in 1985. His research interests cover adaptive control, intelligent decoupling control, and integrated automation theory, methods, and technology for the process industries.

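  • Abstract: For the regulation problem of linear discrete-time systems with unknown models in the presence of disturbances, this paper proposes an off-policy H∞ control method based on input-output data feedback. Starting from a state-feedback online learning algorithm, and to address the difficulty of measuring state data during system operation, an augmented data vector is introduced to transform the state-feedback policy-iteration online learning algorithm into an input-output data feedback online learning algorithm. Furthermore, by introducing auxiliary terms, the input-output data feedback policy-iteration online learning algorithm is transformed into a model-free input-output data feedback off-policy learning algorithm. This algorithm learns the optimal output-feedback policy from historical input-output data and overcomes the drawback that on-policy algorithms must interact frequently with the real environment. In addition, compared with on-policy algorithms, the off-policy learning algorithm overcomes the influence of learning noise, so that the learned result converges to the theoretical optimum. Finally, simulation experiments verify the convergence of the learning algorithm.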
  • Fig. 1  Aircraft flight diagram

    Fig. 2  Parameter convergence curves for the three groups of experiments

    Fig. 3  Norm convergence curves for the three groups of experiments
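
The abstract describes replacing the unmeasured state with an augmented data vector built from past input-output data. A minimal sketch of that idea follows, under stated assumptions: the function names, the newest-first stacking order, the window length `n`, and the feedback form u_k = -K z_k are illustrative choices rather than the paper's definitions, and the paper's vector may contain additional signals such as disturbance data.

```python
from typing import List, Sequence

Vector = Sequence[float]


def augmented_data_vector(u_hist: Sequence[Vector],
                          y_hist: Sequence[Vector],
                          n: int) -> List[float]:
    """Stack the n most recent inputs and outputs, newest first.

    Assumed convention: u_hist[-1] is u_{k-1} and y_hist[-1] is y_{k-1}.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(u_hist) < n or len(y_hist) < n:
        raise ValueError("need at least n past samples of both u and y")
    z: List[float] = []
    for u in reversed(u_hist[-n:]):  # u_{k-1}, ..., u_{k-n}
        z.extend(u)
    for y in reversed(y_hist[-n:]):  # y_{k-1}, ..., y_{k-n}
        z.extend(y)
    return z


def output_feedback(gain: Sequence[Vector], z: Vector) -> List[float]:
    """Compute u_k = -K z_k, with the gain K given as a list of rows."""
    return [-sum(k_ij * z_j for k_ij, z_j in zip(row, z)) for row in gain]


# Hypothetical data: scalar input and output, three past samples each.
u_hist = [[0.1], [0.2], [0.3]]
y_hist = [[1.0], [2.0], [3.0]]
z = augmented_data_vector(u_hist, y_hist, n=2)
u_next = output_feedback([[1.0, 0.0, 0.5, 0.0]], z)
```

Here `z` stacks the two latest inputs followed by the two latest outputs, so a learned gain acts directly on measured data and no state estimate is required.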

