DONG Yu-Chen, GAO Wei-Nan, JIANG Zhong-Ping

Dong Yu-Chen, Gao Wei-Nan, Jiang Zhong-Ping. Cooperative optimal output regulation for multi-agent systems based on distributed adaptive internal model. Acta Automatica Sinica, 2025, 51(3): 678–691. doi: 10.16383/j.aas.c240371
cstr: 32138.14.j.aas.c240371
Cooperative Optimal Output Regulation for Multi-agent Systems Based on Distributed Adaptive Internal Model

Funds: Supported by the National Natural Science Foundation of China (62373090) and the National Key Research and Development Program of China (2024YFA1012702)

    Author Bio:

    DONG Yu-Chen: Ph.D. candidate at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. She received her master's degree from the School of Artificial Intelligence, Hebei University of Technology in 2023. Her research interests include network attacks, reinforcement learning, and data-driven and resilient control. E-mail: 2310268@stu.neu.edu.cn

    GAO Wei-Nan: Professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his Ph.D. degree from New York University, USA in 2017. His research interests include artificial intelligence, adaptive dynamic programming, optimal control, and output regulation. Corresponding author of this paper. E-mail: gaown@mail.neu.edu.cn

    JIANG Zhong-Ping: Foreign Member of the Academia Europaea (Academy of Europe), professor at New York University, USA, IEEE Fellow, IFAC Fellow. He received his Ph.D. degree in automatic control and mathematics from the Ecole des Mines de Paris, France in 1993. His research interests include stability theory, robust/adaptive/distributed nonlinear control, robust adaptive dynamic programming, and reinforcement learning and their applications in information, mechanical, and biological systems. E-mail: zjiang@nyu.edu

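  • Abstract: This paper addresses the cooperative optimal output regulation problem for discrete-time multi-agent systems and proposes a distributed data-driven adaptive control strategy that does not rely on accurate knowledge of the agents' system matrices. Building on adaptive dynamic programming and a distributed adaptive internal model, two reinforcement learning algorithms, value iteration and policy iteration, are introduced to learn the optimal controller from online data and achieve cooperative output regulation of the multi-agent system. Because the followers can only access estimates of the leader for online learning, the stability of the closed-loop system and the convergence of the learning algorithms are analyzed rigorously, and the learned control gains are proven to converge to the optimal control gains. Simulation results verify the effectiveness of the proposed control method.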
  • Fig. 1  Network topology

    Fig. 2  Estimates of the exosystem matrix $E$ by followers #1 to #6 under Algorithm 1

    Fig. 3  Comparison of ${\cal{W}}_{i}^{(m)}$, $i=1,\;2,\;\cdots,\;20$, with their optimal values under Algorithm 1

    Fig. 4  Comparison of $K_{i}^{(m)}$, $i=1,\;2,\;\cdots,\;20$, with their optimal values under Algorithm 1

    Fig. 5  Tracking errors of agent $i$, $i=1,\;2,\;\cdots,\;20$, under Algorithm 1

    Fig. 6  Distributed control inputs of agent $i$, $i=1,\;2,\;\cdots,\;20$, under Algorithm 1

    Fig. 7  Comparison of $K_{i}^{(m)}$, $i=1,\;2,\;\cdots,\;20$, with their optimal values under Algorithm 2

    Fig. 8  Tracking errors of agent $i$, $i=1,\;2,\;\cdots,\;20$, under Algorithm 2

    Fig. 9  Estimates of the exosystem matrix $E$ by followers #7 to #12 under Algorithm 2

    Fig. 10  Distributed control inputs of agent $i$, $i=1,\;2,\;\cdots,\;20$, under Algorithm 2

    Fig. 11  Comparison of the tracking error dynamic responses under the value iteration control strategy proposed in this paper and under other control strategies

    Fig. 12  Comparison of the tracking error dynamic responses under the policy iteration control strategy proposed in this paper and under other control strategies

