Cooperative Optimal Output Regulation for Multi-agent Systems Based on Distributed Adaptive Internal Model

DONG Yu-Chen; GAO Wei-Nan; JIANG Zhong-Ping

doi:10.16383/j.aas.c240371

cstr: 32138.14.j.aas.c240371

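1. State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
2. Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, New York, NY 11201, USA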

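Funding: Supported by the National Natural Science Foundation of China (62373090) and the National Key Research and Development Program of China (2024YFA1012702)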

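Author Bio:
DONG Yu-Chen: Ph.D. candidate at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. She received her master's degree from the School of Artificial Intelligence and Data Science, Hebei University of Technology in 2023. Her research interests include network attacks, reinforcement learning, data-driven control, and resilient control. E-mail: 2310268@stu.neu.edu.cn

GAO Wei-Nan: Professor at the State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University. He received his Ph.D. degree from New York University, USA in 2017. His research interests include artificial intelligence, adaptive dynamic programming, optimal control, and output regulation. Corresponding author of this paper. E-mail: gaown@mail.neu.edu.cn

JIANG Zhong-Ping: Foreign Member of the Academia Europaea (Academy of Europe), Professor at New York University, USA, IEEE Fellow, and IFAC Fellow. He received his Ph.D. degree in automatic control and mathematics from the Ecole des Mines de Paris, France in 1993. His research interests include stability theory, robust/adaptive/distributed nonlinear control, robust adaptive dynamic programming, and reinforcement learning, with applications to information, mechanical, and biological systems. E-mail: zjiang@nyu.edu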

Publication history
- Received: 2024-06-20
- Accepted: 2025-01-17
- Published online: 2025-02-13
- Issue date: 2025-03-18



Abstract

This paper proposes a distributed data-driven adaptive control strategy for the cooperative optimal output regulation problem of discrete-time multi-agent systems, without requiring precise knowledge of the system matrices. Building on adaptive dynamic programming and a distributed adaptive internal model, two reinforcement learning algorithms, value iteration and policy iteration, are introduced to learn the optimal controller from online data and thereby achieve cooperative output regulation of the multi-agent system. Because the followers can use only estimated values of the leader for online learning, the stability of the closed-loop system and the convergence of the learning algorithms are analyzed rigorously, and the learned control gains are proved to converge to the optimal control gains. Simulation results verify the effectiveness of the proposed control method.

Keywords:
- Adaptive dynamic programming
- Distributed adaptive internal model
- Reinforcement learning
- Cooperative output regulation
- Multi-agent systems
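
The abstract names value iteration and policy iteration as the two learning schemes and states that the learned gains converge to the optimal gain. As a rough illustration of that convergence claim only, the sketch below runs both iterations on a single scalar discrete-time linear-quadratic problem. It is a model-based toy under stated assumptions, not the paper's data-driven, distributed algorithms: the plant parameters `a`, `b`, the weights `q`, `r`, the initial gain `k0`, and all function names are hypothetical choices made for this example.

```python
# Hypothetical scalar plant x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2.
# All numbers and names are illustrative assumptions, not values from the paper.

def gain_from_value(p, a, b, r):
    """Greedy feedback gain k (u = -k*x) for the quadratic value p*x^2."""
    return a * b * p / (r + b * b * p)

def value_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Iterate the Riccati recursion from p = 0; no stabilizing gain is needed."""
    p = 0.0
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next, gain_from_value(p_next, a, b, r)
        p = p_next
    raise RuntimeError("value iteration did not converge")

def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iter=100):
    """Alternate policy evaluation and improvement from a stabilizing gain k0."""
    k = k0
    for _ in range(max_iter):
        a_cl = a - b * k  # closed-loop pole under the current gain
        if abs(a_cl) >= 1.0:
            raise ValueError("gain is not stabilizing")
        # Policy evaluation: p solves p = q + r*k^2 + a_cl^2 * p.
        p = (q + r * k * k) / (1.0 - a_cl * a_cl)
        # Policy improvement: greedy gain for the evaluated value.
        k_next = gain_from_value(p, a, b, r)
        if abs(k_next - k) < tol:
            return p, k_next
        k = k_next
    raise RuntimeError("policy iteration did not converge")

a, b, q, r = 1.2, 1.0, 1.0, 1.0  # open-loop unstable toy plant
p_vi, k_vi = value_iteration(a, b, q, r)
p_pi, k_pi = policy_iteration(a, b, q, r, k0=1.0)
```

In this toy setting both schemes reach the same gain; the practical difference is that policy iteration must start from a stabilizing gain (here `k0=1.0`), while value iteration starts from a zero value function and needs no such initial gain.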


Fig. 1 Network topology

Fig. 2 Estimation of exosystem matrix $E$ for followers #1 to #6 under Algorithm 1

Fig. 3 Comparison of ${\cal{W}}_{i}^{(m)}$, $i=1,\;2,\;\cdots,\;20$ and their optimal values under Algorithm 1

Fig. 4 Comparison of $K_{i}^{(m)}$, $i=1,\;2,\;\cdots,\;20$ and their optimal values under Algorithm 1

Fig. 5 Tracking errors of agent $i$, $i=1,\;2,\;\cdots,\;20$ under Algorithm 1

Fig. 6 Distributed control inputs of agent $i$, $i=1,\;2,\;\cdots,\;20$ under Algorithm 1

Fig. 7 Comparison of $K_{i}^{(m)}$, $i=1,\;2,\;\cdots,\;20$ and their optimal values under Algorithm 2

Fig. 8 Tracking errors of agent $i$, $i=1,\;2,\;\cdots,\;20$ under Algorithm 2

Fig. 9 Estimation of exosystem matrix $E$ for followers #7 to #12 under Algorithm 2

Fig. 10 Distributed control inputs of agent $i$, $i=1,\;2,\;\cdots,\;20$ under Algorithm 2

Fig. 11 Comparison of the tracking error dynamic responses under the proposed value iteration control strategy and other control strategies

Fig. 12 Comparison of the tracking error dynamic responses under the proposed policy iteration control strategy and other control strategies
