Collaborative Optimization of Multiple Operating Parameters for Process Industries Based on Multi-Agent Reinforcement Learning

LIU Di-Ju; WANG Ya-Lin; LIU Chen-Liang; LUO Biao; GUI Wei-Hua

doi: 10.16383/j.aas.c250308 cstr: 32138.14.j.aas.c250308

1. School of Automation, Central South University, Changsha 410083

Funds: Supported by the National Natural Science Foundation of China (U25A20466, 92267205, 62503507), the Natural Science Foundation of Hunan Province (2025JJ60423, 2025JJ10007), and the Hunan Provincial Department of Education Graduate Education Reform Program (2025JGYB024)

Author Bio:
LIU Di-Ju　Ph.D. candidate at the School of Automation, Central South University. His research interests include deep learning modeling, optimization and control of complex industrial processes, and reinforcement learning. E-mail: djliu@csu.edu.cn

WANG Ya-Lin　Professor at the School of Automation, Central South University. Her research interests include modeling, optimization, and control of complex industrial processes, intelligent control, and process simulation. E-mail: ylwang@csu.edu.cn

LIU Chen-Liang　Lecturer at the School of Automation, Central South University. His research interests include deep learning, and modeling, optimization, and control of complex industrial processes. Corresponding author of this paper. E-mail: lcliang@csu.edu.cn

LUO Biao　Professor at the School of Automation, Central South University. His research interests include intelligent control, reinforcement learning, deep learning, and autonomous decision-making. E-mail: biao.luo@hotmail.com

GUI Wei-Hua　Academician of the Chinese Academy of Engineering and professor at the School of Automation, Central South University. His research interests include modeling, optimization, and control of complex industrial processes and their applications, fault diagnosis, and distributed robust control. E-mail: gwh@csu.edu.cn

CLC number: Y

Publication history
- Received: 2025-07-10
- Accepted: 2025-09-22
- Published online: 2025-12-02
- Issue date: 2026-01-20

Abstract

Abstract: Process industries commonly exhibit strong coupling among multiple operating parameters, intricate process topologies, and difficulty in coordinating multiple stages, so conventional local optimization methods struggle to achieve globally optimal operation. To address these challenges, this paper proposes a process-topology-aware multi-agent reinforcement learning method, grounded in spectral graph theory, for the collaborative optimization of multiple operating parameters in process industries with complex topologies. First, a topology analysis framework based on Laplacian spectral analysis is constructed to characterize the coupling relationships among operating parameters, supporting agent task allocation and coordinated decision-making. Second, a temporal perception module that combines long short-term memory networks with a multi-head attention mechanism is designed to extract key temporal dependencies from historical state trajectories. Third, a multi-level spatial attention mechanism is introduced to adjust optimization attention dynamically and adaptively at the organizational level, the variable level, and the continuous control domain. On this basis, a hierarchical reinforcement learning decision architecture with local-global coordination is built to realize coordinated control and policy optimization among multiple agents. Simulation experiments using data from a continuous stirred tank reactor (CSTR) system and a representative salt-lake chemical process validate the effectiveness of the proposed method. The results show that the method improves performance by up to 41.2% over conventional approaches, converges faster, and yields more stable policies, offering a new technical pathway for the collaborative optimization of multiple operating parameters in process industries.

Key words: collaborative optimization, graph-aware reinforcement learning, Laplacian spectral analysis, hierarchical attention, intelligent optimization of process industries
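The abstract states that Laplacian spectral analysis is used to characterize coupling among operating parameters and to support agent task allocation, but this page does not give the construction. The following is only a minimal sketch of one standard spectral technique, bipartition by the sign of the Fiedler vector, and is not the authors' framework. The `coupling` matrix, the function names, and the two-group split are all assumptions made for illustration.

```python
import math


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a symmetric weight matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def fiedler_vector(adj, iters=500):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.

    Power iteration on (shift * I - L) with the constant vector projected out
    converges to the Fiedler vector when the graph is connected.
    """
    n = len(adj)
    lap = laplacian(adj)
    # Gershgorin bound: every eigenvalue of L is at most twice the largest degree.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    # Deterministic start vector that is already orthogonal to the constant vector.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        w = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove the trivial (constant) eigenvector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(adj):
    """Split node indices into two groups by the sign of the Fiedler vector."""
    v = fiedler_vector(adj)
    negative = [i for i, x in enumerate(v) if x < 0]
    non_negative = [i for i, x in enumerate(v) if x >= 0]
    return negative, non_negative


# Hypothetical coupling strengths between six operating parameters:
# two tightly coupled triples joined by one weak link (weight 0.1).
coupling = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
groups = spectral_bipartition(coupling)
print(groups)
```

In this toy graph the weak link separates the two triples, so each group could in principle be handed to its own agent. How the paper actually maps spectral structure to agents is not specified on this page.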


Fig. 1 Cooperative optimization framework based on graph-aware multi-agent reinforcement learning

Fig. 2 Schematic diagram of a CSTR system with closed-loop cascade characteristics

Fig. 3 Schematic diagram of the washing-crystallization process in the salt-lake chemical industry

Fig. 4 Test reward curves of each algorithm during learning

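Table 1 Hyperparameter configuration

Component	Configuration
Actor network	Two hidden layers (400, 300 units), ReLU activation
Critic network	Two hidden layers (400, 300 units), ReLU activation
Graph network	Hidden dimension: 64, GCN layers: 2, recurrent dimension: 64
Optimizer	Adam, learning rate: $ 3 \times 10^{-4} $
Training configuration	1 000 000 steps, replay buffer: 1 000 000 transitions
Discount factor $ \gamma $	0.99
Soft update coefficient $ \tau $	0.005
Batch size	256
Number of attention heads $ M $	8
History window length $ H $	CSTR: 24, salt lake: 16
Evaluation setup	10 random seeds, mean and standard deviation reported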
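To make the roles of $ \gamma $ and $ \tau $ in Table 1 concrete, the sketch below collects the tabulated values in a small configuration object and applies them in the usual actor-critic way: a one-step temporal-difference target and a Polyak (soft) target-network update. The class and function names are hypothetical, network parameters are represented as plain lists of floats, and nothing here reproduces the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 1; field names are illustrative assumptions."""
    hidden_units: tuple = (400, 300)
    learning_rate: float = 3e-4
    gamma: float = 0.99          # discount factor
    tau: float = 0.005           # soft update coefficient
    batch_size: int = 256
    attention_heads: int = 8
    total_steps: int = 1_000_000
    replay_capacity: int = 1_000_000


def td_target(reward, next_q, gamma, done=False):
    """One-step TD target: r + gamma * Q_target(s', a'), or just r at episode end."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


cfg = Hyperparameters()
target_params = soft_update([0.0, 2.0], [1.0, 2.0], cfg.tau)
print(target_params)
print(td_target(reward=1.0, next_q=10.0, gamma=cfg.gamma))
```

With $ \tau = 0.005 $, each update moves the target parameters only 0.5% of the way toward the online parameters, which is why target values change slowly during training.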

Table 2 Algorithm performance comparison results

Method	CSTR process	Salt-lake chemical process
DDPG	358.8030 ± 30.2031	18.5837 ± 1.3508
IDDPG	233.8783 ± 29.5558	17.1829 ± 2.4319
MADDPG	206.1738 ± 69.6413	18.8161 ± 1.1507
Proposed method	506.5871 ± 25.8564	19.2423 ± 1.0250
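The abstract's 41.2% figure can be checked against the mean values in Table 2. The sketch below computes the relative improvement of the proposed method over each baseline. It assumes that higher values are better and that the headline figure refers to the strongest baseline on the CSTR process, neither of which is stated explicitly on this page. The helper name `relative_improvement` is illustrative.

```python
def relative_improvement(proposed, baseline):
    """Fractional gain of `proposed` over `baseline` (higher scores assumed better)."""
    return (proposed - baseline) / baseline


# Mean values copied from Table 2 (standard deviations omitted).
baselines = {
    "CSTR": {"DDPG": 358.8030, "IDDPG": 233.8783, "MADDPG": 206.1738},
    "salt_lake": {"DDPG": 18.5837, "IDDPG": 17.1829, "MADDPG": 18.8161},
}
proposed = {"CSTR": 506.5871, "salt_lake": 19.2423}

for process, scores in baselines.items():
    for method, score in scores.items():
        gain = 100.0 * relative_improvement(proposed[process], score)
        print(f"{process:>9} vs {method:<6}: {gain:6.1f}%")

# Strongest baseline on the CSTR process and the gain over it.
best_cstr = max(baselines["CSTR"], key=baselines["CSTR"].get)
cstr_gain = 100.0 * relative_improvement(proposed["CSTR"], baselines["CSTR"][best_cstr])
print(best_cstr, round(cstr_gain, 1))  # DDPG 41.2
```

Under these assumptions the 41.2% figure matches the gain over DDPG on the CSTR process. The gains on the salt-lake process are much smaller, which is consistent with the English abstract's "up to" wording.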
