
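Collaborative Optimization of Multiple Operating Parameters for Industrial Processes Based on Multi-Agent Reinforcement Learning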

Liu Di-Ju, Wang Ya-Lin, Liu Chen-Liang, Luo Biao, Gui Wei-Hua

Citation: Liu Di-Ju, Wang Ya-Lin, Liu Chen-Liang, Luo Biao, Gui Wei-Hua. Collaborative optimization of multiple operating parameters for industrial processes based on multi-agent reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250308


doi: 10.16383/j.aas.c250308 cstr: 32138.14.j.aas.c250308
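Funds: Supported by National Natural Science Foundation of China (U25A20466, 92267205, 62503507), Natural Science Foundation of Hunan Province (2025JJ60423, 2025JJ10007), and Hunan Provincial Department of Education Graduate Education Reform Program (2025JGYB024)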
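    Author biographies:

    Liu Di-Ju: Ph.D. candidate at Central South University. His main research interests are deep learning modeling, optimal control of complex industrial processes, and reinforcement learning. E-mail: djliu@csu.edu.cn

    Wang Ya-Lin: Professor at the School of Automation, Central South University. Her main research interests are modeling and optimal control of complex industrial processes, intelligent control, and process simulation. E-mail: ylwang@csu.edu.cn

    Liu Chen-Liang: Lecturer at the School of Automation, Central South University. His main research interests are deep learning and the modeling and optimal control of complex industrial processes. Corresponding author of this paper. E-mail: lcliang@csu.edu.cn

    Luo Biao: Professor at the School of Automation, Central South University. His main research interests are intelligent control, reinforcement learning, deep learning, and autonomous decision-making. E-mail: biao.luo@hotmail.com

    Gui Wei-Hua: Academician of the Chinese Academy of Engineering and professor at the School of Automation, Central South University. His main research interests are modeling of complex industrial processes, optimization and control applications, fault diagnosis, and distributed robust control. E-mail: gwh@csu.edu.cn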


Funds: Supported by National Natural Science Foundation of China (U25A20466, 92267205, 62503507), Natural Science Foundation of Hunan Province (2025JJ60423, 2025JJ10007), and Hunan Provincial Department of Education Graduate Education Reform Program (2024JGYB021)

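  • Abstract: Process industries commonly face strongly coupled operating parameters, complex process topologies, and difficult coordination across process stages, so traditional local optimization methods struggle to achieve globally optimal operation. To address these challenges, this paper proposes a process-topology-aware multi-agent reinforcement learning cooperative optimization method based on spectral graph theory, enabling collaborative optimization of multiple operating parameters in process industries with complex topologies. First, a topology analysis framework based on Laplacian spectral analysis is constructed to characterize the coupling structure among operating parameters, which supports agent task allocation and cooperative decision-making. Second, a temporal perception module combining a long short-term memory (LSTM) network with a multi-head attention mechanism is designed to extract key temporal dependencies from historical state trajectories. Third, a multi-level spatial attention mechanism is introduced to adaptively adjust the optimization focus at the organization level, at the variable level, and over the continuous control domain. On this basis, a hierarchical reinforcement learning decision architecture with local-global cooperation is built to coordinate control and optimize policies across agents. Simulation experiments built on a continuous stirred tank reactor (CSTR) system and on industrial data from a typical salt lake chemical process verify the effectiveness of the proposed method. The results show a 41.2% performance improvement over conventional methods (the gain in mean result over the strongest baseline, DDPG, on the CSTR process in Table 2), together with better convergence and more stable policies. The method offers a new approach and a reference technical route for collaborative optimization of multiple operating parameters in process industries.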
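The abstract says a Laplacian spectral analysis framework characterizes how operating parameters are coupled, and that this structure guides task allocation to agents. This page does not give the paper's actual procedure. The sketch below shows only the general idea, under two assumptions. First, couplings are encoded as a symmetric weighted adjacency matrix. Second, allocation is done by spectral bisection with the Fiedler vector, a standard spectral-graph technique swapped in here for illustration. The names `graph_laplacian`, `spectral_bisection` and the example matrix `A` are hypothetical.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of an undirected, weighted coupling graph."""
    a = np.array(adjacency, dtype=float)   # copy so the caller's matrix is untouched
    a = 0.5 * (a + a.T)                     # treat couplings as undirected (symmetrize)
    np.fill_diagonal(a, 0.0)                # ignore self-couplings
    return np.diag(a.sum(axis=1)) - a


def spectral_bisection(adjacency, tol=1e-9):
    """Split operating parameters into two groups using the Fiedler vector.

    Returns (labels, n_components): labels[i] in {0, 1} is a hypothetical agent
    index for parameter i; n_components is the multiplicity of the zero
    eigenvalue, i.e. the number of connected parts of the coupling graph.
    """
    lap = graph_laplacian(adjacency)
    eigvals, eigvecs = np.linalg.eigh(lap)      # symmetric: real eigenvalues, ascending
    n_components = int(np.sum(eigvals < tol))  # count (numerically) zero eigenvalues
    fiedler = eigvecs[:, 1]                     # eigenvector of the 2nd-smallest eigenvalue
    labels = (fiedler > 0).astype(int)          # sign pattern gives the two groups
    return labels, n_components


# Hypothetical example: six operating parameters forming two tightly coupled
# groups {0, 1, 2} and {3, 4, 5}, with a weak coupling between parameters 2 and 3.
A = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[2, 3] = A[3, 2] = 0.1

labels, n_components = spectral_bisection(A)
print("agent labels:", labels, "connected components:", n_components)
```

In this example the first three and the last three parameters get different labels, because the Fiedler vector changes sign across the weak link. The eigenvector's overall sign is arbitrary, so which group is called 0 may vary. For more than two agents, a common extension is to cluster the rows of the first k eigenvectors, for instance with k-means.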
  • Fig.  1  Graph-aware cooperative optimization framework based on multi-agent reinforcement learning

    Fig.  2  Schematic diagram of a CSTR system with closed-loop cascade characteristics

    Fig.  3  Schematic diagram of the washing-crystallization process in the salt lake chemical industry

    Fig.  4  Test reward curves of each algorithm during learning

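    Table  1  Hyperparameter configuration

    Component | Configuration
    Actor network | Two hidden layers (400 and 300 units), ReLU activation
    Critic network | Two hidden layers (400 and 300 units), ReLU activation
    Graph network | Hidden dimension: 64; GCN layers: 2; recurrent dimension: 64
    Optimizer | Adam, learning rate $ 3 \times 10^{-4} $
    Training | 1,000,000 steps; replay buffer: 1,000,000 transitions
    Discount factor $ \gamma $ | 0.99
    Soft-update coefficient $ \tau $ | 0.005
    Batch size | 256
    Number of attention heads $ M $ | 8
    History window length $ H $ | CSTR: 24; salt lake: 16
    Evaluation | 10 random seeds; mean and standard deviation reported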
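Table 1 lists a discount factor γ = 0.99 and a soft-update coefficient τ = 0.005. These are the two hyperparameters of the target-network scheme used in DDPG-style actor-critic training, which the baselines in Table 2 follow. The sketch below assumes the proposed method uses the same scheme, which this page does not state. In NumPy, it shows the critic's one-step target and the Polyak soft update of target parameters. The function names and the toy parameter arrays are hypothetical.

```python
import numpy as np

GAMMA = 0.99   # discount factor gamma (Table 1)
TAU = 0.005    # soft-update coefficient tau (Table 1)


def td_target(reward, next_q, done, gamma=GAMMA):
    """One-step critic target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q is assumed to be the target critic's value at the next state under
    the target actor's action; done = 1 marks a terminal transition.
    """
    reward = np.asarray(reward, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    done = np.asarray(done, dtype=float)
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]


# Hypothetical usage: a batch of 256 transitions (batch size from Table 1)
# and a toy single-array "network" whose online weights are all ones.
rng = np.random.default_rng(0)
batch_size = 256
rewards = rng.normal(size=batch_size)
next_q = rng.normal(size=batch_size)
dones = np.zeros(batch_size)
y = td_target(rewards, next_q, dones)

target = [np.zeros((4, 2))]
online = [np.ones((4, 2))]
for _ in range(1000):
    target = soft_update(target, online)
print(y.shape, float(target[0][0, 0]))
```

With τ = 0.005, the target parameters approach the online ones geometrically. After n updates the remaining gap is (1 − τ)^n, or about 0.7% after 1,000 updates. This slow tracking is what stabilizes the critic's bootstrap targets.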

    Table  2  Algorithm performance comparison results (mean ± standard deviation over 10 random seeds)

    Method | CSTR process | Salt lake chemical process
    DDPG | 358.8030 ± 30.2031 | 18.5837 ± 1.3508
    IDDPG | 233.8783 ± 29.5558 | 17.1829 ± 2.4319
    MADDPG | 206.1738 ± 69.6413 | 18.8161 ± 1.1507
    Proposed method | 506.5871 ± 25.8564 | 19.2423 ± 1.0250
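The abstract's 41.2% figure matches the CSTR column of Table 2, assuming the comparison is against the strongest baseline there, DDPG: (506.5871 − 358.8030) / 358.8030 ≈ 41.2%. The short script below reproduces that arithmetic from the table's mean values. It also shows that the same calculation gives a much smaller gain on the salt lake process, about 2.3% over MADDPG. The dictionary keys and the helper name `relative_improvement` are illustrative.

```python
# Mean results (over 10 random seeds) copied from Table 2.
results = {
    "DDPG": {"CSTR": 358.8030, "salt_lake": 18.5837},
    "IDDPG": {"CSTR": 233.8783, "salt_lake": 17.1829},
    "MADDPG": {"CSTR": 206.1738, "salt_lake": 18.8161},
    "Proposed": {"CSTR": 506.5871, "salt_lake": 19.2423},
}


def relative_improvement(ours, baseline):
    """Percentage gain of `ours` over `baseline` (higher result is better)."""
    return 100.0 * (ours - baseline) / abs(baseline)


for process in ("CSTR", "salt_lake"):
    # Strongest baseline = highest mean result among the non-proposed methods.
    baselines = [name for name in results if name != "Proposed"]
    best = max(baselines, key=lambda name: results[name][process])
    gain = relative_improvement(results["Proposed"][process], results[best][process])
    print(f"{process}: +{gain:.1f}% over {best}")
```

Comparing means alone ignores the reported standard deviations. On the salt lake process, the gap between the proposed method and MADDPG is smaller than either method's standard deviation.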
Publication history
  • Received: 2025-07-10
  • Accepted: 2025-09-22
  • Published online: 2025-12-02
