[1] Harjunkoski I, Maravelias C T, Bongers P, Castro P M, Engell S, Grossmann I E, et al. Scope for industrial applications of production scheduling models and solution methods. Computers & Chemical Engineering, 2014, 62: 161−193

[2] Castro P M, Grossmann I E, Zhang Q. Expanding scope and computational challenges in process scheduling. Computers & Chemical Engineering, 2018, 114: 14−42

[3] Hou Y, Wu N Q, Zhou M C, Li Z W. Pareto-optimization for scheduling of crude oil operations in refinery via genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(3): 517−530 doi: 10.1109/TSMC.2015.2507161

[4] Gao K Z, Huang Y, Sadollah A, Wang L. A review of energy-efficient scheduling in intelligent production systems. Complex & Intelligent Systems, 2020, 6(2): 237−249

[5] Fathollahi-Fard A M, Woodward L, Akhrif O. A distributed permutation flow-shop considering sustainability criteria and real-time scheduling. Journal of Industrial Information Integration, 2024, 39: Article No. 100598 doi: 10.1016/j.jii.2024.100598

[6] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.

[7] Esteso A, Peidro D, Mula J, Díaz-Madroñero M. Reinforcement learning applied to production planning and control. International Journal of Production Research, 2023, 61(16): 5772−5789 doi: 10.1080/00207543.2022.2104180

[8] Dogru O, Xie J Y, Prakash O, Chiplunkar R, Soesanto J, Chen H T, et al. Reinforcement learning in process industries: Review and perspective. IEEE/CAA Journal of Automatica Sinica, 2024, 11(2): 283−300 doi: 10.1109/JAS.2024.124227

[9] Nian R, Liu J F, Huang B. A review on reinforcement learning: Introduction and applications in industrial process control. Computers & Chemical Engineering, 2020, 139: Article No. 106886

[10] Bellman R. A Markovian decision process. Journal of Mathematics and Mechanics, 1957, 6(5): 679−684

[11] Sutton R S. Learning to predict by the methods of temporal differences. Machine Learning, 1988, 3(1): 9−44 doi: 10.1023/A:1022633531479

[12] Watkins C J C H, Dayan P. Q-learning. Machine Learning, 1992, 8(3−4): 279−292

[13] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv: 1312.5602, 2013.

[14] van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI Press, 2016.

[15] Wang Z Y, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: JMLR.org, 2016. 1995−2003

[16] Williams R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3−4): 229−256 doi: 10.1023/A:1022672621406
[17] Konda V R, Tsitsiklis J N. Actor-critic algorithms. In: Proceedings of the 13th International Conference on Neural Information Processing Systems. Denver, USA: MIT Press, 1999. 1008−1014

[18] Mnih V, Badia A P, Mirza M, Graves A, Harley T, Lillicrap T P, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: JMLR.org, 2016. 1928−1937

[19] Schulman J, Levine S, Moritz P, Jordan M, Abbeel P. Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR.org, 2015. 1889−1897

[20] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv: 1707.06347, 2017.

[21] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv: 1509.02971, 2015.

[22] Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor-critic methods. arXiv preprint arXiv: 1802.09477, 2018.

[23] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv: 1801.01290, 2018.

[24] Gao X Y, Peng D, Kui G F, Pan J, Zuo X, Li F F. Reinforcement learning based optimization algorithm for maintenance tasks scheduling in coalbed methane gas field. Computers & Chemical Engineering, 2023, 170: Article No. 108131

[25] Nikita S, Tiwari A, Sonawat D, Kodamana H, Rathore A S. Reinforcement learning based optimization of process chromatography for continuous processing of biopharmaceuticals. Chemical Engineering Science, 2021, 230: Article No. 116171 doi: 10.1016/j.ces.2020.116171

[26] Cheng Y, Huang Y X, Pang B, Zhang W D. ThermalNet: A deep reinforcement learning-based combustion optimization system for coal-fired boiler. Engineering Applications of Artificial Intelligence, 2018, 74: 303−311 doi: 10.1016/j.engappai.2018.07.003

[27] Shao Z, Si F Q, Kudenko D, Wang P, Tong X Z. Predictive scheduling of wet flue gas desulfurization system based on reinforcement learning. Computers & Chemical Engineering, 2020, 141: Article No. 107000

[28] Hubbs C D, Li C, Sahinidis N V, Grossmann I E, Wassick J M. A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 2020, 141: Article No. 106982

[29] Powell B K M, Machalek D, Quah T. Real-time optimization using reinforcement learning. Computers & Chemical Engineering, 2020, 143: Article No. 107077

[30] Liu C, Ding J L, Sun J Y. Reinforcement learning based decision making of operational indices in process industry under changing environment. IEEE Transactions on Industrial Informatics, 2021, 17(4): 2727−2736 doi: 10.1109/TII.2020.3005207

[31] Adams D, Oh D H, Kim D W, Lee C H, Oh M. Deep reinforcement learning optimization framework for a power generation plant considering performance and environmental issues. Journal of Cleaner Production, 2021, 291: Article No. 125915 doi: 10.1016/j.jclepro.2021.125915

[32] Pravin P S, Luo Z Y, Li L Y, Wang X N. Learning-based scheduling of industrial hybrid renewable energy systems. Computers & Chemical Engineering, 2022, 159: Article No. 107665

[33] Song G, Ifaei P, Ha J, Kang D, Won W, Liu J J, et al. The AI circular hydrogen economist: Hydrogen supply chain design via hierarchical deep multi-agent reinforcement learning. Chemical Engineering Journal, 2024, 497: Article No. 154464 doi: 10.1016/j.cej.2024.154464

[34] Wang H, He H W, Bai Y F, Yue H W. Parameterized deep Q-network based energy management with balanced energy economy and battery life for hybrid electric vehicles. Applied Energy, 2022, 320: Article No. 119270 doi: 10.1016/j.apenergy.2022.119270

[35] Campos G, El-Farra N H, Palazoglu A. Soft actor-critic deep reinforcement learning with hybrid mixed-integer actions for demand responsive scheduling of energy systems. Industrial & Engineering Chemistry Research, 2022, 61(24): 8443−8461

[36] Li L J, Yang X, Yang S P, Xu X W. Optimization of oxygen system scheduling in hybrid action space based on deep reinforcement learning. Computers & Chemical Engineering, 2023, 171: Article No. 108168

[37] Stops L, Leenhouts R, Gao Q H, Schweidtmann A M. Flowsheet generation through hierarchical reinforcement learning and graph neural networks. AIChE Journal, 2023, 69(1): Article No. e17938 doi: 10.1002/aic.17938

[38] Masson W, Ranchod P, Konidaris G. Reinforcement learning with parameterized actions. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI Press, 2016. 1934−1940

[39] Göttl Q, Pirnay J, Burger J, Grimm D G. Deep reinforcement learning enables conceptual design of processes for separating azeotropic mixtures without prior knowledge. Computers & Chemical Engineering, 2025, 194: Article No. 108975

[40] Zhang H K, Liu X Y, Sun D S, Dabiri A, de Schutter B. Integrated reinforcement learning and optimization for railway timetable rescheduling. IFAC-PapersOnLine, 2024, 58(10): 310−315 doi: 10.1016/j.ifacol.2024.07.358

[41] Che G, Zhang Y Y, Tang L X, Zhao S N. A deep reinforcement learning based multi-objective optimization for the scheduling of oxygen production system in integrated iron and steel plants. Applied Energy, 2023, 345: Article No. 121332 doi: 10.1016/j.apenergy.2023.121332

[42] Shakya A K, Pillai G, Chakrabarty S. Reinforcement learning algorithms: A brief survey. Expert Systems with Applications, 2023, 231: Article No. 120495 doi: 10.1016/j.eswa.2023.120495

[43] Panzer M, Bender B. Deep reinforcement learning in production systems: A systematic literature review. International Journal of Production Research, 2022, 60(13): 4316−4341 doi: 10.1080/00207543.2021.1973138

[44] Ladosz P, Weng L L, Kim M, Oh H. Exploration in deep reinforcement learning: A survey. Information Fusion, 2022, 85: 1−22 doi: 10.1016/j.inffus.2022.03.003

[45] Knox W B, Stone P. Interactively shaping agents via human reinforcement: The TAMER framework. In: Proceedings of the 5th International Conference on Knowledge Capture. Redondo Beach, USA: Association for Computing Machinery, 2009. 9−16

[46] Christiano P F, Leike J, Brown T B, Martic M, Legg S, Amodei D. Deep reinforcement learning from human preferences. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc., 2017. 4302−4310

[47] Devlin S, Grześ M, Kudenko D. An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Advances in Complex Systems, 2011, 14(2): 251−278 doi: 10.1142/S0219525911002998

[48] Grześ M, Kudenko D. Plan-based reward shaping for reinforcement learning. In: Proceedings of the 4th International IEEE Conference Intelligent Systems. Varna, Bulgaria: IEEE, 2008. 10-22−10-29

[49] Grześ M, Kudenko D. Multigrid reinforcement learning with reward shaping. In: Proceedings of the 18th International Conference on Artificial Neural Networks. Prague, Czech Republic: Springer, 2008. 357−366

[50] Du Y, Li J Q, Chen X L, Duan P Y, Pan Q K. Knowledge-based reinforcement learning and estimation of distribution algorithm for flexible job shop scheduling problem. IEEE Transactions on Emerging Topics in Computational Intelligence, 2023, 7(4): 1036−1050 doi: 10.1109/TETCI.2022.3145706

[51] Gui Y, Tang D B, Zhu H H, Zhang Y, Zhang Z Q. Dynamic scheduling for flexible job shop using a deep reinforcement learning approach. Computers & Industrial Engineering, 2023, 180: Article No. 109255

[52] Chow Y, Ghavamzadeh M, Janson L, Pavone M. Risk-constrained reinforcement learning with percentile risk criteria. The Journal of Machine Learning Research, 2017, 18(1): 6070−6120

[53] Luo L, Yan X S. Scheduling of stochastic distributed hybrid flow-shop by hybrid estimation of distribution algorithm and proximal policy optimization. Expert Systems with Applications, 2025, 271: Article No. 126523 doi: 10.1016/j.eswa.2025.126523

[54] Li H P, Wan Z Q, He H B. Real-time residential demand response. IEEE Transactions on Smart Grid, 2020, 11(5): 4144−4154 doi: 10.1109/TSG.2020.2978061

[55] Dai J T, Ji J M, Yang L, Zheng Q, Pan G. Augmented proximal policy optimization for safe reinforcement learning. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI Press, 2023. 7288−7295

[56] Gu S D, Yang L, Du Y L, Chen G, Walter F, Wang J, et al. A review of safe reinforcement learning: Methods, theories, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 11216−11235 doi: 10.1109/TPAMI.2024.3457538

[57] Yu D J, Zou W J, Yang Y J, Ma H T, Li S E, Yin Y M, et al. Safe model-based reinforcement learning with an uncertainty-aware reachability certificate. IEEE Transactions on Automation Science and Engineering, 2024, 21(3): 4129−4142 doi: 10.1109/TASE.2023.3292388

[58] Kou P, Liang D L, Wang C, Wu Z H, Gao L. Safe deep reinforcement learning-based constrained optimal control scheme for active distribution networks. Applied Energy, 2020, 264: Article No. 114772 doi: 10.1016/j.apenergy.2020.114772

[59] Song Y, Zhang B, Wen C B, Wang D, Wei G L. Model predictive control for complicated dynamic systems: A survey. International Journal of Systems Science, 2025, 56(9): 2168−2193 doi: 10.1080/00207721.2024.2439473

[60] Zanon M, Gros S. Safe reinforcement learning using robust MPC. IEEE Transactions on Automatic Control, 2021, 66(8): 3638−3652 doi: 10.1109/TAC.2020.3024161

[61] Sui Y N, Gotovos A, Burdick J W, Krause A. Safe exploration for optimization with Gaussian processes. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: JMLR.org, 2015. 997−1005

[62] Turchetta M, Berkenkamp F, Krause A. Safe exploration in finite Markov decision processes with Gaussian processes. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc., 2016. 4312−4320

[63] Yang H K, Bernal D E, Franzoi R E, Engineer F G, Kwon K, Lee S, et al. Integration of crude-oil scheduling and refinery planning by Lagrangean decomposition. Computers & Chemical Engineering, 2020, 138: Article No. 106812

[64] Castro P M. Optimal scheduling of a multiproduct batch chemical plant with preemptive changeover tasks. Computers & Chemical Engineering, 2022, 162: Article No. 107818

[65] Franzoi R E, Menezes B C, Kelly J D, Gut J A W, Grossmann I E. Large-scale optimization of nonconvex MINLP refinery scheduling. Computers & Chemical Engineering, 2024, 186: Article No. 108678

[66] Sun L, Lin L, Li H J, Gen M. Large scale flexible scheduling optimization by a distributed evolutionary algorithm. Computers & Industrial Engineering, 2019, 128: 894−904

[67] Zhang W T, Du W, Yu G, He R C, Du W L, Jin Y C. Knowledge-assisted dual-stage evolutionary optimization of large-scale crude oil scheduling. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(2): 1567−1581 doi: 10.1109/TETCI.2024.3353590

[68] Xu M, Mei Y, Zhang F F, Zhang M J. Genetic programming with lexicase selection for large-scale dynamic flexible job shop scheduling. IEEE Transactions on Evolutionary Computation, 2024, 28(5): 1235−1249 doi: 10.1109/TEVC.2023.3244607

[69] Zheng Y, Wang J, Wang C M, Huang C Y, Yang J F, Xie N. Strategic bidding of wind farms in medium-to-long-term rolling transactions: A bi-level multi-agent deep reinforcement learning approach. Applied Energy, 2025, 383: Article No. 125265 doi: 10.1016/j.apenergy.2024.125265

[70] Liu Y S, Fan J X, Shen W M. A deep reinforcement learning approach with graph attention network and multi-signal differential reward for dynamic hybrid flow shop scheduling problem. Journal of Manufacturing Systems, 2025, 80: 643−661 doi: 10.1016/j.jmsy.2025.03.028

[71] Bashyal A, Boroukhian T, Veerachanchai P, Naransukh M, Wicaksono H. Multi-agent deep reinforcement learning based demand response and energy management for heavy industries with discrete manufacturing systems. Applied Energy, 2025, 392: Article No. 125990 doi: 10.1016/j.apenergy.2025.125990

[72] de Mars P, O'Sullivan A. Applying reinforcement learning and tree search to the unit commitment problem. Applied Energy, 2021, 302: Article No. 117519 doi: 10.1016/j.apenergy.2021.117519

[73] Zhu L W, Takami G, Kawahara M, Kanokogi H, Matsubara T. Alleviating parameter-tuning burden in reinforcement learning for large-scale process control. Computers & Chemical Engineering, 2022, 158: Article No. 107658

[74] Hu R, Huang Y F, Wu X, Qian B, Wang L, Zhang Z Q. Collaborative Q-learning hyper-heuristic evolutionary algorithm for the production and transportation integrated scheduling of silicon electrodes. Swarm and Evolutionary Computation, 2024, 86: Article No. 101498 doi: 10.1016/j.swevo.2024.101498

[75] Chen X, Li Y B, Wang K P, Wang L, Liu J, Wang J, et al. Reinforcement learning for distributed hybrid flowshop scheduling problem with variable task splitting towards mass personalized manufacturing. Journal of Manufacturing Systems, 2024, 76: 188−206 doi: 10.1016/j.jmsy.2024.07.011

[76] Bouton M, Julian K D, Nakhaei A, Fujimura K, Kochenderfer M J. Decomposition methods with deep corrections for reinforcement learning. Autonomous Agents and Multi-agent Systems, 2019, 33(3): 330−352 doi: 10.1007/s10458-019-09407-z

[77] Chen Y D, Ding J L, Chen Q D. A reinforcement learning based large-scale refinery production scheduling algorithm. IEEE Transactions on Automation Science and Engineering, 2024, 21(4): 6041−6055 doi: 10.1109/TASE.2023.3321612

[78] Bonnans J F. Lectures on stochastic programming: Modeling and theory. SIAM Review, 2011, 53(1): 181−183

[79] Li Z K, Ierapetritou M G. Robust optimization for process scheduling under uncertainty. Industrial & Engineering Chemistry Research, 2008, 47(12): 4148−4157

[80] Glomb L, Liers F, Rösel F. A rolling-horizon approach for multi-period optimization. European Journal of Operational Research, 2022, 300(1): 189−206 doi: 10.1016/j.ejor.2021.07.043

[81] Qasim M, Wong K Y, Komarudin. A review on aggregate production planning under uncertainty: Insights from a fuzzy programming perspective. Engineering Applications of Artificial Intelligence, 2024, 128: Article No. 107436

[82] Wu X Q, Yan X F, Guan D H, Wei M Q. A deep reinforcement learning model for dynamic job-shop scheduling problem with uncertain processing time. Engineering Applications of Artificial Intelligence, 2024, 131: Article No. 107790 doi: 10.1016/j.engappai.2023.107790

[83] Ruiz-Rodríguez M L, Kubler S, Robert J, le Traon Y. Dynamic maintenance scheduling approach under uncertainty: Comparison between reinforcement learning, genetic algorithm simheuristic, dispatching rules. Expert Systems with Applications, 2024, 248: Article No. 123404 doi: 10.1016/j.eswa.2024.123404

[84] Rangel-Martinez D, Ricardez-Sandoval L A. Recurrent reinforcement learning strategy with a parameterized agent for online scheduling of a state task network under uncertainty. Industrial & Engineering Chemistry Research, 2025, 64(13): 7126−7140

[85] Huang M Y, He R C, Dai X, Du W L, Qian F. Reinforcement learning based gasoline blending optimization: Achieving more efficient nonlinear online blending of fuels. Chemical Engineering Science, 2024, 300: Article No. 120574 doi: 10.1016/j.ces.2024.120574

[86] Peng W L, Lin X J, Li H T. Critical chain based proactive-reactive scheduling for resource-constrained project scheduling under uncertainty. Expert Systems with Applications, 2023, 214: Article No. 119188 doi: 10.1016/j.eswa.2022.119188

[87] Grumbach F, Müller A, Reusch P, Trojahn S. Robust-stable scheduling in dynamic flow shops based on deep reinforcement learning. Journal of Intelligent Manufacturing, 2024, 35(2): 667−686 doi: 10.1007/s10845-022-02069-x

[88] Huang J P, Gao L, Li X Y. A hierarchical multi-action deep reinforcement learning method for dynamic distributed job-shop scheduling problem with job arrivals. IEEE Transactions on Automation Science and Engineering, 2025, 22: 2501−2513 doi: 10.1109/TASE.2024.3380644

[89] Infantes G, Roussel S, Pereira P, Jacquet A, Benazera E. Learning to solve job shop scheduling under uncertainty. In: Proceedings of the 21st International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Uppsala, Sweden: Springer, 2024. 329−345

[90] de Puiseau C W, Meyes R, Meisen T. On reliability of reinforcement learning based production scheduling systems: A comparative survey. Journal of Intelligent Manufacturing, 2022, 33(4): 911−927 doi: 10.1007/s10845-022-01915-2

[91] Lee C Y, Huang Y T, Chen P J. Robust-optimization-guiding deep reinforcement learning for chemical material production scheduling. Computers & Chemical Engineering, 2024, 187: Article No. 108745

[92] Rangel-Martinez D, Ricardez-Sandoval L A. A recurrent reinforcement learning strategy for optimal scheduling of partially observable job-shop and flow-shop batch chemical plants under uncertainty. Computers & Chemical Engineering, 2024, 188: Article No. 108748