A Review of Combinatorial Optimization Methods Based on Deep Reinforcement Learning
-
Abstract: Combinatorial optimization problems arise widely in national defense, transportation, industry, daily life, and many other fields. For decades, traditional operations research methods have been the main means of solving them. However, as problem sizes grow in practical applications and real-time requirements tighten, traditional algorithms face heavy computational burdens and can rarely solve combinatorial optimization problems online. In recent years, with the rapid development of deep learning, the achievements of deep reinforcement learning in Go, robotics, and other fields have demonstrated its strong learning and sequential decision-making abilities. Accordingly, a number of new methods that apply deep reinforcement learning to combinatorial optimization have emerged; they offer fast solving speed and strong model generalization, providing a fresh approach to these problems. This paper therefore summarizes and reviews recent theoretical methods and application research on solving combinatorial optimization problems with deep reinforcement learning, covering their basic principles, related methods, and applications, and points out several open problems that this direction urgently needs to address.
-
Table 1 Comparison of existing algorithms: models, training methods, problems solved, and optimization performance

| Method category | Work | Model and training method | Problems solved and optimization performance |
| --- | --- | --- | --- |
| End-to-end methods based on Pointer Network | Vinyals et al. (2015) [30] | Ptr-Net + supervised training | 30-TSP: close to optimal, better than heuristic algorithms. 40-, 50-TSP: a clear gap to the optimum. Also applied to convex hull and triangulation problems. |
| | Bello et al. (2017) [31] | Ptr-Net + REINFORCE & critic baseline | 50-TSP: better than [30]. 100-TSP: close to the Concorde optimum. 200-Knapsack: reaches the optimum. |
| | Nazari et al. (2018) [32] | Ptr-Net + REINFORCE & critic baseline | 100-TSP: comparable to [31], with about 60% less training time. 100-CVRP / stochastic CVRP: better than several heuristic algorithms. |
| | Deudon et al. (2018) [33] | Transformer attention + REINFORCE & critic baseline | 20-, 50-TSP: better than [31]. 100-TSP: comparable to [31]. |
| | Kool et al. (2019) [34] | Transformer attention + REINFORCE & rollout baseline | 100-TSP: better than [30−33, 37, 40]. 100-CVRP, 100-SDVRP, 100-OP, 100-PCTSP, SPCTSP: close to the Gurobi optimum, better than many heuristic methods. |
| | Ma et al. (2020) [35] | Graph Pointer Network + HRL | 20-, 50-TSP: better than [31, 37], worse than [34]. 250-, 500-, 1000-TSP: better than [31, 34]. 20-TSPTW: better than OR-Tools and ant colony optimization. |
| | Li et al. (2020) [36] | Ptr-Net + REINFORCE & critic baseline + decomposition strategy / parameter transfer | 40-, 100-, 150-, 200-, 500-city bi-/tri-objective TSP: better than MOEA/D, NSGA-II, MOGLS. |
| End-to-end methods based on graph neural networks | Dai et al. (2017) [37] | structure2vec + DQN | 1200-TSP: close to [31]. 1200-MVC (minimum vertex cover): close to optimal. 1200-MaxCut (maximum cut): close to optimal. |
| | Mittal et al. (2019) [38] | GCN + DQN | 2k- to 20k-MCP (maximum coverage problem): better than [37]. 10k-, 20k-, 50k-MVC: better than [37]. |
| | Li et al. (2018) [39] | GCN + supervised training + guided tree search | Real-world datasets for MVC, MIS (maximum independent set), MC (maximal clique), and Satisfiability: better than [37]. |
| | Nowak et al. (2017) [40] | GNN + supervised training + beam search | 20-TSP: worse than [30]. |
| | Joshi et al. (2019) [41] | GCN + supervised training + beam search | 20-, 50-, 100-TSP: slightly better than [30, 31, 33, 34], better than [37]. |
| Local search methods improved by deep reinforcement learning | Chen et al. (2019) [47] | Ptr-Net + Actor-Critic | 20-CVRP: reaches the optimum. 50-, 100-CVRP: better than [32, 34] and OR-Tools. Job-shop scheduling: better than OR-Tools and DeepRM. |
| | Yolcu et al. (2019) [48] | GNN + REINFORCE | Real-world Satisfiability, MIS, MVC, MC, and graph coloring datasets: reaches optimal solutions in fewer search steps, but with longer per-step runtime than traditional algorithms. |
| | Gao et al. (2020) [49] | Graph attention + PPO | 100-CVRP: better than [34]. 100-CVRPTW: better than several heuristic methods. 400-CVRPTW: worse than one heuristic, better than the others. |
| | Lu et al. (2020) [50] | Transformer attention + REINFORCE | 20-, 50-, 100-CVRP: better than [32, 34, 47], OR-Tools, and LKH3, with far lower runtime than LKH3. |
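The Pointer-Network rows [31−34] in Table 1 share one training recipe: an attention-based encoder-decoder constructs a tour one city at a time, and the policy is trained with REINFORCE against a baseline, using the negative tour length as the reward. Below is a minimal sketch of that recipe, assuming PyTorch; the tiny single-head attention decoder, the exponential-moving-average baseline (a simplification of the learned critic in [31]), and all identifiers are illustrative assumptions, not the code of any cited paper.

```python
import torch
import torch.nn as nn

class TinyPointerPolicy(nn.Module):
    """Single-head attention 'pointer' decoder over city embeddings."""
    def __init__(self, hidden=128):
        super().__init__()
        self.embed = nn.Linear(2, hidden)     # encode 2-D city coordinates
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)

    def forward(self, coords):
        # coords: (B, n, 2) -> (tour indices (B, n), log-prob of each tour (B,))
        h = self.embed(coords)
        k = self.k_proj(h)
        B, n, hid = h.shape
        visited = torch.zeros(B, n, dtype=torch.bool)
        idx = torch.zeros(B, dtype=torch.long)        # start every tour at city 0
        visited[torch.arange(B), idx] = True
        tour, logp = [idx], torch.zeros(B)
        for _ in range(n - 1):
            q = self.q_proj(h[torch.arange(B), idx])  # query = last visited city
            scores = torch.bmm(k, q.unsqueeze(2)).squeeze(2) / hid ** 0.5
            scores = scores.masked_fill(visited, float('-inf'))  # forbid revisits
            dist = torch.distributions.Categorical(logits=scores)
            idx = dist.sample()                       # sample the next city
            logp = logp + dist.log_prob(idx)
            visited[torch.arange(B), idx] = True
            tour.append(idx)
        return torch.stack(tour, dim=1), logp

def tour_length(coords, tour):
    # gather cities in visit order, then sum closed-loop edge lengths
    ordered = coords.gather(1, tour.unsqueeze(2).expand(-1, -1, 2))
    return (ordered - ordered.roll(-1, dims=1)).norm(dim=2).sum(dim=1)

policy = TinyPointerPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
baseline = None
for step in range(100):                               # toy training loop
    coords = torch.rand(32, 20, 2)                    # random 20-city instances
    tour, logp = policy(coords)
    cost = tour_length(coords, tour)                  # reward = -cost
    mean_cost = cost.mean().detach()
    baseline = mean_cost if baseline is None else 0.9 * baseline + 0.1 * mean_cost
    loss = ((cost - baseline) * logp).mean()          # REINFORCE with baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Replacing the moving-average baseline with the tour length of a greedy rollout under the current best policy yields the rollout baseline used by [34].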
Table 2 Comparison of end-to-end models on TSP (average tour length; runtime in parentheses)

| Method category | Model | TSP-20 | TSP-50 | TSP-100 |
| --- | --- | --- | --- | --- |
| Optimal | Concorde | 3.84 | 5.70 | 7.76 |
| Pointer-network-based (attention mechanism) | Vinyals [30] | 3.88 | 7.66 | — |
| | Bello [31] | 3.89 | 5.95 | 8.30 |
| | Nazari [32] | 3.97 | 6.08 | 8.44 |
| | Deudon [33] | 3.86 | 5.81 | 8.85 |
| | Deudon [33] + 2-opt | 3.85 | 5.85 | 8.17 |
| | Kool [34] (greedy) | 3.85 (0s) | 5.80 (2s) | 8.12 (6s) |
| | Kool [34] (sampling) | 3.84 (5m) | 5.73 (24m) | 7.94 (1h) |
| Graph-neural-network-based | Dai [37] | 3.89 | 5.99 | 8.31 |
| | Nowak [40] | 3.93 | — | — |
| | Joshi [41] (greedy) | 3.86 (6s) | 5.87 (55s) | 8.41 (6m) |
| | Joshi [41] (beam search) | 3.84 (12m) | 5.70 (18m) | 7.87 (40m) |
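Table 2 also shows a decoding trade-off: greedy decoding is fastest, while sampling or beam search buys better tours at higher runtime (e.g., Joshi [41] improves from 8.41 to 7.87 on TSP-100). The following toy sketch illustrates beam search over a learned edge-probability "heatmap" in the spirit of [41]; the random heatmap and all names here are hypothetical stand-ins for a trained GCN's output, not the cited implementation.

```python
import numpy as np

def beam_search_tour(heat, beam_width=3):
    """Keep the beam_width best partial tours at each construction step."""
    n = heat.shape[0]
    beams = [(0.0, [0], {0})]                 # (log-score, partial tour, visited)
    for _ in range(n - 1):
        candidates = []
        for score, tour, visited in beams:
            last = tour[-1]
            for j in range(n):
                if j not in visited:          # extend by every unvisited city
                    candidates.append((score + np.log(heat[last, j] + 1e-12),
                                       tour + [j], visited | {j}))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]       # prune to the best partial tours
    return beams[0][1]                        # best complete tour found

heat = np.random.rand(20, 20)                 # stand-in for GCN edge probabilities
print(beam_search_tour(heat, beam_width=5))
```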
Table 3 Comparison of models on VRP.
Table 4 Summary and comparison of algorithms for different combinatorial optimization problems

| Combinatorial optimization problem | References | Model details |
| --- | --- | --- |
| TSP | [30−36] | Ptr-Net-based architecture (encoder-decoder-attention) |
| | [37] | GNN + DQN |
| | [40, 41] | GNN + supervised training + beam search |
| VRP | [32, 34] | Ptr-Net-based architecture (encoder-decoder-attention) |
| | [47, 49, 50] | DRL-trained local search operators; [47]: Ptr-Net model, [49]: graph attention model, [50]: Transformer attention model |
| Minimum vertex cover (MVC) | [37, 38, 48] | GNN + RL |
| | [39] | GNN + supervised training + tree search |
| Maximum cut (MaxCut) | [37] | GNN + DQN |
| | [57] | Message Passing Neural Network (MPNN) + DQN |
| | [58] * | CNN & RNN + PPO |
| Satisfiability (SAT) | [39, 48] | GNN + supervised training / RL |
| Minimum dominating set (MDS) | [48] | GNN + RL |
| | [59] * | Decision diagram + RL |
| Maximal clique (MC) | [39, 48] | GNN + supervised training / RL |
| Maximum independent set (MIS) | [39] | GNN + supervised training + tree search |
| | [60] * | GNN + RL + Monte Carlo tree search |
| Knapsack | [31] | Ptr-Net + RL |
| Job-shop scheduling | [47] | LSTM + RL-trained local search operators |
| Bin packing (BPP) | [61] * | LSTM + RL |
| | [62] * | NN + RL + Monte Carlo tree search |
| Graph coloring | [48] | GNN + RL |
| | [63] * | LSTM + RL + Monte Carlo tree search |
[1] Papadimitriou CH, Steiglitz K. Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, 1998.
[2] Festa P. A brief introduction to exact, approximation, and heuristic algorithms for solving hard combinatorial optimization problems. In: 2014 16th International Conference on Transparent Optical Networks (ICTON), IEEE, 2014. 1−20.
[3] Lawler EL, Wood DE. Branch-and-bound methods: A survey. Operations Research, 1966, 14(4): 699−719. doi: 10.1287/opre.14.4.699
[4] Bertsekas DP. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 1995.
[5] Sniedovich M. Dynamic Programming: Foundations and Principles. CRC Press, 2010.
[6] Williamson DP, Shmoys DB. The Design of Approximation Algorithms. Cambridge University Press, 2011.
[7] Vazirani VV. Approximation Algorithms. Springer Science & Business Media, 2013.
[8] Hochba DS. Approximation algorithms for NP-hard problems. ACM SIGACT News, 1997, 28(2): 40−52. doi: 10.1145/261342.571216
[9] Teoh EJ, Tang H, Tan KC. A columnar competitive model with simulated annealing for solving combinatorial optimization problems. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, IEEE, 2006. 3254−3259.
[10] Van Laarhoven PJM, Aarts EHL, Lenstra JK. Job shop scheduling by simulated annealing. Operations Research, 1992, 40(1): 113−125. doi: 10.1287/opre.40.1.113
[11] Barnes JW, Laguna M. Solving the multiple-machine weighted flow time problem using tabu search. IIE Transactions, 1993, 25(2): 121−128. doi: 10.1080/07408179308964284
[12] Basu S. Tabu search implementation on traveling salesman problem and its variations: a literature survey. American Journal of Operations Research, 2012, 2(2): 163−173. doi: 10.4236/ajor.2012.22019
[13] Halim AH, Ismail I. Combinatorial optimization: comparison of heuristic algorithms in travelling salesman problem. Archives of Computational Methods in Engineering, 2019, 26(2): 367−380. doi: 10.1007/s11831-017-9247-y
[14] Rezoug A, Bader-El-Den M, Boughaci D. Guided genetic algorithm for the multidimensional knapsack problem. Memetic Computing, 2018, 10(1): 29−42. doi: 10.1007/s12293-017-0232-7
[15] Lin BL, Sun X, Salous S. Solving travelling salesman problem with an improved hybrid genetic algorithm. Journal of Computer and Communications, 2016, 4(15): 98−106. doi: 10.4236/jcc.2016.415009
[16] Prado RS, Silva RCP, Guimarães FG, Neto OM. Using differential evolution for combinatorial optimization: A general approach. In: 2010 IEEE International Conference on Systems, Man and Cybernetics, IEEE, 2010. 11−18.
[17] Onwubolu GC, Davendra D. Differential Evolution: A Handbook for Global Permutation-Based Combinatorial Optimization. Vol 175. Springer Science & Business Media, 2009.
[18] Deng W, Xu J, Zhao H. An improved ant colony optimization algorithm based on hybrid strategies for scheduling problem. IEEE Access, 2019, 7: 20281−20292. doi: 10.1109/ACCESS.2019.2897580
[19] Ramadhani T, Hertono GF, Handari BD. An Ant Colony Optimization algorithm for solving the fixed destination multi-depot multiple traveling salesman problem with non-random parameters. In: AIP Conference Proceedings, AIP Publishing LLC, 2017. 30123.
[20] Zhong Y, Lin J, Wang L, Zhang H. Discrete comprehensive learning particle swarm optimization algorithm with Metropolis acceptance criterion for traveling salesman problem. Swarm and Evolutionary Computation, 2018, 42: 77−88. doi: 10.1016/j.swevo.2018.02.017
[21] Nouiri M, Bekrar A, Jemai A, Niar S, Ammari AC. An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem. Journal of Intelligent Manufacturing, 2018, 29(3): 603−615. doi: 10.1007/s10845-015-1039-3
[22] Lourenço HR, Martin OC, Stützle T. Iterated local search: Framework and applications. In: Handbook of Metaheuristics, Springer, 2019. 129−168.
[23] Grasas A, Juan AA, Lourenço HR. SimILS: a simulation-based extension of the iterated local search metaheuristic for stochastic combinatorial optimization. Journal of Simulation, 2016, 10(1): 69−77. doi: 10.1057/jos.2014.25
[24] Zhang G, Zhang L, Song X, Wang Y, Zhou C. A variable neighborhood search based genetic algorithm for flexible job shop scheduling problem. Cluster Computing, 2019, 22(5): 11561−11572.
[25] Hore S, Chatterjee A, Dewanji A. Improving variable neighborhood search to solve the traveling salesman problem. Applied Soft Computing, 2018, 68: 83−91. doi: 10.1016/j.asoc.2018.03.048
[26] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge. Nature, 2017, 550(7676): 354−359. doi: 10.1038/nature24270
[27] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529−533. doi: 10.1038/nature14236
[28] Hopfield JJ, Tank DW. "Neural" computation of decisions in optimization problems. Biological Cybernetics, 1985, 52(3): 141−152.
[29] Smith KA. Neural networks for combinatorial optimization: a review of more than a decade of research. INFORMS Journal on Computing, 1999, 11(1): 15−34. doi: 10.1287/ijoc.11.1.15
[30] Vinyals O, Fortunato M, Jaitly N. Pointer networks. In: Advances in Neural Information Processing Systems, 2015.
[31] Bello I, Pham H, Le QV, Norouzi M, Bengio S. Neural combinatorial optimization with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Workshop Track Proceedings, 2017.
[32] Nazari M, Oroojlooy A, Takáč M, Snyder LV. Reinforcement learning for solving the vehicle routing problem. In: Advances in Neural Information Processing Systems, 2018.
[33] Deudon M, Cournut P, Lacoste A, Adulyasak Y, Rousseau LM. Learning heuristics for the TSP by policy gradient. In: International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Springer, 2018. 170−181.
[34] Kool W, Van Hoof H, Welling M. Attention, learn to solve routing problems! In: 7th International Conference on Learning Representations, ICLR, 2019.
[35] Ma Q, Ge S, He D, Thaker D, Drori I. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. In: AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications, 2020.
[36] Li K, Zhang T, Wang R. Deep reinforcement learning for multiobjective optimization. IEEE Transactions on Cybernetics, 2020.
[37] Dai H, Khalil EB, Zhang Y, Dilkina B, Song L. Learning combinatorial optimization algorithms over graphs. In: Advances in Neural Information Processing Systems, 2017. 6348−6358.
[38] Mittal A, Dhawan A, Manchanda S, Medya S, Ranu S, Singh A. Learning heuristics over large graphs via deep reinforcement learning. arXiv preprint arXiv:1903.03332, 2019.
[39] Li Z, Chen Q, Koltun V. Combinatorial optimization with graph convolutional networks and guided tree search. In: Advances in Neural Information Processing Systems, 2018. 539−548.
[40] Nowak A, Villar S, Bandeira AS, Bruna J. A note on learning algorithms for quadratic assignment with graph neural networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML), 2017. 1050: 22.
[41] Joshi CK, Laurent T, Bresson X. An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227, 2019.
[42] Helsgaun K. An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Roskilde: Roskilde University, 2017.
[43] Perron L, Furnon V. Google's OR-Tools. URL: https://developers.google.com/optimization, 2019.
[44] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2015. URL: http://www.gurobi.com.
[45] Applegate D, Bixby R, Chvatal V, Cook W. Concorde TSP solver. 2006.
[46] Bengio Y, Lodi A, Prouvost A. Machine learning for combinatorial optimization: a methodological tour d'horizon. arXiv preprint arXiv:1811.06128, 2018.
[47] Chen X, Tian Y. Learning to perform local rewriting for combinatorial optimization. In: Advances in Neural Information Processing Systems, 2019. 6281−6292.
[48] Yolcu E, Póczos B. Learning local search heuristics for boolean satisfiability. In: Advances in Neural Information Processing Systems, 2019. 7992−8003.
[49] Gao L, Chen M, Chen Q, Luo G, Zhu N, Liu Z. Learn to design the heuristics for vehicle routing problem. arXiv preprint arXiv:2002.08539, 2020.
[50] Lu H, Zhang X, Yang S. A learning-based iterative method for solving vehicle routing problems. In: International Conference on Learning Representations, 2020.
[51] Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Transactions on Neural Networks, 2008, 20(1): 61−80.
[52] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Advances in Neural Information Processing Systems, 2017. 5998−6008.
[53] Hsu C-H, Chang S-H, Liang J-H, et al. MONAS: Multi-objective neural architecture search using reinforcement learning. arXiv preprint arXiv:1806.10332, 2018.
[54] Mossalam H, Assael YM, Roijers DM, Whiteson S. Multi-objective deep reinforcement learning. arXiv preprint arXiv:1610.02707, 2016.
[55] Joshi CK, Laurent T, Bresson X. On learning paradigms for the travelling salesman problem. In: NeurIPS Workshop on Graph Representation Learning, 2019.
[56] Joshi CK, Cappart Q, Rousseau L-M, Laurent T, Bresson X. Learning TSP requires rethinking generalization. arXiv preprint arXiv:2006.07054, 2020.
[57] Barrett TD, Clements WR, Foerster JN, Lvovsky AI. Exploratory combinatorial optimization with reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2020. 3243−3250.
[58] Beloborodov D, Ulanov AE, Foerster JN, Whiteson S, Lvovsky AI. Reinforcement learning enhanced quantum-inspired algorithm for combinatorial optimization. arXiv preprint arXiv:2002.04676, 2020.
[59] Cappart Q, Goutierre E, Bergman D, Rousseau L-M. Improving optimization bounds using machine learning: Decision diagrams meet deep reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019. 1443−1451.
[60] Abe K, Sato I, Sugiyama M. Solving NP-hard problems on graphs by reinforcement learning without domain knowledge. Simulation, 2019, 1: 1.
[61] Hu H, Zhang X, Yan X, Wang L, Xu Y. Solving a new 3D bin packing problem with deep reinforcement learning method. arXiv preprint arXiv:1708.05930, 2017.
[62] Laterre A, Fu Y, Jabri MK, et al. Ranked reward: Enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint arXiv:1807.01672, 2018.
[63] Huang J, Patwary M, Diamos G. Coloring big graphs with AlphaGoZero. arXiv preprint arXiv:1902.10162, 2019.
[64] Li J, Shi W, Zhang N, Shen X. Delay-aware VNF scheduling: A reinforcement learning approach with variable action set. IEEE Transactions on Cognitive Communications and Networking, 2020.
[65] Mijumbi R, Hasija S, Davy S, Davy A, Jennings B, Boutaba R. Topology-aware prediction of virtual network function resource requirements. IEEE Transactions on Network and Service Management, 2017, 14(1): 106−120. doi: 10.1109/TNSM.2017.2666781
[66] Mijumbi R, Hasija S, Davy S, Davy A, Jennings B, Boutaba R. A connectionist approach to dynamic resource management for virtualised network functions. In: 2016 12th International Conference on Network and Service Management (CNSM), IEEE, 2016. 1−9.
[67] Quang PTA, Hadjadj-Aoul Y, Outtagarts A. A deep reinforcement learning approach for VNF forwarding graph embedding. IEEE Transactions on Network and Service Management, 2019, 16(4): 1318−1331. doi: 10.1109/TNSM.2019.2947905
[68] Solozabal R, Ceberio J, Sanchoyerto A, Zabala L, Blanco B, Liberal F. Virtual network function placement optimization with deep reinforcement learning. IEEE Journal on Selected Areas in Communications, 2020, 38(2).
[69] Liu Q, Han T, Moges E. EdgeSlice: Slicing wireless edge computing network with decentralized deep reinforcement learning. arXiv preprint arXiv:2003.12911, 2020.
[70] Van Huynh N, Thai Hoang D, Nguyen DN, Dutkiewicz E. Optimal and fast real-time resource slicing with deep dueling neural networks. IEEE Journal on Selected Areas in Communications, 2019, 37(6).
[71] Mseddi A, Jaafar W, Elbiaze H, Ajib W. Intelligent resource allocation in dynamic fog computing environments. In: 2019 IEEE 8th International Conference on Cloud Networking (CloudNet), IEEE, 2019. 1−7.
[72] Almasan P, Suárez-Varela J, Badia-Sampera A, Rusek K, Barlet-Ros P, Cabellos-Aparicio A. Deep reinforcement learning meets graph neural networks: exploring a routing optimization use case. arXiv preprint arXiv:1910.07421, 2020.
[73] Meng X, Inaltekin H, Krongold B. Deep reinforcement learning-based topology optimization for self-organized wireless sensor networks. In: 2019 IEEE Global Communications Conference (GLOBECOM), IEEE, 2019. 1−6.
[74] Lu J, Feng L, Yang J, Hassan MM, Alelaiwi A, Humar I. Artificial agent: The fusion of artificial intelligence and a mobile agent for energy-efficient traffic control in wireless sensor networks. Future Generation Computer Systems, 2019, 95: 45−51. doi: 10.1016/j.future.2018.12.024
[75] Zhang S, Shen W, Zhang M, Cao X, Cheng Y. Experience-driven wireless D2D network link scheduling: A deep learning approach. In: IEEE International Conference on Communications, 2019. 1−6.
[76] Huang L, Bi S, Zhang YJ. Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks. IEEE Transactions on Mobile Computing, 2019.
[77] Wang J, Hu J, Min G, Zhan W, Ni Q, Georgalas N. Computation offloading in multi-access edge computing using a deep sequential model based on reinforcement learning. IEEE Communications Magazine, 2019, 57(5): 64−69. doi: 10.1109/MCOM.2019.1800971
[78] Jiang Q, Zhang Y, Yan J. Neural combinatorial optimization for energy-efficient offloading in mobile edge computing. IEEE Access, 2020, 8.
[79] Yu JJQ, Yu W, Gu J. Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(10).
[80] Holler J, Vuorio R, Qin Z, et al. Deep reinforcement learning for multi-driver vehicle dispatching and repositioning problem. In: 2019 IEEE International Conference on Data Mining (ICDM), IEEE, 2019. 1090−1095.
[81] Liang X, Du X, Wang G, Han Z. A deep reinforcement learning network for traffic light cycle control. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243−1253. doi: 10.1109/TVT.2018.2890726
[82] Chen X, Tian Y. Learning to progressively plan. arXiv preprint arXiv:1810.00337, 2018.
[83] Zheng P, Zuo LL, Wang JL, Zhang J. Pointer networks for solving the permutation flow shop scheduling problem. In: Proceedings of International Conference on Computers and Industrial Engineering, IEEE, 2018. 2−5.
[84] Pan R, Dong X, Han S. Solving permutation flowshop problem with deep reinforcement learning. In: 2020 Prognostics and Health Management Conference (PHM-Besançon), IEEE, 2020. 349−353.
[85] Mirhoseini A, Pham H, Le QV, et al. Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972, 2017.
[86] Mirhoseini A, Goldie A, Pham H, Steiner B, Le QV, Dean J. A hierarchical model for device placement. In: International Conference on Learning Representations, 2018.
[87] François-Lavet V, Taralla D, Ernst D, Fonteneau R. Deep reinforcement learning solutions for energy microgrids management. In: European Workshop on Reinforcement Learning (EWRL 2016), 2016.
[88] Zhang Zi-Dong, Qiu Cai-Ming, Zhang Dong-Xia, Xu Shu-Wei, He Xing. A coordinated control method for hybrid energy storage system in microgrid based on deep reinforcement learning. Power System Technology, 2019, 43(6): 1914−1921. (in Chinese)
[89] Valladares W, Galindo M, Gutiérrez J, et al. Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm. Building and Environment, 2019, 155: 105−117. doi: 10.1016/j.buildenv.2019.03.038
[90] Mocanu E, Mocanu DC, Nguyen PH, et al. On-line building energy optimization using deep reinforcement learning. IEEE Transactions on Smart Grid, 2018, 10(4): 3698−3708.
-
