Research Reviews of Combinatorial Optimization Methods Based on Deep Reinforcement Learning

Li Kai-Wen, Zhang Tao, Wang Rui, Qin Wei-Jian, He Hui-Hui, Huang Hong

Citation: Li Kai-Wen, Zhang Tao, Wang Rui, Qin Wei-Jian, He Hui-Hui, Huang Hong. Research reviews of combinatorial optimization methods based on deep reinforcement learning. Acta Automatica Sinica, 2020, 41(x): 1−17. doi: 10.16383/j.aas.c200551

doi: 10.16383/j.aas.c200551
Funds: Supported by the National Natural Science Foundation of China (61773390, 72071205), the Key Research Program of the National University of Defense Technology (ZK18-02-09), the Hunan Youth Elite Program (2018RS3081), and the Independent Research Program (ZZKY-ZX-11-04)
    Author biographies:

    Li Kai-Wen: Ph.D. candidate in the Department of Management Science and Engineering, National University of Defense Technology. His research interests include energy internet technology, and deep reinforcement learning and optimization. E-mail: likaiwen@nudt.edu.cn

    Zhang Tao: Professor in the Department of Management Science and Engineering, National University of Defense Technology. His research interests include energy internet technology, and optimization and decision-making based on computational intelligence. E-mail: zhangtao@nudt.edu.cn

    Wang Rui: Associate professor in the Department of Management Science and Engineering, National University of Defense Technology. His research interests include energy internet technology, computational intelligence theory and methods, and multi-objective evolutionary algorithms. Corresponding author of this paper. E-mail: ruiwangnudt@gmail.com

    Qin Wei-Jian: Master's student in the Department of Management Science and Engineering, National University of Defense Technology. His research interests include energy internet technology, and deep reinforcement learning and optimization. E-mail: qinweijian@nudt.edu.cn

    He Hui-Hui: Master's student in the Department of Management Science and Engineering, National University of Defense Technology. His research interests include energy internet technology, and optimization and decision-making based on computational intelligence. E-mail: hehuihui@nudt.edu.cn

    Huang Hong: Master's student in the Department of Management Science and Engineering, National University of Defense Technology. His research interests include energy internet technology, and optimization and decision-making based on computational intelligence. E-mail: huanghong@nudt.edu.cn

  • Abstract: Combinatorial optimization problems arise throughout defense, transportation, industry, daily life, and many other domains. For decades, traditional operations research methods have been the main tools for solving them, but as the scale of practical problems keeps growing and real-time solving requirements become more demanding, traditional algorithms face heavy computational burdens and can hardly solve combinatorial optimization problems online. In recent years, with the rapid development of deep learning, the remarkable achievements of deep reinforcement learning in Go, robotics, and other fields have demonstrated its strong learning and sequential decision-making capabilities. Accordingly, a number of new methods that apply deep reinforcement learning to combinatorial optimization have emerged; they offer fast solving speed and strong model generalization, providing an entirely new way to tackle combinatorial optimization. This paper therefore reviews recent theoretical methods and application studies on solving combinatorial optimization problems with deep reinforcement learning, summarizes their basic principles, related approaches, and applications, and points out several open problems that urgently need to be addressed in this direction.
  • Fig. 1  Schematic diagram of the Pointer Network model
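
    To make the pointing mechanism of Fig. 1 concrete, the following is a minimal sketch (our illustration, not the authors' code) of the additive attention step with which a Ptr-Net decoder selects the next city; the module name and tensor shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """Additive attention that "points" from the decoder state back to the encoder states."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects encoder states e_j
        self.W_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects the decoder state d
        self.v = nn.Linear(hidden_dim, 1, bias=False)               # scores each candidate node

    def forward(self, enc_states, dec_state, visited_mask):
        # enc_states: (batch, n_nodes, hidden); dec_state: (batch, hidden)
        # visited_mask: (batch, n_nodes) bool, True for nodes already in the tour
        scores = self.v(torch.tanh(self.W_enc(enc_states)
                                   + self.W_dec(dec_state).unsqueeze(1))).squeeze(-1)
        scores = scores.masked_fill(visited_mask, float("-inf"))  # forbid revisiting nodes
        return torch.softmax(scores, dim=-1)  # probability of picking each node next
```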

    Table 1  Comparison of existing algorithms: models, training methods, problems solved, and optimization performance

| Method category | Study | Model and training method | Problems solved and performance |
| --- | --- | --- | --- |
| End-to-end methods based on Pointer Network | Vinyals et al., 2015 [30] | Ptr-Net + supervised training | 30-TSP: near-optimal, better than heuristics; 40-, 50-TSP: clear gap to the optimum. Also convex hull and Delaunay triangulation problems. |
| | Bello et al., 2017 [31] | Ptr-Net + REINFORCE with critic baseline | 50-TSP: better than [30]; 100-TSP: close to the Concorde optimum; 200-Knapsack: reaches the optimum. |
| | Nazari et al., 2018 [32] | Ptr-Net + REINFORCE with critic baseline | 100-TSP: similar quality to [31] with about 60% less training time; 100-CVRP / stochastic CVRP: better than several heuristics. |
| | Deudon et al., 2018 [33] | Transformer attention + REINFORCE with critic baseline | 20-, 50-TSP: better than [31]; 100-TSP: similar to [31]. |
| | Kool et al., 2019 [34] | Transformer attention + REINFORCE with rollout baseline | 100-TSP: better than [30-33, 37, 40]; 100-CVRP, 100-SDVRP, 100-OP, 100-PCTSP, SPCTSP: close to the Gurobi optimum, better than various heuristics. |
| | Ma et al., 2020 [35] | Graph Pointer Network + hierarchical RL | 20-, 50-TSP: better than [31, 37], worse than [34]; 250-, 500-, 1000-TSP: better than [31, 34]; 20-TSPTW: better than OR-Tools and ant colony optimization. |
| | Li et al., 2020 [36] | Ptr-Net + REINFORCE with critic baseline + decomposition strategy / parameter transfer | 40-, 100-, 150-, 200-, 500-city bi-/tri-objective TSP: better than MOEA/D, NSGA-II, MOGLS. |
| End-to-end methods based on graph neural networks | Dai et al., 2017 [37] | structure2vec + DQN | 1200-TSP: close to [31]; 1200-MVC (minimum vertex cover): near-optimal; 1200-MaxCut (maximum cut): near-optimal. |
| | Mittal et al., 2019 [38] | GCN + DQN | 2k- to 20k-node MCP (maximum coverage problem): better than [37]; 10k-, 20k-, 50k-MVC: better than [37]. |
| | Li et al., 2018 [39] | GCN + supervised training + guided tree search | Real-world MVC, MIS (maximum independent set), MC (maximal clique), and Satisfiability instances: better than [37]. |
| | Nowak et al., 2017 [40] | GNN + supervised training + beam search | 20-TSP: worse than [30]. |
| | Joshi et al., 2019 [41] | GCN + supervised training + beam search | 20-, 50-, 100-TSP: slightly better than [30, 31, 33, 34], better than [37]. |
| Local search methods improved by deep reinforcement learning | Chen et al., 2019 [47] | Ptr-Net + actor-critic | 20-CVRP: reaches the optimum; 50-, 100-CVRP: better than [32, 34] and OR-Tools; job-shop scheduling: better than OR-Tools and DeepRM. |
| | Yolcu et al., 2019 [48] | GNN + REINFORCE | Real-world Satisfiability, MIS, MVC, MC, and graph coloring instances: reaches the optimum in fewer search steps, but each step is slower than traditional algorithms. |
| | Gao et al., 2020 [49] | Graph attention + PPO | 100-CVRP: better than [34]; 100-CVRPTW: better than several heuristics; 400-CVRPTW: worse than one heuristic, better than the others. |
| | Lu et al., 2020 [50] | Transformer attention + REINFORCE | 20-, 50-, 100-CVRP: better than [32, 34, 47], OR-Tools, and LKH3, with far lower runtime than LKH3. |
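
    Most end-to-end entries in Table 1 are trained with REINFORCE plus a baseline ([31, 32] use a learned critic, [34] a greedy rollout). The sketch below illustrates one such update under stated assumptions: `policy.sample_tour`, `critic`, and `tour_length` are hypothetical stand-ins for the policy network, the baseline network, and the objective.

```python
import torch
import torch.nn.functional as F

def reinforce_step(policy, critic, coords, policy_opt, critic_opt):
    """One policy-gradient update with a learned critic baseline (cf. [31, 32]).
    `policy`, `critic`, and `tour_length` are hypothetical placeholders."""
    tour, log_prob = policy.sample_tour(coords)   # sampled solution and its total log-probability
    cost = tour_length(coords, tour)              # objective to minimise, e.g. TSP tour length, shape (batch,)
    baseline = critic(coords).squeeze(-1)         # critic's estimate of the expected cost, shape (batch,)

    advantage = (cost - baseline).detach()        # how much better/worse than expected
    policy_loss = (advantage * log_prob).mean()   # REINFORCE with baseline
    critic_loss = F.mse_loss(baseline, cost.detach())

    policy_opt.zero_grad(); policy_loss.backward(); policy_opt.step()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    return cost.mean().item()
```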

    Table 2  Comparison of end-to-end models on TSP (tour length, lower is better; solving time in parentheses where reported)

| Method category | Model | TSP-20 | TSP-50 | TSP-100 |
| --- | --- | --- | --- | --- |
| Optimal | Concorde | 3.84 | 5.70 | 7.76 |
| Pointer network (attention) based | Vinyals [30] | 3.88 | 7.66 | - |
| | Bello [31] | 3.89 | 5.95 | 8.30 |
| | Nazari [32] | 3.97 | 6.08 | 8.44 |
| | Deudon [33] | 3.86 | 5.81 | 8.85 |
| | Deudon [33] + 2-opt | 3.85 | 5.85 | 8.17 |
| | Kool [34] (greedy) | 3.85 (0s) | 5.80 (2s) | 8.12 (6s) |
| | Kool [34] (sampling) | 3.84 (5m) | 5.73 (24m) | 7.94 (1h) |
| Graph neural network based | Dai [37] | 3.89 | 5.99 | 8.31 |
| | Nowak [40] | 3.93 | - | - |
| | Joshi [41] (greedy) | 3.86 (6s) | 5.87 (55s) | 8.41 (6m) |
| | Joshi [41] (beam search) | 3.84 (12m) | 5.70 (18m) | 7.87 (40m) |
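
    The "(greedy)" and "(sampling)" rows in Table 2 differ only in how the trained policy is decoded: greedy decoding takes the most likely city at every step, while sampling draws many tours and keeps the best one, closing most of the gap to Concorde at the cost of the much longer runtimes in parentheses. A hedged sketch, assuming a hypothetical `policy.sample_tour(coords, greedy=...)` interface and `tour_length` helper:

```python
import torch

@torch.no_grad()
def greedy_decode(policy, coords):
    # One deterministic pass: take the argmax action at every decoding step.
    tour, _ = policy.sample_tour(coords, greedy=True)
    return tour

@torch.no_grad()
def sampling_decode(policy, coords, n_samples=1280):
    # Draw many tours from the learned distribution and keep the shortest one.
    best_tour, best_cost = None, float("inf")
    for _ in range(n_samples):
        tour, _ = policy.sample_tour(coords, greedy=False)
        cost = tour_length(coords, tour)  # `tour_length` is a hypothetical objective helper
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour
```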

    Table 3  Comparison of models on VRP (route cost, lower is better; solving time in parentheses where reported)

| Model | VRP-20 | VRP-50 | VRP-100 |
| --- | --- | --- | --- |
| LKH3 | 6.14 (2h) | 10.38 (7h) | 15.65 (13h) |
| Nazari [32] | 6.40 | 11.15 | 16.96 |
| Kool [34] (greedy) | 6.40 (1s) | 10.98 (3s) | 16.80 (8s) |
| Kool [34] (sampling) | 6.25 (6m) | 10.62 (28m) | 16.23 (2h) |
| Chen [47] | 6.12 | 10.51 | 16.10 |
| Lu [50] | 6.12 (12m) | 10.35 (17m) | 15.57 (24m) |

    Table 4  Summary and comparison of algorithms for different combinatorial optimization problems

| Combinatorial optimization problem | References | Model details |
| --- | --- | --- |
| TSP | [30-36] | Ptr-Net architecture (encoder-decoder-attention) |
| | [37] | GNN + DQN |
| | [40, 41] | GNN + supervised training + beam search |
| VRP | [32, 34] | Ptr-Net architecture (encoder-decoder-attention) |
| | [47, 49, 50] | DRL-trained local search operators; [47]: Ptr-Net model, [49]: graph attention model, [50]: Transformer attention model |
| Minimum vertex cover (MVC) | [37, 38, 48] | GNN + RL |
| | [39] | GNN + supervised training + tree search |
| Maximum cut (MaxCut) | [37] | GNN + DQN |
| | [57] | Message Passing Neural Network (MPNN) + DQN |
| | [58] | * CNN & RNN + PPO |
| Satisfiability | [39, 48] | GNN + supervised training / RL |
| Minimum dominating set (MDS) | [48] | GNN + RL |
| | [59] | * Decision diagram + RL |
| Maximal clique (MC) | [39, 48] | GNN + supervised training / RL |
| Maximum independent set (MIS) | [39] | GNN + supervised training + tree search |
| | [60] | * GNN + RL + Monte Carlo tree search |
| Knapsack | [31] | Ptr-Net + RL |
| Job-shop scheduling | [47] | LSTM + RL-trained local search operators |
| Bin packing (BPP) | [61] | * LSTM + RL |
| | [62] | * NN + RL + Monte Carlo tree search |
| Graph coloring | [48] | GNN + RL |
| | [63] | * LSTM + RL + Monte Carlo tree search |
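
    Many of the "GNN + RL" entries in Table 4 share the same greedy construction loop: a graph network embeds the partially solved instance, and a Q-network scores which node to add next. A minimal sketch under stated assumptions (`embed_graph`, `q_net`, and `is_complete` are hypothetical placeholders, not any specific paper's API):

```python
import torch

@torch.no_grad()
def greedy_construct(graph, embed_graph, q_net, is_complete):
    """Greedy solution construction in the style of the GNN + DQN methods [37, 38]."""
    solution = []
    while not is_complete(graph, solution):
        node_emb = embed_graph(graph, solution)     # (n_nodes, d): embeddings conditioned on the partial solution
        q_values = q_net(node_emb).squeeze(-1)      # (n_nodes,): estimated value of adding each node
        chosen = torch.tensor(solution, dtype=torch.long)
        q_values[chosen] = float("-inf")            # never re-add an already-chosen node
        solution.append(int(q_values.argmax()))
    return solution
```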
  • [1] Papadimitriou CH, Steiglitz K. Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, 1998.
    [2] Festa P. A brief introduction to exact, approximation, and heuristic algorithms for solving hard combinatorial optimization problems. In: 2014 16th International Conference on Transparent Optical Networks (ICTON), IEEE, 2014.1−20.
    [3] Lawler EL, Wood DE. Branch-and-bound methods: A survey. Operations research, 1966, 14(4): 699−719 doi: 10.1287/opre.14.4.699
    [4] Bertsekas DP. Dynamic Programming and Optimal Control. Athena scientific Belmont, MA, 1995.
    [5] Sniedovich M. Dynamic Programming: Foundations and Principles. CRC press, 2010.
    [6] Williamson DP, Shmoys DB. The Design of Approximation Algorithms. Cambridge university press, 2011.
    [7] Vazirani V V. Approximation Algorithms. Springer Science & Business Media, 2013.
    [8] Hochba DS. Approximation algorithms for NP-hard problems. ACM Sigact News, 1997, 28(2): 40−52 doi: 10.1145/261342.571216
    [9] Teoh EJ, Tang H, Tan KC. A columnar competitive model with simulated annealing for solving combinatorial optimization problems. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, IEEE, 2006.3254−3259.
    [10] Van Laarhoven PJM, Aarts EHL, Lenstra JK. Job shop scheduling by simulated annealing. Operations research, 1992, 40(1): 113−125 doi: 10.1287/opre.40.1.113
    [11] Barnes J W, Laguna M. Solving the multiple-machine weighted flow time problem using tabu search. IIE Transactions, 1993, 25(2): 121−128 doi: 10.1080/07408179308964284
    [12] Basu S. Tabu search implementation on traveling salesman problem and its variations: a literature survey. American Journal of Operations Research, 2012, 2(2): 163−173 doi: 10.4236/ajor.2012.22019
    [13] Halim AH, Ismail I. Combinatorial optimization: comparison of heuristic algorithms in travelling salesman problem. Archives of Computational Methods in Engineering, 2019, 26(2): 367−380 doi: 10.1007/s11831-017-9247-y
    [14] Rezoug A, Bader-El-Den M, Boughaci D. Guided genetic algorithm for the multidimensional knapsack problem. Memetic Computing, 2018, 10(1): 29−42 doi: 10.1007/s12293-017-0232-7
    [15] Lin BL, Sun X, Salous S. Solving travelling salesman problem with an improved hybrid genetic algorithm. Journal of computer and communications, 2016, 4(15): 98−106 doi: 10.4236/jcc.2016.415009
    [16] Prado RS, Silva RCP, Guimarães FG, Neto OM. Using differential evolution for combinatorial optimization: A general approach. In: 2010 IEEE International Conference on Systems, Man and Cybernetics, IEEE, 2010.11−18.
    [17] Onwubolu GC, Davendra D. Differential Evolution: A Handbook for Global Permutation-Based Combinatorial Optimization. Vol 175. Springer Science & Business Media, 2009.
    [18] Deng W, Xu J, Zhao H. An improved ant colony optimization algorithm based on hybrid strategies for scheduling problem. IEEE Access, 2019, 7: 20281−20292 doi: 10.1109/ACCESS.2019.2897580
    [19] Ramadhani T, Hertono GF, Handari BD. An Ant Colony Optimization algorithm for solving the fixed destination multi-depot multiple traveling salesman problem with non-random parameters. In: AIP Conference Proceedings, AIP Publishing LLC, 2017.30123.
    [20] Zhong Y, Lin J, Wang L, Zhang H. Discrete comprehensive learning particle swarm optimization algorithm with Metropolis acceptance criterion for traveling salesman problem. Swarm and Evolutionary Computation, 2018, 42: 77−88 doi: 10.1016/j.swevo.2018.02.017
    [21] Nouiri M, Bekrar A, Jemai A, Niar S, Ammari AC. An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem. Journal of Intelligent Manufacturing, 2018, 29(3): 603−615 doi: 10.1007/s10845-015-1039-3
    [22] Lourenço HR, Martin OC, Stützle T. Iterated local search: Framework and applications. In: Handbook of Metaheuristics, Springer, 2019.129−168.
    [23] Grasas A, Juan AA, Lourenço HR. SimILS: a simulation-based extension of the iterated local search metaheuristic for stochastic combinatorial optimization. Journal of Simulation, 2016, 10(1): 69−77 doi: 10.1057/jos.2014.25
    [24] Zhang G, Zhang L, Song X, Wang Y, Zhou C. A variable neighborhood search based genetic algorithm for flexible job shop scheduling problem. Cluster Computing, 2019, 22(5): 11561−11572
    [25] Hore S, Chatterjee A, Dewanji A. Improving variable neighborhood search to solve the traveling salesman problem. Applied Soft Computing, 2018, 68: 83−91 doi: 10.1016/j.asoc.2018.03.048
    [26] Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of go without human knowledge. Nature, 2017, 550(7676): 354−359 doi: 10.1038/nature24270
    [27] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529−533 doi: 10.1038/nature14236
    [28] Hopfield JJ, Tank DW. “Neural” computation of decisions in optimization problems. Biological Cybernetics, 1985, 52(3): 141−152
    [29] Smith KA. Neural networks for combinatorial optimization: a review of more than a decade of research. INFORMS Journal on Computing, 1999, 11(1): 15−34 doi: 10.1287/ijoc.11.1.15
    [30] Vinyals O, Fortunato M, Jaitly N. Pointer networks. In: Advances in Neural Information Processing Systems, 2015.
    [31] Bello I, Pham H, Le Q V, Norouzi M, Bengio S. Neural combinatorial optimization with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings, 2017.
    [32] Nazari M, Oroojlooy A, Takáč M, Snyder L V. Reinforcement learning for solving the vehicle routing problem. In: Advances in Neural Information Processing Systems, 2018.
    [33] Deudon M, Cournut P, Lacoste A, Adulyasak Y, Rousseau LM. Learning heuristics for the tsp by policy gradient. In: International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Springer, 2018.170−181.
    [34] Kool W, Van Hoof H, Welling M. Attention, learn to solve routing problems! In: 7th International Conference on Learning Representations, ICLR, 2019.
    [35] Ma Q, Ge S, He D, Thaker D, Drori I. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. In: AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications, 2020.
    [36] Li K, Zhang T, Wang R. Deep Reinforcement Learning for Multiobjective Optimization. IEEE Transactions on Cybernetics, 2020.
    [37] Dai H, Khalil EB, Zhang Y, Dilkina B, Song L. Learning combinatorial optimization algorithms over graphs. In: Advances in Neural Information Processing Systems, 2017.6348−6358.
    [38] Mittal A, Dhawan A, Manchanda S, Medya S, Ranu S, Singh A. Learning heuristics over large graphs via deep reinforcement learning. arXiv preprint arXiv:1903.03332, 2019.
    [39] Li Z, Chen Q, Koltun V. Combinatorial optimization with graph convolutional networks and guided tree search. In: Advances in Neural Information Processing Systems, 2018.539−548.
    [40] Nowak A, Villar S, Bandeira AS, Bruna J. A note on learning algorithms for quadratic assignment with graph neural networks. In: Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.1050: 22.
    [41] Joshi CK, Laurent T, Bresson X. An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227, 2019.
    [42] Helsgaun K. An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Roskilde: Roskilde University, 2017.
    [43] Perron L, Furnon V. Google's OR-Tools. URL https://developers.google.com/optimization, 2019.
    [44] Gurobi Optimization, Inc. Gurobi Optimizer Reference Manual, 2015. URL: http://www.gurobi.com
    [45] Applegate D, Bixby R, Chvatal V, Cook W. Concorde TSP solver. 2006.
    [46] Bengio Y, Lodi A, Prouvost A. Machine Learning for Combinatorial Optimization: a Methodological Tour d’Horizon. arXiv preprint arXiv:1811.06128, 2018.
    [47] Chen X, Tian Y. Learning to perform local rewriting for combinatorial optimization. In: Advances in Neural Information Processing Systems, 2019.6281−6292.
    [48] Yolcu E, Póczos B. Learning local search heuristics for boolean satisfiability. In: Advances in Neural Information Processing Systems, 2019.7992−8003.
    [49] Gao L, Chen M, Chen Q, Luo G, Zhu N, Liu Z. Learn to design the heuristics for vehicle routing problem. arXiv preprint arXiv:2002.08539, 2020.
    [50] Lu H, Zhang X, Yang S. A Learning-based Iterative Method for Solving Vehicle Routing Problems. In: International Conference on Learning Representations, 2019.
    [51] Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Transactions on Neural Networks, 2008, 20(1): 61−80
    [52] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Advances in Neural Information Processing Systems, 2017.5998−6008.
    [53] Hsu C-H, Chang S-H, Liang J-H, et al. Monas: Multi-objective neural architecture search using reinforcement learning. arXiv preprint arXiv:1806.10332, 2018.
    [54] Mossalam H, Assael YM, Roijers DM, Whiteson S. Multi-objective deep reinforcement learning. arXiv preprint arXiv:1610.02707, 2016.
    [55] Joshi CK, Laurent T, Bresson X. On Learning Paradigms for the Travelling Salesman Problem. In: NeurIPS Workshop on Graph Representation Learning, 2019.
    [56] Joshi CK, Cappart Q, Rousseau L-M, Laurent T, Bresson X. Learning TSP requires rethinking generalization. arXiv preprint arXiv:2006.07054, 2020.
    [57] Barrett TD, Clements WR, Foerster JN, Lvovsky AI. Exploratory combinatorial optimization with reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2020.3243−3250.
    [58] Beloborodov D, Ulanov AE, Foerster JN, Whiteson S, Lvovsky AI. Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization. arXiv preprint arXiv:2002.04676, 2020.
    [59] Cappart Q, Goutierre E, Bergman D, Rousseau L-M. Improving optimization bounds using machine learning: Decision diagrams meet deep reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019.1443−1451.
    [60] Abe K, Sato I, Sugiyama M. Solving NP-Hard Problems on Graphs by Reinforcement Learning without Domain Knowledge. Simulation, 2019, 1: 1
    [61] Hu H, Zhang X, Yan X, Wang L, Xu Y. Solving a new 3d bin packing problem with deep reinforcement learning method. arXiv preprint arXiv:1708.05930, 2017.
    [62] Laterre A, Fu Y, Jabri MK, et al. Ranked reward: Enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint arXiv:1807.01672, 2018.
    [63] Huang J, Patwary M, Diamos G. Coloring big graphs with alphagozero. arXiv preprint arXiv:1902.10162, 2019.
    [64] Li J, Shi W, Zhang N, Shen X. Delay-Aware VNF Scheduling: A Reinforcement Learning Approach with Variable Action Set. IEEE Transactions on Cognitive Communications and Networking, 2020.
    [65] Mijumbi R, Hasija S, Davy S, Davy A, Jennings B, Boutaba R. Topology-aware prediction of virtual network function resource requirements. IEEE Transactions on Network and Service Management, 2017, 14(1): 106−120 doi: 10.1109/TNSM.2017.2666781
    [66] Mijumbi R, Hasija S, Davy S, Davy A, Jennings B, Boutaba R. A connectionist approach to dynamic resource management for virtualised network functions. In: 2016 12th International Conference on Network and Service Management (CNSM), IEEE, 2016.1−9.
    [67] Quang PTA, Hadjadj-Aoul Y, Outtagarts A. A deep reinforcement learning approach for VNF Forwarding Graph Embedding. IEEE Transactions on Network and Service Management, 2019, 16(4): 1318−1331 doi: 10.1109/TNSM.2019.2947905
    [68] Solozabal R, Ceberio J, Sanchoyerto A, Zabala L, Blanco B, Liberal F. Virtual Network Function Placement Optimization with Deep Reinforcement Learning. IEEE Journal on Selected Areas in Communications, 2020, 38(2)
    [69] Liu Q, Han T, Moges E. EdgeSlice: Slicing Wireless Edge Computing Network with Decentralized Deep Reinforcement Learning. arXiv preprint arXiv:2003.12911, 2020.
    [70] Van Huynh N, Thai Hoang D, Nguyen DN, Dutkiewicz E. Optimal and Fast Real-Time Resource Slicing with Deep Dueling Neural Networks. IEEE Journal on Selected Areas in Communications, 2019, 37(6)
    [71] Mseddi A, Jaafar W, Elbiaze H, Ajib W. Intelligent Resource Allocation in Dynamic Fog Computing Environments. In: 2019 IEEE 8th International Conference on Cloud Networking (CloudNet), IEEE, 2019.1−7.
    [72] Almasan P, Suárez-Varela J, Badia-Sampera A, Rusek K, Barlet-Ros P, Cabellos-Aparicio A. Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case. arXiv preprint arXiv:1910.07421, 2020.
    [73] Meng X, Inaltekin H, Krongold B. Deep reinforcement learning-based topology optimization for self-organized wireless sensor networks. In: 2019 IEEE Global Communications Conference (GLOBECOM), IEEE, 2019.1−6.
    [74] Lu J, Feng L, Yang J, Hassan MM, Alelaiwi A, Humar I. Artificial agent: The fusion of artificial intelligence and a mobile agent for energy-efficient traffic control in wireless sensor networks. Future Generation Computer Systems, 2019, 95: 45−51 doi: 10.1016/j.future.2018.12.024
    [75] Zhang S, Shen W, Zhangt M, Cao X, Cheng Y. Experience-Driven Wireless D2D Network Link Scheduling: A Deep Learning Approach. In: IEEE International Conference on Communications, 2019.1−6.
    [76] Huang L, Bi S, Zhang YJ. Deep Reinforcement Learning for Online Computation Offloading in Wireless Powered Mobile-Edge Computing Networks. IEEE Transactions on Mobile Computing, 2019.
    [77] Wang J, Hu J, Min G, Zhan W, Ni Q, Georgalas N. Computation offloading in multi-access edge computing using a deep sequential model based on reinforcement learning. IEEE Communications Magazine, 2019, 57(5): 64−69 doi: 10.1109/MCOM.2019.1800971
    [78] Jiang Q, Zhang Y, Yan J. Neural Combinatorial Optimization for Energy-Efficient Offloading in Mobile Edge Computing. IEEE Access, 2020: 8
    [79] Yu JJQ, Yu W, Gu J. Online Vehicle Routing with Neural Combinatorial Optimization and Deep Reinforcement Learning. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(10)
    [80] Holler J, Vuorio R, Qin Z, et al. Deep Reinforcement Learning for Multi-Driver Vehicle Dispatching and Repositioning Problem. In: 2019 IEEE International Conference on Data Mining (ICDM), IEEE, 2019.1090−1095.
    [81] Liang X, Du X, Wang G, Han Z. A deep reinforcement learning network for traffic light cycle control. IEEE Transactions on Vehicular Technology, 2019, 68(2): 1243−1253 doi: 10.1109/TVT.2018.2890726
    [82] Chen X, Tian Y. Learning to Progressively Plan. arXiv preprint arXiv: 1810.00337, 2018.
    [83] Zheng P, Zuo LL, Wang JL, Zhang J. Pointer networks for solving the permutation flow shop scheduling problem. In: Proceedings of International Conference on Computers and Industrial Engineering, IEEE, 2018.2−5.
    [84] Pan R, Dong X, Han S. Solving Permutation Flowshop Problem with Deep Reinforcement Learning. In: 2020 Prognostics and Health Management Conference (PHM-Besançon), IEEE, 2020.349−353.
    [85] Mirhoseini A, Pham H, Le Q V, et al. Device placement optimization with reinforcement learning. arXiv preprint arXiv:1706.04972, 2017.
    [86] Mirhoseini A, Goldie A, Pham H, Steiner B, Le Q V, Dean J. A hierarchical model for device placement. In: International Conference on Learning Representations, 2018.
    [87] François-Lavet V, Taralla D, Ernst D, Fonteneau R. Deep reinforcement learning solutions for energy microgrids management. In: European Workshop on Reinforcement Learning (EWRL 2016), 2016.
    [88] Zhang Zi-Dong, Qiu Cai-Ming, Zhang Dong-Xia, Xu Shu-Wei, He Xing. A coordinated control method for hybrid energy storage system in microgrid based on deep reinforcement learning. Power System Technology, 2019, 43(6): 1914−1921 (in Chinese)
    [89] Valladares W, Galindo M, Gutiérrez J, et al. Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm. Building and Environment, 2019, 155: 105−117 doi: 10.1016/j.buildenv.2019.03.038
    [90] Mocanu E, Mocanu DC, Nguyen PH, et al. On-line Building Energy Optimization using Deep Reinforcement Learning. IEEE Transactions on Smart Grid, 2018, 10(4): 3698−3708
Publication history
  • Received: 2020-07-14
  • Accepted: 2020-11-04
  • Published online: 2020-12-10
