A Four Directional Cooperative Three-dimensional Packing Method Based on Deep Reinforcement Learning
-
摘要: 物流作为现代经济的重要组成部分, 在国民经济和社会发展中发挥着重要作用. 物流中的三维装箱问题(Three-dimensional bin packing problem, 3D-BPP)是提高物流运作效率必须解决的关键难题之一. 深度强化学习(Deep reinforcement learning, DRL)具有强大的学习与决策能力, 基于DRL的三维装箱方法(Three-dimensional bin packing method based on DRL, DRL-3DBP)已成为智能物流领域的研究热点之一. 现有DRL-3DBP面对大尺寸容器3D-BPP时难以达成动作空间、计算复杂性与探索能力之间的平衡. 为此, 提出一种四向协同装箱(Four directional cooperative packing, FDCP)方法: 两阶段策略网络接收旋转后的容器状态, 生成4个方向的装箱策略; 根据由4个策略采样而得的动作更新对应的4个状态, 选取其中价值最大的对应动作为装箱动作. FDCP在压缩动作空间、减小计算复杂性的同时, 鼓励智能体对4个方向合理装箱位置的探索. 实验结果表明, FDCP在100 × 100大尺寸容器以及20、30、50箱子数量的装箱问题上实现了1.2% ~ 2.9%的空间利用率提升.Abstract: As an important part of the modern economy, logistics plays an important role in the national economy and social development. The three-dimensional bin packing problem (3D-BPP) in logistics is one of the key problems that must be solved to improve the efficiency of logistics operations. Deep reinforcement learning (DRL) has a powerful learning and decision-making ability, and the three-dimensional bin packing method based on DRL (DRL-3DBP) has become one of the research hotspots in the field of intelligent logistics. The existing DRL-3DBPs have difficulty in striking a balance between the action space, computational complexity, and exploration capability when solving 3D-BPP with large-size bins. To this end, this paper proposes a four directional cooperative packing (FDCP) method. The two-stage policy network receives the rotated bin states and generates four directional packing policies. Based on the actions sampled from the four policies, the four states are updated accordingly, and the action corresponding to the highest value is selected as the packing action. FDCP encourages agent to explore reasonable packing positions in all four directions while compressing the action space and reducing computational complexity. Experimental results show that FDCP achieves 1.2% ~ 2.9% improvement in space utilization on the packing problem with 100 × 100 large-sized bin and the numbers of 20, 30, and 50 items.
-
表 1 $ 100 \times 100 $容器装箱算例上的对比结果
Table 1 Comparative results on packing instances with $ 100 \times 100 $ bin
方法 $N=20$ $N=30$ $N=50$ UR (%) Time (s) UR (%) Time (s) UR (%) Time (s) Heuristic-3DBP GA+DBLF 70.2 17.5 69.4 36.3 66.3 71.9 EP 62.7 <1.0 63.8 <1.0 66.3 <1.0 LAFF 58.6 <1.0 59.1 <1.0 61.9 <1.0 EBAFIT 65.4 <1.0 65.9 <1.0 66.1 1.5 DRL-3DBP MTSL 62.4 4.8 60.1 10.2 55.3 23.0 CQL 67.0 1.0 69.3 1.2 73.6 3.3 JIANG 73.5 2.3 76.9 3.2 82.0 10.9 QUE 76.5 1.4 79.3 2.1 82.4 3.5 FDCP 79.4 1.9 81.7 3.1 83.6 5.2 表 2 $ 200 \times 200 $和$ 400 \times 200 $容器算例上各方法的空间利用率UR (%)
Table 2 Space utilization (UR) of each method on instances with $ 200 \times 200 $ and $ 400 \times 200 $ bins (%)
容器尺寸 GA+DBLF EP LAFF EBAFIT MTSL CQL JIANG QUE FDCP 200$\times$200 61.4 63.3 58.0 62.8 50.8 58.7 75.2 80.5 81.2 400$\times$200 58.7 60.1 55.4 60.5 46.9 47.5 70.5 76.7 76.8 表 3 消融实验结果 (%)
Table 3 Results of ablation experiment (%)
方法 $N=20$ $N=30$ $N=50$ FDCP 79.4 81.7 83.6 −CO 75.9 79.2 81.4 −FD 75.9 79.0 81.5 −PN 76.5 79.5 81.9 -
[1] Bortfeldt A, Wascher G. Constraints in container loading——A state-of-the-art review. European Journal of Operational Research, 2013, 229(1): 1−20 doi: 10.1016/j.ejor.2012.12.006 [2] Jiao G S, Huang M, Song Y, Liu H B, Wang X W. Container loading problem based on robotic loader system: An optimization approach. Expert Systems With Applications, 2024, 236: Article No. 121222 doi: 10.1016/j.eswa.2023.121222 [3] Consolini L, Laurini M, Locatelli M. A dynamic programming approach for cooperative pallet-loading manipulators. IEEE Transactions on Automation Science and Engineering, DOI: 10.1109/TASE.2023.3310007 [4] Martello S, Pisinger D, Vigo D. The three-dimensional bin packing problem. Operations Research, 2000, 48(2): 256−267 doi: 10.1287/opre.48.2.256.12386 [5] Hifi M. Exact algorithms for unconstrained three-dimensional cutting problems: A comparative study. Computers & Operations Research, 2004, 31(5): 657−674 [6] Chen C S, Lee S M, Shen Q S. An analytical model for the container loading problem. European Journal of Operational Research, 1995, 80(1): 68−76 doi: 10.1016/0377-2217(94)00002-T [7] Dósa G, Sgall J. First Fit bin packing: A tight analysis. In: Proceedings of the 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013). Dagstuhl, Germany: Schloss Dagstuhl——Leibniz-Zentrum für Informatik, 2013. 538−549 [8] Crainic T G, Perboli G, Tadei R. Extreme point-based heuristics for three-dimensional bin packing. INFORMS Journal on Computing, 2008, 20(3): 368−384 doi: 10.1287/ijoc.1070.0250 [9] Karabulut K, Inceoglu M M. A hybrid genetic algorithm for packing in 3D with deepest bottom left with fill method. In: Proceedings of the Third International Conference on Advances in Information Systems. Berlin, Heidelberg: Springer, 2004. 441−450 [10] Gurbuz M Z, Akyokus S, Emiroglu I, Guran A. An efficient algorithm for 3D rectangular box packing. In: Proceedings of the Selected AAS 2009 Papers. Skopje, Macedonia: Society for ETAI of Republic of Macedonia, 2009. 131−134 [11] Parreño F, Alvarez-Valdés R, Tamarit J M, Oliveira J F. A maximal-space algorithm for the container loading problem. INFORMS Journal on Computing, 2008, 20(3): 412−422 doi: 10.1287/ijoc.1070.0254 [12] Hasan J, Kaabi J, Harrath Y. Multi-objective 3D bin-packing problem. In: Proceedings of the 8th International Conference on Modeling Simulation and Applied Optimization (ICMSAO). Manama, Bahrain: IEEE, 2019. 1−5 [13] 张德富, 魏丽军, 陈青山, 陈火旺. 三维装箱问题的组合启发式算法. 软件学报, 2007, 18(9): 2083−2089 doi: 10.1360/jos182083Zhang De-Fu, Wei Li-Jun, Chen Qing-Shan, Chen Huo-Wang. A combinational heuristic algorithm for the three-dimensional packing problem. Journal of Software, 2007, 18(9): 2083−2089 doi: 10.1360/jos182083 [14] 何琨, 黄文奇. 基于动作空间的三维装箱问题的确定性高效率求解算法. 计算机学报, 2014, 37(8): 1786−1793He Kun, Huang Wen-Qi. An action space based deterministic efficient algorithm for solving the three-dimensional container loading. Chinese Journal of Computers, 2014, 37(8): 1786−1793 [15] Wu Y, Li W K, Goh M, de Souza R. Three-dimensional bin packing problem with variable bin height. European Journal of Operational Research, 2010, 202(2): 347−355 doi: 10.1016/j.ejor.2009.05.040 [16] Yang J L, Liu H W, Liang K B, Zhou L, Zhao J H. Variable neighborhood genetic algorithm for multi-order multi-bin open packing optimization. Applied Soft Computing, 2024, 163: Article No. 111890 doi: 10.1016/j.asoc.2024.111890 [17] Silveira M E, Vieira S M, Sousa J M D C. An ACO algorithm for the 3D bin packing problem in the steel industry. In: Proceedings of the 26th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Berlin, Heidelberg: Springer, 2013. 535−544 [18] 张德富, 彭煜, 朱文兴, 陈火旺. 求解三维装箱问题的混合模拟退火算法. 计算机学报, 2009, 32(11): 2147−2156Zhang De-Fu, Peng Yu, Zhu Wen-Xing, Chen Huo-Wang. A hybrid simulated annealing algorithm for the three-dimensional packing problem. Chinese Journal of Computers, 2009, 32(11): 2147−2156 [19] Tsao Y C, Tai J Y, Vu T L, Chen T H. Multiple bin-size bin packing problem considering incompatible product categories. Expert Systems With Applications, 2024, 247: Article No. 123340 doi: 10.1016/j.eswa.2024.123340 [20] Bortfeldt A, Gehring H. Applying tabu search to container loading problems. In: Proceedings of the Operations Research Proceedings 1997. Berlin, Heidelberg: Springer, 1998. 533−538 [21] 刘胜, 沈大勇, 商秀芹, 赵红霞, 董西松, 王飞跃. 求解三维装箱问题的多层树搜索算法. 自动化学报, 2020, 46(6): 1178−1187Liu Sheng, Shen Da-Yong, Shang Xiu-Qin, Zhao Hong-Xia, Dong Xi-Song, Wang Fei-Yue. A multi-level tree search algorithm for three dimensional container loading problem. Acta Automatica Sinica, 2020, 46(6): 1178−1187 [22] Ren J D, Tian Y J, Sawaragi T. A tree search method for the container loading problem with shipment priority. European Journal of Operational Research, 2011, 214(3): 526−535 doi: 10.1016/j.ejor.2011.04.025 [23] Zhu W B, Lim A. A new iterative-doubling Greedy-Lookahead algorithm for the single container loading problem. European Journal of Operational Research, 2012, 222(3): 408−417 doi: 10.1016/j.ejor.2012.04.036 [24] Araya I, Moyano M, Sanchez C. A beam search algorithm for the biobjective container loading problem. European Journal of Operational Research, 2020, 286(2): 417−431 doi: 10.1016/j.ejor.2020.03.040 [25] Vinyals O, Fortunato M, Jaitly N. Pointer networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015. 2692−2700 [26] Bello I, Pham H, Le Q V, Norouzi M, Bengio S. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv: 1611.09940, 2017. [27] Jin Y, Ding Y D, Pan X H, He K, Zhao L, Qin T, et al. Pointerformer: Deep reinforced multi-pointer Transformer for the traveling salesman problem. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI Press, 2023. 8132−8140 [28] Tang Y H, Agrawal S, Faenza Y. Reinforcement learning for integer programming: Learning to cut. In: Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: PMLR, 2020. 9367−9376 [29] Theodoropoulos T, Makris A, Psomakelis E, Carlini E, Mordacchini M, Dazzi P, et al. GNOSIS: Proactive image placement using graph neural networks & deep reinforcement learning. In: Proceedings of the IEEE 16th International Conference on Cloud Computing (CLOUD). Chicago, USA: IEEE, 2023. 120−128 [30] Ahn S, Seo Y, Shin J. Deep auto-deferring policy for combinatorial optimization [Online], avaliable: https://openreview.net/forum?id=Hkexw1BtDr, March 12, 2024 [31] Hu H Y, Zhang X D, Yan X W, Wang L F, Xu Y H. Solving a new 3D bin packing problem with deep reinforcement learning method. arXiv preprint arXiv: 1708.05930, 2017. [32] Duan L, Hu H Y, Qian Y, Gong Y, Zhang X D, Xu Y H, et al. A multi-task selected learning approach for solving 3D flexible bin packing problem. arXiv preprint arXiv: 1804.06896, 2019. [33] Verma R, Singhal A, Khadilkar H, Basumatary A, Nayak S, Singh H V, et al. A generalized reinforcement learning algorithm for online 3D bin-packing. arXiv preprint arXiv: 2007.00463, 2020. [34] Liu H W, Zhou L, Yang J L, Zhao J H. The 3D bin packing problem for multiple boxes and irregular items based on deep Q-network. Applied Intelligence, 2023, 53(20): 23398−23425 doi: 10.1007/s10489-023-04604-6 [35] Li D D, Ren C W, Gu Z Q, Wang Y X, Lau F. Solving packing problems by conditional query learning [Online], avaliable: https://openreview.net/forum?id=BkgTwRNtPB, March 12, 2024 [36] Jiang Y, Cao Z G, Zhang J. Learning to solve 3-D bin packing problem via deep reinforcement learning and constraint programming. IEEE Transactions on Cybernetics, 2023, 53(5): 2864−2875 doi: 10.1109/TCYB.2021.3121542 [37] Zhang J W, Zi B, Ge X Y. Attend2Pack: Bin packing through deep reinforcement learning with attention. arXiv preprint arXiv: 2107.04333, 2021. [38] Que Q Q, Yang F, Zhang D F. Solving 3D packing problem using Transformer network and reinforcement learning. Expert Systems With Applications, 2022, 214: Article No. 119153 [39] Parker J A, Kenyon R V, Troxel D E. Comparison of interpolating methods for image resampling. IEEE Transactions on Medical Imaging, 1983, 2(1): 31−39 doi: 10.1109/TMI.1983.4307610