基于深度强化学习的四向协同三维装箱方法

尹昊; 陈帆; 和红杰

doi:10.16383/j.aas.c240124

基于深度强化学习的四向协同三维装箱方法

doi: 10.16383/j.aas.c240124 cstr: 32138.14.j.aas.c240124

尹昊^1,,
陈帆^2,,
和红杰^1,

1.
西南交通大学信息科学与技术学院成都 611756
2.
西南交通大学计算机与人工智能学院成都 611756

详细信息

作者简介:
尹昊：西南交通大学信息科学与技术学院博士研究生. 2020年获得西南交通大学学士学位. 主要研究方向为强化学习, 人工智能. E-mail: haoyin@my.swjtu.edu.cn

陈帆：西南交通大学计算机与人工智能学院副教授. 主要研究方向为机器学习, 多媒体安全和计算机应用. E-mail: fchen@swjtu.edu.cn

和红杰：西南交通大学信息科学与技术学院教授. 主要研究方向为深度学习, 图像处理和信息安全. 本文通信作者. E-mail: hjhe@swjtu.edu.cn

计量
- 文章访问数: 973
- HTML全文浏览量: 585
- PDF下载量: 197
- 被引次数: 0
出版历程
- 收稿日期: 2024-03-12
- 录用日期: 2024-07-04
- 网络出版日期: 2024-10-29
- 刊出日期: 2024-12-20

A Four Directional Cooperative Three-dimensional Packing Method Based on Deep Reinforcement Learning

YIN Hao^1
,,
CHEN Fan^2
,,
HE Hong-Jie^1
,

1.
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756
2.
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756

More Information

Author Bio:
YIN Hao　Ph.D. candidate at the School of Information Science and Technology, Southwest Jiaotong University. He received his bachelor degree from Southwest Jiaotong University in 2020. His research interest covers reinforcement learning and artificial intelligence

CHEN Fan　Associate professor at the School of Computing and Artificial Intelligence, Southwest Jiaotong University. His research interest covers machine learning, multimedia security, and computer applications

HE Hong-Jie　Professor at the School of Information Science and Technology, Southwest Jiaotong University. Her research interest covers deep learning, image processing, and information security. Corresponding author of this paper

摘要

摘要: 物流作为现代经济的重要组成部分, 在国民经济和社会发展中发挥着重要作用. 物流中的三维装箱问题(Three-dimensional bin packing problem, 3D-BPP)是提高物流运作效率必须解决的关键难题之一. 深度强化学习(Deep reinforcement learning, DRL)具有强大的学习与决策能力, 基于DRL的三维装箱方法(Three-dimensional bin packing method based on DRL, DRL-3DBP)已成为智能物流领域的研究热点之一. 现有DRL-3DBP面对大尺寸容器3D-BPP时难以达成动作空间、计算复杂性与探索能力之间的平衡. 为此, 提出一种四向协同装箱(Four directional cooperative packing, FDCP)方法: 两阶段策略网络接收旋转后的容器状态, 生成4个方向的装箱策略; 根据由4个策略采样而得的动作更新对应的4个状态, 选取其中价值最大的对应动作为装箱动作. FDCP在压缩动作空间、减小计算复杂性的同时, 鼓励智能体对4个方向合理装箱位置的探索. 实验结果表明, FDCP在100 × 100大尺寸容器以及20、30、50箱子数量的装箱问题上实现了1.2% ~ 2.9%的空间利用率提升.
- 三维装箱问题 /
- 组合优化问题 /
- 深度强化学习 /
- 四向协同装箱
Abstract: As an important part of the modern economy, logistics plays an important role in the national economy and social development. The three-dimensional bin packing problem (3D-BPP) in logistics is one of the key problems that must be solved to improve the efficiency of logistics operations. Deep reinforcement learning (DRL) has a powerful learning and decision-making ability, and the three-dimensional bin packing method based on DRL (DRL-3DBP) has become one of the research hotspots in the field of intelligent logistics. The existing DRL-3DBPs have difficulty in striking a balance between the action space, computational complexity, and exploration capability when solving 3D-BPP with large-size bins. To this end, this paper proposes a four directional cooperative packing (FDCP) method. The two-stage policy network receives the rotated bin states and generates four directional packing policies. Based on the actions sampled from the four policies, the four states are updated accordingly, and the action corresponding to the highest value is selected as the packing action. FDCP encourages agent to explore reasonable packing positions in all four directions while compressing the action space and reducing computational complexity. Experimental results show that FDCP achieves 1.2% ~ 2.9% improvement in space utilization on the packing problem with 100 × 100 large-sized bin and the numbers of 20, 30, and 50 items.

HTML全文

图 1 3D-BPP在物流中的应用场景

Fig. 1 Application scenarios of 3D-BPP in logistics

下载: 全尺寸图片幻灯片

图 2 笛卡尔坐标系与箱子属性

Fig. 2 Cartesian coordinate system and item properties

下载: 全尺寸图片幻灯片

图 3 容器状态表示

Fig. 3 Representation of the bin state

下载: 全尺寸图片幻灯片

图 4 各编码器和解码器的结构

Fig. 4 Structure of each encoder and decoder

下载: 全尺寸图片幻灯片

图 5 4种方向的装箱策略

Fig. 5 The packing policy for the four directions

下载: 全尺寸图片幻灯片

图 6 四向协同装箱方法结构

Fig. 6 Structure of the four directional cooperative packing method

下载: 全尺寸图片幻灯片

图 7 FDCP求解3D-SPP流程图

Fig. 7 Flowchart of FDCP for solving 3D-SPP

下载: 全尺寸图片幻灯片

图 8 装箱结果可视化

Fig. 8 Visualization of packing results

下载: 全尺寸图片幻灯片

图 9 FDCP在不同箱子数量算例上的测试结果

Fig. 9 Test results of FDCP on instances with different numbers of items

下载: 全尺寸图片幻灯片

图 10 容器各区域放置次数热力图

Fig. 10 Heat map of the number of placements in each area of the bin

下载: 全尺寸图片幻灯片

表 1 $ 100 \times 100 $容器装箱算例上的对比结果

Table 1 Comparative results on packing instances with $ 100 \times 100 $ bin

方法		$N=20$		$N=30$		$N=50$
方法		UR (%)	Time (s)	UR (%)	Time (s)	UR (%)	Time (s)
Heuristic-3DBP	GA+DBLF	70.2	17.5	69.4	36.3	66.3	71.9
	EP	62.7	<1.0	63.8	<1.0	66.3	<1.0
	LAFF	58.6	<1.0	59.1	<1.0	61.9	<1.0
	EBAFIT	65.4	<1.0	65.9	<1.0	66.1	1.5
DRL-3DBP	MTSL	62.4	4.8	60.1	10.2	55.3	23.0
	CQL	67.0	1.0	69.3	1.2	73.6	3.3
	JIANG	73.5	2.3	76.9	3.2	82.0	10.9
	QUE	76.5	1.4	79.3	2.1	82.4	3.5
	FDCP	79.4	1.9	81.7	3.1	83.6	5.2

下载: 导出CSV

表 2 $ 200 \times 200 $和$ 400 \times 200 $容器算例上各方法的空间利用率UR (%)

Table 2 Space utilization (UR) of each method on instances with $ 200 \times 200 $ and $ 400 \times 200 $ bins (%)

容器尺寸	GA+DBLF	EP	LAFF	EBAFIT	MTSL	CQL	JIANG	QUE	FDCP
200$\times$200	61.4	63.3	58.0	62.8	50.8	58.7	75.2	80.5	81.2
400$\times$200	58.7	60.1	55.4	60.5	46.9	47.5	70.5	76.7	76.8

下载: 导出CSV

表 3 消融实验结果 (%)

Table 3 Results of ablation experiment (%)

方法	$N=20$	$N=30$	$N=50$
FDCP	79.4	81.7	83.6
−CO	75.9	79.2	81.4
−FD	75.9	79.0	81.5
−PN	76.5	79.5	81.9

下载: 导出CSV

参考文献(39)

[1]	Bortfeldt A, Wascher G. Constraints in container loading——A state-of-the-art review. European Journal of Operational Research, 2013, 229(1): 1−20 doi: 10.1016/j.ejor.2012.12.006
[2]	Jiao G S, Huang M, Song Y, Liu H B, Wang X W. Container loading problem based on robotic loader system: An optimization approach. Expert Systems With Applications, 2024, 236: Article No. 121222 doi: 10.1016/j.eswa.2023.121222
[3]	Consolini L, Laurini M, Locatelli M. A dynamic programming approach for cooperative pallet-loading manipulators. IEEE Transactions on Automation Science and Engineering, DOI: 10.1109/TASE.2023.3310007
[4]	Martello S, Pisinger D, Vigo D. The three-dimensional bin packing problem. Operations Research, 2000, 48(2): 256−267 doi: 10.1287/opre.48.2.256.12386
[5]	Hifi M. Exact algorithms for unconstrained three-dimensional cutting problems: A comparative study. Computers & Operations Research, 2004, 31(5): 657−674
[6]	Chen C S, Lee S M, Shen Q S. An analytical model for the container loading problem. European Journal of Operational Research, 1995, 80(1): 68−76 doi: 10.1016/0377-2217(94)00002-T
[7]	Dósa G, Sgall J. First Fit bin packing: A tight analysis. In: Proceedings of the 30th International Symposium on Theoretical Aspects of Computer Science (STACS 2013). Dagstuhl, Germany: Schloss Dagstuhl——Leibniz-Zentrum für Informatik, 2013. 538−549
[8]	Crainic T G, Perboli G, Tadei R. Extreme point-based heuristics for three-dimensional bin packing. INFORMS Journal on Computing, 2008, 20(3): 368−384 doi: 10.1287/ijoc.1070.0250
[9]	Karabulut K, Inceoglu M M. A hybrid genetic algorithm for packing in 3D with deepest bottom left with fill method. In: Proceedings of the Third International Conference on Advances in Information Systems. Berlin, Heidelberg: Springer, 2004. 441−450
[10]	Gurbuz M Z, Akyokus S, Emiroglu I, Guran A. An efficient algorithm for 3D rectangular box packing. In: Proceedings of the Selected AAS 2009 Papers. Skopje, Macedonia: Society for ETAI of Republic of Macedonia, 2009. 131−134
[11]	Parreño F, Alvarez-Valdés R, Tamarit J M, Oliveira J F. A maximal-space algorithm for the container loading problem. INFORMS Journal on Computing, 2008, 20(3): 412−422 doi: 10.1287/ijoc.1070.0254
[12]	Hasan J, Kaabi J, Harrath Y. Multi-objective 3D bin-packing problem. In: Proceedings of the 8th International Conference on Modeling Simulation and Applied Optimization (ICMSAO). Manama, Bahrain: IEEE, 2019. 1−5
[13]	张德富, 魏丽军, 陈青山, 陈火旺. 三维装箱问题的组合启发式算法. 软件学报, 2007, 18(9): 2083−2089 doi: 10.1360/jos182083 Zhang De-Fu, Wei Li-Jun, Chen Qing-Shan, Chen Huo-Wang. A combinational heuristic algorithm for the three-dimensional packing problem. Journal of Software, 2007, 18(9): 2083−2089 doi: 10.1360/jos182083
[14]	何琨, 黄文奇. 基于动作空间的三维装箱问题的确定性高效率求解算法. 计算机学报, 2014, 37(8): 1786−1793 He Kun, Huang Wen-Qi. An action space based deterministic efficient algorithm for solving the three-dimensional container loading. Chinese Journal of Computers, 2014, 37(8): 1786−1793
[15]	Wu Y, Li W K, Goh M, de Souza R. Three-dimensional bin packing problem with variable bin height. European Journal of Operational Research, 2010, 202(2): 347−355 doi: 10.1016/j.ejor.2009.05.040
[16]	Yang J L, Liu H W, Liang K B, Zhou L, Zhao J H. Variable neighborhood genetic algorithm for multi-order multi-bin open packing optimization. Applied Soft Computing, 2024, 163: Article No. 111890 doi: 10.1016/j.asoc.2024.111890
[17]	Silveira M E, Vieira S M, Sousa J M D C. An ACO algorithm for the 3D bin packing problem in the steel industry. In: Proceedings of the 26th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Berlin, Heidelberg: Springer, 2013. 535−544
[18]	张德富, 彭煜, 朱文兴, 陈火旺. 求解三维装箱问题的混合模拟退火算法. 计算机学报, 2009, 32(11): 2147−2156 Zhang De-Fu, Peng Yu, Zhu Wen-Xing, Chen Huo-Wang. A hybrid simulated annealing algorithm for the three-dimensional packing problem. Chinese Journal of Computers, 2009, 32(11): 2147−2156
[19]	Tsao Y C, Tai J Y, Vu T L, Chen T H. Multiple bin-size bin packing problem considering incompatible product categories. Expert Systems With Applications, 2024, 247: Article No. 123340 doi: 10.1016/j.eswa.2024.123340
[20]	Bortfeldt A, Gehring H. Applying tabu search to container loading problems. In: Proceedings of the Operations Research Proceedings 1997. Berlin, Heidelberg: Springer, 1998. 533−538
[21]	刘胜, 沈大勇, 商秀芹, 赵红霞, 董西松, 王飞跃. 求解三维装箱问题的多层树搜索算法. 自动化学报, 2020, 46(6): 1178−1187 Liu Sheng, Shen Da-Yong, Shang Xiu-Qin, Zhao Hong-Xia, Dong Xi-Song, Wang Fei-Yue. A multi-level tree search algorithm for three dimensional container loading problem. Acta Automatica Sinica, 2020, 46(6): 1178−1187
[22]	Ren J D, Tian Y J, Sawaragi T. A tree search method for the container loading problem with shipment priority. European Journal of Operational Research, 2011, 214(3): 526−535 doi: 10.1016/j.ejor.2011.04.025
[23]	Zhu W B, Lim A. A new iterative-doubling Greedy-Lookahead algorithm for the single container loading problem. European Journal of Operational Research, 2012, 222(3): 408−417 doi: 10.1016/j.ejor.2012.04.036
[24]	Araya I, Moyano M, Sanchez C. A beam search algorithm for the biobjective container loading problem. European Journal of Operational Research, 2020, 286(2): 417−431 doi: 10.1016/j.ejor.2020.03.040
[25]	Vinyals O, Fortunato M, Jaitly N. Pointer networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press, 2015. 2692−2700
[26]	Bello I, Pham H, Le Q V, Norouzi M, Bengio S. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv: 1611.09940, 2017.
[27]	Jin Y, Ding Y D, Pan X H, He K, Zhao L, Qin T, et al. Pointerformer: Deep reinforced multi-pointer Transformer for the traveling salesman problem. In: Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI Press, 2023. 8132−8140
[28]	Tang Y H, Agrawal S, Faenza Y. Reinforcement learning for integer programming: Learning to cut. In: Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: PMLR, 2020. 9367−9376
[29]	Theodoropoulos T, Makris A, Psomakelis E, Carlini E, Mordacchini M, Dazzi P, et al. GNOSIS: Proactive image placement using graph neural networks & deep reinforcement learning. In: Proceedings of the IEEE 16th International Conference on Cloud Computing (CLOUD). Chicago, USA: IEEE, 2023. 120−128
[30]	Ahn S, Seo Y, Shin J. Deep auto-deferring policy for combinatorial optimization [Online], avaliable: https://openreview.net/forum?id=Hkexw1BtDr, March 12, 2024
[31]	Hu H Y, Zhang X D, Yan X W, Wang L F, Xu Y H. Solving a new 3D bin packing problem with deep reinforcement learning method. arXiv preprint arXiv: 1708.05930, 2017.
[32]	Duan L, Hu H Y, Qian Y, Gong Y, Zhang X D, Xu Y H, et al. A multi-task selected learning approach for solving 3D flexible bin packing problem. arXiv preprint arXiv: 1804.06896, 2019.
[33]	Verma R, Singhal A, Khadilkar H, Basumatary A, Nayak S, Singh H V, et al. A generalized reinforcement learning algorithm for online 3D bin-packing. arXiv preprint arXiv: 2007.00463, 2020.
[34]	Liu H W, Zhou L, Yang J L, Zhao J H. The 3D bin packing problem for multiple boxes and irregular items based on deep Q-network. Applied Intelligence, 2023, 53(20): 23398−23425 doi: 10.1007/s10489-023-04604-6
[35]	Li D D, Ren C W, Gu Z Q, Wang Y X, Lau F. Solving packing problems by conditional query learning [Online], avaliable: https://openreview.net/forum?id=BkgTwRNtPB, March 12, 2024
[36]	Jiang Y, Cao Z G, Zhang J. Learning to solve 3-D bin packing problem via deep reinforcement learning and constraint programming. IEEE Transactions on Cybernetics, 2023, 53(5): 2864−2875 doi: 10.1109/TCYB.2021.3121542
[37]	Zhang J W, Zi B, Ge X Y. Attend2Pack: Bin packing through deep reinforcement learning with attention. arXiv preprint arXiv: 2107.04333, 2021.
[38]	Que Q Q, Yang F, Zhang D F. Solving 3D packing problem using Transformer network and reinforcement learning. Expert Systems With Applications, 2022, 214: Article No. 119153
[39]	Parker J A, Kenyon R V, Troxel D E. Comparison of interpolating methods for image resampling. IEEE Transactions on Medical Imaging, 1983, 2(1): 31−39 doi: 10.1109/TMI.1983.4307610