YIN Hao, CHEN Fan, HE Hong-Jie

doi: 10.16383/j.aas.c240124 cstr: 32138.14.j.aas.c240124

A Four Directional Cooperative Three-dimensional Packing Method Based on Deep Reinforcement Learning

    Author Bio:

    YIN Hao: Ph.D. candidate at the School of Information Science and Technology, Southwest Jiaotong University. He received his bachelor's degree from Southwest Jiaotong University in 2020. His research interests cover reinforcement learning and artificial intelligence. E-mail: haoyin@my.swjtu.edu.cn

    CHEN Fan: Associate professor at the School of Computing and Artificial Intelligence, Southwest Jiaotong University. His research interests cover machine learning, multimedia security, and computer applications. E-mail: fchen@swjtu.edu.cn

    HE Hong-Jie: Professor at the School of Information Science and Technology, Southwest Jiaotong University. Her research interests cover deep learning, image processing, and information security. Corresponding author of this paper. E-mail: hjhe@swjtu.edu.cn

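  • Abstract: Logistics is an important component of the modern economy and plays a significant role in national economic and social development. The three-dimensional bin packing problem (3D-BPP) is one of the key problems that must be solved to improve the efficiency of logistics operations. Deep reinforcement learning (DRL) has strong learning and decision-making capabilities, and DRL-based three-dimensional bin packing methods (DRL-3DBP) have become a research hotspot in intelligent logistics. When applied to 3D-BPP with large bins, existing DRL-3DBP methods struggle to balance the size of the action space, computational complexity, and exploration ability. To address this, a four directional cooperative packing (FDCP) method is proposed: a two-stage policy network receives the rotated bin states and generates packing policies for four directions; the four corresponding states are then updated according to the actions sampled from the four policies, and the action whose resulting state has the largest value is selected as the packing action. While compressing the action space and reducing computational complexity, FDCP encourages the agent to explore reasonable packing positions in all four directions. Experimental results show that FDCP improves space utilization by 1.2% to 2.9% on packing problems with a large 100 × 100 bin and 20, 30, and 50 items.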
  • Fig. 1  Application scenarios of 3D-BPP in logistics

    Fig. 2  Cartesian coordinate system and item properties

    Fig. 3  Representation of the bin state

    Fig. 4  Structure of each encoder and decoder

    Fig. 5  The packing policy for the four directions

    Fig. 6  Structure of the four directional cooperative packing method

    Fig. 7  Flowchart of FDCP for solving 3D-SPP

    Fig. 8  Visualization of packing results

    Fig. 9  Test results of FDCP on instances with different numbers of items

    Fig. 10  Heat map of the number of placements in each area of the bin

    Table 1  Comparative results on packing instances with a $100 \times 100$ bin

    Method | $N=20$: UR (%), Time (s) | $N=30$: UR (%), Time (s) | $N=50$: UR (%), Time (s)

    Table 2  Space utilization (UR) of each method on instances with $200 \times 200$ and $400 \times 200$ bins (%)

    Table 3  Results of the ablation experiment (%)
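The four-direction selection step summarized in the abstract (rotate the bin state, obtain one action per direction, update the four states, keep the action whose state has the largest value) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the paper's implementation: the bin state is taken to be a height map, `edge_policy` is a hypothetical deterministic stand-in for the learned two-stage policy network (it only considers positions along the two edges that meet at the frame's origin corner, to mimic a compressed action space), and `state_value` is a hypothetical stand-in for the learned value estimate. Bin height limits and item orientations are ignored.

```python
def rotate(grid, k):
    """Return a copy of the height map rotated counterclockwise by k * 90 degrees."""
    out = [row[:] for row in grid]
    for _ in range(k % 4):
        out = [list(col) for col in zip(*out)][::-1]
    return out


def place(grid, x, y, l, w, h):
    """Drop an l x w x h item with its corner at (x, y); return the new height map."""
    base = max(grid[i][j] for i in range(x, x + l) for j in range(y, y + w))
    new = [row[:] for row in grid]
    for i in range(x, x + l):
        for j in range(y, y + w):
            new[i][j] = base + h
    return new


def edge_policy(grid, l, w):
    """Hypothetical stand-in policy: among positions on the two edges touching
    the origin corner, pick the lowest resting height, then the nearest corner."""
    rows, cols = len(grid), len(grid[0])
    best = None
    for x in range(rows - l + 1):
        for y in range(cols - w + 1):
            if x != 0 and y != 0:
                continue  # compressed action space: edge positions only
            base = max(grid[i][j] for i in range(x, x + l) for j in range(y, y + w))
            key = (base, x + y, x, y)
            if best is None or key < best:
                best = key
    return None if best is None else (best[2], best[3])


def state_value(grid):
    """Hypothetical stand-in value: a lower maximum stack height is better."""
    return -max(max(row) for row in grid)


def fdcp_step(grid, item):
    """Try the policy in four rotated frames and keep the highest-valued result.

    Returns (direction index, updated height map), or None if nothing fits.
    """
    l, w, h = item
    candidates = []
    for k in range(4):
        rotated = rotate(grid, k)
        pos = edge_policy(rotated, l, w)
        if pos is None:
            continue
        placed = place(rotated, pos[0], pos[1], l, w, h)
        restored = rotate(placed, (4 - k) % 4)  # back to the original frame
        candidates.append((state_value(restored), k, restored))
    if not candidates:
        return None
    _, k, new_grid = max(candidates, key=lambda c: c[0])
    return k, new_grid


if __name__ == "__main__":
    bin_state = [[3, 3, 3, 3],
                 [3, 0, 0, 0],
                 [3, 0, 0, 0],
                 [3, 0, 0, 0]]
    direction, updated = fdcp_step(bin_state, (2, 2, 1))
    print(direction)
    for row in updated:
        print(row)
```

In this toy bin, the unrotated frame can only stack the item on the tall row and column that meet at its origin corner, while a rotated frame reaches the empty floor. This is the kind of coverage that combining four directions is meant to recover when each individual policy acts over a reduced set of positions.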
