2.793

2018影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

支持重规划的战时保障动态调度研究

曾斌 樊旭 李厚朴

曾斌, 樊旭, 李厚朴. 支持重规划的战时保障动态调度研究. 自动化学报, 2021, x(x): 1−11 doi: 10.16383/j.aas.c200416
引用本文: 曾斌, 樊旭, 李厚朴. 支持重规划的战时保障动态调度研究. 自动化学报, 2021, x(x): 1−11 doi: 10.16383/j.aas.c200416
Zeng Bin, Fan Xu, Li Hou-Pu. Research of dynamic scheduling with re-planning for wartime logistics support. Acta Automatica Sinica, 2021, x(x): 1−11 doi: 10.16383/j.aas.c200416
Citation: Zeng Bin, Fan Xu, Li Hou-Pu. Research of dynamic scheduling with re-planning for wartime logistics support. Acta Automatica Sinica, 2021, x(x): 1−11 doi: 10.16383/j.aas.c200416

支持重规划的战时保障动态调度研究

doi: 10.16383/j.aas.c200416
基金项目: 国家自然科学基金(41771487), 湖北省杰出青年科学基金(2019CFA086)资助
详细信息
    作者简介:

    曾斌:海军工程大学教授. 2006年获得华中科技大学博士学位, 主要研究方向为信息管理和装备保障. E-mail: zbtrueice@126.com

    樊旭:海军工程大学硕士研究生.主要研究方向为信息管理. E-mail: fanxu926@163.com

    李厚朴:海军工程大学副教授. 2010年获得海军工程大学博士学位, 主要研究方向为导航技术和计算机代数分析. E-mail: lihoupu1985@126.com

Research of Dynamic Scheduling with Re-planning for Wartime Logistics Support

Funds: Supported by National Natural Science Foundation of P. R. China (41771487), Hubei Science Foundation of for distinguished Young Scholars (2019CFA086)
More Information
    Author Bio:

    ZENG Bin Professor at Naval University of Engineering. He received his Ph. D. degree from Huazhong University of Science and Technology in 2006. His research interest covers information management and equipment maintenance

    FAN Xu Master student at Naval University of Engineering. His research interest covers information management

    LI Hou-Pu Associate professor at Naval University of Engineering. He received his Ph. D. degree from Naval University of Engineering in 2010. His research interest covers navigation and computer algebra analysis

  • 摘要: 复杂多变的战场环境要求后装保障能够根据战场环境变化, 预见性地做出决策, 为此提出了基于强化学习的动态调度方法. 为了准确描述保障调度问题, 提出了支持抢占调度、重分配及重部署决策的马尔可夫决策过程模型, 模型中综合考量了任务排队、保障优先级以及油料约束等诸多问题的影响; 随后设计了改进策略迭代算法, 训练基于神经网络的保障调度模型; 训练后的神经网络模型能够近似计算状态价值函数, 从而求解出产生最大期望价值的优化调度策略. 最后设计了一个分布式战场保障仿真实验, 通过与常规调度策略的对比, 验证了动态调度算法具有良好的自适应性和自主学习能力, 能够根据历史数据和当前态势预判后续变化, 并重新规划和配置保障资源的调度方案.
  • 图  1  神经网络结构示例

    Fig.  1  Illustration of neural network structure

    图  2  策略迭代算法流程

    Fig.  2  Workflow of policy iteration algorithm

    图  3  实验案例

    Fig.  3  Experiment case

    图  4  相较于比对策略A的性能改进

    Fig.  4  Improvement compared to policy A

    图  5  响应时间变化情况

    Fig.  5  Response delay with time

    图  6  保障申请发生率的敏感性分析

    Fig.  6  Sensitivity analysis of request rate

    表  1  仿真实验结果

    Table  1  Simulation result

    N K β Impr1,A Impr1,B Impr2,A Impr2,B
    30 500 0.3 8.2±0.2 5.3±0.2 6.9±0.2 3.1±0.1
    35 1000 0.3 8.6±0.2 5.6±0.1 7.8±0.1 3.2±0.2
    25 2000 0.3 9.5±0.2 6.5±0.2 7.2±0.2 4.2±0.1
    18 4000 0.3 9.3±0.2 6.4±0.2 5.1±0.1 2.9±0.1
    10 500 0.5 8.3±0.2 5.3±0.1 6.9±0.2 2.7±0.2
    9 1000 0.5 8.3±0.2 5.3±0.1 7.4±0.2 2.3±0.2
    28 2000 0.5 9.6±0.2 6.6±0.2 6.8±0.1 2.7±0.1
    12 4000 0.5 8.2±0.2 5.2±0.1 7.1±0.1 4.2±0.1
    10 500 0.7 8.4±0.2 5.4±0.1 5.9±0.2 4.1±0.2
    6 1000 0.7 8.0±0.1 5.1±0.1 7.8±0.1 4.5±0.1
    25 2000 0.7 9.0±0.2 6.0±0.2 8.1±0.2 4.4±0.2
    4 4000 0.7 8.4±0.2 5.4±0.1 6.4±0.2 4.3±0.2
    30 500 0.9 8.1±0.2 5.2±0.1 5.9±0.1 2.1±0.1
    7 1000 0.9 7.8±0.1 4.9±0.1 7.2±0.1 4.9±0.2
    33 2000 0.9 9.4±0.2 6.4±0.2 5.7±0.2 3.4±0.1
    25 4000 0.9 8.4±0.1 5.4±0.1 5.3±0.2 2.9±0.2
    下载: 导出CSV

    表  2  动态算法和比对策略的性能比较

    Table  2  Comparison of the algorithm and policies

    调度策略响应时间(mins)性能相对改进量(%)
    紧急申请一般申请
    比对策略A56.3±0.256.8±0.2
    比对策略B53.4±0.157.7±0.22.9±0.1
    动态调度48.1±0.157.9±0.19.6±0.2
    下载: 导出CSV
  • [1] 昝翔, 陈春良, 张仕新, 王铮, 刘彦. 多约束条件下战时装备维修任务分配方法. 兵工学报, 2017, 38(8): 1603−1609 doi: 10.3969/j.issn.1000-1093.2017.08.019

    Zan X, Chen C L, Zhang S X, Wang Z, Liu Y. Task Allocation Method for Wartime Equipment Maintenance under Multiple Constraint Conditions. Acta Armamentarii, 2017, 38(8): 1603−1609 doi: 10.3969/j.issn.1000-1093.2017.08.019
    [2] 何岩, 赵劲松, 王少聪, 刘亚东, 周玄. 基于维修优先级的战时装备维修保障力量优化调度. 军事交通学院学报, 2019, 21(5): 42−46

    He Y, Zhao J S, Wang S C, Liu Y D, Zhou X. Maintenance Priority-Based Optimization and Scheduling of Equipment Maintenance Support Strength in Wartime. Journal of Military Transportation University, 2019, 21(5): 42−46
    [3] 曹继平, 宋建社, 王正元, 黄超. 战时装备维修保障资源优化调度方法研究. 系统工程与电子技术, 2007, 29(6): 915−919 doi: 10.3321/j.issn:1001-506X.2007.06.019

    Chao J P, Song J S, Wang Z Y, Huang C. Study on optimization dispatching method of the equipment maintenance support resources in wartime. Systems Engineering and Electronics, 2007, 29(6): 915−919 doi: 10.3321/j.issn:1001-506X.2007.06.019
    [4] 曾斌, 姚路, 胡炜. 考虑不确定因素影响的保障任务调度算法. 系统工程与电子技术, 2016, 38(3): 595−601

    Zeng B, Yao L, Hu W. Scheduling Algorithm for Maintenance Tasks under uncertainty. Systems Engineering and Electronics, 2016, 38(3): 595−601
    [5] 任帆, 吕学志, 王宪文, 曲长征. 巡回修理中的维修任务调度策略. 火力与指挥控制, 2013, 38(12): 171−175 doi: 10.3969/j.issn.1002-0640.2013.12.045

    Ren F, Lv X Z, Wang X W, Qu C Z. Research on Maintenance Task Scheduling Strategies in Contact Repairing. Fire Control & Command Control, 2013, 38(12): 171−175 doi: 10.3969/j.issn.1002-0640.2013.12.045
    [6] 陈春良, 刘彦, 王生凤, 陈伟龙, 等. 装备维修任务调度研究综述. 装甲兵工程学院学报, 2018, 32(1): 2−11

    Chen C L, Liu Y, Wang S F, Chen W L. Summary of Research on Equipment Maintenance Task Scheduling. Journal of Academy of Armored Force Engineering, 2018, 32(1): 2−11
    [7] 师娇, 刘宸宁, 冷德新, 郑鑫, 李宝玉. 面向未来作战的装备智能化保障模式研究. 兵器装备工程学报, 2019, 29(10): 226−229

    Shi J, Liu C N, Leng D X, Li B Y. Weapon Intelligent Support Mode of Future War. Journal of Ordnance Equipment Engineering, 2019, 29(10): 226−229
    [8] 梁星星, 冯旸赫, 马扬, 程光权, 黄金才, 王琦, 等. 多Agent深度强化学习综述. 自动化学报, 2020, 46(12): 2537−2537

    Liang X X, Feng Y H, Ma Y, Cheng G Q, Huang J C, Wang Q, et al. Deep multi-agent reinforcement learning: a survey. Acta Automatica Sinica, 2020, 46(12): 2537−2537
    [9] 孙长银, 穆朝絮. 多智能体深度强化学习的若干关键科学问题. 自动化学报, 2020, 46(7): 1301−1301

    Sun C Y, Mu C X. Important scientific problems of multi-agent deep reinforcement learning. Acta Automatica Sinica, 2020, 46(7): 1301−1301
    [10] Ji S, Zheng Y, Wang Z, Li T. A Deep Reinforcement Learning-Enabled Dynamic Redeployment System for Mobile Ambulances. In: Proceeding of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Washington: IEEE Press, 2019. 1−20
    [11] Hamasha M M, Rumbe G. Determining optimal policy for emergency department using Markov decision process. World Journal of Engineering, 2017, 14(5): 467−472 doi: 10.1108/WJE-12-2016-0148
    [12] Ni Y, Wang K, Zhao L. A Markov decision process model of allocating emergency medical resource among multi-priority injuries. International Journal of Mathematics in Operational Research, 2017, 10(1): 1−17 doi: 10.1504/IJMOR.2017.080738
    [13] Lee E K, Viswanathan H, Pompili D. RescueNet: Reinforcement-learning-based communication framework for emergency networking. Computer Networks, 2016, 98(5): 14−28
    [14] Nadi A, Edrisi A. Adaptive multi-agent relief assessment and emergency response. International journal of disaster risk reduction, 2017, 24(6): 12−23
    [15] Keneally S K, Robbin M J, Lunday B J. A markov decision process model for the optimal dispatch of military medical evacuation assets. Health care management science, 2016, 19(2): 111−129 doi: 10.1007/s10729-014-9297-8
    [16] Huang Q, Huang R, Hao W, et al. Adaptive Power System Emergency Control Using Deep Reinforcement Learning. IEEE Transactions on Smart Grid, 2020, 11(2): 1171−1182 doi: 10.1109/TSG.2019.2933191
    [17] 王云鹏, 郭戈. 基于深度强化学习的有轨电车信号优先控制. 自动化学报, 2019, 45(12): 2366−2366

    Wang Y P, Guo G. Signal priority control for trams using deep reinforcement learning. Acta Automatica Sinica, 2019, 45(12): 2366−2366
    [18] Wang C, Ju P, Lei S, et al. Markov Decision Process-based Resilience Enhancement for Distribution Systems: An Approximate Dynamic Programming Approach. IEEE Transactions on Smart Grid, 2020, 11(3): 2498−2510 doi: 10.1109/TSG.2019.2956740
    [19] Senn M, Link N, Pollak J, et al. Reducing the computational effort of optimal process controllers for continuous state spaces by using incremental learning and post-decision state formulations. Journal of Process Control, 2014, 24(3): 133−143 doi: 10.1016/j.jprocont.2014.01.002
    [20] Berthier E, Bach F. Max-plus linear approximations for deterministic continuous-state markov decision processes. IEEE Control Systems Letters, 2020, 4(3): 767−772 doi: 10.1109/LCSYS.2020.2973199
    [21] Li J, Zhou Y, Chen H, et al. Age of Aggregated Information: Timely Status Update with Over-the-Air Computation. In: Proceeding of GLOBECOM 2020-2020 IEEE Global Communications Conference. Washington: IEEE Press, 2020. 1−6
    [22] Wang C, Lei S, Ju P, et al. MDP-based distribution network reconfiguration with renewable distributed generation: Approximate dynamic programming approach. IEEE Transactions on Smart Grid, 2020, 11(4): 3620−3631 doi: 10.1109/TSG.2019.2963696
    [23] Sharma H, Jain R, Gupta A. An empirical relative value learning algorithm for non-parametric mdps with continuous state space. In: Proceeding of 2019 18th European Control Conference (ECC), Washington: IEEE Press, 2019, 1368−1373
    [24] Madjiheurem S, Toni L. Representation learning on graphs: A reinforcement learning application. In: Proceeding of the 22nd International Conference on Artificial Intelligence and Statistics. Okinawa: PMLR, 2019, 3391−3399
    [25] Bai W, Zhou Q, Li T, et al. Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation. IEEE transactions on cybernetics, 2019, 50(8): 3433−3443
    [26] Bertsekas D. Multiagent reinforcement learning: Rollout and policy iteration. IEEE/CAA Journal of Automatica Sinica, 2021, 8(2): 249−272 doi: 10.1109/JAS.2021.1003814
  • 加载中
计量
  • 文章访问数:  43
  • HTML全文浏览量:  20
  • 被引次数: 0
出版历程
  • 收稿日期:  2020-06-14
  • 修回日期:  2020-08-18
  • 网络出版日期:  2021-03-11

目录

    /

    返回文章
    返回