Resource Scheduling Method of Multi-sensor Cooperative Detection for Flying Targets

WANG Meng-Qian; LIANG Hao-Xing; GUO Mao-Yun; CHEN Xiao-Long; WU Yi

doi: 10.16383/j.aas.c210498

1. School of Automation, Chongqing University, Chongqing 400044

Author Bio:
WANG Meng-Qian: Master student at the School of Automation, Chongqing University. She received her bachelor's degree from Wuhan Institute of Technology in 2018. Her research interests cover task scheduling and machine learning. E-mail: 201813131064@cqu.edu.cn

LIANG Hao-Xing: Master student at the School of Automation, Chongqing University. He received his bachelor's degree from Chongqing University in 2017. His research interests cover task scheduling and machine learning. E-mail: lianghaoxing841@gmail.com

GUO Mao-Yun: Associate professor at the School of Automation, Chongqing University. He received his Ph.D. degree from Chongqing University in 2011. His research interests cover information fusion, decision support, and system simulation. Corresponding author of this paper. E-mail: gmy@cqu.edu.cn

CHEN Xiao-Long: Assistant researcher at the School of Automation, Chongqing University (the journal's English biography lists him as associate professor). His research interests cover system identification, soft sensor modeling, and machine learning. E-mail: xiaolong.chen@cqu.edu.cn

WU Yi: Master student at the School of Automation, Chongqing University. She received her bachelor's degree from Chongqing University in 2017. Her main research interest is resource scheduling. E-mail: 201713021031@cqu.edu.cn

Metrics
- Article views: 1239
- HTML full-text views: 517
- PDF downloads: 316
- Citations: 0
Publication history
- Received: 2021-06-04
- Accepted: 2022-04-07
- Published online: 2023-02-24
- Issue date: 2023-06-20

Abstract: To meet the dynamic resource scheduling demands that the maneuverability of flying targets places on multi-sensor cooperative detection, a new scheduling algorithm that combines proximal policy optimization (PPO) with a fully connected neural network (FCNN) is proposed. First, the complex constraints that affect multi-sensor cooperative detection resource scheduling are analyzed to form indices for evaluating the scheduling process. Next, a Markov decision process (MDP) is introduced to model the scheduling process and, to improve the stability of the algorithm, the Adam algorithm is combined with a learning rate decay algorithm to control the step size of learning rate adjustment. Finally, the dynamic resource scheduling strategy is solved with the improved PPO combined with the FCNN, and comparative experiments demonstrate the superiority of the proposed algorithm.
Key words:
- multi-sensor cooperation
- resource scheduling
- Markov decision process (MDP)
- reinforcement learning
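The abstract's idea of pairing Adam with learning rate decay can be illustrated with a minimal sketch. The exact decay schedule is not given in this section, so exponential decay is assumed here; the names `exponential_decay`, `AdamWithDecay`, `decay_rate`, and `decay_steps` are hypothetical and are not taken from the paper.

```python
import math


def exponential_decay(base_lr, decay_rate, step, decay_steps):
    """Learning rate after `step` updates (assumed exponential schedule)."""
    return base_lr * decay_rate ** (step / decay_steps)


class AdamWithDecay:
    """Adam update for a flat list of parameters with a decayed step size."""

    def __init__(self, base_lr=1e-4, decay_rate=0.9, decay_steps=1000,
                 beta1=0.9, beta2=0.999, eps=1e-8):
        self.base_lr = base_lr
        self.decay_rate = decay_rate
        self.decay_steps = decay_steps
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.m = None  # first-moment estimates
        self.v = None  # second-moment estimates
        self.t = 0     # number of updates performed so far

    def step(self, params, grads):
        """Return updated parameters after one Adam step."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        # The decayed learning rate bounds the size of each Adam step.
        lr = exponential_decay(self.base_lr, self.decay_rate,
                               self.t, self.decay_steps)
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

Because Adam normalizes the gradient by its running magnitude, the decayed learning rate acts roughly as an upper bound on the per-parameter step, which is one plausible reading of how the combination limits the adjustment step size.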

Fig. 1 Complex constraints in the multi-sensor detection resource scheduling process

Fig. 2 Sequential decision-making in multi-sensor resource scheduling

Fig. 3 Action space of the sensors at time $t$

Fig. 4 Structure of the fully connected neural network

Fig. 5 Training of the multi-sensor cooperative detection resource scheduling algorithm based on improved PPO-FCNN

Fig. 6 Flow of the multi-sensor cooperative detection dynamic resource scheduling algorithm based on improved PPO-FCNN

Fig. 7 Hierarchical model of the evaluation indices

Fig. 8 Training performance of different algorithms for different numbers of sensors

Fig. 9 Comparison of convergence time for different numbers of sensors

Table 1 Parameters of each neural network layer

Layer	Number of hidden units	Activation function
FCCN_1	100	ReLU
FCCN_2	200	ReLU
FCCN_3	200	Tanh
FCCN_4	200	Tanh
Softmax	$num$	Softmax
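The layer sizes and activations in Table 1 can be read as a policy network that maps a state vector to a probability distribution over $num$ actions. The following is a rough forward-pass sketch in plain Python; the state dimension, the weight initialization, and the function names are assumptions made for illustration and do not come from the paper.

```python
import math
import random

# Hidden layers as listed in Table 1: (units, activation).
LAYERS = [(100, "relu"), (200, "relu"), (200, "tanh"), (200, "tanh")]


def init_layer(n_in, n_out, rng):
    """Hypothetical uniform initialization scaled by the fan-in."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def activate(z, kind):
    if kind == "relu":
        return [max(0.0, v) for v in z]
    if kind == "tanh":
        return [math.tanh(v) for v in z]
    raise ValueError(f"unknown activation: {kind}")


def softmax(z):
    """Numerically stable softmax over the output logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_policy(state_dim, num_actions, seed=0):
    """Create parameters for the four hidden layers plus the softmax layer."""
    rng = random.Random(seed)
    sizes = [state_dim] + [n for n, _ in LAYERS] + [num_actions]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def policy_forward(params, state):
    """Return action probabilities for one state vector."""
    x = list(state)
    for (w, b), (_, kind) in zip(params[:-1], LAYERS):
        x = activate(dense(x, w, b), kind)
    w, b = params[-1]
    return softmax(dense(x, w, b))
```

For example, `policy_forward(build_policy(8, 10), [0.1] * 8)` returns ten probabilities that sum to one, where 8 and 10 are placeholder values for the state dimension and for $num$.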

Table 2 Simulation parameters

Parameter	Value
Actor learning rate	0.0001
Critic learning rate	0.0002
Decay factor	0.9
Minimum number of samples	64
Update interval	10 iterations
Clipping parameter $\varepsilon$	0.2
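The clipping parameter $\varepsilon = 0.2$ in Table 2 enters the standard PPO clipped surrogate objective. The sketch below shows that standard objective only; the modifications that make up the paper's improved variant are not described in this section and are not reproduced here. The function name and its arguments are illustrative.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 1.5 and an advantage of 1, the clipped term caps the contribution at 1.2, so the policy gains nothing from moving further than $\varepsilon$ away from the old policy in a single update.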

Table 3 State parameters of the flying target

Flying target parameter	Value range
x-coordinate $x_2^{(t)}$	97 ~ 30
y-coordinate $y_2^{(t)}$	814 ~ 348
Altitude $z_2^{(t)}$	168 ~ 400

Table 4 State parameters of detection equipment No. $i$

Parameter of detection equipment No. $i$	Value
Intervisibility $vis_{i,t}$	1 or 0
Maximum detection range $Dis_i^{Max}$	0 ~ 400
Maximum working duration $Store_i^{Max}$	20
Maximum number of switches $ht$	12
Priority $pre_{i,t}$	1 or 0
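One way to read the parameters in Table 4 is as conditions that decide which sensors may be selected at time $t$. The sketch below is only an illustration under that reading: the constraint formulas themselves are not shown in this section, so the checks, the `SensorState` fields, the sensor positions, and the treatment of the switch limit as a schedule-wide budget are all assumptions. The priority $pre_{i,t}$ is left out because its role is not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class SensorState:
    position: tuple      # hypothetical (x, y, z) location of the sensor
    visible: int         # vis_{i,t}: 1 if the target is in line of sight
    max_range: float     # Dis_i^{Max}
    worked: int          # time steps already spent working (assumed counter)
    max_work: int = 20   # Store_i^{Max}


def feasible_sensors(sensors, target, current, switches_used, max_switches=12):
    """Return one boolean per sensor: True if it may be selected now."""
    mask = []
    for idx, s in enumerate(sensors):
        in_range = math.dist(s.position, target) <= s.max_range
        has_budget = s.worked < s.max_work
        # Staying on the current sensor costs no switch (assumption).
        can_switch = idx == current or switches_used < max_switches
        mask.append(bool(s.visible) and in_range and has_budget and can_switch)
    return mask
```

A mask of this kind could be applied to the softmax output of the policy network so that infeasible sensors receive zero probability, although whether the paper handles constraints this way or through the reward is not stated in this section.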

Table 5 Total hierarchical ranking of sensor constraints

Constraint class	Weight	Complex constraint	Weight	$ {\alpha _1} $					$ {\alpha _2} $
Constraint class	Weight	Complex constraint	Weight	$ {\beta _1} $	$ {\beta _2} $	$ {\beta _3} $	$ {\beta _4} $	$ {\beta _5} $	$ {\alpha _2} $
Single-sensor constraints	0.5	Detection performance constraint	0.5	0.4	0.4	0.2	0	0	0
Single-sensor constraints	0.5	Detection efficiency constraint	0.5	0	0	0.5	0.5	0	0
Multi-sensor constraints	0.5	Association constraint	1.0	0	0	0	0	0	1.0
Total hierarchical ranking				0.1	0.1	0.05	0.125	0.125	0.5

Table 6 Random consistency index

$ n $	${\rm{RI}}$
1	0
2	0
3	0.58
4	0.90
5	1.12
6	1.24
7	1.32
8	1.41
9	1.45
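The random consistency index in Table 6 is the denominator of the consistency ratio used in the analytic hierarchy process: $CI = (\lambda_{max} - n)/(n - 1)$ and $CR = CI/RI$, with $CR < 0.1$ as the customary acceptance threshold. The following sketch computes these quantities with power iteration; the function names are illustrative, and any comparison matrix passed in is the caller's own, since the paper's judgment matrices are not reproduced in this section.

```python
# Random consistency index values from Table 6.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigenvalue(matrix, iterations=100):
    """Power iteration for a positive pairwise comparison matrix.

    Returns the largest eigenvalue and the weight vector normalized to sum 1.
    """
    n = len(matrix)
    vec = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        # vec sums to 1, so the sum of A*vec estimates the eigenvalue.
        lam = sum(nxt)
        vec = [x / lam for x in nxt]
    return lam, vec


def consistency_ratio(matrix):
    """Return (CR, weights) for a pairwise comparison matrix of order 1 to 9."""
    n = len(matrix)
    lam, weights = principal_eigenvalue(matrix)
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    ri = RI[n]
    cr = 0.0 if ri == 0 else ci / ri
    return cr, weights
```

For a perfectly consistent matrix such as `[[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]`, the largest eigenvalue equals the order of the matrix, so the ratio is zero up to rounding and the weights are proportional to 4 : 2 : 1.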

Table 7 Number of training iterations to convergence for different algorithms

Scenario	Improved PPO-FCNN	PPO-FCNN	DQN	Genetic algorithm
10 sensors	10300	7133	38000	29000
15 sensors	10000	10712	42000	33000
20 sensors	10418	1935	26000	28000

Table 8 Relative change in convergence time of improved PPO-FCNN compared with other algorithms for different numbers of sensors (%)

Scenario	PPO-FCNN	DQN	Genetic algorithm
10 sensors	39.00	-68.90	-59.10
15 sensors	-0.06	-72.30	-62.90
20 sensors	4.10	-54.15	-56.30
