Survey on Intelligent Scheduling Technologies for Unmanned Flying Craft Clusters

DU Yong-Hao, XING Li-Ning, CAI Zhao-Quan

Citation: DU Yong-Hao, XING Li-Ning, CAI Zhao-Quan. Survey on Intelligent Scheduling Technologies for Unmanned Flying Craft Clusters. ACTA AUTOMATICA SINICA, 2020, 46(2): 222-241. doi: 10.16383/j.aas.c170681

Funds: 

Supported by National Natural Science Foundation of China 61773120

Supported by National Natural Science Foundation of China 61873328

Supported by National Natural Science Foundation of China 61772225

National Science Fund for Distinguished Young Scholars 61525304

National Excellent Doctoral Dissertation Foundation of China 2014-92

Natural Science Foundation of Guangdong 2018B030311046

Foundation for Distinguished Young Talents in Higher Education of Guangdong 2017KZDXM081

Hunan Postgraduate Research Innovation Project CX2018B022

More Information
    Author Bio:

    DU Yong-Hao Ph. D. candidate at the College of Systems Engineering, National University of Defense Technology. He received his master's degree from National University of Defense Technology in 2017. His research interest covers intelligent optimization theory, methods, and applications. E-mail: duyonghao15@163.com

    CAI Zhao-Quan Professor at Huizhou University. He received his master's degree from Huazhong University of Science and Technology in 2006. His research interest covers computer networks, intelligent computing, and databases. E-mail: 13502279833@126.com

    Corresponding author: XING Li-Ning Professor at the College of Systems Engineering, National University of Defense Technology. He received the National Excellent Ph. D. Dissertation award of China, was selected for the New Century Excellent Talents program of the Ministry of Education, and is supported by the Natural Science Funds for Distinguished Young Scholars of Hunan Province. He received his Ph. D. degree from National University of Defense Technology in 2009. His research interest covers intelligent optimization theory, methods, and applications. Corresponding author of this paper. E-mail: xinglining@gmail.com
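  • Abstract: With the rapid development of aircraft technology, unmanned flying craft, represented by unmanned aerial vehicles (UAVs) and satellites, have been widely used in cluster missions. However, growing and increasingly diverse mission demands, together with unbalanced and insufficient mission resources, pose new challenges to scheduling technologies for unmanned flying craft clusters. Based on the characteristics of the mission types, this survey takes the two perspectives of UAV swarms and multiple satellites: it reviews progress in scheduling UAV swarm visiting, strike, and integrated reconnaissance-strike missions, and summarizes research on scheduling multi-satellite imaging, data transmission, and space-ground integrated missions. It also sorts out the main constraints and profit indicators of UAV swarm and multi-satellite mission scheduling problems and reviews the intelligent optimization algorithms commonly used for them. Finally, in view of future application requirements, further research directions for intelligent scheduling of unmanned flying craft clusters are pointed out.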
    Recommended by Associate Editor SUN Fu-Chun
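  • A gene mutation is a change in gene structure caused by the insertion, deletion, or substitution of base pairs in a DNA molecule. Gene mutations occur randomly and are heritable. Pathogenic mutations disrupt normal development or cause disease by preventing one or more proteins from working properly. Cancer is the collective name for a group of related diseases caused by mutations in genes that control cell function. Cancer-causing mutations may be inherited from parents or may arise as errors during cell division when the body is exposed to carcinogenic environments or substances. In general, cancer cells carry more gene mutations than normal cells. Breast cancer is one of the most common cancers worldwide, with about 2.09 million new cases in 2018[1]. Several medical studies have shown that mutations in BRCA1, BRCA2, and PALB2 increase the risk of breast cancer; other genes whose mutations are associated with breast cancer risk include ATM, TP53, and PTEN. Mining the genes closely related to breast cancer from omics data is therefore of great significance for clinical diagnosis, prognosis, and treatment.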

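    In bioinformatics, cancer pathogenic genes are predicted by gene ranking methods. Network-similarity-based ranking algorithms analyze local and global information in various gene-disease networks, compute the similarity between genes and diseases, and rank genes accordingly. For example, Kohler et al.[2] proposed random walk with restart, which uses the global topology of the network to predict pathogenic genes; Xu et al.[3] proposed a multi-path random walk network embedding model for pathogenic gene prediction on heterogeneous networks. These methods rely heavily on network topology, cannot make predictions for genes outside the network, and are sensitive to noise in cancer data. With the development of machine learning theory, machine-learning-based gene ranking methods, which predict genes through supervised or unsupervised learning, can uncover cancer-related pathogenic genes and are widely used for this purpose. For example, Han et al.[4] combined graph convolutional networks with matrix factorization into a framework for disease-gene association tasks; Natarajan et al.[5] applied inductive matrix completion from recommender systems to predict gene-disease associations.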

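    For breast cancer pathogenic gene prediction, nature-inspired algorithms such as particle swarm optimization (PSO) and genetic algorithms are widely used. Sahu et al.[6] proposed a PSO-based gene selection algorithm: the data set is first clustered with $ k $-means, the genes in each cluster are ranked by signal-to-noise ratio score, the top-scoring gene from each cluster is collected into a new feature subset, and this subset is finally fed to PSO to produce an optimized feature subset. Malar et al.[7] selected pathogenic genes by combining correlation-based feature selection with an improved binary PSO, which also addresses the high dimensionality of microarray data. To eliminate genes that are irrelevant to breast cancer, AliazKovic et al.[8] used a genetic algorithm to extract important information from breast cancer data and mine pathogenic genes related to breast cancer biological processes. Sangaiah et al.[9] combined feature weighting with an entropy-based genetic algorithm into a hybrid method for breast cancer pathogenic gene prediction. Alzubaidi et al.[10] combined a genetic algorithm with mutual information for breast cancer gene selection; the genetic algorithm turns the mutual-information-based selection into a global optimization procedure, which selects genes effectively and avoids local optima. Alomari et al.[11] combined the minimum redundancy maximum relevance algorithm with the flower pollination algorithm to identify gene subsets carrying more cancer information. Hamim et al.[12] proposed a decision-tree-based selection strategy with two stages: a filtering stage based on the Fisher score and a gene selection stage based on the C5.0 algorithm. To improve selection efficiency, Liu et al.[13] combined gene scores with gene importance produced by a deep neural network, taking into account both differences between cancer subtypes and correlations between genes within a subtype, to select the optimal pathogenic gene subset for the triple-negative subtype. Zhao et al.[14] used an information-entropy-based uncertainty coefficient to decide whether a logical relationship exists between genes, built gene logic networks, and extracted breast cancer pathogenic genes by comparing how much the networks of the control and experimental groups differ.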

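    All of the above methods predict genes from existing cancer omics data obtained by sequencing cancer patients. In other words, they can analyze the association between genes and cancer only from the mutation states of patients who have already developed the disease. They cannot know the mutation state before onset, yet the difference between the pre-onset and post-onset mutation states is the key to how cancer arises.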

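    Reinforcement learning[15] is a class of machine learning methods that combines ideas from optimal control with the learning behavior of living organisms. It requires the environment to have the Markov property: the current state depends only on the previous state and not on any other state. Reinforcement learning seeks, for each state, the action that maximizes return; the agent learns by interacting with the environment, which changes its tendency to choose a given action in a given state. It is also an approach with autonomous decision-making ability: through trial and error in the environment, the agent receives rewards and observations of the next state and eventually learns a policy that achieves a high discounted cumulative return. Reinforcement learning has been applied successfully in many fields, such as data-driven control[16], multi-machine cooperative decision-making[17], and traffic control[18].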

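    By analyzing gene mutation, this paper finds that it satisfies a Markov process and that the association between gene mutations and cancer can be computed by constructing a cumulative reward function as in reinforcement learning. Based on breast cancer mutation data, a reinforcement learning environment and algorithms are designed to evaluate and make decisions along the path from a normal gene mutation state to a death mutation state, with the aim of offering a new approach to cancer pathogenic gene prediction and uncovering the genes that lead to the death state in breast cancer. Experimental results show that the proposed algorithms can uncover pathogenic genes closely related to breast cancer.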

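    Because gene mutation is not a deterministic event, it can be regarded as a stochastic process in the absence of human intervention. Let the gene mutation state (hereafter simply the state) at any time $ t $ be $ {{\boldsymbol{s}}_t} $ and the state at the next time step be $ {{\boldsymbol{s}}_{t+1}} $. The change of state at time $ t+1 $ then depends only on the state at time $ t $ and not on the earlier states at times $ 0 \sim t-1 $, that is,

    $$ \begin{equation} P\left( {{{\boldsymbol{s}}_{t + 1}}\left| {{{\boldsymbol{s}}_0},{{\boldsymbol{s}}_1}, \cdots ,{{\boldsymbol{s}}_t}} \right.} \right) = P\left( {{{\boldsymbol{s}}_{t + 1}}\left| {{{\boldsymbol{s}}_t}} \right.} \right) \end{equation} $$ (1)

    where $ P\left( \cdot \right) $ denotes probability. On this basis, the stochastic process underlying gene mutation can be regarded as a Markov process.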

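    This paper defines death states and non-death states from the clinical information in breast cancer patient survival data. Survival data carry two attributes, time and outcome. Time is the interval from the start to the end of observation and is usually called the survival time. The outcome is the observation endpoint, which is either death or survival, recorded as 1 and 0 respectively. In this paper, if a patient's observation endpoint is death, the patient's gene mutation state in the breast cancer data is defined as a death state. Note that patients with the same gene mutation state do not necessarily share the same endpoint, so a death rate is defined to describe the data more precisely. If a gene mutation state leads to the death of every patient who has it, its death rate is 100%; if it leads to death only with some probability, for example 10 deaths among 100 patients with the same state, its death rate is 10%. All gene mutation states with a non-zero probability of death are collectively called death states. Let $ r\left( {{{\boldsymbol{s}}_t}} \right) $ denote the association between a gene and the state $ {{\boldsymbol{s}}_t} $ at time $ t $. Existing gene ranking algorithms focus on statistics over historical cases and use the magnitude of $ r\left( {{{\boldsymbol{s}}_t}} \right) $ to judge how strongly a gene mutation is linked to cancer patients. These methods, however, do not fully consider patients' death states and ignore how cancer develops. For example, a death state $ {\boldsymbol{s}}_\alpha $ may have a low death rate and a small $ r\left( {{{\boldsymbol{s}}_t}} \right) $ value, yet mutate within a certain period into another state with a very high death rate; the genes in such a state $ {\boldsymbol{s}}_\alpha $ should be strongly associated with patient death. The association between genes and cancer patients should therefore not be assessed only from the gene-cancer association in a single state $ {{\boldsymbol{s}}_t} $, but along the long mutation process from a normal state to a death state, that is, by $ \sum\nolimits_i {r\left( {{{\boldsymbol{s}}_i}} \right)} $.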
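The death-rate definition can be illustrated with a small sketch. The function names `death_rates` and `death_states` and the toy data are hypothetical and are not taken from the paper; the sketch only assumes that each patient is represented by a 0/1 mutation tuple and a 0/1 observation endpoint.

```python
from collections import defaultdict

def death_rates(states, outcomes):
    """Map each distinct mutation state to its death rate.

    states   : list of 0/1 tuples, one per patient (toy input, hypothetical)
    outcomes : list of 0/1 observation endpoints (1 = dead, 0 = alive)
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [deaths, patients]
    for state, outcome in zip(states, outcomes):
        counts[tuple(state)][0] += outcome
        counts[tuple(state)][1] += 1
    return {s: d / n for s, (d, n) in counts.items()}

def death_states(rates):
    """States with a non-zero death rate are treated as death states."""
    return {s for s, p in rates.items() if p > 0}

# Toy data: three patients share state (1, 0, 1); one of them died.
states = [(1, 0, 1), (1, 0, 1), (1, 0, 1), (0, 1, 0)]
outcomes = [1, 0, 0, 0]
rates = death_rates(states, outcomes)
```

Grouping by identical state is what allows two patients with the same mutations but different endpoints to contribute to a single rate rather than to two conflicting labels.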

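    In the breast cancer mutation data, the mutation states of all genes of one patient form a sample, and the mutation status of one gene across all patients forms a feature, as shown in Fig. 1. A mutated gene in a patient is recorded as 1 (black cells in Fig. 1) and an unmutated gene as 0 (non-black cells in Fig. 1). The reinforcement learning environment is built as follows: genes are the agents, the gene mutation status at time $ t $ is the state $ {{\boldsymbol{s}}_t} $, gene mutations are the actions $ {{\boldsymbol{a}}_t} $, and the reward function $ r\left( {{{\boldsymbol{s}}_t}} \right) $ is designed from the death rates of the death states. When the agents reach a death state, the optimal policy is obtained, interaction with the environment stops, and a high reward is given. Mutation data contain hundreds to thousands of genes; with a single agent, the state-action space is extremely complex and computationally expensive. This paper therefore applies a multi-agent deep Q network (DQN)[19] to the breast cancer mutation data. On the one hand, compared with Q-learning, DQN updates the parameters of a value-function neural network through training, which reduces the impact of high state dimensionality on training. On the other hand, using multiple agents lowers the complexity of the action space and greatly reduces the computation required.

    Fig. 1  Breast cancer mutation data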

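    Multi-agent DQN reduces the complexity of the learning task, but the dimensionality of the joint action does not decrease, so the probability that the agents discover the optimal policy by exploration remains low. Because all death states come from the breast cancer mutation data, they can serve as expert advice to guide learning. Following learning from demonstration, two multi-agent DQNs are proposed: behavioral cloning-based multi-agent DQN (BCDQN) and pre-training memory-based multi-agent DQN (PMDQN). Two experience pools are used to support demonstration learning: an exploration pool $ {B_1} $ and a demonstration pool $ {B_2} $. When the number of agents is small, BCDQN provides expert advice at every exploration step, which keeps $ {B_1} $ and $ {B_2} $ identically distributed over states and lets the exploration policy fully clone the expert policy. When the number of agents is large, PMDQN first stores a fixed number of expert experiences in $ {B_2} $ through pre-training, then lets the agents explore randomly to fill $ {B_1} $, and drives $ {B_1} $ and $ {B_2} $ toward the same distribution through training. This lowers the correlation between samples in $ {B_2} $ and speeds up learning.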

    Let the number of genes be $ N $ and construct a state-action space in which both states and actions have dimension $ N $. Any state $ {{\boldsymbol{s}}_t} = \left[ {s_t^1,s_t^2, \cdots ,s_t^N} \right] $ in the state space $ S $ is an $ N $-dimensional binary vector, where $ s_t^k\left( {k = 1,2, \cdots ,N} \right) $ satisfies: $ s_t^k = 1 $ if gene $ k $ is mutated and $ s_t^k = 0 $ if it is not. An action $ {{\boldsymbol{a}}_t} = \left[ {a_t^1,a_t^2, \cdots ,a_t^N} \right] $ in the action space $ A $ is also an $ N $-dimensional binary vector, where $ a_t^k\left( {k = 1,2, \cdots ,N} \right) $ satisfies: $ a_t^k = 1 $ if gene $ k $ is to mutate in the next state and $ a_t^k = 0 $ if it is not. The state transition to $ {{\boldsymbol{s}}_{t+1}} $ satisfies

    $$ {{\boldsymbol{s}}_{t + 1}} = {{\boldsymbol{s}}_t} \oplus {{\boldsymbol{a}}_t} = \left[ {s_t^1 \oplus a_t^1,s_t^2 \oplus a_t^2, \cdots ,s_t^N \oplus a_t^N} \right] $$ (2)

    where $ \oplus $ is the exclusive-or operation. The Hamming distance $ D $ is defined as

    $$ D\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{s}}_{t + 1}}} \right) = \sum\limits_{k = 1}^N {s_t^k \oplus s_{t + 1}^k} = {\left\| {{{\boldsymbol{a}}_t}} \right\|_1} $$ (3)
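The transition of Eq. (2) and the distance of Eq. (3) can be checked numerically. The following is a minimal sketch using NumPy; the helper names `step_state` and `hamming` and the toy vectors are illustrative assumptions, not part of the paper.

```python
import numpy as np

def step_state(s, a):
    """Eq. (2): flip the genes selected by the action (element-wise XOR)."""
    return np.bitwise_xor(s, a)

def hamming(s, s_next):
    """Eq. (3): number of positions where the two states differ."""
    return int(np.sum(np.bitwise_xor(s, s_next)))

# Toy example with N = 5 genes (hypothetical values).
s_t = np.array([0, 1, 0, 0, 1], dtype=np.int64)
a_t = np.array([1, 0, 0, 1, 0], dtype=np.int64)
s_next = step_state(s_t, a_t)
```

Because XOR is its own inverse, `s_t XOR s_next` recovers `a_t`, which is why the Hamming distance equals the 1-norm of the action.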

    The reward function $ r\left( {{{\boldsymbol{s}}_t}} \right) $ is defined as

    $$ r\left( {{{\boldsymbol{s}}_t}} \right) = \left\{ \begin{aligned} &- 1 - \eta D\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{s}}_{t + 1}}} \right),\; {\rm{Alive}}\\ &- \eta D\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{s}}_{t + 1}}} \right),\qquad {\rm{Dead}} \end{aligned} \right. $$ (4)

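    In Eq. (4), the death rate of a death state (Dead) is $ P_d $: if the death rate of a state is non-zero, the agent dies in that state with probability $ P_d $. When the death event is triggered, interaction between the agent and the environment stops. While exploring, the agent receives a negative reward for as long as it survives, so the longer it survives, the lower the cumulative reward $ \sum\nolimits_{i = t}^\infty {{\gamma ^{i - t}}r\left( {{{\boldsymbol{s}}_i}} \right)} $, where $ \gamma \left( {0 < \gamma < 1} \right) $ is the discount factor. The term $ D $ in Eq. (4) limits how much the state may change in one step so that the objective laws of gene mutation are not violated; to obtain a higher reward, the agent must trigger the death event with small actions. When $ N $ is large enough, $ D $ can be much greater than 1, and by Hoeffding's inequality the bound on the deviation between a sum of random variables and its expectation grows with the range of those variables. A constant $ \eta \left( {0 < \eta < 1} \right) $ is therefore used to limit the range of the reward and reduce the complexity of the learning task.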
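A short sketch may help to see why longer survival lowers the cumulative reward under Eq. (4). The values `eta=0.1` and `gamma=0.9` and the two toy episodes are assumptions chosen only for illustration.

```python
def reward(distance, dead, eta=0.1):
    """Eq. (4): -eta*D when the death event fires, otherwise -1 - eta*D."""
    return -eta * distance if dead else -1.0 - eta * distance

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**i * r_i over an episode, accumulated backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Two hypothetical episodes, each flipping one gene per step (D = 1):
# the short one triggers death at step 2, the long one at step 5.
short_ep = [reward(1, False), reward(1, True)]
long_ep = [reward(1, False)] * 4 + [reward(1, True)]
```

With these assumed constants the short episode has the higher return, which matches the stated intent that the agent should reach a death state quickly and with small changes of state.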

    The goal of reinforcement learning is to find the optimal policy $ {\pi ^*} = P\left( {{{\boldsymbol{a}}_t}\left| {{{\boldsymbol{s}}_t}} \right.} \right) $ that maximizes the expected discounted return

    $$ \begin{equation} {\rm{E}}\left[ {\sum\limits_{i\; =\; t}^\infty {{\gamma ^{i - t}}} r\left( {{{\boldsymbol{s}}_i}} \right)} \right] \end{equation} $$ (5)

    A commonly used reinforcement learning algorithm is off-policy Q-learning[6]. For the present learning problem, its iteration formula is

    $$ \begin{split} &Q\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{a}}_t}} \right) = Q\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{a}}_t}} \right) + \\ &\qquad\alpha \left( {r\left( {{{\boldsymbol{s}}_t}} \right) + \gamma \mathop {\max }\limits_{\boldsymbol{a}} Q\left( {{{\boldsymbol{s}}_{t + 1}},{\boldsymbol{a}}} \right) - Q\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{a}}_t}} \right)} \right) \end{split} $$ (6)

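    As Eq. (6) shows, Q-learning requires the agent to select actions greedily, which strictly guarantees convergence. Q-learning estimates the state-action value matrix directly. In the designed environment both states and actions are binary vectors, so the action space complexity is $ {2^{N + 1}} $ and the state space complexity is $ {2^N} $. Q-learning would therefore have to estimate a value matrix of complexity $ {2^{2N + 1}} $, and for large $ N $ traversing and solving this matrix takes a great deal of time. For this reason, this paper uses DQN, which updates the parameters of the value function by training a neural network and reduces the impact of state dimensionality on training. The update target of DQN is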

    $$ \begin{equation} Y_t^{} = r\left( {{{\boldsymbol{s}}_t}} \right) + \gamma \mathop {\max }\limits_{\boldsymbol{a}} Q\left( {{{\boldsymbol{s}}_{t + 1}},{\boldsymbol{a}}} \right) \end{equation} $$ (7)

    The corresponding loss function is

    $$ \begin{equation} L\left( {\boldsymbol{\theta}} \right) = {\rm{E}}\left[ {{{\left( {Y - Q\left( {{\boldsymbol{s}},{\boldsymbol{a}};{\boldsymbol{\theta}} } \right)} \right)}^2}} \right] \end{equation} $$ (8)

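    where $ {\boldsymbol{\theta}} $ denotes the parameters of the value-function network. DQN uses experience replay: the data used to train the value network are sampled at random from the experience collected through interaction with the environment, which removes correlation between training samples and satisfies the assumption of deep learning that training data are independent and identically distributed. DQN handles learning problems with large state-action spaces efficiently, and experience replay improves the utilization of experience data.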
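The update target of Eq. (7), the squared loss of Eq. (8), and uniform replay sampling can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, `gamma=0.9` is an assumed value, and dropping the bootstrap term after a terminal step is a common convention that the text does not state explicitly.

```python
import random
import numpy as np

def td_target(r, q_next, gamma=0.9, terminal=False):
    """Eq. (7): r + gamma * max_a Q(s', a).

    Assumption: no bootstrapping after a terminal (death) step.
    """
    return r if terminal else r + gamma * float(np.max(q_next))

def squared_td_loss(targets, q_taken):
    """Eq. (8): mean squared gap between targets and current Q estimates."""
    targets, q_taken = np.asarray(targets), np.asarray(q_taken)
    return float(np.mean((targets - q_taken) ** 2))

def sample_batch(buffer, batch_size, rng=random):
    """Uniform sampling from the replay buffer to decorrelate training data."""
    return rng.sample(buffer, min(batch_size, len(buffer)))
```

Sampling without replacement from the whole buffer, rather than taking the most recent transitions, is what breaks the temporal correlation between consecutive steps of one trajectory.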

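    If a single-agent deep reinforcement learning algorithm were used in this environment, the state-action space complexity would be $ {2^{2N + 1}} $. With a multi-agent framework, the action space complexity drops from $ {2^{N + 1}} $ to $ 2N $, and the overall state-action space complexity becomes $ N{2^{N + 1}} $. The number of genes $ N $ is usually large, so $N{2^{N + 1}} \ll {2^{2N + 1}}$: the multi-agent framework greatly reduces the complexity of the learning problem and requires far fewer network parameters than a single-agent design.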
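The gap between the two complexity figures quoted above can be made concrete with exact integer arithmetic. The function names are illustrative; the formulas are the ones stated in the text, and $N = 188$ is the smaller of the two experimental settings.

```python
def single_agent_size(n):
    """State-action complexity 2**(2n+1) quoted for a single joint agent."""
    return 2 ** (2 * n + 1)

def multi_agent_size(n):
    """n agents with 2 actions each over 2**n states: n * 2**(n+1)."""
    return n * 2 ** (n + 1)

n = 188
# Integer ratio of the two sizes; algebraically this is floor(2**n / n).
ratio = single_agent_size(n) // multi_agent_size(n)
```

The ratio grows as $2^N / N$, so the saving from the multi-agent decomposition increases rapidly with the number of genes.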

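    The multi-agent reinforcement learning framework is shown in Fig. 2. First, $ {{\boldsymbol{s}}_t} = \left[ {s_t^1,s_t^2, \cdots ,s_t^N} \right] $ is fed into the value networks of the $ N $ agents. Based on the mutation state of every gene at time $ t $, each agent outputs its action $ a_t^k $, the outputs are combined into $ {\boldsymbol{a}}_t $, and the new state $ {\boldsymbol{s}}_{t+1} $ is generated. Then, according to the death states of patients in the breast cancer mutation data, it is decided whether to stop interacting with the environment; if not, $ {\boldsymbol{s}}_{t+1} $ is fed into the networks and the iteration continues.

    Fig. 2  Multi-agent reinforcement learning framework (taking the $ k $-th agent as an example)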

    The update target of each agent is

    $$ \begin{equation} Y_t^k = r\left( {{{\boldsymbol{s}}_t}} \right) + \gamma \mathop {\max }\limits_{a_{}^k} {Q^k}\left( {{{\boldsymbol{s}}_{t + 1}},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right) \end{equation} $$ (9)

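    where the action $ {a_{}^k} $ of the $ k $-th agent belongs to its own action space $ {A^k} $, and $ {{\boldsymbol{\theta}} ^k} $ denotes the value-network parameters of the $ k $-th agent. The loss function of the $ k $-th agent is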

    $$ \begin{equation} L\left( {{{\boldsymbol{\theta}}^k}} \right) = {\rm{E}}\left[ {\sum\limits_{k \;=\; 1}^N {{{\left( {{Y^k} - {Q^k}\left( {{\boldsymbol{s}},a_{}^k;{{\boldsymbol{\theta}}^k}} \right)} \right)}^2}} } \right] \end{equation} $$ (10)

    The pseudocode of multi-agent DQN is given in Algorithm 1.

    Algorithm 1. Multi-agent DQN

    Input: maximum number of iterations $ {I_{\max }} $, discount factor $ \gamma $, learning rate $ \eta $, number of agents $ N $.

    Output: network parameters $ {{\boldsymbol{\theta}} ^k}\left( {k = 1,2, \cdots ,N} \right) $.

    1) Initialize network parameters ${{\boldsymbol{\theta}} ^k}\left( {k = 1,2, \cdots ,N} \right)$ and set $I = 0$;

    2) While $I < {I_{\max }}$;

    3) $t = 0$;

    4) Randomly initialize state $ {\boldsymbol{s}}_t $;

    5) While $ t \le {t_{\max }} $ and the patient has not died;

    6) For $k = 1:N$;

    7) Compute action: $ a_t^k = \arg \mathop {\max }\limits_{{a^k}} {Q^k}\left( {{{\boldsymbol{s}}_t},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right) $;

    8) end For;

    9) Apply action $ {{\boldsymbol{a}}_t} = \left[ {a_t^1,a_t^2, \cdots ,a_t^N} \right] $ in the environment and obtain reward $ r\left( {{{\boldsymbol{s}}_t}} \right) $ and next state $ {{\boldsymbol{s}}_{t + 1}} $;

    10) $t \leftarrow t + 1$;

    11) end While;

    12) $I \leftarrow I + 1$;

    13) For $k = 1:N$;

    14) Sample at random and update $ {{\boldsymbol{\theta}} ^k} $ by gradient descent on the loss:

    ${{\boldsymbol{\theta}}^k} \leftarrow {{\boldsymbol{\theta}}^k} - \eta {\nabla _{{{\boldsymbol{\theta}} ^k}}}{\rm{E}}\left[ {\sum\limits_{k = 1}^N {{{\left( {{Y^k} - {Q^k}\left( {{\boldsymbol{s}},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right)} \right)}^2}} } \right]$;

    15) end For;

    16) end While.
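Steps 6) to 9) of Algorithm 1 can be sketched with one linear Q-head per agent. Everything below is a toy illustration under stated assumptions: the array shapes, the random initialization, and the name `joint_greedy_action` are hypothetical, and the paper's actual network code is not shown in the source.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6  # toy number of genes/agents (assumption)

# One linear Q-head per agent: Q^k(s, .) = W[k] @ s + b[k], giving 2 values.
W = rng.normal(size=(N, 2, N))
b = rng.normal(size=(N, 2))

def joint_greedy_action(s):
    """Each agent k picks the argmax over its own two actions (step 7)."""
    q = np.einsum("kan,n->ka", W, s) + b   # Q values, shape (N, 2)
    return np.argmax(q, axis=1)            # a_t^k in {0, 1} for every agent

s_t = rng.integers(0, 2, size=N)           # random initial state (step 4)
a_t = joint_greedy_action(s_t)             # joint action (steps 6 to 8)
s_next = np.bitwise_xor(s_t, a_t)          # environment transition, Eq. (2)
```

Each agent only compares two Q values, so selecting the joint action costs $N$ small argmax operations instead of a search over all binary action vectors.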

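    The number of genes $ N $ in this environment is large, so the action dimensionality is also large, and the probability that the agents find the optimal path by random exploration is very low. A multi-agent framework alone cannot fully avoid this difficulty: it reduces the complexity of the learning task but not the dimensionality of the action, so the probability of reaching the optimal policy by random exploration stays low. Since all death states and state transitions in the environment are known, this paper treats the death states as expert advice and uses learning from demonstration[20] to speed up learning.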

    When computing the reward $ {r^e}\left( {{{\boldsymbol{s}}_t}} \right) $ associated with expert advice, the death probability must be taken into account, that is,

    $$ \begin{equation} \begin{array}{l} {r^e}\left( {{{\boldsymbol{s}}_t}} \right) = {\rm{E}}\left[ {r\left( {{{\boldsymbol{s}}_t}} \right)} \right] = - 1 + {P_d}\left( {{{\boldsymbol{s}}^*}} \right) - \eta D\left( {{{\boldsymbol{s}}_t},{{\boldsymbol{s}}^*}} \right) \end{array} \end{equation} $$ (11)

    where $ {\boldsymbol{s}}^* $ is the target state and $ {P_d}\left( {{{\boldsymbol{s}}^*}} \right) $ is the death probability of the target state. The update target of each agent is

    $$ \begin{equation} Y_t^{e,k} = r_{}^e\left( {{{\boldsymbol{s}}_t}} \right) + \gamma \mathop {\max }\limits_{{a^k}} \left( {Q\left( {{{\boldsymbol{s}}_{t + 1}},{a^k};{{\boldsymbol{\theta}} ^k}} \right)} \right) \end{equation} $$ (12)

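    The reward associated with expert advice must agree with the expected reward of the environment $ E\left[ {r\left( {{{\boldsymbol{s}}_t}} \right)} \right] $; otherwise the value estimate will not converge. Under this condition, the action $ {{\boldsymbol{a}}^*} $ given by the expert system is the optimal action. To support demonstration learning, a separate experience pool $ B_2 $ stores the demonstration experience. Training batches are drawn from the exploration pool $ B_1 $ and the demonstration pool $ B_2 $ according to a probability $ P_s $: a batch is sampled from $ B_1 $ with probability $ P_s $ and from $ B_2 $ with probability $ 1-P_s $. Value-based reinforcement learning is essentially a problem of fitting the value function, so both expert experience and the suboptimal experience gathered by random exploration must be used in value iteration.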
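The two-pool sampling rule can be sketched as follows. The function name, the pool contents, and the values `p_s=0.7` and batch size 8 are assumptions for illustration; the sketch follows the reading that the pool is chosen once per batch.

```python
import random

def sample_batch(b1, b2, batch_size, p_s, rng):
    """With probability p_s draw the whole batch from B1, otherwise from B2."""
    pool = b1 if rng.random() < p_s else b2
    return rng.sample(pool, min(batch_size, len(pool)))

rng = random.Random(0)
b1 = [("explore", i) for i in range(50)]  # toy exploration experiences
b2 = [("expert", i) for i in range(50)]   # toy demonstration experiences
batches = [sample_batch(b1, b2, 8, 0.7, rng) for _ in range(2000)]
share_b1 = sum(batch[0][0] == "explore" for batch in batches) / len(batches)
```

Over many batches the share drawn from $B_1$ approaches $P_s$, so $P_s$ directly controls how strongly the exploration loss is weighted against the demonstration loss.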

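    Inspired by behavioral cloning[21], expert advice is given at every step while the agents explore randomly. The expert advice is the optimal policy, which keeps $ B_1 $ and $ B_2 $ identically distributed over states. Every training iteration narrows the gap between the action distributions of $ B_1 $ and $ B_2 $; at convergence the two pools are identically distributed, so the exploration policy fully clones the expert policy. The advantage of BCDQN is that it converges to exactly the expert policy.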

    Let $ {L^o} $ and $ {L^e} $ be the loss functions of the agent exploration system and the expert demonstration system, respectively. Then

    $$ {L^o}\left( {{{\boldsymbol{\theta}} ^k}} \right) = {{\rm{E}}_{{\boldsymbol{s}}\sim\psi ,{\boldsymbol{a}}\sim\varphi }} \left[ {\sum\limits_{k = 1}^N {{{\left( {{Y^k} - {Q^k}\left( {{\boldsymbol{s}},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right)} \right)}^2}} } \right] $$ (13)
    $$ \begin{split} &{L^e}\left( {{{\boldsymbol{\theta}} ^k}} \right) = \\ &{{\rm{E}}_{{\boldsymbol{s}}\sim\psi ,{\boldsymbol{a}}\sim\varphi ',\varphi '\sim{\pi ^*}\left( \psi \right)}}\left[ {\sum\limits_{k = 1}^N {{{\left( {Y_{}^{e,k} - {Q^k}\left( {{\boldsymbol{s}},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right)} \right)}^2}} } \right] \end{split} $$ (14)

    where $ \psi $ and $ \varphi $ are the state space and the action space along the exploration path, respectively. The final loss function of BCDQN is

    $$ \begin{equation} L\left( {{{\boldsymbol{\theta}} ^k}} \right) = {P_s}{L^o}\left( {{{\boldsymbol{\theta}} ^k}} \right) + \left( {1 - {P_s}} \right){L^e}\left( {{{\boldsymbol{\theta}} ^k}} \right) \end{equation} $$ (15)

    In summary, the pseudocode of BCDQN is as follows:

    Algorithm 2. BCDQN

    Input: maximum number of iterations $ {I_{\max }} $, discount factor $ \gamma $, learning rate $ \eta $, number of agents $ N $, sampling probability $ P_s $, initialized exploration pool $ B_1 $ and demonstration pool $ B_2 $.

    Output: network parameters $ {{\boldsymbol{\theta}} ^k}\left( {k = 1,2, \cdots ,N} \right) $.

    1) Initialize network parameters $ {{\boldsymbol{\theta}} ^k}\left( {k = 1,2, \cdots ,N} \right) $ and set $I = 0$;

    2) While $I < {I_{\max }}$;

    3) $t = 0$;

    4) Randomly initialize state $ {\boldsymbol{s}}_t $;

    5) While $ t \le {t_{\max }} $ and the patient has not died;

    6) For $k = 1:N$;

    7) Compute action: $ a_t^k = \arg \mathop {\max }\limits_{{a^k}} {Q^k}\left( {{{\boldsymbol{s}}_t},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right) $;

    8) Compute expert action $ a_t^{*k} $;

    9) end For;

    10) Apply action $ {{\boldsymbol{a}}_t} = \left[ {a_t^1,a_t^2, \cdots ,a_t^N} \right] $ in the environment, obtain reward $ r\left( {{{\boldsymbol{s}}_t}} \right) $ and next state $ {{\boldsymbol{s}}_{t + 1}} $, and store them in $ B_1 $;

    11) Apply action $ {{\boldsymbol{a}}_t^*} = \left[ {a_t^{*1},a_t^{*2}, \cdots ,a_t^{*N}} \right] $ in the environment, obtain reward $ r^e\left( {{{\boldsymbol{s}}_t}} \right) $ and next state $ {{\boldsymbol{s}}_{t + 1}} $, and store them in $ B_2 $;

    12) $t \leftarrow t + 1$;

    13) end While;

    14) $I \leftarrow I + 1$;

    15) For $k = 1:N$;

    16) Sample at random and update $ {{\boldsymbol{\theta}} ^k} $ by gradient descent on the loss:

    $ {{\boldsymbol{\theta}} ^k} \leftarrow {{\boldsymbol{\theta}} ^k} - \eta {\nabla _{{{\boldsymbol{\theta}} ^k}}}\left( {{P_s}{L^o}\left( {{{\boldsymbol{\theta}} ^k}} \right) + \left( {1 - {P_s}} \right){L^e}\left( {{{\boldsymbol{\theta}} ^k}} \right)} \right) $;

    17) end For;

    18) end While.

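    As $ N $ grows, keeping $ B_1 $ and $ B_2 $ identically distributed over states in BCDQN actually makes it harder for the agents to find the optimal path. The larger $ N $ is, the lower the probability that random exploration finds the optimal path and the higher the probability that the experience vectors in a pool come from the same path, which indirectly increases the correlation between training samples. Deep reinforcement learning needs training samples to be as independent as possible. The pre-training memory-based multi-agent DQN (PMDQN) is therefore proposed: the agents are first pre-trained in the environment and $ T $ expert experiences are stored in $ B_2 $, which is not updated afterwards. The agents then explore randomly to fill $ B_1 $ and training continues. Because $ B_1 $ and $ B_2 $ are not necessarily identically distributed when the algorithm converges, the agents are not guaranteed to learn the optimal policy. PMDQN does, however, lower the correlation between samples drawn from the expert pool and speeds up learning.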

    In this case, the loss functions $ {L^o} $ and $ {L^e} $ of the agent exploration system and the expert demonstration system are

    $$ \begin{split} &{L^o}\left( {{{\boldsymbol{\theta}} ^k}} \right) = \\ &\qquad {\rm{E}}_{{\boldsymbol{s}}\sim\psi ,{\boldsymbol{a}}\sim\varphi } \left[ {\sum\limits_{k = 1}^N {{{\left( {{Y^k} - {Q^k}\left( {{\boldsymbol{s}},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right)} \right)}^2}} } \right] \end{split} $$ (16)
    $$ \begin{split} &{L^e}\left( {{{\boldsymbol{\theta}} ^k}} \right) = \\ &\qquad{{\rm{E}}_{\left( {{\boldsymbol{s}},{\boldsymbol{a}}} \right)\sim{B_2}}}\left[ {\sum\limits_{k = 1}^N {{{\left( {Y_{}^{e,k} - {Q^k}\left( {{\boldsymbol{s}},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right)} \right)}^2}} } \right] \end{split} $$ (17)

    The final loss function of PMDQN is

    $$ \begin{equation} L\left( {{{\boldsymbol{\theta}} ^k}} \right) = {P_s}{L^o}\left( {{{\boldsymbol{\theta}} ^k}} \right) + \left( {1 - {P_s}} \right){L^e}\left( {{{\boldsymbol{\theta}} ^k}} \right) \end{equation} $$ (18)

    The pseudocode of PMDQN is as follows:

    Algorithm 3. PMDQN

    Input: maximum number of iterations $ {I_{\max }} $, discount factor $ \gamma $, learning rate $ \eta $, number of agents $ N $, sampling probability $ P_s $, number of expert experiences $ T $, initialized exploration pool $ B_1 $ and demonstration pool $ B_2 $.

    Output: network parameters $ {{\boldsymbol{\theta}} ^k}\left( {k = 1,2, \cdots ,N} \right) $.

    1) Set $I = 0$; While $I < T$;

    2) Randomly generate state $ {{\boldsymbol{s}}_t} $ and compute expert actions $ a_t^{*k} $;

    3) Apply action $ {{\boldsymbol{a}}_t^*} = \left[ {a_t^{*1},a_t^{*2}, \cdots ,a_t^{*N}} \right] $ in the environment, obtain reward $ r^e\left( {{{\boldsymbol{s}}_t}} \right) $ and next state $ {{\boldsymbol{s}}_{t + 1}} $, and store them in $ B_2 $; $I \leftarrow I + 1$; end While;

    4) Initialize network parameters $ {{\boldsymbol{\theta}} ^k}\left( {k = 1,2, \cdots ,N} \right) $ and reset $I = 0$;

    5) While $I < {I_{\max }}$;

    6) $t = 0$;

    7) Randomly initialize state $ {\boldsymbol{s}}_t $;

    8) While $ t \le {t_{\max }} $ and the patient has not died;

    9) For $k = 1:N$;

    10) Compute action: $ a_t^k = \arg \mathop {\max }\limits_{{a^k}} {Q^k}\left( {{{\boldsymbol{s}}_t},a_{}^k;{{\boldsymbol{\theta}} ^k}} \right) $;

    11) end For;

    12) Apply action $ {{\boldsymbol{a}}_t} = \left[ {a_t^1,a_t^2, \cdots ,a_t^N} \right] $ in the environment, obtain reward $ r\left( {{{\boldsymbol{s}}_t}} \right) $ and next state $ {{\boldsymbol{s}}_{t + 1}} $, and store them in $ B_1 $;

    13) $t \leftarrow t + 1$;

    14) end While;

    15) $I \leftarrow I + 1$;

    16) For $k = 1:N$;

    17) Sample at random and update $ {{\boldsymbol{\theta}} ^k} $ by gradient descent on the loss:

    ${{\boldsymbol{\theta}} ^k} \leftarrow {{\boldsymbol{\theta}} ^k} - \eta {\nabla _{{{\boldsymbol{\theta}} ^k}}}\left( {{P_s}{L^o}\left( {{{\boldsymbol{\theta}} ^k}} \right) + \left( {1 - {P_s}} \right){L^e}\left( {{{\boldsymbol{\theta}} ^k}} \right)} \right)$;

    18) end For;

    19) end While.

    Breast cancer pathogenic genes are ranked by comparing the value $ F\left( {{s^k}} \right) $ of each gene mutation state $ s^k $, which can be written as

    $$ \begin{split} F\left( {{s^k}} \right) =\;& {\rm{E}}\left[ {Q\left( {{\boldsymbol{s}}\left| {_{{s^k} = 0}} \right.,{a^k} = 1;{{\boldsymbol{\theta}} ^k}} \right)} \right]+\\ &{\rm{E}}\left[ {Q\left( {{\boldsymbol{s}}\left| {_{{s^k} = 1}} \right.,{a^k} = 0;{{\boldsymbol{\theta}} ^k}} \right)} \right] \end{split} $$ (19)

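    In Eq. (19), the $ k $-th agent takes action $ a^k = 1 $ to go from the unmutated state ($ s^k = 0 $) to the final mutated state ($ s^k = 1 $), and takes action $ a^k = 0 $ to go from the mutated state ($ s^k = 1 $) to the final mutated state ($ s^k = 1 $). $ F\left( {{s^k}} \right) $ can therefore indicate how much the mutation of a gene contributes to patient death. A final state that is unmutated ($ s^k = 0 $) is taken to be uninformative for the analysis of breast cancer mutation genes.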

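    In the multi-agent framework, each agent handles a reinforcement learning problem with an action space of size 2 and a state space of size $ 2^N $ and is trained with value-based reinforcement learning; the input is an $ N $-dimensional binary vector and the output is a 2-dimensional vector of Q values. The multi-agent framework therefore places low demands on the network architecture. To speed up training, every DQN uses a single-layer neural network, so the parameters $ {\boldsymbol{\theta}} ^k $ of the $ k $-th network consist only of the weight vector $ {\boldsymbol{w}} ^k $ and the bias vector $ {\boldsymbol{b}} ^k $, and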

    $$ \begin{split} &{2^{N - 1}}\left( {{{\left\| {{{\boldsymbol{w}}^k}} \right\|}_1} + {{\left\| {{{\boldsymbol{b}}^k}} \right\|}_1}} \right) =\\ &\qquad {\rm{E}}\left[ {Q\left( {{\boldsymbol{s}}\left| {_{{s^k} = 0}} \right.,{a^k} = 1;{{\boldsymbol{\theta}} ^k}} \right)} \right]+\\ &\qquad {\rm{E}}\left[ {Q\left( {{\boldsymbol{s}}\left| {_{{s^k} = 1}} \right.,{a^k} = 0;{{\boldsymbol{\theta}} ^k}} \right)} \right] \end{split} $$ (20)

    Because $ \mathop {\arg \max }\limits_k \left( {F\left( {{s^k}} \right)} \right) $ and $ \mathop {\arg \max }\limits_k \left( {{{\left\| {{{\boldsymbol{w}}^k}} \right\|}_1} + {{\left\| {{{\boldsymbol{b}}^k}} \right\|}_1}} \right) $ coincide, the following expression is used to rank pathogenic genes

    $$ \begin{equation} F\left( {{s^k}} \right) = {\left\| {{{\boldsymbol{w}}^k}} \right\|_1} + {\left\| {{{\boldsymbol{b}}^k}} \right\|_1} \end{equation} $$ (21)

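    Deep reinforcement learning chooses actions by evaluating state-action values. The larger the value of Eq. (21) for a gene, the larger the state-action value of mutating that gene in any state, that is, the more likely its mutation is to lead to patient death. Eq. (21) can therefore be used to rank genes by the association between their mutation and patient death. Finally, the top $ n $ genes are selected as pathogenic genes as required.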
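The ranking rule of Eq. (21) reduces to sorting agents by the 1-norm of their parameters. The sketch below assumes, for illustration only, that the weights of agent $k$ are stored as a 2-by-$N$ block of a NumPy array; the gene names `G1` to `G3` and all numbers are placeholders.

```python
import numpy as np

def rank_genes(weights, biases, gene_names, top_n):
    """Score gene k by ||w^k||_1 + ||b^k||_1 (Eq. (21)); return top_n names."""
    scores = np.abs(weights).sum(axis=(1, 2)) + np.abs(biases).sum(axis=1)
    order = np.argsort(-scores)  # descending score
    return [gene_names[i] for i in order[:top_n]], scores

# Toy parameters for three agents (placeholder values, not trained networks).
weights = np.zeros((3, 2, 3))
weights[1] = 1.0          # agent 2: 1-norm 6
weights[2, 0, 0] = -2.0   # agent 3: 1-norm 2
biases = np.zeros((3, 2))
top, scores = rank_genes(weights, biases, ["G1", "G2", "G3"], top_n=2)
```

Taking absolute values before summing matters here: a large negative weight contributes as much to the score as a large positive one.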

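    This paper predicts breast cancer pathogenic genes in an environment built from breast cancer gene mutation data. The mutation data and survival data were downloaded from the official TCGA data portal (https://portal.gdc.cancer.gov). The training time of deep reinforcement learning is positively correlated with the complexity of the state-action space: in general, the more complex the space, the more complex the network required and the longer the training. Given the limited computing power of the experimental equipment, the dimensions of states and actions must be restricted by some rule, so genes are screened by mutation rate.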

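    The experiments are divided into two groups by gene mutation rate. Group 1 keeps genes with a mutation rate $\ge 50\%$, which gives $ N = 188 $ genes and 53 distinct death states; group 2 keeps genes with a mutation rate $\ge 30\%$, which gives $ N = 420 $ genes and 81 distinct death states. Because BCDQN is more stable than PMDQN, BCDQN is used for $ N = 188 $. For $ N = 420 $, BCDQN would take a very long time to train, so PMDQN is used to make the algorithm converge quickly.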
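Screening genes by mutation rate amounts to a column filter on the binary patient-by-gene matrix. The following sketch uses a hypothetical helper `select_genes` and a 4-by-3 toy matrix; it does not reproduce the TCGA preprocessing, which the source does not describe in detail.

```python
import numpy as np

def select_genes(mutations, gene_names, min_rate):
    """Keep genes whose mutation rate across patients is at least min_rate."""
    rates = mutations.mean(axis=0)   # fraction of patients with each mutation
    keep = rates >= min_rate
    kept_names = [g for g, k in zip(gene_names, keep) if k]
    return kept_names, mutations[:, keep]

# Toy matrix: rows are patients, columns are genes A, B, C (placeholders).
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
kept, X_kept = select_genes(X, ["A", "B", "C"], min_rate=0.5)
```

Lowering `min_rate` keeps more columns, which corresponds to the larger $N$ and larger state-action space of the second experimental group.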

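    Gene mutations are treated as the actions of the agents. If the mutation-rate threshold is too low, the number of genes (agents) grows while the share of mutated genes in a death state drops sharply, and the agents can hardly learn to reach the death states through their actions. Thresholds of 30% and 50% are chosen so that the number of genes in the environment allows the agents to learn the breast cancer death states. Other thresholds, for example a mutation rate $ \ge 40$%, can also be used; in principle the proposed algorithms apply to any reasonable threshold. The choice of threshold affects the results in two ways: 1) The lower the threshold, the more genes are kept and the larger the state-action space, which slows convergence and can prevent the optimal policy from being learned; the higher the threshold, the fewer genes are kept, which simplifies the learning task, and an overly high threshold makes the prediction task meaningless. 2) Changing the threshold changes the patient death rates, which affects how well the agents complete the task. Where the experimental equipment allows, a mutation-rate threshold between 10% and 50% is recommended.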

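    For $ N = 188 $, BCDQN is used for training. The rewards of the agents on the 53 death states are shown in Fig. 3, where the horizontal axis is the episode and the vertical axis is the reward. Fig. 3 shows that all policies have converged: on every death state, the agents obtain a stable reward in each episode. Because the policies converge, BCDQN completes all learning tasks and is robust. Fig. 4 shows task completion (reaching the death state) for $ N = 188 $, where the horizontal axis is the episode and the vertical axis is the number of completed tasks. Except for death states 0, 1, 6, and 7, the agents stably learn the optimal policy for each death state. The fluctuations on death states 0, 1, 6, and 7 arise because these states have low death rates (4.60%, 9.7%, 7.69%, and 9.09%, respectively): the agents stay in the death state within the step limit but fail to trigger the death event, so stable learning of the optimal policy cannot be fully guaranteed. BCDQN can reliably find the optimal policy in environments with a small state-action space. In more complex state-action spaces, the algorithm will still converge to the optimal policy given sufficient expert experience, but the training time required is hard to estimate.

    Fig. 3  The rewards of BCDQN at 53 death states under the condition of $ N = 188$
    Fig. 4  The task completion status of BCDQN at 53 death states under the condition of $ N = 188$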

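    For $ N = 420 $, PMDQN is used for training. The rewards of the agents on the 81 death states are shown in Fig. 5. Except for death states 61, 62, 67, 69, and 71, the agents learn the highest reward on all death states. Fig. 6 shows task completion for $ N = 420 $. Except for the same five death states, the agents learn the optimal policy for the death states. The reason is that the larger number of agents enlarges the state-action space and the training time was not long enough, so the optimal policy had not yet been learned. Although PMDQN maintains sampling efficiency, provides many useful expert experiences, and speeds up training, it inevitably runs short of expert experience when the environment is too complex. Enlarging the expert experience can reduce this tendency to get stuck in local optima to some extent. For $ N = 420 $ the state-action space is large and complex and the trajectory within one episode is long, which can also prevent the agents from reaching these five death states. Reinforcement learning methods with enhanced exploration could therefore also be tried.

    Fig. 5  The rewards of PMDQN at 81 death states under the condition of $ N = 420$
    Fig. 6  The task completion status of PMDQN at 81 death states under the condition of $ N = 420$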

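    These results suggest the following characteristics and use cases for BCDQN and PMDQN. When the state-action space is small, BCDQN ensures that the agents find a policy identical to the expert policy and stably reach the optimal policy. When the state-action space is large and complex, PMDQN reduces the correlation between samples and lets more agents learn quickly, but it does not guarantee that the optimal policy is learned. In summary, where the experimental equipment allows, BCDQN is recommended for $ N<420 $ and PMDQN for $ N \ge 420 $.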

    The top 10 pathogenic genes predicted by BCDQN for $ N = 188 $ and by PMDQN for $ N = 420 $ are listed in Table 1. The two sets of predictions overlap, for example in TP53, MYC, and PVT1.

    Table 1  Top 10 pathogenic genes predicted by BCDQN and PMDQN
    No.   BCDQN        PMDQN
    1     TP53         TP53
    2     FAM91A1      PIK3CA
    3     TNFRSF11B    TG
    4     KCNQ3        HHLA1
    5     MYC          ASAP1
    6     COL14A1      CASC8
    7     CCDC26       SNORA12
    8     CCN3         MYC
    9     PVT1         PVT1
    10    DSCC1        RN7SL329P

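    The tumor suppressor gene TP53 plays a key role in many cellular pathways that control cell proliferation, cell survival, and genomic integrity. When cells undergo stress (such as DNA damage, hypoxia, or oncogene activation), TP53 acts as a brake on cell proliferation; it is mutated in almost all types of cancer. Silwal-Pandit et al.[22] analyzed somatic TP53 mutations in 1420 breast cancer patients and showed that the TP53 mutation spectrum is subtype-specific and has clear prognostic relevance in breast cancer. Funda et al.[23] performed high-throughput sequencing of 202 genes in 257 patients with metastatic breast cancer and found that TP53 is significantly mutated in all three subtypes of breast cancer and is associated with recurrence-free, progression-free, and overall survival. Han et al.[24] analyzed blood samples from 187 patients with metastatic breast cancer and showed that patients with TP53 mutations had a significantly worse prognosis than TP53 wild-type patients, especially in the hormone receptor-positive/human epidermal growth factor receptor 2-negative and triple-negative cohorts. Among patients with TP53 mutations, those with non-missense mutations in the DNA-binding domain had lower survival.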

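    MYC is a key regulator of cell growth, proliferation, metabolism, differentiation, and apoptosis, and its amplification or overexpression is common in many malignant tumors. Deregulation of MYC in breast cancer involves multiple mechanisms, including gene amplification, transcriptional regulation, and mRNA and protein stabilization, and is associated with the loss of tumor suppressors and the activation of oncogenic pathways. Xu et al.[25] reported that the tumor suppressor BRCA1 inhibits the transcriptional and transforming activity of MYC, and that BRCA1 loss together with MYC overexpression leads to breast cancer, particularly the basal-like subtype. Terunuma et al.[26] found an association between elevated 2-hydroxyglutarate levels and MYC pathway activation in breast cancer and confirmed it by overexpressing and knocking down MYC in human mammary epithelial cells and breast cancer cells. Camarda et al.[27], using targeted metabolomics, found that fatty acid oxidation intermediates are significantly upregulated in a MYC-driven model of triple-negative breast cancer.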

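    PVT1 is highly expressed in many malignant tumors and is a potential oncogene. It also interacts with MYC and takes part in regulating the proliferation and apoptosis of malignant tumor cells through several pathways. Cho et al.[28] showed that the PVT1 promoter has a tumor-suppressor function independent of the PVT1 lncRNA, and that silencing the PVT1 promoter with CRISPR interference enhanced the competition and growth of breast cancer cells in vivo. Tang et al.[29] reported that PVT1 is upregulated in clinical triple-negative breast cancer and promotes the KLF5/beta-catenin signaling pathway to drive the development of triple-negative breast cancer. Wang et al.[30] showed that increased PVT1 expression is associated with clinical stage, lymph node metastasis, and overall survival in breast cancer patients.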

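    To further verify that the predicted pathogenic genes are closely related to breast cancer, gene enrichment analysis was first carried out with the ToppGene tool (https://toppgene.cchmc.org/). Gene enrichment analysis classifies a set of genes according to genome annotation information and reveals whether the genes share some property. Genome annotation information is stored in gene annotation databases, which help in understanding gene function and in finding associations between genes and diseases. The annotation database used here is Gene Ontology (GO), which covers several semantic categories such as molecular function, biological process, and cellular component. A GO term is the basic descriptive unit of the GO database and describes the function of gene products. For example, the GO term regulation of DNA biosynthetic process describes a set of genes that regulate DNA biosynthesis in biological processes.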

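    In the enrichment circle diagrams (Figs. 7 and 8), the left half of the circle shows genes and the right half shows GO terms. A line between a gene and a GO term means that the gene product is related to that term; the more GO terms a gene connects to, the more functions its product has. Fig. 7 is the enrichment circle diagram of the top 10 pathogenic genes for $ N = 188 $; the gene CCDC26 could not be enriched together with the other genes. The GO terms in Fig. 7 are the 15 terms most closely related to breast function, selected from the many terms in the enrichment results. MYC connects to the largest number of GO terms, several of which are related to breast cancer, which indicates that MYC is most closely linked to the onset and progression of breast cancer, followed by TP53, and so on. The products of the 9 genes in Fig. 7 are thus all related to the development of breast cancer. Although CCDC26 could not be enriched with the other genes, it is reported in [31] as a downregulated gene that plays a role in the development of several cancers, such as leukemia and glioma.

    Fig. 7  The enrichment analysis circle diagram of the top 10 pathogenic genes predicted by BCDQN under the condition of $ N = 188$
    Fig. 8  The enrichment analysis circle diagram of the top 10 pathogenic genes predicted by PMDQN under the condition of $ N = 420$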

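    Fig. 8 is the enrichment circle diagram of the top 10 pathogenic genes for $ N = 420 $, with 18 GO terms closely related to breast function selected from the many terms in the enrichment results. TP53, MYC, PIK3CA, PVT1, and TG are related to these 18 GO terms, which indicates an association with breast cancer. Although HHLA1 and ASAP1 are not related to these 18 terms, together with MYC, PVT1, and TG they are related to the GO term Human Leukemia Schoch05 1052genes, that is, to leukemia. SNORA12 was validated in [32] as one of 8 overexpressed genes in cervical cancer. According to RNA sequencing results, RN7SL329P is among the top 10 differentially expressed lncRNAs in prostate cancer[33].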

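    It is worth noting that life science is an experimental science whose knowledge accumulates and improves through long-term inquiry. Some of the predicted genes currently have no direct link to breast cancer, but all take part in the development of other cancers and can serve as candidate breast cancer genes awaiting clinical validation. BRCA1, BRCA2, and PALB2, the mutated genes most commonly associated with increased breast cancer risk, do not appear in the experiments because their mutation rates do not meet the experimental thresholds; they are not included in the $ N = 188 $ and $ N = 420 $ experiments. Owing to space limits, only the top 10 genes predicted by each method are reported and lower-ranked genes are not analyzed. This does not mean that those genes are unrelated to breast cancer: for example, PIK3CA ranks 2nd in the $ N = 420 $ results but 23rd in the $ N = 188 $ results.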

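    Based on breast cancer mutation data, this paper builds a multi-agent reinforcement learning environment and designs two demonstration-learning-based multi-agent DQNs according to the characteristics of the data. BCDQN borrows the idea of behavioral cloning: it uses patient death states as expert information, guides every exploration step of the agents, and finally makes the exploration pool and the expert pool identically distributed. To let more agents learn quickly and to reduce the correlation between samples, PMDQN stores a fixed number of expert experiences in the expert pool through pre-training and then lets the agents explore randomly, which speeds up the discovery of a policy identical to the expert policy. Finally, the predicted pathogenic genes are examined by gene enrichment analysis, and the results show that the proposed method can uncover breast cancer pathogenic genes. The algorithms also uncover some genes involved in the development of other cancers, which can serve as candidate breast cancer pathogenic genes.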

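    Future work includes designing reinforcement learning environments for continuous cancer data and proposing multi-agent reinforcement learning algorithms suited to continuous data.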


  • Fig.  1  Mission classification of unmanned flying craft clusters

    Fig.  2  Optimization strategy for coverage missions

    Fig.  3  Common constraints in intelligent scheduling for unmanned aerial vehicles

    Fig.  4  Comparison on observation strategies between agile satellites and traditional satellites

    Fig.  5  Common constraints in intelligent scheduling for multi-satellites

  • [1] Luo C, Yu L J, Ren P. A vision-aided approach to perching a bioinspired unmanned aerial vehicle. IEEE Transactions on Industrial Electronics, 2018, 65(5): 3976-3984 doi: 10.1109/TIE.2017.2764849
    [2] De Castro A I, Torres-Sánchez J T, Peña J M, Jiménez-Brenes F M, Csillik O, López-Granados F. An automatic random forest-OBIA algorithm for early weed mapping between and within crop rows using UAV imagery. Remote Sensing, 2018, 10(2): Article No. 285 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=remotesensing-10-00285
    [3] Kim B O, Yun K H, Chang T S, Bahk J J, Kim S P. A preliminary study on UAV photogrammetry for the hyanho coast near the military reservation zone, eastern coast of Korea. Ocean and Polar Research, 2017, 39(2): 159-168 doi: 10.4217/OPR.2017.39.2.159
    [4] 王宁, 王永.基于模糊不确定观测器的四旋翼飞行器自适应动态面轨迹跟踪控制.自动化学报, 2018, 44(4): 685-695 doi: 10.16383/j.aas.2017.c160481

    Wang Ning, Wang Yong. Fuzzy uncertainty observer based adaptive dynamic surface control for trajectory tracking of a quadrotor. Acta Automatica Sinica, 2018, 44(4): 685-695 doi: 10.16383/j.aas.2017.c160481
    [5] 罗木生, 姜青山, 侯学隆.直升机使用吊声应召反潜兵力需求仿真.系统仿真学报, 2012, 24(6): 1277-1281, 1286 http://d.old.wanfangdata.com.cn/Periodical/xtfzxb201206025

    Luo Mu-Sheng, Jiang Qing-Shan, Hou Xue-Long. Simulation of optimum helicopter force in definite second time submarine search by dipping sonar. Journal of System Simulation, 2012, 24(6): 1277-1281, 1286 http://d.old.wanfangdata.com.cn/Periodical/xtfzxb201206025
    [6] Lee T K, Baek S H, Choi Y H, Oh S Y. Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation. Robotics and Autonomous Systems, 2011, 59(10): 801-812 doi: 10.1016/j.robot.2011.06.002
    [7] 陈海, 王新民, 焦裕松, 李俨.一种凸多边形区域的无人机覆盖航迹规划算法.航空学报, 2010, 31(9): 1082-1088 http://d.old.wanfangdata.com.cn/Periodical/hkxb201009015

    Chen Hai, Wang Xin-Min, Jiao Yu-Song, Li Yan. An algorithm of coverage flight path planning for UAVs in convex polygon areas. Acta Aeronautica et Astronautica Sinica, 2010, 31(9): 1082-1088 http://d.old.wanfangdata.com.cn/Periodical/hkxb201009015
    [8] Kolling A, Kleiner A. Multi-UAV motion planning for guaranteed search. In: Proceedings of the 12th International Conference on Autonomous Agents and Multi-agent Systems. St. Paul, USA: IFAAMS, 2013. 79-86
    [9] Barrientos A, Colorado J, del Cerro J, Martinez A, Rossi C, Sanz D, et al. Aerial remote sensing in agriculture: a practical approach to area coverage and path planning for fleets of mini aerial robots. Journal of Field Robotics, 2011, 28(5): 667-689 doi: 10.1002/rob.20403
    [10] 晋一宁, 吴炎烜, 范宁军.群无人机动态环境分布式持续覆盖算法.北京理工大学学报, 2016, 36(6): 588-592 http://d.old.wanfangdata.com.cn/Periodical/bjlgdxxb201606007

    Jin Yi-Ning, Wu Yan-Xuan, Fan Ning-Jun. Distributed cooperative control of swarm UAVs for dynamic environment persistent coverage. Transactions of Beijing Institute of Technology, 2016, 36(6): 588-592 http://d.old.wanfangdata.com.cn/Periodical/bjlgdxxb201606007
    [11] Varela G, Caamaño P, Orjales F, Deibe Á, López-Peña F, Duro R J. Autonomous UAV based search operations using Constrained Sampling Evolutionary Algorithms. Neurocomputing, 2014, 132: 54-67 doi: 10.1016/j.neucom.2013.03.060
    [12] Lanillos P, Gan S K, Besada-Portas E, Pajares G, Sukkarieh S. Multi-UAV target search using decentralized gradient-based negotiation with expected observation. Information Sciences, 2014, 282: 92-110 doi: 10.1016/j.ins.2014.05.054
    [13] Wang J J, Zhang Y F, Geng L, Fuh J Y H, Teo S H. Mission planning for heterogeneous tasks with heterogeneous UAVs. In: Proceedings of the 13th International Conference on Control Automation Robotics & Vision (ICARCV). Singapore: IEEE, 2014. 1484-1489
    [14] 符小卫, 魏广伟, 高晓光.不确定环境下多无人机协同区域搜索算法.系统工程与电子技术, 2016, 38(4): 821-827 http://d.old.wanfangdata.com.cn/Periodical/xtgcydzjs201604015

    Fu Xiao-Wei, Wei Guang-Wei, Gao Xiao-Guang. Cooperative area search algorithm for multi-UAVs in uncertainty environment. Systems Engineering and Electronics, 2016, 38(4): 821-827 http://d.old.wanfangdata.com.cn/Periodical/xtgcydzjs201604015
    [15] Ru C J, Qi X M, Guan X N. Distributed cooperative search control method of multiple UAVs for moving target. International Journal of Aerospace Engineering, 2015, 2015: Article No. 317953
    [16] 马纯超, 尹栋, 朱华勇.网络化战场环境下多无人机调度问题.火力与指挥控制, 2015, 40(10): 31-36 http://d.old.wanfangdata.com.cn/Periodical/hlyzhkz201510008

    Ma Chun-Chao, Yin Dong, Zhu Hua-Yong. A study on multi-UAVs scheduling in networked battlefield. Fire Control & Command Control, 2015, 40(10): 31-36 http://d.old.wanfangdata.com.cn/Periodical/hlyzhkz201510008
    [17] 倪谣, 周德云, 马云红, 贺宝财.基于MILP模型的多无人机对地攻击任务分配.火力与指挥控制, 2008, 33(11): 62-65 doi: 10.3969/j.issn.1002-0640.2008.11.018

    Ni Yao, Zhou De-Yun, Ma Yun-Hong, He Bao-Cai. The air-to-ground tasks assignment for multi-UAV based mixed integer linear programming. Fire Control & Command Control, 2008, 33(11): 62-65 doi: 10.3969/j.issn.1002-0640.2008.11.018
    [18] 周小程, 严建钢, 谢宇鹏, 翟鸿君.多无人机对地攻击任务分配算法.海军航空工程学院学报, 2012, 27(3): 308-312 http://d.old.wanfangdata.com.cn/Periodical/hjhkgcxyxb201203003

    Zhou Xiao-Cheng, Yan Jian-Gang, Xie Yu-Peng, Zhai Hong-Jun. Task distributed algorithmic for multi-UAV based on auction mechanism. Journal of Naval Aeronautical and Astronautical University, 2012, 27(3): 308-312 http://d.old.wanfangdata.com.cn/Periodical/hjhkgcxyxb201203003
    [19] Weinstein A L, Schumacher C. UAV scheduling via the vehicle routing problem with time windows. In: Proceedings of the AIAA Infotech@Aerospace 2007 Conference and Exhibit. Rohnert Park, USA: AIAA, 2007
    [20] Zhen Z Y, Xing D J, Gao C. Cooperative search-attack mission planning for multi-UAV based on intelligent self-organized algorithm. Aerospace Science and Technology, 2018, 76: 402-411 doi: 10.1016/j.ast.2018.01.035
    [21] 常一哲, 李战武, 杨海燕, 罗卫平, 徐安.未来中远距协同空战多目标攻击决策研究.火力与指挥控制, 2015, 40(6): 36-40 doi: 10.3969/j.issn.1002-0640.2015.06.009

    Chang Yi-Zhe, Li Zhan-Wu, Yang Hai-Yan, Luo Wei-Ping, Xu An. A decision-making for multiple target attack based on characteristic of future long-range cooperative air combat. Fire Control & Command Control, 2015, 40(6): 36-40 doi: 10.3969/j.issn.1002-0640.2015.06.009
    [22] 陈洁钰, 姚佩阳, 唐剑, 贾方超.多无人机分布式协同动态目标分配方法.空军工程大学学报(自然科学版), 2014, 15(6): 11-16 doi: 10.3969/j.issn.1009-3516.2014.06.003

    Chen Jie-Yu, Yao Pei-Yang, Tang Jian, Jia Fang-Chao. Multi-UAV decentralized cooperative dynamic target assignment method. Journal of Air Force Engineering University (Natural Science Edition), 2014, 15(6): 11-16 doi: 10.3969/j.issn.1009-3516.2014.06.003
    [23] 刘重, 高晓光, 符小卫, 牟之英.未知环境下异构多无人机协同搜索打击中的联盟组建.兵工学报, 2015, 36(12): 2284-2297 doi: 10.3969/j.issn.1000-1093.2015.12.011

    Liu Chong, Gao Xiao-Guang, Fu Xiao-Wei, Mu Zhi-Ying. Coalition formation of multiple heterogeneous unmanned aerial vehicles in cooperative search and attack in unknown environment. Acta Armamentarii, 2015, 36(12): 2284-2297 doi: 10.3969/j.issn.1000-1093.2015.12.011
    [24] Deng Q B, Yu J Q, Wang N F. Cooperative task assignment of multiple heterogeneous unmanned aerial vehicles using a modified genetic algorithm with multi-type genes. Chinese Journal of Aeronautics, 2013, 26(5): 1238-1250 doi: 10.1016/j.cja.2013.07.009
    [25] 吴蔚楠, 关英姿, 郭继峰, 崔乃刚.基于SEAD任务特性约束的协同任务分配方法.控制与决策, 2017, 32(9): 1574-1582 http://d.old.wanfangdata.com.cn/Periodical/kzyjc201709005

    Wu Wei-Nan, Guan Ying-Zi, Guo Ji-Feng, Cui Nai-Gang. Research on cooperative task assignment method used to the mission SEAD with real constraints. Control and Decision, 2017, 32(9): 1574-1582 http://d.old.wanfangdata.com.cn/Periodical/kzyjc201709005
    [26] 戚泽旸, 王强, 黄英杰.多无人机侦察打击任务分配建模仿真.计算机仿真, 2015, 32(9): 142-146, 188 doi: 10.3969/j.issn.1006-9348.2015.09.031

    Qi Ze-Yang, Wang Qiang, Huang Ying-Jie. Task assignment modeling and simulation for cooperative surveillance and strike of multiple unmanned aerial vehicle. Computer Simulation, 2015, 32(9): 142-146, 188 doi: 10.3969/j.issn.1006-9348.2015.09.031
    [27] Zeng J, Yang X K, Yang L Y, Shen G Z. Modeling for UAV resource scheduling under mission synchronization. Journal of Systems Engineering and Electronics, 2010, 21(5): 821-826 doi: 10.3969/j.issn.1004-4132.2010.05.016
    [28] Di Franco C, Buttazzo G. Coverage path planning for UAVs photogrammetry with energy and resolution constraints. Journal of Intelligent & Robotic Systems, 2016, 83(3-4): 445-462 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=ccaa7b190871ef35e766f3826444f298
    [29] 李炜, 张伟.基于粒子群算法的多无人机任务分配方法.控制与决策, 2010, 25(9): 1359-1363, 1368 http://d.old.wanfangdata.com.cn/Periodical/kzyjc201009016

    Li Wei, Zhang Wei. Method of tasks allocation of multi-UAVs based on particles swarm optimization. Control and Decision, 2010, 25(9): 1359-1363, 1368 http://d.old.wanfangdata.com.cn/Periodical/kzyjc201009016
    [30] Kim J, Morrison J R. On the concerted design and scheduling of multiple resources for persistent UAV operations. Journal of Intelligent & Robotic Systems, 2014, 74(1-2): 479-498
    [31] 罗德林, 吴顺祥, 段海滨, 李茂青.无人机协同多目标攻击空战决策研究.系统仿真学报, 2008, 20(24): 6778-6782

    Luo De-Lin, Wu Shun-Xiang, Duan Hai-Bin, Li Mao-Qing. Air-combat decision-making for UAVs cooperatively attacking multiple targets. Journal of System Simulation, 2008, 20(24): 6778-6782
    [32] Kim M H, Baik H, Lee S. Resource welfare based task allocation for UAV team with resource constraints. Journal of Intelligent & Robotic Systems, 2015, 77(3-4): 611-627 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=9ab4b76628dd4d0c1274bd4c2e67054a
    [33] 张民强, 宋建梅, 薛瑞彬.通信距离受限下多无人机分布式协同搜索.系统工程理论与实践, 2015, 35(11): 2980-2986 doi: 10.12011/1000-6788(2015)11-2980

    Zhang Min-Qiang, Song Jian-Mei, Xue Rui-Bin. Multiple UAVs cooperative search under limited communication range. Systems Engineering-Theory & Practice, 2015, 35(11): 2980-2986 doi: 10.12011/1000-6788(2015)11-2980
    [34] Sujit P B, Sousa J B. Multi-UAV task allocation with communication faults. In: Proceedings of the 2012 American Control Conference. Montreal, Canada: IEEE, 2012. 3724-3729
    [35] Mirzaei M, Gordon B W. Cooperative multi-UAV search problem with communication delay. In: Proceedings of the 2010 AIAA Guidance, Navigation, and Control Conference. Toronto, Canada: AIAA, 2010. 519-524
    [36] Ahmadzadeh A, Sayyar-Roudsari B, Homaifar A. A hybrid projected gradient-evolutionary search algorithm for capacitated multi-source multi-UAVs scheduling with time windows. Recent Developments in Cooperative Control and Optimization. Boston, USA: Springer, 2004. 1-21
    [37] 伍思远.无人机安保任务的调度研究: 以杨浦区为例.科技风, 2016, (9): 143 doi: 10.3969/j.issn.1671-7341.2016.09.127
    [38] Ramirez-Atencia C, Bello-Orgaz G, R-Moreno M D, Camacho D. A hybrid MOGA-CSP for multi-UAV mission planning. In: Proceedings of the 2015 Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation. Madrid, Spain: ACM, 2015. 1205-1208
    [39] 曹文静, 杨林.多无人机自主协同方法协同性能研究.飞航导弹, 2017, (5): 43-49 http://d.old.wanfangdata.com.cn/Periodical/fhdd201705010
    [40] Shima T, Schumacher C. Assignment of cooperating UAVs to simultaneous tasks using genetic algorithms. In: Proceedings of the 2005 Guidance, Navigation, and Control Conference and Exhibit. San Francisco, USA: AIAA, 2005. 1-14
    [41] 仲筱艳, 黄大庆.一种典型任务的多无人机协同任务分配算法研究.自动化技术与应用, 2016, 35(8): 7-12, 22 http://d.old.wanfangdata.com.cn/Periodical/hljzdhjsyyy201608002

    Zhong Xiao-Yan, Huang Da-Qing. Research of cooperation task allocation algorithm for a kind of typical mission. Techniques of Automation and Applications, 2016, 35(8): 7-12, 22 http://d.old.wanfangdata.com.cn/Periodical/hljzdhjsyyy201608002
    [42] Khosiawan Y, Park Y, Moon I, Nilakantan J M, Nielsen I. Task scheduling system for UAV operations in indoor environment. Neural Computing and Applications, 2019, 31(9): 5431-5459 doi: 10.1007/s00521-018-3373-9
    [43] 赵明, 苏小红, 马培军, 赵玲玲.复杂多约束UAVs协同目标分配的一种统一建模方法.自动化学报, 2012, 38(12): 2038-2048 doi: 10.3724/SP.J.1004.2012.02038

    Zhao Ming, Su Xiao-Hong, Ma Pei-Jun, Zhao Ling-Ling. A unified modeling method of UAVs cooperative target assignment by complex multi-constraint conditions. Acta Automatica Sinica, 2012, 38(12): 2038-2048 doi: 10.3724/SP.J.1004.2012.02038
    [44] Wang Z, Liu L, Long T, Wen Y L. Multi-UAV reconnaissance task allocation for heterogeneous targets using an opposition-based genetic algorithm with double-chromosome encoding. Chinese Journal of Aeronautics, 2018, 31(2): 339-350 doi: 10.1016/j.cja.2017.09.005
    [45] 尹高扬, 周绍磊, 莫骏超, 曹明川, 康宇航.基于多目标粒子群优化的无人机协同多任务分配.计算机与现代化, 2016, (8): 7-11 doi: 10.3969/j.issn.1006-2475.2016.08.002

    Yin Gao-Yang, Zhou Shao-Lei, Mo Jun-Chao, Cao Ming-Chuan, Kang Yu-Hang. Multiple task assignment for cooperating unmanned aerial vehicles using multi-objective particle swarm optimization. Computer and Modernization, 2016, (8): 7-11 doi: 10.3969/j.issn.1006-2475.2016.08.002
    [46] Oh G, Kim Y, Ahn J, Choi H L. Task allocation of multiple UAVs for cooperative parcel delivery. Advances in Aerospace Guidance, Navigation and Control. Cham, Switzerland: Springer, 2018. 443-454
    [47] Oh G, Kim Y, Ahn J, Choi H L. PSO-based optimal task allocation for cooperative timing missions. IFAC-PapersOnLine, 2016, 49(17): 314-319 doi: 10.1016/j.ifacol.2016.09.054
    [48] 赵宏伟, 许锦州.一种基于在线仿真的多无人机任务调度方法研究.见: 2009年中国高校通信类院系学术研讨会论文集.北京, 中国: 电子工业出版社, 2009. 54-59
    [49] Zhang J X, Zhu Q, Shen F Q, Miao S X, Cao Z Y, Weng Q Q. Hierarchical scheduling method of UAV resources for emergency surveying. In: Proceedings of the 2015 International Conference on Intelligent Earth Observing and Applications. Guilin, China: SPIE, 2015. Article No. 98083B
    [50] 郑晓辉.无人机协同作战的目标分配算法研究.兵工自动化, 2014, 33(3): 16-18, 31 http://d.old.wanfangdata.com.cn/Periodical/bgzdh201403005

    Zheng Xiao-Hui. Research on target assignment algorithm for multi-UAV cooperative combat. Ordnance Industry Automation, 2014, 33(3): 16-18, 31 http://d.old.wanfangdata.com.cn/Periodical/bgzdh201403005
    [51] 刘毅, 李为民, 邢清华, 徐小来.基于双层规划的攻击无人机协同目标分配优化.系统工程与电子技术, 2010, 32(3): 579-583 http://d.old.wanfangdata.com.cn/Periodical/xtgcydzjs201003031

    Liu Yi, Li Wei-Min, Xing Qing-Hua, Xu Xiao-Lai. Cooperative mission assignment optimization of unmanned combat aerial vehicles based on bilevel programming. Systems Engineering and Electronics, 2010, 32(3): 579-583 http://d.old.wanfangdata.com.cn/Periodical/xtgcydzjs201003031
    [52] Wu H S, Li H, Xiao R B, Liu J. Modeling and simulation of dynamic ant colony's labor division for task allocation of UAV swarm. Physica A: Statistical Mechanics and Its Applications, 2018, 491: 127-141 doi: 10.1016/j.physa.2017.08.094
    [53] Perez-Carabaza S, Besada-Portas E, Lopez-Orozco J A, de la Cruz J M. Ant colony optimization for multi-UAV minimum time search in uncertain domains. Applied Soft Computing, 2018, 62: 789-806 doi: 10.1016/j.asoc.2017.09.009
    [54] Gao C, Zhen Z Y, Gong H J. A self-organized search and attack algorithm for multiple unmanned aerial vehicles. Aerospace Science and Technology, 2016, 54: 229-240 doi: 10.1016/j.ast.2016.03.022
    [55] Cekmez U, Ozsiginan M, Sahingoz O K. A UAV path planning with parallel ACO algorithm on CUDA platform. In: Proceedings of the 2014 International Conference on Unmanned Aircraft Systems (ICUAS). Orlando, USA: IEEE, 2014. 347-354
    [56] 陆志强, 刘欣仪.考虑资源转移时间的资源受限项目调度问题的算法.自动化学报, 2018, 44(6): 1028-1036 doi: 10.16383/j.aas.2017.c160834

    Lu Zhi-Qiang, Liu Xin-Yi. Algorithm for resource-constrained project scheduling problem with resource transfer time. Acta Automatica Sinica, 2018, 44(6): 1028-1036 doi: 10.16383/j.aas.2017.c160834
    [57] Ramirez-Atencia C, Bello-Orgaz G, R-Moreno M D, Camacho D. A simple CSP-based model for unmanned air vehicle mission planning. In: Proceedings of the 2014 International Symposium on Innovations in Intelligent Systems and Applications. Alberobello, Italy: IEEE, 2014. 146-153
    [58] Turker T, Yilmaz G, Sahingoz O K. GPU-accelerated flight route planning for multi-UAV systems using simulated annealing. In: Proceedings of the 17th International Conference on Artificial Intelligence: Methodology, Systems, and Applications. Varna, Bulgaria: Springer, 2016. 279-288
    [59] Darrah M A, Niland W, Stolarik B. Multiple UAV Task Allocation for an Electronic Warfare Mission Comparing Genetic Algorithms and Simulated Annealing (Preprint), Technical Report AFRL-VA-WP-TP-2006-340, Air Force Research Laboratory, USA, 2006
    [60] 施蓉花, 吴庆宪, 姜长生.无人机协同攻击的混合粒子群算法.火力与指挥控制, 2009, 34(9): 10-13 doi: 10.3969/j.issn.1002-0640.2009.09.003

    Shi Rong-Hua, Wu Qing-Xian, Jiang Chang-Sheng. Heuristic particle swarm optimization algorithm of multi-UAV cooperative attacking logic. Fire Control & Command Control, 2009, 34(9): 10-13 doi: 10.3969/j.issn.1002-0640.2009.09.003
    [61] Zhang Y Z, Li J W, Hu B, Zhang J D. An improved PSO algorithm for solving multi-UAV cooperative reconnaissance task decision-making problem. In: Proceedings of the 2016 International Conference on Aircraft Utility Systems (AUS). Beijing, China: IEEE, 2016. 434-437
    [62] Cao L, Tan H S, Peng H, Pan M C. Multiple UAVs hierarchical dynamic task allocation based on PSO-FSA and decentralized auction. In: Proceedings of the 2014 International Conference on Robotics and Biomimetics (ROBIO 2014). Bali, Indonesia: IEEE, 2014. 2368-2373
    [63] 石岭.基于改进的模拟退火PSO无人机资源分配[硕士学位论文], 南京航空航天大学, 中国, 2015

    Shi Ling. UAVs Resource Allocation based on Improved SA-PSO [Master dissertation], Nanjing University of Aeronautics and Astronautics, China, 2015
    [64] 梁国强, 康宇航, 邢志川, 尹高扬.基于离散粒子群优化的无人机协同多任务分配.计算机仿真, 2018, 35(2): 22-28 doi: 10.3969/j.issn.1006-9348.2018.02.005

    Liang Guo-Qiang, Kang Yu-Hang, Xing Zhi-Chuan, Yin Gao-Yang. UAV cooperative multi-task assignment based on discrete particle swarm optimization algorithm. Computer Simulation, 2018, 35(2): 22-28 doi: 10.3969/j.issn.1006-9348.2018.02.005
    [65] 曹攀峰, 崔升.考虑信息延迟的无人机分布式协同搜索算法.电光与控制, 2010, 17(3): 27-29, 34 doi: 10.3969/j.issn.1671-637X.2010.03.007

    Cao Pan-Feng, Cui Sheng. A cooperative search algorithm for multi-UAV with time-delays. Electronics Optics & Control, 2010, 17(3): 27-29, 34 doi: 10.3969/j.issn.1671-637X.2010.03.007
    [66] Kingston D B, Beard R W, Holt R S. Decentralized perimeter surveillance using a team of UAVs. IEEE Transactions on Robotics, 2008, 24(6): 1394-1404
    [67] Li W J, Bi Y Z, Zhu X F, Yuan C A, Zhang X B. Hybrid swarm intelligent parallel algorithm research based on multi-core clusters. Microprocessors and Microsystems, 2016, 47: 151-160 doi: 10.1016/j.micpro.2016.05.009
    [68] Yao P, Wang H L, Ji H X. Multi-UAVs tracking target in urban environment by model predictive control and Improved Grey Wolf Optimizer. Aerospace Science and Technology, 2016, 55: 131-143 doi: 10.1016/j.ast.2016.05.016
    [69] 马焱, 赵捍东, 张玮, 陈白禹, 邵先锋, 张晓东, 等.基于自适应烟花算法的多无人机任务分配.电光与控制, 2018, 25(1): 37-43

    Ma Yan, Zhao Han-Dong, Zhang Wei, Chen Bai-Yu, Shao Xian-Feng, Zhang Xiao-Dong, et al. Task allocation for multi-UAVs based on adaptive fireworks algorithm. Electronics Optics & Control, 2018, 25(1): 37-43
    [70] 刘跃峰, 张安.有人机/无人机编队协同任务分配方法.系统工程与电子技术, 2010, 32(3): 584-588 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=xtgcydzjs201003032

    Liu Yue-Feng, Zhang An. Cooperative task assignment method of manned/unmanned aerial vehicle formation. Systems Engineering and Electronics, 2010, 32(3): 584-588 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=xtgcydzjs201003032
    [71] 刘洋, 陈英武, 谭跃进.一种有新任务到达的多卫星动态调度模型与方法.系统工程理论与实践, 2005, 25(4): 35-41 doi: 10.3321/j.issn:1000-6788.2005.04.006

    Liu Yang, Chen Ying-Wu, Tan Yue-Jin. A modeling and algorithm for the new tasks' arriving in multi-satellites dynamic scheduling. Systems Engineering-Theory & Practice, 2005, 25(4): 35-41 doi: 10.3321/j.issn:1000-6788.2005.04.006
    [72] Qiu D S, He C, Liu J, Ma M H. A dynamic scheduling method of earth-observing satellites by employing rolling horizon strategy. The Scientific World Journal, 2013, 2013: Article No. 304047
    [73] Niu X N, Zhai X J, Tang H, Wu L X. Multi-satellite scheduling approach for dynamic areal tasks triggered by emergent disasters. In: Proceedings of the 2016 International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Prague, Czech Republic: ISPRS, 2016. 475-481
    [74] Skobelev P O, Simonova E V, Zhilyaev A A, Travin V S. Application of multi-agent technology in the scheduling system of swarm of Earth remote sensing satellites. Procedia Computer Science, 2017, 103: 396-402 doi: 10.1016/j.procs.2017.01.127
    [75] Lemaître M, Verfaillie G, Jouhaud F, Lachiver J M, Bataille N. Selecting and scheduling observations of agile satellites. Aerospace Science and Technology, 2002, 6(5): 367-381 doi: 10.1016/S1270-9638(02)01173-2
    [76] Tangpattanakul P, Jozefowiez N, Lopez P. A multi-objective local search heuristic for scheduling Earth observations taken by an agile satellite. European Journal of Operational Research, 2015, 245(2): 542-554 doi: 10.1016/j.ejor.2015.03.011
    [77] 章登义, 郭雷, 王骞, 邹华.一种面向区域目标的敏捷成像卫星单轨调度方法.武汉大学学报·信息科学版, 2014, 39(8): 901-905, 922 http://d.old.wanfangdata.com.cn/Periodical/whchkjdxxb201408004

    Zhang Deng-Yi, Guo Lei, Wang Qian, Zou Hua. An improved single-orbit scheduling method for agile imaging satellite towards area target. Geomatics and Information Science of Wuhan University, 2014, 39(8): 901-905, 922 http://d.old.wanfangdata.com.cn/Periodical/whchkjdxxb201408004
    [78] Sarkheyli A, Bagheri A, Ghorbani-Vaghei B, Askari-Moghadam R. Using an effective tabu search in interactive resources scheduling problem for LEO satellites missions. Aerospace Science and Technology, 2013, 29(1): 287-295 doi: 10.1016/j.ast.2013.04.001
    [79] Liu X L, Laporte G, Chen Y W, He R J. An adaptive large neighborhood search metaheuristic for agile satellite scheduling with time-dependent transition time. Computers & Operations Research, 2017, 86: 41-53 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=1176d7c33ea3f15f8326d945dd22d54b
    [80] He L, Liu X L, Laporte G, Chen Y W, Chen Y G. An improved adaptive large neighborhood search algorithm for multiple agile satellites scheduling. Computers & Operations Research, 2018, 100: 12-25 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=f04b55028114ff3001cd437b6bde8ebe
    [81] Parish D A. A Genetic Algorithm Approach to Automating Satellite Range Scheduling [Master dissertation], Air Force Institute of Technology, USA, 1994
    [82] 朱小满, 王钧, 李军, 景宁. SAR卫星成像任务规划的DHIP方法.计算机工程与科学, 2011, 33(9): 179-183 doi: 10.3969/j.issn.1007-130X.2011.09.032

    Zhu Xiao-Man, Wang Jun, Li Jun, Jing Ning. A DHIP algorithm for SAR satellite imaging planning. Computer Engineering and Science, 2011, 33(9): 179-183 doi: 10.3969/j.issn.1007-130X.2011.09.032
    [83] 王沛.基于分支定价的多星多站集成调度方法研究[博士学位论文], 国防科学技术大学, 中国, 2011

    Wang Pei. Research on Branch-and-Price based Multi-satellite Multi-station Integrated Scheduling Method [Ph. D. dissertation], National University of Defense Technology, China, 2011
    [84] Marinelli F, Nocella S, Rossi F, Smriglio S. A Lagrangian heuristic for satellite range scheduling with resource constraints. Computers & Operations Research, 2011, 38(11): 1572-1583
    [85] Pemberton J C, Galiber Ⅲ F. A constraint-based approach to satellite scheduling. In: Proceedings of the 2001 Constraint Programming and Large Scale Discrete Optimization. Piscataway, USA: American Mathematical Society, 2001. 101-114
    [86] 李云峰, 武小悦.基于多星联合侦察的卫星数传调度问题模型.北京航空航天大学学报, 2008, 34(8): 948-951, 960 http://d.old.wanfangdata.com.cn/Periodical/bjhkhtdxxb200808020

    Li Yun-Feng, Wu Xiao-Yue. Model of satellite data transmission scheduling problem based on multi-satellite combined reconnaissance. Journal of Beijing University of Aeronautics and Astronautics, 2008, 34(8): 948-951, 960 http://d.old.wanfangdata.com.cn/Periodical/bjhkhtdxxb200808020
    [87] Bianchessi N, Righini G. Planning and scheduling algorithms for the COSMO-SkyMed constellation. Aerospace Science and Technology, 2008, 12(7): 535-544 doi: 10.1016/j.ast.2008.01.001
    [88] Xu B, Wang D H, Liu W X, Sun G F. A hybrid navigation constellation inter-satellite link assignment algorithm for the integrated optimization of the inter-satellite observing and communication performance. In: Proceedings of the 2015 China Satellite Navigation Conference. Berlin, Germany: Springer, 2015. 283-296
    [89] 陈祥国, 武小悦.基于解构造图的卫星数传调度ACO算法.系统工程与电子技术, 2010, 32(3): 592-597 http://d.old.wanfangdata.com.cn/Periodical/xtgcydzjs201003034

    Chen Xiang-Guo, Wu Xiao-Yue. ACO algorithm of satellite data transmission scheduling based on solution construction graph. Systems Engineering and Electronics, 2010, 32(3): 592-597 http://d.old.wanfangdata.com.cn/Periodical/xtgcydzjs201003034
    [90] Corrao G, Falone R, Gambi E, Spinsante S. Ground station activity planning through a multi-algorithm optimisation approach. In: Proceedings of the 2012 AESS European Conference on Satellite Telecommunications (ESTEL). Rome, Italy: IEEE, 2012. 1-6
    [91] Skobelev P, Simonova E V, Ivanov A, Mayorov I, Travin V, Zhilyaev A. Real time scheduling of data transmission sessions in a microsatellites swarm and ground stations network based on multi-agent technology. In: Proceedings of the 2014 International Conference on Evolutionary Computation Theory and Applications. Rome, Italy: SciTePress, 2014. 153-159
    [92] 王远振, 赵坚, 聂成.多卫星-地面站系统的Petri网模型研究.空军工程大学学报(自然科学版), 2003, 4(2): 7-11 doi: 10.3969/j.issn.1009-3516.2003.02.002

    Wang Yuan-Zhen, Zhao Jian, Nie Cheng. Study on Petri net model for multi-satellites-ground station system. Journal of Air Force Engineering University (Natural Science Edition), 2003, 4(2): 7-11 doi: 10.3969/j.issn.1009-3516.2003.02.002
    [93] Wang P, Reinelt G, Gao P, Tan Y J. A model, a heuristic and a decision support system to solve the scheduling problem of an earth observing satellite constellation. Computers & Industrial Engineering, 2011, 61(2): 322-335 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=1b9f4075623677a059762396113dd878
    [94] Li Z X, Li J, Mu W T. Space-ground TT&C resources integrated scheduling based on the hybrid ant colony optimization. In: Proceedings of the 28th Conference of Spacecraft TT&C Technology. Singapore: Springer, 2016. 179-196
    [95] Chen H, Wu J J, Shi W Y, Li J, Zhong Z N. Coordinate scheduling approach for EDS observation tasks and data transmission jobs. Journal of Systems Engineering and Electronics, 2016, 27(4): 822-835 doi: 10.21629/JSEE.2016.04.11
    [96] Sun K, Yang Z Y, Wang P, Chen Y W. Mission planning and action planning for agile earth-observing satellite with genetic algorithm. Journal of Harbin Institute of Technology, 2013, 20(5): 51-56 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=hebgydxxb-e201305010
    [97] Zhu W M, Hu X X, Xia W, Jin P. A two-phase genetic annealing method for integrated Earth observation satellite scheduling problems. Soft Computing, 2019, 23(1): 181-196 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=67afc6f1ee6311f76d89bca4f1610710
    [98] Dilkina B, Havens B. Agile Satellite Scheduling via Permutation Search with Constraint Propagation. Vancouver, Canada: Actenum Corporation, 2005. 1-20
    [99] Xu R, Chen H P, Liang X L, Wang H M. Priority-based constructive algorithms for scheduling agile earth observation satellites with total priority maximization. Expert Systems with Applications, 2016, 51: 195-206 doi: 10.1016/j.eswa.2015.12.039
    [100] Zhu K J, Li J F, Baoyin H X. Satellite scheduling considering maximum observation coverage time and minimum orbital transfer fuel cost. Acta Astronautica, 2010, 66(1-2): 220-229 doi: 10.1016/j.actaastro.2009.05.029
    [101] 邱涤珊, 郭浩, 贺川, 伍国华.敏捷成像卫星多星密集任务调度方法.航空学报, 2013, 34(4): 882-889 http://d.old.wanfangdata.com.cn/Periodical/hkxb201304019

    Qiu Di-Shan, Guo Hao, He Chuan, Wu Guo-Hua. Intensive task scheduling method for multi-agile imaging satellites. Acta Aeronautica et Astronautica Sinica, 2013, 34(4): 882-889 http://d.old.wanfangdata.com.cn/Periodical/hkxb201304019
    [102] 范国伟, 常琳, 杨秀彬, 王旻, 王绍举.面向新颖成像模式敏捷卫星的联合执行机构控制方法.自动化学报, 2017, 43(10): 1858-1868 doi: 10.16383/j.aas.2017.c160579

    Fan Guo-Wei, Chang Lin, Yang Xiu-Bin, Wang Min, Wang Shao-Ju. Control strategy of hybrid actuator for novel imaging modes of agile satellites. Acta Automatica Sinica, 2017, 43(10): 1858-1868 doi: 10.16383/j.aas.2017.c160579
    [103] 赵琳, 王硕, 郝勇, 刘源, 柴毅.基于地面任务-空间姿态映射的敏捷卫星任务规划.航空学报, 2018, 39(10): Article No. 322066 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=hkxb201810014

    Zhao Lin, Wang Shuo, Hao Yong, Liu Yuan, Chai Yi. Mission planning for agile satellite based on the mapping relationship between ground missions and spatial attitudes. Acta Aeronautica et Astronautica Sinica, 2018, 39(10): Article No. 322066 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=hkxb201810014
    [104] 经飞, 王钧, 李军, 景宁.考虑多数传模式组合的卫星数传调度方法.系统工程学报, 2012, 27(2): 160-168 doi: 10.3969/j.issn.1000-5781.2012.02.003

    Jing Fei, Wang Jun, Li Jun, Jing Ning. Approach considers multiform data transmission mode for satellite data transmission scheduling problem. Journal of Systems Engineering, 2012, 27(2): 160-168 doi: 10.3969/j.issn.1000-5781.2012.02.003
    [105] Chen H, Li L M, Zhong Z N, Li J. Approach for earth observation satellite real-time and playback data transmission scheduling. Journal of Systems Engineering and Electronics, 2015, 26(5): 982-992 doi: 10.1109/JSEE.2015.00107
    [106] Kim H, Chang Y K. Mission scheduling optimization of SAR satellite constellation for minimizing system response time. Aerospace Science and Technology, 2015, 40: 17-32 doi: 10.1016/j.ast.2014.10.006
    [107] Zhang Z J, Zhang N, Feng Z R. Multi-satellite control resource scheduling based on ant colony optimization. Expert Systems with Applications, 2014, 41(6): 2816-2823 doi: 10.1016/j.eswa.2013.10.014
    [108] Verfaillie G, Lemaître M, Bataille N, Lachiver J M. Management of the Mission of Earth Observation Satellites Challenge Description, Technical Report, Centre National d'Etudes Spatiales, France, 2002
    [109] Li J, Li J, Chen H, Jing N. A data transmission scheduling algorithm for rapid-response earth-observing operations. Chinese Journal of Aeronautics, 2014, 27(2): 349-364 doi: 10.1016/j.cja.2014.02.014
    [110] Wu G H, Ma M H, Zhu J H, Qiu D S. Multi-satellite observation integrated scheduling method oriented to emergency tasks and common tasks. Journal of Systems Engineering and Electronics, 2012, 23(5): 723-733 doi: 10.1109/JSEE.2012.00089
    [111] 姜维, 庞秀丽.提高卫星服务寿命的任务规划方法研究.自动化学报, 2014, 40(5): 909-920 doi: 10.3724/SP.J.1004.2014.00909

    Jiang Wei, Pang Xiu-Li. The task scheduling model and algorithm for imaging satellites with optimizing satellite service life. Acta Automatica Sinica, 2014, 40(5): 909-920 doi: 10.3724/SP.J.1004.2014.00909
    [112] Karapetyan D, Minic S M, Malladi K T, Punnen A P. Satellite downlink scheduling problem: a case study. Omega, 2015, 53: 115-123 doi: 10.1016/j.omega.2015.01.001
    [113] 严珍珍, 陈英武, 邢立宁.基于改进蚁群算法设计的敏捷卫星调度方法.系统工程理论与实践, 2014, 34(3): 793-801 http://d.old.wanfangdata.com.cn/Periodical/xtgcllysj201403028

    Yan Zhen-Zhen, Chen Ying-Wu, Xing Li-Ning. Agile satellite scheduling based on improved ant colony algorithm. Systems Engineering-Theory & Practice, 2014, 34(3): 793-801 http://d.old.wanfangdata.com.cn/Periodical/xtgcllysj201403028
    [114] 陈宇宁, 邢立宁, 陈英武.基于蚁群算法的灵巧卫星调度.科学技术与工程, 2011, 11(3): 484-489, 502 doi: 10.3969/j.issn.1671-1815.2011.03.012

    Chen Yu-Ning, Xing Li-Ning, Chen Ying-Wu. Scheduling of agile satellites based on ant colony algorithm. Science Technology and Engineering, 2011, 11(3): 484-489, 502 doi: 10.3969/j.issn.1671-1815.2011.03.012
    [115] 朱新新, 谭跃进, 邓宏钟, 邢立宁.求解成像卫星调度问题的改进蚁群算法.科学技术与工程, 2012, 12(31): 8322-8326 doi: 10.3969/j.issn.1671-1815.2012.31.037

    Zhu Xin-Xin, Tan Yue-Jin, Deng Hong-Zhong, Xing Li-Ning. The improved ant colony algorithm solving the scheduling problem of imaging satellites. Science Technology and Engineering, 2012, 12(31): 8322-8326 doi: 10.3969/j.issn.1671-1815.2012.31.037
    [116] Gao K B, Wu G H, Zhu J H. Multi-satellite observation scheduling based on a hybrid ant colony optimization. In: Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation. Paris, France: Atlantis Press, 2013. 675-678
    [117] Zhang N, Feng Z R. Cooperative ant colony optimization for multisatellite resource scheduling problem. In: Proceedings of the 2007 Congress on Evolutionary Computation. Singapore: IEEE, 2007. 2822-2828
    [118] 邢立宁, 陈英武.基于混合蚁群优化的卫星地面站系统任务调度方法.自动化学报, 2008, 34(4): 414-418 doi: 10.3724/SP.J.1004.2008.00414

    Xing Li-Ning, Chen Ying-Wu. Mission planning of satellite ground station system based on the hybrid ant colony optimization. Acta Automatica Sinica, 2008, 34(4): 414-418 doi: 10.3724/SP.J.1004.2008.00414
    [119] 姚锋, 邢立宁.求解卫星地面站调度问题的演化学习型蚁群算法.系统工程与电子技术, 2012, 34(11): 2270-2274 doi: 10.3969/j.issn.1001-506X.2012.11.14

    Yao Feng, Xing Li-Ning. Learnable ant colony optimization algorithm for solving satellite ground station scheduling problems. Systems Engineering and Electronics, 2012, 34(11): 2270-2274 doi: 10.3969/j.issn.1001-506X.2012.11.14
    [120] Hosseinabadi S, Ranjbar M, Ramyar S, Amel-Monirian M. Scheduling a constellation of agile earth observation satellites with preemption. Journal of Quality Engineering and Production Optimization, 2017, 2(1): 47-64
    [121] 孙凯, 邢立宁, 陈英武.基于分解优化策略的多敏捷卫星联合对地观测调度.计算机集成制造系统, 2013, 19(1): 127-136 http://d.old.wanfangdata.com.cn/Conference/9007689

    Sun Kai, Xing Li-Ning, Chen Ying-Wu. Agile earth observing satellites mission scheduling based on decomposition optimization algorithm. Computer Integrated Manufacturing Systems, 2013, 19(1): 127-136 http://d.old.wanfangdata.com.cn/Conference/9007689
    [122] Niu X N, Tang H, Wu L X. Satellite scheduling of large areal tasks for rapid response to natural disaster using a multi-objective genetic algorithm. International Journal of Disaster Risk Reduction, 2018, 28: 813-825 doi: 10.1016/j.ijdrr.2018.02.013
    [123] Xhafa F, Herrero X, Barolli A, Takizawa M. A comparison study on meta-heuristics for ground station scheduling problem. In: Proceedings of the 17th International Conference on Network-Based Information Systems. Salerno, Italy: IEEE, 2015. 172-179
    [124] Zhou Yi-Rong, Chen Hao, Li Long-Mei, Chen Luo, Jing Ning. Immune genetic algorithm for satellite data transmission scheduling. Journal of Chinese Computer Systems, 2015, 36(12): 2725-2729 (in Chinese) doi: 10.3969/j.issn.1000-1220.2015.12.020
    [125] Chen H, Zhou Y R, Du C, Li J. A satellite cluster data transmission scheduling method based on genetic algorithm with rote learning operator. In: Proceedings of the 2016 Congress on Evolutionary Computation. Vancouver, Canada: IEEE, 2016. 5076-5083
    [126] He Ren-Jie, Gao Peng, Bai Bao-Cun, Li Ju-Fang, Yao Feng, Xing Li-Ning. Models, algorithms and applications to the mission planning system of imaging satellites. Systems Engineering-Theory & Practice, 2011, 31(3): 411-422 (in Chinese) http://d.old.wanfangdata.com.cn/Periodical/xtgcllysj201103004
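    [127] Huang Han, Zhang Xiao-Qian. Research on imaging satellite mission planning method based on graph theory model. Journal of Guilin University of Aerospace Technology, 2016, 21(2): 155-158 (in Chinese) doi: 10.3969/j.issn.1009-1033.2016.02.005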
    [128] Xhafa F, Herrero X, Barolli A, Takizawa M. A simulated annealing algorithm for ground station scheduling problem. In: Proceedings of the 16th International Conference on Network-based Information Systems. Gwangju, South Korea: IEEE, 2013. 24-30
    [129] Xhafa F, Herrero X, Barolli A, Takizawa M. A tabu search algorithm for ground station scheduling problem. In: Proceedings of the 28th International Conference on Advanced Information Networking and Applications. Victoria, Canada: IEEE, 2014. 1033-1040
    [130] Zhang Chao, Li Yan-Bin. Planning and scheduling method for multi agile satellite coordinated mission. Science Technology and Engineering, 2017, 17(22): 271-277 (in Chinese) doi: 10.3969/j.issn.1671-1815.2017.22.044
    [131] Chen Y, Zhang D Y, Zhou M Q, Zou H. Multi-satellite observation scheduling algorithm based on hybrid genetic particle swarm optimization. Advances in Information Technology and Industry Applications. Berlin, Germany: Springer, 2012. 441-448
    [132] Tang Shao-Xun, Yi Xian-Qing, Luo Xue-Shan. An improved particle swarm optimization algorithm for early warning satellites scheduling problems. Systems Engineering, 2012, 30(1): 116-121 (in Chinese) doi: 10.3969/j.issn.1001-2362.2012.01.059
    [133] Chang Fei, Wu Xiao-Yue. Satellite data transmission task scheduling based on advanced particle swarm optimization. Systems Engineering and Electronics, 2009, 31(10): 2404-2408 (in Chinese) doi: 10.3321/j.issn:1001-506X.2009.10.028
    [134] Guo Xiao-Bo, Liu Jin-Can, Zhou Hong-Bin. Research on transmission task scheduling for distributed satellite systems. Radio Communications Technology, 2016, 42(4): 29-32 (in Chinese) doi: 10.3969/j.issn.1003-3114.2016.04.08
    [135] Zhang T J, Ke L J, Li J S, Li J, Li Z X, Huang J Q. Fireworks algorithm for the satellite link scheduling problem in the navigation constellation. In: Proceedings of the 2016 Congress on Evolutionary Computation. Vancouver, Canada: IEEE, 2016. 4029-4037
    [136] Jing Fei, Wang Jun, Li Jun, Chen Hao, Jing Ning. A new scheduling method for multi-satellite data transmission based on squeaky-wheel optimization. Journal of Astronautics, 2011, 32(4): 863-870 (in Chinese) doi: 10.3873/j.issn.1000-1328.2011.04.024
    [137] Li Zhi-Liang, Li Xiao-Jiang, Zhang Dong-Lai. Proactive scheduling of agile imaging satellite based on improved differential evolution algorithm. Systems Engineering and Electronics, 2018, 40(2): 353-359 (in Chinese) http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=xtgcydzjs201802017
    Publication history
    • Received:  2017-11-30
    • Accepted:  2019-03-28
    • Published:  2020-03-06
