兵棋推演的智能决策技术与挑战

尹奇跃; 赵美静; 倪晚成; 张俊格; 黄凯奇

doi:10.16383/j.aas.c210547

兵棋推演的智能决策技术与挑战

doi: 10.16383/j.aas.c210547

尹奇跃^{1, 2,},
赵美静^1,,
倪晚成^{1, 2,},
张俊格^{1, 2,},
黄凯奇^{1, 2,}

1.
中国科学院自动化研究所北京 100190
2.
中国科学院大学北京 100049

基金项目: 国家自然科学青年基金(61906197)资助

详细信息

作者简介:
尹奇跃：中国科学院自动化研究所副研究员. 主要研究方向为强化学习, 数据挖掘和人工智能与游戏. E-mail: qyyin@nlpr.ia.ac.cn

赵美静：中国科学院自动化研究所副研究员. 主要研究方向为知识表示与建模, 复杂系统决策. E-mail: meijing.zhao@ia.ac.cn

倪晚成：中国科学院自动化研究所研究员. 主要研究方向为数据挖掘与知识发现, 复杂系统建模和群体智能博弈决策平台与评估. E-mail: wancheng.ni@ia.ac.cn

张俊格：中国科学院自动化研究所研究员. 主要研究方向为持续学习, 小样本学习, 博弈决策和强化学习. E-mail: jgzhang@nlpr.ia.ac.cn

黄凯奇：中国科学院自动化研究所研究员. 主要研究方向为计算机视觉, 模式识别和认知决策. 本文通信作者. E-mail: kqhuang@nlpr.ia.ac.cn

计量
- 文章访问数: 5560
- HTML全文浏览量: 3357
- PDF下载量: 1460
- 被引次数: 20
出版历程
- 收稿日期: 2021-06-17
- 录用日期: 2021-09-17
- 网络出版日期: 2021-10-24
- 刊出日期: 2023-05-20

Intelligent Decision Making Technology and Challenge of Wargame

YIN Qi-Yue^{1, 2
,},
ZHAO Mei-Jing^1
,,
NI Wan-Cheng^{1, 2
,},
ZHANG Jun-Ge^{1, 2
,},
HUANG Kai-Qi^{1, 2
,}

1.
Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2.
University of Chinese Academy of Sciences, Beijing 100049

Funds: Supported by National Natural Science Youth Foundation of China (61906197)

More Information

Author Bio:
YIN Qi-Yue　Associate professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers reinforcement learning, data mining, and artificial intelligence on games

ZHAO Mei-Jing　Associate professor at the Institute of Automation, Chinese Academy of Sciences. Her research interest covers knowledge representation and modeling, and complex system decision-making

NI Wan-Cheng　Professor at the Institute of Automation, Chinese Academy of Sciences. Her research interest covers data mining and knowledge discovery, complex system modeling, and swarm intelligence platform and evaluation

ZHANG Jun-Ge　Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers continuous learning, small sample learning, game decision making, and reinforcement learning

HUANG Kai-Qi　Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers computer vision, pattern recognition, and cognitive decision-making. Corresponding author of this paper

摘要

摘要: 近年来, 以人机对抗为途径的智能决策技术取得了飞速发展, 人工智能(Artificial intelligence, AI)技术AlphaGo、AlphaStar等分别在围棋、星际争霸等游戏环境中战胜了顶尖人类选手. 兵棋推演作为一种人机对抗策略验证环境, 由于其非对称环境决策、更接近真实环境的随机性与高风险决策等特点, 受到智能决策技术研究者的广泛关注. 通过梳理兵棋推演与目前主流人机对抗环境(如围棋、德州扑克、星际争霸等)的区别, 阐述了兵棋推演智能决策技术的发展现状, 分析了当前主流技术的局限与瓶颈, 对兵棋推演中的智能决策技术研究进行了思考, 期望能对兵棋推演相关问题中的智能决策技术研究带来启发.
- 兵棋推演 /
- 人机对抗 /
- 智能决策技术 /
- 博弈学习
Abstract: In recent years, decision-making intelligence based on human-machine confrontation has achieved rapid development. For example, artificial intelligence (AI) technology such as AlphaGo and AlphaStar have defeated top human players in games Go and StarCraft, respectively. Nowadays, wargame, as a new verification environment for human-machine confrontation, attracts more and more researchers due to new challenges being raised, i.e., asymmetric environmental decision-making and randomness with high-risk decision-making. In this paper, we will sort out the differences between wargame and the current mainstream human-machine confrontation environments such as Go, Poker and StarCraft. Then, we explain the development status of wargame intelligent technology, and analyze the limitations of current mainstream technologies. Finally, we present our thoughts about future development of technologies for wargame, hoping to inspire researchers for through study on wargame.
- Wargame /
- human-machine confrontation /
- intelligent decision making technology /
- game learning
注释:

1) ¹ http://turingai.ia.ac.cn

2) ² http://turingai.ia.ac.cn/ranks/wargame_list

3) ³ http://turingai.ia.ac.cn/notices/detail/116

4) ⁴ http://turingai.ia.ac.cn/bbs/detail/14/1/29

5) ⁵ http://www.cas.cn/syky/202107/t20210712_4798152.shtml

6) ⁶ http://gym.openai.com

HTML全文

在当今全球以"工业4.0"、"智慧工厂1.0"、"云制造"等智能制造^[1-2]为驱动的大背景下, 由于电子技术、计算机技术和自动化技术等先进技术的不断应用, 使得制造业正在向着数字化、网络化和智能化发展, 而此趋势的实现需要以制造信息准确、有效的传递为支撑^[3-6].因此, 制造信息在制造企业的生产与管理中所起的作用则愈加凸显.制造企业要想在竞争激烈的市场中占得先机, 必须快速响应并满足客户瞬息万变的个性化需求.由此, 多品种小批量的生产方式就逐渐替代了标准化和规模化的大批量生产方式.数据显示, 在制造业发达的美国和日本等国, 采用多品种小批量方式进行生产的企业已经占到所有企业的75\, %^[7].有鉴于此, 如何使制造信息准确而有效地为实施多品种小批量制造企业的生产和管理服务, 是学术界和工业界必须解决的问题.

基于多品种、小批量生产模式的先进制造技术与管理方法在近几十年的研究中不断取得突破性进展.从制造资源计划、成组技术、准时生产制、精益生产, 再到敏捷制造、虚拟制造以及如今的智能制造等, 每一个发展阶段的突破都离不开对制造信息的有效开发与利用.对实施多品种、小批量生产模式的制造企业来说, 除了生产过程中与产品本身相关的制造信息, 如:工艺结构、加工时间、工序排程等为制造过程所必须之外, 组织和管理生产所需的信息, 如:生产计划信息、生产调度信息、确定批量、批次大小及生产模式所需决策信息等, 也会对制造企业的生产运作产生重要影响, 必须加以重视.

当前, 聚焦利用先进制造信息实现对多品种、小批量生产方式进行控制与管理的研究更多地表现在制造的数字化、网络化和智能化研究方面, 即与产品本身相关的制造信息方面, 具体为:

1) 数字化制造方面, Cheng等^[8]提出了数字化工厂的概念, 以产品生命周期为基础, 通过计算机虚拟制造, 实现产品的仿真与数字化生产; Li等^[9]认为数字化工厂技术是智能工厂的基础, 而要实现数字化制造, 构建基础制造信息平台至关重要; Lu等^[10]结合离散制造系统仿真方案的鲁棒性特点, 利用基于WEB的智能技术, 提出了一种数字企业生产计划与控制的优化算法, 提高了企业制定计划的效率和精确性.

2) 云制造方面, 李伯虎等^[11]提出"云制造''模式, 指出了云制造的典型技术特征:虚拟化、物联化、服务化、协同化、智能化; Wang等^[12]则根据市场的动态变化, 用虚拟生产线技术动态组织逻辑生产线, 增强了企业制造过程的灵活性; Li等^[13]则将遗传算法运用于虚拟制造企业中, 实现了从多个制造资源中选择合适的制造资源.

3) 智能制造方面, Yin等^[14]建立了一套智能工厂的制造系统结构体系, 详细介绍了工厂自动化、信息物理融合系统和顾客需求之间的联系; Nouiri等^[15]则考虑其制造系统操作的不确定性, 提出了智能制造系统的概念, 实现数字企业对当前生产的内外部环境的快速适; Ta等^[16]则把制造系统分成小的柔性制造单元, 各个单元由特殊硬件控制, 工作单元控制则由实时软件Vxworks来完成, 提高了制造系统的敏捷响应能力.

另外, 针对与本研究有关的小批量生产中工件在设备上的加工工序相同, 且只加工一次的批量与批次信息控制问题, 目前的研究方法都是利用不同的智能算法来求解批量与目标函数间的约束关系, 如遗传算法、模拟退火算法、和声搜索算法、蚁群算法、粒子群算法等, 其中较有代表性的为: Ding等^[17]研究得出了批量数目与最大完工时间指标呈正比关系的结论. Busogi等^[18]分析了不同类型的工件批量调度问题的不同特征, 提出了批量分类机制与多台机器流水作业的支配关系. Defersha等^[19]提出了利用并行遗传算法来解决柔性批量流水线的调度问题, 结果证明了并行操作方式可提高后继启发式算法的效率. Karaboga^[20]于2009年提出了人工蜂群算法, 在目标函数优化与流水线排序问题中均得到了较广泛的应用. Pan等^[21]提出了基于概率论的分布估计算法:从父代种群中选取若干个体, 建立概率模型, 并通过学习与采样等行为, 生成新的个体.

综合以上文献可以看出, 尽管上述研究中利用制造信息实现了对多品种、小批量生产模式的数字化或者智能化的改造升级, 以及应用各类算法来控制与管理流水车间的批量、批次问题, 但均为结合某种先进技术或优化方法来实现制造系统的某种特定的功能; 而在针对小批量生产中的批量及批次与管理生产所需信息量的关系的研究方面, 则鲜有涉及, 如:在生产某一种产品时, 当产品总量不变的情况下, 采用小批量、多批次生产是否比管理大批量、少批次生产需要更多的信息以及批量、批次与管理生产所需信息具有有怎样的变化关系等问题.针对这些问题, 本文根据信息熵原理, 在求解度量生产中批量和批次的信息量表达式的基础上, 利用度量信息量的工具, 即信息熵函数, 构建生产线批量及批次的信息熵函数关系.在此基础上, 提出生产批次与熵函数变化关系的两个定理, 并利用求导法和极值法给予充分证明, 从而在理论上证明了流水车间的加工批量向小批量转移意味着所需信息量也将减少.最后, 分别以10和20个工作站的流水车间为例进行实证研究, 再次验证了所提定理的正确性.开展批量与批次的信息量测度及其与管理生产所需要信息量关系的理论研究, 无论是对实际流水车间生产批量与批次的作业安排方面, 还是为决策者提供最终生产方式的选择依据方面, 都具有重要的理论支撑和现实指导意义.

1. 熵函数的信息度量特性

由于现有的批量及批次信息控制研究主要集中在流水车间的批量调度问题上, 其研究方法是利用不同的智能算法求解批量与目标函数间的约束关系, 鲜有针对生产中的批量及批次与管理生产所需信息量关系的研究, 因此其研究结论很难为决策者从信息管理角度选择生产方式提供理论依据.另外, 如何在众多描述批量及批次的状态信息中准确选择度量参数并构建其函数变量关系, 也是其难点所在.因此, 要开展生产批量及批次信息的测度研究, 应首先对其度量工具进行选择和界定.

为实现对生产线上零部件或产品运行状态信息的度量, 首先引入信息熵概念.信息熵这一概念于1948年由Shannon引入.他在借鉴热力学熵的基础上, 将其用于研究信息的发送与接收机制等特性, 并逐渐发展成为用于度量信息量的一种工具.信息熵函数定义及其基本性质如下:

设给定一组事件的集合为$ E\{e_1, e_2, \cdots, e_n\} $, 且事件发生的先验概率分别为$ (p_1, p_2, \cdots, p_n) $, 其中有

$$ \begin{equation} p_i\geq 0\; \text{且}\; \sum\limits_{i = 1}^np_i = 1 \end{equation} $$

(1)

则熵函数可定义为

$$ \begin{equation} H = -\sum\limits_{i = 1}^np_i\cdot \log p_i \end{equation} $$

(2)

其中, $ e_i $和$ p_i\; (i = 1, 2, \cdots, n) $表示该集合中$ n $个可能的状态及各状态发生的概率, 则$ H $为信息熵函数, 即描述集合$ E $时的信息量.信息熵函数越大, 事件的不确定性越大.

在此, 对本文研究中的批量及批次信息量测度概念进行说明.针对流水车间的批量及批次的信息量测度意味着对描述不同批量及批次在工作站的状态所需信息量的度量, 即求解批量及批次的信息熵表达.

2. 流水车间生产线信息熵函数

为构建生产系统的熵模型, 对生产某产品A的生产线进行定义.首先, 对文中所涉及到的相关概念进行说明, 其中, 批量为生产线上一次连续操作中, 加工一批零件或产品的数量; 批次则为生产线上所加工的批量的次数.下面, 根据熵函数的信息特性, 首先建立度量流水车间生产线加工批次信息量的表达式.

2.1 加工批次在生产线上的信息熵

设产品$ A $的零部件在工作站上的加工按照不同批次由左向右依次进行, 最终进入成品库存中, 且产品$ A $在此生产线上的加工路线均保持一致, 且只加工一次(即流水车间的作业问题), 其总产量为常数, 工作站为串行(工作站可为加工设备、检测设备或某服务流程等).其中, $ M_i $为生产线中的第$ i $个工作站; $ R $为工作站数量; $ T_i $为工作站$ M_i $处理完一个批次零件所需时间(简单起见, 假设产品在每个工作站的生产时间为确定的); $ T_L $为该批次在生产线上的加工时间; $ T_S $为该批次的已加工完成零部件的等待时间; $ T_A $为该批次在此条生产线上的总时间, 具体表示见图 1.由此, 下面推导在总产量保持不变的情况下, 当加工零件的批量改变时, 度量生产线状态所需的信息量将如何变化.

图 1 加工批次在工作站上的时间表示

Fig. 1 Time representation of the processing batch on the workstation

下载: 全尺寸图片幻灯片

首先, 我们来求解度量单批次零件信息量的表达式; 然后, 推导出所有批次零件在此条生产线上的信息量表达式.根据已知条件, 单批次零件在工作站$ M_i $上的概率$ P_i $可由下式求解:

$$ \begin{equation} P_i = \frac{T_i}{T_A} \end{equation} $$

(3)

此批零件在生产线上的生产时间为

$$ \begin{equation} T_L = \sum\limits_{i = 1}^RT_i \end{equation} $$

(4)

则此批零件在生产线工作站上概率为

$$ \begin{equation} P_L = \frac{T_L}{T_A} = \sum\limits_{i = 1}^R\frac{T_i}{T_A} \end{equation} $$

(5)

已加工完成零件的等待时间为

$$ \begin{equation} T_S = T_A-T_L = T_A-\sum\limits_{i = 1}^RT_i \end{equation} $$

(6)

其在库存的概率则为

$$ \begin{equation} P_S = \frac{T_S}{T_A} = \frac{T_A-\sum\limits_{i = 1}^RT_i}{T_A} \end{equation} $$

(7)

由此, 单批次零件在工作站上的信息量$ H_A $为

$$ \begin{equation} H_A = -\sum\limits_{i = 1}^RP_i\cdot \log P_i-P\cdot \log P_S \end{equation} $$

(8)

由于产品经各工作站生产加工进入库存后, 有关于其储存的信息已经与生产线上加工批量的大小无关, 可不必考虑对此部分信息的度量.因此, 在求解度量产品生产线所有批次的信息量时, 只需要将单个批次所需的信息量乘以生产批次数(本文只讨论不同批次加工时间相同的情况), 再乘以其在工作站的概率即可.令批次数量为$ Q $, 则度量所有批次在工作站状态所需信息量(即系统熵)表达式$ H_Q $为

$$ \begin{equation} H_Q = P_L\cdot Q\cdot H_A = \left(\sum\limits_{i = 1}^R\frac{T_i}{T_T}\right)\cdot Q\cdot H_A \end{equation} $$

(9)

2.2 加工批量及批次的信息熵函数

本节将讨论度量生产线状态所需信息量与批量大小的函数关系.首先, 为方便讨论, 本节中的加工批次时间是指无论批次内的零件是相同品种还是不同品种, 都将其看作是这一批次零件的完成时间, 即一个加工批次作为一个整体; 而对于不同批次的零件的加工, 本文只讨论不同批次中零件的种类均相同的情况, 即不同批次零件的加工批次时间均相同.令$ D $为加工零件的总数; $ N $为单个批次的加工数量, 即加工批量; $ T $为单批次零件在一个工作站的加工时间; $ R $为工作站数量.则有: $ D = NQ $, 其中$ Q $为批次数量.由此, 生产线上所有零部件的加工时间为

$$ \begin{equation} T_L = \sum\limits_{i = 1}^RT_i = N\cdot T\cdot R \end{equation} $$

(10)

由于每批次在工作站上的加工时间相同, 则有下式

$$ \begin{equation} T_A = K\cdot T_L \end{equation} $$

(11)

$ T_A $为单批次零件在工作站上总的加工时间.因此, 上式表示单批次在系统中的总时间, 若将所有零件的加工视为一个批次, 则有

$$ \begin{equation} T_A = K(T_L)_{Q = 1} = K(T_L)_{D = N} = D\cdot T\cdot R\cdot K \end{equation} $$

(12)

式中, $ K $的意义为:将所有零件视为一个批次加工时, $ K $是零件在系统中的总时间(零件加工时间+加工完成零件的等待时间)与净时间(零件加工时间)的比率.因此, $ K\geq 1 $.

由于单批次零件出现在某个工作站上的概率就是其加工时间占在工作站上总时间的比率, 因此, 单批次零件在第$ i $个工作站的概率为

$$ \begin{align} P_i = \, &\frac{T_i}{T_A} = N\cdot \frac{T}{T_A} = N\cdot \frac{T}{K\cdot D\cdot T\cdot R} = \\ &\frac{N}{K\cdot D\cdot R} \end{align} $$

(13)

则单批次零件在生产线所有工作站上的概率, 即为出现在某个工作站的概率与工作站数量的乘积, 因此有

$$ \begin{align} P_L = \, &\sum\limits_{i = 1}^RP_i = R\cdot P_i = R\cdot \frac{N}{K\cdot D\cdot R} = \\&\frac{N}{K\cdot D} \end{align} $$

(14)

根据第2.1节中已知条件可知, 加工完成单批次零件的等待时间为

$$ \begin{align} T_S = \, &T_A-T_L = D\cdot T\cdot R\cdot K-N\cdot T\cdot R = \\&T\cdot R(DK-N) \end{align} $$

(15)

同样, 由第2.1节已知条件, 则已加工完成单批次零件在库存中的概率为

$$ \begin{align} P_S = \, &\frac{T_S}{T_A} = T\cdot \frac{R(DK-N)} {D\cdot T\cdot R\cdot K} = \\ &\frac{DK-N}{D\cdot K} \end{align} $$

(16)

根据第2.1节中求得的单批次零件在工作站信息量表达式, 则有单批次零件在工作站的信息量$ H_A $与其批量大小的函数关系为

$$ \begin{align} H_A = \, &\sum\limits_{i = 1}^RP_i\cdot \log P_i-P_S\cdot \log P_S = \\ &-\dfrac{N}{KD}\log\dfrac{N}{KDR}-\dfrac{DK-N}{KD} \log\dfrac{DK-N}{KD} \end{align} $$

(17)

将已知条件$ D = NQ $代入上式中, 则有单批次零件在工作站信息量HA与其批次的函数关系为

$$ \begin{equation} H_A = -\dfrac{1}{KQ}\log \dfrac{1}{KQR}-\dfrac{KQ-1}{KQ} \log\dfrac{KQ-1}{KQ} \end{equation} $$

(18)

同样有, 所有批次在工作站信息量(即系统熵) $ H_Q $与加工批次的函数为

$$ \begin{equation} H_Q = P_L\cdot Q\cdot H_A = Q\cdot \left\{\frac{N}{K\cdot D}\right\}\cdot H_A \end{equation} $$

(19)

再次将$ D = NQ $代入上式中, 则有

$$ \begin{align} H_Q = \, &P_L\cdot Q\cdot H_A = \dfrac{1}{K}\cdot H_A = \\&-\dfrac{1}{K^2Q}\cdot \log\dfrac{1}{KQR}-\\& \dfrac{KQ-1}{K^2Q}\cdot \log\dfrac{KQ-1}{KQ} \end{align} $$

(20)

则系统信息熵$ H_Q $与其加工批量的函数关系为

$$ \begin{align} H_Q = \, &-\dfrac{N}{K^2D}\cdot \log \dfrac{N}{KDR}-\\& \left(\dfrac{1}{K}-\dfrac{N}{K^2D}\right)\cdot \log\left(1-\dfrac{N}{kD}\right) \end{align} $$

(21)

由式(20)和式(21)可以看出, 生产线信息熵是加工批次数量、工作站数量及时间参数$ K $的函数, 即描述生产线上产品状态所需的信息量与生产线上所加工产品的批次数、生产线上的工作站数量以及零件在系统中的总时间(零件加工时间+加工完成零件等待时间)与净时间(零件加工时间)的比值有关; 进一步分析还可看出, 当批次数量和工作站数量大于2时, 此函数具有单调递减特性(具体证明过程见第3节).

3. 系统信息熵函数与加工批次定理

上节中, 利用单批次零件在生产线工作站的概率以及已加工完成零件在库存中的概率推导出单批次零件信息量与其批量及批次的函数关系表达式, 并最终得出所有批次在工作站的信息量与其加工批次和批量的函数关系.本节中则根据上述推导结果提出并证明信息熵函数的加工批次定理.

3.1 加工批次熵函数的单调特性

根据所构建的生产系统信息熵函数可知, 系统的信息熵是加工批次数、工作站数量及时间参数K的函数.通过分析系统信息熵函数关系式(20), 我们可以提出如下定理:

定理1. $ Q\geq 2 $且$ R\geq 2 $时, 加工批次的信息熵函数单调递减

证明设$ H_1 $是与系统熵函数$ H_Q $相同的连续函数, 方便起见将其导数分为两部分, 分别为$ D_1 $和$ D_2 $, 则有

$$ \begin{equation} \frac{{\rm d}H_1}{{\rm d}Q} = D_1+D_2 \end{equation} $$

(22)

将式(22)与式(20)对比, 为方便讨论, 此处取自然对数, 则有

$$ \begin{align} D_1 = \, &-\frac{1}{\ln (2)}\cdot \frac{1}{K^2}\cdot \left\{-\frac{1}{Q^2}\cdot \ln \frac{1}{QRK}+ \frac{1}{Q}\cdot \right. \\ &\left.QRK \cdot \left(-\frac{1}{Q^2RK}\right)\right\} = \\ &\frac{1}{\ln (2)}\cdot \frac{1}{K^2}\cdot \frac{1} {Q^2}\left(\ln \frac{1}{QRK+1}\right) \end{align} $$

(23)

对上式进行分析可以看出, 当公式$ \ln \dfrac{1}{QRK} = 1 $时, $ D_1 $为0.即$ Q = e/RK $时, 式(20)的第一部分取得极值.根据上节可知$ K\geq 1 $且$ R\geq 2 $ (工作站至少为2个), 即$ Q\leq 2/e<1.36 $时取得极大值, 因此对$ Q\geq 2 $时, 式(21)的第一部分单调递减.

再将式(22)与式(20)对比, 则有

$$ \begin{align} D_2 = \, &-\dfrac{1}{\ln (2)}\cdot \dfrac{1}{K^2}\cdot \left\{-\dfrac{1}{KQ^2}\cdot \right. \\ &\left. \ln\left(1-\dfrac{1}{KQ}\right)\cdot \dfrac{KQ}{KQ-1}\cdot \dfrac{1}{Q^2K} \right\} = \\ &-\dfrac{1}{\ln(2)}\cdot\dfrac{1}{K^2}\cdot\dfrac{1}{Q^2} \left\{\ln\left(1-\dfrac{1}{KQ}\right)+1 \right\} \end{align} $$

(24)

同样, 对上式进行分析, 当式$ \ln(1-{1}/{(KQ)}) = -1 $时, $ D_2 $为0.即$ Q = 1/(K\cdot(1-e^{-1})) $时, 式(20)的第二部分取得极值.由于$ K\geq 1 $, 则$ Q\leq 1-e^{-1}<1.58 $.由此, 我们可以看到式(20)的两部分极大值均在1和2之间, 这也是它们唯一的极大值.因此, 对于$ Q\geq 2 $, Q的熵函数$ H_1 $作为整体则单调递减, 而连续函数H1到非连续熵函数$ H_Q $的过渡是即时的.

3.2 加工批次熵函数的极值特性

加工批次的信息熵函数单调特性反映了加工批次的变化所引起的管理系统所需信息量变化的关系性质, 由定理1, 还可得出如下推论:

1) 条件$ R\geq 2 $并不限制定理的一般性, 因为对任何生产线来说, 至少要有两个工作站, 否则不称为生产线.

2) 批次增加(或减少批量), 会降低系统信息熵.在以上推论的基础上, 我们可以提出信息熵函数的加工批次定理2.

定理2. 随着批次趋于无穷大, 系统信息熵趋于零

证明当批次趋于无穷大时, 有$ 1-{1}/{(1-QK)}\to 1 $, 因此$ \log (1-{1}/{(QK)})\to 0 $.即式(20)的第二部分趋于0.同时, 由于有下式成立

$$ \begin{equation} \lim\limits_{A\to 0}A\cdot \log A = 0 \end{equation} $$

(25)

对比式(20)的第一部分, 当批次$ Q $趋于无穷大时, 则有

$$ \begin{align} -\frac{1}{K^2Q}\cdot \, &\log \frac{1}{KQR} = -\frac{R}{K}\cdot\ & \frac{1}{KQR}\cdot \log \frac{1}{KQR}\to 0 \end{align} $$

(26)

因此, 当批次无穷大时, 熵函数表达式(20)第一部分也倾向于零.因此, 随着批次趋向于无穷大, 整个熵函数$ H_Q $趋于零.

3.3 分析与讨论

通过上述信息熵函数的两个加工批次定理的提出与证明, 无论对流水车间批量及批次与管理生产所需信息量变化关系的问题理论研究, 还是实际流水车间的批量与批次生产的作业安排都具有重要的理论与现实意义.

1) 定理1的推论中存在一个特例, 系统信息熵将随着批次的增加而增加.即当加工批次从数从1变为2时, 其时间参考意义为该批次零件加工完成后的等待时间或成品库存时间非常短(即$ K $接近1).

2) 定理2的实际意义为:当生产批次数量变得非常大时, 则不需要信息.此时代表系统无需控制或处于自控状态.

3) 从定理1和2可以看出:在流水车间多品种产品的生产过程中, 加工批量向较小批量(或较大批次)的转移则意味着较少的信息需求.在采用"小批量生产方式''的准时生产制或精益生产已逐渐成为主流生产方式的背景下, 这一结论的得出从信息需求角度为这些先进生产方式的推广与应用提供了理论支撑.由此, 两个定理在实际生产中的应用性表现为:由于采用小批量生产其信息需求更少, 即生产所需信息量更小, 因此相同条件下, 实际流水车间多品种产品的生产更宜采用小批生产, 其生产中的不确定性更小, 即:生产过程与事先的生产计划也更为相符, 最终, 产品的加工生产则更加易于控制和管理.

4) 加工批次定理的得出有助于消除过去在实际生产中形成的认识上的误区, 即:批次增多, 则需要投入更强大的信息系统或信息设备来管理更多的批次.而本文的研究结论则表明, 向小批量、多批次生产的过渡带来了信息需求的减少, 而不是通常被认为的增长.这更有助于相关企业在实际生产中采用小批量生产等先进生产方式进行生产.

5) 本文中的结论也与实施先进生产方式准时生产制时需采用混流生产为前提相一致(混流生产中要求不同种类的产品在生产平衡的基础上尽量安排更少的批量和更大的批次).

6) 本文在第2.1节的讨论中, $ T_i $为工作站$ M_i $处理完一个批次零件所需时间, 即一个加工批次作为一个整体, 因此无论批次内的零件是相同品种还是不同品种(但应具有相似工艺结构, 才可在同一生产线上加工), 都可以将$ T_i $看作是这一批次零件的完成时间; 而对于不同批次的零件的加工, 本文目前只讨论不同批次中零件的种类均相同的情况, 对由不同批次零件种类不同而造成的加工时间不同的情况, 将在下一阶段的研究中加以讨论.

4. 实证研究

根据上节所建立的生产线系统熵函数与批次、批量的相互关系及所得出结论, 本节以一个流水车间加工齿轮轴生产线为例进行实证研究.

某工厂第四分厂实施精益生产的流水车间, 其生产线上的10个工作站采用U型布置进行非标准齿轮轴的加工生产.原材料由入口进入, 按照不同批次由左向右沿U型生产线, 从1到10依次经过10个工作站进行加工, 全部加工完成后由出口进入成品库存中, 其工作站布置及加工流程示意图如图 2所示.原材料及零件在此生产线上的加工路线均保持一致, 且只加工一次, 且总产量为常数, 工作站为串行连接.

图 2 流水车间工作站布置及加工流程示意图

Fig. 2 Flow shop workshop layout and processing flow diagram

下载: 全尺寸图片幻灯片

根据式(20), 为了计算简便, 此处选取时间比例系数$ K = 1 $.将$ R = 10 $代入式中, 有

$$ \begin{equation} H_Q = -\dfrac{1}{Q}\cdot \log \dfrac{1}{10Q}- \dfrac{Q-1}{Q}\cdot \log\dfrac{Q-1}{Q} \end{equation} $$

(27)

可以看出, 上式中只有批次数量$ Q $为变量, 我们依次将不同批次$ 10, 20, 30, \cdots, 120 $分别代入式(27)中, 可得出不同批次条件下的系统信息熵值.为了更加直观地反映两者之间的关系, 我们采用作图法进行验证; 同时, 也为更好地验证当工作站数量不同时, 同样批次数量变化的信息熵函数是否具有相似的变化趋势, 我们对比做出了当工作站数量分别$ R = 10 $和$ R = 20 $两条系统熵与批次函数变化曲线, 如图 3所示.

图 3 工作站数量为10和20的系统熵与批次函数关系曲线

Fig. 3 The entropy function curve of the size of lots with 10 and 20 workstations

下载: 全尺寸图片幻灯片

同理, 根据式(21), 将$ K = 1, R = 10 $代入式中, 有

$$ \begin{align} H_Q = \, &-\dfrac{N}{D}\cdot \log\dfrac{N}{10D}- \\&\left(1-\dfrac{N}{D}\right)\cdot \log\left(1-\dfrac{N}{D}\right) \end{align} $$

(28)

可以看出, 式中有批量数$ N $和零件加工总数$ D $为变量.由于$ D = NQ $, 将图 3中的批次数量分别代入即可得出零件加工总数, 再依次将批量$ 2, 4, 6, \cdots, 40 $代入式中, 可分别得出不同批次条件下的系统信息熵值.根据计算结果, 我们同样做出当工作站数量分别$ R = 10 $和$ R = 20 $两条不同批量条件下的系统信息熵与批量的函数关系变化图, 如图 4所示.

图 4 工作站数量分别为10时和20时系统熵与批量函数关系曲线

Fig. 4 The entropy function curve of the number of lots with 10 and 20 workstations

下载: 全尺寸图片幻灯片

根据图 3及图 4可以看出, 当工作站数量分别$ R = 10 $和$ R = 20 $时, 系统熵与批次及批量的函数关系都具有此种特征, 即:加工批次增加时, 则系统的信息熵减小, 且无限趋近于零; 加工批量降低时, 则系统的信息熵减小, 且两种情况下的信息熵函数均具有单调性; 而当工作站数量发生变化时, 系统熵与批次及批量函数的两条曲线具有相似的形状和变化趋势.因此, 实例研究再次验证了所提出两个加工批次定理的正确性, 也为流水车间合理安排加工批量及批次提供重要理论支撑和决策依据.

5. 结论

针对目前研究中鲜有涉及小批量生产中批量、批次与管理生产所需信息量的变化关系问题, 本文在分析熵函数信息度量特性的基础上, 详细讨论流水车间生产线上加工批量、批次与生产系统的信息熵函数相互关系.

1) 根据信息熵原理, 在分析熵函数信息度量特性的基础上, 分别建立流水车间生产线加工批量、批次与生产系统的信息熵函数.

2) 在所建熵函数基础上, 提出生产批次与熵函数变化关系的两个定理, 即:生产批次的信息熵函数单调递减; 批次趋于无穷大时, 系统信息熵趋于零, 并利用求导法和极值法给予证明, 理论上首次证明流水车间的加工批次增加(或批量减小), 系统的信息熵降低.

3) 实证研究中, 取工作站数量分别为10和20, 对所建生产线系统熵函数与批次、批量的相互关系加以分析, 以图示表达的计算结果再次验证了所提定理的正确性.

通过本文的批量与批次的信息量测度理论与实证研究可以看出:流水车间的多品种产品生产在加工批次增加同时批量降低时, 同等条件下, 采用小批量生产其信息需求更少, 从而生产中的不确定性也更小, 其生产过程与事先的生产计划也更加相符, 产品的加工生产则更易控制与管理, 这对实际流水车间生产批量与批次的作业安排以及为决策者提供生产方式的合理选择, 都具有重要的理论支撑和现实指导意义.

¹ http://turingai.ia.ac.cn
² http://turingai.ia.ac.cn/ranks/wargame_list
³ http://turingai.ia.ac.cn/notices/detail/116
⁴ http://turingai.ia.ac.cn/bbs/detail/14/1/29
⁵ http://www.cas.cn/syky/202107/t20210712_4798152.shtml
⁶ http://gym.openai.com

图 1 包以德循环

Fig. 1 OODA loop

下载: 全尺寸图片幻灯片

图 2 自博弈加强化学习训练

Fig. 2 Self-gaming and reinforcement learning

下载: 全尺寸图片幻灯片

图 3 IMAPLA用于兵棋推演智能体训练

Fig. 3 IMAPLA for training wargame agents

下载: 全尺寸图片幻灯片

图 4 知识与数据驱动“加性融合”框架

Fig. 4 Additive fusion framework of knowledge and data driven

下载: 全尺寸图片幻灯片

图 5 人机对抗框架^[45]

Fig. 5 Human-machine confrontation framework^[45]

下载: 全尺寸图片幻灯片

图 6 知识与数据驱动“主从融合”框架

Fig. 6 Principal and subordinate fusion framework of knowledge and data driven

下载: 全尺寸图片幻灯片

图 7 智能体单项能力评估

Fig. 7 Evaluation of specific capability of agents

下载: 全尺寸图片幻灯片

图 8 “图灵网”平台

Fig. 8 Turing website platform

下载: 全尺寸图片幻灯片

图 9 兵棋推演知识库构建示例

Fig. 9 Example of knowledge base construction for wargame

下载: 全尺寸图片幻灯片

图 10 兵棋推演中的异步协同与同步协同对比

Fig. 10 Comparison between asynchronous cooperation and synchronous cooperation in wargame

下载: 全尺寸图片幻灯片

图 11 兵棋推演大模型训练挑战

Fig. 11 Challenges of training big model for wargame

下载: 全尺寸图片幻灯片

图 12 排兵布阵问题示例

Fig. 12 Example for problem of arranging arms

下载: 全尺寸图片幻灯片

图 13 异步协同问题示例

Fig. 13 Example for problem of asynchronous multi-agent cooperation

下载: 全尺寸图片幻灯片

表 1 对决策带来挑战的代表性因素

Table 1 Representative factors of challenge decision-making

游戏	雅达利	围棋	德州扑克	星际争霸	兵棋推演
不完美信息博弈	√	×	√	√	√
长时决策	√	√	×	√	√
策略非传递性	×	√	√	√	√
智能体协作	×	×	×	√	√
非对称环境	×	×	×	×	√
随机性与高风险	×	×	×	×	√

下载: 导出CSV

参考文献(101)

[1]	Campbell M, Hoane A J Jr, Hsu F H. Deep blue. Artificial Intelligence, 2002, 134(1-2): 57-83 doi: 10.1016/S0004-3702(01)00129-1
[2]	Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489 doi: 10.1038/nature16961
[3]	Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374): 418-424 doi: 10.1126/science.aao1733
[4]	Vinyals O, Babuschkin I, Czarnecki W M, Mathieu M, Dudzik A, Chung J, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019, 575(7782): 350-354 doi: 10.1038/s41586-019-1724-z
[5]	Ye D H, Chen G B, Zhang W, Chen S, Yuan B, Liu B, et al. Towards playing full MOBA games with deep reinforcement learning. In: Proceedings of the Advances in Neural Information Processing Systems 33. Virtual Event: MIT Press, 2020.
[6]	胡晓峰, 贺筱媛, 陶九阳. AlphaGo的突破与兵棋推演的挑战. 科技导报, 2017, 35(21): 49-60 Hu Xiao-Feng, He Xiao-Yuan, Tao Jiu-Yang. AlphaGo＇s breakthrough and challenges of wargaming. Science & Technology Review, 2017, 35(21): 49-60
[7]	胡晓峰, 齐大伟. 智能化兵棋系统: 下一代需要改变的是什么. 系统仿真学报, 2021, 33(9): 1997-2009 Hu Xiao-Feng, Qi Da-Wei. Intelligent wargaming system: Change needed by next generation need to be changed. Journal of System Simulation, 2021, 33(9): 1997-2009
[8]	吴琳, 胡晓峰, 陶九阳, 贺筱媛. 面向智能成长的兵棋推演生态系统. 系统仿真学报, 2021, 33(9): 2048-2058 Wu Lin, Hu Xiao-Feng, Tao Jiu-Yang, He Xiao-Yuan. Wargaming eco-system for intelligence growing. Journal of System Simulation, 2021, 33(9): 2048-2058
[9]	(徐佳乐, 张海东, 赵东海, 倪晚成. 基于卷积神经网络的陆战兵棋战术机动策略学习. 系统仿真学报, 2022, 34(10): 2181-2193.) Xu Jia-Le, Zhang Hai-Dong, Zhao Dong-Hai, Ni Wan-Cheng. Tactical maneuver strategy learning from land wargame replay based on convolutional neural network. Journal of System Simulation, 2022, 34(10): 2181-2193.
[10]	Moy G, Shekh S. The application of AlphaZero to wargaming. In: Proceedings of the 32nd Australasian Joint Conference on Artificial Intelligence. Adelaide, Australia: 2019. 3−14
[11]	Wu K, Liu M, Cui P, Zhang Y. A training model of wargaming based on imitation learning and deep reinforcement learning. In: Proceedings of the Chinese Intelligent Systems Conference. Beijing, China: 2022. 786−795
[12]	胡艮胜, 张倩倩, 马朝忠. 兵棋推演系统中的异常数据挖掘方法. 信息工程大学学报, 2020, 21(3): 373-377 doi: 10.3969/j.issn.1671-0673.2020.03.019 Hu Gen-Sheng, Zhang Qian-Qian, Ma Chao-Zhong. Outlier data mining of the war game system. Journal of Information Engineering University, 2020, 21(3): 373-377 doi: 10.3969/j.issn.1671-0673.2020.03.019
[13]	(张锦明. 运用栅格矩阵建立兵棋地图的地形属性. 系统仿真学报, 2016, 28(8): 1748-1756.) doi: 10.3969/j.issn.1673-3819.2018.05.016 Zhang Jin-Ming. Using raster lattices to build terrain attribute of wargame map. Journal of System Simulation, 2016, 28(8): 1748-1756. doi: 10.3969/j.issn.1673-3819.2018.05.016
[14]	Chen L, Liang X, Feng Y, Zhang L, Yang J, Liu Z. Online intention recognition with incomplete information based on a weighted contrastive predictive coding model in wargame. IEEE Transaction on Neural Networks and Learning Systems. 2022, DOI: 10.1109/TNNLS.2022.3144171
[15]	王桂起, 刘辉, 朱宁. 兵棋技术综述. 兵工自动化, 2012, 31(8): 38-41, 45 doi: 10.3969/j.issn.1006-1576.2012.08.012 Wang Gui-Qi, Liu Hui, Zhu Ning. A survey of war games technology. Ordnance Industry Automation, 2012, 31(8): 38-41, 45 doi: 10.3969/j.issn.1006-1576.2012.08.012
[16]	彭春光, 赵鑫业, 刘宝宏, 黄柯棣. 兵棋推演技术综述. 第 14 届系统仿真技术及其应用学术会议. 合肥, 中国: 2009. 366−370 Peng Chun-Guang, Zhao Xin-Ye, Liu Bao-Hong, Huang Ke-Di. The technology of wargaming: An overview. In: Proceedings of the 14th Chinese Conference on System Simulation Technology & Application. Hefei, China: 2009. 366−370
[17]	曹占广, 陶帅, 胡晓峰, 何吕龙. 国外兵棋推演及系统研究进展. 系统仿真学报, 2021, 33(9): 2059-2065 Cao Zhan-Guang, Tao Shuai, Hu Xiao-Feng, He Lü-Long. Abroad wargaming deduction and system research. Journal of System Simulation, 2021, 33(9): 2059-2065
[18]	司光亚, 王艳正. 新一代大型计算机兵棋系统面临的挑战与思考. 系统仿真学报, 2021, 33(9): 2010-2016 Si Guang-Ya, Wang Yan-Zheng. Challenges and reflection on next-generation large-scale computer wargame system. Journal of System Simulation, 2021, 33(9): 2010-2016
[19]	Ganzfried S, Sandholm T. Game theory-based opponent modeling in large imperfect-information games. In: Proceedings of the 10th International Conference on Autonomous Agents and Multi-agent Systems. Taipei, China: 2011. 533−540
[20]	Littman M L. Algorithms for Sequential Decision Making [Ph.D. dissertation], Brown University, USA, 1996
[21]	Nieves N P, Yang Y D, Slumbers O, Mguni D H, Wen Y, Wang J. Modelling behavioural diversity for learning in open-ended games. In: Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: 2021. 8514−8524
[22]	Jaderberg M, Czarnecki W M, Dunning I, Marris L, Lever G, Castañeda A G, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364(6443): 859-865 doi: 10.1109/TVT.2021.3096928
[23]	Baker B, Kanitscheider I, Markov T M, Wu Y, Powell G, McGrew B, et al. Emergent tool use from multi-agent autocurricula. In: Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: 2020.
[24]	Liu I J, Jain U, Yeh R A, Schwing A G. Cooperative exploration for multi-agent deep reinforcement learning. In: Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: 2021. 6826−6836
[25]	周志杰, 曹友, 胡昌华, 唐帅文, 张春潮, 王杰. 基于规则的建模方法的可解释性及其发展. 自动化学报, 2021, 47(6): 1201-1216 Zhou Zhi-Jie, Cao You, Hu Chang-Hua, Tang Shuai-Wen, Zhang Chun-Chao, Wang Jie. The interpretability of rule-based modeling approach and its development. Acta Automatica Sinica, 2021, 47(6): 1201-1216
[26]	Révay M, Líška M. OODA loop in command & control systems. In: Proceedings of the Communication and Information Technologies. Vysoke Tatry, Slovakia: 2017.
[27]	IEEE Transactions on Computational Intelligence and AI in Games, 2017, 9(3): 227-238 doi: 10.1109/TCIAIG.2016.2543661
[28]	Najam-ul-lslam M, Zahra F T, Jafri A R, Shah R, Hassan M u, Rashid M. Auto implementation of parallel hardware architecture for Aho-Corasick algorithm. Design Automation for Embedded System, 2022, 26: 29-53
[29]	崔文华, 李东, 唐宇波, 柳少军. 基于深度强化学习的兵棋推演决策方法框架. 国防科技, 2020, 41(2): 113-121 Cui Wen-Hua, Li Dong, Tang Yu-Bo, Liu Shao-Jun. Framework of wargaming decision-making methods based on deep reinforcement learning. National Defense Technology, 2020, 41(2): 113-121
[30]	李琛, 黄炎焱, 张永亮, 陈天德. Actor-Critic框架下的多智能体决策方法及其在兵棋上的应用. 系统工程与电子技术, 2021, 43(3): 755-762 doi: 10.12305/j.issn.1001-506X.2021.03.20 Li Chen, Huang Yan-Yan, Zhang Yong-Liang, Chen Tian-De. Multi-agent decision-making method based on actor-critic framework and its application in wargame. Systems Engineering and Electronics, 2021, 43(3): 755-762 doi: 10.12305/j.issn.1001-506X.2021.03.20
[31]	张振, 黄炎焱, 张永亮, 陈天德. 基于近端策略优化的作战实体博弈对抗算法. 南京理工大学学报, 2021, 45(1): 77-83 Zhang Zhen, Huang Yan-Yan, Zhang Yong-Liang, Chen Tian-De. Battle entity confrontation algorithm based on proximal policy optimization. Journal of Nanjing University of Science and Technology, 2021, 45(1): 77-83
[32]	秦超, 高晓光, 万开方. 深度卷积记忆网络时空数据模型. 自动化学报, 2020, 46(3): 451-462 Qin Chao, Gao Xiao-Guang, Wan Kai-Fang. Deep spatio-temporal convolutional long-short memory network. Acta Automatica Sinica, 2020, 46(3): 451-462
[33]	陈伟宏, 安吉尧, 李仁发, 李万里. 深度学习认知计算综述. 自动化学报, 2017, 43(11): 1886-1897 Chen Wei-Hong, An Ji-Yao, Li Ren-Fa, Li Wan-Li. Review on deep-learning-based cognitive computing. Acta Automatica Sinica, 2017, 43(11): 1886-1897
[34]	Burda Y, Edwards H, Storkey A J, Klimov O. Exploration by random network distillation. In: Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: 2019.
[35]	Mnih V, Badia A P, Mirza M, Graves A, Harley T, Lillicrap T P, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: 2016. 1928−1937
[36]	Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, Van Hasselt H, et al. Distributed prioritized experience replay. In: Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: 2018.
[37]	Espeholt L, Soyer H, Munos R, Simonyan K, Mnih V, Ward T, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: 2018. 1407−1416
[38]	Jaderberg M, Czarnecki W M, Dunning I, Marris L, Lever G, Castañeda A G, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364(6443): 859-865 doi: 10.1126/science.aau6249
[39]	Espeholt L, Marinier R, Stanczyk P, Wang K, Michalski M. SEED RL: Scalable and efficient deep-RL with accelerated central inference. In: Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: 2020.
[40]	Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, et al. Ray: A distributed framework for emerging AI applications. In: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation. Carlsbad, USA: 2018. 561−577
[41]	蒲志强, 易建强, 刘振, 丘腾海, 孙金林, 李非墨. 知识和数据协同驱动的群体智能决策方法研究综述. 自动化学报, 2022, 48(3): 627−643 doi: 10.16383/j.aas.c210118 Pu Zhi-Qiang, Yi Jian-Qiang, Liu Zhen, Qiu Teng-Hai, Sun Jin-Lin, Li Fei-Mo. Knowledge-based and data-driven integrating methodologies for collective intelligence decision making: A survey. Acta Automatica Sinica, 2022, 48(3): 627−643 doi: 10.16383/j.aas.c210118
[42]	Rueden L V, Mayer S, Beckh K, Georgiev B, Giesselbach S, Heese R, et al. Informed machine learning – a taxonomy and survey of integrating prior knowledge into learning systems. IEEE Transactions on Knowledge and Data Engineering, DOI: 10.1109/TKDE.2021.3079836, 2021, 5: 1−19
[43]	Hartmann G, Shiller Z, Azaria A. Deep reinforcement learning for time optimal velocity control using prior knowledge. In: Proceedings of the 31st International Conference on Tools With Artificial Intelligence. Portland, USA: 2019. 186−193
[44]	Zhang P, Hao J Y, Wang W X, Tang H Y, Ma Y, Duan Y H, et al. KoGuN: Accelerating deep reinforcement learning via integrating human suboptimal knowledge. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. Virtual Event: 2020. 2291−2297
[45]	黄凯奇, 兴军亮, 张俊格, 倪晚成, 徐博. 人机对抗智能技术. 中国科学: 信息科学, 2020, 50(4): 540-550 doi: 10.1360/N112019-00048 Huang Kai-Qi, Xing Jun-Liang, Zhang Jun-Ge, Ni Wan-Cheng, Xu Bo. Intelligent technologies of human-computer gaming. Scientia Sinica Informations, 2020, 50(4): 540-550 doi: 10.1360/N112019-00048
[46]	Elo A E. The Rating of Chess Players, Past and Present. London: Batsford, 1978.
[47]	Herbrich R, Minka T, Graepel T. TrueSkill (TM): A Bayesian skill rating system. In: Proceedings of the 19th International Conference on Neural Information Processing Systems. Vancou-ver, Canada: 2006. 569−576
[48]	Balduzzi D, Tuyls K, Perolat J, Graepel T. Re-evaluating evaluation. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: 2018. 3272−3283
[49]	Omidshafiei S, Papadimitriou C, Piliouras G, Tuyls K, Rowland M, Lespiau J B, et al. α-rank: Multi-agent evaluation by evolution. Scientific Reports, 2019, 9(1): Article No. 9937 doi: 10.1038/s41598-019-45619-9
[50]	唐宇波, 沈弼龙, 师磊, 易星. 下一代兵棋系统模型引擎设计问题研究. 系统仿真学报, 2021, 33(9): 2025-2036 Tang Yu-Bo, Shen Bi-Long, Shi Lei, Yi Xing. Research on the issues of next generation wargame system model engine. Journal of System Simulation, 2021, 33(9): 2025-2036
[51]	Ji S, Pan S, Cambria E, Marttinen P, Yu P. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(2): 494-514 doi: 10.1145/154421.154422
[52]	Wang Z, Zhang J W, Feng J L, Chen Z. Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence. Québec, Can-ada: 2014. 1112−1119
[53]	王保魁, 吴琳, 胡晓峰, 贺筱媛, 郭圣明. 基于时序图的作战指挥行为知识表示学习方法. 系统工程与电子技术, 2020, 42(11): 2520-2528 doi: 10.3969/j.issn.1001-506X.2020.11.14 Wang Bao-Kui, Wu Lin, Hu Xiao-Feng, He Xiao-Yuan, Guo Sheng-Ming. Operations command behavior knowledge representation learning method based on sequential graph. Systems Engineering and Electronics, 2020, 42(11): 2520-2528 doi: 10.3969/j.issn.1001-506X.2020.11.14
[54]	刘嵩, 武志强, 游雄, 张欣, 王雪峰. 基于兵棋推演的综合战场态势多尺度表达. 测绘科学技术学报, 2012, 29(5): 382-385, 390 doi: 10.3969/j.issn.1673-6338.2012.05.015 Liu Song, Wu Zhi-Qiang, You Xiong, Zhang Xin, Wang Xue-Feng. Multi-scale expression of integrated battlefield situation based on wargaming. Journal of Geomatics Science and Technology, 2012, 29(5): 382-385, 390 doi: 10.3969/j.issn.1673-6338.2012.05.015
[55]	贺筱媛, 郭圣明, 吴琳, 李东, 许霄, 李丽. 面向智能化兵棋的认知行为建模方法研究. 系统仿真学报, 2021, 33(9): 2037-2047 He Xiao-Yuan, Guo Sheng-Ming, Wu Lin, Li Dong, Xu Xiao, Li Li. Modeling research of cognition behavior for intelligent wargaming. Journal of System Simulation, 2021, 33(9): 2037-2047
[56]	朱丰, 胡晓峰, 吴琳, 贺筱媛, 吕学志, 廖鹰. 从态势认知走向态势智能认知. 系统仿真学报, 2018, 30(3): 761-771 Zhu Feng, Hu Xiao-Feng, Wu Lin, He Xiao-Yuan, Lü Xue-Zhi, Liao Ying. From situation cognition stepped into situation intelligent cognition. Journal of System Simulation, 2018, 30(3): 761-771
[57]	Heinrich J, Lanctot M, Silver D. Fictitious self-play in extensive-form games. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: 2015. 805−813
[58]	Adam L, Horcík R, Kasl T, Kroupa T. Double oracle algorithm for computing equilibria in continuous games. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtual Event: 2021. 5070−5077
[59]	Nguyen T T, Nguyen N D, Nahavandi S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics, 2020, 50(9): 3826-3839 doi: 10.1109/TCYB.2020.2977374
[60]	Zhang K Q, Yang Z R, Başar T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Hand-book of Reinforcement Learning and Control, 2021: 321−384
[61]	施伟, 冯旸赫, 程光权, 黄红蓝, 黄金才, 刘忠, 贺威. 基于深度强化学习的多机协同空战方法研究. 自动化学报, 2021, 47(7): 1610-1623 Shi Wei, Feng Yang-He, Cheng Guang-Quan, Huang Hong-Lan, Huang Jin-Cai, Liu Zhong, He Wei. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning. Acta Automatica Sinica, 2021, 47(7): 1610-1623
[62]	梁星星, 冯旸赫, 马扬, 程光权, 黄金才, 王琦, 周玉珍, 刘忠. 多Agent深度强化学习综述. 自动化学报, 2020, 46(12): 2537-2557 Liang Xing-Xing, Feng Yang-He, Ma Yang, Cheng Guang-Quan, Huang Jin-Cai, Wang Qi, Zhou Yu-Zhen, Liu Zhong. Deep multi-agent reinforcement learning: A survey. Acta Automatica Sinica, 2020, 46(12): 2537-2557
[63]	Yan D, Weng J, Huang S, Li C, Zhou Y, Su H, Zhu J. Deep reinforcement learning with credit assignment for combinatorial optimization. Pattern Recognition, 2022, 124: Artice No. 108466
[64]	Lansdell B J, Prakash P R, Körding K P. Learning to solve the credit assignment problem. In: Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: 2020.
[65]	孙长银, 穆朝絮. 多智能体深度强化学习的若干关键科学问题. 自动化学报, 2020, 46(7): 1301-1312 Sun Chang-Yin, Mu Chao-Xu. Important scientific problems of multi-agent deep reinforcement learning. Acta Automatica Sinica, 2020, 46(7): 1301-1312
[66]	Sunehag P, Lever G, Gruslys A, Czarnecki W M, Zambaldi V, Jaderberg M, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th International Conference on Autonomous Agents and Multi-agent Systems. Stockholm, Sweden: 2018. 2085−2087
[67]	Rashid T, Samvelyan M, De Witt C S, Farquhar G, Foerster J N, Whiteson S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning. Stock-holm, Sweden: 2018. 4292−4301
[68]	Son K, Kim D, Kang W J, Hostallero D, Yi Y. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: 2019. 5887−5896
[69]	Foerster J N, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: 2018. 2974−2982
[70]	Nguyen D T, Kumar A, Lau H C. Credit assignment for collective multi-agent RL with global rewards. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: 2018. 8113−8124
[71]	Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 2018, 362(6419): 1140-1144 doi: 10.1126/science.aar6404
[72]	Yu Y. Towards sample efficient reinforcement learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: 2018. 5739−5743
[73]	Ecoffet A, Huizinga J, Lehman J, Stanley K O, Clune J. First return, then explore. Nature, 2021, 590(7847): 580-586 doi: 10.1038/s41586-020-03157-9
[74]	Jin C, Krishnamurthy A, Simchowitz M, Yu T C. Reward-free exploration for reinforcement learning. In: Proceedings of the 37th International Conference on Machine Learning. Virtual Event: 2020. 4870−4879
[75]	Mahajan A, Rashid T, Samvelyan M, Whiteson S. MAVEN: Multi-agent variational exploration. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: 2019. Article No. 684
[76]	Yang Y D, Wen Y, Chen L H, Wang J, Shao K, Mguni D, et al. Multi-agent determinantal Q-learning. In: Proce-edings of the 37th International Conference on Machine Learning. Virtual Event: 2020. 10757−10766
[77]	Wang T H, Dong H, Lesser V, Zhang C J. ROMA: Role-oriented multi-agent reinforcement learning. In: Proceedings of the 37th International Conference on Machine Learning. Virtual Event: 2020. 9876−9886
[78]	张钹, 朱军, 苏航. 迈向第三代人工智能. 中国科学: 信息科学, 2020, 50(9): 1281-1302 doi: 10.1360/SSI-2020-0204 Zhang Bo, Zhu Jun, Su Hang. Toward the third generation of artificial intelligence. Scientia Sinca Informationis, 2020, 50(9): 1281-1302 doi: 10.1360/SSI-2020-0204
[79]	王保剑, 胡大裟, 蒋玉明. 改进A算法在路径规划中的应用. 计算机工程与应用, 2021, 57(12): 243-247 doi: 10.3778/j.issn.1002-8331.2008-0099 Wand Bao-Jian, Hu Da-Sha, Jiang Yu-Ming. Application of improved A algorithm in path planning. Computer Engineering and Applications, 2021, 57(12): 243-247 doi: 10.3778/j.issn.1002-8331.2008-0099
[80]	张可, 郝文宁, 史路荣, 余晓晗, 邵天浩. 基于级联模糊系统的兵棋进攻关键点推理. 控制工程, 2021, 28(7): 1366-1374 Zhang Ke, Hao Wen-Ning, Shi Lu-Rong, Yu Xiao-Han, Shao Tian-Hao. Inference of key points of attack in wargame based on cascaded fuzzy system. Control Engineering of China, 2021, 28(7): 1366-1374
[81]	邢思远, 倪晚成, 张海东, 闫科. 基于兵棋复盘数据的武器效用挖掘. 指挥与控制学报, 2020, 6(2): 132-140 doi: 10.3969/j.issn.2096-0204.2020.02.0132 Xing Si-Yuan, Ni Wan-Cheng, Zhang Hai-Dong, Yan Ke. Mining of weapon utility based on the replay data of wargame. Journal of Command and Control, 2020, 6(2): 132-140 doi: 10.3969/j.issn.2096-0204.2020.02.0132
[82]	金哲豪, 刘安东, 俞立. 基于GPR和深度强化学习的分层人机协作控制. 自动化学报, 2020, 46: 1-11 Jin Zhe-Hao, Liu An-Dong, Yu Li. Hierarchical human-robot cooperative control based on GPR and DRL. Acta Automatica Sinica, 2020, 46: 1-11
[83]	徐磊, 杨勇. 基于兵棋推演的分队战斗行动方案评估. 火力与指挥控制, 2021, 46(4): 88-92, 98 doi: 10.3969/j.issn.1002-0640.2021.04.016 Xu Lei, Yang Yong. Research on evaluation of unit combat action plan based on wargaming. Fire Control & Command Control, 2021, 46(4): 88-92, 98 doi: 10.3969/j.issn.1002-0640.2021.04.016
[84]	李云龙, 张艳伟, 王增臣. 联合作战方案推演评估技术框架. 指挥信息系统与技术, 2020, 11(4): 78-83 Li Yun-Long, Zhang Yan-Wei, Wang Zeng-Chen. Technical framework of joint operation scheme deduction and evaluation. Command Information System and Technology, 2020, 11(4): 78-83
[85]	Myerson R B. Game Theory. Cambridge: Harvard University Press, 2013.
[86]	Weibull J W. Evolutionary Game Theory. Cambridge: MIT Press, 1997.
[87]	Roughgarden T. Algorithmic game theory. Communications of the ACM, 2010, 53(7): 78-86 doi: 10.1145/1785414.1785439
[88]	Chalkiadakis G, Elkind E, Wooldridge M. Cooperative game theory: Basic concepts and computational challenges. IEEE Intelligent Systems, 2012, 27(3): 86-90 doi: 10.1109/MIS.2012.47
[89]	周雷, 尹奇跃, 黄凯奇. 人机对抗中的博弈学习方法. 计算机学报, DOI: 10.11897/SP.J.1016.2022.01859 Zhou Lei, Yin Qi-Yue, Huang Kai-Qi. Game-theoretic learning in human-computer gaming. Chinese Journal of Computers, DOI: 10.11897/SP.J.1016.2022.01859
[90]	Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, et al. A unified game-theoretic approach to multi-agent reinforcement learning. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: 2017. 4190−4203
[91]	Brown N, Lerer A, Gross S, Sandholm T. Deep counterfactual regret minimization. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: 2019. 793−802
[92]	Qiu X P, Sun T X, Xu Y G, Shao Y F, Dai N, Huang X J. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 2020, 63(10): 1872-1897 doi: 10.1007/s11431-020-1647-3
[93]	Zhang Z Y, Han X, Zhou H, Ke P, Gu Y X, Ye D M, et al. CPM: A large-scale generative Chinese Pre-trained language model. AI Open, 2021, 2: 93-99 doi: 10.1016/j.aiopen.2021.07.001
[94]	Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. In: Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2020.
[95]	Meng D Y, Zhao Q, Jiang L. A theoretical understanding of self-paced learning. Information Sciences, 2017, 414: 319-328 doi: 10.1016/j.ins.2017.05.043
[96]	Singh P, Verma V K, Mazumder P, Carin L, Rai P. Calibrating CNNs for lifelong learning. In: Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: 2020.
[97]	Cheng W, Yin Q Y, Zhang J G. Opponent strategy recognition in real time strategy game using deep feature fusion neural network. In: Proceedings of the 5th International Conference on Computer and Communication Systems. Shanghai, China: 2020. 134−137
[98]	Samvelyan M, Rashid T, De Witt C S, Farquhar G, Nardelli N, Rudner T G J, et al. The StarCraft multi-agent challenge. In: Proceedings of the 18th International Conference on Auto-nomous Agents and Multi-agent Systems. Montreal, Canada: 2019. 2186−2188
[99]	Tang Z T, Shao K, Zhu Y H, Li D, Zhao D B, Huang T W. A review of computational intelligence for StarCraft AI. In: Proceedings of the IEEE Symposium Series on Computational Intelligence. Bangalore, India: 2018. 1167−1173
[100]	Christianos F, Schäfer L, Albrecht S V. Shared experience actor-critic for multi-agent reinforcement learning. In: Proceedings of the Advances in Neural Information Processing Systems 33. Virtual Event: 2020.
[101]	Jaques N, Lazaridou A, Hughes E, Gulcehre C, Ortega P A, Strouse D J, et al. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: 2019. 3040−3049

施引文献

期刊类型引用(11)

1.	苏炯铭，罗俊仁，陈少飞，项凤涛. 海空跨域协同兵棋AI架构设计及关键技术分析. 指挥控制与仿真. 2024(02): 35-43 . 百度学术
2.	朱智，杨松，王涛，王维平，赵月华. 模数驱动的智能武器装备作战建模方法. 系统工程理论与实践. 2024(03): 1083-1096 . 百度学术
3.	费陈，赵亮，孙许可，卢野，张帆. 无人机蜂群技术发展研究. 火炮发射与控制学报. 2024(02): 50-60 . 百度学术
4.	罗俊仁，张万鹏，苏炯铭，袁唯淋，陈璟. 多智能体博弈学习研究进展. 系统工程与电子技术. 2024(05): 1628-1655 . 百度学术
5.	彭莉莎，孙宇祥，薛宇凡，周献中. 融合三支多属性决策与SAC的兵棋推演智能决策技术. 系统工程与电子技术. 2024(07): 2310-2322 . 百度学术
6.	王小书，史天运，白伟，彭凯贝，吕晓军. 基于数字孪生的客站安全态势推演技术研究. 中国安全生产科学技术. 2024(07): 49-56 . 百度学术
7.	徐志雄，邸彦佳，胡文雷，杨东东. 多任务知识交叉融合框架下智能博弈对抗策略研究. 军事运筹与评估. 2024(03): 55-60 . 百度学术
8.	傅妍芳，雷凯麟，魏佳宁，曹子建，杨博，王炜，孙泽龙，李秦洁. 基于演员-评论家框架的层次化多智能体协同决策方法. 兵工学报. 2024(10): 3385-3396 . 百度学术
9.	李智，孙怡峰，吴疆，王玉宾. 基于战车和夺控点分配的智能体步兵投送策略决策方法. 指挥与控制学报. 2024(04): 432-442 . 百度学术
10.	刘玮，张永亮，程旭. 基于深度强化学习的人机智能对抗综述. 指挥信息系统与技术. 2023(02): 28-37 . 百度学术
11.	罗俊仁，张万鹏，项凤涛，蒋超远，陈璟. 智能推演综述：博弈论视角下的战术战役兵棋与战略博弈. 系统仿真学报. 2023(09): 1871-1894 . 百度学术

其他类型引用(9)

资源附件(0)

访问统计

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

兵棋推演的智能决策技术与挑战

doi: 10.16383/j.aas.c210547

计量

Intelligent Decision Making Technology and Challenge of Wargame

1. 熵函数的信息度量特性

2. 流水车间生产线信息熵函数

2.1 加工批次在生产线上的信息熵

2.2 加工批量及批次的信息熵函数

3. 系统信息熵函数与加工批次定理

3.1 加工批次熵函数的单调特性

3.2 加工批次熵函数的极值特性

3.3 分析与讨论

4. 实证研究

5. 结论

期刊类型引用(11)

其他类型引用(9)

计量

目录

1. 熵函数的信息度量特性

2. 流水车间生产线信息熵函数

2.1 加工批次在生产线上的信息熵

2.2 加工批量及批次的信息熵函数

3. 系统信息熵函数与加工批次定理

3.1 加工批次熵函数的单调特性

3.2 加工批次熵函数的极值特性

3.3 分析与讨论

4. 实证研究

5. 结论

留言板

兵棋推演的智能决策技术与挑战

doi: 10.16383/j.aas.c210547

计量

出版历程

Intelligent Decision Making Technology and Challenge of Wargame

1. 熵函数的信息度量特性

2. 流水车间生产线信息熵函数

2.1 加工批次在生产线上的信息熵

2.2 加工批量及批次的信息熵函数

3. 系统信息熵函数与加工批次定理

3.1 加工批次熵函数的单调特性

3.2 加工批次熵函数的极值特性

3.3 分析与讨论

4. 实证研究

5. 结论

期刊类型引用(11)

其他类型引用(9)

计量

出版历程

目录

1. 熵函数的信息度量特性

2. 流水车间生产线信息熵函数

2.1 加工批次在生产线上的信息熵

2.2 加工批量及批次的信息熵函数

3. 系统信息熵函数与加工批次定理

3.1 加工批次熵函数的单调特性

3.2 加工批次熵函数的极值特性

3.3 分析与讨论

4. 实证研究

5. 结论