兵棋推演的智能决策技术与挑战

尹奇跃 赵美静 倪晚成 张俊格 黄凯奇

隋振,  梁硕,  田彦涛.  考虑车辆横向主动安全的智能驾驶员模型.  自动化学报,  2021,  47(8): 1899−1911 doi: 10.16383/j.aas.c190526
引用本文: 尹奇跃, 赵美静, 倪晚成, 张俊格, 黄凯奇. 兵棋推演的智能决策技术与挑战. 自动化学报, 2023, 49(5): 913−928 doi: 10.16383/j.aas.c210547
Sui Zhen,  Liang Shuo,  Tian Yan-Tao.  Intelligent driving model considering lateral active safety of vehicles.  Acta Automatica Sinica,  2021,  47(8): 1899−1911 doi: 10.16383/j.aas.c190526
Citation: Yin Qi-Yue, Zhao Mei-Jing, Ni Wan-Cheng, Zhang Jun-Ge, Huang Kai-Qi. Intelligent decision making technology and challenge of wargame. Acta Automatica Sinica, 2023, 49(5): 913−928 doi: 10.16383/j.aas.c210547

兵棋推演的智能决策技术与挑战

doi: 10.16383/j.aas.c210547
基金项目: 国家自然科学青年基金(61906197)资助
详细信息
    作者简介:

    尹奇跃:中国科学院自动化研究所副研究员. 主要研究方向为强化学习, 数据挖掘和人工智能与游戏. E-mail: qyyin@nlpr.ia.ac.cn

    赵美静:中国科学院自动化研究所副研究员. 主要研究方向为知识表示与建模, 复杂系统决策. E-mail: meijing.zhao@ia.ac.cn

    倪晚成:中国科学院自动化研究所研究员. 主要研究方向为数据挖掘与知识发现, 复杂系统建模和群体智能博弈决策平台与评估. E-mail: wancheng.ni@ia.ac.cn

    张俊格:中国科学院自动化研究所研究员. 主要研究方向为持续学习, 小样本学习, 博弈决策和强化学习. E-mail: jgzhang@nlpr.ia.ac.cn

    黄凯奇:中国科学院自动化研究所研究员. 主要研究方向为计算机视觉, 模式识别和认知决策. 本文通信作者. E-mail: kqhuang@nlpr.ia.ac.cn

Intelligent Decision Making Technology and Challenge of Wargame

Funds: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (61906197)
More Information
    Author Bio:

    YIN Qi-Yue Associate professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers reinforcement learning, data mining, and artificial intelligence on games

    ZHAO Mei-Jing Associate professor at the Institute of Automation, Chinese Academy of Sciences. Her research interest covers knowledge representation and modeling, and complex system decision-making

    NI Wan-Cheng Professor at the Institute of Automation, Chinese Academy of Sciences. Her research interest covers data mining and knowledge discovery, complex system modeling, and swarm intelligence platform and evaluation

    ZHANG Jun-Ge Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers continuous learning, small sample learning, game decision making, and reinforcement learning

    HUANG Kai-Qi Professor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers computer vision, pattern recognition, and cognitive decision-making. Corresponding author of this paper

  • 摘要: 近年来, 以人机对抗为途径的智能决策技术取得了飞速发展, AlphaGo、AlphaStar等人工智能(Artificial intelligence, AI)程序分别在围棋、星际争霸等游戏环境中战胜了顶尖人类选手. 兵棋推演作为一种人机对抗策略验证环境, 由于其非对称环境决策、更接近真实环境的随机性与高风险决策等特点, 受到智能决策技术研究者的广泛关注. 本文通过梳理兵棋推演与目前主流人机对抗环境(如围棋、德州扑克、星际争霸等)的区别, 阐述了兵棋推演智能决策技术的发展现状, 分析了当前主流技术的局限与瓶颈, 并对兵棋推演中的智能决策技术研究进行了思考, 期望能对兵棋推演相关问题中的智能决策技术研究带来启发.
  • 驾驶员模型本质上即智能车辆的自动驾驶控制器, 自动完成车辆在特定驾驶任务下的速度控制与转向. 通常根据车辆运动的维度, 驾驶员模型可以大致分为纵向驾驶员模型、横向驾驶员模型与复合驾驶员模型[1]. 目前, 最优控制理论、自适应控制理论与模型预测控制(Model predictive control, MPC)理论已成为驾驶员建模的主流方法. 如Yoshida等[2]采用自适应控制理论建立驾驶员模型; Qu等[3-4]提出了基于随机模型预测控制的驾驶员建模方法; Falcone等[5]则利用线性时变模型预测控制算法建立了自动驾驶车辆的转向控制器, 也可认为是横向驾驶员模型; Du等[6]利用非线性模型预测控制(Nonlinear model predictive control, NMPC)实现车辆速度和转向的综合控制. 随着人工智能技术的发展, 基于机器学习在特定驾驶任务条件下建立驾驶员模型也逐渐引起了人们的重视. 如Amsalu等[7]利用支持向量机对驾驶员在十字路口处的驾驶行为进行了分析与建模, 能够对驾驶员在十字路口的行为做出准确预测, 并用于指导实际驾驶行为.

    在智能驾驶员模型飞速发展的同时, 车辆的主动安全也逐渐引起了人们的重视[8]. 车辆横向运动过程中面临的安全威胁主要包括: 1)在车辆转向过程中, 车辆系统的非线性和耦合性使其在高速、弯道或湿滑路面下极易发生侧滑、侧翻、车道偏离等危险; 2)在复杂交通环境中, 因对交通场景环境态势分析不足, 而与其他交通车辆发生碰撞事故.

    本文研究的智能驾驶员模型主要解决两方面问题: 1)针对高速、低路面附着系数以及转弯工况, 设计模型预测控制器作为车辆转向控制器, 并考虑车辆的侧向加速度、横摆角速度、质心侧偏角和横向转移率等现实约束, 使智能车辆在跟踪轨迹的同时提高侧向稳定性. 这里所指的侧向稳定性即车辆在行驶过程中不发生侧滑或侧翻的极限性能, 提高侧向稳定性即减小车辆发生侧滑、侧翻的风险. 2)在直线多车道的道路条件下, 通过分析一般工况下车辆的换道行驶条件, 采用线性模型预测理论设计速度调整控制算法, 并采用粒子群算法结合贝塞尔曲线设计轨迹发生器, 辅助智能车安全地实现自动换道的驾驶任务, 即在换道过程中不与环境车辆发生任何形式的碰撞.

    从控制系统的角度看, 人类驾驶车辆的过程本质上是由人—车—路环境构成的闭环系统. 人类驾驶员通过分析交通环境, 由大脑制定出车辆应行驶的速度和轨迹, 并驱动方向盘和踏板执行相应的驾驶任务, 这一系列流程可概括为驾驶员的感知、决策和执行. 鉴于驾驶员具有以上特征, 本文设计的换道条件下的驾驶员模型结构如图1所示.

    图 1  驾驶员模型结构
    Fig. 1  The structure of the driver model

    在该结构中, 驾驶员模型由换道决策单元、转向控制器和速度控制器组成, 用于实现换道任务. 换道决策单元结合具体车道环境, 合理分析换道条件、规划期望行驶的轨迹和车速, 以避免事故的发生; 期望轨迹和车速信号作用于下游的转向控制器和速度控制器, 完成具体的转向和速度操控. 其中, 本方案中的速度控制器根据现有成果, 采用模糊神经网络算法计算指定工况下的油门开度和制动压强, 实现对汽车加速度的控制[9]. 而转向控制器的设计在保证轨迹跟踪精度的同时必须兼顾车辆自身的侧向稳定性, 这里所说的侧向稳定性指标包括质心侧偏角、横摆角速度、侧向加速度和横向转移率. 因此, 转向控制器的设计本质上是一个以轨迹跟踪精度为控制目标、以方向盘转角为控制量, 同时兼顾汽车质心侧偏角、横摆角速度、侧向加速度和横向转移率等稳定性约束的多目标多约束最优控制求解问题[10]. 下面将分别就本驾驶员模型中的转向控制器以及决策规划模块的详细内容进行说明.

    利用模型预测控制理论进行转向控制器的设计, 需要对被控对象进行清晰准确的描述. 本文选取两轮三自由度的非线性车辆动力学模型作为转向控制器设计的依据. 假设车辆在行驶过程中前轮转角与方向盘转角之间的传动比为线性关系, 且两个前轮的转向角度一致. 同时忽略车辆垂直与俯仰运动, 忽略空气动力学、侧向风与轮胎回正力矩对车身的作用. 模型结构如图2所示.

    图 2  简化3自由度车辆动力学模型
    Fig. 2  The model of the vehicle of 3DOF

    通过对车辆模型的纵向、侧向、横摆和侧倾运动的受力分析, 可以推导出其动力学方程为

    $$ \left\{ {\begin{array}{*{20}{l}} {m{a_y} = {F_{xf}}\sin \delta + {F_{yf}}\cos \delta + {F_{yr}}}\\ {m{a_x} = {F_{xf}}\cos \delta - {F_{yf}}\sin \delta + {F_{xr}}}\\ {{I_z}\dot r = ({F_{xf}}\sin \delta + {F_{yf}}\cos \delta )a - {F_{yr}}b} \end{array}} \right. $$ (1)

    图2中的各个参数分别表示如下: $ a_x $为车辆的纵向加速度$({\rm{m/s}}^2)$; $ a_y $为车辆的侧向加速度$({\rm{m/s}}^2)$; $ \psi $为车辆横摆角$({\rm{rad}})$; $ r $为车辆的横摆角速度$({\rm{rad/s}})$; $ \delta $为汽车前轮转角$({\rm{rad}})$, 其与方向盘转角$ {\delta _{sw}} $之间的线性传动比为$ G $; $ m $为车辆的整体质量$({\rm{kg}})$; $ I_z $为车辆绕$ z $轴的转动惯量$({\rm{kg \cdot m}}^2)$; $ a $和$ b $分别为质心到前、后轴的距离$({\rm{m}})$. 通常, ${a_x} = {{\dot v}_x} - r{v_y}$, ${a_y} = {{\dot v}_y} - r{v_x}$, 其中$ v_x $和$ v_y $为车辆沿自身纵轴和横轴的速度$({\rm{m/s}})$. 轮胎所受的纵向力$ F_{xi} $与侧向力$ F_{yi}\;({\rm{N}}) $由路面提供, 通常在滑移率和轮胎侧偏角较小的情况下, 其可近似为

    $$ \left\{ \begin{array}{l} {F_{xi}} = 2{K_i}{S_i}\\ {F_{yi}} = 2{C_i}{\alpha _i} \end{array} \right. $$ (2)

    式中, $ i = f, r $. $ K_i $和$ C_i $分别为轮胎的纵向刚度与侧向刚度$({\rm{N/m}})$; $ S_i $为轮胎滑移率, 是无量纲参数, 通常在车辆匀速行驶时可近似为定值, 本文取为$0.2$. 轮胎侧偏角$\alpha_i\;({\rm{rad}})$则满足

    $$ \left\{ \begin{array}{l} {\alpha _f} \approx \delta - \dfrac{{{v_y} + ar}}{{{v_x}}}\\ {\alpha _r} \approx - \dfrac{{{v_y} - br}}{{{v_x}}} \end{array} \right. $$ (3)

    此外, 车辆在运动过程中, 车辆在大地坐标系下的位移可表示为

    $$ \left\{ \begin{array}{l} \dot X = {v_x}\cos \psi - {v_y}\sin \psi \\ \dot Y = {v_x}\sin \psi + {v_y}\cos \psi \end{array} \right. $$ (4)

    式中, $ X $$ Y $分别为车辆相对于地面的纵向与侧向位移$ (m) $. 在转向控制器的设计中, 通常需要将复杂的车辆动力学方程简化. 实际转向中车辆前轮转角$ \delta $满足$ \cos \delta \approx 1 $, $ \sin \delta \approx \delta $. 因此, 车辆系统状态空间表达式[11]

    $$ \left\{ \begin{array}{l} {{\dot v}_y} = - r{v_x} \!+\! {K_1}\delta \!+\! {K_3}\!\!\left(\delta \!-\! \dfrac{{{v_y} \!+\! ar}}{{{v_x}}}\right) \!-\! {K_4}\!\!\left(\dfrac{{{v_y} \!-\! br}}{{{v_x}}}\!\right)\\ {{\dot v}_x} = r{v_y} + {K_1} + {K_2} - {K_3}\delta \left(\delta - \dfrac{{{v_y} + ar}}{{{v_x}}}\right)\\ \dot \psi = r\\ \dot r = {K_5}\delta + {K_6}\left(\delta - \dfrac{{{v_y} + ar}}{{{v_x}}}\right) + {K_7}\left(\dfrac{{{v_y} - br}}{{{v_x}}}\right)\\ \dot Y = {v_x}\sin \psi + {v_y}\cos \psi \\ \dot X = {v_x}\cos \psi - {v_y}\sin \psi \end{array} \right. $$ (5)

    其中, ${K_1} = \dfrac{2}{m}{K_f}{S_f}$; ${K_2} = \dfrac{2}{m}{K_r}{S_r}$; ${K_3} = \dfrac{2}{m}{C_f}$; ${K_4} = \dfrac{2}{m}{C_r}$; ${K_5} = \dfrac{2}{{{I_z}}}a{K_f}{S_f}$; ${K_6} = \dfrac{2}{{{I_z}}}a{C_f}$; ${K_7} = \dfrac{2}{{{I_z}}}{C_r}b$.

    定义系统状态变量$ \xi = [v_y, v_x, \psi, r, Y, X]^{\rm{T}} $, 定义前轮转角$ \delta $为控制量$ u $, 车辆的侧向位移与横摆角为系统的被控变量$ \eta $, 因此上述连续控制系统的状态空间方程可以表示为

    $$ \left\{ \begin{array}{l} \dot \xi = f(\xi ,u)\\ \eta = h\xi \end{array} \right. $$ (6)

    式中, 矩阵${{h}}$为系统输出矩阵.
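    为直观展示式(5)在数值仿真中的使用方式, 下面给出一个基于 Python/NumPy 的示意性实现(非论文原始代码), 其中参数字典 p 中的质量、转动惯量、轮胎刚度等数值均为演示用的假设值:

```python
import numpy as np

def vehicle_dynamics(xi, delta, p):
    """式(5)的示意性实现: 给定状态 xi = [v_y, v_x, psi, r, Y, X] 与前轮转角 delta,
    返回状态导数 d(xi)/dt. 参数字典 p 为假设的车辆参数."""
    v_y, v_x, psi, r, Y, X = xi
    K1 = 2.0 / p['m'] * p['Kf'] * p['Sf']
    K2 = 2.0 / p['m'] * p['Kr'] * p['Sr']
    K3 = 2.0 / p['m'] * p['Cf']
    K4 = 2.0 / p['m'] * p['Cr']
    K5 = 2.0 / p['Iz'] * p['a'] * p['Kf'] * p['Sf']
    K6 = 2.0 / p['Iz'] * p['a'] * p['Cf']
    K7 = 2.0 / p['Iz'] * p['Cr'] * p['b']
    slip_f = delta - (v_y + p['a'] * r) / v_x   # 前轮侧偏角项
    slip_r = (v_y - p['b'] * r) / v_x           # 后轮侧偏项(取负即为 alpha_r)
    dv_y = -r * v_x + K1 * delta + K3 * slip_f - K4 * slip_r
    dv_x = r * v_y + K1 + K2 - K3 * delta * slip_f
    dpsi = r
    dr = K5 * delta + K6 * slip_f + K7 * slip_r
    dY = v_x * np.sin(psi) + v_y * np.cos(psi)
    dX = v_x * np.cos(psi) - v_y * np.sin(psi)
    return np.array([dv_y, dv_x, dpsi, dr, dY, dX])

# 示例: 假设性的参数与初始状态, 用前向欧拉积分一步 (步长 0.01 s, 前轮转角 0.02 rad)
p = dict(m=1500.0, Iz=2500.0, a=1.2, b=1.4,
         Kf=5.0e4, Kr=5.0e4, Cf=6.0e4, Cr=6.0e4, Sf=0.2, Sr=0.2)
xi = np.array([0.0, 20.0, 0.0, 0.0, 0.0, 0.0])
xi_next = xi + 0.01 * vehicle_dynamics(xi, 0.02, p)
```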

    衡量车辆侧倾稳定性的性能指标通常取车辆横向转移率(Lateral transfer rate, LTR), 其原理如图3所示. 图中的各个参数分别为: $ g $代表重力加速度$({\rm{m/s}}^2)$, $ \varphi $是车辆的侧倾角$({\rm{rad}})$, $ m_s $是车辆簧载质量$({\rm{kg}})$, $ I_x $是车辆绕$ x $轴的转动惯量$({\rm{kg \cdot m}}^2)$, $ h $为侧倾臂长$({\rm{m}})$, $ H $代表簧载质心距离地面的高度$({\rm{m}})$, $ T $表示车辆宽度$({\rm{m}})$. LTR的大小与车辆侧向加速度、侧倾角、侧倾角加速度相关, 指标形式如下[12]:

    图 3  车辆侧倾动力学模型
    Fig. 3  The roll dynamic model of the vehicle
    $$ LTR = \dfrac{{2{m_s}}}{{mgT}}\left[ {H({a_y} - h\ddot \varphi ) + gh\varphi } \right] $$ (7)
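    下面给出式(7)的一个简单数值实现示例(示意代码, 示例中的整车与悬架参数均为假设值, 并非文中实验车的真实参数):

```python
def lateral_transfer_rate(a_y, phi, ddphi, m, m_s, H, h, T, g=9.81):
    """式(7): LTR = 2*m_s/(m*g*T) * [H*(a_y - h*ddphi) + g*h*phi] 的示意实现."""
    return 2.0 * m_s / (m * g * T) * (H * (a_y - h * ddphi) + g * h * phi)

# |LTR| 越接近 1, 说明一侧车轮载荷越趋于零、侧翻风险越高, 可作为式(9)约束边界的参考
ltr = lateral_transfer_rate(a_y=3.0, phi=0.03, ddphi=0.1,
                            m=1500.0, m_s=1350.0, H=0.55, h=0.45, T=1.6)
```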

    本文中, 转向控制器设计采用线性时变模型预测控制理论, 该控制器需要对预测模型、目标函数和约束条件进行设计[13]. 连续系统的状态空间表达式, 经线性化、离散化和增量化后的结果为

    $$ \left\{ \begin{array}{l} \tilde \xi (k + 1) = \tilde A\tilde \xi (k) + \tilde B\Delta u(k) + {{\tilde d}_k}\\ \eta (k) = \tilde C\tilde \xi (k) \end{array} \right. $$ (8)

    式中,

    $$ \begin{split} &\tilde A = \left[ {\begin{array}{*{20}{c}}A&B\\0&I\end{array}} \right] \!; \;\tilde \xi (k) = \left[ {\begin{array}{*{20}{c}}{\xi (k)}\\{u(k - 1)}\end{array}} \right]\! ;\; \tilde B = \left[ {\begin{array}{*{20}{c}}B\\I\end{array}} \right]\! ; \\ & {\tilde d_k} = \left[ {\begin{array}{*{20}{c}}{{d_k}}\\0\end{array}} \right]\! ; \;\tilde C = \left[ {\begin{array}{*{20}{c}}h&0\end{array}} \right]. \end{split} $$

    其中, ${{I}}$为单位矩阵, 其他各个符号的含义为

    $$ \begin{split} &A = {\left. {I + {T_s}\dfrac{{\partial f}}{{\partial \xi }}} \right|_{\hat \xi }} ; B = {\left. {{T_s}\dfrac{{\partial f}}{{\partial u}}} \right|_{\hat u}} \\ &{d_k} = \hat \xi (k + 1) - A\hat \xi (k) - B\hat u(k) \end{split} $$

    上述各式中, $ T_s $定义为离散系统采样时间, $ \hat \xi $$ \hat u $为当前系统的状态量和控制量.
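    上述线性化与离散化过程可以用数值差分近似雅可比矩阵来完成. 下面给出一个示意性实现(非论文原始代码), 其中 f 为前文 vehicle_dynamics 一类返回状态导数的非线性状态方程:

```python
import numpy as np

def linearize_discretize(f, xi_hat, u_hat, Ts, eps=1e-6):
    """用数值差分近似式(8)中的 A = I + Ts*∂f/∂ξ 与 B = Ts*∂f/∂u,
    并按 d_k = ξ(k+1) - A ξ(k) - B u(k) 计算残差项 (示意实现)."""
    n = xi_hat.size
    f0 = f(xi_hat, u_hat)
    J = np.zeros((n, n))
    for i in range(n):                        # 对状态逐维差分
        dx = np.zeros(n)
        dx[i] = eps
        J[:, i] = (f(xi_hat + dx, u_hat) - f0) / eps
    Bc = (f(xi_hat, u_hat + eps) - f0) / eps  # 对控制量差分
    A = np.eye(n) + Ts * J
    B = (Ts * Bc).reshape(n, 1)
    xi_next_hat = xi_hat + Ts * f0            # 用非线性模型做一步欧拉预测
    d_k = xi_next_hat - A @ xi_hat - (B @ np.array([u_hat])).ravel()
    return A, B, d_k
```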

    转向控制系统的主要控制目标是保证轨迹跟踪精度, 同时兼顾车辆的侧向稳定性, 因此本文将其转化为轨迹跟踪精度的性能指标以及对车辆主要状态的约束. 系统的主要侧向约束指标包括质心侧偏角、横摆角速度、侧向加速度和横向转移率, 其具体形式为

    $$ \left\{ {\begin{array}{*{20}{l}} {{\beta _{\min }} + {\varepsilon _\beta }{z_{\beta \min }} \le \beta \le {\beta _{\max }} + {\varepsilon _\beta }{z_{\beta \max }}}\\ {{r_{\min }} + {\varepsilon _r}{z_{r\min }} \le r \le {r_{\max }} + {\varepsilon _r}{z_{r\max }}}\\ {{a_{y\min }} + {\varepsilon _{{a_y}}}{z_{{a_y}\min }} \le {a_y} \le {a_{y\max }} + {\varepsilon _{{a_y}}}{z_{{a_y}\max }}}\\ {LT{R_{\min }} + {\varepsilon _{LTR}}{z_{LTR\min }} \le LTR \le LT{R_{\max }} + {\varepsilon _{LTR}}{z_{LTR\max }}} \end{array}} \right. $$ (9)

    考虑车辆系统的实际运动极限, 转向控制系统输出量、控制增量和控制量也应满足约束

    $$ \left\{ \begin{array}{l} {\eta _{\min }} \le \eta \le {\eta _{\max }}\\ \Delta {u_{\min }} + {\varepsilon _{\Delta u}}{z_{\Delta u\min }} \le \Delta u \le \Delta {u_{\max }} + {\varepsilon _{\Delta u}}{z_{\Delta u\max }}\\ {u_{\min }} + {\varepsilon _u}{z_{u\min }} \le u \le {u_{\max }} + {\varepsilon _u}{z_{u\max }} \end{array} \right. $$ (10)

    式中, $ {\varepsilon _i} $为各个约束变量的松弛因子, 用于保证系统存在可行解. 为提高系统控制精度, 减小执行机构的运动幅度, 定义系统的性能指标为

    $$ \begin{split} J = & \displaystyle\sum\limits_{i = 1}^{{N_p}} {\left\| {(\eta (k + i) - {\eta _{{\rm{ref}}}}(k + i))} \right\|_Q^2} \;+\\ & \displaystyle\sum\limits_{i = 1}^{{N_c}} {\left\| {\Delta u(k + i - 1)} \displaystyle\right\|_R^2} + \displaystyle\sum\limits_i {\left\| {{\varepsilon _i}} \right\|_\rho ^2} \end{split} $$ (11)

    式中, $ N_p $为预测模型的预测时域, $ N_c $为预测模型的控制时域, $ {\eta _{{\rm{ref}}}} $为期望输出的侧向轨迹, $Q$、$ R $与$ \rho $为相应的权重系数. 根据滚动优化理论, 将系统状态方程在预测时域和控制时域内不断迭代, 得到系统预测模型; 结合性能指标与约束条件, 经二次规划算法可求得最优控制增量序列$\Delta {U^*}(k)$, 进而得到当前时刻的最优控制量

    $$ {u^*}(k) = {u^*}(k - 1) + \left[ {\begin{array}{*{20}{c}} 1&0& \cdots &0 \end{array}} \right]\Delta {U^*}(k) $$ (12)

    根据方向盘与前轮转角的比例系数$ G $, 可计算出方向盘转角的最优解为

    $$ {u_{SW}}^*(k) = G{u^*}(k) $$ (13)
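    作为参考, 下面给出一个利用 cvxpy(此处假设采用该求解库)求解上述带约束转向 MPC 的简化示意, 并非论文的原始实现: 为简洁起见, 式(9)中质心侧偏角、横摆角速度、侧向加速度与 LTR 的弹性约束统一用输出上下界 eta_min、eta_max 代为示意, 矩阵 A_t、B_t、C_t、d_t 由上文的线性化、离散化结果给出.

```python
import numpy as np
import cvxpy as cp

def solve_steering_mpc(A_t, B_t, C_t, d_t, xi0, eta_ref, Np, Nc,
                       Q, R, rho, du_max, u_max, eta_min, eta_max, u_prev):
    """带约束转向控制器的示意实现: 滚动求解式(11)对应的二次规划."""
    nx = A_t.shape[0]
    xi = cp.Variable((nx, Np + 1))
    du = cp.Variable(Nc)                       # 控制增量序列 ΔU
    eps = cp.Variable(nonneg=True)             # 松弛因子, 保证可行解存在
    cost = rho * cp.square(eps)
    cons = [xi[:, 0] == xi0]
    for k in range(Np):
        if k < Nc:                             # 控制时域内施加增量
            cons += [xi[:, k + 1] == A_t @ xi[:, k] + B_t @ du[k:k + 1] + d_t]
        else:                                  # 控制时域外增量为零
            cons += [xi[:, k + 1] == A_t @ xi[:, k] + d_t]
        eta_k = C_t @ xi[:, k + 1]
        cost += cp.quad_form(eta_k - eta_ref[:, k], Q)        # 跟踪误差项
        cons += [eta_k >= eta_min - eps, eta_k <= eta_max + eps]  # 弹性输出约束
    for j in range(Nc):
        cost += R * cp.square(du[j])                           # 控制增量项
        cons += [cp.abs(du[j]) <= du_max]
        cons += [cp.abs(u_prev + cp.sum(du[:j + 1])) <= u_max + eps]
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u_prev + float(du.value[0])         # 式(12): 仅执行第一个控制增量
```

    实际实现时, 可按式(9)对每一类稳定性指标分别设置独立的松弛因子与约束边界, 并按式(13)将得到的前轮转角换算为方向盘转角.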

    当车辆前方出现妨碍自身正常行驶的车辆或障碍物时, 为了提高驾驶效率, 人类驾驶员会通过变换车道的方式实现更好的驾驶体验[14]. 假设换道开始前交通车分布如图4所示.

    图 4  换道前车辆分布情况
    Fig. 4  Distribution of the vehicles before lane change

    在换道行驶过程中, 智能车M可能因转向不足或安全间距过小而与原车道前车(Lead vehicle of the original lane, Lo)发生追尾或斜向剐蹭, 也可能因车速调整不当或安全间距不足而与目标车道的前车(Lead vehicle of the destined lane, Ld)或后车(Follow vehicle of the destined lane, Fd)发生追尾或剐蹭. 因此, 针对向目标车道的换道行为, 本文将结合目标车道前后车的间距和速度情况做出分析, 计算出合理的行驶车速和跟车间距; 为避免因转向不足导致与原车道前车相撞, 本驾驶员模型将结合前后两车的间距及换道方向, 规划换道路径.

    根据图4中换道前车辆分布, 设$ t_0 $为换道开始时刻, $\Delta {D_{Lo}} ,\;$$\Delta {D_{Ld}},\;$$\Delta {D_{Fd}}$分别为智能车$ M $与原车道前车、目标车道前车和后车的当前间距. $ d_{Ls} $为车辆$ M $与前车的期望安全间距, $d_{Fs}$为车辆$ M $与后车的期望安全间距. $ v_i $为环境中各车车速$(i = $$ M,L_o,L_d,F_d)$. 假设换道前后目标车道车辆的位置关系如图5所示.

    图 5  换道后车辆分布情况
    Fig. 5  Distribution of the vehicles after lane change

    图中虚线代表换道前各个车辆所处的位置, 实线为换道结束后各车辆对应的位置, $ S_i $为各车在换道期间行驶的路程. 通常为使换道结束后前后两车的车距仍处在驾驶员期望安全间距之外, 车辆行驶过程应满足

    $$ \left\{ \begin{array}{l} {S_M} + {d_{Ls}} \le {S_{Ld}} + \Delta {D_{Ld}}\\ {S_{Fd}} + {d_{Fs}} \le {S_M} + \Delta {D_{Fd}} \end{array} \right. $$ (14)

    前车安全间距$d_{Ls}$通常根据前后两车车速$ v_M $和$ v_L $、路面附着系数$ \Phi $、重力加速度$ g $和车间最小安全间距$ d_0 $来确定, 其在不同环境下的大小为[15]

    $$ {d_{Ls}} = \left\{ \begin{aligned} &{v_M}{\tau _r} - \dfrac{{{{({v_M} - {v_L})}^2}}}{{2g\Phi }} + {d_0},\;\;{a_L} < 0,{v_M} < {v_L}\\ &(2{v_M} \!-\! {v_L}){\tau _r} \!+\! \dfrac{{({v_M} \!-\! {v_L})({v_M} \!+\! {v_L} \!-\! 2)}}{{2g\Phi }} \!+\! {d_0},\\ &\qquad\qquad\qquad\qquad\qquad\;\;\;\;\;\;\,{v_L} < {v_M}\\ &{v_M}{\tau _r} + \dfrac{{{v_M}^2 - {v_L}^2}}{{2g\Phi }} + {d_0},\;\;\;{a_L} < 0,{v_L} < {v_M}\\ &{\tau _r}{v_M} + {d_0},\qquad\qquad\qquad\quad\!{\text{其他}} \end{aligned} \right. $$ (15)

    上式中$ d_0 $的定义参阅文献[15], 其形式为

    $$ {d_0} = {d_{Fs}} = k\frac{c}{{\Phi + d}} $$ (16)

    式中, $ c $$ d $为一常量, 在本文中定义$ c = 1.8 $, $ d = 0.17 $. $ k $为反映驾驶员实际意图的系数.

    假设车辆$ M $的加速度为$ a $. 通常情况下, 车辆$ M $当前所处车流中各车的车速关系可大致划分为以下3种工况:

    工况1. $ v_M $$ > $$ v_{Ld} $$ > $$v_{Fd},$ 此时目标车道车速低于自车车速, 智能车$ M $向目标车道减速换道行驶.

    工况2. $ v_{Ld} $$ > $$ v_{Fd} $$ > $$v_M ,$ 此时智能车$ M $为了不与目标车道后车发生剐蹭, 需加速换道.

    工况3. $ v_{Ld} $$ > $$ v_M $$ > $$v_{Fd},$ 此时智能车$ M $可以匀速或加速向目标车道换道行驶.

    定义驾驶员反应时间为$ {{\tau _r}} $. 根据式(14) 以及牛顿运动学公式分别计算出智能车$ M $在以上3种情况下换道过程中采取的加速度上下限, 即$a_{{\rm{min}}}$$a_{{\rm{max}}}$.

    工况1中, 当$\Delta {D_{Ld}} < {d_{Ls}}$时, 由于$L_d$车速慢于本车, 且两车间距已经小于安全间距, 换道会增大两车相撞风险, 所以此工况下不做换道操作. 相反, 若车间距满足$\Delta {D_{Ld}} \ge {d_{Ls}}$, 根据式(14)可计算出车辆$ M $的加速度上下限为

    $$ \left\{ \begin{array}{l} {a_{\min }} = - \dfrac{{({v_M} + {v_{Ld}} - 2{v_{Fd}})({v_M} - {v_{Ld}})}}{{2({d_{Fs}} + {v_{Fd}}{\tau _r} - \Delta {D_{Fd}} - {v_M}{\tau _r})}}\\ {a_{\max }} = - \dfrac{{{{({v_M} - {v_{Ld}})}^2}}}{{2(\Delta {D_{Ld}} + {v_{Ld}}{\tau _r} - {d_{Ls}} - {v_M}{\tau _r})}} \end{array} \right. $$ (17)

    在工况2中, 若换道时车间距满足$\Delta {D_{Fd}} < {d_{Fs}}$, 由于目标车道后车车速较快且两车距离已经小于二者的安全间距, 因此不做换道. 相反, 若$\Delta {D_{Fd}} \ge {d_{Fs}}$且$\Delta {D_{Ld}} < {d_{Ls}}$, 则加速度上下限为

    $$ \left\{ \begin{array}{l} {a_{\min }} = \dfrac{{{{({v_{Fd}} - {v_M})}^2}}}{{2(\Delta {D_{Fd}} + {v_M}{\tau _r} - {d_{Fs}} - {v_{Fd}}{\tau _r})}}\\ {a_{\max }} = \dfrac{{{{({v_M} - {v_{Ld}})}^2}}}{{2({d_{Ls}} + {v_M}{\tau _r} - \Delta {D_{Ld}} - {v_{Ld}}{\tau _r})}} \end{array} \right.$$ (18)

    $\Delta {D_{Fd}} \ge {d_{Fs}}$$\Delta {D_{Ld}} \ge {d_{Ls}}$, 此时目标车道前车车速快于本车且两车间距已大于安全间距, 则车辆$ M $拟采取的加速度范围为

    $$ \left\{ \begin{array}{l} {a_{\min }} = \dfrac{{{{({v_{Fd}} - {v_M})}^2}}}{{2(\Delta {D_{Fd}} + {v_M}{\tau _r} - {d_{Fs}} - {v_{Fd}}{\tau _r})}}\\ {a_{\max }} = {a_{s\max }} \end{array} \right. $$ (19)

    在工况3中, 由于此时目标车道后车慢于自车车速, 因此智能车$ M $无需考虑与其发生碰撞, 只考虑与前车的安全间距. 若$\Delta {D_{Ld}} < {d_{Ls}} ,\;$加速度范围满足:

    $$ \left\{ \begin{array}{l} {a_{\min }} = 0\\ {a_{\max }} = \dfrac{{{{({v_M} - {v_{Ld}})}^2}}}{{2({d_{Ls}} + {v_M}{\tau _r} - \Delta {D_{Ld}} - {v_{Ld}}{\tau _r})}} \end{array} \right. $$ (20)

    相反, 若满足$\Delta {D_{Ld}} \ge {d_{Ls}} ,\;$ 加速度范围为

    $$ \left\{ \begin{array}{l} {a_{\min }} = 0\\ {a_{\max }} = {a_{s\max }} \end{array} \right. $$ (21)

    通常人类驾驶车辆过程中, 车辆实际行驶的极限加速度范围为: $ a \in [{a_{s\min }},{a_{s\max }}] $, 此处$a_{s{\rm{min}}}$$a_{s{\rm{max}}}$是在保证车辆与驾驶员的舒适与稳定条件下加速度最小值与最大值. 将前文所述的安全换道加速范围与舒适性范围取交集, 即为本文设计驾驶员换道模型应采取的加速范围, 当分析模块计算出此范围为非空集合时, 代表当前工况下换道存在可行性, 即车速调整模块实际采取的加速区间为

    $$ u \in [{u_{\min }},{u_{\max }}] = [{a_{s\min }},{a_{s\max }}] \cap [{a_{\min }},{a_{\max }}] $$ (22)
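    以工况1为例, 上述加速度区间的计算与式(22)的取交集过程可示意如下(非论文原始代码, 函数与变量命名为自拟):

```python
def lane_change_acc_range_case1(vM, vLd, vFd, dD_Ld, dD_Fd, dLs, dFs,
                                tau_r, a_s_min, a_s_max):
    """工况1 (vM > vLd > vFd) 的示意实现: 按式(17)计算 [a_min, a_max],
    再按式(22)与舒适性加速度区间 [a_s_min, a_s_max] 取交集."""
    if dD_Ld < dLs:                  # 与目标车道前车间距不足, 不换道
        return None
    a_min = -(vM + vLd - 2.0 * vFd) * (vM - vLd) / (
        2.0 * (dFs + vFd * tau_r - dD_Fd - vM * tau_r))
    a_max = -(vM - vLd) ** 2 / (
        2.0 * (dD_Ld + vLd * tau_r - dLs - vM * tau_r))
    lo, hi = max(a_min, a_s_min), min(a_max, a_s_max)
    return (lo, hi) if lo <= hi else None   # 交集为空说明当前工况下不可换道
```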

    通常, 当驾驶员驶入与$L_o$的安全间距之内时, 首先要根据$L_o$的运动状态调整两车的间距和速度关系, 并同时判断目标车道是否存在换道空间, 当换道空间达成后, 以目标车道前车为跟车对象进行车速和轨迹的调整. 因此, 在本驾驶员模型的车速调整控制中, 将以前后两车的运动关系为被控对象进行调整. 在实际车辆行驶过程中, 前后两车间距$ \Delta D $、速度差$ v_r $、前后车车速$ v_L $和$ v_M $、前后车加速度$ a_L $和$ a_M $以及后车期望加速度$ a_{Md} $之间满足

    $$ \left\{ \begin{array}{l} \Delta \dot D(t) = {v_L} - {v_M} = {v_r}\\ {{\dot v}_r} = {a_L} - {a_M}\\ {{\dot v}_M} = {a_M}\\ \tau {{\dot a}_M} + {a_M} = {a_{Md}} \end{array} \right. $$ (23)

    将上式转换为状态空间表达式形式, 定义状态量$x = {\left[ {\Delta D}\;\;{{v_r}}\;\;{{v_M}}\;\;{{a_M}} \right]^{\rm{T}}},$ 控制量$ u = a_{Md} $, 扰动量$ d = a_L $, 系统输出$y = {\left[ {\Delta D}\;\;{{v_r}} \right]^{\rm{T}}} ,$ 上述方程可表示为

    $$ \left\{ \begin{array}{l} \dot x = Ax + {B_u}u + {B_i}d\\ y = Cx \end{array} \right. $$ (24)

    取采样周期$ T $, 取增量式进行算法设计, 因此系统离散化后结果为

    $$ \left\{ \begin{array}{l} \Delta X(k + 1) = {A_d}\Delta X(k) + {B_{du}}\Delta u(k) + {B_{di}}\Delta d(k)\\ Y(k) = {C_d}\Delta X(k) + Y(k - 1) \end{array} \right. $$ (25)

    其中, ${A_d} = I + TA$, $B_{du} = TB_u$, $B_{di} = TB_i$, $ C_d = C $. 同样采用滚动优化思想, 对状态方程在预测时域$ p $和控制时域$ m $内迭代, 得到系统的预测方程. 为了满足所要达到的控制目标, 定义控制器的参考输入$r(k + i) = {\left[ {{D_{{\rm{des}}}}(k)}\;\;0 \right]^{\rm{T}}}$, 其中$D_{{\rm{des}}}(k)$定义为期望安全间距$ d_{Ls} $, 期望两车速度差$ v_r $为0. 速度控制系统的性能指标与控制量约束为

    $$ \begin{split} &\min J = \displaystyle\sum\limits_{i = 1}^p {\left\| {y(k + i) - r(k + i)} \right\|} _Q^2 + \displaystyle\sum\limits_{i = 1}^m {\left\| {\Delta u(k + i - 1)} \right\|} _S^2\\ &{\rm{s.t.}}\left\{ \begin{array}{l} \Delta {u_{\min }} \le \Delta u\left( k \right) \le \Delta {u_{\max }}\\ {u_{\min }} \le u\left( k \right) \le {u_{\max }} \end{array} \right. \end{split} $$ (26)

    式中, ${{Q}}$${{S}}$代表输出量与控制增量序列的权重矩阵. 同样利用二次规划算法计算出期望加速度的最优解$ a_{Md}^*(k) $.

    为避免因转向不足导致和前车$L_o$的斜向剐蹭, 本驾驶员模型采用基于粒子群的贝塞尔曲线实施轨迹规划. 通常$ n $次贝塞尔曲线可表示为

    $$ B(t) = \sum\limits_{i = 0}^n {C_n^i{P_i}{{(1 - t)}^{n - i}}{t^i}} ,\;\;t \in [0,1] $$ (27)

    其中, $ P_i $为曲线的关键点坐标. 轨迹规划中关键点的选取方式如图6所示.

    图6显示了自车在换道开始前两车的位置关系. 设两车当前间距为$\Delta {D_{Lo}}$. 以实施左换道为例, 定义各个主要关键点的位置为: 实验车M的车头中心坐标$ {P_1}({x_1},{y_1}) $, 前车左后点$ {P_6}({x_6},{y_6}) $. 根据车辆轨迹的曲线特性, 轨迹中距离前车$ P_6 $点最近的一点必为该轨迹的切点$ {P_3}({x_3},{y_3}) $. 定义两点间的距离为$ R $, $ R $也可看做是轨迹与前车间的最小距离, $ P_3 $同时也是以$ P_6 $为圆心、$ R $为半径的圆与轨迹的切点. 将过$ P_3 $的切线与两侧车道中心线的交点分别定义为$ {P_2}({x_2},{y_2}) $和$ {P_4}({x_4},{y_4}) $, 而目标车道上位于$ P_4 $前方的任意一点被定义为$ {P_5}({x_5},{y_5}) $.

    图 6  轨迹规划原理
    Fig. 6  The principle of trajectory planning

    根据参考轨迹初步设置的5个关键点, 可采用四次贝塞尔曲线来实现轨迹规划. 为使该参考轨迹更加平滑、均匀[16]且易于跟踪, 本文选取优化性能指标为

    $$ \begin{split} \min J =\;& {\omega _1}\int {|\rho (x)|{\rm{d}}x} + {\omega _2}\int {|\dot \rho (x)|{\rm{d}}x}\; +\\ & {\omega _3}\int {|B(x) - S(x)|{\rm{d}}x} + {\omega _4}|\theta | \end{split} $$ (28)

    式中, $ \rho (x) $$ \dot \rho (x) $为贝塞尔曲线的曲率及其导数, $ B(x) $$ S(x) $分别为贝塞尔曲线函数和结构线函数, $ \theta $为过$ P_3 $点的切线与$ x $轴的夹角. 而$ \omega _i $为各个性能指标的权系数. 为使性能指标尽快达到最优结果, 本文最终采用粒子群算法求解各个主要关键点的最优坐标$ P_i^*(x_i^*,y_i^*) $, 最后根据贝塞尔曲线公式, 得出车辆换道过程的参考轨迹

    $$ \left\{ \begin{array}{l} {X^*}(t) = \displaystyle\sum\limits_{i = 0}^4 {C_4^ix_{i + 1}^*{{(1 - t)}^{4 - i}}{t^i}} \\ {Y^*}(t) = \displaystyle\sum\limits_{i = 0}^4 {C_4^iy_{i + 1}^*{{(1 - t)}^{4 - i}}{t^i}}, \\ {\psi ^*}(t) = \arctan \dfrac{{{\rm{d}}{Y^*}(t)}}{{{\rm{d}}{X^*}(t)}} \end{array} \right.\;t \in [0,1] $$ (29)
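    式(27)与式(29)的轨迹生成过程可用如下 Python 代码示意(非论文原始实现), 其中示例关键点坐标为假设值, 实际应由粒子群算法对式(28)优化得到:

```python
import numpy as np
from math import comb

def bezier(points, ts):
    """式(27): n次贝塞尔曲线 B(t) = Σ C(n,i) P_i (1-t)^(n-i) t^i 的示意实现.
    points: (n+1, 2) 关键点坐标; ts: [0, 1] 内的参数序列."""
    pts = np.asarray(points, dtype=float)
    n = len(pts) - 1
    curve = np.zeros((len(ts), 2))
    for i in range(n + 1):
        basis = comb(n, i) * (1 - ts) ** (n - i) * ts ** i
        curve += np.outer(basis, pts[i])
    return curve

# 式(29): 由 5 个关键点 P1*~P5* 生成换道参考轨迹与参考横摆角 (关键点坐标为假设值)
ts = np.linspace(0.0, 1.0, 101)
P_star = [(0.0, 0.0), (12.0, 0.2), (22.0, 1.75), (32.0, 3.3), (45.0, 3.5)]
traj = bezier(P_star, ts)
psi_ref = np.arctan2(np.gradient(traj[:, 1]), np.gradient(traj[:, 0]))
```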

    本智能驾驶员模型将在CarSim/Simulink仿真环境下, 分别就车辆在轨迹跟踪以及自主换道两种交通场景进行实验验证. 实验车辆为CarSim2016中的D-Class型轿车.

    为验证本模型转向控制算法的可靠性, 本文分别在高速、低路面附着系数以及弯道三种工况下对驾驶员模型的转向控制器进行验证, 并对无约束模型预测控制器和本文采用的带约束模型预测控制器的控制效果进行对比说明.

    工况1. 高速双移线工况.

    此工况定义车速25${\rm{m/s}}$, 路面附着系数为0.9, 装备不同转向控制算法的智能车辆行驶轨迹和主要系统状态如图7所示.

    图 7  工况1条件下车辆行驶状态
    Fig. 7  The states of the vehicle on work Condition 1

    工况2. 低路面附着系数双移线工况.

    此工况定义车速25${\rm{m/s}}$, 路面附着系数为0.5, 装备不同转向控制算法的智能车辆行驶轨迹和主要系统状态如图8所示.

    图 8  工况2条件下车辆行驶状态
    Fig. 8  The states of the vehicle on work Condition 2

    工况3. 弯道路况.

    此工况下, 定义车速14${\rm{m/s}}$, 路面附着系数为0.9, 弯道曲率0.25. 装备不同转向控制算法的智能车辆行驶轨迹和主要系统状态如图9所示.

    图 9  工况3条件下车辆行驶状态
    Fig. 9  The states of the vehicle on work Condition 3

    根据实验结果, 在工况1条件下, 无约束MPC在25${\rm{m/s}}$车速下轨迹跟踪的能力较差, 其在变道过程中产生了震荡. 这是因为高速行驶中车辆的侧向加速度与质心侧偏角的变化幅度大, 且不受任何约束限制, 导致车辆在行车过程中发生了失稳现象. 而带约束MPC由于质心侧偏角、横摆角速度、侧向加速度和横向转移率受弹性约束, 因此其变化幅度更小, 更加平稳, 因而其在轨迹跟踪上的效果更好.

    在工况2条件下, 车速依然保持25${\rm{m/s}}$不变, 而路面附着系数减小至0.5时, 无约束MPC控制的智能车辆的轨迹跟踪效果进一步变差, 其控制的车辆完全偏离了期望跟随的目标车道; 车辆质心侧偏角、横摆角速度和侧向加速度也发生了大幅摆动, 稳定性变差, 且横向转移率在某些时刻超过了1的极限值, 说明此时车辆有很严重的侧翻风险. 而带约束MPC控制的车辆尽管也偏离了原车道, 但很快恢复至正常轨迹, 且其侧向稳定性更好.

    工况3的结果显示, 带约束MPC控制的车辆较好地跟随了弯道路径, 而无约束MPC控制的实验车在弯道后半程由于弯道曲率方向的改变, 其偏离了期望轨迹, 质心侧偏角、横摆角速度、侧向加速度和横向转移率的幅度加大, 其发生侧倾、侧滑等风险更高.

    本节验证驾驶员模型在换道场景下的安全性、可靠性和有效性[17], 检验当前交通环境下车辆换道行驶前后的侧向轨迹、车速变化情况以及实验车与各交通车之间的安全车距. 安全车距即智能车在行驶过程中与环境中其他各个车辆的等效外接矩形之间的直线最短距离, 其定义如图10所示.

    图 10  安全车距定义
    Fig. 10  The definition of the vehicles safety distance
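    上述安全车距可借助 shapely 这类计算几何库按“两车等效外接矩形间的最短直线距离”直接求取, 下面给出一个示意实现(车辆长宽为假设的典型值, 并非 CarSim D-Class 的实际参数):

```python
import numpy as np
from shapely.geometry import Polygon

def vehicle_rect(x, y, psi, length, width):
    """以车辆中心 (x, y) 和横摆角 psi 构造等效外接矩形 (示意实现)."""
    dx, dy = length / 2.0, width / 2.0
    corners = np.array([[dx, dy], [dx, -dy], [-dx, -dy], [-dx, dy]])
    R = np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
    return Polygon(corners @ R.T + np.array([x, y]))

def safety_distance(state_a, state_b, length=4.6, width=1.8):
    """两车等效外接矩形之间的最短直线距离, 即文中定义的安全车距."""
    return vehicle_rect(*state_a, length, width).distance(
        vehicle_rect(*state_b, length, width))

# 示例: state = (x, y, psi), 两车纵向相距 8 m、横向相距一个车道
d = safety_distance((0.0, 0.0, 0.0), (8.0, 3.5, 0.0))
```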

    本文通过设置不同的安全间距、加速度及其增量范围和系统反应时间来体现智能车不同的驾驶方式, 详细参数设置如表1所示[18].

    表 1  智能驾驶员系统参数设置
    Table 1  The definition of the intelligent driver system
    实验车M Car A Car B Car C
    最小安全间距${d_o}({\rm{m}})$ $ {d_o}(3) $ $ {d_o}(2) $ ${d_o}(1)$
    加速度幅度$({\rm{m/{s}}^2})$ 1.8 2.2 2.5
    加速度增量$({\rm{m/{s}}^2})$ 0.09 0.11 0.12
    反应时间$({\rm{s}})$ 0.4 0.7 0.9

    表1中, 最小安全间距$ {d_o} $由式(16)定义, 括号中的3、2、1是式(16)中参数$ k $的具体取值, 反映了不同驾驶员对安全间距的要求. 在接下来的实验中, 道路附着系数为0.9, 相应的$ {d_o} $分别约为5.1 m、3.4 m和1.7 m.

    工况1. 第1种加速换道场景.

    此工况下, 装备智能驾驶员模型的实验车$ M $以20${\,\rm{m/s}}$的速度正常行驶, 此时其前方30${\,\rm{m}}$处交通车辆$ {L_o} $正以18${\,\rm{m/s}}$的速度匀速行驶; 其左前方车辆$ {L_d} $与$ M $初始间距5${\,\rm{m}}$, 车速25${\,\rm{m/s}}$; 左后方车辆$F_d $速度20${\,\rm{m/s}}$, 距$ M $初始间距10${\,\rm{m}}$. 三种智能车执行加速超车的结果如图11 ~ 15所示.

    图 11  工况1换道轨迹
    Fig. 11  The trajectory of the vehicle on work Condition 1
    图 15  工况1智能车与目标车道后车间距
    Fig. 15  The distance between the intelligent vehicle with the follow vehicle of the target lane on work Condition 1
    图 12  工况1速度控制
    Fig. 12  The velocity of the vehicle on work Condition 1
    图 13  工况1智能车与原车道前车间距
    Fig. 13  The distance between the intelligent vehicle with the lead vehicle of the original lane on work Condition 1
    图 14  工况1智能车与目标车道前车间距
    Fig. 14  The distance between the intelligent vehicle with the lead vehicle of the target lane on work Condition 1

    如图所示, 三种智能车均自主完成了换道行为, 车速和间距均以目标车道前车为跟车对象进行调整并逐渐趋于稳定. 从速度曲线和轨迹曲线的分析可以看出, C车因安全间距$ {d_o} $比A、B两车小, 因此其启动换道的时刻最晚, 同时速度调整也更快. 根据速度曲线, 第4 ~ 5${\rm{s}}$时间内C车率先减速, 目的是保持和前车$ {L_o} $的安全间距, 而后发现邻车道具备换道空间进而加速驶入.

    在车间距的对比上, 各实验车与前车$ {L_o} $的最小间距在一段时间内始终保持约2 m, 说明此时实验车换道后正超越前车$ {L_o} $, 两车在相邻车道上并列行驶. 换道结束后, 实验车A、B、C对车辆$ {L_d} $做跟车行驶并趋于稳定. 由于各车执行换道的时间不同, 三台车对$ {L_d} $的跟车间距有所差异. 由于A、B车在同一工况下启动换道时间较早, 此时虽与$ {L_o} $距离较远, 但离后车$ {F_d} $距离较近, 最终迫使$ {F_d} $提前减速, 使$ {F_d} $与A、B车间的距离相比C车越来越大.

    工况2. 第2种加速换道场景

    此工况下, 装备智能驾驶员模型的实验车$ M $以20${\rm{m/s}}$的速度正常行驶, 此时其前方30${\rm{m }}$处交通车辆$ {L_o} $正以18${\rm{m/s}}$的速度匀速行驶; 其左前方车辆$ {L_d} $与$ M $初始间距0${\rm{m}}$, 车速22.2${\rm{m/s}}$; 左后方车辆$ F_d $速度22.2${\rm{m/s}}$, 距$ M $初始间距30${\rm{m}}$. 三种智能车执行加速超车的结果如图16 ~ 20所示.

    图 16  工况2换道轨迹
    Fig. 16  The trajectory of the vehicle on work Condition 2
    图 20  工况2智能车与目标车道后车间距
    Fig. 20  The distance between the intelligent vehicle with the follow vehicle of the target lane on work Condition 2
    图 17  工况2速度控制
    Fig. 17  The velocity of the vehicle on work Condition 2

    从速度曲线和轨迹曲线的分析可以看出, A车和B车由于对车辆安全间距$ {d_o} $期望较高, 因此提前减速对$ {L_o} $进行跟车, 待与$ {L_d} $的间距达到期望的安全距离以后开始换道行为; C车由于对前后方的安全车距$ {d_o} $要求较低, 所以没有对$ {L_o} $跟车减速, 而是直接执行换道. 三种车在换道过程中和环境车$ {L_o} $、$ {L_d} $与$ {F_d} $的位置关系与工况1大致相同, 且三辆车均以略小于前车$ {L_d} $的车速稳定行驶, 以逐步拉大与$ {L_d} $的间距, 保证纵向安全.

    图 18  工况2智能车与原车道前车间距
    Fig. 18  The distance between the intelligent vehicle with the lead vehicle of the original lane on work Condition 2
    图 19  工况2智能车与目标车道前车间距
    Fig. 19  The distance between the intelligent vehicle with the lead vehicle of the target lane on work Condition 2

    工况3. 减速换道场景

    假设在工况3条件下, 装备智能驾驶员系统的实验车M以20${\rm{m/s}}$的速度正常行驶, 此时其前方40${\rm{m}}$处交通车辆$ {L_o} $正以15${\rm{m/s}}$的速度匀速行驶; 其右前方车辆$ {L_d} $距M初始间距20${\rm{m}}$, 车速18${\rm{m/s}}$; 目标车道后方车辆$ F_d $速度18${\rm{m/s}}$, 距M初始间距10${\rm{m}}$. 三种智能车辆执行减速换道的结果如图21 ~ 25所示.

    图 21  工况3换道轨迹
    Fig. 21  The Trajectory of the vehicle on work Condition 3
    图 25  工况3智能车与目标车道后车间距
    Fig. 25  The distance between the intelligent vehicle with the follow vehicle of the target lane on work Condition 3
    图 22  工况3速度控制
    Fig. 22  The velocity of the vehicle on work Condition 3
    图 23  工况3智能车与原车道前车间距
    Fig. 23  The distance between the intelligent vehicle with the lead vehicle of the original lane on work Condition 3
    图 24  工况3智能车与目标车道前车间距
    Fig. 24  The distance between the intelligent vehicle with the lead vehicle of the target lane on work Condition 3

    从速度曲线和轨迹曲线的分析可以看出, 由于前车$ {L_o} $和目标车道的车速均小于本车车速, 且$ {L_o} $车速小于目标车道的前车$ {L_d} $. 因此为了提高驾驶效率, 智能驾驶员模型选择了向车速较慢的一侧进行换道行驶. 此方案主要检验智能驾驶员模型在减速换道场景下的换道安全. 综合以上三种工况, 说明在同一种工况下, 具有不同参数的驾驶员模型往往可以体现出不同的驾驶方式.

    本文结合智能车在道路行驶中可能出现的安全问题, 设计了一种具有横向安全性的新型驾驶员模型. 该驾驶员模型在结构上由速度控制器、转向控制器和感知决策模块组成, 主要实现车辆准确跟踪轨迹并提高稳定性, 减小侧向安全风险, 实现自主安全换道.

    首先, 转向控制器将预测模型与车辆质心侧偏角、横摆角速度、侧向加速度和横向转移率等约束条件相结合, 通过对性能指标与约束条件的二次规划求解, 得出车辆模型的最优控制律. 在高速、低路面附着系数以及转弯路况下, 转向控制器均具有良好的轨迹跟踪性能和侧向稳定性.

    其次, 为了避免在换道过程中与目标车道上的车辆发生碰撞, 通过对目标车道安全间距的分析, 确定出智能车执行换道的主要条件与驶入目标车道的参考加速度范围, 并采用线性模型预测控制理论设计速度调整控制算法进行车速控制; 为避免和原车道前车发生碰撞, 采用粒子群算法计算最优的换道路径. 在三种换道工况下, 智能车均实现了通用场景下的主动换道行为, 并与环境车辆保持一定的安全间距. 不同驾驶员参数的设置, 也体现出了驾驶员模型在同一工况下的差异.

    最后, 本文还存在以下有待改进之处:

    1) 实验中设置不同驾驶员模型参数所引起的驾驶行为差异, 反映出实际行驶过程中不同驾驶员在同一路况下往往表现出不一样的驾驶风格. 目前, 针对驾驶风格的定义及其判定依据和主要参数指标, 依然缺乏客观统一的标准, 后期仍需通过对真实驾驶数据的分析进行探索.

    2) 实际路况复杂多样, 论文研究难以对所有复杂路况进行一一验证, 目前本文所做工作也只能对一般条件下的换道过程进行检验. 如何使智能车适应更加极端的工况以及如何建立通用的换道检验标准, 也将成为本文接下来要进行的工作.


  • 1 http://turingai.ia.ac.cn
  • 2 http://turingai.ia.ac.cn/ranks/wargame_list
  • 3 http://turingai.ia.ac.cn/notices/detail/116
  • 4 http://turingai.ia.ac.cn/bbs/detail/14/1/29
  • 5 http://www.cas.cn/syky/202107/t20210712_4798152.shtml
  • 6 http://gym.openai.com
  • 图  1  包以德循环

    Fig.  1  OODA loop

    图  2  自博弈加强化学习训练

    Fig.  2  Self-gaming and reinforcement learning

    图  3  IMPALA用于兵棋推演智能体训练

    Fig.  3  IMPALA for training wargame agents

    图  4  知识与数据驱动“加性融合”框架

    Fig.  4  Additive fusion framework of knowledge and data driven

    图  5  人机对抗框架[45]

    Fig.  5  Human-machine confrontation framework[45]

    图  6  知识与数据驱动“主从融合”框架

    Fig.  6  Principal and subordinate fusion framework of knowledge and data driven

    图  7  智能体单项能力评估

    Fig.  7  Evaluation of specific capability of agents

    图  8  “图灵网”平台

    Fig.  8  Turing website platform

    图  9  兵棋推演知识库构建示例

    Fig.  9  Example of knowledge base construction for wargame

    图  10  兵棋推演中的异步协同与同步协同对比

    Fig.  10  Comparison between asynchronous cooperation and synchronous cooperation in wargame

    图  11  兵棋推演大模型训练挑战

    Fig.  11  Challenges of training big model for wargame

    图  12  排兵布阵问题示例

    Fig.  12  Example for problem of arranging arms

    图  13  异步协同问题示例

    Fig.  13  Example for problem of asynchronous multi-agent cooperation

    表  1  对决策带来挑战的代表性因素

    Table  1  Representative factors of challenge decision-making

    游戏雅达利围棋德州扑克星际争霸兵棋推演
    不完美信息博弈×
    长时决策×
    策略非传递性×
    智能体协作×××
    非对称环境××××
    随机性与高风险××××
  • [1] Campbell M, Hoane A J Jr, Hsu F H. Deep blue. Artificial Intelligence, 2002, 134(1-2): 57-83 doi: 10.1016/S0004-3702(01)00129-1
    [2] Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489 doi: 10.1038/nature16961
    [3] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374): 418-424 doi: 10.1126/science.aao1733
    [4] Vinyals O, Babuschkin I, Czarnecki W M, Mathieu M, Dudzik A, Chung J, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019, 575(7782): 350-354 doi: 10.1038/s41586-019-1724-z
    [5] Ye D H, Chen G B, Zhang W, Chen S, Yuan B, Liu B, et al. Towards playing full MOBA games with deep reinforcement learning. In: Proceedings of the Advances in Neural Information Processing Systems 33. Virtual Event: MIT Press, 2020.
    [6] 胡晓峰, 贺筱媛, 陶九阳. AlphaGo的突破与兵棋推演的挑战. 科技导报, 2017, 35(21): 49-60

    Hu Xiao-Feng, He Xiao-Yuan, Tao Jiu-Yang. AlphaGo's breakthrough and challenges of wargaming. Science & Technology Review, 2017, 35(21): 49-60
    [7] 胡晓峰, 齐大伟. 智能化兵棋系统: 下一代需要改变的是什么. 系统仿真学报, 2021, 33(9): 1997-2009

    Hu Xiao-Feng, Qi Da-Wei. Intelligent wargaming system: What needs to be changed in the next generation. Journal of System Simulation, 2021, 33(9): 1997-2009
    [8] 吴琳, 胡晓峰, 陶九阳, 贺筱媛. 面向智能成长的兵棋推演生态系统. 系统仿真学报, 2021, 33(9): 2048-2058

    Wu Lin, Hu Xiao-Feng, Tao Jiu-Yang, He Xiao-Yuan. Wargaming eco-system for intelligence growing. Journal of System Simulation, 2021, 33(9): 2048-2058
    [9] 徐佳乐, 张海东, 赵东海, 倪晚成. 基于卷积神经网络的陆战兵棋战术机动策略学习. 系统仿真学报, 2022, 34(10): 2181-2193

    Xu Jia-Le, Zhang Hai-Dong, Zhao Dong-Hai, Ni Wan-Cheng. Tactical maneuver strategy learning from land wargame replay based on convolutional neural network. Journal of System Simulation, 2022, 34(10): 2181-2193.
    [10] Moy G, Shekh S. The application of AlphaZero to wargaming. In: Proceedings of the 32nd Australasian Joint Conference on Artificial Intelligence. Adelaide, Australia: 2019. 3−14
    [11] Wu K, Liu M, Cui P, Zhang Y. A training model of wargaming based on imitation learning and deep reinforcement learning. In: Proceedings of the Chinese Intelligent Systems Conference. Beijing, China: 2022. 786−795
    [12] 胡艮胜, 张倩倩, 马朝忠. 兵棋推演系统中的异常数据挖掘方法. 信息工程大学学报, 2020, 21(3): 373-377 doi: 10.3969/j.issn.1671-0673.2020.03.019

    Hu Gen-Sheng, Zhang Qian-Qian, Ma Chao-Zhong. Outlier data mining of the war game system. Journal of Information Engineering University, 2020, 21(3): 373-377 doi: 10.3969/j.issn.1671-0673.2020.03.019
    [13] 张锦明. 运用栅格矩阵建立兵棋地图的地形属性. 系统仿真学报, 2016, 28(8): 1748-1756 doi: 10.3969/j.issn.1673-3819.2018.05.016

    Zhang Jin-Ming. Using raster lattices to build terrain attribute of wargame map. Journal of System Simulation, 2016, 28(8): 1748-1756. doi: 10.3969/j.issn.1673-3819.2018.05.016
    [14] Chen L, Liang X, Feng Y, Zhang L, Yang J, Liu Z. Online intention recognition with incomplete information based on a weighted contrastive predictive coding model in wargame. IEEE Transactions on Neural Networks and Learning Systems, 2022, DOI: 10.1109/TNNLS.2022.3144171
    [15] 王桂起, 刘辉, 朱宁. 兵棋技术综述. 兵工自动化, 2012, 31(8): 38-41, 45 doi: 10.3969/j.issn.1006-1576.2012.08.012

    Wang Gui-Qi, Liu Hui, Zhu Ning. A survey of war games technology. Ordnance Industry Automation, 2012, 31(8): 38-41, 45 doi: 10.3969/j.issn.1006-1576.2012.08.012
    [16] 彭春光, 赵鑫业, 刘宝宏, 黄柯棣. 兵棋推演技术综述. 第 14 届系统仿真技术及其应用学术会议. 合肥, 中国: 2009. 366−370

    Peng Chun-Guang, Zhao Xin-Ye, Liu Bao-Hong, Huang Ke-Di. The technology of wargaming: An overview. In: Proceedings of the 14th Chinese Conference on System Simulation Technology & Application. Hefei, China: 2009. 366−370
    [17] 曹占广, 陶帅, 胡晓峰, 何吕龙. 国外兵棋推演及系统研究进展. 系统仿真学报, 2021, 33(9): 2059-2065

    Cao Zhan-Guang, Tao Shuai, Hu Xiao-Feng, He Lü-Long. Abroad wargaming deduction and system research. Journal of System Simulation, 2021, 33(9): 2059-2065
    [18] 司光亚, 王艳正. 新一代大型计算机兵棋系统面临的挑战与思考. 系统仿真学报, 2021, 33(9): 2010-2016

    Si Guang-Ya, Wang Yan-Zheng. Challenges and reflection on next-generation large-scale computer wargame system. Journal of System Simulation, 2021, 33(9): 2010-2016
    [19] Ganzfried S, Sandholm T. Game theory-based opponent modeling in large imperfect-information games. In: Proceedings of the 10th International Conference on Autonomous Agents and Multi-agent Systems. Taipei, China: 2011. 533−540
    [20] Littman M L. Algorithms for Sequential Decision Making [Ph.D. dissertation], Brown University, USA, 1996
    [21] Nieves N P, Yang Y D, Slumbers O, Mguni D H, Wen Y, Wang J. Modelling behavioural diversity for learning in open-ended games. In: Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: 2021. 8514−8524
    [22] Jaderberg M, Czarnecki W M, Dunning I, Marris L, Lever G, Castañeda A G, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364(6443): 859-865 doi: 10.1109/TVT.2021.3096928
    [23] Baker B, Kanitscheider I, Markov T M, Wu Y, Powell G, McGrew B, et al. Emergent tool use from multi-agent autocurricula. In: Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: 2020.
    [24] Liu I J, Jain U, Yeh R A, Schwing A G. Cooperative exploration for multi-agent deep reinforcement learning. In: Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: 2021. 6826−6836
    [25] 周志杰, 曹友, 胡昌华, 唐帅文, 张春潮, 王杰. 基于规则的建模方法的可解释性及其发展. 自动化学报, 2021, 47(6): 1201-1216

    Zhou Zhi-Jie, Cao You, Hu Chang-Hua, Tang Shuai-Wen, Zhang Chun-Chao, Wang Jie. The interpretability of rule-based modeling approach and its development. Acta Automatica Sinica, 2021, 47(6): 1201-1216
    [26] Révay M, Líška M. OODA loop in command & control systems. In: Proceedings of the Communication and Information Technologies. Vysoke Tatry, Slovakia: 2017.
    [27] IEEE Transactions on Computational Intelligence and AI in Games, 2017, 9(3): 227-238 doi: 10.1109/TCIAIG.2016.2543661
    [28] Najam-ul-lslam M, Zahra F T, Jafri A R, Shah R, Hassan M u, Rashid M. Auto implementation of parallel hardware architecture for Aho-Corasick algorithm. Design Automation for Embedded System, 2022, 26: 29-53
    [29] 崔文华, 李东, 唐宇波, 柳少军. 基于深度强化学习的兵棋推演决策方法框架. 国防科技, 2020, 41(2): 113-121

    Cui Wen-Hua, Li Dong, Tang Yu-Bo, Liu Shao-Jun. Framework of wargaming decision-making methods based on deep reinforcement learning. National Defense Technology, 2020, 41(2): 113-121
    [30] 李琛, 黄炎焱, 张永亮, 陈天德. Actor-Critic框架下的多智能体决策方法及其在兵棋上的应用. 系统工程与电子技术, 2021, 43(3): 755-762 doi: 10.12305/j.issn.1001-506X.2021.03.20

    Li Chen, Huang Yan-Yan, Zhang Yong-Liang, Chen Tian-De. Multi-agent decision-making method based on actor-critic framework and its application in wargame. Systems Engineering and Electronics, 2021, 43(3): 755-762 doi: 10.12305/j.issn.1001-506X.2021.03.20
    [31] 张振, 黄炎焱, 张永亮, 陈天德. 基于近端策略优化的作战实体博弈对抗算法. 南京理工大学学报, 2021, 45(1): 77-83

    Zhang Zhen, Huang Yan-Yan, Zhang Yong-Liang, Chen Tian-De. Battle entity confrontation algorithm based on proximal policy optimization. Journal of Nanjing University of Science and Technology, 2021, 45(1): 77-83
    [32] 秦超, 高晓光, 万开方. 深度卷积记忆网络时空数据模型. 自动化学报, 2020, 46(3): 451-462

    Qin Chao, Gao Xiao-Guang, Wan Kai-Fang. Deep spatio-temporal convolutional long-short memory network. Acta Automatica Sinica, 2020, 46(3): 451-462
    [33] 陈伟宏, 安吉尧, 李仁发, 李万里. 深度学习认知计算综述. 自动化学报, 2017, 43(11): 1886-1897

    Chen Wei-Hong, An Ji-Yao, Li Ren-Fa, Li Wan-Li. Review on deep-learning-based cognitive computing. Acta Automatica Sinica, 2017, 43(11): 1886-1897
    [34] Burda Y, Edwards H, Storkey A J, Klimov O. Exploration by random network distillation. In: Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: 2019.
    [35] Mnih V, Badia A P, Mirza M, Graves A, Harley T, Lillicrap T P, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: 2016. 1928−1937
    [36] Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, Van Hasselt H, et al. Distributed prioritized experience replay. In: Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: 2018.
    [37] Espeholt L, Soyer H, Munos R, Simonyan K, Mnih V, Ward T, et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: 2018. 1407−1416
    [38] Jaderberg M, Czarnecki W M, Dunning I, Marris L, Lever G, Castañeda A G, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364(6443): 859-865 doi: 10.1126/science.aau6249
    [39] Espeholt L, Marinier R, Stanczyk P, Wang K, Michalski M. SEED RL: Scalable and efficient deep-RL with accelerated central inference. In: Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: 2020.
    [40] Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, et al. Ray: A distributed framework for emerging AI applications. In: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation. Carlsbad, USA: 2018. 561−577
    [41] 蒲志强, 易建强, 刘振, 丘腾海, 孙金林, 李非墨. 知识和数据协同驱动的群体智能决策方法研究综述. 自动化学报, 2022, 48(3): 627−643 doi: 10.16383/j.aas.c210118

    Pu Zhi-Qiang, Yi Jian-Qiang, Liu Zhen, Qiu Teng-Hai, Sun Jin-Lin, Li Fei-Mo. Knowledge-based and data-driven integrating methodologies for collective intelligence decision making: A survey. Acta Automatica Sinica, 2022, 48(3): 627−643 doi: 10.16383/j.aas.c210118
    [42] Rueden L V, Mayer S, Beckh K, Georgiev B, Giesselbach S, Heese R, et al. Informed machine learning – a taxonomy and survey of integrating prior knowledge into learning systems. IEEE Transactions on Knowledge and Data Engineering, DOI: 10.1109/TKDE.2021.3079836, 2021, 5: 1−19
    [43] Hartmann G, Shiller Z, Azaria A. Deep reinforcement learning for time optimal velocity control using prior knowledge. In: Proceedings of the 31st International Conference on Tools With Artificial Intelligence. Portland, USA: 2019. 186−193
    [44] Zhang P, Hao J Y, Wang W X, Tang H Y, Ma Y, Duan Y H, et al. KoGuN: Accelerating deep reinforcement learning via integrating human suboptimal knowledge. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. Virtual Event: 2020. 2291−2297
    [45] 黄凯奇, 兴军亮, 张俊格, 倪晚成, 徐博. 人机对抗智能技术. 中国科学: 信息科学, 2020, 50(4): 540-550 doi: 10.1360/N112019-00048

    Huang Kai-Qi, Xing Jun-Liang, Zhang Jun-Ge, Ni Wan-Cheng, Xu Bo. Intelligent technologies of human-computer gaming. Scientia Sinica Informations, 2020, 50(4): 540-550 doi: 10.1360/N112019-00048
    [46] Elo A E. The Rating of Chess Players, Past and Present. London: Batsford, 1978.
    [47] Herbrich R, Minka T, Graepel T. TrueSkill (TM): A Bayesian skill rating system. In: Proceedings of the 19th International Conference on Neural Information Processing Systems. Vancouver, Canada: 2006. 569−576
    [48] Balduzzi D, Tuyls K, Perolat J, Graepel T. Re-evaluating evaluation. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: 2018. 3272−3283
    [49] Omidshafiei S, Papadimitriou C, Piliouras G, Tuyls K, Rowland M, Lespiau J B, et al. α-rank: Multi-agent evaluation by evolution. Scientific Reports, 2019, 9(1): Article No. 9937 doi: 10.1038/s41598-019-45619-9
    [50] 唐宇波, 沈弼龙, 师磊, 易星. 下一代兵棋系统模型引擎设计问题研究. 系统仿真学报, 2021, 33(9): 2025-2036

    Tang Yu-Bo, Shen Bi-Long, Shi Lei, Yi Xing. Research on the issues of next generation wargame system model engine. Journal of System Simulation, 2021, 33(9): 2025-2036
    [51] Ji S, Pan S, Cambria E, Marttinen P, Yu P. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(2): 494-514 doi: 10.1145/154421.154422
    [52] Wang Z, Zhang J W, Feng J L, Chen Z. Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence. Québec, Canada: 2014. 1112−1119
    [53] 王保魁, 吴琳, 胡晓峰, 贺筱媛, 郭圣明. 基于时序图的作战指挥行为知识表示学习方法. 系统工程与电子技术, 2020, 42(11): 2520-2528 doi: 10.3969/j.issn.1001-506X.2020.11.14

    Wang Bao-Kui, Wu Lin, Hu Xiao-Feng, He Xiao-Yuan, Guo Sheng-Ming. Operations command behavior knowledge representation learning method based on sequential graph. Systems Engineering and Electronics, 2020, 42(11): 2520-2528 doi: 10.3969/j.issn.1001-506X.2020.11.14
    [54] 刘嵩, 武志强, 游雄, 张欣, 王雪峰. 基于兵棋推演的综合战场态势多尺度表达. 测绘科学技术学报, 2012, 29(5): 382-385, 390 doi: 10.3969/j.issn.1673-6338.2012.05.015

    Liu Song, Wu Zhi-Qiang, You Xiong, Zhang Xin, Wang Xue-Feng. Multi-scale expression of integrated battlefield situation based on wargaming. Journal of Geomatics Science and Technology, 2012, 29(5): 382-385, 390 doi: 10.3969/j.issn.1673-6338.2012.05.015
    [55] 贺筱媛, 郭圣明, 吴琳, 李东, 许霄, 李丽. 面向智能化兵棋的认知行为建模方法研究. 系统仿真学报, 2021, 33(9): 2037-2047

    He Xiao-Yuan, Guo Sheng-Ming, Wu Lin, Li Dong, Xu Xiao, Li Li. Modeling research of cognition behavior for intelligent wargaming. Journal of System Simulation, 2021, 33(9): 2037-2047
    [56] 朱丰, 胡晓峰, 吴琳, 贺筱媛, 吕学志, 廖鹰. 从态势认知走向态势智能认知. 系统仿真学报, 2018, 30(3): 761-771

    Zhu Feng, Hu Xiao-Feng, Wu Lin, He Xiao-Yuan, Lü Xue-Zhi, Liao Ying. From situation cognition stepped into situation intelligent cognition. Journal of System Simulation, 2018, 30(3): 761-771
    [57] Heinrich J, Lanctot M, Silver D. Fictitious self-play in extensive-form games. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: 2015. 805−813
    [58] Adam L, Horcík R, Kasl T, Kroupa T. Double oracle algorithm for computing equilibria in continuous games. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtual Event: 2021. 5070−5077
    [59] Nguyen T T, Nguyen N D, Nahavandi S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics, 2020, 50(9): 3826-3839 doi: 10.1109/TCYB.2020.2977374
    [60] Zhang K Q, Yang Z R, Başar T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control, 2021: 321−384
    [61] 施伟, 冯旸赫, 程光权, 黄红蓝, 黄金才, 刘忠, 贺威. 基于深度强化学习的多机协同空战方法研究. 自动化学报, 2021, 47(7): 1610-1623

    Shi Wei, Feng Yang-He, Cheng Guang-Quan, Huang Hong-Lan, Huang Jin-Cai, Liu Zhong, He Wei. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning. Acta Automatica Sinica, 2021, 47(7): 1610-1623
    [62] 梁星星, 冯旸赫, 马扬, 程光权, 黄金才, 王琦, 周玉珍, 刘忠. 多Agent深度强化学习综述. 自动化学报, 2020, 46(12): 2537-2557

    Liang Xing-Xing, Feng Yang-He, Ma Yang, Cheng Guang-Quan, Huang Jin-Cai, Wang Qi, Zhou Yu-Zhen, Liu Zhong. Deep multi-agent reinforcement learning: A survey. Acta Automatica Sinica, 2020, 46(12): 2537-2557
    [63] Yan D, Weng J, Huang S, Li C, Zhou Y, Su H, Zhu J. Deep reinforcement learning with credit assignment for combinatorial optimization. Pattern Recognition, 2022, 124: Artice No. 108466
    [64] Lansdell B J, Prakash P R, Körding K P. Learning to solve the credit assignment problem. In: Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: 2020.
    [65] 孙长银, 穆朝絮. 多智能体深度强化学习的若干关键科学问题. 自动化学报, 2020, 46(7): 1301-1312

    Sun Chang-Yin, Mu Chao-Xu. Important scientific problems of multi-agent deep reinforcement learning. Acta Automatica Sinica, 2020, 46(7): 1301-1312
    [66] Sunehag P, Lever G, Gruslys A, Czarnecki W M, Zambaldi V, Jaderberg M, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th International Conference on Autonomous Agents and Multi-agent Systems. Stockholm, Sweden: 2018. 2085−2087
    [67] Rashid T, Samvelyan M, De Witt C S, Farquhar G, Foerster J N, Whiteson S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: 2018. 4292−4301
    [68] Son K, Kim D, Kang W J, Hostallero D, Yi Y. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: 2019. 5887−5896
    [69] Foerster J N, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: 2018. 2974−2982
    [70] Nguyen D T, Kumar A, Lau H C. Credit assignment for collective multi-agent RL with global rewards. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: 2018. 8113−8124
    [71] Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 2018, 362(6419): 1140-1144 doi: 10.1126/science.aar6404
    [72] Yu Y. Towards sample efficient reinforcement learning. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: 2018. 5739−5743
    [73] Ecoffet A, Huizinga J, Lehman J, Stanley K O, Clune J. First return, then explore. Nature, 2021, 590(7847): 580-586 doi: 10.1038/s41586-020-03157-9
    [74] Jin C, Krishnamurthy A, Simchowitz M, Yu T C. Reward-free exploration for reinforcement learning. In: Proceedings of the 37th International Conference on Machine Learning. Virtual Event: 2020. 4870−4879
    [75] Mahajan A, Rashid T, Samvelyan M, Whiteson S. MAVEN: Multi-agent variational exploration. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: 2019. Article No. 684
    [76] Yang Y D, Wen Y, Chen L H, Wang J, Shao K, Mguni D, et al. Multi-agent determinantal Q-learning. In: Proce-edings of the 37th International Conference on Machine Learning. Virtual Event: 2020. 10757−10766
    [77] Wang T H, Dong H, Lesser V, Zhang C J. ROMA: Role-oriented multi-agent reinforcement learning. In: Proceedings of the 37th International Conference on Machine Learning. Virtual Event: 2020. 9876−9886
    [78] 张钹, 朱军, 苏航. 迈向第三代人工智能. 中国科学: 信息科学, 2020, 50(9): 1281-1302 doi: 10.1360/SSI-2020-0204

    Zhang Bo, Zhu Jun, Su Hang. Toward the third generation of artificial intelligence. Scientia Sinca Informationis, 2020, 50(9): 1281-1302 doi: 10.1360/SSI-2020-0204
    [79] 王保剑, 胡大裟, 蒋玉明. 改进A*算法在路径规划中的应用. 计算机工程与应用, 2021, 57(12): 243-247 doi: 10.3778/j.issn.1002-8331.2008-0099

    Wang Bao-Jian, Hu Da-Sha, Jiang Yu-Ming. Application of improved A* algorithm in path planning. Computer Engineering and Applications, 2021, 57(12): 243-247 doi: 10.3778/j.issn.1002-8331.2008-0099
    [80] 张可, 郝文宁, 史路荣, 余晓晗, 邵天浩. 基于级联模糊系统的兵棋进攻关键点推理. 控制工程, 2021, 28(7): 1366-1374

    Zhang Ke, Hao Wen-Ning, Shi Lu-Rong, Yu Xiao-Han, Shao Tian-Hao. Inference of key points of attack in wargame based on cascaded fuzzy system. Control Engineering of China, 2021, 28(7): 1366-1374
    [81] 邢思远, 倪晚成, 张海东, 闫科. 基于兵棋复盘数据的武器效用挖掘. 指挥与控制学报, 2020, 6(2): 132-140 doi: 10.3969/j.issn.2096-0204.2020.02.0132

    Xing Si-Yuan, Ni Wan-Cheng, Zhang Hai-Dong, Yan Ke. Mining of weapon utility based on the replay data of wargame. Journal of Command and Control, 2020, 6(2): 132-140 doi: 10.3969/j.issn.2096-0204.2020.02.0132
    [82] 金哲豪, 刘安东, 俞立. 基于GPR和深度强化学习的分层人机协作控制. 自动化学报, 2020, 46: 1-11

    Jin Zhe-Hao, Liu An-Dong, Yu Li. Hierarchical human-robot cooperative control based on GPR and DRL. Acta Automatica Sinica, 2020, 46: 1-11
    [83] 徐磊, 杨勇. 基于兵棋推演的分队战斗行动方案评估. 火力与指挥控制, 2021, 46(4): 88-92, 98 doi: 10.3969/j.issn.1002-0640.2021.04.016

    Xu Lei, Yang Yong. Research on evaluation of unit combat action plan based on wargaming. Fire Control & Command Control, 2021, 46(4): 88-92, 98 doi: 10.3969/j.issn.1002-0640.2021.04.016
    [84] 李云龙, 张艳伟, 王增臣. 联合作战方案推演评估技术框架. 指挥信息系统与技术, 2020, 11(4): 78-83

    Li Yun-Long, Zhang Yan-Wei, Wang Zeng-Chen. Technical framework of joint operation scheme deduction and evaluation. Command Information System and Technology, 2020, 11(4): 78-83
    [85] Myerson R B. Game Theory. Cambridge: Harvard University Press, 2013.
    [86] Weibull J W. Evolutionary Game Theory. Cambridge: MIT Press, 1997.
    [87] Roughgarden T. Algorithmic game theory. Communications of the ACM, 2010, 53(7): 78-86 doi: 10.1145/1785414.1785439
    [88] Chalkiadakis G, Elkind E, Wooldridge M. Cooperative game theory: Basic concepts and computational challenges. IEEE Intelligent Systems, 2012, 27(3): 86-90 doi: 10.1109/MIS.2012.47
    [89] 周雷, 尹奇跃, 黄凯奇. 人机对抗中的博弈学习方法. 计算机学报, DOI: 10.11897/SP.J.1016.2022.01859

    Zhou Lei, Yin Qi-Yue, Huang Kai-Qi. Game-theoretic learning in human-computer gaming. Chinese Journal of Computers, DOI: 10.11897/SP.J.1016.2022.01859
    [90] Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, et al. A unified game-theoretic approach to multi-agent reinforcement learning. In: Proceedings of the 31st Conference on Neural Information Processing Systems. Long Beach, USA: 2017. 4190−4203
    [91] Brown N, Lerer A, Gross S, Sandholm T. Deep counterfactual regret minimization. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: 2019. 793−802
    [92] Qiu X P, Sun T X, Xu Y G, Shao Y F, Dai N, Huang X J. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 2020, 63(10): 1872-1897 doi: 10.1007/s11431-020-1647-3
    [93] Zhang Z Y, Han X, Zhou H, Ke P, Gu Y X, Ye D M, et al. CPM: A large-scale generative Chinese Pre-trained language model. AI Open, 2021, 2: 93-99 doi: 10.1016/j.aiopen.2021.07.001
    [94] Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. In: Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2020.
    [95] Meng D Y, Zhao Q, Jiang L. A theoretical understanding of self-paced learning. Information Sciences, 2017, 414: 319-328 doi: 10.1016/j.ins.2017.05.043
    [96] Singh P, Verma V K, Mazumder P, Carin L, Rai P. Calibrating CNNs for lifelong learning. In: Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: 2020.
    [97] Cheng W, Yin Q Y, Zhang J G. Opponent strategy recognition in real time strategy game using deep feature fusion neural network. In: Proceedings of the 5th International Conference on Computer and Communication Systems. Shanghai, China: 2020. 134−137
    [98] Samvelyan M, Rashid T, De Witt C S, Farquhar G, Nardelli N, Rudner T G J, et al. The StarCraft multi-agent challenge. In: Proceedings of the 18th International Conference on Autonomous Agents and Multi-agent Systems. Montreal, Canada: 2019. 2186−2188
    [99] Tang Z T, Shao K, Zhu Y H, Li D, Zhao D B, Huang T W. A review of computational intelligence for StarCraft AI. In: Proceedings of the IEEE Symposium Series on Computational Intelligence. Bangalore, India: 2018. 1167−1173
    [100] Christianos F, Schäfer L, Albrecht S V. Shared experience actor-critic for multi-agent reinforcement learning. In: Proceedings of the Advances in Neural Information Processing Systems 33. Virtual Event: 2020.
    [101] Jaques N, Lazaridou A, Hughes E, Gulcehre C, Ortega P A, Strouse D J, et al. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: 2019. 3040−3049

    出版历程
    • 收稿日期:  2021-06-17
    • 录用日期:  2021-09-17
    • 网络出版日期:  2021-10-24
    • 刊出日期:  2023-05-20
