-
Abstract: Real-time safety assessment (RTSA) of dynamic systems plays a critical role in preventing serious losses from potential safety incidents. As the complexity of system functionality and operating environments increases, developing effective RTSA technologies faces greater challenges. In this paper, the conceptual definition of RTSA of dynamic systems is elucidated. A taxonomy framework is introduced based on two dimensions: the stationarity of the environment and the construction method of the assessment model, along with the corresponding problem descriptions. Existing RTSA technologies for dynamic systems are systematically reviewed, and deployment strategies for different practical systems are discussed. The developmental trends of current technologies are then analyzed, and the pressing issues and future development directions in RTSA are explored.
-
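In application scenarios such as rigid spacecraft consensus [1] and formation control of Euler-Lagrange systems [2], directly measuring and feeding back the system output variables is more convenient and reliable. For example, in a formation of multiple unmanned vehicles, measuring each vehicle's position and velocity directly with technologies such as the Global Positioning System (GPS) is simpler than estimating and controlling internal states [3]. Output consensus tracking control is therefore more practical in engineering applications of multi-agent systems (MAS).
Linear control methods occupy an important position in classical multi-agent control theory [4-5]; they simplify the control problem by linearizing a complex nonlinear system into multiple local linear systems [6-8]. However, the strong nonlinearity and dynamic characteristics of heterogeneous nonlinear multi-agent systems make these methods difficult to apply effectively. Specifically, linear control methods show considerable limitations when handling wide-range dynamic variations and strongly coupled nonlinearities. In multi-robot cooperative tasks, for example, a simplified model cannot accurately reflect the different dynamics of each robot, which degrades control accuracy and robustness.
Nonlinear control methods handle the nonlinear characteristics of the system directly, designing control strategies through theories such as the Lyapunov method [9-10] and feedback linearization [11-12]. Although in theory they overcome the shortcomings of linear methods, their application faces many difficulties: they require an accurate system model, and they are complex to design and implement. In heterogeneous multi-agent systems in particular, the need for coordination and real-time response among agents increases the computational load and implementation difficulty [13]. In addition, the stability and robustness of nonlinear control methods are challenged when dealing with high-dimensional systems and external disturbances.
Model-free adaptive dynamic programming (ADP), a data-driven control strategy [14], has attracted growing attention: by interacting with the environment and learning an optimal policy autonomously through a reward mechanism, it can control complex tasks without a system model. Jiang et al. [15] proposed a data-driven ADP method that uses input and output sequences as an equivalent representation of the underlying state, solving the optimal output consensus control problem for discrete-time linear multi-agent systems with partially observable states. For strict-feedback nonlinear multi-agent systems with partially unknown dynamics, [16] proposed, within output regulation theory, a method that combines measured data with neural networks and ADP to solve for the optimal output feedback controller. However, research on model-free output consensus control of heterogeneous nonlinear systems is still at an early stage.
Model-free learning control methods also have clear shortcomings. The training process of ADP is highly sensitive to parameter selection and reward design, which may lead to poor robustness and stability of the learned policy. Poor interpretability makes further tuning of the control policy difficult. When the system tracks a time-varying signal, ADP itself cannot predict future states, which makes it better suited to stabilization than to tracking control.
Hybrid control strategies exploit the complementary properties of different methods to solve the consensus control problem of heterogeneous nonlinear multi-agent systems [17]. Combining ADP with classical control theory makes it possible to introduce stability analysis on top of a data-driven design, improving the reliability of the control strategy [18]. However, hybrid strategies are difficult to design and implement: a balance must be found among the different methods to guarantee the stability and performance of the overall system.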
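Against this background, this work combines input-output feedback linearization theory with ADP and develops a model-free output consensus control method for heterogeneous nonlinear multi-agent systems, with the aims of simplifying distributed controller design, increasing controller interpretability, and reducing the sensitivity of learning to reward design. Specifically, by constructing a homeomorphic distributed two-layer control structure, the model-free output consensus control problem is decomposed into two subproblems. In the physical-space layer, a two-stage dual heuristic ADP method that can dynamically adjust the reward signal is proposed, using observed data to achieve model-free input-output feedback linearization of the nonlinear systems. In the homeomorphic linearized-space layer, a distributed consensus controller is designed on the basis of the linearized system to achieve output consensus of the controlled multi-agent system. The main innovations and contributions of this paper are as follows:
1) When existing distributed control methods handle output consensus of heterogeneous multi-agent systems [15-16], unknown models and nonlinear dynamics make the Riccati or Bellman equation difficult to solve. To address this, this paper proposes a homeomorphic distributed control protocol based on model-free feedback linearization, which achieves output consensus control without relying on an accurate model. Unlike conventional model-free distributed control methods, the hierarchical distributed protocol contains two layers of control policy. In the physical-space layer, an ADP algorithm is constructed to solve for the model-free feedback linearization controller, transforming the unknown nonlinear multi-agent system into a known linear system. Combined with the consensus control protocol in the homeomorphic-space layer, this linearized system can be pre-designed or redesigned according to the performance requirements of the cooperative task; when the control task changes, no relearning is needed, which reduces the difficulty of consensus policy design.
2) Removing the dependence of the physical-space feedback linearization controller on an accurate model is the key to implementing the hierarchical distributed method. This paper designs a model-free ADP algorithm based on two-stage iterative learning. The algorithm introduces a goal dependence into value-function learning, so the reward signal can be adjusted dynamically to suit heterogeneous agents without designing separate reward signals, while a dual heuristic critic network enables fast updates of the linearizing control policy.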
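1. Graph Theory and Problem Formulation
This section first describes the relevant graph-theoretic concepts and then, for the output consensus problem of heterogeneous nonlinear multi-agent systems, analyzes the difficulty of solving it and the issues that arise.
1.1 Graph Theory
Consider a directed graph $ {\cal{G}}({\cal{K}},\;\Gamma ,\;{\cal{A}}) $ consisting of a leader and $ N $ follower nodes, where $ {\cal{K}} = \left\{ {{\kappa _1},\;{\kappa _2},\; \cdots ,\;{\kappa _N}} \right\} $ is a nonempty finite node set, $ \Gamma \subseteq {\cal{K}} \times {\cal{K}} $ is the set of directed edges, and $ {\cal{A}} = \left[ {{a_{ij}}} \right] \in {{\bf{R}} ^{N \times N}} $ is the associated adjacency matrix. $ {a_{ij}} = 1 $ indicates a directed edge from node $ j $ to node $ i $, that is, $ ({\kappa _j},\;{\kappa _i}) \in \Gamma $; otherwise $a_{{ij}} =0$. Let the pinning gain ${{b}_{i}}\ge 0$ be nonzero only for nodes directly connected to the leader, and let ${\cal{B}} = {\mathrm{diag}}\left\{ {b}_{1},\;{b}_{2},\;\cdots,\;{b}_{N} \right\}$. Denote the set of neighbors with a directed edge into node ${{\kappa }_{i}}$ by ${{\aleph }_{i}} = \{ {{\kappa }_{j}}:({{\kappa }_{j}},\;{{\kappa }_{i}})\in \Gamma \}$, and define the in-degree matrix ${\cal{D}} = {\mathrm{diag}}\{ \sum\nolimits_{j\in {{\aleph }_{i}}} {{{a}_{ij}}} \}$, $i = 1,\;2,\;\cdots ,\;N$. The Laplacian matrix of the directed graph $ {\cal{G}} $ is then $ {\cal{L}} = {\cal{D}}-{\cal{A}} $.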
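The definitions above can be illustrated with a minimal sketch in plain Python. The function name `graph_matrices`, the 0-based indexing, and the three-follower chain topology are assumptions made only for illustration; $ {\cal{B}} $ is taken here as the diagonal matrix of pinning gains $ b_i $.

```python
def graph_matrices(n, edges, pinned):
    """Build A, D, L = D - A and B for a directed graph with n followers.

    edges  : iterable of (j, i) pairs, a directed edge from node j to node i
             (0-based indices), which sets a[i][j] = 1.
    pinned : dict {i: b_i} with the pinning gains of followers that are
             directly connected to the leader (all other gains are 0).
    """
    A = [[0.0] * n for _ in range(n)]
    for j, i in edges:
        A[i][j] = 1.0
    # In-degree matrix: row sums of A on the diagonal.
    D = [[sum(A[r]) if r == c else 0.0 for c in range(n)] for r in range(n)]
    L = [[D[r][c] - A[r][c] for c in range(n)] for r in range(n)]
    B = [[pinned.get(r, 0.0) if r == c else 0.0 for c in range(n)]
         for r in range(n)]
    return A, D, L, B


# Hypothetical topology: leader -> follower 0 -> follower 1 -> follower 2.
A, D, L, B = graph_matrices(3, edges=[(0, 1), (1, 2)], pinned={0: 1.0})
```

Each row of `L` sums to zero, which is the usual sanity check for a Laplacian built this way.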
1.2 Problem Formulation
Consider $ N $ heterogeneous affine nonlinear agents distributed over the directed graph $ {\cal{G}} $. The system dynamics can be described as
$$ \begin{cases} {x}_{i,\;k+1} = f_i(x_{i,\;k}) + g_i(x_{i,\;k})u_{i,\;k} \\ y_{i,\;k} = h_i(x_{i,\;k}) \end{cases} \quad $$ (1)
where $ i \in {\cal{N}} $, $ {\cal{N}} = \{1,\;2,\;\cdots,\;N\} $, $ {{x}_{i,\;k}}\in {{{\bf{R}} }^{n}} $ is the state vector, and $ {{u}_{i,\;k}}\in {{\bf{R} }^{{{m}}}} $ is the control input. The smooth vector fields $ {{f}_{i}}({{x}_{i,\;k}})\in {{\bf{R}}^{n}} $ and $ {{g}_{i}}({{x}_{i,\;k}})\in {{\bf{R} }^{n\times m}} $ denote the unknown drift dynamics and input matrix, and ${{h}_{i}}({{x}_{i,\;k}})\in {{\bf{R}}} $ is the output function. All three are Lipschitz continuous and bounded on $ {{\bf{R}}^{n}} $, and $ f_i(0) = 0 $.
Assumption 1. The relative degree of each agent is $ \rho_i = n $.
Assumption 2. For every $ i \in {\cal{N}} $, there exists a $ j \in {\cal{N}} $ with $ j \ne i $ such that $ {f_i}({x_{i,\;k}}) \ne {f_j}({x_{i,\;k}}) $, and there exists an $ l \in {\cal{N}} $ with $ l \ne i $ such that $ {g_i}({x_{i,\;k}}) \ne {g_l}({x_{i,\;k}}) $.
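In the tracking synchronization problem, distributed control inputs $ {{u}_{i,\;k}} $ must be designed so that the outputs of all nodes synchronize with the output $ {{y}_{r}} $ of the leader node. The leader node may be a desired-trajectory generator, the result of an intelligent decision process, or a human-demonstrated trajectory; it represents the required desired trajectory. The leader dynamics are
$$ \begin{cases} {x}_{r,\;k+1} = f_r(x_{r,\;k}) \\ y_{r,\;k} = h_r(x_{r,\;k}) \end{cases} \quad $$ (2)
where $ {{x}_{r,\;k}}\in {{\bf{R} }^{n}} $. The functions $ {{f}_{r}}(\cdot ) $ and $ {{h}_{r}}(\cdot ) $ are assumed to be of class $C^\infty $. The output $ {{y}_{r,\;k}} $ is the desired performance output that the followers are required to track. All agent states are assumed to be measurable; alternatively, when the system is observable from its output, an observer can be added.
To solve the output consensus tracking problem for the agents (1) and the desired trajectory (2), define the tracking error between agent $ i $ and the desired trajectory as $ {{e}_{p,\;i,\;k}} = {{y}_{i,\;k}}-{{y}_{r,\;k}} $. The cooperative local neighborhood tracking error can then be expressed as
$$ {\cal{E}}_{i,\;k} = \sum\limits_{j \in \aleph_i} a_{ij} (y_{i,\;k} - y_{j,\;k}) + b_i e_{p,\;i,\;k} \quad $$ (3)
Assumption 3. The directed graph $ {\cal{G}} $ contains a spanning tree, and the pinning gain $ {{b}_{i}} $ of at least one root node is nonzero, meaning that at least one agent communicates directly with the leader.
From Eq. (3), the global neighborhood error vector of the directed graph $ {\cal{G}} $ is
$$ E = \left[ ({\cal{L}} + {\cal{B}}) \otimes I_\rho \right] (Y - Y_r) \equiv \left[ ({\cal{L}} + {\cal{B}}) \otimes I_\rho \right] \delta \quad $$ (4)
where $ Y = {{\left[ {{y}_{1,\;k}},\;{{y}_{2,\;k}},\;\cdots ,\;{{y}_{N,\;k}} \right]}^{{\mathrm{T}}}}\in {{\bf{R} }^{N}} $ is the global output vector, $ {{Y}_{r}} = {{1}_{N}}\otimes {{y}_{r}} $, $ {{1}_{N}} $ is the $ N $-dimensional vector of ones, $ \otimes $ denotes the Kronecker product, and $ E = [ {{{\cal{E}}}_{1}},\;{{{\cal{E}}}_{2}},\;\cdots ,\; {{{\cal{E}}}_{N}} ]^{{\mathrm{T}}}\in {{\bf{R} }^{N}} $. $ \delta $ is the global tracking error vector; because it is a global quantity, it cannot be computed locally at each node.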
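The relationship between the local error (3) and the global form (4) can be checked numerically. The sketch below assumes scalar outputs (so the identity block in Eq. (4) reduces to 1) and uses hypothetical function names and sample values; it is only meant to show that stacking the local errors reproduces $ ({\cal{L}} + {\cal{B}})(Y - Y_r) $.

```python
def local_errors(A, b, y, y_r):
    """Eq. (3) for scalar outputs: E_i = sum_j a_ij (y_i - y_j) + b_i (y_i - y_r)."""
    n = len(y)
    return [sum(A[i][j] * (y[i] - y[j]) for j in range(n)) + b[i] * (y[i] - y_r)
            for i in range(n)]


def global_errors(A, b, y, y_r):
    """Eq. (4) for scalar outputs: E = (L + B)(Y - 1_N * y_r), with L = D - A."""
    n = len(y)
    delta = [yi - y_r for yi in y]
    LB = [[(sum(A[i]) + b[i] if i == j else 0.0) - A[i][j] for j in range(n)]
          for i in range(n)]
    return [sum(LB[i][j] * delta[j] for j in range(n)) for i in range(n)]


# Hypothetical chain: leader -> 0 -> 1 -> 2, with sample outputs.
A = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [1.0, 0.0, 0.0]
y, y_r = [1.0, 2.0, 4.0], 0.5
E_local = local_errors(A, b, y, y_r)
E_global = global_errors(A, b, y, y_r)
```

Only `local_errors` is available to an individual agent, since it needs nothing beyond the neighbors' outputs and, for pinned agents, the leader output.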
To obtain a fully distributed control structure, this paper uses the local neighborhood tracking error in Eq. (3) to solve the output synchronization problem. Combining Eqs. (1) and (3) gives the local tracking error dynamics of agent $ i $:
$$ \begin{split} {{\cal{E}}_{i,\;k + 1}}=\; & \mathop \sum \limits_{j \in \aleph_i} {a_{ij}}{h_i}\left[ {{f_i}({x_{i,\;k}}) + {g_i}({x_{i,\;k}}){u_{i,\;k}}} \right]-\\ & \mathop \sum \limits_{j \in \aleph_i} {a_{ij}}{h_j}\left[ {{f_j}({x_{j,\;k}}) + {g_j}({x_{j,\;k}}){u_{j,\;k}}} \right]+\\ & {b_i}\{ {h_i}\left[ {{f_i}({x_{i,\;k}}) + {g_i}({x_{i,\;k}}){u_{i,\;k}}} \right] -\\ &{h_r}\left[ {{f_r}({x_{r,\;k}})} \right] \} \end{split} $$ (5)
For the error dynamics (5), which contain complex nonlinear terms, classical control theory is often hampered by the difficulty of solving the Riccati equation when addressing output consensus control. When the nonlinear dynamics are unknown and heterogeneous, solving for the output consensus controller becomes extremely complicated.
Input-output feedback linearization uses a diffeomorphism to transform the dynamic relationship between the output $ {{y}_{i,\;k}} $ and the input $ {{u}_{i,\;k}} $ of a nonlinear system into a linear one, thereby achieving exact linearization of the nonlinear system. The model-based feedback linearization controller takes the form
$$ \begin{split} {u_{i,\;k}} =\;& \frac{{ - L_{{f_i}}^\rho {h_i}({x_{i,\;k}})}}{{{L_{{g_i}}}L_{{f_i}}^{\rho - 1}{h_i}({x_{i,\;k}})}} + \frac{{{v_{i,\;k}}}}{{{L_{{g_i}}}L_{{f_i}}^{\rho - 1}{h_i}({x_{i,\;k}})}}=\\ & {\beta _i}({x_{i,\;k}}) + {\alpha _i}({x_{i,\;k}}){v_{i,\;k}} \end{split} $$ (6)
where $ L $ denotes the Lie derivative operator, $ {u_{i,\;k}} $ is the actual control input, and $ {v_{i,\;k}} $ is a virtual input that serves in this paper as the input of the distributed controller. After exact feedback linearization, the nonlinear terms of the system are cancelled and we obtain
$$ \begin{aligned} y_{i,\;k}^{(\rho )} = {v_{i,\;k}} \end{aligned} $$ (7)
The dynamics of the nonlinear agents, projected into the homeomorphic linear space by the diffeomorphism $ \Phi ({{x}_{i,\;k}}) $, then become
$$ \left\{\begin{split} &{\xi _{i,\;k + 1}} = A{\xi _{i,\;k}} + B{v_{i,\;k}}\\& {y_{i,\;k}} = C{\xi _{i,\;k}} \end{split}\right. $$ (8)
where $ A = \left[ {\begin{array}{*{20}{c}} {{0_{(n - 1) \times 1}}}&{{I_{n - 1}}}\\ 0&{{0_{1 \times (n - 1)}}} \end{array}} \right] $, $ B = \left[ {\begin{array}{*{20}{c}} {{0_{(n - 1) \times 1}}}\\ I \end{array}} \right] $, and $ C = \left[ {\begin{array}{*{20}{c}} I&{{0_{1 \times (n - 1)}}} \end{array}} \right] $. Each agent is thus mapped to a linearized system of known structure.
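As an illustrative check of the target system (8), the following sketch builds the chain-of-delays matrices for a scalar virtual input and scalar output (an assumption under which the block $ I $ in $ B $ and $ C $ reduces to 1) and simulates them from a zero initial state. The helper names `brunovsky` and `simulate` are hypothetical.

```python
def brunovsky(n):
    """Return (A, B, C) of Eq. (8) for scalar input/output and state dimension n."""
    A = [[1.0 if c == r + 1 else 0.0 for c in range(n)] for r in range(n)]
    B = [0.0] * (n - 1) + [1.0]
    C = [1.0] + [0.0] * (n - 1)
    return A, B, C


def simulate(A, B, C, xi0, v_seq):
    """Iterate xi_{k+1} = A xi_k + B v_k and record y_k = C xi_k at every step."""
    xi, ys = list(xi0), []
    for v in v_seq:
        ys.append(sum(c * x for c, x in zip(C, xi)))
        xi = [sum(a * x for a, x in zip(row, xi)) + b * v for row, b in zip(A, B)]
    return ys


n = 3
A, B, C = brunovsky(n)
v_seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = simulate(A, B, C, xi0=[0.0] * n, v_seq=v_seq)
```

With a zero initial state the output is the virtual input delayed by $ n $ steps, $ y_{k+n} = v_k $, which is the discrete-time reading of Eq. (7) when $ \rho = n $.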
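However, when the original system model is unknown, solving exactly for $ {\alpha _i}({x_{i,\;k}}) $ and $ {\beta _i}({x_{i,\;k}}) $ becomes extremely difficult, and inexact feedback linearization degrades the performance of the distributed controller. The core of the proposed control strategy is a model-free ADP method that, without relying on an accurate model, achieves accurate linearization of the nonlinear multi-agent system, so that the dynamic behavior of every agent approximates the same desired linear system dynamics. Classical linear control theory can then be used to design the distributed controller and achieve output consensus of the overall system.
2. Homeomorphic Distributed Control Protocol
To solve the output consensus control problem with unknown models, this paper proposes a homeomorphic distributed control protocol (see Fig. 1). Model-free ADP is used to achieve input-output feedback linearization, transforming the heterogeneous nonlinear multi-agent system into a homogeneous linear system and thereby simplifying distributed controller design. In the physical space, ADP is used to design the input-output feedback linearization controller, which maps the closed-loop dynamics of each agent through a diffeomorphism onto the desired linear system so that the output response matches it. In the homeomorphic space, the distributed consensus controller is designed on the basis of the desired linear system. Through linearization in the physical space and cooperation in the homeomorphic space, control performance optimization is combined with distributed decision design to achieve output consensus of heterogeneous nonlinear agents.
2.1 Model-Free Input-Output Feedback Linearization
To approximately solve for the unknown feedback linearization controller (6), a performance index must first be designed to guide its learning. Before the input-output linearization is complete, Eq. (7) does not yet hold exactly, and the following differential state error exists:
$$ {\bar e_{i,\;k}} = {v_{i,\;k}} - y_{i,\;k}^{(\rho )} $$ (9)
The goal of ADP is to adjust the controller so that $ {\bar e_{i,\;k}} $ is minimized, at which point the system is feedback linearized. To obtain $ y_{i,\;k}^{(\rho )} $, Eq. (8) is adopted as the target linear system, and a Luenberger state observer is constructed to reconstruct the linearized state of the controlled plant:
$$ \left\{\begin{split} &{{{{\hat \xi}}}_{i,\;k+1}} = A{{{\hat \xi}}_{i,\;k}}+B{{v}_{i,\;k}} +H({{{{y}}}_{i,\;k}} - {{{\hat{y}}}_{i,\;k}})\\& {{{\hat{y}}}_{i,\;k}} = C{{{\hat \xi}}_{i,\;k}}\; \end{split}\right. $$ (10)
where $ {{v}_{i,\;k}} $ is the distributed control input and $ H $ is the observer (filter) gain. The observation error dynamics can be expressed as
$$ \begin{split} {{{e}}_{m,\;i,\;k+1}} = \; &\frac{\partial \Phi ({{x}_{i,\;k}})}{\partial {{x}_{i,\;k}}}\{ {{f}_{i}}({{x}_{i,\;k}})+{{g}_{i}}({{x}_{i,\;k}})[ {{\beta }_{i}}({{x}_{i,\;k}})\;+\\ &{{\alpha }_{i}}({{x}_{i,\;k}}){{v}_{i,\;k}} ] \} -A{\hat\xi_{i,\;k} } -B{{v}_{i,\;k}}\;-\\ &H\left( {{y}_{i,\;k}}-{{{\hat{y}}}_{i,\;k}} \right) \end{split} $$ (11)
Remark 1. Before an agent has been linearized, the controlled agent and the target linear system are heterogeneous, so the state error $ \bar e_{i,\;k} $ cannot converge asymptotically. Only when $ {\alpha _i}({x_{i,\;k}}) = { {1 \over {{L_{{g_i}}}L_{{f_i}}^{\rho - 1}{h_i}({x_{i,\;k}})}}} $ and $ {\beta _i}({x_{i,\;k}}) = { {- L_{{f_i}}^\rho {h_i}({x_{i,\;k}}) \over {{L_{{g_i}}}L_{{f_i}}^{\rho - 1}{h_i}({x_{i,\;k}})}}} $ hold does $ \lim_{k \to \infty} \bar e_{i,\;k} = 0 $, and the controlled system is linearized into the target linear system (8).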
Since the two groups of Lie derivatives in $ {{\alpha }_{i}}(\cdot) $ and $ {{\beta }_{i}}(\cdot) $ are polynomials in $ {{x}_{i,\;k}} $, the elements of $ {{x}_{i,\;k}} $ and related expressions are used as basis vectors, and two polynomials are designed to approximate the unknown feedback linearization controller $ {{u}_{i}} = {{\beta }_{i}}({{x}_{i,\;k}})+{{\alpha }_{i}}({{x}_{i,\;k}}){{v}_{i}} $:
$$ {{\hat{\alpha }}_{i}}({{x}_{i,\;k}}) = W_{{{\alpha }_{i}}}^{{\mathrm{T}}}\omega ({{x}_{i,\;k}})\; $$ (12)
$$ {{\hat{\beta }}_{i}}({{x}_{i,\;k}}) = W_{{{\beta }_{i}}}^{{\mathrm{T}}}\omega ({{x}_{i,\;k}})$$ (13)
where $ W_{{{\alpha }_{i}}}^{{\mathrm{T}}} $ and $ W_{{{\beta }_{i}}}^{{\mathrm{T}}} $ are the polynomial weights and $ \omega (\cdot ) $ is the basis vector formed from $ {{x}_{i,\;k}} $ and its polynomial combinations. Next, a data-driven ADP algorithm is used to learn the optimal approximations of $ {{\alpha }_{i}}(\cdot) $ and $ {{\beta }_{i}}(\cdot) $.
Because $ {{\alpha }_{i}}({{x}_{i,\;k}}) $ and $ {{\beta }_{i}}({{x}_{i,\;k}}) $ act on the same control channel, a change in one network affects the learning space of the other. The learning of both therefore takes place in a nonstationary space, solving the Bellman equation becomes a nonconvex optimization problem, and learning is easily trapped in a local optimum.
To avoid this coupling of the nonlinear terms, historical sampled input-output data are combined with a finite-difference (limit-difference) method to reconstruct the reciprocal of the observed value of $ {{\alpha }_{i}}(\cdot ) $:
$$ {{L}_{{{g}_{i}}}}L_{{{f}_{i}}}^{\rho -1}{{h}_{i}}({{x}_{i,\;k}}) = \frac{1}{{{\alpha }_{i}}({{x}_{i,\;k}})} = \frac{\partial y_{i,\;k}^{(\rho )}}{\partial {{u}_{i,\;k}}} $$ (14)
Training network (12) by supervised learning gives $ {{\hat{\alpha }}_{i}}({{x}_{i,\;k}}) = {{\alpha }_{i}}({{x}_{i,\;k}})+{{d}_{i,\; \alpha}} $, and Eq. (11) can be written as
$$ \left\{ \begin{aligned} {{{{e}}}_{m,\;i,\;k+1}}(1) & = {{e}_{m,\;i,\;k}}(2)-{{H}_{1}}\left( {{y}_{i,\;k}}-{{{\hat{y}}}_{i,\;k}} \right) \\ {{{{e}}}_{m,\;i,\;k+1}}(2) & = {{e}_{m,\;i,\;k}}(3)-{{H}_{2}}\left( {{y}_{i,\;k}}-{{{\hat{y}}}_{i,\;k}} \right) \\ & \qquad\qquad \vdots \\ {{{{e}}}_{m,\;i,\;k+1}}(\rho ) & = {\beta _i}({x_{i,\;k}}) + {\hat \beta _i}({x_{i,\;k}})\;-\\ &\;\;\;\;{{H}_{\rho }}\left( {{y}_{i,\;k}}-{{{\hat{y}}}_{i,\;k}} \right)+ {{\sigma }_{i,\;k}} \end{aligned} \right. $$ (15)
where $ {{\sigma }_{i,\;k}} = {{d}_{i,\; \beta}} + {{d}_{i,\;\alpha}}{{v}_{i,\;k}} $ and $ {d_{i,\,\beta }} = {\hat \beta _i}({x_{i,\,k}}) - {\beta _i}({x_{i,\,k}}) $. In theory a polynomial can approximate a smooth curve arbitrarily closely, so $ {{\sigma }_{i,\;k}} $ satisfies $ \| {{\sigma }_{i,\;k}} \|\le \sigma^m<{{\varepsilon }_{d}} $ and $ \| {{\sigma}_{i,\;k}}-{{\sigma}_{i,\;k-1}} \|\le \Delta \sigma^m $, where $ \sigma^m,\;\Delta \sigma^m\in {{{\bf{R}}}^{+}} $ are unknown and $ {{\varepsilon }_{d}} $ is a very small value. On this basis, the distributed feedback linearization controller learning problem becomes a model-reference tracking control problem: the state error $ {{\bar e}_{i,\;k}} $ is used as the reinforcement signal to optimize the output of network $ {{\hat{\beta }}_{i}}(\cdot ) $ and cancel the nonlinear dynamics, so that the observation error dynamics (15) converge quickly and the system is linearized at the same time.
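The idea behind Eq. (14) can be sketched on a toy problem. The example below is not the paper's procedure: it assumes a hypothetical scalar plant with relative degree 1 and output $ y = x $, and it probes a simulator with two inputs rather than using historical samples. Because $ y_{k+1} $ is affine in $ u_k $ for the affine system (1), the difference quotient recovers the input gain, and its reciprocal is the estimate of $ \alpha $.

```python
def toy_step(x, u):
    """Hypothetical scalar plant: y_{k+1} = f(x) + g(x) u with y = x."""
    f = 0.8 * x - 0.1 * x ** 3
    g = 1.0 + x ** 2
    return f + g * u


def estimate_alpha(step, x, u, du=1e-3):
    """Finite-difference estimate of alpha(x) = 1 / (d y_{k+1} / d u_k).

    `step` is any callable returning the next output for a given (x, u);
    the slope must be nonzero (well-defined relative degree).
    """
    slope = (step(x, u + du) - step(x, u)) / du
    return 1.0 / slope


alpha_hat = estimate_alpha(toy_step, x=1.0, u=0.0)  # true value is 1 / g(1) = 0.5
```

Pairs `(x, alpha_hat)` collected this way could serve as supervised targets for the polynomial network (12); with noisy data a least-squares slope over several samples would be a more robust choice than a two-point difference.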
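It is worth noting that conventional heuristic dynamic programming usually needs error-action pair information when solving for the optimal tracking policy. A feedback linearization controller, by contrast, cancels the nonlinear characteristics of the system so that a linear controller performs better; it affects the tracking error indirectly rather than reducing it directly through error feedback. Therefore, in model-free learning of the feedback linearization controller, the action network and the value function should not depend on the error. To guide the optimization direction effectively and avoid local optima, an index of the degree of feedback linearization must be incorporated into value-function optimization as the long-term objective. However, because model information is missing, it is difficult to design in advance a reward signal that correctly guides the learning of feedback linearization.
To this end, this paper defines a feedback linearization reward as an index of the degree of linearization of each agent:
$$ \begin{aligned} {C_{i,\;k}} = \left\{ {\begin{aligned} &{0,\;}&&{{{\left\| {{{\bar e}_{i,\;k}}} \right\|}_1} + {{\left\| {{{\bar e}_{i,\;k}} - {{\bar e}_{i,\;k - 1}}} \right\|}_1} < {\varepsilon _i}}\\ &{1,\;}&&{{{\left\| {{{\bar e}_{i,\;k}}} \right\|}_1} + {{\left\| {{{\bar e}_{i,\;k}} - {{\bar e}_{i,\;k - 1}}} \right\|}_1} \ge {\varepsilon _i}} \end{aligned}} \right.\; \end{aligned}$$ (16)
To guide the learning direction correctly, a reward network $ {{\hat R}_{i,\;k}} $ is also designed:
$$ \hat{R}_{i,\;k}^{l} = W_{{{r}_{i}}}^{l\;{\mathrm{T}}}\omega ({{X}_{i,\;k}}) $$ (17)
This network adjusts the reward value dynamically during learning, so reward signals need not be designed separately for different heterogeneous agents.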
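The indicator in Eq. (16) translates directly into code. The sketch below is a plain reading of that equation; the function name and the sample values are illustrative assumptions.

```python
def linearization_reward(e_bar, e_bar_prev, eps):
    """Eq. (16): return 0 when the 1-norm of the linearization error plus the
    1-norm of its one-step change is below eps, and 1 otherwise."""
    level = sum(abs(e) for e in e_bar)
    change = sum(abs(a - b) for a, b in zip(e_bar, e_bar_prev))
    return 0 if level + change < eps else 1


c_small = linearization_reward([0.01], [0.012], eps=0.05)  # small, steady error
c_large = linearization_reward([0.2], [0.1], eps=0.05)     # large or changing error
```

Because the cost is binary and depends only on the linearization error, the same definition can be reused for every agent; only the threshold $ \varepsilon_i $ is agent-specific.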
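To adjust the reward signal and solve for the feedback linearization controller at the same time, a dual heuristic critic network is designed to approximate both the optimal value function and a heuristic function. The heuristic function is used to estimate quickly the direction and magnitude of the value-function gradient and thus optimize the control policy. A two-stage dual heuristic ADP problem is built among the reward network, the critic network, and the action network; through two-stage cyclic iteration, the high-dimensional reward information, value function, heuristic function, and optimal policy are approximated simultaneously. As shown in Fig. 2, each iteration of the two-stage dual heuristic ADP method has two stages. In the reward evaluation stage, the reward network and the dual critic network are optimized iteratively according to the feedback linearization reward. In the action evaluation stage, the heuristic network obtained in the previous stage estimates the value-function gradient directly, and the action network is then updated quickly to improve controller performance. The implementation is detailed below.
First, the cumulative discounted reward value function is given by
$$ {{J}_{i,\;k}} = \sum\limits_{\delta = 0}^{\infty }{\gamma _{{{J}_{i}}}^{\delta }{{C}_{i,\;k+\delta }}} $$ (18)
where $ {{\gamma }_{{{J}_{i}}}}\in \left( 0,\;1 \right) $ is a discount factor. A dual heuristic critic network structure is defined to approximate simultaneously the optimal value function $ J_{i}^{*}(\cdot ) $ and an optimal heuristic function $ \lambda _{i}^{*}(\cdot ) $:
$$ \left[ \begin{matrix} \hat{J}_{i,\;k}^{l} \\ \hat{\lambda }_{i,\;k}^{l} \\ \end{matrix} \right] = \left[ \begin{matrix} W_{{{J}_{i}}}^{l\;{\mathrm{T}}} \\ W_{{{\lambda }_{i}}}^{l\;{\mathrm{T}}} \\ \end{matrix} \right]\omega \left( {{X}_{i,\;k}},\;R_{i,\;k}^{l} \right) $$ (19)
where $ \hat{J}_{i,\;k}^{l} $ and $ \hat{\lambda }_{i,\;k}^{l} $ denote the estimates of $ {{J}_{i,\;k}} $ and $ {{\lambda }_{i,\;k}} $ after $ l $ iterations, and $ {{\lambda }_{i,\;k}} $ is the vector of partial derivatives of the value function $ {{J}_{i,\;k}} $ with respect to the elements of $ {{X}_{i,\;k}} $.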
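The value function (18) satisfies the recursion $ J_{i,k} = C_{i,k} + \gamma_{J_i} J_{i,k+1} $, which is what the critic is trained to reproduce. A minimal sketch, assuming a finite hypothetical cost sequence that is zero beyond its last entry:

```python
def cost_to_go(costs, gamma):
    """Discounted cost-to-go of Eq. (18) for a finite cost sequence,
    computed backwards with J_k = C_k + gamma * J_{k+1} and J = 0 past the end."""
    J, out = 0.0, []
    for c in reversed(costs):
        J = c + gamma * J
        out.append(J)
    return out[::-1]


costs = [1, 1, 0, 0]  # hypothetical C_{i,k}: linearization reached at step 2
gamma = 0.5
J = cost_to_go(costs, gamma)

# One-step temporal-difference residual with the true costs as rewards;
# it vanishes when J is the exact cost-to-go.
td = [costs[k - 1] + gamma * J[k] - J[k - 1] for k in range(1, len(J))]
```

In the learning scheme the exact `J` is unknown, so the critic weights are adjusted to drive a residual of this form toward zero from sampled data.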
Learning is off-policy, using the data at steps $ k $ and $ k-1 $ to update the network weights. According to the Bellman principle, define $ {{e}_{c,\;i,\;k}} $ as the estimation error of the dual critic network:
$$ {{e}_{c,\;i,\;k}} = {{\mu }_{j}}\frac{e_{J,\;i,\;k}^{2}}{2}+{{\mu }_{\lambda }}\frac{e_{\lambda ,\;i,\;k}^{2}}{2} $$ (20)
where $ {{e}_{J,\;i,\;k}} = {{\hat{R}}_{i,\;k-1}}+{{\gamma }_{{{J}_{i}}}}{{\hat{J}}_{i,\;k}}-{{\hat{J}}_{i,\;k-1}} $; $ {{e}_{\lambda ,\;i,\;k}} = \frac{\partial {{{\hat{R}}}_{i,\;k-1}}}{\partial {{X}_{i,\;k-1}}}+{{\gamma }_{{{J}_{i}}}}{{\hat{\lambda }}_{i,\;k}}{{\Xi }_{i,\;k}}-{{\hat{\lambda }}_{i,\;k-1}} $; $ {{\mu }_{j}}\in \left( 0,\;1 \right] $ and $ {{\mu }_{\lambda }}\in \left( 0,\;1 \right] $ are learning step sizes; and $ {{\Xi }_{i,\;k}} = \frac{\partial {{X}_{i,\;k}}}{\partial {{X}_{i,\;k-1}}} $ is the Jacobian matrix of the augmented state. Following gradient descent, the dual critic network is updated by
$$ \begin{split} &{\begin{bmatrix} W_{{{J}_{i}}}^{l+1} \\ W_{{{\lambda }_{i}}}^{l+1} \end{bmatrix}}^{{\mathrm{T}}} = {\begin{bmatrix} W_{J_i}^{l} \\ W_{\lambda_i }^{l} \end{bmatrix}}^{{\mathrm{T}}} - \\&\qquad{{\eta }_{c}} {\begin{bmatrix} {{\mu }_{j}}\dfrac{\partial {{e}_{J,\;i,\;k}}}{\partial \hat{J}_{i,\;k}^{l}}\dfrac{\partial \hat{J}_{i,\;k}^{l}}{\partial W_{J}^{l}}{{e}_{J,\;i,\;k}} \\ {{\mu }_{\lambda }}\dfrac{\partial {{e}_{\lambda ,\;i,\;k}}}{\partial \hat{\lambda }_{i,\;k}^{l}}\dfrac{\partial \hat{\lambda }_{i,\;k}^{l}}{\partial W_{\lambda }^{l}}{{e}_{\lambda ,\;i,\;k}} \end{bmatrix}}^{{\mathrm{T}}} = {\begin{bmatrix} W_{J_i}^{l} \\ W_{\lambda_i }^{l} \end{bmatrix}}^{{\mathrm{T}}} - \\ &\qquad{{\eta }_{c}} {\begin{bmatrix} {{\mu }_{j}}{{\gamma }_{{{J}_{i}}}}\omega \left( {{X}_{i,\;k}},\;R_{i,\;k}^{l} \right){{e}_{J,\;i,\;k}} \\ {{\mu }_{\lambda }}{{\gamma }_{{{J}_{i}}}}\omega \left( {{X}_{i,\;k}},\;R_{i,\;k}^{l} \right){{\left( {{\Xi }_{i,\;k}}{{e}_{\lambda ,\;i,\;k}} \right)}^{{\mathrm{T}}} } \end{bmatrix}}^{{\mathrm{T}}} \end{split} $$ (21)
where $ {{\eta }_{c}} $ is the weight update step size of the critic network.
Define $ {{e}_{R,\;i,\;k}} $ as the estimation error of the reward network:
$$ {{e}_{R,\;i,\;k}} = {{C}_{i,\;k-1}}-\hat{R}_{i,\;k}^{l} = {{C}_{i,\;k-1}}-\left( \hat{J}_{i,\;k-1}^{l}-{{\gamma }_{{{J}_{i}}}}\hat{J}_{i,\;k}^{l} \right) $$ (22)
The reward network is updated by
$$ \begin{split} &W_{{{r}_{i}}}^{l+1}= W_{{{r}_{i}}}^{l}-{{\eta }_{r}}\frac{\partial {{e}_{R,\;i,\;k}}}{\partial \hat{J}_{i,\;k}^{l}}\frac{\partial \hat{J}_{i,\;k}^{l}}{\partial \hat{R}_{i,\;k}^{l}}\frac{\partial \hat{R}_{i,\;k}^{l}}{\partial W_{{{r}_{i}}}^{l}}{{e}_{R,\;i,\;k}} =\\ &\;\;\;\; W_{{{r}_{i}}}^{l}-{{\eta }_{r}}{{e}_{R,\;i,\;k}}{{\gamma }_{{{J}_{i}}}}W_{{{J}_{i}}}^{l\;{\mathrm{T}}}{\omega }'\left( {{X}_{i,\;k}},\;R_{i,\;k}^{l} \right)\omega \left( {{X}_{i,\;k}} \right) \end{split} $$ (23)
where $ {{\eta }_{r}} $ is the weight update step size of the reward network.
Based on the heuristic network, the error function of the action network can be defined as
$$ {{e}_{\beta ,\;i,\;k}} = \hat{\lambda }_{i,\;k}^{l} $$ (24)
The action network solves for the optimal action by minimizing the error function $ {{e}_{\beta ,\;i,\;k}} $, with the update rule
$$ \begin{split} &W_{{{\beta }_{i}}}^{(l+1)\;{\mathrm{T}}} = W_{{{\beta }_{i}}}^{l\;{\mathrm{T}}}-{{\eta }_{a}}\frac{\partial {{e}_{\beta ,\;i,\;k}}}{\partial {{X}_{i,\;k}}}\frac{\partial {{X}_{i,\;k}}}{\partial \hat{\beta }_{i,\;k}^{l}}\frac{\partial \hat{\beta }_{i,\;k}^{l}}{\partial W_{{{\beta }_{i}}}^{l\;{\mathrm{T}}}}\hat{\lambda }_{i,\;k}^{l}= \\ &\;\;\;\;\;\;\;\;\;\; W_{{{\beta }_{i}}}^{l\;{\mathrm{T}}}-{{\eta }_{a}}{{{\hat{\lambda }}}^{l\;{\mathrm{T}}}}({{\xi }_{k}})W_{{{\lambda }_{i}}}^{l\;{\mathrm{T}}}{\omega }'\left( {{X}_{i,\;k}},\;R_{i,\;k}^{l} \right)\omega \left( {{x}_{i,\;k}} \right) \end{split} $$ (25)
where $ {{\eta }_{a}} $ is the weight update step size of the action network.
2.2 Distributed Control of the Linearized System
In the homeomorphic space, model-free feedback linearization under controller (6) maps the input-output relationship of each nonlinear agent from the nonlinear dynamics (1) to a system in controllable canonical form. The virtual leader can therefore be designed in a simpler linear form:
$$ \left\{ \begin{aligned} & {{{{\xi }}}_{r,\;k+1}} = A{{\xi }_{r,\;k}}+BK{{\xi }_{r,\;k}} \\ & {{y}_{r,\;k}} = C{{\xi }_{r,\;k}} \end{aligned} \right. $$ (26)
where $K $ is the feedback control gain. The local neighborhood output tracking error can be represented equivalently by a virtual local neighborhood state tracking error:
$$ {{{\cal{E}}}_{i,\;k}} = \mathop \sum \limits_{j \in \aleph_i} {{a}_{ij}}({{\xi }_{i,\;k}}-{{\xi }_{j,\;k}})+{{b}_{i}}{{e}_{p,\;i,\;k}} $$ (27)
where $ {{e}_{p,\;i,\;k}} = {{\xi }_{i,\;k}}-{{\xi }_{r,\;k}} $.
Let $ \xi = \left[ {{\xi }_{1}},\;{{\xi }_{2}},\;\cdots ,\;{{\xi }_{N}} \right] $. The global dynamics are then
$$ \left\{ \begin{aligned} & {\xi_{k+1} } = \left( {{I}_{N}}\otimes A \right)\xi_k +\left( {{I}_{N}}\otimes B \right)v_k \\ & y_k = \left( {{I}_{N}}\otimes C \right){{\xi }_{k}} \end{aligned} \right. $$ (28)
Define $ Q = {Q^{\mathrm{T}}} $ and $ R = {R^{\mathrm{T}}} $ as positive definite matrices, and let the feedback control gain be
$$ \begin{aligned} K = {R^{ - 1}}{B^{\mathrm{T}}}{\cal{P}} \end{aligned} $$ (29)
where $ {\cal{P}} $ is the unique positive definite solution of the algebraic Riccati equation
$$ \begin{aligned} {A^{\mathrm{T}}}{\cal{P}} + {\cal{P}}A + Q - {\cal{P}}B{R^{ - 1}}{B^{\mathrm{T}}}{\cal{P}} = 0 \end{aligned} $$ (30)
Let $ {\zeta _i}\,\;\left( {i \in {\cal{N}}} \right) $ be the eigenvalues of $ {\cal{L}} + {\cal{B}} $. When $ {\cal{C}} \ge \frac{1}{{2 {\min }_{i \in {\cal{N}}} {\mathop{\rm{Re}}\nolimits} ({\zeta _i})}} $, every $ A - {\cal{C}}{\zeta _i}BK $ is Hurwitz for all $ i \in {\cal{N}} $, where $ {\cal{C}} \in \bf{R} $ is the coupling gain.
Lemma 1 [11]. Choose
$$ {{v}_{i,\;k}} = -{\cal{C}}K{{{\cal{E}}}_{i,\;k}} $$ (31)
as the distributed linear control input, where $ {\cal{C}} \ge \frac{1}{{2{\min }_{i \in {\cal{N}}} {\mathop{\rm{Re}}\nolimits} ({\zeta _i})}} $ and $ K = {{R}^{-1}}{{B}^{{\mathrm{T}}} }{\cal{P}} $. Then, for all $ i\in {\cal{N}} $, $ {{\xi }_{i}} $ is cooperatively uniformly ultimately bounded with respect to $ {{\xi }_{r}} $, and all nodes synchronize with $ {{\xi }_{r}} $.
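The coupling-gain condition in Lemma 1 only requires the eigenvalues of $ {\cal{L}} + {\cal{B}} $. A small sketch, assuming the eigenvalues are already available (for a chain graph with a single pinned root, $ {\cal{L}} + {\cal{B}} $ is triangular, so they are simply its diagonal entries); the function name is hypothetical.

```python
def min_coupling_gain(eigs):
    """Lower bound 1 / (2 * min_i Re(zeta_i)) on the coupling gain.

    eigs : eigenvalues of L + B, given as floats or complex numbers.
    Raises ValueError if some eigenvalue has non-positive real part, in which
    case the bound is undefined (e.g. no spanning tree rooted at a pinned node).
    """
    m = min(complex(z).real for z in eigs)
    if m <= 0.0:
        raise ValueError("all eigenvalues of L + B must have positive real part")
    return 1.0 / (2.0 * m)


# Chain leader -> 0 -> 1 -> 2 with b_0 = 1: diag(L + B) = (1, 1, 1).
c_min = min_coupling_gain([1.0, 1.0, 1.0])
```

Any coupling gain at or above `c_min` satisfies the stated inequality; larger values trade faster agreement for larger virtual inputs $ v_{i,k} $.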
Remark 2. Owing to the properties of input-output feedback linearization, the desired linear system dynamics can be designed in a unified form. Under Assumption 1, when all agents have the same relative degree, using the same feedback control gain $ K $ makes the dynamic behavior of all agents converge, which significantly reduces the complexity of distributed controller design.
3. Proof of Learning Convergence
This section discusses the convergence of the distributed model-free feedback linearization algorithm, considering the convergence of the tracking error and the learning convergence of the dual critic network, reward network, and action network. The optimal weights of the three types of network in the algorithm are defined as
$$ \begin{aligned} \left\{ {\begin{aligned} &{W_{J_i}^*=\arg \mathop {\min }\limits_{{W_{J_i}}} \left\| {{{\hat J_i}^l}({X_{i,\;k}},\; {{\hat R_{i,\;k}}^l}) - J_{i,\;k}} \right\|}\\ &{W_{\lambda_i}^*=\arg \mathop {\min }\limits_{{W_{\lambda_i}}} \left\| {{{\hat \lambda_i}^l}({X_{i,\;k}},\; {{\hat R_{i,\;k}}^l}) - \frac{\partial {J_{i,\;k}}}{\partial {{X_{i,\;k}}}}} \right\|}\\ &{W_{r_i}^*=\arg \mathop {\min }\limits_{{W_{r_i}}} \left\| {{{\hat R_i}^l}({X_{i,\;k}}) - {C_{i,\;k}}} \right\|}\\ &{W_{a_i}^{\rm{*}}=\arg \mathop {\min }\limits_{{W_{a_i}}} \left\| {{{\hat \beta_i}^l}({x_{i,\;k}}) -L_{{{f}_{i}}}^{\rho }{{h}_{i}}({{x}_{i,\;k}})} \right\|} \end{aligned}} \right. \end{aligned} $$ (32)
where $ J_i({X_{i,\;k}}) $ is the ideal value function. The weight estimation errors are
$$ \begin{aligned} \left\{ {\begin{aligned} &{\tilde W_{J_i}^l=W_{J_i}^l - W_{J_i}^*}\\ &{\tilde W_{\lambda_i}^l=W_{\lambda_i}^l - W_{\lambda_i}^*}\\ &{\tilde W_{r_i}^l=W_{r_i}^l - W_{r_i}^*}\\ &{\tilde W_{a_i}^l=W_{a_i}^l - W_{a_i}^{\rm{*}}} \end{aligned}} \right. \end{aligned} $$ (33)
To simplify notation, let ${\omega _{a,\,i,\,k}} = \omega ({x_{i,\,k}})$, ${\omega _{c,\,i,\,k}} = \omega ({X_{i,\,k}}, \hat R_{i,\,k}^l) $, $ {\omega _{r,\,i,\,k}} = \omega ({X_{i,\,k}})$, $ \tilde \beta_{i,\,k}^l = \tilde W_{{a_i}}^{l,\,{\mathrm{T}}}{\omega _{a,\,i,\,k}} $, $ \tilde J_{i,\,k}^l = \tilde W_{{J_i}}^{l,\,{\mathrm{T}}}{\omega _{c,\,i,\,k}} $, $ \tilde \lambda _{i,\,k}^l = \tilde W_{{\lambda _i}}^{l,\,{\mathrm{T}}}{\omega _{c,\,i,\,k}} $, and $\tilde R_{i,\,k}^l = \tilde W_{{r_i}}^{l,\,{\mathrm{T}}}{\omega _{r,\,i,\,k}}$.
Assumption 4. The network weights $ W_{J_i} $, $ W_{\lambda_i} $, $ W_{a_i} $, $ W_{r_i} $ and the basis vector output $ \omega ( \cdot ) $ are all bounded, with upper bounds denoted $ W_{J_i}^m $, $ W_{\lambda_i}^m $, $ W_{a_i}^m $, $ W_{r_i}^m $, and $ \omega^m $, respectively.
We first discuss the convergence of the system tracking error. If the state $ z_{i,\;k} $ and input $ r_{i,\;k} $ of the desired model are bounded and Assumption 4 holds, take the Lyapunov function candidate for $ e_{m,\;i,\;k} $ as $ {L_{e_i}} = \frac{1}{{3}}e_{m,\;i,\;k}^{\mathrm{T}}{e_{m,\;i,\;k}} $. The first difference of $ L_{e_i} $ then satisfies
$$ \begin{split} \Delta {L_{e_i}} = \;&\frac{1}{3}\left( {e_{m,\;i,\;k + 1}^{\mathrm{T}}{e_{m,\;i,\;k + 1}} - e_{m,\;i,\;k}^{\mathrm{T}}{e_{m,\;i,\;k}}} \right) \le\\ & \left( {{\lambda _{\max }} - \frac{1}{3}} \right){\left\| {{e_{m,\;i,\;k}}} \right\|^2} + {{{\left\| {\hat \beta_{i,\;k}^l} \right\|}^2} + {{\left\| {{d_{i,\;k}}} \right\|}^2}} \end{split} $$ (34)
where $ {\lambda _{\max }} $ denotes the largest eigenvalue of $ H^{\mathrm{T}}H $.
Next, the convergence of the learning process is discussed. To analyze the stability of the weight updates of the dual critic, four components are considered: the estimation error of the value-function weights, the estimation error of the value function, the estimation error of the heuristic-function weights, and the estimation error of the heuristic function. From Eq. (21), the weight estimation error of the dual critic network is
$$ \begin{split} &{\left[ {\begin{array}{*{20}{c}} {\tilde W_{J_i}^{l + 1}}\\ {\tilde W_{\lambda_i}^{l + 1}} \end{array}} \right]^{\mathrm{T}}} = {\left[ {\begin{array}{*{20}{c}} {\tilde W_{J_i}^l}\\ {\tilde W_{\lambda_i}^l} \end{array}} \right]^{\mathrm{T}}} -\\&\qquad {\eta _c}{\left[ {\begin{array}{*{20}{c}}{{\mu _J}{\gamma _J}\omega_{c,\;i,\;k} e_{{J,\;i,\;k}}^{\mathrm{T}}}\\ {{\mu _\lambda }{\gamma _J}\omega_{c,\;i,\;k} {{\left( {\Xi_{i,\;k}{e_{{\lambda,\; i,\;k}}}} \right)}^{\mathrm{T}}}} \end{array}} \right]^{\mathrm{T}}} \end{split}$$ (35)
Lemma 2. Let the Lyapunov function candidate of the dual critic network be
$$ \begin{split} {L_{c_i}} =\;& {L_{{W_{J_i}}}} + {L_{J_i}} + {L_{{W_{\lambda_i}}}} + {L_{\lambda_i}}=\\ & \frac{1}{{{\eta _c}}}{\rm{tr}}\left( {\tilde W_{J_i}^{l\;{\mathrm{T}}}\tilde W_{J_i}^l} \right) + \frac{1}{2}{\mu _j}{\left\| {{{\tilde J}_i^l}(X_{i,\;k})} \right\|^2}\;+\\ & \frac{1}{{{\eta _c}}}{\rm{tr}}\left( {\tilde W_{\lambda_i}^{l\;{\mathrm{T}}}\tilde W_{\lambda_i}^l} \right) + \frac{1}{2}{\mu _\lambda }{\left\| {{{\tilde \lambda }_i^l}(X_{i,\;k})} \right\|^2} \end{split}\nonumber $$
Then the first difference of ${L_{{c_i}}}$ satisfies the following inequality:
$$ \begin{split} \Delta {L_{{c_i}}} \le\;& - {\mu _j}\gamma _{{J_i}}^2{\left\| {\tilde J_{i,\;k}^l} \right\|^2} + \frac{{{\mu _j}}}{2}{\left\| {\tilde J_{i,\;k - 1}^l} \right\|^2}\; +\\ & \frac{{{\mu _\lambda }}}{2}{\left\| {\tilde \lambda _{i,\;k - 1}^l} \right\|^2} - {\mu _j}\gamma _{{J_i}}^2\left( {I - {\chi _{c}}} \right)\times\\ & {\left\| {\tilde J_{i,\;k}^l + \gamma _{{J_i}}^{ - 1}\varepsilon _{{j_k}}^{\rm{*}}} \right\|^2}-\end{split}\qquad\qquad $$ $$ \begin{split} & \quad\qquad{\mu _\lambda }\gamma _{{J_i}}^2{\left\| {{\Xi _{i,\;k}}} \right\|^2}{\left\| {\tilde \lambda _{i,\;k}^l} \right\|^2} \;- \\ & \quad\qquad{\mu _\lambda }\gamma _{{J_i}}^2\left( {I - {\eta _c}{\mu _\lambda }\gamma _{{J_i}}^2{{\left\| {{\Xi _{i,\;k}}} \right\|}^2}{{\left\| {{\omega _{c,\;i,\;k}}} \right\|}^2}} \right)\times\\ &\quad\qquad {\left\| {\Xi _{i,\;k}^{\mathrm{T}}\tilde \lambda _{i,\;k}^l + \gamma _{{J_i}}^{ - 1}\varepsilon _{{\lambda _k}}^{\rm{*}}} \right\|^2}+ 2{\mu _j}\Bigg\| \hat R_{i,\;k - 1}^l\; +\\ &\quad\qquad {\gamma _{{J_i}}}W_{{J_i}}^*{\omega _{c,\;i,\;k}} - \frac{1}{2}\left( {W_{{J_i}}^l + W_{{J_i}}^*} \right){\omega _{c,\;i,\;k - 1}} \Bigg\|^2 \;+\\ &\quad\qquad \frac{1}{2}{\mu _j}\left( {{{\left\| {\tilde J_{i,\;k}^l} \right\|}^2} - {{\left\| {\tilde J_{i,\;k - 1}^l} \right\|}^2}} \right)+\\ &\quad\qquad 2{\mu _\lambda }\Bigg\| \frac{{\partial \hat R_{i,\;k - 1}^l}}{{\partial {X_{i,\;k - 1}}}} + {\gamma _{{J_i}}}{\Xi _{i,\;k}}W_{{\lambda _i}}^*{\omega _{c,\;i,\;k}}\; -\\ &\quad\qquad \frac{1}{2}\left( {W_{{\lambda _i}}^l + W_{{\lambda _i}}^*} \right){\omega _{c,\;i,\;k - 1}} \Bigg\|^2 \;+ \\ &\quad\qquad \frac{1}{2}{\mu _\lambda }\left( {{{\left\| {\tilde \lambda _{i,\;k}^l} \right\|}^2} - {{\left\| {\tilde \lambda _{i,\;k - 1}^l} \right\|}^2}} \right) \end{split} $$ (36)
where the upper bounds of $ \frac{{\partial {{\hat R}_{i,\;k-1}^l} }}{{\partial {X_{i,\;k - 1}}}} $ and $ \Xi_{i,\;k} $ are $ R^m $ and $ \Xi^m $, respectively.
Proof. The first difference of $ {L_{{W_{J_i}}}} $ is
$$ \begin{split} \Delta {L_{{W_{{J_i}}}}} = \;&\frac{1}{{{\eta _c}}}{\rm{tr}}\left[ {\tilde W_{{J_i}}^{l + 1,\;{\mathrm{T}}}\tilde W_{{J_i}}^{l + 1} - \tilde W_{{J_i}}^{l,\;{\mathrm{T}}}\tilde W_{{J_i}}^l} \right]=\\ &\frac{1}{{{\eta _c}}}{\rm{tr}}\Big[ {\tilde W_{{J_i}}^{l,\;{\mathrm{T}}}{{\left( {I - {\chi _c}} \right)}^{\mathrm{T}}}\left( {I - {\chi _c}} \right)} \tilde W_{{J_i}}^l \;- \\ &\varepsilon _{{j_k}}^{\rm{*}}\omega _{c,\;i,\;k}^{\mathrm{T}}{\eta _c}{\mu _j}{\gamma _j}\left( {I - {\chi _c}} \right)\tilde W_{{J_i}}^l\;+\\ & \varepsilon _{{j_k}}^{\rm{*}}\omega _{c,\;i,\;k}^{\mathrm{T}}\eta _c^2\mu _j^2\gamma _j^2{\omega _{c,\;i,\;k}}\varepsilon _{{j_k}}^{*,\;{\mathrm{T}}}\;-\\ & \tilde W_{{J_i}}^{l,\;{\mathrm{T}}}{{\left( {I - {\chi _c}} \right)}^{\mathrm{T}}}{\eta _c}{\mu _j}{\gamma _j}{\omega _{c,\;i,\;k}}\varepsilon _{{j_k}}^{*,\;{\mathrm{T}}} \;-\\ & \tilde W_{{J_i}}^{l,\;{\mathrm{T}}}\tilde W_{{J_i}}^l \Big] \end{split} $$ (37)
where $ \varepsilon_{{j_k}}^* = {{\hat R}_{i,\;k-1}^l} - W_{J_i}^{l\;{\mathrm{T}}}{\omega_{c,\;i,\;k-1}} + {\gamma _J}W_{J_i}^{*\;{\mathrm{T}}}{\omega _{{c,\;i,\;k}}} $ and $ {\chi _{{c}}}= {\eta _c}{\mu _j}\gamma _j^2{\omega_{c,\;i,\;k}}\omega _{c,\;i,\;k}^{\mathrm{T}} $.
Transforming the first and last terms in the bracket of Eq. (37) gives
$$ \begin{split} \tilde W_{J_i}^{l\;{\mathrm{T}}}&{\left( {I - {\chi _{{c}}}} \right)^{\mathrm{T}}}\left( {I - {\chi _{{c}}}} \right)\tilde W_{J_i}^l - \tilde W_{J_i}^{l\;{\mathrm{T}}}\tilde W_{J_i}^l = \\ &\tilde W_{J_i}^{l\;{\mathrm{T}}}\left( {I - {\chi _{{c}}}} \right)\tilde W_{J_i}^l - \tilde W_{J_i}^{l\;{\mathrm{T}}}\tilde W_{J_i}^l\;- \\ & \tilde W_{J_i}^{l\;{\mathrm{T}}}{\chi _{{c}}}\left( {I - {\chi _{{c}}}} \right)\tilde W_{J_i}^l= - {\eta _c}{\mu _j}\gamma _j^2{\left\| {\tilde J_{i,\;k}^l} \right\|^2} \;- \\ &{\eta _c}{\mu _j}\gamma _j^2\left( {I - {\chi _{{c}}}} \right){\left\| {\tilde J_{i,\;k}^l} \right\|^2} \end{split} $$ (38)
Then $ \Delta {L_{{W_{J_i}}}} $ can be rewritten as
$$ \begin{split}\;& \Delta {L_{{W_{{J_i}}}}} = \frac{1}{{{\eta _c}}}{\rm{tr}}\Big[ { - {\eta _c}{\mu _j}\gamma _j^2\left( {I - {\chi _c}} \right){{\left\| {\tilde J_{i,\;k}^l} \right\|}^2}} - \\ &\qquad{\eta _c}{\mu _j}\gamma _j^2{\left\| {\tilde J_{i,\;k}^l} \right\|^2}+\varepsilon _{{j_k}}^{\rm{*}}\omega _{c,\;i,\;k}^{\mathrm{T}}\eta _c^2\mu _j^2\gamma _j^2{\omega _{c,\;i,\;k}}\varepsilon _{{j_k}}^{*,\;{\mathrm{T}}}\;-\\ &\qquad \varepsilon _{{j_k}}^{\rm{*}}\omega _{c,\;i,\;k}^{\mathrm{T}}{\eta _c}{\mu _j}{\gamma _j}\left( {I - {\chi _c}} \right)\tilde W_{{J_i}}^l \;-\\ &\qquad{ \tilde W_{{J_i}}^{l,\;{\mathrm{T}}}{{\left( {I - {\chi _c}} \right)}^{\mathrm{T}}}{\eta _c}{\mu _j}{\gamma _j}{\omega _{c,\;i,\;k}}\varepsilon _{{j_k}}^{*,\;{\mathrm{T}}}} \Big]= \end{split} $$ $$ \begin{split} & {\mu _j}{\left\| {\varepsilon _{{j_k}}^{\rm{*}}} \right\|^2} - {\mu _j}\gamma _j^2{\left\| {\tilde J_{i,\;k}^l} \right\|^2} - \\ &{\mu _j}\gamma _j^2\left( {I - {\chi _c}} \right){\left\| {\tilde J_{i,\;k}^l + \gamma _j^{ - 1}\varepsilon _{{j_k}}^{\rm{*}}} \right\|^2} \end{split} \qquad\qquad$$ (39)
By the Cauchy-Schwarz inequality [19], $ \Delta {L_{{W_{J_i}}}} $ satisfies
$$ \begin{split} &\Delta {L_{{W_{{J_i}}}}} \le - {\mu _j}\gamma _j^2{\left\| {\tilde J_{i,\;k}^l} \right\|^2} + \frac{{{\mu _j}}}{2}{\left\| {\tilde J_{i,\;k - 1}^l} \right\|^2}\;-\\ & \;\; {\mu _j}\gamma _j^2\left( {I - {\chi _c}} \right){\left\| {\tilde J_{i,\;k}^l + \gamma _j^{ - 1}\varepsilon _{{j_k}}^{\rm{*}}} \right\|^2}\;+\\ & \;\; 2{\mu _j}{\left\| {{{\hat R}_{i,\,k-1}^l} + {\gamma _J}W_{{J_i}}^*{\omega _{c,\,i,\,k}} - \frac{1}{2}\left( {W_{{J_i}}^l + W_{{J_i}}^*} \right){\omega _{c,\,i,\,k - 1}}} \right\|^2} \end{split} $$ (40)
Similarly, $ \Delta {L_{{W_{\lambda_i} }}} $ satisfies
$$ \begin{split} \Delta {L_{{W_{{\lambda _i}}}}} \le \;&\frac{{{\mu _\lambda }}}{2}{\left\| {\tilde \lambda _{i,\;k - 1}^l} \right\|^2} - {\mu _\lambda }\gamma _j^2{\left\| {{\Xi _{i,\;k}}} \right\|^2}{\left\| {\tilde \lambda _{i,\;k}^l} \right\|^2}\;-\\ & {\mu _\lambda }\gamma _j^2\left( {I - {\eta _c}{\mu _\lambda }\gamma _j^2{{\left\| {{\Xi _{i,\;k}}} \right\|}^2}{{\left\| {{\omega _{c,\;i,\;k}}} \right\|}^2}} \right)\;\times\\ &{\left\| {\Xi _{i,\;k}^{\mathrm{T}}\tilde \lambda _{i,\;k}^l + \gamma _j^{ - 1}\varepsilon _\lambda ^{\rm{*}}} \right\|^2}\;+\\ & 2{\mu _\lambda }\Bigg\| \frac{{\partial \hat R_{i,\;k - 1}^l}}{{\partial {X_{i,\;k - 1}}}} + {\gamma _J}{\Xi _{i,\;k}}W_{{\lambda _i}}^*{\omega _{c,\;i,\;k}} \;-\\ &\frac{1}{2}\left( {W_{{\lambda _i}}^l + W_{{\lambda _i}}^*} \right){\omega _{c,\;i,\;k - 1}} \Bigg\|^2 \end{split} $$ (41)
where $ \varepsilon _\lambda ^{\rm{*}} = \frac{{\partial \hat R_{i,\,k - 1}^l}}{{\partial {X_{i,\,k - 1}}}} + {\gamma _J}{\Xi _{i,\,k}}W_{{\lambda _i}}^{*\,{\mathrm{T}}}{\omega _{c,\,i,\,k}} - W_{\lambda_i} ^{l\,{\mathrm{T}}}{\omega _{c,\,i,\,k - 1}} $ and $ {\chi _\lambda }={\eta _c}{\mu _\lambda }\gamma _j^2{\left\| {{\Xi _{i,\,k}}} \right\|^2}{\omega _{c,\,i,\,k}}\omega _{c,\,i,\,k}^{\mathrm{T}} $.
For $ {L_{J_i}} $ and $ {L_{\lambda_i}} $, we have directly
$$ \Delta {L_{J_i}} = \frac{1}{2}{\mu _j}\left( {{{\left\| {\tilde J_{i,\;k}^l} \right\|}^2} - {{\left\| {\tilde J_{i,\;k - 1}^l} \right\|}^2}} \right) \;\; $$ (42)
$$ \Delta {L_{\lambda_i}} = \frac{1}{2}{\mu _\lambda }\left( {{{\left\| {\tilde \lambda _{i,\;k}^l} \right\|}^2} - {{\left\| {\tilde \lambda _{i,\;k - 1}^l} \right\|}^2}} \right) $$ (43)
Combining the above expressions, $ \Delta {L_{c_i}} $ satisfies Eq. (36).
□
From Eq. (23), the weight error equation of the reward network is
$$ \begin{aligned} \tilde W_{r_i}^{l + 1} = \tilde W_{r_i}^l -{{\eta }_{r}}{{e}_{R,\;i,\;k}}{{\gamma }_{{{J}_{i}}}}W_{{{J}_{i}}}^{l\;{\mathrm{T}}}{\omega' _{c,\;i,\;k}}\omega_{r,\;i,\;k} \end{aligned} $$ (44)
Lemma 3. Let the Lyapunov function candidate of the reward network be
$$ \begin{aligned} {L_{r_i}} & = \frac{1}{{{\eta _r}}}{\mathrm{tr}}\left( {\tilde W_{r_i}^{l\;{\mathrm{T}}}\tilde W_{r_i}^l} \right) \end{aligned}\nonumber $$
The first difference of the Lyapunov function $ L_{r_i} $ satisfies the following inequality:
$$ \begin{split} \Delta {L_{r_i}} \le\,& {\left\| {\tilde W_{{r_i}}^{l\,{\mathrm{T}}}{\omega _{r,\,i,\,k}}} \right\|^2} + {\left\| {W_{{J_i}}^{l\,{\mathrm{T}}}{\omega^\prime _{c,\,i,\,k}} } \right\|^2} + {\left\| {{J_{i,\,k}}{\gamma _{{J_i}}}} \right\|^2}\,-\\ & \left( {1 - {\eta _r}{{\left\| {{\omega _{r,\,i,\,k}}} \right\|}^2}} \right){\left\| {W_{{J_i}}^{l\,{\mathrm{T}}}{\omega^\prime _{c,\,i,\,k}} } \right\|^2}{\left\| {{J_{i,\,k}}{\gamma _{{J_i}}}} \right\|^2} \end{split} $$ (45)
Proof. From Eq. (44), the first difference of $ {L_{r_i}} $ is
$$ \begin{split} \Delta {L_{{r_i}}} =\;& \frac{1}{{{\eta _r}}}{\rm{tr}}\left( {\tilde W_{{r_i}}^{l + 1\;{\mathrm{T}}}\tilde W_{{r_i}}^{l + 1} - \tilde W_{{r_i}}^{l\;{\mathrm{T}}}\tilde W_{{r_i}}^l} \right)=\\ & {\rm{tr}}\Big( { - 2{J_{i,\;k}}{\gamma _{{J_i}}}W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime _{c,\;i,\;k}} {\omega _{r,\;i,\;k}}\tilde W_{{r_i}}^{l\;{\mathrm{T}}}}\; +\\ & {{\eta _r}{{\left\| {{\omega _{r,\;i,\;k}}} \right\|}^2}{{\left\| {W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime _{c,\;i,\;k}} } \right\|}^2}{{\left\| {{\gamma _J}{J_{i,\;k}}} \right\|}^2}} \Big) \end{split} $$ (46)
Transforming the first term of Eq. (46) gives
$$ \begin{split} \Delta L_{r_i} =\; &{\eta _r}{\left\| {{\omega _{r,\;i,\;k}}} \right\|^2}{\left\| {W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime_{c,\;i,\;k}} } \right\|^2}{\left\| {{\gamma _J}{J_{i,\;k}}} \right\|^2}\;-\\ & {\left\| {{J_{i,\;k}}{\gamma _{{J_i}}}W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime_{c,\;i,\;k}} } \right\|^2} - {\left\| {\tilde W_{{r_i}}^{l\;{\mathrm{T}}}{\omega _{r,\;i,\;k}}} \right\|^2}\;+\\ & {\left\| {\tilde W_{{r_i}}^{l\;{\mathrm{T}}}{\omega _{r,\;i,\;k}} - {J_{i,\;k}}{\gamma _{{J_i}}}W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime_{c,\;i,\;k}} } \right\|^2}=\\ & {\left\| {\tilde W_{{r_i}}^{l\;{\mathrm{T}}}{\omega _{r,\;i,\;k}} - {J_{i,\;k}}{\gamma _{{J_i}}}W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime_{c,\;i,\;k}} } \right\|^2} - \\ & {\left\| {\tilde W_{{r_i}}^{l\;{\mathrm{T}}}{\omega _{r,\;i,\;k}}} \right\|^2}\;-\\ & \left( {1 - {\eta _r}{{\left\| {{\omega _{r,\;i,\;k}}} \right\|}^2}} \right){\left\| {{J_{i,\;k}}{\gamma _{{J_i}}}W_{{J_i}}^{l\;{\mathrm{T}}}{\omega^\prime_{c,\;i,\;k}} } \right\|^2} \end{split} $$ (47)
Likewise, bounding the result with the Cauchy-Schwarz inequality [19] shows that $ \Delta L_{r_i} $ satisfies Eq. (45).
□
According to (25), the weight estimation error of the actor network evolves as
$$ \begin{aligned} \tilde W_{a_i}^{l+1} = \tilde W_{a_i}^{l} - \eta_a \hat\lambda_{i,k}^{l\,\mathrm{T}} W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k}\, \omega_{a,i,k} \end{aligned} \tag{48} $$
Lemma 4. Let the Lyapunov function candidate of the actor network be
$$ \begin{aligned} L_{a_i} = \frac{1}{\eta_a} \mathrm{tr}\left( \tilde W_{a_i}^{l\,\mathrm{T}} \tilde W_{a_i}^{l} \right) \end{aligned} $$
Then the first difference of $ L_{a_i} $ satisfies the inequality
$$ \begin{split} \Delta L_{a_i} \le\;& \left\| \tilde\beta_{i,k}^{l} \right\|^2 + \left\| W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 + \left\| \hat\lambda_{i,k}^{l} \right\|^2 -\\ & \left( 1 - \eta_a \left\| \omega_{a,i,k} \right\|^2 \right) \left\| W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 \left\| \hat\lambda_{i,k}^{l} \right\|^2 \end{split} \tag{49} $$
Proof. The first difference of $ L_{a_i} $ is
$$ \begin{split} \Delta L_{a_i} =\;& \frac{1}{\eta_a} \mathrm{tr}\left( \tilde W_{a_i}^{l+1\,\mathrm{T}} \tilde W_{a_i}^{l+1} - \tilde W_{a_i}^{l\,\mathrm{T}} \tilde W_{a_i}^{l} \right) =\\ & \mathrm{tr}\Big\{ -2 \tilde\beta_{i,k}^{l} \left( W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right)^{\mathrm{T}} \hat\lambda_{i,k}^{l} +\\ & \eta_a \left\| \omega_{a,i,k} \right\|^2 \left\| \hat\lambda_{i,k}^{l} \right\|^2 \left\| W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 \Big\} \end{split} \tag{50} $$
Following the same steps as in the proof of Lemma 3, $ \Delta L_{a_i} $ satisfies (49).
□
Based on the above analysis, the convergence theorem of the algorithm can now be stated.
Theorem 1. Consider the learning process of the input-output feedback linearization controller of nonlinear agent $ i $, where the actor network, the reward network and the dual critic network are defined in (13), (17) and (19), respectively, and the network weights are updated by the laws given in (25), (23) and (21). If the learning parameters satisfy
$$ \begin{aligned} \left\{ \begin{aligned} &3\lambda_{\max} < 1,\; \frac{\sqrt 2}{2} < \gamma_{J_i} < 1\\ &\eta_c < \frac{1}{\mu_{J_i} \gamma_{J_i}^2 \left\| \omega^m \right\|^2},\; \eta_r < \frac{1}{\left\| \omega^m \right\|^2},\; \eta_a < \frac{1}{\left\| \omega^m \right\|^2} \end{aligned} \right. \end{aligned} \tag{51} $$
then, for the two-stage adaptive dual-critic design algorithm based on input-output data, the tracking error $ e_{m,i,k} \in \mathcal{P}_{e_{m,i}} $ and the learning error $ \tilde J_{i,k}^{l} \in \mathcal{P}_{J_i} $ are uniformly ultimately bounded, where
$$ \begin{split} & \mathcal{P}_{e_{m,i}} = \left\{ e_{m,i,k} \in \mathbf{R}^n : \left\| e_{m,i,k} \right\| \le \sqrt{\frac{\Gamma_{\max}}{1 - 3\lambda_{\max}}} \right\}\\ & \mathcal{P}_{J_i} = \left\{ \tilde J_{i,k}^{l} \in \mathbf{R} : \left\| \tilde J_{i,k}^{l} \right\| \le \sqrt{\frac{\Gamma_{\max}}{\mu_j \left( 2\gamma_{J_i}^2 - 1 \right)}} \right\} \end{split} \tag{52} $$
Proof. By Lemmas 2 to 4 and inequality (34), the Lyapunov function candidate of the model-free feedback linearization algorithm satisfies
$$ \begin{split} \Delta L_i &= \Delta L_{e_i} + \Delta L_{c_i} + \Delta L_{r_i} + \Delta L_{a_i} \le\\ & - \left( \frac{1}{3} - \lambda_{\max} \right) \left\| e_{m,i,k} \right\|^2 -\\ & \mu_j \gamma_{J_i}^2 \left( I - \chi_{J_k} \right) \left\| \tilde J_{i,k}^{l} + \gamma_{J_i}^{-1} \varepsilon_{j_k}^{*} \right\|^2 -\\ & \mu_j \gamma_{J_i}^2 \left\| \tilde J_{i,k}^{l} \right\|^2 -\\ & \mu_\lambda \gamma_{J_i}^2 \left( I - \eta_c \mu_\lambda \gamma_{J_i}^2 \left\| \Xi_{i,k} \right\|^2 \left\| \omega_{c,i,k} \right\|^2 \right) \times\\ & \left\| \Xi_{i,k}^{\mathrm{T}} \tilde\lambda_{i,k}^{l} + \gamma_{J_i}^{-1} \varepsilon_{\lambda_k}^{*} \right\|^2 -\\ & \mu_\lambda \gamma_{J_i}^2 \left\| \Xi_{i,k} \right\|^2 \left\| \tilde\lambda_{i,k}^{l} \right\|^2 -\\ & \left( 1 - \eta_r \left\| \omega_{r,i,k} \right\|^2 \right) \left\| W_{J_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 \left\| J_{i,k} \gamma_{J_i} \right\|^2 -\\ & \left( 1 - \eta_a \left\| \omega_{a,i,k} \right\|^2 \right) \left\| W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 \left\| \hat\lambda_{i,k}^{l} \right\|^2 + \Gamma_i \end{split} \tag{53} $$
where, after bounding, $ \Gamma_i $ is given by
$$ \begin{split} \Gamma_i =\;& 2 \left\| \tilde\beta_{i,k}^{l} \right\|^2 + \left\| d_{i,k} \right\|^2 + \left\| W_{\lambda_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 +\\ & \left\| \hat\lambda_{i,k}^{l} \right\|^2 + \left\| \tilde W_{r_i}^{l\,\mathrm{T}} \omega_{r,i,k} \right\|^2 + \left\| W_{J_i}^{l\,\mathrm{T}} \omega'_{c,i,k} \right\|^2 +\\ & \left\| J_{i,k} \gamma_{J_i} \right\|^2 + \frac{1}{2} \mu_j \left\| \tilde J_{i,k}^{l} \right\|^2 + 2\mu_j \Big\| \hat R_{i,k-1}^{l} +\\ & \gamma_{J_i} W_{J_i}^{*} \omega_{c,i,k} - \frac{1}{2} \left( W_{J_i}^{l} + W_{J_i}^{*} \right) \omega_{c,i,k-1} \Big\|^2 +\\ & \frac{1}{2} \mu_\lambda \left\| \tilde\lambda_{i,k}^{l} \right\|^2 + 2\mu_\lambda \Big\| \frac{\partial \hat R_{i,k-1}^{l}}{\partial X_{i,k-1}} +\\ & \gamma_{J_i} \Xi_{i,k} W_{\lambda_i}^{*} \omega_{c,i,k} - \frac{1}{2} \left( W_{\lambda_i}^{l} - W_{\lambda_i}^{*} \right) \omega_{c,i,k-1} \Big\|^2 \end{split} \tag{54} $$
The upper bound of $ \Gamma_i $ is then
$$ \begin{split} \Gamma_{\max} =\;& 2 \left\| W_a^m \omega^m \right\|^2 + 2 \left\| W_\lambda^m \omega^m \right\|^2 + \left\| W_r^m \omega^m \right\|^2 +\\ & \left\| W_J^m \omega^m \right\|^2 + \left\| d^m \right\|^2 + \left\| \gamma_{J_i} W_J^m \omega^m \right\|^2 +\\ & 8\mu_\lambda \left\| W_r^m \omega^m \right\|^2 +\\ & \frac{1}{2} \mu_\lambda \left( 5 + 8 \left\| \gamma_{J_i} \Xi^m \right\|^2 \right) \left\| W_\lambda^m \omega^m \right\|^2 +\\ & 8\mu_j \left\| W_r^m \omega^m \right\|^2 + \frac{1}{2} \mu_j \left( 5 + 8 \left\| \gamma_{J_i} \right\|^2 \right) \left\| W_J^m \omega^m \right\|^2 \end{split} \tag{55} $$
When the learning parameters satisfy (51), then for any tracking error and value-function estimation error such that
$$ \left\{ \begin{aligned} &\left\| e_{m,i,k} \right\| > \sqrt{\frac{\Gamma_{\max}}{1 - 3\lambda_{\max}}}\\ &\left\| \tilde J_{i,k}^{l} \right\| > \sqrt{\frac{\Gamma_{\max}}{\mu_j \left( 2\gamma_{J_i}^2 - 1 \right)}} \end{aligned} \right. \tag{56} $$
we have $ \Delta L_i \le 0 $. Hence, by the Lyapunov extension theorem, the tracking error and the learning error are uniformly ultimately bounded.
□
Theorem 1 and its proof give conditions on the learning parameters under which the learning process converges. Two case studies are presented next to evaluate the proposed method on heterogeneous nonlinear multi-agent systems with unknown models.
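As a rough numerical illustration of Theorem 1, the sketch below evaluates the inequalities in (51) and the two radii in (52) for the learning parameters listed in Table 2. The paper does not report numerical values for $ \lambda_{\max} $, $ \left\| \omega^m \right\| $ or $ \Gamma_{\max} $, so the values passed as `lambda_max`, `omega_max` and `gamma_max_bound` are purely hypothetical, as are the function names; it is also assumed that $ \mu_{J_i} $ in (51) corresponds to the $ \mu_j $ entry of Table 2.

```python
import math


def check_conditions(eta_c, eta_r, eta_a, gamma, mu_j, lambda_max, omega_max):
    """Evaluate each inequality of condition (51) and return the results by name."""
    w2 = omega_max ** 2  # ||omega^m||^2
    return {
        "3*lambda_max < 1": 3.0 * lambda_max < 1.0,
        "sqrt(2)/2 < gamma < 1": math.sqrt(2.0) / 2.0 < gamma < 1.0,
        "eta_c bound": eta_c < 1.0 / (mu_j * gamma ** 2 * w2),
        "eta_r bound": eta_r < 1.0 / w2,
        "eta_a bound": eta_a < 1.0 / w2,
    }


def ultimate_bounds(gamma_max_bound, lambda_max, mu_j, gamma):
    """Radii of the sets P_e and P_J in (52)."""
    r_e = math.sqrt(gamma_max_bound / (1.0 - 3.0 * lambda_max))
    r_j = math.sqrt(gamma_max_bound / (mu_j * (2.0 * gamma ** 2 - 1.0)))
    return r_e, r_j


# Learning parameters from Table 2.
params = dict(eta_c=0.02, eta_r=0.05, eta_a=0.01, gamma=0.9, mu_j=0.01)

# Hypothetical values: not reported in the paper.
checks = check_conditions(**params, lambda_max=0.2, omega_max=2.0)
r_e, r_j = ultimate_bounds(gamma_max_bound=1.0, lambda_max=0.2, mu_j=0.01, gamma=0.9)

for name, ok in checks.items():
    print(f"{name}: {ok}")
print(f"tracking-error radius: {r_e:.4f}, learning-error radius: {r_j:.4f}")
```

Under these assumed values all five inequalities hold. Because the radius of $ \mathcal{P}_{J_i} $ scales with $ 1/\sqrt{\mu_j} $, a small $ \mu_j $ gives a comparatively loose bound on the learning error.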
4. Experimental Validation
In this section, simulation examples on a heterogeneous multi-agent system with unknown nonlinear dynamics illustrate the developability and effectiveness of the homeomorphic distributed control protocol. The network topology of the system is shown in Fig. 3. The multi-agent system consists of the lateral dynamics of six two-wheeled vehicles, and the dynamics of each agent are
$$ \begin{split} &f_i(\xi_i) = \left[ \begin{array}{c} \bar v\cos(\psi_i)\\ \dfrac{h_i}{2m_i}\dot\psi_i\sin(\psi_i)\\ \bar v\sin(\psi_i)\\ -\dfrac{h_i}{2m_i}\dot\psi_i\cos(\psi_i)\\ \dot\psi_i \end{array} \right] \\ &g_i(\xi_i) = \left[ \begin{array}{c} 0\\ 0\\ 0\\ 0\\ \dfrac{m_i}{h_i} \end{array} \right],\; h_i(\xi_i) = \left[ \begin{array}{l} x_{i,k}\\ y_{i,k} \end{array} \right] \end{split} $$
where $ \xi_i = \left[ \begin{matrix} x & \dot x & y & \dot y & \dot\psi \end{matrix} \right]^{\mathrm{T}} $; $ x $, $ y $ and $ \dot x $, $ \dot y $ are the displacements and velocities of the vehicle center along the $ x $-axis and $ y $-axis; $ \psi $ and $ \dot\psi $ are the heading angle and angular velocity; $ \bar v $ is the forward speed of the vehicle; $ m_i $ is the distance from the wheels to the vehicle center; and $ h_i $ is the distance from the caster wheel to the vehicle center. The model parameters (Table 1) and the model structure $ f_i(\xi_i) $, $ g_i(\xi_i) $ are treated as unknown during learning.
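To make the heterogeneity of the six agents concrete, the following sketch evaluates the two parameter-dependent coefficients of the model above, $ m_i/h_i $ (the only nonzero entry of $ g_i $) and $ h_i/(2m_i) $ (the factor on the yaw-rate terms of $ f_i $), using the values of Table 1. The dictionary layout and function names are illustrative assumptions and do not come from the paper.

```python
# (m_i, h_i) in metres for agents 1..6, taken from Table 1.
AGENT_PARAMS = {
    1: (0.04, 0.06),
    2: (0.04, 0.04),
    3: (0.06, 0.06),
    4: (0.06, 0.04),
    5: (0.08, 0.06),
    6: (0.08, 0.04),
}


def input_gain(m, h):
    """Nonzero (last) entry of g_i: m_i / h_i."""
    return m / h


def drift_coefficient(m, h):
    """Coefficient h_i / (2 m_i) multiplying the yaw-rate terms of f_i."""
    return h / (2.0 * m)


gains = {i: input_gain(m, h) for i, (m, h) in AGENT_PARAMS.items()}
coeffs = {i: drift_coefficient(m, h) for i, (m, h) in AGENT_PARAMS.items()}

for i in sorted(AGENT_PARAMS):
    print(f"agent {i}: m/h = {gains[i]:.3f}, h/(2m) = {coeffs[i]:.3f}")
```

The input gain ranges from about 0.67 (agent 1) to 2.0 (agent 6), so the same control input produces markedly different yaw responses across agents. Agents 2 and 3 share both coefficients even though their physical dimensions differ.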
Table 1 Heterogeneous multi-agent system parameters

| Variable | Value (m) | Variable | Value (m) | Variable | Value (m) |
|---|---|---|---|---|---|
| $ m_1 $ | 0.04 | $ m_2 $ | 0.04 | $ m_3 $ | 0.06 |
| $ h_1 $ | 0.06 | $ h_2 $ | 0.04 | $ h_3 $ | 0.06 |
| $ m_4 $ | 0.06 | $ m_5 $ | 0.08 | $ m_6 $ | 0.08 |
| $ h_4 $ | 0.04 | $ h_5 $ | 0.06 | $ h_6 $ | 0.04 |

To reduce the difficulty of distributed control, the target linear system of every agent is set to the following homogeneous system:
$$ \left\{ \begin{aligned} &\dot\xi_i = A\xi_i + Bv_{i,k}\\ &y_{i,k} = C\xi_i \end{aligned} \right.,\; i = 1, \cdots, 6 \tag{57} $$
where $ A = \left[ \begin{matrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ \end{matrix} \right] $, $ B = \left[ \begin{matrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ \end{matrix} \right] $, $ C = \left[ \begin{matrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ \end{matrix} \right]^{\mathrm{T}} $.
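The target system (57) can be stepped numerically to see how the virtual input $ v_{i,k} $ drives the two outputs. The sketch below is a minimal illustration that uses the matrices $ A $, $ B $, $ C $ exactly as given above; the forward-Euler discretization, the step size `dt`, the constant input sequence and all function names are assumptions made here, not details from the paper.

```python
# Matrices of the target linear system (57).
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
B = [0, 0, 0, 0, 1]
C = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]


def mat_vec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def euler_step(xi, v, dt):
    """One forward-Euler step of xi_dot = A xi + B v (assumed discretization)."""
    drift = mat_vec(A, xi)
    return [x + dt * (d + b * v) for x, d, b in zip(xi, drift, B)]


def output(xi):
    """Output y = C xi."""
    return mat_vec(C, xi)


def simulate(xi0, v_seq, dt):
    """Apply the virtual inputs in v_seq and record the output after each step."""
    xi = list(xi0)
    ys = [output(xi)]
    for v in v_seq:
        xi = euler_step(xi, v, dt)
        ys.append(output(xi))
    return xi, ys


xi_final, outputs = simulate([0.0] * 5, [1.0, 1.0, 1.0], dt=0.1)
print(xi_final)
print(outputs[-1])
```

Starting from rest, the input first changes the fifth state, then the second and fourth, and only after three steps do both outputs move. This reflects the chain structure of $ A $.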
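4.1 Case 1: Validation of learning effectiveness
In this case, the predesigned target system (57) and a linear distributed controller designed for it are adopted, and the feedback linearization controller is optimized with the two-stage dual heuristic adaptive dynamic programming algorithm. Control performance before and after learning is then compared. The learning parameters are listed in Table 2.
Table 2 Learning parameters

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| $ \eta_r $ | 0.05 | $ \eta_c $ | 0.02 | $ \eta_a $ | 0.01 |
| $ \gamma $ | 0.9 | $ \mu_j $ | 0.01 | $ \mu_\lambda $ | 0.01 |
| $ \varepsilon_i $ | 0.08 | $ H $ | $ [1,\; 0.2] $ | | |

The reward network takes the augmented state-action pair $ X_{i,k} $ as input and outputs the reward $ R_{i,k} $. The dual critic network takes $ X_{i,k} $ and $ R_{i,k} $ as inputs and outputs the value function $ \hat J_{i,k}^{l} $ and the heuristic function value $ \hat\lambda_{i,k}^{l} $. The actor network takes the state $ x_{i,k} $ as input and outputs an estimate $ \hat\beta_i(x_{i,k}) $ of the unknown nonlinear term. The initial network weights are drawn from a distribution with mean 0 and variance 0.1.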
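In the initial stage of the experiment, the system is controlled by the untrained homeomorphic distributed controller. Fig. 4(a) and Fig. 4(c) show the state trajectories before learning. The untrained controller produces large errors and unstable behavior on the heterogeneous nonlinear agents, and the system outputs fail to follow the desired trajectories. The cause is the nonlinear dynamics and pronounced heterogeneity of the system: a single linearizing control strategy cannot suit all agents, so consensus performance is poor. After the model-free feedback linearization algorithm is introduced and the feedback linearization controller of each agent is trained with an experience pool and gradient descent, control performance improves markedly. Once learning has converged, convergence and stability are clearly better (Fig. 4(b) and Fig. 4(d)): the agent outputs agree with the desired trajectories and the tracking error decreases significantly, which verifies the effectiveness of the homeomorphic distributed control protocol for heterogeneous agents with unknown models.
Unlike existing dynamic programming methods, the proposed approach requires no predesigned hyperparameters for the reward signal. Even so, the dual heuristic adaptive dynamic programming algorithm drives the weights of each agent's value function network and reward function network to converge quickly (Fig. 5 and Fig. 6), which indicates that the algorithm is efficient on nonlinear systems. Specifically, the value function network uses the reward function to learn the long-term dynamic behavior of each agent's linearization and gradually improves system performance, while the reward function network adjusts the reward signal dynamically to guide the linearization.
Notably, the loss of the reward function in Fig. 7 is higher than that of the value function. This suggests that driving value function learning directly with the raw reward signal could cause large fluctuations and make convergence harder. The dynamic reward adjustment mechanism introduced in the experiment smooths the reward signal, which reduces fluctuations in value function network learning and improves learning stability.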
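4.2 Case 2: Validation of method superiority
In this case, to verify the scalability and superiority of the proposed method, the feedback linearization controller is applied to the system together with the predesigned distributed controller after learning has converged. After the system has run stably for 30 s, the formation is reconfigured quickly by adjusting only the distributed controller $ v_{i,k} $ (Fig. 8). The results show that the homeomorphic distributed control protocol can meet different dynamic performance requirements by adjusting the linear controller at the virtual input, without relearning.
The key difference between the proposed model-free distributed control method and existing methods is that, once learning has converged, the learned feedback linearization controller and the controlled plant together form a known linearized system, which can be controlled and synthesized with linear system theory. If the performance requirements or the environment change, the linear control input can be adjusted easily. In contrast, model-free distributed controller designs that rely entirely on learning must relearn, because the state space has changed.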
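5. Conclusion
This paper proposes a homeomorphic distributed control protocol that solves the model-free output consensus control problem for heterogeneous nonlinear multi-agent systems. By combining input-output feedback linearization theory with adaptive dynamic programming, nonlinear systems are linearized without a system model. Converting the heterogeneous nonlinear multi-agent system into a preset homogeneous linear system simplifies the design of the distributed controller and allows linear control theory to be applied. The dynamically adjusted reward and the two-stage learning mechanism continuously optimize the controller during training and improve learning stability and convergence speed. The experimental results show that the trajectories of all agents converge quickly to the desired outputs under the proposed method, which verifies the adaptability of the control strategy and its capacity for secondary design. Future work will further examine the generalization of the method and consider input delay, saturation and constraints, extending the applicability of the homeomorphic distributed control protocol to more complex practical scenarios.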
Table 1 Descriptions and key focuses of several related concepts under national military standards
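Table 2 Conceptual analysis of classical safety assessment approaches and real-time safety assessment approaches

| | Classical safety assessment | Real-time safety assessment |
|---|---|---|
| Prior knowledge required | More | Less |
| Computational resources required | Less | More |
| Main target object | System level | Component level |
| Main phase of focus | Scheme design phase | Operation and support phase |
| Typical application fields | Nuclear energy, aerospace | Electric power, engineering structures |
| Existing theoretical results | More | Fewer |
| Application value | Relatively high | Relatively high |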
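Table 3 Research of real-time safety assessment approaches under different environmental conditions

| Main underlying theoretical framework | Stationary environment | Non-stationary environment |
|---|---|---|
| State space | Very many | Very few |
| Risk models | Relatively few | Relatively many |
| Expert systems | Relatively many | Relatively few |
| Statistical learning | Relatively many | Relatively many |
| Deep learning | Very many | Very few |
| Signal processing | Many | Few |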
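Table 4 Features of real-time safety assessment approaches under different environmental conditions

| | Stationary environment | Non-stationary environment |
|---|---|---|
| Uses system inputs and outputs | Yes | Yes |
| Changes in system characteristics | Fewer | More |
| Required updating capability | Low | High |
| Influence of external factors | Low | High |
| Problem difficulty | Relatively low | Relatively high |
| Range of practical application | Narrower | Wider |
| Stability of the assessment model | Relatively high | Relatively low |
| Decision feedback requirement | Low | High |
| Existing theoretical results | More | Fewer |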
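Table 5 Current advances in real-time safety assessment frameworks for several typical dynamic systems

| System | Explicit analysis, stationary environment (refs.) | Implicit analysis, stationary environment (refs.) | Explicit analysis, non-stationary environment (refs.) | Implicit analysis, non-stationary environment (refs.) |
|---|---|---|---|---|
| Engineering structure systems | [154−158] | [74, 84−85, 94−95, 107−108, 159−163] | [126, 129, 164−166] | [167−168] |
| Transportation systems | [21, 169] | [170−172] | [173−174] | [150−153, 175] |
| Power systems | [176] | [80−82, 86, 88, 91−93, 96−97, 177−180] | [135−136, 140] | [143−149, 181] |
| Nuclear power generation systems | [19, 182−183] | [184] | [34, 36, 127, 132, 137, 139, 185] | [186] |
| Chemical process systems | [187−188] | [189−191] | [44, 130−131] | [192] |
| Aerospace systems | [79, 193−195] | [196] | [37, 128, 197] | [198] |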
[1] Brin M, Stuck G. Introduction to Dynamical Systems. Cambridge: Cambridge University Press, 2002. [2] 周东华, 胡艳艳. 动态系统的故障诊断技术. 自动化学报, 2009, 35(6): 748−758 doi: 10.3724/SP.J.1004.2009.00748Zhou Dong-Hua, Hu Yan-Yan. Fault diagnosis techniques for dynamic systems. Acta Automatica Sinica, 2009, 35(6): 748−758 doi: 10.3724/SP.J.1004.2009.00748 [3] Smith R E. Quantitative vs. Qualitative ESOH Risk Assessments Using the 882E Risk Matrix, MIL-STD-882E, San Diego, USA, 2012.Smith R E. Quantitative vs. Qualitative ESOH Risk Assessments Using the 882E Risk Matrix, MIL-STD-882E, San Diego, USA, 2012. [4] 中国人民解放军总装备部. 装备安全性工作通用要求, GJB 900A-2012, 2012.General Equipment Department of the Chinese People' s Liberation Army. General Requirements for Materiel Safety Program, GJB 900A-2012, 2012. [5] International Electrotechnical Commission. International Electrotechnical Vocabulary Online Database (351-57-05. Safety), 2022.International Electrotechnical Commission. International Electrotechnical Vocabulary Online Database (351-57-05. Safety), 2022. [6] Isermann R, Ballé P. Trends in the application of model-based fault detection and diagnosis of technical processes. Control Engineering Practice, 1997, 5(5): 709−719 doi: 10.1016/S0967-0661(97)00053-1 [7] 中央军委装备发展部. 装备可靠性工作通用要求, GJB 450B-2021, 2021.Equipment Development Department of People's Republic of China Central Military Commission. General Requirements for Equipment Reliability Work, GJB 450B-2021, 2021. [8] 中国人民解放军总装备部. 装备测试性工作通用要求, GJB 2547A-2012, 2012.General Equipment Department of the Chinese People's Liberation Army. General Requirements for Equipment Testing Work, GJB 2547A-2012, 2012. [9] Aldemir T. A survey of dynamic methodologies for probabilistic safety assessment of nuclear power plants. Annals of Nuclear Energy, 2013, 52: 113−124 doi: 10.1016/j.anucene.2012.08.001 [10] Li B Q, Wen S P, Yan Z, Wen G H, Huang T W. 
A survey on the control lyapunov function and control barrier function for nonlinear-affine control systems. IEEE/CAA Journal of Automatica Sinica, 2023, 10(3): 584−602 doi: 10.1109/JAS.2023.123075 [11] Liu Z Y, Hu S Q, He X. Real-time safety assessment of dynamic systems in non-stationary environments: A review of methods and techniques. In: Proceedings of the CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS). Yibin, China: IEEE, 2023. 1−6 [12] 柴毅, 毛万标, 任浩, 屈剑锋, 尹宏鹏, 杨志敏, 等. 航天发射系统运行安全性评估研究进展与挑战. 自动化学报, 2019, 45(10): 1829−1845Chai Yi, Mao Wan-Biao, Ren Hao, Qu Jian-Feng, Yin Hong-Peng, Yang Zhi-Min, et al. Research on operational safety assessment for spacecraft launch system: Progress and challenges. Acta Automatica Sinica, 2019, 45(10): 1829−1845 [13] Liu C, He X, Zhou D H, Huang B. Safety assessment for dynamic systems: A survey. Cybernetics and Intelligence, DOI: 10.26599/CAI.2024.9390001 [14] Zio E. The future of risk assessment. Reliability Engineering and System Safety, 2018, 177: 176−190 [15] Stamatis D H. Failure Mode and Effect Analysis. Quality Press, 2003.Stamatis D H. Failure Mode and Effect Analysis. Quality Press, 2003. [16] Liu H C, Liu L, Liu N. Risk evaluation approaches in failure mode and effects analysis: A literature review. Expert Systems With Applications, 2013, 40(2): 828−838 doi: 10.1016/j.eswa.2012.08.010 [17] Zio E. Integrated deterministic and probabilistic safety assessment: Concepts, challenges, research directions. Nuclear Engineering and Design, 2014, 280: 413−419 doi: 10.1016/j.nucengdes.2014.09.004 [18] de Vasconcelos V, Soares W A, da Costa A C L, Raso A L. Deterministic and probabilistic safety analyses. Advances in System Reliability Engineering. London: Academic Press, 2019. 43−75 [19] Holmberg J E, Kahlbom U. Application of human reliability analysis in the deterministic safety analysis for nuclear power plants. Reliability Engineering and System Safety, 2020, 194: Article No. 
106371 [20] Rausand M. Preliminary Hazard Analysis, Norwegian University of Science and Technology, Norwegian, 2005.Rausand M. Preliminary Hazard Analysis, Norwegian University of Science and Technology, Norwegian, 2005. [21] Hadj-Mabrouk H. Preliminary Hazard Analysis (PHA): New hybrid approach to railway risk analysis. International Refereed Journal of Engineering and Science, 2017, 6(2): 51−58 [22] Lee W S, Grosh D L, Tillman F A, Lie C H. Fault tree analysis, methods, and applications——A review. IEEE Transactions on Reliability, 1985, R-34(3): 194−203Lee W S, Grosh D L, Tillman F A, Lie C H. Fault tree analysis, methods, and applications——A review. IEEE Transactions on Reliability, 1985, R-34 (3): 194−203 [23] Xing L D, Amari S V. Fault tree analysis. Handbook of Performability Engineering. London: Springer, 2008. 595−620 [24] Andrews J D, Dunnett S J. Event-tree analysis using binary decision diagrams. IEEE Transactions on Reliability, 2000, 49(2): 230−238 doi: 10.1109/24.877343 [25] Ferdous R, Khan F, Sadiq R, Amyotte P, Veitch B. Handling data uncertainties in event tree analysis. Process Safety and Environmental Protection, 2009, 87(5): 283−292 doi: 10.1016/j.psep.2009.07.003 [26] Dunjó J, Fthenakis V, Vílchez J A, Arnaldos J. Hazard and operability (HAZOP) analysis. A literature review. Journal of Hazardous Materials, 2010, 173(1−3): 19−32 doi: 10.1016/j.jhazmat.2009.08.076 [27] Baybutt P. A critique of the Hazard and Operability (HAZOP) study. Journal of Loss Prevention in the Process Industries, 2015, 33: 52−58 doi: 10.1016/j.jlp.2014.11.010 [28] Stranks J. Human Factors and Behavioural Safety. Oxford: Butterworth-Heinemann, 2007. [29] Booth R T, Lee T R. The role of human factors and safety culture in safety management. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 1995, 209(5): 393−400 doi: 10.1243/PIME_PROC_1995_209_098_02 [30] Aldemir T, Siu N O, Mosleh A, Cacciabue P C, Göktepe B G. 
Reliability and Safety Assessment of Dynamic Process Systems. Berlin: Springer, 2013. 120 [31] Liu Z Y, Xiao F Y. An intuitionistic evidential method for weight determination in FMEA based on belief entropy. Entropy, 2019, 21(2): Article No. 211 doi: 10.3390/e21020211 [32] 周家红, 许开立, 陈志勇. 系统动态安全评价研究. 东北大学学报(自然科学版), 2008, 29(3): 416−419 doi: 10.3321/j.issn:1005-3026.2008.03.029Zhou Jia-Hong, Xu Kai-Li, Chen Zhi-Yong. On the dynamic assessment of system safety. Journal of Northeastern University (Natural Science), 2008, 29(3): 416−419 doi: 10.3321/j.issn:1005-3026.2008.03.029 [33] Holmberg J, Niemelae I. Risk Measures in Living Probabilistic Safety Assessment, VTT-PUB--146, Technical Research Centre of Finland, Finland, 1993. [34] Kančev D, Čepin M, Gjorgiev B. Development and application of a living probabilistic safety assessment tool: Multi-objective multi-dimensional optimization of surveillance requirements in NPPs considering their ageing. Reliability Engineering and System Safety, 2014, 131: 135−147 [35] Čepin M. The extended living probabilistic safety assessment. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 2020, 234(1): 183−192Čepin M. The extended living probabilistic safety assessment. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 2020, 234(1): 183−192 [36] Yang J, Yang M, Wang W L, Li F J. Online application of a risk management system for risk assessment and monitoring at NPPs. Nuclear Engineering and Design, 2016, 305: 200−212 doi: 10.1016/j.nucengdes.2016.05.025 [37] Zarei E, Azadeh A, Khakzad N, Aliabadi M M, Mohammadfam I. Dynamic safety assessment of natural gas stations using Bayesian network. Journal of Hazardous Materials, 2017, 321: 830−840 doi: 10.1016/j.jhazmat.2016.09.074 [38] Podofillini L, Zio E, Mercurio D, Dang V N. Dynamic safety assessment: Scenario identification via a possibilistic clustering approach. 
Reliability Engineering and System Safety, 2010, 95(5): 534−549 [39] International Electrotechnical Commission. International Electrotechnical Vocabulary Online Database (351-57-03, Risk), 2022.International Electrotechnical Commission. International Electrotechnical Vocabulary Online Database (351-57-03, Risk), 2022. [40] Vališ D. Contribution to reliability and safety assessment of systems. Safety and Reliability, 2007, 27(3): 23−35 doi: 10.1080/09617353.2007.11690840 [41] Siu N. Risk assessment for dynamic systems: An overview. Reliability Engineering and System Safety, 1994, 43(1): 43−73 [42] Moradi R, Groth K M. Modernizing risk assessment: A systematic integration of PRA and PHM techniques. Reliability Engineering and System Safety, 2020, 204: 107194.1−107194.11 [43] Hollnagel E. Safety-I and Safety-II: The Past and Future of Safety Management. CRC Press, 2018.Hollnagel E. Safety-I and Safety-II: The Past and Future of Safety Management. CRC Press, 2018. [44] Villa V, Paltrinieri N, Khan F, Cozzani V. Towards dynamic risk analysis: A review of the risk assessment approach and its limitations in the chemical process industry. Safety Science, 2016, 89: 77−93 doi: 10.1016/j.ssci.2016.06.002 [45] 何潇, 郭亚琦, 张召, 贾繁林, 周东华. 动态系统的主动故障诊断技术. 自动化学报, 2020, 46(8): 1557−1570He Xiao, Guo Ya-Qi, Zhang Zhao, Jia Fan-Lin, Zhou Dong-Hua. Active fault diagnosis for dynamic systems. Acta Automatica Sinica, 2020, 46(8): 1557−1570 [46] Hu S Q, Liu Z Y, Li M Y, He X. CADM +: Confusion-based learning framework with drift detection and adaptation for real-time safety assessment. IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2024.3369315 [47] Ditzler G, Roveri M, Alippi C, Polikar R. Learning in nonstationary environments: A survey. IEEE Computational Intelligence Magazine, 2015, 10(4): 12−25 doi: 10.1109/MCI.2015.2471196 [48] Ahmadi M, Israel A, Topcu U. Safety assessemt based on physically-viable data-driven models. 
In: Proceedings of the IEEE 56th Annual Conference on Decision and Control (CDC). Melbourne, VIC, Australia: IEEE, 2017. 6409−6414 [49] Knight J C. Safety critical systems: Challenges and directions. In: Proceedings of the 24th International Conference on Software Engineering. Orlando, FL, USA: IEEE, 2002. 547−550 [50] Rausand M. Reliability of Safety-critical Systems: Theory and Applications. Hoboken: Wiley Publishing, 2014. [51] Ames A D, Xu X R, Grizzle J W, Tabuada P. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 2017, 62(8): 3861−3876 doi: 10.1109/TAC.2016.2638961 [52] Clarke E M. Model checking. In: Proceedings of the 17th Conference on Foundations of Software Technology and Theoretical Computer Science. Kharagpur, India: Springer, 1997. 54−56 [53] Alur R, Dang T, Ivančić F. Progress on reachability analysis of hybrid systems using predicate abstraction. In: Proceedings of the 6th International Workshop on Hybrid Systems: Computation and Control. Prague, Czech Republic: Springer, 2003. 4−19 [54] Prajna S, Jadbabaie A, Pappas G J. Stochastic safety verification using barrier certificates. In: Proceedings of the 43rd IEEE Conference on Decision and Control (CDC). Nassau, Bahamas: IEEE, 2004. 929−934 [55] Prajna S, Rantzer A. On the necessity of barrier certificates. IFAC Proceedings Volumes, 2005, 38(1): 526−531 [56] Wang G B, Liu J, Sun H Y, Liu J, Ding Z H, Zhang M M. Safety verification of state/time-driven hybrid systems using barrier certificates. In: Proceedings of the 35th Chinese Control Conference (CCC). Chengdu, China: IEEE, 2016. 2483−2489 [57] Ames A D, Coogan S, Egerstedt M, Notomista G, Sreenath K, Tabuada P. Control barrier functions: Theory and applications. In: Proceedings of the 18th European Control Conference (ECC). Naples, Italy: IEEE, 2019. 3420−3431 [58] Xiao W, Cassandras C G, Belta C. Safe Autonomy With Control Barrier Functions: Theory and Applications. 
Cham: Springer, 2023. [59] Nguyen Q, Sreenath K. Exponential control barrier functions for enforcing high relative-degree safety-critical constraints. In: Proceedings of the American Control Conference (ACC). Boston, MA, USA: IEEE, 2016. 322−328 [60] Xiao W, Belta C. Control barrier functions for systems with high relative degree. In: Proceedings of the IEEE 58th Conference on Decision and Control (CDC). Nice, France: IEEE, 2019. 474−479 [61] Romdlony M Z, Jayawardhana B. Stabilization with guaranteed safety using control Lyapunov-barrier function. Automatica, 2016, 66: 39−47 doi: 10.1016/j.automatica.2015.12.011 [62] Xu X R, Tabuada P, Grizzle J W, Ames A D. Robustness of control barrier functions for safety critical control. IFAC-PapersOnLine, 2015, 48(27): 54−61 doi: 10.1016/j.ifacol.2015.11.152 [63] Zhu Z R, Chai Y, Yang Z M. A novel kind of sufficient conditions for safety judgement based on control barrier function. Science China Information Sciences, 2021, 64(9): Article No. 199205 doi: 10.1007/s11432-018-9840-6 [64] Zhu Z R, Chai Y, Yang Z M, Huang C H. Exponential-alpha safety criteria of a class of dynamic systems with barrier functions. IEEE/CAA Journal of Automatica Sinica, 2022, 9(11): 1939−1951 doi: 10.1109/JAS.2020.1003408 [65] Liu S M, Liu C L, Dolan J. Safe control under input limits with neural control barrier functions. In: Proceedings of the 6th Conference on Robot Learning. Auckland, New Zealand: PMLR, 2023. 1970−1980 [66] Zhang Z Y, Zhao Q C, Sun K L. A learning-based method for computing control barrier functions of nonlinear systems with control constraints. IEEE Robotics and Automation Letters, 2023, 8(7): 4259−4266 doi: 10.1109/LRA.2023.3281930 [67] Liu S H, Liu L J, Yu Z. Safe reinforcement learning for affine nonlinear systems with state constraints and input saturation using control barrier functions. Neurocomputing, 2023, 518: 562−576 doi: 10.1016/j.neucom.2022.11.006 [68] Bujorianu M L, Wisniewski R, Boulougouris E. 
p-safety and stability. IFAC-PapersOnLine, 2021, 54(9): 665−670 doi: 10.1016/j.ifacol.2021.06.127 [69] Wisniewski R, Bujorianu L M. Safety of stochastic systems: An analytic and computational approach. Automatica, 2021, 133: Article No. 109839 doi: 10.1016/j.automatica.2021.109839 [70] Wisniewski R, Bujorianu M L, Sloth C. p-safe analysis of stochastic hybrid processes. IEEE Transactions on Automatic Control, 2020, 65(12): 5220−5235 doi: 10.1109/TAC.2020.2972789 [71] Girard A. Controller synthesis for safety and reachability via approximate bisimulation. Automatica, 2012, 48(5): 947−953 doi: 10.1016/j.automatica.2012.02.037 [72] Xiang W M, Tran H D, Johnson T T. Output reachable set estimation for switched linear systems and its application in safety verification. IEEE Transactions on Automatic Control, 2017, 62(10): 5380−5387 doi: 10.1109/TAC.2017.2692100 [73] Schürmann B, Klischat M, Kochdumper N, Althoff M. Formal safety net control using backward reachability analysis. IEEE Transactions on Automatic Control, 2022, 67(11): 5698−5713 doi: 10.1109/TAC.2021.3124188 [74] 张燕, 周围, 丛培江. 基于模糊规则推理的大坝安全监测变形预测模型. 水电自动化与大坝监测, 2009, 33(2): 51−54Zhang Yan, Zhou Wei, Cong Pei-Jiang. Fuzzy inference-based deformation prediction model for dam safety monitoring. Hydropower Automation and Dam Monitoring, 2009, 33(2): 51−54 [75] Li G L, Zhou Z J, Hu C H, Chang L L, Zhou Z G, Zhao F J. A new safety assessment model for complex system based on the conditional generalized minimum variance and the belief rule base. Safety Science, 2017, 93: 108−120 doi: 10.1016/j.ssci.2016.11.011 [76] Li G L, Zhou Z J, Hu C H, Chang L L, Zhang H T, Yu C Q. An optimal safety assessment model for complex systems considering correlation and redundancy. International Journal of Approximate Reasoning, 2019, 104: 38−56 doi: 10.1016/j.ijar.2018.10.004 [77] Tang S W, Zhou Z J, Hu C H, Zhao F J, Cao Y. A new evidential reasoning rule-based safety assessment method with sensor reliability for complex systems. 
IEEE Transactions on Cybernetics, 2022, 52(5): 4027−4038 doi: 10.1109/TCYB.2020.3015664 [78] Liu Z Y, Deng Y, Zhang Y, Ding Z J, He X. Safety assessment of dynamic systems: An evidential group interaction-based fusion design. IEEE Transactions on Instrumentation and Measurement, 2021, 70: Article No. 3523014 [79] Zhou Z J, Feng Z C, Hu C H, Hu G Y, He W, Han X X. Aeronautical relay health state assessment model based on belief rule base with attribute reliability. Knowledge-Based Systems, 2020, 197: Article No. 105869 doi: 10.1016/j.knosys.2020.105869 [80] Tomin N V, Kurbatsky V G, Sidorov D N, Zhukov A V. Machine learning techniques for power system security assessment. IFAC-PapersOnLine, 2016, 49(27): 445−450 doi: 10.1016/j.ifacol.2016.10.773 [81] Wehenkel L, Pavella M. Decision tree approach to power systems security assessment. International Journal of Electrical Power and Energy Systems, 1993, 15(1): 13−36 [82] Krishnan V, McCalley J D, Henry S, Issad S. Efficient database generation for decision tree based power system security assessment. IEEE Transactions on Power Systems, 2011, 26(4): 2319−2327 doi: 10.1109/TPWRS.2011.2112784 [83] Hatziargyriou N D, Contaxis G C, Sideris N C. A decision tree method for on-line steady state security assessment. IEEE Transactions on Power Systems, 1994, 9(2): 1052−1061 doi: 10.1109/59.317626 [84] Nazarko P, Ziemiański L. Application of artificial neural networks in the damage identification of structural elements. Computer Assisted Methods in Engineering and Science, 2017, 18(3): 175−189 [85] Yu J B. A hybrid feature selection scheme and self-organizing map model for machine health assessment. Applied Soft Computing, 2011, 11(5): 4041−4054 doi: 10.1016/j.asoc.2011.03.026 [86] Bellizio F, Cremer J L, Sun M Y, Strbac G. A causality based feature selection approach for data-driven dynamic security assessment. Electric Power Systems Research, 2021, 201: Article No. 
107537 doi: 10.1016/j.jpgr.2021.107537 [87] Liu C X, Tang F, Bak C L. An accurate online dynamic security assessment scheme based on random forest. Energies, 2018, 11(7): Article No. 1914 doi: 10.3390/en11071914 [88] Liu S K, Liu L H, Yang N, Mao D, Zhang L, Cheng J Z, et al. A data-driven approach for online dynamic security assessment with spatial-temporal dynamic visualization using random bits forest. International Journal of Electrical Power and Energy Systems, 2021, 124: Article No. 106316 [89] Liu S K, Liu L H, Fan Y P, Zhang L, Huang Y H, Zhang T, et al. An integrated scheme for online dynamic security assessment based on partial mutual information and iterated random forest. IEEE Transactions on Smart Grid, 2020, 11(4): 3606−3619 doi: 10.1109/TSG.2020.2991335 [90] He M, Zhang J S, Vittal V. A data mining framework for online dynamic security assessment: Decision trees, boosting, and complexity analysis. In: Proceedings of the IEEE PES Innovative Smart Grid Technologies (ISGT). Washington, DC, USA: IEEE, 2012. 1−8 [91] Xu Y, Dong Z Y, Zhao J H, Zhang P, Wong K P. A reliable intelligent system for real-time dynamic security assessment of power systems. IEEE Transactions on Power Systems, 2012, 27(3): 1253−1263 doi: 10.1109/TPWRS.2012.2183899 [92] Liu R D, Verbič G, Xu Y. A new reliability-driven intelligent system for power system dynamic security assessment. In: Proceedings of the Australasian Universities Power Engineering Conference (AUPEC). Melbourne, VIC, Australia: IEEE, 2017. 1−6 [93] Rizwan-ul-Hassan, Li C G, Liu Y T. Online dynamic security assessment of wind integrated power system using SDAE with SVM ensemble boosting learner. International Journal of Electrical Power and Energy Systems, 2021, 125: Article No. 106429 [94] Sarmadi H, Entezami A, Razavi B S, Yuen K V. Ensemble learning-based structural health monitoring by Mahalanobis distance metrics. Structural Control and Health Monitoring, 2021, 28(2): Article No. 
e2663 [95] Dworakowski Z, Stepinski T, Dragan K, Jablonski A, Barszcz T. Ensemble ANN classifier for structural health monitoring. In: Proceedings of the 15th International Conference on Artificial Intelligence and Soft Computing. Zakopane, Poland: Springer, 2016. 81−90 [96] Liu T J, Liu Y B, Liu J Y, Wang L F, Xu L X, Qiu G, et al. A Bayesian learning based scheme for online dynamic security assessment and preventive control. IEEE Transactions on Power Systems, 2020, 35(5): 4088−4099 doi: 10.1109/TPWRS.2020.2983477 [97] He M, Vittal V, Zhang J S. Online dynamic security assessment with missing PMU measurements: A data mining approach. IEEE Transactions on Power Systems, 2013, 28(2): 1969−1977 doi: 10.1109/TPWRS.2013.2246822 [98] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436−444 doi: 10.1038/nature14539 [99] Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, 2018, 2018(1): Article No. 7068349 [100] Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D. Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(7): 3523−3542 [101] Ayodeji A, Amidu M A, Olatubosun S A, Addad Y, Ahmed H. Deep learning for safety assessment of nuclear power reactors: Reliability, explainability, and research opportunities. Progress in Nuclear Energy, 2022, 151: Article No. 104339 doi: 10.1016/j.pnucene.2022.104339 [102] Ye X W, Jin T, Yun C B. A review on deep learning-based structural health monitoring of civil infrastructures. Smart Structures and Systems, 2019, 24(5): 567−585 [103] Liu C, Zhang Y, He X. Expert-augmented data-driven safety level assessment scheme with incremental learning. In: Proceedings of the CAA Symposium on Fault Detection, Supervision, and Safety for Technical Processes (SAFEPROCESS). Chengdu, China: IEEE, 2021. 1−6 [104] Alawad H, Kaewunruen S, An M. 
A deep learning approach towards railway safety risk assessment. IEEE Access, 2020, 8: 102811−102832 doi: 10.1109/ACCESS.2020.2997946 [105] Li Z W, Liu F, Yang W J, Peng S H, Zhou J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 6999−7019 doi: 10.1109/TNNLS.2021.3084827 [106] Sun M Y, Konstantelos I, Strbac G. A deep learning-based feature extraction framework for system security assessment. IEEE Transactions on Smart Grid, 2019, 10(5): 5007−5020 doi: 10.1109/TSG.2018.2873001 [107] Sarkar S, Reddy K K, Giering M, Gurvich M R. Deep learning for structural health monitoring: A damage characterization application. Annual Conference of the PHM Society, 2016, 8(1): 1−7 [108] Azimi M, Pekcan G. Structural health monitoring using extremely compressed data through deep learning. Computer-Aided Civil and Infrastructure Engineering, 2020, 35(6): 597−614 doi: 10.1111/mice.12517 [109] Ren C, Xu Y. A fully data-driven method based on generative adversarial networks for power system dynamic security assessment with missing data. IEEE Transactions on Power Systems, 2019, 34(6): 5044−5052 doi: 10.1109/TPWRS.2019.2922671 [110] Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath A A. Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 2018, 35(1): 53−65 doi: 10.1109/MSP.2017.2765202 [111] Warnecke A, Arp D, Wressnegger C, Rieck K. Evaluating explanation methods for deep learning in security. In: Proceedings of the IEEE European Symposium on Security and Privacy (EuroS&P). Genoa, Italy: IEEE, 2020. 158−174 [112] Guo W B, Mu D L, Xu J, Su P R, Wang G, Xing X Y. LEMNA: Explaining deep learning based security applications. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. Toronto, Canada: ACM, 2018. 367−379 [113] Liu C, Zhang Y, Ding Z J, He X. 
Active incremental learning for health state assessment of dynamic systems with unknown scenarios. IEEE Transactions on Industrial Informatics, 2023, 19(2): 1863−1873 doi: 10.1109/TII.2022.3181187 [114] Liu C, He X, Li M Y, Zhang Y, Ding Z J. Active labeling aided semi-supervised safety assessment with task-related unknown scenarios. IEEE Transactions on Reliability, 2024, 73(4): 1792−1804 [115] Xiao W, Cassandras C G, Belta C. Adaptive control barrier functions. Safe Autonomy With Control Barrier Functions: Theory and Applications. Cham: Springer, 2023. 73−94 [116] Dhiman V, Khojasteh M J, Franceschetti M, Atanasov N. Control barriers in Bayesian learning of system dynamics. IEEE Transactions on Automatic Control, 2023, 68(1): 214−229 doi: 10.1109/TAC.2021.3137059 [117] Taylor A J, Ames A D. Adaptive safety with control barrier functions. In: Proceedings of the American Control Conference (ACC). Denver, CO, USA: IEEE, 2020. 1399−1405 [118] Lopez B T, Slotine J J E, How J P. Robust adaptive control barrier functions: An adaptive and data-driven approach to safety. IEEE Control Systems Letters, 2021, 5(3): 1031−1036 doi: 10.1109/LCSYS.2020.3005923 [119] Xiao W, Belta C, Cassandras C G. Adaptive control barrier functions. IEEE Transactions on Automatic Control, 2022, 67(5): 2267−2281 doi: 10.1109/TAC.2021.3074895 [120] Xiao W, Wang T H, Hasani R, Chahine M, Amini A, Li X, et al. BarrierNet: Differentiable control barrier functions for learning of safe robot control. IEEE Transactions on Robotics, 2023, 39(3): 2289−2307 doi: 10.1109/TRO.2023.3249564 [121] Hu J Q, Zhang L B, Liang W. An adaptive online safety assessment method for mechanical system with pre-warning function. Safety Science, 2012, 50(3): 385−399 doi: 10.1016/j.ssci.2011.09.018 [122] Zhao Fu-Jun, Zhou Zhi-Jie, Hu Chang-Hua, Chang Lei-Lei, Wang Li. Online safety assessment method based on evidential reasoning for dynamic systems.
Acta Automatica Sinica, 2017, 43(11): 1950−1961 [123] Zhao F J, Zhou Z J, Hu C H, Chang L L, Zhou Z G, Li G L. A new evidential reasoning-based method for online safety assessment of complex systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, 48(6): 954−966 doi: 10.1109/TSMC.2016.2630800 [124] Feng Z C, He W, Zhou Z J, Ban X J, Hu C H, Han X X. A new safety assessment method based on belief rule base with attribute reliability. IEEE/CAA Journal of Automatica Sinica, 2021, 8(11): 1774−1785 doi: 10.1109/JAS.2020.1003399 [125] Zhao F J, Zhou Z J, Hu C H, Cao Y, Han X X, Feng Z C. A new safety assessment method based on evidential reasoning rule with a prewarning function. IEEE Access, 2018, 6: 31862−31871 doi: 10.1109/ACCESS.2018.2815631 [126] Wenzel H. Monitoring based risk assessment and asset management of civil infrastructures. In: Proceedings of the Structural Health Monitoring. 2019. [127] Adumene S, Islam R, Amin T, Nitonye S, Yazdi M, Johnson K T. Advances in nuclear power system design and fault-based condition monitoring towards safety of nuclear-powered ships. Ocean Engineering, 2022, 251: Article No. 111156 doi: 10.1016/j.oceaneng.2022.111156 [128] Compare M, Martini F, Mattafirri S, Carlevaro F, Zio E. Semi-Markov model for the oxidation degradation mechanism in gas turbine nozzles. IEEE Transactions on Reliability, 2016, 65(2): 574−581 doi: 10.1109/TR.2015.2506610 [129] Zadakbar O, Imtiaz S, Khan F. Dynamic risk assessment and fault detection using a multivariate technique. Process Safety Progress, 2013, 32(4): 365−375 doi: 10.1002/prs.11609 [130] Yu H Y, Khan F, Garaniya V, Ahmad A. Self-organizing map based fault diagnosis technique for non-Gaussian processes. Industrial and Engineering Chemistry Research, 2014, 53(21): 8831−8843 [131] Wang H Z, Khan F, Ahmed S, Imtiaz S.
Dynamic quantitative operational risk assessment of chemical processes. Chemical Engineering Science, 2016, 142: 62−78 doi: 10.1016/j.ces.2015.11.034 [132] Zeng Z G, Zio E. Dynamic risk assessment based on statistical failure data and condition-monitoring degradation data. IEEE Transactions on Reliability, 2018, 67(2): 609−622 doi: 10.1109/TR.2017.2778804 [133] Zio E. Prognostics and Health Management (PHM): Where are we and where do we (need to) go in theory and practice. Reliability Engineering and System Safety, 2022, 218: Article No. 108119 [134] Hu Y, Miao X W, Si Y, Pan E S, Zio E. Prognostics and health management: A review from the perspectives of design, development and decision. Reliability Engineering and System Safety, 2022, 217: Article No. 108063 [135] Zhao S, Makis V, Chen S W, Li Y. Health assessment method for electronic components subject to condition monitoring and hard failure. IEEE Transactions on Instrumentation and Measurement, 2019, 68(1): 138−150 doi: 10.1109/TIM.2018.2839938 [136] Dehghanian P, Guan Y F, Kezunovic M. Real-time life-cycle assessment of high-voltage circuit breakers for maintenance using online condition monitoring data. IEEE Transactions on Industry Applications, 2019, 55(2): 1135−1146 doi: 10.1109/TIA.2018.2878746 [137] Kim H, Lee S H, Park J S, Kim H, Chang Y S, Heo G. Reliability data update using condition monitoring and prognostics in probabilistic safety assessment. Nuclear Engineering and Technology, 2015, 47(2): 204−211 doi: 10.1016/j.net.2014.12.008 [138] BahooToroody A, Abaei M M, BahooToroody F, de Carlo F, Abbassi R, Khalaj S. A condition monitoring based signal filtering approach for dynamic time dependent safety assessment of natural gas distribution process. Process Safety and Environmental Protection, 2019, 123: 335−343 doi: 10.1016/j.psep.2019.01.016 [139] Xing J D, Zeng Z G, Zio E. A framework for dynamic risk assessment with condition monitoring data and inspection data. 
Reliability Engineering and System Safety, 2019, 191: Article No. 106552 [140] Ni M, McCalley J D, Vittal V, Tayyib T. Online risk-based security assessment. IEEE Transactions on Power Systems, 2003, 18(1): 258−265 doi: 10.1109/TPWRS.2002.807091 [141] Li H F, Diao R S, Zhang X H, Lin X, Lu X, Shi D, et al. An integrated online dynamic security assessment system for improved situational awareness and economic operation. IEEE Access, 2019, 7: 162571−162582 doi: 10.1109/ACCESS.2019.2952178 [142] Tchernykh A, Babenko M, Chervyakov N, Miranda-López V, Avetisyan A, Drozdov A Y, et al. Scalable data storage design for nonstationary IoT environment with adaptive security and reliability. IEEE Internet of Things Journal, 2020, 7(10): 10171−10188 doi: 10.1109/JIOT.2020.2981276 [143] Sobajic D J, Pao Y H. Artificial neural-net based dynamic security assessment for electric power systems. IEEE Transactions on Power Systems, 1989, 4(1): 220−228 doi: 10.1109/59.32481 [144] Sun K, Likhate S, Vittal V, Kolluri V S, Mandal S. An online dynamic security assessment scheme using phasor measurements and decision trees. IEEE Transactions on Power Systems, 2007, 22(4): 1935−1943 doi: 10.1109/TPWRS.2007.908476 [145] Diao R S, Sun K, Vittal V, O'Keefe R J, Richardson M R, Bhatt N, et al. Decision tree-based online voltage security assessment using PMU measurements. IEEE Transactions on Power Systems, 2009, 24(2): 832−839 doi: 10.1109/TPWRS.2009.2016528 [146] He M, Zhang J S, Vittal V. Robust online dynamic security assessment using adaptive ensemble decision-tree learning. IEEE Transactions on Power Systems, 2013, 28(4): 4089−4098 doi: 10.1109/TPWRS.2013.2266617 [147] Zhang R, Xu Y. Data-driven dynamic security assessment and control of power systems: An online sequential learning method. Journal of Energy Engineering, 2019, 145(5): Article No. 04019019 [148] Zhai C, Nguyen H D, Zong X F. 
Dynamic security assessment of small-signal stability for power grids using windowed online Gaussian process. IEEE Transactions on Automation Science and Engineering, 2023, 20(2): 1170−1179 doi: 10.1109/TASE.2022.3173368 [149] Singh M, Chauhan S. A hybrid-extreme learning machine based ensemble method for online dynamic security assessment of power systems. Electric Power Systems Research, 2023, 214: Article No. 108923 doi: 10.1016/j.epsr.2022.108923 [150] Liu Z Y, Zhang Y, Ding Z J, He X. An online active broad learning approach for real-time safety assessment of dynamic systems in nonstationary environments. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(10): 6714−6724 doi: 10.1109/TNNLS.2022.3222265 [151] Liu Z Y, He X. Real-time safety assessment for dynamic systems with limited memory and annotations. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(9): 10076−10086 doi: 10.1109/TITS.2023.3266256 [152] Liu Z Y, He X. Dynamic submodular-based learning strategy in imbalanced drifting streams for real-time safety assessment in nonstationary environments. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3038−3051 doi: 10.1109/TNNLS.2023.3294788 [153] He X, Liu Z Y. Dynamic model interpretation-guided online active learning scheme for real-time safety assessment. IEEE Transactions on Cybernetics, 2024, 54(5): 2734−2745 doi: 10.1109/TCYB.2023.3339242 [154] Li A Q, Ding Y L, Wang H, Guo T. Analysis and assessment of bridge health monitoring mass data: Progress in research/development of “Structural Health Monitoring”. Science China Technological Sciences, 2012, 55(8): 2212−2224 doi: 10.1007/s11431-012-4818-5 [155] Bao Y Q, Beck J L, Li H. Compressive sampling for accelerometer signals in structural health monitoring. Structural Health Monitoring, 2011, 10(3): 235−246 doi: 10.1177/1475921710373287 [156] Bao Y Q, Tang Z Y, Li H.
Compressive-sensing data reconstruction for structural health monitoring: A machine-learning approach. Structural Health Monitoring, 2020, 19(1): 293−304 doi: 10.1177/1475921719844039 [157] Harshitha C, Alapati M, Chikkakrishna N K. Damage detection of structural members using internet of things (IoT) paradigm. Materials Today: Proceedings, 2021, 43: 2337−2341 doi: 10.1016/j.matpr.2021.01.679 [158] Abdelgawad A, Yelamarthi K. Internet of things (IoT) platform for structure health monitoring. Wireless Communications and Mobile Computing, 2017, 2017(1): Article No. 6560797 [159] Goulet J A, Michel C, der Kiureghian A. Data-driven post-earthquake rapid structural safety assessment. Earthquake Engineering and Structural Dynamics, 2015, 44(4): 549−562 [160] Catelani M, Ciani L, Galar D, Patrizi G. Optimizing maintenance policies for a yaw system using reliability-centered maintenance and data-driven condition monitoring. IEEE Transactions on Instrumentation and Measurement, 2020, 69(9): 6241−6249 doi: 10.1109/TIM.2020.2968160 [161] Nyman J, Rosengren P, Kool P, Karoumi R, Leander J, Petursson H. Smart condition monitoring of a steel bascule railway bridge. Life-Cycle of Structures and Infrastructure Systems. London: CRC Press, 2023. 229−236 [162] Bandara R P, Chan T H T, Thambiratnam D P. Structural damage detection method using frequency response functions. Structural Health Monitoring, 2014, 13(4): 418−429 doi: 10.1177/1475921714522847 [163] Avci O, Abdeljaber O, Kiranyaz S, Inman D. Structural damage detection in real time: Implementation of 1D convolutional neural networks for SHM applications. Structural Health Monitoring and Damage Detection, Volume 7. Cham: Springer, 2017. 49−54 [164] Entezami A, Shariatmadar H. Structural health monitoring by a new hybrid feature extraction and dynamic time warping methods under ambient vibration and non-stationary signals. 
Measurement, 2019, 134: 548−568 doi: 10.1016/j.measurement.2018.10.095 [165] Avendano-Valencia L D, Spiridonakos M D, Fassois S D. In-operation identification of a wind turbine structure via non-stationary parametric models. In: Proceedings of the 8th International Workshop on Structural Health Monitoring. Stanford, CA, USA: Stanford University, 2011. Article No. 2611 [166] Xu C, Ni Y Q, Wang Y W. A novel Bayesian blind source separation approach for extracting non-stationary and discontinuous components from structural health monitoring data. Engineering Structures, 2022, 269: Article No. 114837 doi: 10.1016/j.engstruct.2022.114837 [167] Ye X W, Xi P S, Su Y H. Analysis of non-stationary wind characteristics at an arch bridge using structural health monitoring data. Journal of Civil Structural Health Monitoring, 2017, 7(4): 573−587 doi: 10.1007/s13349-017-0244-5 [168] Hua X, Xiao F, Chen G S, Zatar W, Hulsey L. Stochastic non-stationary characteristics of vehicle-induced bridge vibrations. Journal of Low Frequency Noise, Vibration and Active Control, 2023, 42(2): 759−770 doi: 10.1177/14613484221141800 [169] Klischat M, Althoff M. Generating critical test scenarios for automated vehicles with evolutionary algorithms. In: Proceedings of the IEEE Intelligent Vehicles Symposium (IV). Paris, France: IEEE, 2019. 2352−2358 [170] Feng S, Sun H W, Yan X T, Zhu H J, Zou Z X, Shen S Y, et al. Dense reinforcement learning for safety validation of autonomous vehicles. Nature, 2023, 615(7953): 620−627 doi: 10.1038/s41586-023-05732-2 [171] Krajewski R, Moers T, Nerger D, Eckstein L. Data-driven maneuver modeling using generative adversarial networks and variational autoencoders for safety validation of highly automated vehicles. In: Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, HI, USA: IEEE, 2018. 2383−2390 [172] Jenkins I R, Gee L O, Knauss A, Yin H, Schroeder J. Accident scenario generation with recurrent neural networks. 
In: Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, HI, USA: IEEE, 2018. 3340−3345 [173] Wang C, Storms K, Winner H. Online safety assessment of automated vehicles using silent testing. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 13069−13083 doi: 10.1109/TITS.2021.3119546 [174] Åsljung D, Nilsson J, Fredriksson J. Using extreme value theory for vehicle level safety validation and implications for autonomous vehicles. IEEE Transactions on Intelligent Vehicles, 2017, 2(4): 288−297 doi: 10.1109/TIV.2017.2768219 [175] Liu Z Y, He X. Contrastive preference-guided active learning approach based on ranking correlation for real-time safety assessment. IEEE Transactions on Automation Science and Engineering, DOI: 10.1109/TASE.2024.3401470 [176] Fouad A A, Vekataraman S, Davis J A. An expert system for security trend analysis of a stability-limited power system. IEEE Transactions on Power Systems, 1991, 6(3): 1077−1084 doi: 10.1109/59.119249 [177] Wehenkel L, van Cutsem T, Ribbens-Pavella M. An artificial intelligence framework for online transient stability assessment of power systems. IEEE Transactions on Power Systems, 1989, 4(2): 789−800 doi: 10.1109/59.193853 [178] Rovnyak S, Kretsinger S, Thorp J, Brown D. Decision trees for real-time transient stability prediction. IEEE Transactions on Power Systems, 1994, 9(3): 1417−1426 doi: 10.1109/59.336122 [179] Kamwa I, Grondin R, Loud L. Time-varying contingency screening for dynamic security assessment using intelligent-systems techniques. IEEE Transactions on Power Systems, 2001, 16(3): 526−536 doi: 10.1109/59.932291 [180] Diao R S, Vittal V, Logic N. Design of a real-time security assessment tool for situational awareness enhancement in modern power systems. IEEE Transactions on Power Systems, 2010, 25(2): 957−965 doi: 10.1109/TPWRS.2009.2035507 [181] Liu D, Niu D X, Wang H, Fan L L. 
Short-term wind speed forecasting using wavelet transform and support vector machines optimized by genetic algorithm. Renewable Energy, 2014, 62: 592−597 doi: 10.1016/j.renene.2013.08.011 [182] Sui Y, Ding R, Wang H Q. A novel approach for occupational health and safety and environment risk assessment for nuclear power plant construction project. Journal of Cleaner Production, 2020, 258: Article No. 120945 doi: 10.1016/j.jclepro.2020.120945 [183] Shin J, Son H, Heo G. Cyber security risk evaluation of a nuclear I&C using BN and ET. Nuclear Engineering and Technology, 2017, 49(3): 517−524 doi: 10.1016/j.net.2016.11.004 [184] Jang K B, Baek C H, Woo T H. Assessment for nuclear security using analytic hierarchy process (AHP) incorporated with neural networking method in nuclear power plants (NPPs). Kerntechnik, 2022, 87(5): 607−614 doi: 10.1515/kern-2022-0040 [185] Cohn B, Noel T, Cardoni J, Haskin T, Osborn D, Aldemir T. Integrated safety and security analysis of nuclear power plants using dynamic event trees. Nuclear Science and Engineering, 2023, 197(sup1): S45−S56 doi: 10.1080/00295639.2023.2177076 [186] Yockey P, Erickson A, Spirito C. Cyber threat assessment of machine learning driven autonomous control systems of nuclear power plants. Progress in Nuclear Energy, 2023, 166: Article No. 104960 doi: 10.1016/j.pnucene.2023.104960 [187] Bajpai S, Sachdeva A, Gupta J P. Security risk assessment: Applying the concepts of fuzzy logic. Journal of Hazardous Materials, 2010, 173(1−3): 258−264 doi: 10.1016/j.jhazmat.2009.08.078 [188] Zhou J F, Reniers G, Zhang L B. A weighted fuzzy Petri-net based approach for security risk assessment in the chemical industry. Chemical Engineering Science, 2017, 174: 136−145 doi: 10.1016/j.ces.2017.09.002 [189] Peng T, Li C, Zhou X B. Application of machine learning to laboratory safety management assessment. Safety Science, 2019, 120: 263−267 doi: 10.1016/j.ssci.2019.07.007 [190] Gao Y C, Zhang J C, Cui S X, Wu Y Q, Huang M L, Zhuang S L. 
Machine learning-based QSAR for safety evaluation of environmental chemicals. QSAR in Safety Evaluation and Risk Assessment. Academic Press, 2024. 89−99 [191] Wang Z H, Wen H Q, Su Y, Shen W F, Ren J Z, Ma Y J, et al. Insights into ensemble learning-based data-driven model for safety-related property of chemical substances. Chemical Engineering Science, 2022, 248: Article No. 117219 doi: 10.1016/j.ces.2021.117219 [192] Amin T, Khan F. Dynamic process safety assessment using adaptive Bayesian network with loss function. Industrial and Engineering Chemistry Research, 2022, 61(45): 16799−16814 [193] Hu Chang-Hua, Feng Zhi-Chao, Zhou Zhi-Jie, Hu Guan-Yu, He Wei, Cao You. A safety assessment method for a liquid launch rocket based on the belief rule base with environmental disturbance. Scientia Sinica Informationis, 2020, 50(10): 1559−1573 doi: 10.1360/SSI-2019-0148 [194] Li Q Y, Wu Q G, Tu H Y, Zhang J P, Zou X, Huang S. Ground risk assessment for unmanned aircraft focusing on multiple risk sources in urban environments. Processes, 2023, 11(2): Article No. 542 doi: 10.3390/pr11020542 [195] Tabassum A, Sabatini R, Gardi A. Probabilistic safety assessment for UAS separation assurance and collision avoidance systems. Aerospace, 2019, 6(2): Article No. 19 doi: 10.3390/aerospace6020019 [196] Jiao R H, Peng K X, Zhang K, Ma L, Pi Y T. A novel scheme for remaining useful life prediction and safety assessment based on hybrid method. In: Proceedings of the CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS). Xiamen, China: IEEE, 2019. 395−400 [197] Dheedan A A.
On-line safety monitor based on a safety assessment model and hierarchical deployment of a multi-agent system. International Journal on Advances in Internet Technology, 2012, 5(3−4): 95−113 [198] Wang W X, Li X M, Xie L F, Lv H B, Lv Z H. Unmanned aircraft system airspace structure and safety measures based on spatial digital twins. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(3): 2809−2818 doi: 10.1109/TITS.2021.3108995 [199] Farrar C R, Worden K. An introduction to structural health monitoring. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2007, 365(1851): 303−315 doi: 10.1098/rsta.2006.1928 [200] Gordan M, Sabbagh-Yazdi S R, Ismail Z, Ghaedi K, Carroll P, McCrum D, et al. State-of-the-art review on advancements of data mining in structural health monitoring. Measurement, 2022, 193: Article No. 110939 doi: 10.1016/j.measurement.2022.110939 [201] Cury A, Ribeiro D, Ubertini F, Todd M D. Structural Health Monitoring Based on Data Science Techniques. Cham: Springer, 2022. [202] Ko J M, Ni Y Q. Technology developments in structural health monitoring of large-scale bridges. Engineering Structures, 2005, 27(12): 1715−1725 doi: 10.1016/j.engstruct.2005.02.021 [203] He Z G, Li W T, Salehi H, Zhang H, Zhou H Y, Jiao P C. Integrated structural health monitoring in bridge engineering. Automation in Construction, 2022, 136: Article No. 104168 doi: 10.1016/j.autcon.2022.104168 [204] Niyirora R, Ji W, Masengesho E, Munyaneza J, Niyonyungu F, Nyirandayisabye R. Intelligent damage diagnosis in bridges using vibration-based monitoring approaches and machine learning: A systematic review. Results in Engineering, 2022, 16: Article No. 100761 doi: 10.1016/j.rineng.2022.100761 [205] Kaartinen E, Dunphy K, Sadhu A. LiDAR-based structural health monitoring: Applications in civil infrastructure systems. Sensors, 2022, 22(12): Article No. 4610 doi: 10.3390/s22124610 [206] Donoho D L. Compressed sensing. 
IEEE Transactions on Information Theory, 2006, 52(4): 1289−1306 doi: 10.1109/TIT.2006.871582 [207] Stepinski T, Uhl T, Staszewski W. Advanced Structural Damage Detection: From Theory to Engineering Applications. West Sussex: John Wiley & Sons, 2013. [208] Feng D M, Feng M Q. Computer vision for SHM of civil infrastructure: From dynamic response measurement to damage detection: A review. Engineering Structures, 2018, 156: 105−117 doi: 10.1016/j.engstruct.2017.11.018 [209] Worden K, Baldacchino T, Rowson J, Cross E J. Some recent developments in SHM based on nonstationary time series analysis. Proceedings of the IEEE, 2016, 104(8): 1589−1603 doi: 10.1109/JPROC.2016.2573596 [210] Worden K, Iakovidis I, Cross E J. New results for the ADF statistic in nonstationary signal analysis with a view towards structural health monitoring. Mechanical Systems and Signal Processing, 2021, 146: Article No. 106979 doi: 10.1016/j.ymssp.2020.106979 [211] Tarko A P. Surrogate measures of safety. Safe Mobility: Challenges, Methodology and Solutions. Leeds: Emerald Publishing Limited, 2018. 383−405 [212] Ning Bin. A number of scientific and technical problems in intelligent transportation. Scientia Sinica Informationis, 2018, 48(9): 1264−1269 doi: 10.1360/N112018-00080 [213] Arun A, Haque M, Bhaskar A, Washington S, Sayed T. A systematic mapping review of surrogate safety assessment using traffic conflict techniques. Accident Analysis and Prevention, 2021, 153: Article No. 106016 [214] Riedmaier S, Ponn T, Ludwig D, Schick B, Diermeyer F. Survey on scenario-based safety assessment of automated vehicles. IEEE Access, 2020, 8: 87456−87477 doi: 10.1109/ACCESS.2020.2993730 [215] Rassafi A A, Ganji S S, Pourkhani H. Road safety assessment under uncertainty using a multi attribute decision analysis based on Dempster-Shafer theory.
KSCE Journal of Civil Engineering, 2018, 22(8): 3137−3152 doi: 10.1007/s12205-017-1854-5 [216] de Leur P, Sayed T. Development of a road safety risk index. Transportation Research Record: Journal of the Transportation Research Board, 2002, 1784(1): 33−42 doi: 10.3141/1784-05 [217] Wang W S, Wang L T, Zhang C Y, Liu C L, Sun L J. Social interactions for autonomous driving: A review and perspectives. Foundations and Trends® in Robotics, 2022, 10(3−4): 198−376 [218] Morison K, Wang L, Kundur P. Power system security assessment. IEEE Power and Energy Magazine, 2004, 2(5): 30−39 doi: 10.1109/MPAE.2004.1338120 [219] Grigsby L L. Dynamic security assessment. Power System Stability and Control. Boca Raton: CRC Press, 2007. 421−430 [220] Alimi O A, Ouahada K, Abu-Mahfouz A M. A review of machine learning approaches to power system security and stability. IEEE Access, 2020, 8: 113512−113531 doi: 10.1109/ACCESS.2020.3003568 [221] Fouad A A, Vittal V. Power System Transient Stability Analysis Using the Transient Energy Function Method. Englewood: Pearson, 1992. [222] Bellizio F, Cremer J L, Strbac G. Machine-learned security assessment for changing system topologies. International Journal of Electrical Power and Energy Systems, 2022, 134: Article No. 107380 [223] Li Q Q, Xu Y, Ren C, Zhao J H. A hybrid data-driven method for online power system dynamic security assessment with incomplete PMU measurements. In: Proceedings of the IEEE Power and Energy Society General Meeting (PESGM). Atlanta, GA, USA: IEEE, 2019. 1−5 [224] Makarov Y V, Du P W, Lu S, Nguyen T B, Guo X X, Burns J W, et al. PMU-based wide-area security assessment: Concept, method, and implementation. IEEE Transactions on Smart Grid, 2012, 3(3): 1325−1332 doi: 10.1109/TSG.2012.2193145 [225] Jardim J L. Online dynamic security assessment. Real-Time Stability in Power Systems: Techniques for Early Detection of the Risk of Blackout. Cham: Springer, 2014. 159−197 [226] Vaahedi E, Mansour Y, Tse E K. 
A general purpose method for on-line dynamic security assessment. IEEE Transactions on Power Systems, 1998, 13(1): 243−249 doi: 10.1109/59.651642 [227] Zhang Y, Xie L. Online dynamic security assessment of microgrid interconnections in smart distribution systems. IEEE Transactions on Power Systems, 2015, 30(6): 3246−3254 doi: 10.1109/TPWRS.2014.2374876 [228] Zhang Y C, Xu Y, Bu S Q, Dong Z, Zhang R. Online power system dynamic security assessment with incomplete PMU measurements: A robust white-box model. IET Generation, Transmission and Distribution, 2019, 13(5): 662−668 [229] Liu R D, Verbič G, Ma J. A new dynamic security assessment framework based on semi-supervised learning and data editing. Electric Power Systems Research, 2019, 172: 221−229 doi: 10.1016/j.epsr.2019.03.009 [230] Zhang Y Q, Zhao Q, Tan B D, Yang J. A power system transient stability assessment method based on active learning. The Journal of Engineering, 2021, 2021(11): 715−723 doi: 10.1049/tje2.12068 [231] Ren C, Xu Y. Transfer learning-based power system online dynamic security assessment: Using one model to assess many unlearned faults. IEEE Transactions on Power Systems, 2020, 35(1): 821−824 doi: 10.1109/TPWRS.2019.2947781 [232] Čepin M. Event tree analysis. Assessment of Power System Reliability: Methods and Applications. London: Springer, 2011. 89−99 [233] Bajpai S, Gupta J P. Site security for chemical process industries. Journal of Loss Prevention in the Process Industries, 2005, 18(4−6): 301−309 doi: 10.1016/j.jlp.2005.06.011