Research Progress on Learning-based Robust Adaptive Critic Control

WANG Ding

Citation: WANG Ding. Research Progress on Learning-based Robust Adaptive Critic Control. ACTA AUTOMATICA SINICA, 2019, 45(6): 1031-1043. doi: 10.16383/j.aas.c170701
Funds:

Beijing Natural Science Foundation 4162065

National Natural Science Foundation of China 61773373

More Information
    Author Bio:

    WANG Ding   Professor at the Faculty of Information Technology, Beijing University of Technology. He received his master's degree in operations research and cybernetics from Northeastern University, Shenyang, China, in 2009 and his Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2012. His research interests cover adaptive and learning systems, computational intelligence, and intelligent control. E-mail: dingwang@bjut.edu.cn

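  • Abstract: In machine learning, a core technology of artificial intelligence, reinforcement learning is a class of methods that emphasize learning through the interaction between a machine and its environment. One of its important branches, the adaptive critic technique, is closely related to dynamic programming and optimal design. To solve optimal control problems of complex dynamic systems effectively, the adaptive dynamic programming approach, which combines adaptive critics, dynamic programming, and artificial neural networks, has received wide attention. In particular, great progress has been made in robust adaptive critic control under uncertainties and external disturbances, and the approach is regarded as a necessary route to building intelligent learning systems and achieving true brain-like intelligence. This paper surveys the theory and main methods of learning-based robust adaptive critic control, including self-learning robust stabilization, adaptive trajectory tracking, event-triggered robust control, and adaptive $H_{\infty}$ control design, and covers the analysis of stability, convergence, optimality, and robustness of adaptive critic systems. In connection with new technologies such as artificial intelligence, big data, deep learning, and knowledge automation, the prospects of robust adaptive critic control are also discussed.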
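  • In recent years, multi-sensor information fusion has received wide attention. Classical Kalman filtering requires known model parameters and noise statistics[1]. In practice, however, the model parameters or noise statistics of a system may be unknown. References [2-4] studied self-tuning estimation for systems with unknown noise variances. References [5] and [6] studied self-tuning fusion estimation for systems with unknown model parameters and noise variances. For multi-sensor systems with unknown model parameters, unknown noise statistics, and correlated noises, reference [7] applied the RELS algorithm and the Gevers-Wouters algorithm to identify the unknown model parameters and noise statistics, respectively, and proposed a self-tuning fusion estimator. In [5-7], the unknown model parameters are fused by weighted averaging to obtain the final fused identifier. This approach ignores the differences among the local parameter identifications of different sensors and cannot guarantee that the fused parameter estimate is more accurate than every local parameter estimate. Moreover, the identification and estimation algorithms in the above references all rely on complete sensor measurements and do not consider incomplete data.

    In practical networked systems or sensor networks, sensor aging or faults and the limited bandwidth of communication channels may cause measurement data to be lost, faded, or delayed, so that the sensor data received by the estimator are incomplete[8]. Reference [9] studied optimal estimation for systems with multiple packet dropouts. Reference [10] extended the single-sensor system of [9] to multi-sensor systems and proposed centralized and distributed fusion estimators for multi-sensor systems with different packet dropout rates. References [11-14] designed optimal estimators that account for data loss and delay during transmission. Among them, [13] built on [12] to study optimal estimation for systems with random multiplicative noises, multiple packet dropouts, and delays, and [14] considered the case where the process noise and measurement noise are one-step autocorrelated and cross-correlated. References [15] and [16] designed distributed fusion estimators for multi-sensor systems with packet dropouts using the covariance information method. In [9-17], data loss is described by a set of Bernoulli-distributed random variables; such data loss can be regarded as a special case of fading measurements. References [18] and [19] considered optimal and suboptimal estimation for systems with random parameter matrices, correlated noises, and fading measurements. Reference [20] studied Kalman filtering for systems with fading measurements and analyzed the boundedness and steady-state properties of the error covariance. References [21-22] designed multi-sensor distributed and sequential fusion estimators, respectively, for stochastic uncertain systems with fading measurements. All of these optimal estimation results assume known measurement loss or fading rates; self-tuning estimation with unknown loss or fading rates is not considered. So far, few results have been reported on self-tuning fusion estimation for multi-sensor systems with both unknown model parameters and unknown fading measurement rates.

    Based on the above literature analysis, this paper considers multi-sensor stochastic systems with unknown model parameters and unknown fading measurement rates. Correlation functions and the recursive extended least squares (RELS) algorithm are applied to identify online the mathematical expectations and variances of the fading measurements and the model parameters, respectively. Using the linear unbiased minimum variance estimation criterion, a distributed fused identifier for the model parameters and a self-tuning fusion state filter are proposed, and the convergence of the algorithms is analyzed.

    Consider the multi-sensor stochastic system with fading measurements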

    $$ \begin{equation} {\boldsymbol x(t+1)} = {\Phi}{\boldsymbol x(t)}+{\Gamma}{\boldsymbol w}(t) \end{equation} $$ (1)
    $$ \begin{equation} {{y}_{i}}(t) = {\mu_{i}(t)}{ h_{i}}{\boldsymbol x}(t)+{v_{i}(t)}, \ i = 1, 2, \cdots, L \end{equation} $$ (2)

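    where $ \boldsymbol x(t)\in {\bf R}^n $ is the state, $ y_i(t)\in {\bf R} $ is the measurement, $ \boldsymbol w(t)\in {\bf R}^r $ is the process noise, $ v_i(t)\in {\bf R} $ is the measurement noise, the subscript $ i $ denotes the $ i $th sensor, and $ L $ is the number of sensors. $ \{\mu_i(t)\} $ is a sequence of scalar random variables taking values in $ [0, 1] $ that describes the fading measurement phenomenon of the $ i $th sensor, with $ {\rm E}[\mu_i(t)] = \alpha_i $ and $ {\rm Cov}[\mu_i(t)] = \sigma_i^2 $, where $ {\rm E} $ denotes mathematical expectation and $ {\rm Cov} $ denotes covariance. $ \{\mu_i(t)\} $ is uncorrelated with the other random variables. $ \Phi $, $ \Gamma $, and $ h_i $ are matrices of appropriate dimensions.

    Assumption 1. $ \boldsymbol w(t) $ and $ v_i(t) $ are uncorrelated white noises with zero means and variances $ Q_{\boldsymbol w} $ and $ Q_{v_i} $, respectively.

    Assumption 2. The initial state $ \boldsymbol x(0) $ is uncorrelated with $ \boldsymbol w(t) $ and $ v_i(t) $, and $ {\rm E}\{\boldsymbol x(0)\} = \boldsymbol u_0 $, $ {\rm E}\{[\boldsymbol x(0) - \boldsymbol u_0][\boldsymbol x(0) - \boldsymbol u_0]^{\rm T}\} = P_0 $, where $ {\rm T} $ denotes the transpose.

    Assumption 3. $ \Phi $ is a stable matrix, ($ \Phi, h_i $) is a completely observable pair, and ($ \Phi, \Gamma $) is a completely controllable pair.

    Assumption 4. Some parameters in $ \Phi $ are unknown, and the mathematical expectation $ \alpha_i $ and variance $ \sigma_i^2 $ of $ \{\mu_i(t)\} $ are unknown.

    The problem is, based on the measurements $ (y_i(1), \cdots, y_i(t)) $, $ i = 1, 2, \cdots, L $, to identify the unknown parameters in $ \Phi $ and the mathematical expectation $ \alpha_i $ and variance $ \sigma_i^2 $ of $ \{\mu_i(t)\} $, and to find the fused identifier $ \hat\Phi_o(t) $ of the unknown parameters in $ \Phi $ and the self-tuning fusion filter $ \hat{\boldsymbol x}_s(t|t) $ of the state $ \boldsymbol x(t) $.

    Remark 1. For ease of reading, some terms that appear frequently below are explained here. A local filter is a filter obtained from the measurements of a single sensor. An optimal filter is a filter in the linear minimum variance sense obtained when the model parameters and the expectation and variance of the fading measurements are known. A self-tuning filter is obtained when the system model contains unknown parameters and the expectation and variance of the fading measurements are unknown: these unknown quantities are identified and then substituted into the optimal filtering algorithm. A distributed fusion filter is a fusion filter obtained from the local filters of the individual sensors by applying the linear unbiased minimum variance matrix-weighted fusion estimation algorithm[23].

    When the model parameters and the mathematical expectation $ \alpha_i $ and variance $ \sigma_i^2 $ of $ \{\mu_i(t)\} $ are known, the distributed optimal fusion filter can be obtained by applying the matrix-weighted fusion estimation algorithm in the linear unbiased minimum variance sense[23]. The procedure is given below.

    From Eq. (2) we have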

    $$ \begin{equation} {{y}_{i}}(t) = {\alpha_i}{ h_{i}}{\boldsymbol x(t)}+{V_i(t)} \end{equation} $$ (3)

    where

    $$ \begin{equation} {V_i(t)} = (\mu_{i}(t)-{\alpha_i}){h_{i}}{\boldsymbol x(t)}+v_i(t) \end{equation} $$ (4)

    Its variance can be computed as

    $$ \begin{equation} Q_{{V_i(t)}} = {\rm E}[V_i^2(t)] = {{{\sigma}}_{i}^2}{ h_{i}}X(t){ h_{i}^{\rm T}}+Q_{v_i} \end{equation} $$ (5)

    The state second-order moment $ X(t) = {\rm E}[\boldsymbol x(t)\boldsymbol x^{\rm T}(t)] $ can be computed recursively as follows:

    $$ \begin{equation} X(t + 1) = {\rm{\Phi }}X(t){{\rm{\Phi }}^{\rm{T}}} + {\rm{\Gamma }}{Q_{\boldsymbol w}}{{\rm{\Gamma }}^{\rm{T}}} \end{equation} $$ (6)

    with the initial value $ X(0) = \boldsymbol u_0\boldsymbol u_0^{\rm T} + P_0 $. By Assumption 3, $ X(t) $ is bounded.
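As a rough numerical illustration of Eqs. (5) and (6), the sketch below iterates the second-moment recursion and then evaluates the equivalent noise variance for the first sensor. The matrices are those of the simulation example given later in the text; the function names, the zero initial mean `u0`, and the 200-step horizon are assumptions made only for this illustration.

```python
import numpy as np

def second_moment_step(X, Phi, Gamma, Qw):
    """One step of Eq. (6): X(t+1) = Phi X(t) Phi^T + Gamma Qw Gamma^T."""
    return Phi @ X @ Phi.T + Gamma @ Qw @ Gamma.T

def equivalent_noise_variance(X, h, sigma2, Qv):
    """Eq. (5): Q_V(t) = sigma_i^2 h_i X(t) h_i^T + Q_v (a scalar)."""
    return sigma2 * (h @ X @ h.T).item() + Qv

# Matrices of the simulation example (a11 = 0.6, a12 = -0.2).
Phi = np.array([[0.6, -0.2], [0.4, -0.8]])
Gamma = np.array([[0.5], [0.6]])
Qw = np.array([[3.0]])
h1 = np.array([[0.5, 1.2]])

u0 = np.zeros((2, 1))            # assumed initial mean
P0 = 0.1 * np.eye(2)
X = u0 @ u0.T + P0               # X(0) = u0 u0^T + P0

for _ in range(200):             # Phi is stable, so X(t) settles to a limit
    X = second_moment_step(X, Phi, Gamma, Qw)

Q_V1 = equivalent_noise_variance(X, h1, sigma2=0.1009, Qv=2.0)
```

Because $\Phi$ is stable, the iteration settles to a fixed matrix, which is the boundedness property that Assumption 3 provides.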

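    Lemma 1 below gives the optimal local filtering algorithm, Lemma 2 gives the formula for computing the cross-covariances, and Lemma 3 gives the distributed optimal weighted fusion filtering algorithm.

    Lemma 1[24]. Under Assumptions 1$ \sim $3, the stochastic system (1) and (3) has the following optimal local filter based on the measurements of each sensor: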

    $$ \begin{equation} {\hat {\boldsymbol x}_i}(t + 1|t + 1) = {\Psi _{fi}}(t + 1){\hat {\boldsymbol x}_i}(t|t)+{{K}_i}(t + 1){{ y}_i}(t + 1) \end{equation} $$ (7)
    $$ \begin{equation} {{ K}_i}(t + 1) = {\Sigma _i}(t + 1|t) F_i^{\rm{T}}Q_{{{{C}}_i}}^{ - 1}(t + 1) \end{equation} $$ (8)
    $$ \begin{equation} {\Sigma _i}(t + 1|t) = \Phi {P_i}(t|t){\Phi ^{\rm{T}}} + \Gamma {Q_{\boldsymbol w}}{\Gamma ^{\rm{T}}} \end{equation} $$ (9)
    $$ \begin{equation} {Q_{{{{C}}_i}}}(t + 1) = { F_i}{\Sigma _i}(t + 1|t)F_i^{\rm{T}} + {Q_{{V_i}}}(t + 1) \end{equation} $$ (10)
    $$ \begin{equation} {P_i}(t + 1|t + 1) = [{I_n} - {{ K}_i}(t + 1){ F_i}]{\Sigma _i}(t + 1|t) \end{equation} $$ (11)

    where $ F_i = \alpha_i h_i $ and $ \Psi_{fi}(t + 1) = [I_n - K_i(t + 1)F_i]\Phi $. $ \hat{\boldsymbol x}_i(t|t) $ is the local filter of the $ i $th sensor, $ K_i(t + 1) $ is the corresponding filtering gain, and $ P_i(t|t) $ is the filtering error variance matrix. The initial values are $ \hat{\boldsymbol x}_i(0|0) = \boldsymbol u_0 $ and $ P_i(0|0) = P_0 $.
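The recursion of Lemma 1 can be sketched as a single update function for one sensor with a scalar measurement. This is only an illustrative sketch: the function name, the argument layout, and the numerical values of the measurement `y_next` and of `QV_next` in the usage lines are assumptions, not values from the paper.

```python
import numpy as np

def local_filter_step(x_hat, P, y_next, Phi, Gamma, Qw, F, QV_next):
    """One step of the local filter, Eqs. (7)-(11), for a scalar measurement."""
    n = Phi.shape[0]
    Sigma = Phi @ P @ Phi.T + Gamma @ Qw @ Gamma.T   # Eq. (9): prediction error variance
    Qc = (F @ Sigma @ F.T).item() + QV_next          # Eq. (10): innovation variance
    K = Sigma @ F.T / Qc                             # Eq. (8): filtering gain (n x 1)
    Psi = (np.eye(n) - K @ F) @ Phi                  # Psi_fi(t+1)
    x_new = Psi @ x_hat + K * y_next                 # Eq. (7)
    P_new = (np.eye(n) - K @ F) @ Sigma              # Eq. (11)
    return x_new, P_new, K, Sigma

# Usage with the example matrices; y_next and QV_next are made-up numbers.
Phi = np.array([[0.6, -0.2], [0.4, -0.8]])
Gamma = np.array([[0.5], [0.6]])
Qw = np.array([[3.0]])
F1 = 0.69 * np.array([[0.5, 1.2]])                   # F_1 = alpha_1 h_1
x1, P1, K1, Sigma1 = local_filter_step(
    np.zeros((2, 1)), 0.1 * np.eye(2), 1.5, Phi, Gamma, Qw, F1, QV_next=4.0)
```

Starting from a zero estimate, the first update reduces to the gain times the measurement, and the filtering error variance is smaller (in trace) than the prediction error variance.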

    Lemma 2[24]. The cross-covariance matrix $ P_{ij}(t|t) = {\rm E}[\tilde{\boldsymbol x}_i(t|t)\tilde{\boldsymbol x}_j^{\rm T}(t|t)] $ between any two local filtering errors, where the filtering error is $ \tilde{\boldsymbol x}_i(t|t) = \boldsymbol x(t) - \hat{\boldsymbol x}_i(t|t) $, can be computed recursively as follows:

    $$ \begin{align} {P_{ij}}&(t + 1|t + 1) = [{I_n} - { K_i}(t + 1){ F_i}]\times \\ &[\Phi{P_{ij}}(t|t){\Phi ^{\rm{T}}} + \Gamma {Q_{\boldsymbol w}}{\Gamma ^{\rm{T}}}]{[{I_n} - { K_j}(t + 1){ F_j}]^{\rm{T}}} \end{align} $$ (12)

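    with the initial value $ P_{ij}(0|0) = P_0 $.

    Lemma 3[23]. Based on the local filters of Lemma 1 and the cross-covariance matrices between any two local filtering errors of Lemma 2, the distributed optimal matrix-weighted fusion filter can be computed as follows: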

    $$ \begin{equation} {\hat {\boldsymbol x}_o}(t|t) = \sum\limits_{i = 1}^L {W_i^{}(t){{\hat {\boldsymbol x}}_i}(t|t)} \end{equation} $$ (13)

    The weighting matrices are computed as

    $$ \begin{equation} [W_1^{}(t), \cdots , W_L^{}(t)]{\kern 1pt} {\kern 1pt} {\kern 1pt} = {(e_{}^{\rm{T}}P_{}^{ - 1}(t|t)e)^{ - 1}}e_{}^{\rm{T}}P_{}^{ - 1}(t|t) \end{equation} $$ (14)

    where $ e = [I_n, \cdots, I_n]^{\rm T} $ and $ P(t|t) = [P_{ij}(t|t)]_{nL \times nL} $ is the block matrix whose $ (i, j) $th block is $ P_{ij}(t|t) $. The estimation error variance matrix of the fusion filter is computed as

    $$ \begin{equation} {P_o}(t|t) = {(e_{}^{\rm T}P_{}^{ - 1}(t|t)e)^{ - 1}} \end{equation} $$ (15)

    and $ P_o(t|t) \le P_i(t|t) $, $ i = 1, \cdots, L $.
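The matrix-weighted fusion of Lemma 3 can be sketched as follows. The helper name `matrix_weighted_fusion` and the two-sensor covariance blocks in the usage lines are hypothetical; the code assumes the global block matrix $P(t|t)$ is symmetric positive definite.

```python
import numpy as np

def matrix_weighted_fusion(x_locals, P_blocks):
    """Fusion of L local estimates by Eqs. (13)-(15).

    x_locals: list of L column vectors (n x 1).
    P_blocks: L x L nested list whose (i, j) entry is P_ij (n x n).
    """
    L = len(x_locals)
    n = x_locals[0].shape[0]
    P = np.block(P_blocks)                    # nL x nL global covariance
    e = np.vstack([np.eye(n)] * L)            # e = [I_n, ..., I_n]^T
    P_inv_e = np.linalg.solve(P, e)           # P^{-1} e
    P_o = np.linalg.inv(e.T @ P_inv_e)        # Eq. (15)
    W = P_o @ P_inv_e.T                       # Eq. (14); uses the symmetry of P
    x_o = W @ np.vstack(x_locals)             # Eq. (13)
    return x_o, P_o, W

# Hypothetical two-sensor example.
P11 = np.diag([2.0, 1.0])
P22 = np.diag([1.0, 3.0])
P12 = 0.2 * np.eye(2)
x_o, P_o, W = matrix_weighted_fusion(
    [np.array([[1.0], [0.0]]), np.array([[0.8], [0.3]])],
    [[P11, P12], [P12.T, P22]])
```

The weights sum to the identity (unbiasedness), and the fused variance is no larger than either local variance, in line with $P_o(t|t) \le P_i(t|t)$.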

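    The previous section presented the distributed optimal fusion estimation algorithm for the case where the system model is exactly known. In practice, however, the system model may contain unknown parameters. When $ \Phi $ contains unknown parameters, this section uses the RELS algorithm to identify them and fuses the resulting $ L $ groups of parameter estimates by weighting to obtain the distributed fused identifier of the model parameters. The procedure is given below.

    From Eq. (1) we have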

    $$ \begin{equation} {\boldsymbol x}(t) = {({I_n} - {q^{ - 1}}\Phi )^{ - 1}}{q^{ - 1}}\Gamma {\boldsymbol w}(t) \end{equation} $$ (16)

    where $ q^{-1} $ is the unit backward shift operator, i.e., $ q^{-1}\boldsymbol x(t) = \boldsymbol x(t - 1) $. Substituting Eq. (16) into Eq. (3) gives

    $$ \begin{equation} {y_i}(t) = {\alpha _i}{ h_i}{({I_n} - {q^{ - 1}}\Phi )^{ - 1}}{q^{ - 1}}\Gamma {\boldsymbol w}(t) + {V_i}(t) \end{equation} $$ (17)

    Simplifying Eq. (17) further gives

    $$ \begin{equation} A({q^{ - 1}}){ y_i}(t) = {\alpha _i}{B_i}({q^{-1}}){\boldsymbol w}(t)+ A({q^{ - 1}}){V_i}(t) \end{equation} $$ (18)

    where $ A(q^{-1}) = \det(I_n - q^{-1}\Phi) $ and $ B_i(q^{-1}) = h_i{\rm adj}(I_n - q^{-1}\Phi)q^{-1}\Gamma $; here det and adj denote the determinant and the adjugate matrix, respectively. $ A(q^{-1}) $ and $ B_i(q^{-1}) $ have the following polynomial forms:

    $$ \begin{equation} A({q^{ - 1}}) = 1 + {a_1}{q^{ - 1}} + \cdots + {a_{{n_a}}}{q^{ - {n_a}}} \end{equation} $$ (19)
    $$ \begin{equation} {B_i}({q^{ - 1}}) = {B_{i1}}{q^{ - 1}} + \cdots + {B_{i{n_{{B_i}}}}}{q^{ - {n_{{B_i}}}}} \end{equation} $$ (20)

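    where $ a_k, k = 1, \cdots, n_a $ and $ B_{ik}, k = 1, \cdots, n_{B_i} $ are the polynomial coefficients, and $ n_a $ and $ n_{B_i} $ are the orders of $ A(q^{-1}) $ and $ B_i(q^{-1}) $, respectively. The two moving average processes on the right-hand side of Eq. (18) are equivalent to a single stable moving average process $ D_i(q^{-1})\varepsilon_i(t) $[24], i.e.,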

    $$ \begin{equation} {D_i}({q^{ - 1}}){\varepsilon _i}(t) = {\alpha _i}{B_i}({q^{ - 1}}){\boldsymbol w}(t) + A({q^{ - 1}}){V_i}(t) \end{equation} $$ (21)

    where $ \varepsilon_i(t) $ is a zero-mean white noise with unknown variance $ \sigma_{\varepsilon_i}^2 $, and $ D_i(q^{-1}) $ has the following polynomial form:

    $$ \begin{equation} {D_i}({q^{ - 1}}) = 1 + {d_{i1}}{q^{ - 1}} + \cdots + {d_{i{n_{{D_i}}}}}{q^{ - {n_{{D_i}}}}} \end{equation} $$ (22)

    where $ d_{ik}, k = 1, \cdots, n_{D_i} $ are the coefficients of the polynomial $ D_i(q^{-1}) $ and $ n_{D_i} $ is its order.

    Rewrite Eq. (18) as

    $$ \begin{equation} A({q^{ - 1}}){y_i}(t) = {D_i}({q^{ - 1}}){\varepsilon _i}(t) \end{equation} $$ (23)

    Let $ \varphi_i^{\rm T}(t) = [-y_i(t - 1), \cdots, -y_i(t - n_a), \hat\varepsilon_i(t - 1), \cdots, \hat\varepsilon_i(t - n_{D_i})] $ and $ \vartheta_i = [a_1, \cdots, a_{n_a}, d_{i1}, \cdots, d_{in_{D_i}}]^{\rm T} $. Then Eq. (23) can be written as

    $$ \begin{equation} {y_i}(t) = \varphi _i^{\rm{T}}(t){\vartheta _i} + {\varepsilon _i}(t) \end{equation} $$ (24)

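    where the parameters $ a_k, k = 1, \cdots, n_a $ and $ d_{ik}, k = 1, \cdots, n_{D_i} $ are unknown.

    Based on the measurements of each single sensor, applying the RELS algorithm[24] yields the local parameter estimate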

    $$ \begin{equation} {\hat \vartheta _i}(t + 1) = {\hat \vartheta _i}(t) + {M_i}(t + 1){\hat \varepsilon _i}(t{\rm{ + }}1) \end{equation} $$ (25)
    $$ \begin{equation} {\hat \varepsilon _i}(t + 1) = {y_i}(t + 1) - \varphi _i^{\rm{T}}(t + 1){\hat \vartheta _i}(t) \end{equation} $$ (26)
    $$ \begin{equation} {M_i}(t + 1) = \frac{{{Z_i}(t){\varphi _i}(t + 1)}}{{1 + \varphi _i^{\rm{T}}(t + 1){Z_i}(t){\varphi _i}(t + 1)}} \end{equation} $$ (27)
    $$ \begin{equation} {Z_i}(t + 1) = [{I_{{n_a} + {n_{{D_i}}}}} - {M_i}(t + 1)\varphi _i^{\rm{T}}(t + 1)]{Z_i}(t) \end{equation} $$ (28)

    with the initial values $ \hat\vartheta_i(0) = 0 $ and $ Z_i(0) = \beta_i I $, where $ \beta_i $ is a sufficiently large positive number, and with the convention $ \hat\varepsilon_i(j) = 0 $, $ y_i(j) = 0 $ $ (j \le 0) $.

    From [24], the RELS parameter estimates are consistent, i.e., $ \hat\vartheta_i(t) \to \vartheta_i $ as $ t \to \infty $, w.p.1, where "w.p.1" means "with probability 1".

    From Eq. (25), the local parameter estimation error $ \tilde\vartheta_i(t) = \vartheta_i - \hat\vartheta_i(t) $ based on a single sensor satisfies

    $$ \begin{align} {\tilde \vartheta _i}(t + 1) = \, & [{I_{{n_a} + {n_{{D_i}}}}} - {M_i}(t + 1)\varphi _i^{\rm{T}}(t + 1)]{\tilde \vartheta _i}(t) -\\ & {M_i}(t + 1){\varepsilon _i}(t + 1) \end{align} $$ (29)

    Hence the estimation error covariance matrix $ P_{\vartheta_{ij}}(t) = {\rm E}[\tilde\vartheta_i(t)\tilde\vartheta_j^{\rm T}(t)] $ between any two local parameter estimates can be computed as follows:

    $$ \begin{align} {P_{{\vartheta _{ij}}}}(t + 1) = \, &[{I_{{n_a} + {n_{{D_i}}}}} - {M_i}(t + 1)\varphi _i^{\rm{T}}(t + 1)]{P_{{\vartheta _{ij}}}}(t) \\ &{[{I_{{n_a} + {n_{{D_j}}}}} - {M_j}(t + 1)\varphi _j^{\rm{T}}(t + 1)]^{\rm{T}}} +\\ & {M_i}(t + 1)\hat \sigma _{{\varepsilon _{ij}}}^2(t + 1)M_j^{\rm{T}}(t + 1) \end{align} $$ (30)

    When $ i = j $, $ P_{\vartheta_{ii}}(t) $ is the local parameter estimation error variance matrix $ P_{\vartheta_i}(t) $. The cross-covariance $ \sigma_{\varepsilon_{ij}}^2 $ between $ \varepsilon_i(t) $ and $ \varepsilon_j(t) $ can be approximated as

    $$ \begin{equation} \hat \sigma _{{\varepsilon _{ij}}}^2(t) = \frac{1}{t}\sum\limits_{k = 1}^t {\hat \varepsilon _i^{}(k)\hat \varepsilon _j^{}(k)} \end{equation} $$ (31)

    which can be computed recursively as

    $$ \begin{equation} \hat \sigma _{{\varepsilon _{ij}}}^2(t) = \hat \sigma _{{\varepsilon _{ij}}}^2(t - 1) + \frac{1}{t}[\hat \varepsilon _i^{}(t)\hat \varepsilon _j^{}(t) - \hat \sigma _{{\varepsilon _{ij}}}^2(t - 1)] \end{equation} $$ (32)

    with the initial value $ \hat\sigma_{\varepsilon_{ij}}^2(0) = y_i(0)y_j(0) $, $ i, j = 1, \cdots, L $.

    Let $ \vartheta_A = [a_1, \cdots, a_{n_a}]^{\rm T} $; then $ \vartheta_A = [I_{n_a}, 0]\vartheta_i $. Hence the local estimate of $ \vartheta_A $ based on sensor $ i $ and the estimation error covariance matrix are

    $$ \begin{equation} {\hat \vartheta _{Ai}}(t) = [{I_{{n_a}}}, 0]{\hat \vartheta _i}(t) \end{equation} $$ (33)
    $$ \begin{equation} {P_{{\vartheta _{Aij}}}}(t) = [{I_{{n_a}}}, 0]{P_{{\vartheta _{ij}}}}(t){[{I_{{n_a}}}, 0]^{\rm{T}}} \end{equation} $$ (34)

    When $ i = j $, $ P_{\vartheta_{Aii}}(t) $ is the estimation error variance matrix $ P_{\vartheta_{Ai}}(t) $ of the local estimate of $ \vartheta_A $.

    From Eq. (18), the parameter vector $ \vartheta_A = [a_1, \cdots, a_{n_a}]^{\rm T} $ is a function of the unknown parameters in $ \Phi $. Suppose the unknown model parameters in $ \Phi $ form the column vector $ \Lambda^{[\Phi]} \in {\bf R}^{n_\Phi} $, $ n_\Phi \le n_a $, which is uniquely determined by $ \vartheta_A $. Let $ \Lambda^{[\Phi]} $ and $ \vartheta_A $ satisfy the relation

    $$ \begin{equation} {\Lambda ^{[{\rm{\Phi }}]}} = f({\vartheta _A}) \end{equation} $$ (35)

    where $ f(\vartheta_A) $ is a linear or nonlinear function of $ \vartheta_A $.

    1) If $ f(\vartheta_A) $ is linear, we rewrite Eq. (35) as

    $$ \begin{equation} {\Lambda ^{{\rm{[\Phi }}]}} = S{\vartheta _A} + \gamma \end{equation} $$ (36)

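    where $ S $ and $ \gamma $ are coefficient matrices of appropriate dimensions.

    Then the local estimate at time $ t $ of the unknown model parameters in $ \Phi $, based on the data of sensor $ i $, is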

    $$ \begin{equation} \hat \Lambda _i^{[{\rm{\Phi }}]}(t) = S{\hat \vartheta _{Ai}}(t) + \gamma \end{equation} $$ (37)

    Define the local estimation error variance $ P_{\Lambda_i^{[\Phi]}}(t) = {\rm E}[\tilde\Lambda_i^{[\Phi]}(t)(\tilde\Lambda_i^{[\Phi]}(t))^{\rm T}] $ and the estimation error cross-covariance between any two local parameter estimators $ P_{\Lambda_{ij}^{[\Phi]}}(t) = {\rm E}[\tilde\Lambda_i^{[\Phi]}(t)(\tilde\Lambda_j^{[\Phi]}(t))^{\rm T}] $, where the estimation error is $ \tilde\Lambda_i^{[\Phi]}(t) = \Lambda^{[\Phi]} - \hat\Lambda_i^{[\Phi]}(t) = S\tilde\vartheta_{Ai}(t) $. Then we obtain

    $$ \begin{equation} {P_{\Lambda _i^{[{\rm{\Phi }}]}}}(t) = S{P_{{\vartheta _{Ai}}}}(t){S^{\rm{T}}}, {P_{\Lambda _{ij}^{[{\rm{\Phi }}]}}}(t) = S{P_{{\vartheta _{Aij}}}}(t){S^{\rm{T}}} \end{equation} $$ (38)

    2) If $ f(\vartheta_A) $ is nonlinear, we linearize $ f(\vartheta_A) $ at the point $ \hat\vartheta_{Ai}(t - 1) $, which gives

    $$ \begin{equation} \Lambda _i^{[\Phi ]}(t) \approx S({\hat \vartheta _{Ai}}(t - 1)){\vartheta _A} + \gamma ({\hat \vartheta _{Ai}}(t - 1)) \end{equation} $$ (39)

    Then the local estimate at time $ t $ of the unknown model parameters in $ \Phi $, based on the data of sensor $ i $, is

    $$ \begin{equation} \hat \Lambda _i^{[\Phi ]}(t) \approx S({\hat \vartheta _{Ai}}(t - 1)){\hat \vartheta _{Ai}}(t) + \gamma ({\hat \vartheta _{Ai}}(t - 1)) \end{equation} $$ (40)

    Similarly, the corresponding estimation error variance $ P_{\Lambda_i^{[\Phi]}}(t) $ and cross-covariance $ P_{\Lambda_{ij}^{[\Phi]}}(t) $ are obtained as

    $$ \begin{align} &{P_{\Lambda _i^{[{\rm{\Phi }}]}}}(t) = S({\hat \vartheta _{Ai}}(t - 1)){{{P}}_{{\vartheta _{Ai}}}}(t){S^{\rm{T}}}({\hat \vartheta _{Ai}}(t - 1)) \\ &{P_{\Lambda _{ij}^{[{\rm{\Phi }}]}}}(t) = S({\hat \vartheta _{Ai}}(t - 1)){P_{{\vartheta _{Aij}}}}(t){S^{\rm{T}}}({\hat \vartheta _{Aj}}(t - 1)) \end{align} $$ (41)

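    where $ i, j = 1, \cdots, L $. With the above algorithm, the local estimates $ \hat\Lambda_i^{[\Phi]}(t) $, $ i = 1, \cdots, L $, of the unknown parameters in $ \Phi $ at time $ t $ are obtained from the data of the $ L $ sensors. Since the unknown parameters in $ \Phi $ are estimated $ L $ times, the linear unbiased minimum variance weighted fusion estimation algorithm[23] can be applied to fuse them. Theorem 1 below gives the distributed fusion estimate of the unknown parameters.

    Theorem 1. Based on the local parameter estimates $ \hat\Lambda_i^{[\Phi]}(t) $, the local parameter estimation error variance matrices $ P_{\Lambda_i^{[\Phi]}}(t) $, and the parameter estimation error cross-covariances $ P_{\Lambda_{ij}^{[\Phi]}}(t) $, the matrix-weighted fused parameter identifier in the linear unbiased minimum variance sense is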

    $$ \begin{equation} \hat \Lambda _o^{{\rm{[\Phi ]}}}(t) = \sum\limits_{i = 1}^L {{W_{\Lambda _i^{{\rm{[\Phi }}]}}}(t)\hat \Lambda _i^{[{\rm{\Phi }}]}(t)} \end{equation} $$ (42)

    The parameter fusion weighting matrices are computed as

    $$ \begin{align} [{W_{\Lambda _1^{[{\rm{\Phi }}]}}}(t), &\cdots , {W_{\Lambda _L^{[{\rm{\Phi }}]}}}(t)] = \\ &{(e_{{\Lambda ^{[\Phi ]}}}^{\rm{T}}P_{{\Lambda ^{[\Phi ]}}}^{ - 1}(t){e_{{\Lambda ^{[\Phi ]}}}})^{ - 1}}e_{{\Lambda ^{[\Phi ]}}}^{\rm{T}}P_{{\Lambda ^{[\Phi ]}}}^{ - 1}(t) \end{align} $$ (43)

    where $ e_{\Lambda^{[\Phi]}} = [I_{n_\Phi}, \cdots, I_{n_\Phi}]^{\rm T} $ is of dimension $ n_\Phi L \times n_\Phi $ and $ P_{\Lambda^{[\Phi]}}(t) = [P_{\Lambda_{ij}^{[\Phi]}}(t)]_{n_\Phi L \times n_\Phi L} $ is the block matrix whose $ (i, j) $th block is $ P_{\Lambda_{ij}^{[\Phi]}}(t) $.

    The estimation error variance matrix of the fused parameter identifier is computed as

    $$ \begin{equation} {P_{\Lambda _o^{[{\rm{\Phi }}]}}}(t) = {(e_{{\Lambda ^{[\Phi ]}}}^{\rm{T}}P_{{\Lambda ^{[\Phi ]}}}^{ - 1}(t){e_{{\Lambda ^{[\Phi ]}}}})^{ - 1}} \end{equation} $$ (44)

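    and $ P_{\Lambda_o^{[\Phi]}}(t) \le P_{\Lambda_i^{[\Phi]}}(t) $, $ i = 1, \cdots, L $.

    The above algorithm yields the fused identifier $ \hat\Lambda_o^{[\Phi]}(t) $ of the unknown model parameters and hence the fused estimate $ \hat\Phi_o(t) $ of $ \Phi $. From the preceding analysis, $ \hat\Phi_o(t) \to \Phi $ as $ t \to \infty $, w.p.1.

    Remark 2. In [5-7, 24], the model parameters identified by the individual sensors are fused by weighted averaging. That approach cannot guarantee that the accuracy of the fused parameter identifier is no lower than that of every local parameter estimate. In contrast, this paper fuses the parameters identified by the individual sensors with the linear unbiased minimum variance distributed matrix-weighted fusion algorithm[23], and the accuracy of the resulting fused identifier is no lower than that of any local parameter estimate. Since weighted averaging is itself one particular unbiased linear combination of the local estimates, the matrix-weighted fused parameter estimate is also no less accurate than the weighted-average fused estimate. The improvement can be seen in the simulation study later.

    When the fading measurement rates of the sensors are unknown, the means and variances of the random variables $ \{\mu_i(t)\} $ describing the fading measurements must be identified before the algorithm of Section 2 can be applied to obtain state estimates. They are identified below using correlation functions.

    Substituting the fused model parameter identifier $ \hat\Phi_o(t) $ into Eq. (6) gives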

    $$ \begin{equation} \hat X(t + 1) = {\hat \Phi _o}(t) \hat X(t)\hat \Phi _o^{\rm{T}}(t) + {\rm{\Gamma }}{Q_{\boldsymbol w}}{{\rm{\Gamma }}^{\rm{T}}} \end{equation} $$ (45)

    From Eq. (2), the zero-lag correlation function $ R_i(t, 0) = {\rm E}[y_i^2(t)] $ is

    $$ \begin{equation} {R_i}(t, 0) = (\alpha _i^2 + \sigma _i^2){h_{i}}{{\hat X}}(t)h_{i}^{\rm{T}} + {{{Q}}_{{v_i}}} \end{equation} $$ (46)

    and the one-lag correlation function $ R_i(t, 1) = {\rm E}[y_i(t)y_i(t - 1)] $ is

    $$ \begin{equation} {R_i}(t, 1) = \alpha _i^2{ h_{i}}{\hat \Phi _o}(t - 1)\hat X(t - 1) h_{i}^{\rm{T}} \end{equation} $$ (47)

    The zero-lag and one-lag correlation functions $ R_i(t, r) $, $ r = 0, 1 $, can be approximated by the following sample correlation functions $ \hat R_i(t, r) $:

    $$ \begin{align} {\hat R_i}(t, r) \approx &\frac{1}{t}\sum\limits_{s = 1}^t {{{ y}_i}(s){{y}_i}(s - r)} = {\hat R_i}(t - 1, r) + \\ & \frac{1}{t}[{y_i}(t){y_i}(t - r) - {\hat R_i}(t - 1, r)] \end{align} $$ (48)

    with the initial values $ \hat R_i(0, 0) = {\rm E}[y_i^2(0)] $ and $ \hat R_i(0, 1) = 0 $.

    Finally, from Eq. (47), the mathematical expectation of the random variable $ \{\mu_i(t)\} $ is obtained as

    $$ \begin{equation} {\hat \alpha _i}(t) = \sqrt {\frac{{{{\hat R}_i}(t, 1)}}{{{{ h}_i}{{\hat \Phi }_o}(t - 1)\hat X(t - 1){ h}_i^{\rm{T}}}}} \end{equation} $$ (49)

    Substituting Eq. (49) into Eq. (46) gives the variance of the random variable $ \{\mu_i(t)\} $ as

    $$ \begin{equation} \hat \sigma _i^2(t) = \frac{{{{\hat R}_i}(t, 0) - {{{Q}}_{{v_i}}}}}{{ {h_{i}}{{\hat X}}(t)h_{i}^{\rm{T}}}} - \hat \alpha _i^2(t) \end{equation} $$ (50)

    Eq. (49) gives the estimate of $ \alpha_i $ at time $ t $. From Section 3, $ \hat\Phi_o(t) \to \Phi $ as $ t \to \infty $. From the stability of $ \Phi $ and Eq. (45), $ \hat X(t) \to X(t) $ as $ t \to \infty $. By the ergodicity of the stochastic process, $ \hat R_i(t, r) \to R_i(t, r) $ as $ t \to \infty $. Hence the estimates $ \hat\alpha_i(t) $ and $ \hat\sigma_i^2(t) $ are consistent, i.e.,

    $$ \begin{equation} {\hat \alpha _i}(t) \to {\alpha _i}, \hat \sigma _i^2(t) \to \sigma _i^2, t \to \infty , w.p.1 \end{equation} $$ (51)

    Furthermore, $ \hat Q_{V_i}(t) = \hat\sigma_i^2(t)h_i\hat X(t)h_i^{\rm T} + Q_{v_i} \to Q_{V_i}(t) = \sigma_i^2h_iX(t)h_i^{\rm T} + Q_{v_i} $ as $ t \to \infty $, w.p.1.
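The scalar formulas (48)-(50) can be sketched in a few lines. The function names and the numbers used in the round-trip check (the values of $h_i\hat X(t)h_i^{\rm T}$ and $h_i\hat\Phi_o(t-1)\hat X(t-1)h_i^{\rm T}$) are assumptions chosen only to show that (49) and (50) invert (46) and (47).

```python
import math

def update_sample_correlation(R_prev, t, y_t, y_t_minus_r):
    """Recursive form of Eq. (48): running average of y(t) * y(t - r), t >= 1."""
    return R_prev + (y_t * y_t_minus_r - R_prev) / t

def fading_moments(R0, R1, hXh, hPhiXh_prev, Qv):
    """Eqs. (49)-(50): estimates of the mean and variance of mu_i(t).

    hXh         = h_i X_hat(t) h_i^T
    hPhiXh_prev = h_i Phi_hat_o(t-1) X_hat(t-1) h_i^T
    """
    alpha = math.sqrt(R1 / hPhiXh_prev)          # Eq. (49)
    sigma2 = (R0 - Qv) / hXh - alpha ** 2        # Eq. (50)
    return alpha, sigma2

# Round trip: build R0 and R1 from Eqs. (46)-(47) with made-up hXh values,
# then recover alpha and sigma^2.
alpha_true, sigma2_true, Qv = 0.69, 0.1009, 2.0
hXh, hPhiXh_prev = 5.0, 2.0
R0 = (alpha_true ** 2 + sigma2_true) * hXh + Qv  # Eq. (46)
R1 = alpha_true ** 2 * hPhiXh_prev               # Eq. (47)
alpha_hat, sigma2_hat = fading_moments(R0, R1, hXh, hPhiXh_prev, Qv)

# Running average of y^2 for the short sequence y = 1, 2, 3.
R = 0.0
for t, y in enumerate([1.0, 2.0, 3.0], start=1):
    R = update_sample_correlation(R, t, y, y)
```

In practice `R0` and `R1` would come from the running averages rather than from the true moments; note that Eq. (49) requires the ratio under the square root to be positive.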

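    Replacing $ \Phi $, $ \alpha_i $, and $ Q_{V_i}(t) $ in the local optimal filter, the cross-covariance matrices, and the distributed fusion filtering algorithm of Section 2 with the values $ \hat\Phi_o(t) $, $ \hat\alpha_i(t) $, and $ \hat Q_{V_i}(t) $ identified at each time step yields the corresponding self-tuning state estimation algorithm. For convenience, the corresponding self-tuning local filter, prediction error variance, filtering error covariance, gain, and fusion filter are denoted by $ \hat{\boldsymbol x}_{si}(t|t) $, $ \hat\Sigma_{si}(t|t - 1) $, $ \hat P_{sij}(t|t) $, $ \hat K_{si}(t) $, and $ \hat{\boldsymbol x}_s(t|t) $, respectively.

    Lemma 4 below gives the dynamic error system analysis (DESA) method, which can be used to prove the convergence of the self-tuning fusion state filter $ \hat{\boldsymbol x}_s(t|t) $.

    Lemma 4[24]. Consider the dynamic error system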

    $$ \begin{equation} \delta (t) = T(t)\delta (t - 1) + u(t) \end{equation} $$ (52)

    where $ t \ge 0 $, the output $ \delta(t) \in {\bf R}^n $, the input $ u(t) \in {\bf R}^n $, and the matrix $ T(t) \in {\bf R}^{n \times n} $ is uniformly asymptotically stable. If $ u(t) $ is bounded, then $ \delta(t) $ is bounded. If $ u(t) \to 0 $ as $ t \to \infty $, then $ \delta(t) \to 0 $.

    Denote the system (1) and (3) with known parameters $ \Phi $, $ \alpha_i $, and $ \sigma_i^2 $ by $ (\Phi, \Gamma, F_i, Q_{\boldsymbol w}, Q_{V_i}(t)) $, and the corresponding system (1) and (3) with time-varying parameters $ \hat\Phi_o(t) $, $ \hat\alpha_i(t) $, and $ \hat\sigma_i^2(t) $ by $ (\hat\Phi_o(t), \Gamma, \hat F_i(t), Q_{\boldsymbol w}, \hat Q_{V_i}(t)) $.

    Lemma 5[24]. Under Assumptions 1$ \, \sim\, $4, the systems $ (\Phi, \Gamma, F_i, Q_{\boldsymbol w}, Q_{V_i}(t)) $ and $ (\hat\Phi_o(t), \Gamma, \hat F_i(t), Q_{\boldsymbol w}, \hat Q_{V_i}(t)) $ are uniformly completely observable and uniformly completely controllable. The state transition matrix $ \Psi_{fi}(t) = [I_n - K_i(t)F_i]\Phi $ of the optimal local filter of the system $ (\Phi, \Gamma, F_i, Q_{\boldsymbol w}, Q_{V_i}(t)) $ is uniformly asymptotically stable, and $ K_i(t) $ is bounded. The state transition matrix $ \hat\Psi_{sfi}(t) = [I_n - \hat K_{si}(t)\hat F_i(t)]\hat\Phi_o(t) $ of the self-tuning local filter of the system $ (\hat\Phi_o(t), \Gamma, \hat F_i(t), Q_{\boldsymbol w}, \hat Q_{V_i}(t)) $ is uniformly asymptotically stable, and $ \hat K_{si}(t) $ is bounded. Moreover, $ \Delta\hat K_i(t) = \hat K_{si}(t) - K_i(t) \to 0 $ and $ \Delta\hat\Psi_{fi}(t) = \hat\Psi_{sfi}(t) - \Psi_{fi}(t) \to 0 $ as $ t \to \infty $. The self-tuning local filtering error variance matrix $ \hat P_{si}(t|t) $ converges to the optimal local filtering error variance matrix $ P_i(t|t) $, and the self-tuning filtering error cross-covariance matrix $ \hat P_{sij}(t|t) $ converges to the optimal filtering error cross-covariance matrix $ P_{ij}(t|t) $.

    Theorem 2. Under Assumptions 1$ \, \sim\, $4, the self-tuning local filter $ \hat{\boldsymbol x}_{si}(t|t) $ converges to the optimal local filter $ \hat{\boldsymbol x}_i(t|t) $, i.e.,

    $$ \begin{equation} [{\hat {\boldsymbol x}_{si}}(t|t) - {\hat {\boldsymbol x}_i}(t|t)] \to 0, t \to \infty \end{equation} $$ (53)

    Proof. From Eq. (7), the self-tuning filter is

    $$ \begin{equation} {\hat {\boldsymbol x}_{si}}(t|t) = {\hat \Psi _{sfi}}(t){\hat {\boldsymbol x}_{si}}(t - 1|t - 1)+{\hat K_{si}}(t){y_i}(t) \end{equation} $$ (54)

    Since $ \hat K_{si}(t) $ and $ y_i(t) $ are bounded and $ \hat\Psi_{sfi}(t) $ is uniformly asymptotically stable, applying Lemma 4 shows that $ \hat{\boldsymbol x}_{si}(t|t) $ is bounded. Let $ \delta_i(t) = \hat{\boldsymbol x}_{si}(t|t) - \hat{\boldsymbol x}_i(t|t) $. Subtracting Eq. (7) from Eq. (54) gives the dynamic error equation

    $$ \begin{equation} {\delta _i}(t) = {\Psi _{fi}}(t){\delta _i}(t - 1) + {u_i}(t) \end{equation} $$ (55)

    where $ u_i(t) = \Delta\hat\Psi_{fi}(t)\hat{\boldsymbol x}_{si}(t - 1|t - 1) + \Delta\hat K_i(t)y_i(t) $. By the boundedness of $ \hat{\boldsymbol x}_{si}(t|t) $ and $ y_i(t) $, and since $ \Delta\hat K_i(t) \to 0 $ and $ \Delta\hat\Psi_{fi}(t) \to 0 $, we have $ u_i(t) \to 0 $. Applying Lemma 4 to Eq. (55) gives $ \delta_i(t) \to 0 $ as $ t \to \infty $, i.e., Eq. (53) holds.

    Theorem 3. Under Assumptions 1$ \, \sim\, $4, the self-tuning weighted fusion filter $ \hat{\boldsymbol x}_s(t|t) $ converges to the optimal weighted fusion filter $ \hat{\boldsymbol x}_o(t|t) $, i.e.,

    $$ \begin{equation} [{\hat {\boldsymbol x}_s}(t|t) - {\hat {\boldsymbol x}_o}(t|t)] \to 0, t \to \infty \end{equation} $$ (56)

    Proof. From Lemma 3 and Lemma 5, $ W_i(t) $ is bounded and $ \Delta\hat W_i(t) = [\hat W_{si}(t) - W_i(t)] \to 0 $. From Eq. (53) and the boundedness of $ \hat{\boldsymbol x}_{si}(t|t) $, we obtain

    $$ \begin{align} {\hat {\boldsymbol x}_s}(t|t) & - {\hat {\boldsymbol x}_o}(t|t) = \sum\limits_{i = 1}^L {{W_i}(t)[{{\hat {\boldsymbol x}}_{si}}(t|t) - {{\hat {\boldsymbol x}}_i}(t|t)} ]+ \\ &\sum\limits_{i = 1}^L {\Delta {{\hat W}_i}(t){{\hat {\boldsymbol x}}_{si}}(t|t)} \to 0 \end{align} $$ (57)

    i.e., Eq. (56) holds.

    Consider the three-sensor system (1) and (2) with the coefficient matrices $ \Phi = \left[{\begin{array}{*{20}{c}} {{a_{11}}} & {{a_{12}}} \\ {0.4} & { - 0.8} \\ \end{array}} \right] $, $ \Gamma = \left[{\begin{array}{*{20}{c}} {0.5} \\ {0.6} \\ \end{array}} \right] $, $ h_1 = \left[{\begin{array}{*{20}{c}} {0.5} & {1.2} \\ \end{array}} \right] $, $ h_2 = \left[{\begin{array}{*{20}{c}} {0.6} & {1.9} \\ \end{array}}\right] $, $ h_3 = \left[{\begin{array}{*{20}{c}} {1.4} & 2 \\ \end{array}} \right] $, and the noise variances $ Q_{\boldsymbol w} = 3 $, $ Q_{v_1} = 2 $, $ Q_{v_2} = 0.4 $, $ Q_{v_3} = 1 $. $ \{\mu_i(t)\} $, $ i = 1, 2, 3 $, are scalar random variables taking values in $ [0, 1] $. The initial values are $ \hat{\boldsymbol x}_i(0|0) = 0 $ and $ P_{ij}(0|0) = 0.1I_2 $, $ i, j = 1, 2, 3 $. Suppose that $ a_{11} $, $ a_{12} $, and the mathematical expectations $ \alpha_i $ and variances $ \sigma_i^2 $ of $ \{\mu_i(t)\} $, $ i = 1, 2, 3 $, are unknown. The aim is to identify the unknown model parameters $ a_{11} $ and $ a_{12} $ and the expectations $ \alpha_i $ and variances $ \sigma_i^2 $, $ i = 1, 2, 3 $, and to obtain the self-tuning fusion state filter.

    In the simulation, the unknown model parameters are taken as $ a_{11} = 0.6 $ and $ a_{12} = -0.2 $, and the probability distributions of $ \mu_i(t) $, $ i = 1, 2, 3 $, are $ P\{\mu_1(t) = 0.3\} = 0.3 $, $ P\{\mu_1(t) = 0.5\} = 0.2 $, $ P\{\mu_1(t) = 1\} = 0.5 $, $ P\{\mu_2(t) = 0.4\} = 0.4 $, $ P\{\mu_2(t) = 0.7\} = 0.3 $, $ P\{\mu_2(t) = 0.9\} = 0.3 $, $ P\{\mu_3(t) = 0.1\} = 0.2 $, $ P\{\mu_3(t) = 0.6\} = 0.6 $, $ P\{\mu_3(t) = 0.9\} = 0.2 $. The mathematical expectations and variances of $ \mu_i(t) $, $ i = 1, 2, 3 $, are then $ \alpha_1 = 0.69 $, $ \alpha_2 = 0.64 $, $ \alpha_3 = 0.56 $, $ \sigma_1^2 = 0.1009 $, $ \sigma_2^2 = 0.0444 $, $ \sigma_3^2 = 0.0664 $.
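The stated expectations and variances follow directly from the three probability distributions. The short sketch below recomputes them; the helper name `mean_and_variance` and the list-of-pairs representation are choices made for this illustration.

```python
def mean_and_variance(pmf):
    """Mean and variance of a discrete random variable given as (value, probability) pairs."""
    mean = sum(v * p for v, p in pmf)
    var = sum(p * (v - mean) ** 2 for v, p in pmf)
    return mean, var

# Distributions of mu_1(t), mu_2(t), mu_3(t) from the simulation setup.
pmf_1 = [(0.3, 0.3), (0.5, 0.2), (1.0, 0.5)]
pmf_2 = [(0.4, 0.4), (0.7, 0.3), (0.9, 0.3)]
pmf_3 = [(0.1, 0.2), (0.6, 0.6), (0.9, 0.2)]

alpha_1, var_1 = mean_and_variance(pmf_1)   # about 0.69 and 0.1009
alpha_2, var_2 = mean_and_variance(pmf_2)   # about 0.64 and 0.0444
alpha_3, var_3 = mean_and_variance(pmf_3)   # about 0.56 and 0.0664
```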

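    For comparison with the weighted-average parameter fusion algorithm in [5-7, 24], the algorithms for the local estimation error variances, the weighted-average estimation error variance, and the distributed weighted fusion estimation error variance of the model parameters $ a_{11} $ and $ a_{12} $ are given below.

    a) Local parameter estimation and distributed weighted fusion estimation:

    According to Section 3, the unknown model parameters and $ \vartheta_A = [a_1, a_2]^{\rm T} $ satisfy the linear relation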

    $$ \begin{equation} {\Lambda ^{[\Phi ]}} = S{\vartheta _A} + \gamma \end{equation} $$ (58)

    where $ {\Lambda ^{[\Phi]}} = \left[{\begin{array}{*{20}{c}} {{a_{11}}} \\ {{a_{12}}} \\ \end{array}} \right] $, $ S = \left[{\begin{array}{*{20}{c}} { - 1} & 0 \\ {{ {{ - {a_{22}}} \over {{a_{21}}}}}} & {{ {{ - 1} \over {{a_{21}}}}}} \\ \end{array}} \right] $, $ \gamma = \left[{\begin{array}{*{20}{c}} { - {a_{22}}} \\ {{ {{ - a_{22}^2} \over {{a_{21}}}}}} \\ \end{array}} \right] $, with $ a_{21} = 0.4 $ and $ a_{22} = -0.8 $ the known entries of $ \Phi $.
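The linear map can be checked numerically for the $2 \times 2$ example. The sketch below expands $A(q^{-1}) = \det(I_2 - q^{-1}\Phi) = 1 + a_1q^{-1} + a_2q^{-2}$, which gives $a_1 = -(a_{11} + a_{22})$ and $a_2 = a_{11}a_{22} - a_{12}a_{21}$, and then recovers $a_{11}$ and $a_{12}$ with $\gamma = [-a_{22}, -a_{22}^2/a_{21}]^{\rm T}$, the signs that this expansion yields. The function names are hypothetical.

```python
def char_poly_coeffs(a11, a12, a21, a22):
    """Coefficients of A(q^-1) = det(I - q^-1 Phi) = 1 + a1 q^-1 + a2 q^-2."""
    a1 = -(a11 + a22)
    a2 = a11 * a22 - a12 * a21
    return a1, a2

def recover_unknowns(a1, a2, a21, a22):
    """Lambda = S [a1, a2]^T + gamma for the 2x2 example."""
    S = [[-1.0, 0.0], [-a22 / a21, -1.0 / a21]]
    gamma = [-a22, -a22 ** 2 / a21]
    a11 = S[0][0] * a1 + S[0][1] * a2 + gamma[0]
    a12 = S[1][0] * a1 + S[1][1] * a2 + gamma[1]
    return a11, a12

# True values used in the simulation: a11 = 0.6, a12 = -0.2, a21 = 0.4, a22 = -0.8.
a1, a2 = char_poly_coeffs(0.6, -0.2, 0.4, -0.8)
a11_rec, a12_rec = recover_unknowns(a1, a2, 0.4, -0.8)
```

Applying the same map to the RELS estimates of $a_1$ and $a_2$ gives the local estimates of $a_{11}$ and $a_{12}$.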

    The local estimate of ${\Lambda ^{[{\rm{\Phi }}]}}$ is

    $$ \begin{equation} \hat \Lambda _i^{[\Phi ]}(t) = S{\hat \vartheta _{Ai}}(t) + \gamma , i = 1, 2, 3 \end{equation} $$ (59)

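    $ \hat\vartheta_{Ai}(t) $ can be identified by the RELS algorithm of Section 3. Using $ S $, $ \hat\vartheta_{Ai}(t) $, and $ \gamma $ in Eqs. (59) and (38) gives the local estimates $ \hat\Lambda_i^{[\Phi]}(t) $, the local estimation error variances $ P_{\Lambda_i^{[\Phi]}}(t) $, and the error cross-covariances $ P_{\Lambda_{ij}^{[\Phi]}}(t) $, $ i, j = 1, 2, 3 $. Then, by the parameter fusion estimation algorithm of Theorem 1, the fused parameter estimation error variance $ P_{\Lambda_o^{[\Phi]}}(t) $ is obtained.

    b) Weighted-average fusion estimation of the parameters:

    The weighted-average estimate of ${\Lambda ^{[{\rm{\Phi }}]}}$ is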

    $$ \begin{equation} \bar \Lambda _{}^{[\Phi ]}(t) = { {1 \over 3}}\sum\limits_{i = 1}^3 {\hat \Lambda _i^{[\Phi ]}(t)} \end{equation} $$ (60)

    Define the weighted-average estimation error variance as $ P_{\bar\Lambda^{[\Phi]}}(t) = {\rm E}[\tilde{\bar\Lambda}^{[\Phi]}(t)(\tilde{\bar\Lambda}^{[\Phi]}(t))^{\rm T}] $, where $ \tilde{\bar\Lambda}^{[\Phi]}(t) = \Lambda^{[\Phi]} - \bar\Lambda^{[\Phi]}(t) = {1 \over 3}\sum_{i = 1}^3 S\tilde\vartheta_{Ai}(t) $. The weighted-average estimation error variance can then be computed as

    $$ \begin{equation} {P_{\bar \Lambda _{}^{[\Phi ]}}}(t) = { {1 \over 9}}S\left(\sum\limits_{i = 1}^3 {\sum\limits_{j = 1}^3 ({P_{{\vartheta _{Aij}}}}}(t))\right){S^{\rm{T}}} \end{equation} $$ (61)

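    Fig. 1 shows the fused identification results for the unknown model parameters obtained with the parameter fusion identification algorithm of Section 3. The identification results converge to the true values as time increases.

    Fig. 1  Identification of the unknown parameters of $\Phi$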

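    Figs. 2 and 3 compare the local estimation error variances, the weighted-average estimation error variance, and the distributed fusion estimation error variance of the unknown model parameters $ a_{11} $ and $ a_{12} $, respectively. The distributed weighted fusion identification error variance is smaller than each local identification error variance and the weighted-average identification error variance. In the figures, $ S_i $, $ i = 1, 2, 3 $, denotes the error variance of the local identification of the $ i $th sensor, DWF denotes the error variance of the distributed weighted fusion identification, and WAEV denotes the error variance of the weighted-average fusion identification.

    Fig. 2  Estimation error variance of $a_{11}$
    Fig. 3  Estimation error variance of $a_{12}$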

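    Figs. 4 and 5 show the results of identifying the mathematical expectations and variances of the random variables $ \{\mu_i(t)\} $, $ i = 1, 2, 3 $, of the different sensors with the identification algorithm of Section 4. The curves are the identification results and the straight lines are the corresponding true values. The identification results converge to the true values as time increases. Figs. 6 and 7 show the self-tuning fusion state filter, which demonstrates the effectiveness of the self-tuning fusion estimation.

    Fig. 4  Identification of the mathematical expectation of $\mu_{i}(t)$
    Fig. 5  Identification of the variance of $\mu_{i}(t)$
    Fig. 6  The first state component of the self-tuning fusion filter
    Fig. 7  The second state component of the self-tuning fusion filter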

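    Figs. 8 and 9 show the error variances of the local and fused optimal and self-tuning state estimates. The local self-tuning error variances converge to the local optimal error variances, and the self-tuning fusion error variance converges to the optimal fusion error variance, i.e., the self-tuning filters are asymptotically optimal. Moreover, the self-tuning fusion filter is more accurate than each local self-tuning filter. In the figures, $ S_i $, $ i = 1, 2, 3 $, denotes the local self-tuning estimation error variance of the $ i $th sensor, SF denotes the self-tuning fusion estimation error variance, and the straight lines denote the corresponding optimal variances.

    Fig. 8  Variance of the first state component of local, fusion optimal and self-tuning filters
    Fig. 9  Variance of the second state component of local, fusion optimal and self-tuning filters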

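    Most of the existing self-tuning filtering algorithms in [5-7, 24] do not consider fading sensor measurements. Fig. 10 compares, over 30 Monte Carlo runs, the trace of the mean square error of a self-tuning fusion filter that ignores the fading measurements present in the sensors with that of the proposed self-tuning fusion filter, which accounts for them. When the sensors are subject to fading measurements, the proposed filter achieves higher accuracy.

    Fig. 10  Trace of mean square error of the self-tuning fusion filters with/without considering fading measurements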

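    For multi-sensor stochastic systems with unknown model parameters and unknown fading measurement rates, the RELS algorithm and correlation functions are applied to identify online, in real time, the unknown model parameters and the mathematical expectations and variances of the random variables describing the fading measurements, respectively, and a linear unbiased minimum variance matrix-weighted fused identifier for the model parameters is proposed. Compared with the weighted-average fusion identification algorithms in the existing literature, the proposed matrix-weighted fusion identification algorithm achieves higher estimation accuracy. Substituting the identified model parameters, expectations, and variances into the optimal local and fusion state estimation algorithms yields the corresponding self-tuning state filtering algorithms. The DESA method is used to prove that the self-tuning state filters converge to the optimal state filters. Compared with existing self-tuning estimation algorithms for systems with unknown model parameters, this paper also considers fading sensor measurements and gives an algorithm that identifies the expectation and variance of the fading measurements using correlation functions.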


  • Recommended by Associate Editor WU Li-Gang
  • Fig.  1  Structure of learning-based adaptive critic control

    Fig.  2  The design procedure of event-triggered robust adaptive critic control

    Fig.  3  Structure of event-triggered adaptive $H_{\infty}$ control

  • [1] Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587):484-489 doi: 10.1038/nature16961
    [2] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553):436-444 doi: 10.1038/nature14539
    [3] Schmidhuber J. Deep learning in neural networks:an overview. Neural Networks, 2015, 61:85-117 doi: 10.1016/j.neunet.2014.09.003
    [4] Haykin S. Neural Networks: A Comprehensive Foundation (Second edition). Upper Saddle River, NJ: Prentice-Hall, 1999.
    [5] Sutton R S, Barto A G. Reinforcement Learning:An Introduction. Cambridge, MA:MIT Press, 1998.
    [6] Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature, 2017, 550:354-359 doi: 10.1038/nature24270
    [7] Bellman R E. Dynamic Programming. Princeton, NJ:Princeton University Press, 1957.
    [8] Lewis F L, Vrabie D, Syrmos V L. Optimal Control (Third edition). New York:Wiley, 2012.
    [9] Werbos P J. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences[Ph. D. dissertation], Harvard University, Cambridge, MA, 1974
    [10] Werbos P J. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 1977, 22(6):25-38
    [11] Werbos P J. Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. New York, NY: Van Nostrand Reinhold, 1992.
    [12] Prokhorov D V, Wunsch D C. Adaptive critic designs. IEEE Transactions on Neural Networks, 1997, 8(5):997-1007 doi: 10.1109/72.623201
    [13] Murray J J, Cox C J, Lendaris G G, Saeks R. Adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part C:Applications and Reviews, 2002, 32(2):140-153 doi: 10.1109/TSMCC.2002.801727
    [14] Si J, Wang Y T. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 2001, 12(2):264-276 doi: 10.1109/72.914523
    [15] Saridis G N, Wang F Y. Suboptimal control of nonlinear stochastic systems. Control Theory and Advanced Technology, 1994, 10(4):847-871
    [16] Beard R W, Saridis G N, Wen J T. Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica, 1997, 33(12):2159-2177 doi: 10.1016/S0005-1098(97)00128-3
    [17] Abu-Khalaf M, Lewis F L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 2005, 41(5):779-791 doi: 10.1016/j.automatica.2004.11.034
    [18] Wang D, Liu D R, Wei Q L, Zhao D B, Jin N. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 2012, 48(8):1825-1832 doi: 10.1016/j.automatica.2012.05.049
    [19] Xu B, Yang C G, Shi Z K. Reinforcement learning output feedback NN control using deterministic learning technique. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3):635-641 doi: 10.1109/TNNLS.2013.2292704
    [20] Wang Ding, Mu Chao-Xu, Liu De-Rong. Data-driven nonlinear near-optimal regulation based on iterative neural dynamic programming. Acta Automatica Sinica, 2017, 43(3):366-375 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract19015.shtml
    [21] Mu C X, Wang D, He H B. Novel iterative neural dynamic programming for data-based approximate optimal control design. Automatica, 2017, 81:240-252 doi: 10.1016/j.automatica.2017.03.022
    [22] Vamvoudakis K G, Lewis F L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 2010, 46(5):878-888 doi: 10.1016/j.automatica.2010.02.018
    [23] Vamvoudakis K G, Miranda M F, Hespanha J P. Asymptotically stable adaptive-optimal control algorithm with saturating actuators and relaxed persistence of excitation. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(11):2386-2398 doi: 10.1109/TNNLS.2015.2487972
    [24] Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K G, Lewis F L, Dixon W E. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 2013, 49(1):82-92 doi: 10.1016/j.automatica.2012.09.019
    [25] Modares H, Lewis F L. Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica, 2014, 50(7):1780-1792 doi: 10.1016/j.automatica.2014.05.011
    [26] Nodland D, Zargarzadeh H, Jagannathan S. Neural network-based optimal adaptive output feedback control of a helicopter UAV. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(7):1061-1073 doi: 10.1109/TNNLS.2013.2251747
    [27] Lv Y F, Na J, Yang Q M, Wu X, Guo Y. Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics. International Journal of Control, 2016, 89(1):99-112 doi: 10.1080/00207179.2015.1060362
    [28] Vrabie D, Lewis F. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Networks, 2009, 22(3):237-246 doi: 10.1016/j.neunet.2009.03.008
    [29] Zhang H G, Cui L L, Zhang X, Luo Y H. Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Transactions on Neural Networks, 2011, 22(12):2226-2236 doi: 10.1109/TNN.2011.2168538
    [30] Jiang Y, Jiang Z P. Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Transactions on Automatic Control, 2015, 60(11):2917-2929 doi: 10.1109/TAC.2015.2414811
    [31] Bian T, Jiang Z P. Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design. Automatica, 2016, 71:348-360 doi: 10.1016/j.automatica.2016.05.003
    [32] Lee J Y, Park J B, Choi Y H. Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(5):916-932 doi: 10.1109/TNNLS.2014.2328590
    [33] Ha M M, Wang D, Liu D R. Event-triggered adaptive critic control design for discrete-time constrained nonlinear systems. IEEE Transactions on Systems, Man, and Cybernetics:Systems, 2019, DOI: 10.1109/TSMC.2018.2868510
    [34] Wang F Y, Zhang H G, Liu D R. Adaptive dynamic programming:an introduction. IEEE Computational Intelligence Magazine, 2009, 4(2):39-47 doi: 10.1109/MCI.2009.932261
    [35] Lewis F L, Liu D R. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Hoboken, NJ: John Wiley & Sons, Inc., 2012.
    [36] Zhang H G, Liu D R, Luo Y H, Wang D. Adaptive Dynamic Programming for Control: Algorithms and Stability. London, UK: Springer-Verlag, 2013.
    [37] Zhang Hua-Guang, Zhang Xin, Luo Yan-Hong, Yang Jun. An overview of research on adaptive dynamic programming. Acta Automatica Sinica, 2013, 39(4):303-311 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract17916.shtml
    [38] Liu De-Rong, Li Hong-Liang, Wang Ding. Data-based self-learning optimal control:research progress and prospects. Acta Automatica Sinica, 2013, 39(11):1858-1870 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract18225.shtml
    [39] Wang D, He H B, Liu D R. Adaptive critic nonlinear robust control:a survey. IEEE Transactions on Cybernetics, 2017, 47(10):3429-3451 doi: 10.1109/TCYB.2017.2712188
    [40] Wang D, Mu C X. Adaptive Critic Control with Robust Stabilization for Uncertain Nonlinear Systems. Singapore: Springer Singapore, 2019.
    [41] Liu D R, Wei Q L, Wang D, Yang X, Li H L. Adaptive Dynamic Programming with Applications in Optimal Control. Switzerland: Springer, 2017.
    [42] Jiang Y, Jiang Z P. Robust Adaptive Dynamic Programming. Hoboken, NJ:Wiley-IEEE Press, 2017.
    [43] Wang Fei-Yue. Parallel control:a method for data-driven and computational control. Acta Automatica Sinica, 2013, 39(4):293-302 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract17915.shtml
    [44] Hou Z S, Wang Z. From model-based control to data-driven control:Survey, classification and perspective. Information Sciences, 2013, 235:3-35 doi: 10.1016/j.ins.2012.07.014
    [45] Lavretsky E, Wise K A. Robust and Adaptive Control: with Aerospace Applications. London, UK: Springer-Verlag, 2013.
    [46] Krstic M, Kanellakopoulos I, Kokotovic P V. Nonlinear and Adaptive Control Design. New York, NY: John Wiley & Sons, 1995.
    [47] Lewis F L, Jagannathan S, Yesildirek A. Neural Network Control of Robot Manipulators and Non-linear Systems. London: Taylor & Francis, 1999.
    [48] Corless M, Leitmann G. Continuous state feedback guaranteeing uniform ultimate boundedness for uncertain dynamic systems. IEEE Transactions on Automatic Control, 1981, 26(5):1139-1144 doi: 10.1109/TAC.1981.1102785
    [49] Lin F. Robust Control Design: An Optimal Control Approach. Chichester: John Wiley & Sons, 2007.
    [50] Lin F, Brand R D, Sun J. Robust control of nonlinear systems:Compensating for uncertainty. International Journal of Control, 1992, 56(6):1453-1459 doi: 10.1080/00207179208934374
    [51] Adhyaru D M, Kar I N, Gopal M. Fixed final time optimal control approach for bounded robust controller design using Hamilton-Jacobi-Bellman solution. IET Control Theory & Applications, 2009, 3(9):1183-1195
    [52] Adhyaru D M, Kar I N, Gopal M. Bounded robust control of nonlinear systems using neural network-based HJB solution. Neural Computing & Applications, 2011, 20(1):91-103
    [53] Wang D, Liu D R, Li H L. Policy iteration algorithm for online design of robust control for a class of continuous-time nonlinear systems. IEEE Transactions on Automation Science and Engineering, 2014, 11(2):627-632 doi: 10.1109/TASE.2013.2296206
    [54] Wang D, Liu D R, Li H L, Ma H W. Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming. Information Sciences, 2014, 282:167-179 doi: 10.1016/j.ins.2014.05.050
    [55] Wang D, Liu D R, Zhang Q C, Zhao D B. Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics. IEEE Transactions on Systems, Man, and Cybernetics:Systems, 2016, 46(11):1544-1555 doi: 10.1109/TSMC.2015.2492941
    [56] Liu D R, Yang X, Wang D, Wei Q L. Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Transactions on Cybernetics, 2015, 45(7):1372-1385 doi: 10.1109/TCYB.2015.2417170
    [57] Wang D, Liu D R, Li H L, Luo B, Ma H W. An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties. IEEE Transactions on Systems, Man, and Cybernetics:Systems, 2016, 46(5):713-717 doi: 10.1109/TSMC.2015.2466191
    [58] Wang D. Adaptation-oriented near-optimal control and robust synthesis of an overhead crane system. In: Proceedings of the 2017 International Conference on Neural Information Processing. Guangzhou, China: Springer, 2017. 42-50
    [59] Zhong X N, He H B, Prokhorov D V. Robust controller design of continuous-time nonlinear system using neural network. In: Proceedings of the 2013 International Joint Conference on Neural Networks. Dallas, TX, USA: IEEE, 2013. 1-8
    [60] Sun J L, Liu C S, Ye Q. Robust differential game guidance laws design for uncertain interceptor-target engagement via adaptive dynamic programming. International Journal of Control, 2017, 90(5):990-1004 doi: 10.1080/00207179.2016.1192687
    [61] Wang D, Li C, Liu D R, Mu C X. Data-based robust optimal control of continuous-time affine nonlinear systems with matched uncertainties. Information Sciences, 2016, 366:121-133 doi: 10.1016/j.ins.2016.05.034
    [62] Yang X, Liu D R, Luo B, Li C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Information Sciences, 2016, 369:731-747 doi: 10.1016/j.ins.2016.07.051
    [63] Fan Q Y, Yang G H. Adaptive actor-critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(1):165-177 doi: 10.1109/TNNLS.2015.2472974
    [64] Jiang Y, Jiang Z P. Robust adaptive dynamic programming for large-scale systems with an application to multimachine power systems. IEEE Transactions on Circuits and Systems Ⅱ:Express Briefs, 2012, 59(10):693-697 doi: 10.1109/TCSII.2012.2213353
    [65] Jiang Z P, Jiang Y. Robust adaptive dynamic programming for linear and nonlinear systems:an overview. European Journal of Control, 2013, 19(5):417-425 doi: 10.1016/j.ejcon.2013.05.017
    [66] Jiang Y, Jiang Z P. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(5):882-893 doi: 10.1109/TNNLS.2013.2294968
    [67] Bian T, Jiang Y, Jiang Z P. Decentralized adaptive optimal control of large-scale systems with application to power systems. IEEE Transactions on Industrial Electronics, 2015, 62(4):2439-2447 doi: 10.1109/TIE.2014.2345343
    [68] Gao W N, Jiang Y, Jiang Z P, Chai T Y. Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming. Automatica, 2016, 72:37-45 doi: 10.1016/j.automatica.2016.05.008
    [69] Dierks T, Jagannathan S. Optimal control of affine nonlinear continuous-time systems. In: Proceedings of the 2010 American Control Conference. Baltimore, MD, USA: IEEE, 2010. 1568-1573
    [70] Zhang H G, Cui L L, Luo Y H. Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Transactions on Cybernetics, 2013, 43(1):206-216 doi: 10.1109/TSMCB.2012.2203336
    [71] Yang X, Liu D R, Ma H W, Xu Y C. Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems. Information Sciences, 2016, 328:435-454 doi: 10.1016/j.ins.2015.09.001
    [72] Wang D, Mu C. Developing nonlinear adaptive optimal regulators through an improved neural learning mechanism. Science China Information Sciences, 2017, 60(5):058201 doi: 10.1007/s11432-016-9022-1
    [73] Wang D, Mu C X. A novel neural optimal control framework with nonlinear dynamics:Closed-loop stability and simulation verification. Neurocomputing, 2017, 266:353-360 doi: 10.1016/j.neucom.2017.05.051
    [74] Wang D, Liu D R, Mu C X, Zhang Y. Neural network learning and robust stabilization of nonlinear systems with dynamic uncertainties. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(4):1342-1351 doi: 10.1109/TNNLS.2017.2749641
    [75] Yang X, He H B. Self-learning robust optimal control for continuous-time nonlinear systems with mismatched disturbances. Neural Networks, 2018, 99:19-30 doi: 10.1016/j.neunet.2017.11.022
    [76] Jiang Z P, Teel A R, Praly L. Small-gain theorem for ISS systems and applications. Mathematics of Control, Signals and Systems, 1994, 7(2):95-120 doi: 10.1007/BF01211469
    [77] Mu C X, Sun C Y, Wang D, Song A G. Adaptive tracking control for a class of continuous-time uncertain nonlinear systems using the approximate solution of HJB equation. Neurocomputing, 2017, 260:432-442 doi: 10.1016/j.neucom.2017.04.043
    [78] Wang D, Mu C X. Adaptive-critic-based robust trajectory tracking of uncertain dynamics and its application to a spring-mass-damper system. IEEE Transactions on Industrial Electronics, 2018, 65(1):654-663 doi: 10.1109/TIE.2017.2722424
    [79] Wang D, Liu D R, Zhang Y, Li H Y. Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems. Neural Networks, 2018, 97:11-18 doi: 10.1016/j.neunet.2017.09.005
    [80] Tabuada P. Event-triggered real-time scheduling of stabilizing control tasks. IEEE Transactions on Automatic Control, 2007, 52(9):1680-1685 doi: 10.1109/TAC.2007.904277
    [81] Tallapragada P, Chopra N. On event triggered tracking for nonlinear systems. IEEE Transactions on Automatic Control, 2013, 58(9):2343-2348 doi: 10.1109/TAC.2013.2251794
    [82] Vamvoudakis K G. Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems. IEEE/CAA Journal of Automatica Sinica, 2014, 1(3):282-293 doi: 10.1109/JAS.2014.7004686
    [83] Sahoo A, Xu H, Jagannathan S. Neural network-based event-triggered state feedback control of nonlinear continuous-time systems. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(3):497-509 doi: 10.1109/TNNLS.2015.2416259
    [84] Zhong X N, He H B. An event-triggered ADP control approach for continuous-time system with unknown internal states. IEEE Transactions on Cybernetics, 2017, 47(3):683-694 doi: 10.1109/TCYB.2016.2523878
    [85] Dong L, Tang Y F, He H B, Sun C Y. An event-triggered approach for load frequency control with supplementary ADP. IEEE Transactions on Power Systems, 2017, 32(1):581-589 doi: 10.1109/TPWRS.2016.2537984
    [86] Zhu Y H, Zhao D B, He H B, Ji J H. Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming. IEEE Transactions on Industrial Electronics, 2017, 64(5):4101-4109 doi: 10.1109/TIE.2016.2597763
    [87] Wang D, Mu C X, He H B, Liu D R. Adaptive-critic-based event-driven nonlinear robust state feedback. In: Proceedings of the IEEE 55th Conference on Decision and Control. Las Vegas, NV, USA: IEEE, 2016. 5813-5818
    [88] Wang D, Mu C X, He H B, Liu D R. Event-driven adaptive robust control of nonlinear systems with uncertainties through NDP strategy. IEEE Transactions on Systems, Man, and Cybernetics:Systems, 2017, 47(7):1358-1370 doi: 10.1109/TSMC.2016.2592682
    [89] Wang D, Liu D R. Neural robust stabilization via event-triggering mechanism and adaptive learning technique. Neural Networks, 2018, 102:27-35 doi: 10.1016/j.neunet.2018.02.007
    [90] Zhang Q C, Zhao D B, Wang D. Event-based robust control for uncertain nonlinear systems using adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(1):37-50 doi: 10.1109/TNNLS.2016.2614002
    [91] Abu-Khalaf M, Lewis F L, Huang J. Policy iterations on the Hamilton-Jacobi-Isaacs equation for $H_{\infty}$ state feedback control with input saturation. IEEE Transactions on Automatic Control, 2006, 51(12):1989-1995 doi: 10.1109/TAC.2006.884959
    [92] Vamvoudakis K G, Lewis F L. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. International Journal of Robust and Nonlinear Control, 2012, 22(13):1460-1483 doi: 10.1002/rnc.v22.13
    [93] Modares H, Lewis F L, Sistani M B N. Online solution of nonquadratic two-player zero-sum games arising in the $H_{\infty}$ control of constrained input systems. International Journal of Adaptive Control and Signal Processing, 2014, 28(3-5):232-254 doi: 10.1002/acs.v28.3-5
    [94] Luo B, Wu H N, Huang T W. Off-policy reinforcement learning for $H_{\infty}$ control design. IEEE Transactions on Cybernetics, 2015, 45(1):65-76 doi: 10.1109/TCYB.2014.2319577
    [95] Zhang H G, Qin C B, Jiang B, Luo Y H. Online adaptive policy learning algorithm for $H_{\infty}$ state feedback control of unknown affine nonlinear discrete-time systems. IEEE Transactions on Cybernetics, 2014, 44(12):2706-2718 doi: 10.1109/TCYB.2014.2313915
    [96] Song R Z, Lewis F L, Wei Q L, Zhang H G. Off-policy actor-critic structure for optimal control of unknown systems with disturbances. IEEE Transactions on Cybernetics, 2016, 46(5):1041-1050 doi: 10.1109/TCYB.2015.2421338
    [97] Wang D, He H B, Liu D R. Improving the critic learning for event-based nonlinear $H_{\infty}$ control design. IEEE Transactions on Cybernetics, 2017, 47(10):3417-3428 doi: 10.1109/TCYB.2017.2653800
    [98] Zhang Q C, Zhao D B, Zhu Y H. Event-triggered $H_{\infty}$ control for continuous-time nonlinear system via concurrent learning. IEEE Transactions on Systems, Man, and Cybernetics:Systems, 2017, 47(7):1071-1081 doi: 10.1109/TSMC.2016.2531680
    [99] Mu C X, Wang D, Sun C Y, Zong Q. Robust adaptive critic control design with network-based event-triggered formulation. Nonlinear Dynamics, 2017, 90(3):2023-2035 doi: 10.1007/s11071-017-3778-5
    [100] Werbos P J. Computational intelligence for the smart grid-history, challenges, and opportunities. IEEE Computational Intelligence Magazine, 2011, 6(3):14-21 doi: 10.1109/MCI.2011.941587
    [101] Tang Y F, He H B, Wen J Y, Liu J. Power system stability control for a wind farm based on adaptive dynamic programming. IEEE Transactions on Smart Grid, 2015, 6(1):166-177 doi: 10.1109/TSG.2014.2346740
    [102] Wang D, He H B, Mu C X, Liu D R. Intelligent critic control with disturbance attenuation for affine dynamics including an application to a microgrid system. IEEE Transactions on Industrial Electronics, 2017, 64(6):4935-4944 doi: 10.1109/TIE.2017.2674633
    [103] Wang D, He H B, Zhong X N, Liu D R. Event-driven nonlinear discounted optimal regulation involving a power system application. IEEE Transactions on Industrial Electronics, 2017, 64(10):8177-8186 doi: 10.1109/TIE.2017.2698377
    [104] Wei Q L, Lewis F L, Shi G, Song R Z. Error-tolerant iterative adaptive dynamic programming for optimal renewable home energy scheduling and battery management. IEEE Transactions on Industrial Electronics, 2017, 64(12):9527-9537 doi: 10.1109/TIE.2017.2711499
    [105] Liu D R, Xu Y C, Wei Q L, Liu X L. Residential energy scheduling for variable weather solar energy based on adaptive dynamic programming. IEEE/CAA Journal of Automatica Sinica, 2018, 5(1):36-46 doi: 10.1109/JAS.2017.7510739
    [106] Wang D, He H B, Liu D R. Intelligent optimal control with critic learning for a nonlinear overhead crane system. IEEE Transactions on Industrial Informatics, 2018, 14(7):2932-2940 doi: 10.1109/TII.2017.2771256
    [107] Zhao Dong-Bin, Liu De-Rong, Yi Jian-Qiang. An overview on the adaptive dynamic programming based urban city traffic signal optimal control. Acta Automatica Sinica, 2009, 35(6):676-681 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract13331.shtml
    [108] Gao W N, Jiang Z P, Ozbay K. Data-driven adaptive optimal control of connected vehicles. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(5):1122-1133 doi: 10.1109/TITS.2016.2597279
    [109] Bertsekas D P. Value and policy iterations in optimal control and adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(3):500-509 doi: 10.1109/TNNLS.2015.2503980
    [110] Werbos P J. From ADP to the brain: Foundations, roadmap, challenges and research priorities. In: Proceedings of the 2014 International Joint Conference on Neural Networks. Beijing, China: IEEE, 2014. 107-111
  • Publication history
    • Received:  2017-12-15
    • Accepted:  2018-03-06
    • Published:  2019-06-20