前额叶皮层启发的Transformer模型应用及其进展

潘雨辰 贾克斌 张铁林

王冰洁, 徐磊, 林宗利, 施阳, 杨涛. 基于自适应动态规划的量化通信下协同最优输出调节. 自动化学报, 2025, 51(4): 1−11 doi: 10.16383/j.aas.c240494
引用本文: 潘雨辰, 贾克斌, 张铁林. 前额叶皮层启发的Transformer模型应用及其进展. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240538
Wang Bing-Jie, Xu Lei, Lin Zong-Li, Shi Yang, Yang Tao. Cooperative optimal output regulation under quantized communication based on adaptive dynamic programming. Acta Automatica Sinica, 2025, 51(4): 1−11 doi: 10.16383/j.aas.c240494
Citation: Pan Yu-Chen, Jia Ke-Bin, Zhang Tie-Lin. The application and progress of prefrontal cortex-inspired transformer model. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240538

前额叶皮层启发的Transformer模型应用及其进展

doi: 10.16383/j.aas.c240538 cstr: 32138.14.j.aas.c240538
基金项目: 北京市科技新星(20230484369), 上海市市级科技重大专项(2021SHZDZX), 中科院青促会, 多模态人工智能系统全国重点实验室开放课题基金等资助.
详细信息
    作者简介:

    潘雨辰:北京工业大学信息科学技术学院硕士研究生, 中科院脑科学与智能技术卓越创新中心联合培养学生. 2019年获得北京工业大学工学学士学位. 主要研究方向为类脑模型算法. E-mail: 18201335023@sina.cn

    贾克斌:北京工业大学信息科学技术学院教授, 博士. 主要研究方向为图像/视频处理技术与生物医学信息处理技术. E-mail: kebinj@bjut.edu.cn

    张铁林:中国科学院脑智卓越中心, 脑认知与类脑智能国重实验室研究员, 课题组长, 兼职中科院自动化所复杂系统认知与决策实验室. 主要从事类脑脉冲神经网络算法, 类脑芯片及AI for Neuroscience研究. 本文通信作者. E-mail: zhangtielin@ion.ac.cn

The Application and Progress of Prefrontal Cortex-inspired Transformer Model

Funds: Supported by Beijing Nova Program (20230484369), Shanghai Municipal Science and Technology Major Project (2021SHZDZX), Youth Innovation Promotion Association of Chinese Academy of Sciences, and Open Projects Program of State Key Laboratory of Multimodal Artificial Intelligence Systems.
More Information
    Author Bio:

    PAN Yu-Chen Master's student in the School of Information Science and Technology, Beijing University of Technology, co-supervised by the Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences. He received his bachelor's degree in engineering from Beijing University of Technology in 2019. His research interest covers brain-inspired algorithms.

    JIA Ke-Bin Professor at the School of Information Science and Technology, Beijing University of Technology. His research interests are focused on image/video coding and processing and biomedical information processing.

    ZHANG Tie-Lin Principal investigator at the Key Laboratory of Brain Cognition and Brain-inspired Intelligence, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences (CAS), and a co-PI at the Key Laboratory of Complex System for Recognition and Decision-making, Institute of Automation, CAS. He is mainly engaged in research on brain-inspired spiking neural network algorithms, brain-inspired chips, and AI for Neuroscience. Corresponding author of this paper.

  • 摘要: 本文聚焦于生物结构与类脑智能的交叉研究方向, 探讨前额叶皮层的结构及其认知功能对人工智能领域Transformer模型的启发. 前额叶皮层在认知控制和决策制定中扮演着关键角色, 本文首先介绍前额叶皮层的注意力机制、生物编码、多感觉融合等相关生物研究进展, 然后探讨这些生物机制如何启发新型的类脑Transformer架构, 重点提升其在自注意力、位置编码、多模态整合等方面的生物合理性与计算高效性. 最后, 总结前额叶皮层启发的类脑新模型, 在支持多类型神经网络组合、多领域应用、世界模型构建等方面的发展与潜力, 为生物和人工智能两大领域之间交叉融合构建桥梁.
  • 近年来, 多智能体系统的输出调节问题因其在无人机编队控制、自动驾驶和车联网以及多航天器姿态同步等领域的应用而引起广泛的关注[1−3]. 多智能体输出调节问题的目标是通过设计一种分布式控制策略, 实现每个跟随者的输出信号跟踪参考信号, 并抑制由外部系统描述的干扰信号[4−6]. 目前, 分布式控制策略的设计方法主要有两种: 前馈−反馈方法[7−8]与内模原理方法[9−10].

    此外, 在多智能体系统中, 智能体的通信通常受限于系统的通信拓扑结构, 智能体通常只能与邻居进行直接通信. 在领导−跟随多智能体系统中, 跟随者为获得领导者的状态信息, 可通过设计分布式观测器进行估计[7, 11]. 在自主水下航行器[12], 航天器编队控制[13]等实际网络通信中, 通信信道的有限带宽在智能体之间的信息传输中不容忽视[14−18]. 为降低通信负担, 减少通信信道中传输数据的比特数, 一些学者通过设计量化器与编码−解码方案来解决量化通信下多智能体系统的协同输出调节问题. 文献[19]利用对数量化器对控制输入进行量化, 并通过扇形约束方法来处理存在的量化误差. 文献[20]通过设计一种基于缩放函数策略的动态编码−解码方案, 保证量化误差的收敛, 实现多智能体系统跟踪误差渐近收敛到零. 文献[21]将上述结果推广到具有切换拓扑图的多智能体系统上, 解决带有切换图的线性多智能体系统的量化协同输出调节问题. 值得注意的是, 上述研究中所设计的控制策略都是基于模型的, 这就要求每个智能体需要知道系统的模型信息. 然而, 通信带宽的固有限制和网络系统固有的脆弱性将导致时间延迟、数据包丢失、信号量化以及网络攻击等现象的发生, 使智能体难以完整获得整个系统的动态信息[22−24].

    随着自适应动态规划的发展[25−28], 一种针对不确定动态系统的自适应控制方法应运而生, 其优势在于可以利用在线数据通过学习来逼近动态系统的控制策略, 而不必完全了解系统的动态信息, 为模型未知的协同输出调节问题提供新的解决方案. 近年来, 一些学者将最优控制理论与自适应动态规划方法进行结合[29−31], 通过数据驱动的方式求解最优/次优控制策略, 在保证闭环系统实现输出调节的同时, 最小化系统性能指标. 文献[3]利用前馈−反馈方法设计分布式控制策略, 解决跟随者对领导者状态未知的多智能体系统的协同最优输出调节问题. 文献[32]构建分布式自适应内部模型来估计领导者的动态, 并提出基于策略迭代与值迭代的强化学习算法, 在线学习最优控制策略. 文献[33]针对包含外部系统在内的所有智能体动态未知的多智能体系统, 利用内模原理与自适应动态规划方法, 解决协同最优输出调节问题. 然而, 上述的这些研究并未考虑通信信道带宽有限的情况. 而在实际的工程应用中, 如智能交通系统中的自适应巡航控制等问题, 往往期望设计一种能在通信带宽有限且系统动力学未知情况下运行的数据驱动算法, 来实现多智能体系统间的协同最优输出调节, 这促使我们对这一问题进行研究.

    本文的主要贡献如下: 1) 通过引入均匀量化器, 设计分布式量化观测器来减少通信信道中传输数据的比特数, 降低多智能体间的通信负担. 同时, 将均匀量化器引入到编码−解码方案设计中, 消除量化误差对多智能体系统的影响, 保证每个跟随者对外部系统状态的估计误差渐近收敛至零. 2) 将分布式量化观测器的估计值引入到次优控制策略的设计中, 在系统动态未知的情况下, 提出一种基于自适应动态规划的数据驱动算法, 在线学习次优控制策略, 解决量化通信下的协同最优输出调节问题. 3) 受文献[32]的启发, 在学习阶段, 本文考虑一个更一般的情况, 即跟随者系统只能通过观测器对领导者的状态进行估计, 而无法直接获得领导者的状态. 在这种情况下, 证明学习到的控制器增益将收敛到最优控制增益的任意小邻域内. 与现有文献相比, 文献[32]需要智能体间的精确通信, 而本文中智能体间传输的为量化后的信息, 降低了多智能体间的通信负担, 并通过引入编码−解码方案消除量化误差的影响, 实现量化通信下外部系统状态估计误差的渐近收敛. 文献[3, 34]不仅需要智能体间的精确通信, 并且需要假设每个跟随者系统都能够获得外部系统状态的实际值. 本文在学习阶段考虑一个更一般的情况, 跟随者系统可通过设计的分布式量化观测器对领导者的状态进行估计, 从而获得外部系统状态的估计值.

    本文其余部分安排如下. 第1节介绍图论的基础知识以及相关符号说明; 第2节介绍本文的问题描述; 第3节设计量化通信下的分布式观测器; 第4节提出自适应次优控制策略与自适应动态规划算法; 第5节在智能车联网自适应巡航控制系统上验证理论结果; 第6节总结本文的主要结果, 并提出未来的研究方向.

    本节介绍一些图论的基础知识以及相关符号的定义.

    多智能体系统通过通信网络与相邻的智能体共享信息, 该网络可以使用图论来描述. 考虑一个具有$ N $个智能体的有向图$ \mathcal{G}=(\mathcal{V},\; \mathcal{E}) $, 其中$ \mathcal{V}= \{1,\;2,\;\cdots,\;N\} $表示智能体的集合, $ \mathcal{E} \subseteq \mathcal{V} \times \mathcal{V} $表示边的集合. 邻接矩阵定义为$ \mathcal{A}=[a_{ij}] \in \bf{R}^{N\times N} $, 其中当$ a_{ij}> 0 $时, $ (j,\;i) \in \mathcal{E} $, 否则$ a_{ij}=0 $. 有向图$ \mathcal{G} $的拉普拉斯矩阵定义为$ \mathcal{L}=[\ell_{ij}]\in \bf{R}^{N\times N} $, 其中$ \ell_{ii}=\sum\nolimits_{j=1}^{N}a_{ij} $, $ \ell_{ij}=-a_{ij} $, $ j\ne i $. 领导者由智能体$ 0 $表示, 由$ N $个智能体和领导者组成的图称为增广有向图$ \mathcal{\bar{G}}=(\mathcal{\bar{V}},\;\mathcal{\bar{E}}) $, 其中$ \mathcal{\bar{V}}= \{0,\;1,\;2,\;\cdots,\;N\} $表示智能体的集合, $ \mathcal{\bar{E}} \subseteq \mathcal{\bar{V}} \times \mathcal{\bar{V}} $表示边的集合. 如果从领导者智能体$ 0 $到智能体$ i\; \in\mathcal{V} $存在有向边, 则$ a_{i0}=1 $, 否则$ a_{i0}=0 $. 定义对角矩阵$ G={\rm diag}\{a_{10}, \;a_{20},\;\cdots,\; a_{N0}\} $, 令$ H=\mathcal{L}+G $, $ \mathcal{F}=H+\mathcal{A} $. $ \mathcal{N}_{i}=\left\{j|a_{ij}>0,\; j \in \mathcal{\bar{V}}\right\} $表示智能体 $ i\; \in\mathcal{V} $的邻居集合. 若存在一个根节点, 使得从该根节点到其他每个节点都存在有向路径, 则称该有向图具有有向生成树.
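为直观理解上述记号, 以下给出一个简要的 Python 数值示意(拓扑取以领导者 0 为根的链式有向图, 属假设示例, 非原文数据): 构造拉普拉斯矩阵 $\mathcal{L}$、对角矩阵 $G$ 与 $H=\mathcal{L}+G$, 并验证在含有向生成树时 $H$ 的特征值实部均为正(见后文注1).

```python
import numpy as np

# 假设示例: 4 个跟随者, 领导者 0 仅与跟随者 1 相连(链式拓扑 0→1→2→3→4)
# 邻接矩阵 A: a_ij > 0 表示存在边 (j, i)
A = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
a0 = np.array([1, 0, 0, 0], dtype=float)      # a_i0: 领导者到跟随者 i 的边

L = np.diag(A.sum(axis=1)) - A                # 拉普拉斯矩阵 L
G = np.diag(a0)                               # 对角矩阵 G
H = L + G                                     # H = L + G

# 假设 3 (含以 0 为根的有向生成树)成立时, H 的特征值实部均为正
eig = np.linalg.eigvals(H)
print(np.all(eig.real > 0))   # True
```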

    $ \bf{Z} $表示整数的集合. $ ||\cdot|| $为向量的欧氏范数和矩阵的$ 2 $范数. 对于列向量$ l=(l_{1},\; l_{2},\;\cdots,\; l_{n})^{{\mathrm{T}}} \in \bf{R}^{n} $, $ ||l||_{\infty}={\rm max}_{1\leq i\leq n}|l_{i}| $. $ \otimes $表示克罗内克积算子. 对于矩阵$ X \in \bf{R}^{m\times m} $, $ \rho(X) $表示它的谱半径, $ \lambda(X) $表示它的特征值, $ \sigma(X) $表示它的谱. $ {\rm tr}(X) $表示它的迹. $ X>0 $表示为正定矩阵, $ X\ge0 $表示为半正定矩阵. 对于矩阵$ X \in \bf{R}^{m\times n} $, $ {\rm rank}(X) $表示它的列秩. $ {\rm vec}(A)=[a^{{\mathrm{T}}}_{1},\; a^{{\mathrm{T}}}_{2},\; \cdots,\; a^{{\mathrm{T}}}_{q}]^{{\mathrm{T}}} \in \bf{R}^{pq} $ 表示将矩阵$ A\in \bf{R}^{p\times q} $向量化, 其中$ a_{i}\in\bf{R}^{p} $是矩阵$ A $的第$ i $列. 对于对称矩阵$ B \in \bf{R}^{m\times m} $, $ b_{mm} $为矩阵$ B $中第$ m $行第$ m $列的元素, $ {\rm vecs}(B)=[b_{11},\; 2b_{12},\;\cdots,\; 2b_{1m},\; b_{22}, 2b_{23},\;\cdots,\;2b_{m-1,\;m},\;b_{mm}]^{{\mathrm{T}}} \in \bf{R}^{\frac{1}{2}m(m+1)} $. 针对任意的列向量$ c\in \bf{R}^{n} $, $ c_{n} $为$ c $中第$ n $个元素, $ {\rm vecv}(c)= [c^{2}_{1},\;\, c_{1}c_{2},\;\,\cdots,\;\,c_{1}c_{n},\;\,c^{2}_{2},\;\,c_{2}c_{3},\;\cdots,\;c_{n-1}c_{n} $, $ c^{2}_{n}]^{{\mathrm{T}}} \in \bf{R}^{\frac{1}{2}n(n+1)}$. $ D={\rm blockdiag}\{D_{1},\;D_{2},\;\cdots,\;D_{N} \} $表示分块对角矩阵, 其中$ D_{i} $为对角块, $ i=1,\; 2,\;\cdots,\; N $. $ \mathbf{1}_n $与$ {I}_n $分别表示$ n $维全1列向量与$ n\times n $维单位矩阵. 针对复数$ {\textit z} $, $ {\rm Re}({\textit z}) $表示$ {\textit z} $的实部.
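正文定义的 $ {\rm vecs} $ 与 $ {\rm vecv} $ 满足二次型恒等式 $ c^{{\mathrm{T}}}Bc={\rm vecs}(B)^{{\mathrm{T}}}{\rm vecv}(c) $, 这也是后文式(27)向量化时所依赖的关系之一. 以下 Python 片段按上述定义实现两个算子并作数值验证(随机数据为假设示例).

```python
import numpy as np

def vecs(B):
    """对称矩阵的半向量化: 对角元保持, 非对角元乘 2 (按正文定义)."""
    m = B.shape[0]
    out = []
    for i in range(m):
        out.append(B[i, i])
        out.extend(2 * B[i, j] for j in range(i + 1, m))
    return np.array(out)

def vecv(c):
    """列向量的二次单项式向量化 (按正文定义)."""
    n = len(c)
    out = []
    for i in range(n):
        out.append(c[i] ** 2)
        out.extend(c[i] * c[j] for j in range(i + 1, n))
    return np.array(out)

# 验证二次型恒等式 c^T B c = vecs(B)^T vecv(c)
rng = np.random.default_rng(0)
c = rng.standard_normal(4)
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2   # 构造对称矩阵
print(np.isclose(c @ B @ c, vecs(B) @ vecv(c)))   # True
```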

    本文考虑如下一类连续时间线性多智能体系统:

    $$ \dot{x}_i=A_{i}x_{i}+B_{i}u_{i}+D_{i}\omega\; $$ (1a)
    $$ \dot{\omega}=E\omega\; $$ (1b)
    $$ e_{i}=C_{i}x_{i}+F_{i}\omega,\; \quad i\in \mathcal{V}\; $$ (1c)

    其中, $ x_i\in\bf{R}^{n_i} $, $ u_i\in\bf{R}^{m_i} $, $ e_i\in\bf{R}^{p_i} $分别表示第$ i $个智能体的状态向量, 输入向量以及跟踪误差. 系统(1)的矩阵维数分别为$ A_i\in\bf{R}^{n_i\times n_i} $, $ B_i\in\bf{R}^{n_i\times m_i} $, $ D_i\in\bf{R}^{n_i\times q} $, $ C_i\in\bf{R}^{p_i\times n_i} $, $ F_i\in\bf{R}^{p_i\times q} $. 自治系统(1b)称为外部系统, 其中, $ \omega\in\bf{R}^{q} $表示外部系统的状态, $ E\in\bf{R}^{q\times q} $表示外部系统矩阵.

    针对以上系统, 本文给出一些基本假设条件如下所示:

    假设1. $ (A_i,\;B_i) $可镇定, $ i\in \mathcal{V} $.

    假设2. $ {\rm rank}\left[ \begin{matrix} A_{i}-\lambda I_{n_i} & B_{i} \\ C_{i} & 0 \end{matrix} \right]= n_{i}+p_{i},\; \forall \lambda \in \sigma(E),\; i\in \mathcal{V}. $

    假设3. 有向图$ \mathcal{\bar{G}} $包含以智能体$ 0 $为根节点的有向生成树.

    注1. 假设1和假设2均为多智能体系统输出调节问题中的基本假设[4, 30]. 如果假设3成立, 则$ H $的所有特征值均具有正实部[8].

    引理1[3, 8] . 假设1 ~ 3成立, 对于$ j=1,\;2,\;\cdots,\;q $, $ i\in \mathcal{V} $, 选择充分大的 $ \alpha>0 $ 使 $ {\rm Re}(\lambda_{j}(E)- \alpha\lambda_{i} (H))< 0 $, 其中$ \lambda_{j}(E) $和$ \lambda_{i}(H) $分别为$ E $的第$ j $个和$ H $的第$ i $个特征值, 令$ K_{i} $使$ A_{i}-B_{i}K_{i} $赫尔维玆, $ L_{i}=K_{i}X_{i}+U_{i} $, 其中$ (X_{i},\;U_{i}) $为以下调节器方程的一组解:

    $$ X_{i}E=A_{i}X_{i}+B_{i}U_{i}+D_{i}\; $$ (2a)
    $$ 0=C_{i}X_{i}+F_{i} $$ (2b)

    通过设计控制策略$ u_{i}=-K_{i}x_{i}+L_{i}\eta_{i} $可实现多智能体系统(1)的协同输出调节, 其中$ \eta_{i} $为第$ i $个跟随者对领导者状态$ \omega $的估计值.
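调节器方程(2)关于 $ (X_{i},\;U_{i}) $ 是线性的, 可经克罗内克积向量化后直接求解. 以下 Python 片段在一个假设的低维示例系统(非原文数据)上演示这一求解思路, 并验证所得 $ (X,\;U) $ 满足式(2a)与(2b).

```python
import numpy as np

# 假设的示例系统 (非原文数据): n=2, m=1, q=2, p=1
A = np.array([[0., 1.], [0., -1.]])
B = np.array([[0.], [1.]])
C = np.array([[1., 0.]])
D = np.zeros((2, 2))
E = np.array([[0., 1.], [-1., 0.]])
F = np.array([[-1., 0.]])
n, m = B.shape; q = E.shape[0]; p = C.shape[0]

# 将调节器方程 (2) 向量化:
#   (E^T ⊗ I_n - I_q ⊗ A) vec(X) - (I_q ⊗ B) vec(U) = vec(D)
#   (I_q ⊗ C) vec(X)                                = -vec(F)
In, Iq = np.eye(n), np.eye(q)
M = np.block([[np.kron(E.T, In) - np.kron(Iq, A), -np.kron(Iq, B)],
              [np.kron(Iq, C), np.zeros((p * q, m * q))]])
rhs = np.concatenate([D.flatten(order='F'), -F.flatten(order='F')])
sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
X = sol[:n * q].reshape((n, q), order='F')
U = sol[n * q:].reshape((m, q), order='F')

# 验证 (2a), (2b)
print(np.allclose(X @ E, A @ X + B @ U + D))   # True
print(np.allclose(C @ X + F, 0))               # True
```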

    本文的控制目标是通过设计一种次优控制策略

    $$ u_{i}=-K^{*}_{i}x_{i}+L^{*}_{i}\eta_{i},\;\quad i\in \mathcal{V}\; $$ (3)

    实现多智能体系统的协同最优输出调节. 其中$ K^{*}_{i} $为最优反馈控制增益, $ L^{*}_{i} $为最优前馈控制增益.

    此外, 所设计的次优控制策略, 不仅需要解决协同输出调节问题, 同时还需要解决以下两个优化问题.

    问题1.

    $$ \begin{aligned} &\min\limits_{(X_{i},\;U_{i})}\quad {\rm tr}(X^{{\mathrm{T}}}_{i}Q_{i}X_{i}+U^{{\mathrm{T}}}_{i}R_{i}U_{i})\;\\ &\; \rm{s.t.}\quad (2)\; \end{aligned} $$

    其中, $ Q_{i}=Q^{{\mathrm{T}}}_{i}>0 $, $ R_{i}=R^{{\mathrm{T}}}_{i}>0 $.

    根据文献[35]可知, 求解静态优化问题1能够得到调节器方程(2)的唯一最优解$ (X^{*}_{i},\;U^{*}_{i}) $, 最优前馈控制增益$ L^{*}_{i}=K^{*}_{i}X^{*}_{i}+U^{*}_{i} $. 接下来, 为得到最优反馈控制增益$ K^{*}_{i} $, 需要求解以下动态规划问题.

    定义状态误差变量$ \bar{x}_{i}=x_{i}-X^{*}_{i}\omega $与输入误差变量$ \bar{u}_{i}=u_{i}-U_{i}^{*}\omega $. 根据调节器方程(2)与次优控制策略(3)能够得到系统(1a)的误差系统为

    $$ \dot{\bar{x}}_{i}=A_{i}\bar{x}_{i}+B_{i}\bar{u}_{i}\; $$ (4a)
    $$ e_{i}=C_{i}\bar{x}_{i}\; $$ (4b)

    其中, 控制输入为$ \bar{u}_{i}=-K^{*}_{i}\bar{x}_{i}+L^{*}_{i}(\eta_{i}-\omega) $. 误差系统(4)的最优控制策略为$ \bar{u}_{i}=-K^{*}_{i}\bar{x}_{i} $, 可通过求解以下优化问题获得.

    问题2.

    $$ \begin{aligned} &\min \limits_{\bar{u}_{i}}\quad \int_{0}^{\infty} (\bar{x}^{{\mathrm{T}}}_{i}\bar{Q}_{i}\bar{x}_{i}+\bar{u}^{{\mathrm{T}}}_{i}\bar{R}_{i}\bar{u}_{i}){\mathrm{d}}t\;\\ &\; \rm{s.t.}\quad (4)\; \end{aligned} $$

    其中, $ \bar{Q}_{i} = \bar{Q}^{{\mathrm{T}}}_{i}\ge 0 $, $ \bar{R}_{i} = \bar{R}^{{\mathrm{T}}}_{i}>0 $, $ (A_{i},\;\sqrt{\bar{Q}_{i}}) $可观测.

    问题2是一个标准的线性二次型调节器问题, 根据线性最优控制理论, 最优反馈增益$ K^{*}_{i} $为

    $$ K^{*}_{i}=\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}P^{*}_{i}\; $$ (5)

    其中, $ P^{*}_{i}=(P^{*}_{i})^{{\mathrm{T}}}>0 $是以下代数黎卡提方程的唯一解:

    $$ A^{{\mathrm{T}}}_{i}P_{i}^{*}+P_{i}^{*}A_{i}+\bar{Q}_{i}-P_{i}^{*}B_{i}\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}P_{i}^{*}=0 $$ (6)

    注2. 根据文献[3]中定理1的分析可知, 将控制策略$ \bar{u}_{i}=-K^{*}_{i}\bar{x}_{i}+L^{*}_{i}(\eta_{i}- \omega) $代入问题2的性能指标时, 其与最优控制策略$ \bar{u}_{i}=-K^{*}_{i}\bar{x}_{i} $之间的成本误差是有界的. 因此, 本文所设计的控制策略(3)是次优控制策略.

    由于最优反馈控制增益$ K^{*}_{i} $和最优前馈控制增益$ L^{*}_{i} $是相互独立的, 因此问题1和问题2可以分别进行求解. 值得注意的是, 直接求解非线性方程(6)往往比较困难, 尤其是针对维数比较高的矩阵. 因此, 通常采用以下策略迭代的方法来解决此类问题[36].

    简单而言, 选择一个使闭环系统稳定并保证所需成本有限的反馈控制增益$ K_{i,\;0} $, 即$ A_{i}-B_{i}K_{i,\;0} $是赫尔维玆矩阵. 通过策略迭代的方式求解如下Lyapunov方程来更新值$ P_{i,\;k} $:

    $$ \begin{split} &(A_{i}-B_{i}K_{i,\;k})^{{\mathrm{T}}}P_{i,\;k}+P_{i,\;k}(A_{i}-B_{i}K_{i,\;k})\;+\\ & \qquad\bar{Q}_{i}+ K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i}K_{i,\;k}=0\; \end{split} $$ (7)

    其中, $ k=1,\;2,\;\cdots $表示迭代次数. 通过以下方程来更新反馈控制增益

    $$ K_{i,\;k+1}=\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}P_{i,\;k} $$ (8)

    文献[36]已证明策略迭代方法中的每一次迭代反馈控制增益$ K_{i,\;k} $都可接受, 即保证了$ A_{i}\;- B_{i}K_{i,\;k} $是赫尔维玆矩阵. 同时也保证了$ \mathop{\lim}\nolimits_{k \to \infty}K_{i,\;k} = K_{i}^* $且$ \mathop{\lim}\nolimits_{k \to \infty}P_{i,\;k}=P_{i}^* $.
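上述策略迭代(式(7)与式(8))在模型已知时可直接实现: 每次迭代先解 Lyapunov 方程更新$ P_{i,\;k} $, 再更新增益$ K_{i,\;k+1} $. 以下 Python 片段在一个假设的二阶示例系统(非原文数据)上演示该迭代, 并验证收敛后的$ P $满足代数黎卡提方程(6).

```python
import numpy as np

def lyap_solve(Ak, Q):
    """求解 A_k^T P + P A_k + Q = 0 (通过克罗内克积向量化)."""
    n = Ak.shape[0]
    M = np.kron(np.eye(n), Ak.T) + np.kron(Ak.T, np.eye(n))
    P = np.linalg.solve(M, -Q.flatten(order='F')).reshape((n, n), order='F')
    return (P + P.T) / 2   # 数值对称化

# 假设的示例系统(非原文数据)
A = np.array([[0., 1.], [0., -1.]])
B = np.array([[0.], [1.]])
Qbar, Rbar = np.eye(2), np.eye(1)

K = np.array([[1., 1.]])          # 初始镇定增益 K_{i,0}, 使 A - B K 赫尔维玆
for k in range(10):
    P = lyap_solve(A - B @ K, Qbar + K.T @ Rbar @ K)   # 式 (7)
    K = np.linalg.solve(Rbar, B.T @ P)                 # 式 (8)

# 收敛后 P 应满足代数黎卡提方程 (6)
res = A.T @ P + P @ A + Qbar - P @ B @ np.linalg.solve(Rbar, B.T @ P)
print(np.linalg.norm(res) < 1e-8)   # True
```

该迭代即 Kleinman 型策略迭代, 文献[36]保证了其单调收敛性.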

    为降低多智能体间的通信负担, 在本节中, 通过引入量化器与编码−解码方案设计分布式量化观测器, 用于估计量化通信下领导者的状态$ \omega $.

    在正式介绍编码−解码方案之前, 首先考虑一种均匀量化器$ \mathcal{Q}[e] $[37]:

    $$ \mathcal{Q}[e]=\varsigma,\;\quad \varsigma-\frac{1}{2}<e \leq \varsigma+\frac{1}{2}\; $$ (9)

    其中, $ \varsigma\in\bf{Z} $, $ e $表示需要量化的变量.

    给定向量$ h=[h_{1},\;h_{2},\;\cdots,\;h_{n}] \in \bf{R}^{n} $, 定义量化器$ \mathcal{Q}[h]=[\mathcal{Q}[h_{1}],\;\cdots,\; \mathcal{Q}[h_{n}]] $. 量化误差为

    $$ ||h-\mathcal{Q}[h]||_{\infty} \leq \frac{1}{2} $$ (10)
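式(9)的均匀量化器本质上是取最近整数(区间右端点归入左侧整数), 可写成$ \mathcal{Q}[e]=\lceil e-1/2\rceil $. 以下 Python 片段实现该量化器并验证量化误差界(10).

```python
import math

def uniform_quantize(e):
    """均匀量化器 (9): 返回满足 ς - 1/2 < e ≤ ς + 1/2 的整数 ς."""
    return math.ceil(e - 0.5)

# 量化误差满足 (10): |e - Q[e]| ≤ 1/2
for e in [0.0, 0.49, 0.5, 0.51, -1.2, 3.75]:
    q = uniform_quantize(e)
    assert q - 0.5 < e <= q + 0.5
    assert abs(e - q) <= 0.5
print("ok")
```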

    由于量化误差的存在, 智能体无法获得邻居传输的准确信息, 为消除量化误差带来的影响, 将量化器引入到如下编码−解码方案的设计之中.

    1)编码器

    为传输$ \eta_j(k) $量化后的数据, 对于任意$ k\ge1 $, 智能体$ j \in \mathcal{\bar{V}} $生成的量化输出为$ {\textit z}_j(k) $, 即

    $$ {\textit z}_{j}(k)=\mathcal{Q}\left[\frac{1}{s(k-1)}(\eta_j(k)-b_j(k-1))\right]\; $$ (11a)
    $$ b_j(k)=s(k-1){\textit z}_{j}(k)+b_j(k-1) $$ (11b)

    其中, 内部状态$ b_j(k) $的初值$ b_j(0)=0 $, $ s(k)= s(0) \mu^k>0 $为自适应调整编码器的递减序列, $ \mu\in (0,\;1) $.

    2)解码器

    当智能体$ i $从邻居智能体$ j $接收到量化后的数据$ {\textit z}_{j}(k) $时, 通过以下规则递归生成$ \eta_j(k) $的估计值$ \hat{\eta}_j(k) $, 并通过零阶保持器输出为连续信号$ \hat{\eta}_j(t) $, 即

    $$ \hat{\eta}_j(k)=s(k-1){\textit z}_{j}(k)+\hat{\eta}_j(k-1)\; $$ (12a)
    $$ \hat{\eta}_j(t)=\hat{\eta}_j(k),\; kT \leq t<(k+1)T\; $$ (12b)

    其中, 初值$ \hat{\eta}_j(0)=0 $, $ T>0 $为采样时间, 其选取遵循香农采样定理.

    如图 1所示, 对智能体$ i $和邻居智能体$ j $之间的通信而言, 在每个采样时刻, 智能体$ j $对外部系统状态的估计值$ \eta_j(t) $进行采样, 并将采样后的数据$ \eta_j(k) $编码为量化后的数据$ {\textit z}_j(k) $, 然后通过通信信道传输给邻居智能体$ i $. 邻居智能体$ i $接收到数据信息之后通过解码器解码为$ \hat{\eta}_j(k) $, 进而通过零阶保持器得到发送者状态的估计值$ \hat{\eta}_j(t) $. 其中$ b_j(k) $表示一个预测器, 目的是预测智能体$ j $的邻居智能体$ i $经过解码后得到的数据$ \hat{\eta}_j(k) $.

    图 1  编码−解码方案
    Fig. 1  Encoder-decoder scheme

    注3. 在编码−解码方案设计中, $ s(k) $表示用于调整预测误差$ \eta_j(k)-b_j(k-1) $的调节函数. $ \mu\in (0,\;1) $保证了随着迭代次数的增加, 智能体$ i $对邻居智能体$ j $传输数据的估计误差$ \eta_j(k)-\hat{\eta}_j(k) $逐渐减小, 即消除了量化误差对传输数据准确性的影响.
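编码器(11)与解码器(12)的配合可用如下 Python 片段作一个标量信号的示意仿真(信号与采样周期为假设示例, $ s(0)=0.05 $, $ \mu=0.8 $ 取自第5节仿真参数). 由于解码器与编码器的内部状态同步更新, 估计误差恰被 $ s(k-1)/2 $ 界住, 随 $ k $ 指数衰减; 需要说明的是, 当 $ s(k) $ 衰减快于预测误差时, 整数码字 $ z(k) $ 的幅值会增大, 实际设计中需在 $ \mu $ 与信号的收敛速度之间权衡.

```python
import math

def Q(e):                      # 均匀量化器 (9)
    return math.ceil(e - 0.5)

s0, mu = 0.05, 0.8             # s(k) = s(0) μ^k, 取自第 5 节仿真
T = 0.05                       # 采样周期(假设值)
eta = lambda k: math.sin(0.7 * k * T)   # 待传输的采样信号(假设示例)

b, eta_hat = 0.0, 0.0          # 编码器内部状态 b(0)=0, 解码器初值 η̂(0)=0
for k in range(1, 60):
    s = s0 * mu ** (k - 1)
    z = Q((eta(k) - b) / s)    # 编码 (11a): 信道上只需传输整数 z(k)
    b = s * z + b              # 编码器状态更新 (11b)
    eta_hat = s * z + eta_hat  # 解码 (12a): 与 b(k) 同步, 故 η̂(k) = b(k)

# 估计误差被 s(k-1)/2 界住, 随 k 指数衰减
err = abs(eta(59) - eta_hat)
print(err <= s0 * mu ** 58 / 2)   # True
```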

    接下来, 将上述经编码−解码方案传输的估计值$ \hat{\eta}_j(t) $引入到分布式观测器的设计当中, 针对每个跟随者$ i \in \mathcal{V} $, 受文献[8]的启发, 本文构建分布式量化观测器如下:

    $$ \dot{\eta}_i=E\eta_i+\alpha \sum\limits_{j \in \mathcal{N}_i} a_{i j}\left(\hat{\eta}_j-\eta_i\right) $$ (13)

    其中, $ \eta_i \in \bf{R}^{q} $, 参数$ \alpha>0 $. $ \hat{\eta}_j \in \bf{R}^{q} $表示智能体$ i $对$ \eta_j $经过编码−解码后的估计值, $ \hat{\eta}_0 = \hat{\omega} $.
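作为式(13)的数值示意, 以下 Python 片段在链式拓扑与第5节的旋转外部系统动态下对观测器作欧拉积分(为突出观测器本身的收敛性, 此处暂取 $ \hat{\eta}_j=\eta_j $, 即忽略量化与编码−解码环节; 拓扑与步长均为假设示例).

```python
import numpy as np

# 假设示例: 外部系统取第 5 节仿真中的旋转动态, 链式拓扑 0→1→2→3→4, α=10
E = np.array([[0., 0.7], [-0.7, 0.]])
alpha, dt, steps = 10.0, 1e-3, 20000

omega = np.array([0., 1.])                  # 领导者(外部系统)状态
eta = np.zeros((4, 2))                      # 4 个跟随者的观测器状态
for _ in range(steps):
    neighbors = np.vstack([omega, eta[:-1]])   # 智能体 i 的唯一邻居为 i-1
    eta = eta + dt * (eta @ E.T + alpha * (neighbors - eta))  # 式 (13) 的欧拉离散
    omega = omega + dt * (E @ omega)

# 所有跟随者的估计误差 η_i - ω 渐近收敛到零
print(np.all(np.linalg.norm(eta - omega, axis=1) < 1e-3))   # True
```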

    本文理论部分的全文流程图如图 2所示. 本文利用量化器与编码−解码方案设计分布式量化观测器, 在减少通信负担的同时, 对外部系统的状态进行估计. 定理1证明了所提观测器对外部系统状态估计误差的收敛性. 通过求解问题1与问题2设计次优控制策略. 当系统模型未知时, 我们给出一个在线学习算法1, 通过数据驱动的方式在线求解次优控制策略. 定理2则证明了由算法1得到的次优控制策略能够实现量化通信下的自适应协同最优输出调节.

    图 2  理论部分示意图
    Fig. 2  Illustration of the theoretical part

    接下来, 通过以下定理说明所设计的分布式量化观测器保证了对外部系统状态估计误差的收敛性.

    定理1. 考虑外部系统(1b)和分布式量化观测器(13), 如果假设1 ~ 3成立, 对于充分大的$ \alpha>0 $, 经过编码−解码后, 智能体$ i $对外部系统状态的估计误差

    $$ \mathop{\lim}\limits_{t \to \infty}(\eta_{i}(t)-\omega(t))=0\; $$ (14)

    其中, $ i \in \mathcal{V} $.

    证明. 定义$ \bar{\eta}(t)=[\eta_{1}^{{\mathrm{T}}}(t),\; \eta_{2}^{{\mathrm{T}}}(t),\; \cdots,\; \eta_{N}^{{\mathrm{T}}}(t)]^{{\mathrm{T}}} $, $ \hat{\eta}(t)=[\hat{\eta}_{1}^{{\mathrm{T}}}(t),\; \hat{\eta}_{2}^{{\mathrm{T}}}(t),\; \cdots,\; \hat{\eta}_{N}^{{\mathrm{T}}}(t)]^{{\mathrm{T}}} $, $ \bar{\omega}(t)=\mathbf{1}_N\otimes \omega(t) $, $ \hat{\bar{\omega}}(t)=\mathbf{1}_N\otimes\hat{\omega}(t) $, $ \bar{E}={ I_{{N}}}\otimes E $. 将外部系统(1b)与分布式量化观测器(13)整理成如下紧凑形式:

    $$ \dot{\bar{\omega}}(t)=\bar{E}\bar{\omega}(t)\; $$ (15a)
    $$ \begin{split} \dot{\bar{\eta}}(t)=\;&\bar{E}\bar{\eta}(t)-\alpha(\mathcal{F}\otimes I_{q})\bar{\eta}(t)\;+ \\ &\alpha(\mathcal{A}\otimes I_{q})\hat{\eta}(t)+\alpha(H\otimes I_{q})\hat{\bar{\omega}}(t) \end{split} $$ (15b)

    定义$ e_{\omega}(t)=\bar{\omega}(t)-\hat{\bar{\omega}}(t) $, $ e_{\eta}(t)=\bar{\eta}(t)-\hat{\eta}(t) $, 可得

    $$ \begin{split} \dot{\bar{\eta}}(t)=\;&(\bar{E}-\alpha(H\otimes I_{q}))\bar{\eta}(t)\;+\\ &\alpha(H\otimes I_{q})\bar{\omega}(t)-\alpha(\mathcal{A}\otimes I_{q})e_{\eta}(t)\;-\\ &\alpha(H\otimes I_{q})e_{\omega}(t) \end{split} $$ (16)

    定义$ \tilde{\eta}(t)=\bar{\eta}(t)-\bar{\omega}(t) $, 根据式(15a)和式(16)有

    $$ \begin{split} \dot{\tilde{\eta}}(t)=\;&(\bar{E}-\alpha(H\otimes I_{q}))\tilde{\eta}(t)\;-\\ &\alpha(\mathcal{A}\otimes I_{q})e_{\eta}(t)-\alpha(H\otimes I_{q})e_{\omega}(t) \end{split} $$ (17)

    根据引理1可知, 对于$ j=1,\;2,\;\cdots,\;q $, $ i\in \mathcal{V} $, $ {\rm Re}(\lambda_{j}(E)-\alpha\lambda_{i}(H))<0 $, 其中$ \lambda_{j}(E) $和$ \lambda_{i}(H) $分别为$ E $的第$ j $个和$ H $的第$ i $个特征值, 即$ \bar{E}- \alpha(H\otimes I_{q}) $是赫尔维玆的.

    令$ E_h=\bar{E}-\alpha(H\otimes I_{q}) $, $ E_H=\alpha(H\otimes I_{q}) $, $ E_A= \alpha(\mathcal{A}\otimes I_{q}) $, 则式(16)可改写为

    $$ \begin{split} \dot{\bar{\eta}}(t)=\;&E_{h}\bar{\eta}(t)+E_{H}\bar{\omega}(t)\;-\\ &E_{A}e_{\eta}(t)-E_{H}e_{\omega}(t) \end{split} $$ (18)

    由于$ \hat{\bar{\omega}}(t) $与$ \hat{\eta}(t) $使用编码−解码方案进行更新, 需将系统(15a)与(18)离散化. 定义$ e_{\omega}(k)= \bar{\omega}(k)-\hat{\bar{\omega}}(k) $, $ e_{\eta}(k)=\bar{\eta}(k)-\hat{\eta}(k) $, 利用零阶保持器方法对系统(15a)与(18)进行离散化[38], 即

    $$ \bar{\omega}(k+1)={\mathrm{e}}^{\bar{E}{{T}}}\bar{\omega}(k)\; $$ (19a)
    $$ \begin{split} \bar{\eta}(k+1)=\;&{\mathrm{e}}^{E_{h}{{T}}}\bar{\eta}(k)+\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{H}{\mathrm{d}}\tau\bar{\omega}(k)\; -\\ &\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{A}{\mathrm{d}}\tau e_{\eta}(k) \;-\\ &\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{H}{\mathrm{d}}\tau e_{\omega}(k)\; \end{split} $$ (19b)

    其中, $ T $为采样时间, 其选取遵循香农采样定理.

    接下来, 将预测器$ b_{j}(k) $表示为紧凑形式, 其中$ j \in \mathcal{\bar{V}} $. 定义$ b_{\omega}(k)=\mathbf{1}_N\otimes b_0(k) $, $ b_{\eta}(k)=[b_1^{{\mathrm{T}}}(k),\;b_2^{{\mathrm{T}}} (k),\; \cdots,\; b_N^{{\mathrm{T}}}(k)]^{{\mathrm{T}}} $. 预测器$ b_{j}(k) $表示对智能体 $ i $经过解码后得到的数据$ \hat{\eta}_j(k) $的预测, 根据$ \hat{\eta}_0(k) = \hat{\omega}(k) $, 且初始值$ b_{\omega}(0)=\hat{\bar{\omega}}(0) $, $ b_{\eta}(0)=\hat{\eta}(0) $, 可得$ b_{\omega}(k)=\hat{\bar{\omega}}(k) $, $ b_{\eta}(k)=\hat{\eta}(k) $. 因此, $ e_{\omega}(k)= \bar{\omega}(k)- b_{\omega}(k) $, $ e_{\eta}(k)=\bar{\eta}(k)-b_{\eta}(k) $.

    根据式(11), 有

    $$ \begin{split} b_{\omega}(k)=\;&s(k - 1)\mathcal{Q}\left[\frac{1}{s(k - 1)}(\bar{\omega}(k) - b_{\omega}(k - 1))\right] +\\&b_{\omega}(k-1) \end{split} $$ (20a)
    $$ \begin{split} b_{\eta}(k)=\;&s(k - 1)\mathcal{Q}\left[\frac{1}{s(k - 1)}(\bar{\eta}(k) - b_{\eta}(k - 1))\right]+\\ &b_{\eta}(k-1) \end{split} $$ (20b)

    将式(19a)的左右两边同时减去$ b_{\omega}(k) $, 可以得到

    $$ \begin{split} &\bar{\omega}(k+1)-b_{\omega}(k)={\mathrm{e}}^{\bar{E}T}\bar{\omega}(k)-b_{\omega}(k)=\\ &\quad {{e}}_{\omega}(k)+({\mathrm{e}}^{\bar{E}T}-I_{qN})\bar{\omega}(k)=s(k)\theta_{\omega}(k)\; \end{split} $$ (21)

    其中, $ \theta_{\omega}(k)=\frac{e_{\omega}(k)}{s(k)}+\frac{1}{s(k)}({\mathrm{e}}^{\bar{E}T}-I_{qN})\bar{\omega}(k) $.

    基于式(20a)和式(21), 可得

    $$ \begin{split} e_{\omega}(k+1)=\;&\bar{\omega}(k+1)-b_{\omega}(k+1)= \\ & \bar{\omega}(k+1)-b_{\omega}(k)\;-\\ & s(k)\mathcal{Q}\left[\frac{1}{s(k)}(\bar{\omega}(k+1)-b_{\omega}(k))\right]=\\ & s(k)(\theta_{\omega}(k)-\mathcal{Q}[\theta_{\omega}(k)])\\[-3pt]\end{split} $$ (22)

    同理, 将式(19b)的左右两边同时减去$ b_{\eta}(k) $, 可得

    $$ \begin{split} &\bar{\eta}(k+1)-b_{\eta}(k)=\\ &\quad ({\mathrm{e}}^{E_{h}T}-I_{qN})\bar{\eta}(k)+\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{H}{\mathrm{d}}\tau\bar{\omega}(k)\;+\\ &\quad (I_{qN}-\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{A}{\mathrm{d}}\tau)e_{\eta}(k)\;-\\ &\quad \int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{H}{\mathrm{d}}\tau e_{\omega}(k)= s(k)\theta_{\eta}(k)\; \end{split} $$ (23)

    其中,

    $$\begin{split} \theta_{\eta}(k)=&\frac{1}{s(k)}({\mathrm{e}}^{E_{h}T}-I_{qN})\bar{\eta}(k)\;+\\&\frac{1}{s(k)}\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau} E_{H} {\mathrm{d}}\tau\bar{\omega}(k)\;+\\& \frac{e_{\eta}(k)}{s(k)}(I_{qN}-\int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau}E_{A}{\mathrm{d}}\tau)\;-\\& \frac{e_{\omega}(k)}{s(k)} \int_{0}^{{{T}}}{\mathrm{e}}^{E_{h}\tau} E_{H}{\mathrm{d}}\tau \end{split}$$

    基于式(20b)和式(23), 可得

    $$ \begin{split} e_{\eta}(k+1)=\;&\bar{\eta}(k+1)-b_{\eta}(k+1)=\\ & \bar{\eta}(k+1)-b_{\eta}(k)\;-\\ & s(k)\mathcal{Q}\left[\frac{1}{s(k)}(\bar{\eta}(k+1)-b_{\eta}(k))\right]=\\ & s(k)(\theta_{\eta}(k)-\mathcal{Q}[\theta_{\eta}(k)]) \end{split} $$ (24)

    根据式(22), 式(24)以及量化误差(10), 有

    $$ ||\frac{e_{\omega}(k)}{s(k)}||_{\infty}\leq\frac{1}{2\mu}\; $$ (25a)
    $$ ||\frac{e_{\eta}(k)}{s(k)}||_{\infty}\leq\frac{1}{2\mu}\; $$ (25b)

    由$ \mathop{\lim}\nolimits_{k \to \infty}s(k) = 0 $可知$ \mathop{\lim}\nolimits_{k \to \infty}e_{\omega}(k) = \mathop{\lim}\nolimits_{k \to \infty}e_{\eta}(k) = 0 $, 进而可知$ \mathop{\lim}\nolimits_{t \to \infty}e_{\omega}(t) = \mathop{\lim}\nolimits_{t \to \infty}e_{\eta}(t) = 0 $. 由于$ \bar{E}-\alpha(H\otimes I_{q}) $是赫尔维玆的, 根据文献[39]引理$ 9.1 $, 可知$ \mathop{\lim}\nolimits_{t \to \infty}\tilde{\eta}(t)=0 $. 因此, 对于每个跟随者$ i \in \mathcal{V} $, 有$ \mathop{\lim}\nolimits_{t \to \infty}\tilde{\eta}_{i}(t)=0 $.

    在第3节中, 通过设计的分布式量化观测器可使每个跟随者渐近观测到外部系统的状态信息. 在本节中, 将观测到的估计值$ \eta_{i}(t) $引入到自适应动态规划算法的学习阶段, 进而设计一种数据驱动的方法来解决量化通信下的协同最优输出调节问题. 值得注意的是, 该方法能够逼近控制增益$ K^{*}_{i} $与$ L^{*}_{i} $, 而不需要知道系统矩阵$ A_{i} $, $ B_{i} $与$ D_{i} $的先验知识.

    考虑第$ i $个智能体, 定义$ \bar{x}_{ij}=x_{i}-X_{ij}\omega $, $ X_{ij}\in \bf{R}^{n_{i}\times q} $表示$ C_{i}X_{ij}+F_{i}=0 $的基础解系. 其中, $ i \in \mathcal{V} $, $ j=0,\;1,\;\cdots,\;h_{i}+1 $. $ h_{i}=(n_{i}-p_{i})q $ 表示 $ I_{q}\otimes C_{i} $零空间的维数. 接下来, 定义一个西尔维斯特方程$ S_{i}(X_{ij})=X_{ij}E-A_{i}X_{ij} $, $ X_{ij} \in \bf{R}^{n_{i} \times q} $, 根据输入误差变量$ \bar{u}_{i}=u_{i}-U_{i}^{*}\omega $与式(2a), 式(4)可改写为

    $$ \begin{split} \dot{\bar{x}}_{i}=&\;A_{i}\bar{x}_{i}+B_{i}\bar{u}_{i}=\\ &\bar{A}_{i}\bar{x}_{ij}+B_{i}(K_{i,\;k}\bar{x}_{ij}+u_{i})\;+\\ &(D_{i}-S_{i}(X_{ij}))\omega =\\ &\bar{A}_{i}\bar{x}_{ij}+B_{i}(K_{i,\;k}\bar{x}_{ij}+u_{i})\;+\\ & (D_{i}-S_{i}(X_{ij}))\eta_{i}-(D_{i}-S_{i}(X_{ij}))\tilde{\eta}_{i} \end{split} $$ (26)

    其中, $ \bar{A}_{i}=A_{i}-B_{i}K^{*}_{i} $. 通过增大$ \alpha $, 可使$ \tilde{\eta}_{i}(t) $以所需的速度收敛到零[32].

    根据式(26)以及式(7)和式(8), 有

    $$ \begin{split} &\bar{x}^{{\mathrm{T}}}_{ij}(t+\delta)P_{i,\;k}\bar{x}_{ij}(t+\delta)-\bar{x}^{{\mathrm{T}}}_{ij}(t)P_{i,\;k}\bar{x}_{ij}(t)=\\ &\quad\int_{t}^{t+\delta} (\bar{x}^{{\mathrm{T}}}_{ij}(\bar{A}_{i}^{{\mathrm{T}}}P_{i,\;k}+P_{i,\;k}\bar{A}_{i})\bar{x}_{ij}\;+\\ &\quad2(u_{i}+K_{i,\;k}\bar{x}_{ij})^{{\mathrm{T}}}B^{{\mathrm{T}}}_{i}P_{i,\;k}\bar{x}_{ij}\;+\\ &\quad2\eta_{i}^{{\mathrm{T}}}(D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}P_{i,\;k}\bar{x}_{ij})\,\; {\mathrm{d}}\tau=\\ &\quad\int_{t}^{t+\delta} (-\bar{x}^{{\mathrm{T}}}_{ij}(\bar{Q}_{i}+ K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i}K_{i,\;k})\bar{x}_{ij}\;+\\ &\quad2(u_{i}+K_{i,\;k}\bar{x}_{ij})^{{\mathrm{T}}}\bar{R}_{i}K_{i,\;k+1}\bar{x}_{ij}\;+\\ &\quad2\eta_{i}^{{\mathrm{T}}}(D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}P_{i,\;k}\bar{x}_{ij})\,\; {\mathrm{d}}\tau \end{split} $$ (27)

    通过克罗内克积的性质, 有

    $$ \begin{split} &\bar{x}^{{\mathrm{T}}}_{ij}(\bar{Q}_{i}+ K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i}K_{i,\;k})\bar{x}_{ij}= \\ &\quad(\bar{x}^{{\mathrm{T}}}_{ij}\otimes \bar{x}^{{\mathrm{T}}}_{ij}){\rm vec}(\bar{Q}_{i}+ K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i}K_{i,\;k})\; \end{split} $$ (28a)
    $$ \begin{split} &(u_{i}+K_{i,\;k}\bar{x}_{ij})^{{\mathrm{T}}}\bar{R}_{i}K_{i,\;k+1}\bar{x}_{ij} =\\ &\quad((\bar{x}^{{\mathrm{T}}}_{ij}\otimes \bar{x}^{{\mathrm{T}}}_{ij})(I_{ni}\otimes K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i})\;+ \\ &\quad(\bar{x}^{{\mathrm{T}}}_{ij}\otimes u^{{\mathrm{T}}}_{i})(I_{ni}\otimes \bar{R}_{i})){\rm vec}(K_{i,\;k+1})\; \end{split} $$ (28b)
    $$ \begin{split} &\eta_{i}^{{\mathrm{T}}}(D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}P_{i,\;k}\bar{x}_{ij}= \\ &\quad(\bar{x}^{{\mathrm{T}}}_{ij}\otimes \eta_{i}^{{\mathrm{T}}}){\rm vec}((D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}P_{i,\;k}) \end{split} $$ (28c)
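式(28)各项均依赖克罗内克积恒等式$ a^{{\mathrm{T}}}Wb=(b^{{\mathrm{T}}}\otimes a^{{\mathrm{T}}}){\rm vec}(W) $ (其中 vec 按列堆叠). 以下 Python 片段对该恒等式作数值验证(随机数据为假设示例).

```python
import numpy as np

# 数值验证式 (28) 所依赖的恒等式 a^T W b = (b^T ⊗ a^T) vec(W)
rng = np.random.default_rng(1)
a, b = rng.standard_normal(3), rng.standard_normal(4)
W = rng.standard_normal((3, 4))

lhs = a @ W @ b
rhs = np.kron(b, a) @ W.flatten(order='F')   # vec(W) 按列堆叠
print(np.isclose(lhs, rhs))   # True
```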

    对于任意两个向量$ p $, $ q $以及正整数$ c $, 定义以下矩阵

    $$ \begin{split} {\Pi}_{pp}=\;&[\mathrm{vecv}(p(t_{1}))-\mathrm{vecv}(p(t_{0})),\;\cdots,\; \\ & \mathrm{vecv}(p(t_{c}))-\mathrm{vecv}(p(t_{c-1}))]^{{\mathrm{T}}}\; \end{split} $$ (29a)
    $$ {\Xi}_{pq}=\left[\int_{t_{0}}^{t_{1}}p\otimes q {\mathrm{d}}\tau,\;\cdots,\;\int_{t_{c-1}}^{t_{c}}p\otimes q {\mathrm{d}}\tau \right]^{{\mathrm{T}}}\; $$ (29b)

    其中, $ t_0<t_1<\cdots<t_c $, 基于以上矩阵定义, 通过式(27)得到以下矩阵方程

    $$ \Psi_{ij,\;k} \begin{bmatrix} {\rm vecs}(P_{i,\;k}) \\ {\rm vec}(K_{i,\;k+1})\\ {\rm vec}((D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}P_{i,\;k}) \end{bmatrix} =\Phi_{ij,\;k} $$ (30)

    其中,

    $$ \begin{split} \Psi_{ij,\;k}=\;&[ \Pi_{\bar{x}_{ij}\bar{x}_{ij}},\; -2\Xi_{\bar{x}_{ij}\bar{x}_{ij}}(I_{ni}\otimes K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i}) \;-\\ & 2\Xi_{\bar{x}_{ij}u_{i}}(I_{ni}\otimes \bar{R}_{i}),\;-2\Xi_{\bar{x}_{ij}\eta_{i}}]\; \end{split} $$ (31a)
    $$ \Phi_{ij,\;k}= -\Xi_{\bar{x}_{ij}\bar{x}_{ij}} {\rm vec}(\bar{Q}_{i}+K^{{\mathrm{T}}}_{i,\;k}\bar{R}_{i}K_{i,\;k}) $$ (31b)

    如果矩阵$ \Psi_{ij,\;k} $列满秩, 则式(30)具有唯一解. 文献[30]引理$ 3 $中给出矩阵$ \Psi_{ij,\;k} $列满秩的充分条件. 如果存在正整数$ c^{*} $使得任意的$ c>c^{*} $和时间序列$ t_{0}<t_{1}<\cdots<t_{c} $, 满足以下条件时,

    $$ \begin{split}& {\rm rank}([\Xi_{\bar{x}_{ij}\bar{x}_{ij}},\;\Xi_{\bar{x}_{ij}u_{i}},\;\Xi_{\bar{x}_{ij}\eta_{i}}])=\\&\quad \frac{n_{i}(n_{i}+1)}{2}+(m_{i}+q)n_{i}\; \end{split} $$ (32)

    矩阵$ \Psi_{ij,\;k} $对任意正整数$ k $列满秩.
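式(29b)中的数据矩阵$ \Xi_{pq} $可由采样轨迹经数值积分得到. 以下 Python 片段给出一个简要实现(梯形积分; 信号取可解析积分的假设示例, 仅用于核对实现的正确性).

```python
import numpy as np

def trap(y, x):
    """对向量值轨迹 y(x) 作梯形数值积分."""
    return ((y[1:] + y[:-1]) / 2 * np.diff(x)[:, None]).sum(axis=0)

def build_Xi(t, p, q, knots):
    """按式 (29b) 由采样数据构造 Ξ_pq (假设 t 为细密等距网格)."""
    kron_traj = np.array([np.kron(p[i], q[i]) for i in range(len(t))])
    rows = []
    for l in range(1, len(knots)):
        mask = (t >= knots[l - 1]) & (t <= knots[l])
        rows.append(trap(kron_traj[mask], t[mask]))
    return np.array(rows)

# 用解析可积的信号验证: p(τ) = [1, τ]^T, q(τ) = [τ]
t = np.linspace(0.0, 1.0, 2001)
p = np.column_stack([np.ones_like(t), t])
q = t[:, None]
Xi = build_Xi(t, p, q, np.array([0.0, 0.5, 1.0]))

exact = np.array([[0.125, 1/24], [0.375, 7/24]])   # [∫τ, ∫τ²] 在两个子区间上
print(np.allclose(Xi, exact, atol=1e-5))   # True
```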

    根据调节器方程(2), 西尔维斯特方程$ S_{i}(X_{ij})= X_{ij}E-A_{i}X_{ij} $以及式(30)的解, 能够求得调节器方程的解$ (X_{i},\;U_{i}) $. 该方法与文献[3]中式(27)的求解思路一致, 这里不做赘述.

    为确保满秩条件(32)能够得到满足, 在学习阶段$ [t_{0},\;t_{c}] $, 本文在初始控制策略上增加探测噪声$ \xi_{i} $, 即$ u_{i0}=-K_{i0}x_{i}+\xi_{i} $, 其中, $ K_{i0} $使$ A_{i}-B_{i}K_{i0} $赫尔维玆.

    据此, 针对量化通信下的自适应协同最优输出调节问题, 本文给出一个在线学习算法, 即算法1.

    算法1. 基于自适应动态规划的量化通信下协同最优输出调节算法

    1: 令$ i=1 $

    2: 选择一个初始控制策略$ u_{i0}=-K_{i0}x_{i}+\xi_{i} $

    3: 通过式(13)计算编码−解码后对外部系统状态的估 计值$ \eta_{i} $

    4: 计算满足条件(32)的$ \Xi_{\bar{x}_{ij}\bar{x}_{ij}},\;\Xi_{\bar{x}_{ij}u_{i}},\;\Xi_{\bar{x}_{ij}\eta_{i}} $

    5: 令$ k=0 $

    6: 通过式(30)求解$ P_{i,\;k} $, $ K_{i,\;k+1} $以及$ S_{i}(X_{ij}) $

    7: 令$ k\gets k+1 $, 重复步骤6, 直至满足$ ||P_{i,\;k}- P_{i,\;k-1}||<c_{i} $, 其中, 阈值$ c_{i} $为足够小的正数

    8: $ k^{*}\gets k $

    9: $ P_{i,\;k^*}\gets P_{i,\;k} $, $ K_{i,\;k^*}\gets K_{i,\;k} $

    10: 通过$ S_{i}(X_{ij}) $以及问题1求解调节器方程的最优解    $ (X^{*}_{i},\;U^{*}_{i}) $, $ L_{i,\;k^*}=K_{i,\;k^*}X^{*}_{i}+U^{*}_{i} $

    11: 学习到的次优控制策略为

    $$ u_{i}^*=-K_{i,\;k^*}x_{i}+L_{i,\;k^*}\eta_{i}\; $$ (33)

    12: 令$ i\gets i+1 $, 重复步骤2 ~ 11, 直至$ i=N $.

    注4. 本文利用所设计的算法1通过系统状态$ x_{i} $, 输入$ u_{i} $以及对外部系统状态的估计值$ \eta_{i} $在线学习次优控制策略(3), 而不需要依赖系统矩阵$ A_{i} $, $ B_{i} $与$ D_{i} $的先验知识. 然而, 由于在分布式量化观测器的设计部分应用外部系统的矩阵信息, 因此要求跟随者对外部系统矩阵$ E $是已知的. 目前, 在精确通信下, 文献[7, 11]不要求跟随者对外部系统矩阵$ E $是已知的, 即已经研究了部分/全部跟随者无法访问领导者系统矩阵信息的情况, 并设计了自适应分布式观测器. 然而在量化通信下, 文献[7, 11]中所设计的自适应分布式观测器并不适用, 需要设计自适应分布式量化观测器对外部系统矩阵$ E $的估计值$ E_{i}(t) $进行观测, 其中观测器中包含经过编码−解码方案后传输的信息$ \hat{E}_{i}(t) $, 我们难以保证估计误差$ {\lim}_{t \to \infty}(E_{i}(t)-E) $收敛到零, 这对我们的研究带来全新的挑战, 在未来的工作中将进一步研究.

    接下来, 给出关于控制增益$ K_{i,\;k^*} $和值$ P_{i,\;k^*} $的收敛性的定理.

    定理2. 在满足条件(32)的情况下, 对于任意小的参数$ \delta>0 $, 存在充分大的$ \alpha>0 $使由算法1得到的解$ \left\{P_{i,\;k}\right\}_{k=0}^{\infty} $和$ \left\{K_{i,\;k}\right\}_{k=0}^{\infty} $满足不等式$ ||P_{i,\;k^*}- P_{i}^*||<\delta $, $ ||K_{i,\;k^*}-K_{i}^*||<\delta $, 其中$ i \in \mathcal{V} $. 且由算法1得到的次优控制策略能够实现量化通信下的协同最优输出调节.

    证明. 令$ \left\{\bar{P}_{i,\;k}\right\}_{k=0}^{\infty} $, $ \left\{\bar{K}_{i,\;k}\right\}_{k=0}^{\infty} $为基于模型迭代方法得到的解.

    基于模型方法的收敛性分析已经在文献[36]中得到证明. 对于每个跟随者$ i \in \mathcal{V} $, 存在$ k^* $使得以下不等式成立, 即

    $$ \begin{split}& ||\bar{K}_{i,\;k^*}-K_{i}^*||<\frac{\delta}{2}\;\\& ||\bar{P}_{i,\;k^*}-P_{i}^*||<\frac{\delta}{2} \end{split} $$ (34)

    接下来, 需要证明算法1在每次迭代中学到的控制增益$ K_{i,\;k} $和值$ P_{i,\;k} $足够接近基于模型算法(7)和(8)得到的控制增益$ \bar{K}_{i,\;k} $和值$ \bar{P}_{i,\;k} $. 下面将通过归纳法证明.

    当$ k=0 $时, 对于所有的跟随者$ i \in \mathcal{V} $, 有$ K_{i0}= \bar{K}_{i0} $. 定义$ \Delta P_{i0}=P_{i0}-\bar{P}_{i0} $. $ \Delta P_{i0} $可通过以下方程进行求解, 即

    $$ \begin{split}& \Psi_{ij,\;0} \begin{bmatrix} {\rm vecs}(\Delta P_{i0}) \\ {\rm vec}(\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}\Delta P_{i0})\\ {\rm vec}((D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}\Delta P_{i0})\\ \end{bmatrix}=\\&\qquad 2\Xi_{\bar{x}_{ij}\tilde{\eta}_{i}}{\rm vec}((D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}\bar{P}_{i0}) \end{split} $$ (35)

    令$ ||\Delta\tilde{\eta}||=\max\nolimits_{t_{0}\leq t\leq t_{c}}||\tilde{\eta}(t)|| $, 可知

    $$\begin{split}& \lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0} (P_{i0}- \bar{P}_{i0})=0\\ &\lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0}(K_{i1}-\bar{K}_{i1})=\\&\qquad\lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0} (\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}(P_{i0}- \bar{P}_{i0}))=0 \end{split}$$

    当$ k=p $时, 假设$ \lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0}(K_{ip}-\bar{K}_{ip})=0 $. 令$ \Delta P_{ip}= P_{ip}-\bar{P}_{ip} $. $ \Delta P_{ip} $可通过以下方程进行求解

    $$ \Psi_{ij,\;0} \begin{bmatrix} {\rm vecs}(\Delta P_{ip}) \\ {\rm vec}(\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}\Delta P_{ip})\\ {\rm vec}((D_{i}-S_{i}(X_{ij}))^{{\mathrm{T}}}\Delta P_{ip}) \end{bmatrix} =\Delta \Phi_{ij,\;p} $$ (36)

    其中, $ \lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0}\Delta \Phi_{ij,\;p}=0 $. 因此, 可得

    $$\begin{split}&\lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0} (P_{ip}-\bar{P}_{ip})=0\\ &\lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0}(K_{i,\;p+1}- \bar{K}_{i,\;p+1})=\\& \qquad\lim\nolimits_{||\Delta\tilde{\eta}||\rightarrow0} (\bar{R}^{-1}_{i}B^{{\mathrm{T}}}_{i}(P_{ip}- \bar{P}_{ip}))=0 \end{split}$$

    通过增大$ \alpha $的值能够加速$ \Delta\tilde{\eta} $的收敛, 对于充分大的$ \alpha>0 $, 总能找到足够小的$ \Delta\tilde{\eta} $使得在任何迭代$ k $处, 满足不等式$ ||P_{i,\;k}-\bar{P}_{i,\;k}||<\delta/2 $, $ ||K_{i,\;k}\;- \bar{K}_{i,\;k}||<\delta/2 $.

    因此, 当$ k=k^* $时, 以下不等式成立, 即

    $$ \begin{split}& ||K_{i,\;k^*}-\bar{K}_{i,\;k^*}||<\frac{\delta}{2}\;\\& ||P_{i,\;k^*}-\bar{P}_{i,\;k^*}||<\frac{\delta}{2} \end{split} $$ (37)

    根据式(34)与式(37), 通过矩阵三角不等式可知, $ ||P_{i,\;k^*}-P_{i}^*||<\delta $, $ ||K_{i,\;k^*}-K_{i}^*||<\delta $.

    接下来, 证明由算法1得到的次优控制策略能够实现量化通信下的协同最优输出调节. 令$ \tilde{\eta}_{i}(t)= \eta_{i}(t)-\omega(t) $, 由定理1可知, 在量化通信下, 对外部系统状态的估计误差$ \mathop{\lim}\nolimits_{t \to \infty}\tilde{\eta}_{i}(t)=0 $. 对于$ \dot{\bar{x}}_{i}(t)= (A_{i}-B_{i}K^{*}_{i})\bar{x}_{i}(t)+B_{i}L^{*}_{i}\tilde{\eta}_{i}(t) $, 由于$ A_{i}- B_{i}K^{*}_{i} $是赫尔维玆的, 且$ \mathop{\lim}\nolimits_{t \to \infty}\tilde{\eta}_{i}(t)=0 $, 根据文献[39]引理$ 9.1 $, 可知$ \mathop{\lim}\nolimits_{t \to \infty}\bar{x}_{i}(t) = 0 $. 根据式(4b)可知$ e_{i}(t)= C_{i}\bar{x}_{i}(t) $, 因此$ \mathop{\lim}\nolimits_{t \to \infty}e_{i}(t)=0 $, 实现了量化通信下多智能体系统的协同最优输出调节.

    在本节中, 我们将算法1应用于智能车联网的纵向协同自适应巡航控制[3, 40]. 协同自适应巡航控制是一种基于无线通信的智能自动驾驶策略, 车辆的通信拓扑如图 3所示, 外部系统仅可被车辆$ \#1 $直接访问.

    图 3  车辆通信拓扑图
    Fig. 3  Vehicular platoon communication topology

    利用以下模型对第$ i\;(i=1,\;2,\;3,\;4) $辆车进行建模:

    $$ \begin{split} \dot{x}_{i}&=\upsilon_{i}\;\\ \dot{\upsilon}_{i}&=a_{i}\;\\ \dot{a}_{i}&=\sigma^{-1}_{i}a_{i}+\sigma^{-1}_{i}u_{i}+d_{i}\; \end{split} $$ (38)

    其中, $ x_{i} $, $ \upsilon_{i} $, $ a_{i} $分别为车辆$ \#i $的位置、速度和加速度, $ \sigma_{i} $为车辆$ \#i $发动机的时间常数. 常数$ d_{i} $是机械阻力与$ \sigma_{i} $和车辆$ \#i $质量的乘积之比. $ \sigma_{i} $与$ d_{i} $的值与文献[3]相同.

    车辆$ \#i $的参考轨迹$ x^{*}_{i} $和干扰信号$ d_{i} $均由以下外部系统产生

    $$ \begin{split}& \dot{\omega}_{1}=0.7\omega_{2}\;\\& \dot{\omega}_{2}=-0.7\omega_{1}\;\\& \dot{d_{i}}=d_{i}\omega_{2}\;\\& x^{*}_{i}=-5\omega_{1}-10(i+1)\omega_{2}\; \end{split} $$ (39)

    外部系统状态的初值为$ \omega(0)=[\omega_{1}(0)\; \; \; \omega_{2}(0)]^{{\mathrm{T}}}= [0\; \; \; 1]^{{\mathrm{T}}} $.
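外部系统(39)的前两个分量为角频率 $ 0.7 $ 的旋转动态, 在初值 $ [0\;\;1]^{{\mathrm{T}}} $ 下有解析解 $ \omega_{1}(t)=\sin(0.7t) $, $ \omega_{2}(t)=\cos(0.7t) $. 以下 Python 片段据此计算参考轨迹, 并用中心差分核对解析解满足式(39)(仅为示意).

```python
import numpy as np

def omega(t):
    # ω̇1 = 0.7 ω2, ω̇2 = -0.7 ω1, ω(0) = [0, 1]^T 的解析解
    return np.array([np.sin(0.7 * t), np.cos(0.7 * t)])

def x_ref(i, t):
    w1, w2 = omega(t)
    return -5 * w1 - 10 * (i + 1) * w2     # 车辆 #i 的参考轨迹 (式 (39))

print(x_ref(1, 0.0))   # -20.0

# 数值验证解析解满足微分方程 (中心差分近似导数)
t, h = 2.0, 1e-6
dw = (omega(t + h) - omega(t - h)) / (2 * h)
w = omega(t)
print(np.allclose(dw, [0.7 * w[1], -0.7 * w[0]], atol=1e-6))   # True
```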

    接下来, 对量化通信下的智能车联网系统进行仿真. 其中观测器参数$ \alpha=10 $, 调节函数$ s(k) $的初值为$ s(0)=0.05 $, 参数$ \mu=0.8 $. 外部系统状态估计误差$ \tilde{\eta}_{i}(t) $的收敛性如图 4所示.

    图 4  量化通信下外部系统状态估计误差$\tilde{\eta}_{i}(t)$的轨迹
    Fig. 4  The trajectory of the exosystem state estimation error $\tilde{\eta}_{i}(t)$ under quantized communication

    由图 4可知, 选择的参数$ \alpha $能够保证$ \tilde{\eta}_{i}(t) $足够小, 当$ t>30 $s时, $ ||\tilde{\eta}_{i}(t)||<10^{-6} $.

    当$ t<10 $s时, 我们应用初始控制策略$ u_{i0}= -K_{i0}x_{i}+\xi_{i} $, 其中探测噪声$ \xi_{i} $为不同频率的正弦信号的总和. 根据算法1迭代学习到控制增益$ K_{i,\;k} $和值$ P_{i,\;k} $, 其中每辆车的值$ P_{i,\;k} $与基于模型情况下得到的最优值$ P_{i}^{*} $的比较结果如图 5所示.

    图 5  每辆车$P_{i,\;k}$与最优解$P_{i}^{*}$的比较
    Fig. 5  Comparisons of $P_{i,\;k}$ and the optimal solution $ P_{i}^{*}$ of each vehicle

    As Fig. 5 shows, $ P_{i,\;k} $ converges to the optimal solution $ P_{i}^{*} $ at iteration $ k=9 $; that is, after nine iterations every vehicle has learned the optimal value.
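Algorithm 1 learns $ (P_{i,\;k},\;K_{i,\;k}) $ from data; its model-based counterpart is Kleinman's policy iteration for the LQR problem, whose fast convergence of $ P_{k} $ to $ P^{*} $ the sketch below reproduces on a toy second-order system (the matrices `A`, `B`, `Q`, `R` are placeholders, not the vehicle model).

```python
import numpy as np

def lyap(F, W):
    """Solve F^T P + P F + W = 0 via the Kronecker formulation (row-major vec)."""
    n = F.shape[0]
    M = np.kron(F.T, np.eye(n)) + np.kron(np.eye(n), F.T)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)

# Toy system standing in for the model-based counterpart of Algorithm 1.
A = np.array([[0.0, 1.0], [0.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[1.0, 0.0]])              # initial stabilizing gain K_0
for k in range(10):
    F = A - B @ K                       # policy evaluation: Lyapunov equation
    P = lyap(F, Q + K.T @ R @ K)
    K = np.linalg.solve(R, B.T @ P)     # policy improvement

# P now satisfies the algebraic Riccati equation to numerical precision
residual = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
```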

    At $ t=10 $ s, the suboptimal control policy (3) is updated with the learned optimal control gains $ (K_{i,\;k^*},\; P_{i,\;k^*}) $ and applied to the connected-vehicle system. Fig. 6 shows how the actual trajectories $ x_{i} $ track the reference trajectories $ x^{*}_{i} $. The simulation results show that all vehicles track their reference trajectories.

    图 6  智能互联自动驾驶车辆的实际轨迹$x_{i}$与参考轨迹$x^{*}_{i}$
    Fig. 6  Actual trajectories $x_{i}$ of connected and autonomous vehicles and their references $x^{*}_{i}$

    If the initial control policy is kept after $ t=10 $ s instead of switching to the updated suboptimal control policy (3), the tracking behavior is as shown in Fig. 7. Comparing Fig. 6 with Fig. 7 shows that the suboptimal control policy obtained by Algorithm 1 enables the connected autonomous vehicles to track their reference trajectories in the presence of disturbances.

    图 7  初始控制策略下智能互联自动驾驶车辆的实际轨迹$x_{i}$与参考轨迹$x^{*}_{i}$
    Fig. 7  Actual trajectories $x_{i}$ of intelligent connected and autonomous vehicles and their references $x^{*}_{i}$ under the initial control strategy

    Next, Table 1 compares the effect of quantized communication on the number of bits transmitted between vehicles.

    表 1  达到$ ||P_{i,\;k}-P_{i}^{*}||<10^{-4} $有无量化通信传输的比特数
    Table 1  Transmitted bits with and without quantized communication to reach $ ||P_{i,\;k}-P_{i}^{*}||<10^{-4} $
    Bits transmitted under Algorithm 1 | Bits transmitted without quantized communication [3] | Reduction
    80000 | 192000 | 58.33%

    As Table 1 shows, under quantized communication far fewer bits need to be transmitted to reach the given convergence error: the number of transmitted bits is reduced by $ 58.33\% $.
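The reduction figure in Table 1 is simple arithmetic, sketched below.

```python
bits_with_quantization = 80_000       # Algorithm 1 (Table 1)
bits_without_quantization = 192_000   # transmission scheme of [3]
reduction = 1 - bits_with_quantization / bits_without_quantization
# 1 - 80000/192000 = 0.58333..., i.e. a 58.33 % saving in transmitted bits
```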

    This paper studies the cooperative optimal output regulation problem for continuous-time multi-agent systems with unknown dynamics under quantized communication. By introducing a uniform quantizer and an encoding-decoding scheme, a distributed protocol based on sampled and quantized data is designed to observe the exosystem state, guaranteeing convergence of the exosystem state estimation error while reducing the communication burden among agents. For a class of multi-agent systems with uncertain dynamics, an adaptive dynamic programming method is designed for cooperative optimal output regulation. Theoretical analysis and simulations on an adaptive cruise control system for connected vehicles show that the model-free multi-agent system achieves asymptotic tracking and disturbance rejection under quantized communication. Future work will consider designing adaptive optimal control policies, under limited-bandwidth communication constraints, for nonlinear multi-agent systems whose exosystem state and system matrices are entirely unknown.


Publication history
  • Received: 2024-12-13
  • Accepted: 2024-12-13
  • Published online: 2025-03-03
