摘要: 本文研究了多智能体时变网络上基于Bandit反馈的分布式在线鞍点问题, 其中每个智能体通过本地计算和局部信息交流去协作最小化全局损失函数. 在Bandit反馈下, 包括梯度在内的损失函数信息是不可用的, 每个智能体仅能获得和使用在某决策或其附近产生的函数值. 为此, 结合单点梯度估计方法和预测映射技术, 提出一种非欧几里得意义上的分布式在线Bandit鞍点优化算法. 以期望动态鞍点遗憾作为性能指标, 对于一般的凸−凹损失函数, 建立了遗憾上界并在某些预设条件下确保所提算法的次线性收敛. 此外, 考虑到在迭代优化中计算优化子程序的精确解通常较为困难, 进一步扩展一种基于近似计算方法的算法变种, 并严格分析精确度设置对扩展算法遗憾上界的影响. 最后, 通过一个目标跟踪案例对算法的有效性和先进性进行仿真验证.
Abstract: This paper studies the distributed online saddle point problem with Bandit feedback over a multi-agent time-varying network, in which the agents collaborate to minimize the global loss function through local computation and local information exchange. Under Bandit feedback, loss function information including gradients is unavailable, and each agent can only obtain and use the function values generated at a decision or in its vicinity. To this end, a distributed online Bandit saddle point optimization algorithm in a non-Euclidean sense is proposed by combining the one-point gradient estimation method and the predictive mapping technique. Taking the expected dynamic saddle point regret as the performance metric, we establish the related regret upper bound for general convex-concave loss functions and ensure that the proposed algorithm converges sublinearly under certain preconditions. In addition, considering that computing the exact solutions of the optimization oracles is usually difficult in iterative optimization, this paper further develops an algorithm variant based on an approximate computation method, and rigorously analyzes the impact of precision settings on the regret upper bound of the extended algorithm. Finally, the effectiveness and superiority of the proposed algorithms are verified through a simulation example of target tracking.
近年来, 在线凸优化已成为一种解决实时决策任务的强有力工具和框架, 并且由于其在传感器网络、机器学习、强化学习、智能电网等领域的重要应用, 而引起了国内外学者的广泛关注[1−4]. 在该优化框架下, 损失函数信息仅在决策者做出决策后才会被对手或环境揭示, 即决策者仅能获取损失函数的后验信息. 通过使用所揭示的函数信息, 决策者更新下一时刻的决策, 继而生成一系列决策以实现最小化累积损失函数的目标. 在线凸优化的开创性工作可追溯到文献[5], 该文分析了在线梯度下降优化算法并建立了$ {\cal{O}}(\sqrt{T}) $的静态遗憾(Regret)界. 到目前为止, 人们针对在线凸优化问题提出了许多有效可行的算法, 如文献[6−11].
然而, 在一些重要应用场景中, 如双线性矩阵博弈[12]、约束优化对偶性[9]、高维数据分类[13]、鲁棒优化问题[14]等, 所涉及的损失函数并不适用于在线凸优化框架, 换句话说, 它们的损失函数不总是凸(凹)的, 而是呈现出一种凸凹结构. 因此, 这些实际场景自然地激发了学者们对在线鞍点问题(也称在线最小最大优化)的研究兴趣. 早期, Ho-Nguyen和Kılınç-Karzan[15]使用加权在线鞍点间隙的性能指标, 对集中式在线鞍点问题进行研究并给出了次线性收敛的结果. 在文献[16]中, Rivera等提出一种跟随领导者在线鞍点优化算法, 并对于凸凹和强凸强凹目标函数分别获得了$ {\cal{O}}(\sqrt{T} \ln T) $和$ {\cal{O}}(\ln {T}) $的静态遗憾上界. 随后, Xu等[17]在文献[16]的基础上额外引入了正则化项, 实现了稳定的决策. Wood和Dall'Anese[18]使用平衡点理论分析了一类具有决策相关数据的在线鞍点问题.
在很多实际场景中, 例如网络中的在线路由、发电、网络搜索中的在线广告投放[2, 9], 损失函数的具体信息是不可用的, 且智能体仅能获得和使用某决策处或其邻域内的函数值. 这种信息反馈模型称为Bandit反馈. 为此, 利用函数值信息构造单点和多点梯度估计器的解决方案随之引起关注[9, 11, 19−23]. 对于一类在线矩阵博弈问题, Cardoso等[23]分析了全信息和Bandit反馈下的纳什均衡遗憾. 在文献[22]中, Roy等在两种信息反馈下研究了非平稳鞍点问题的两种集中式优化算法, 并详细展示了所设计的算法在动态和静态鞍点遗憾标准下是次线性收敛的. 然而, 这一结果依赖于目标函数为强凸−强凹和光滑的强假设条件. 受限于单个机器的计算瓶颈, 集中式算法往往难以胜任大规模复杂的优化问题[3, 24]. 相比之下, 分布式框架下的算法避免了这个限制, 且因其低计算负担、结构鲁棒性和隐私性等显著优势, 近年来已引起国内外学者的广泛关注[3, 9−10, 24−36]. 在此框架下, Zhang等[37]提出一种Bandit反馈下的基于两点梯度估计方法的分布式在线鞍点优化算法, 并建立相应的次线性动态鞍点遗憾.
就在线Bandit鞍点问题的分布式解决方案而言, 现有的研究尚不完善, 新的分布式算法亟待开发和分析. 此外, 包括文献[37]在内的传统欧氏距离下的算法缺乏灵活性, 难以进一步利用优化问题的某些性质. 例如, 对于带有单纯形约束的优化问题, Kullback-Leibler (KL)散度作为一种非欧距离可获得算法显式表达, 而对于传统的欧氏距离则无法获得[8]. 因此, 开发一种非欧几里得意义上的分布式在线Bandit鞍点算法是必要且有意义的. 另一方面, 由于无偏估计性质和仅需一个采样点的要求, 单点估计框架能够有效处理Bandit反馈下梯度信息受限情形, 且可以与许多应用相兼容, 例如网络搜索中的在线广告投放[2, 9]. 受上述两方面分析的启发, 本文应用单点梯度估计框架和分布式镜面下降方法, 研究了在线Bandit鞍点问题, 主要贡献总结为如下三个方面:
1) 为解决时变多智能体网络上的分布式在线Bandit鞍点问题, 本文通过结合梯度估计和时变预测映射技术, 提出一种非欧几里得意义上的分布式在线算法. 在距离度量上采用更为一般的Bregman散度而不是传统的欧氏距离, 使得算法由于对不同优化问题的自由选择性而变得更加灵活. 此外, 受益于预测映射的灵活设置, 所提算法可实现比单位映射更好的收敛性能.
2) 开发并分析基于单点梯度估计器的分布式在线Bandit镜面下降鞍点优化算法. 区别于文献[37]的两点估计方法, 每个单点估计器仅要求一个函数值, 且其估计参数不依赖于总迭代时间$ T $的先验知识, 进而有效增强了算法的可操作性和实用性. 对于一般化的凸凹损失函数, 建立了其动态遗憾上界并保证了次线性收敛.
3) 考虑到在迭代优化中计算优化子程序的精确解通常较为困难且计算成本昂贵, 本文设计一种基于近似计算方法的算法变种, 并严格分析精确度设置对算法遗憾上界的影响. 最后, 通过一个目标跟踪案例对算法的有效性和先进性进行了仿真验证.
本文内容安排如下: 第1节描述所研究问题和一些预备知识; 第2节给出一种分布式在线Bandit鞍点优化算法和对应收敛结果; 第3节给出一种基于近似计算方法的算法变种; 第4节给出仿真实验结果; 第5节对本文进行总结与展望.
符号说明. $ {\bf{R}}^n $表示$ n $维向量空间. $ \bf{Z}\;(\bf{Z}_+) $表示整数 (正整数)集; $ \|\cdot\|_2 $和$ \|\cdot\|_* $分别代表欧氏范数和$ \| \cdot \| $的对偶范数; 记$ [{\boldsymbol{x}}]_i $ 和$ [m] $分别为向量$ \;{\boldsymbol{x}} $的第$ i $个元素和整数集$ \{1, \;2, \;\cdots, \;m \} $; 记$ [A_t]_{ij} $为矩阵$ A_t $的第$ i $行第$ \;j $列的元素; $ I_d $表示$\; d\times d $维单位矩阵; $ \mathbb{B}_1^m = \{ {\boldsymbol{s}} \in \bf{R}^m : \| \boldsymbol s\|_2 \leq 1\} $ 表示以原点为中心的单位欧氏球; 记$\, \triangle_n $为单纯形$ \;\{ {\boldsymbol{x}} \in \bf{R}^n | \sum_{i=1}^n [{\boldsymbol{x}}]_i=1, \;[{\boldsymbol{x}}]_i \geq 0, \;i \in [n]\} $; $ \{ {\boldsymbol{x}}_t\}_{t=1}^T $代表变量$ \;{\boldsymbol{x}}_t $在时间$ \;t\in [T] $上的序列.
1. 问题描述与预备知识
本文使用一个有向时变网络$ {\cal{G}}_t=\{{\cal{V}}, \;{\cal{E}}_t, \;A_t \} $刻画智能体间的通信拓扑, 其中$ {\cal{V}} : = [n] $, $ {\cal{E}}_t \subseteq {\cal{V}}\; \times {\cal{V}} $和$ A_t \in \bf{R}^{n \times n} $分别表示智能体集、边集和加权邻接矩阵. 记 $ {\cal{N}}_{i}^{\text {in }}(t)=\{j \mid(j, \; i) \in {\cal{E}}_t\} \cup\{i\} $为智能体 $ i $在$ t $时刻的入邻居集. 相应地, 使用$ {\cal{N}}_{i}^{\text {out }}(t)=\{j \mid $ $ (i, \; j) \in {\cal{E}}_t\} \cup\{i\} $表示其出邻居集. 记$ [A_t]_{ii}=1- \sum_{j\in \{j \mid(j, \; i) \in {\cal{E}}_t\}} [A_t]_{ij}. $ 假定当$ j \in $ $ {\cal{N}}_{i}^{\text {in }}(t) $, 则$ [A_{t}]_{ij}> \zeta $, 否则$ [A_{t}]_{ij}=0 $, 其中$ 0<\zeta <1 $. 以$\; {\boldsymbol{x}}_t $和$\; {\boldsymbol{y}}_t $为$ t $时刻的决策变量, Bandit反馈下分布式在线鞍点问题的数学模型表示如下:
$$ \min\limits_{{\boldsymbol{x}}_t \in {\boldsymbol{X}}} \max\limits_{{\boldsymbol{y}}_t \in {\boldsymbol{Y}}}\, \;\, \;\sum\limits_{t=1}^T{\sum\limits_{i=1}^n{f_{i, \;t}}}( {\boldsymbol{x}}_t, \;{\boldsymbol{y}}_t ) $$ (1)
其中, $ {\boldsymbol{X}} \subset \bf{R}^{d} $和$ {\boldsymbol{Y}} \subset \bf{R}^{m} $是两个凸紧集, $ T $是总迭代时间, $ f_{i, \;t}(\cdot) $表示仅智能体 $ i $可知的凸凹局部损失函数. 特别地, 在固定$ {\boldsymbol{y}}_t $后, $ f_{i, \;t}(\cdot, \;{\boldsymbol{y}}_t):\bf{R}^{d}\rightarrow \bf{R} $在$ {\boldsymbol{X}} $上是凸的; 在固定$\; {\boldsymbol{x}}_t $后, $ f_{i, \;t}({\boldsymbol{x}}_t, \;\cdot):\bf{R}^{m}\rightarrow \bf{R} $在$\; {\boldsymbol{Y}} $上是凹的.
在Bandit反馈下, 智能体$ i $在提交决策对后, 仅能获得其自身的损失函数值而不是具体的函数信息, 这区别于全信息反馈下的情形[9, 19, 38]. 进一步, Bandit反馈下分布式在线鞍点问题可以视为一种智能体与环境(对手)之间的重复博弈, 其过程如下: 首先, 在$ t $时刻智能体$ i \in {\cal{V}} $提交决策对$ ({\boldsymbol{x}}_{i, \;t}, {\boldsymbol{y}}_{i, \;t}) \in {\boldsymbol{X}} \times {\boldsymbol{Y}} $; 其次, 在环境(对手)选择一个损失函数$ f_{i, \;t} ({\boldsymbol{x}}, \; {\boldsymbol{y}}) $后, 智能体$ i $遭受损失$ f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \; {\boldsymbol{y}}_{i, \;t}) $并仅收到在$ \;{\boldsymbol{x}}_{i, \;t} $和$ \;{\boldsymbol{y}}_{i, \;t} $或其附近处的损失值; 最后, 基于某些知识及从邻居智能体接收的信息, 智能体$ i $更新$ t+1 $时刻的决策对.
为了衡量所开发分布式在线鞍点优化算法的收敛性能, 定义如下的期望动态鞍点遗憾[37]:
$$ \begin{split} &{ESP-Regret}_{d}^j (T) =\\ & \qquad\Bigg| {{\rm{E}}} \Bigg( \sum\limits_{t=1}^T \sum\limits_{i=1}^n {f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})} \;-\\ &\qquad\sum\limits_{t=1}^T \sum\limits_{i=1}^n {f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*)} \Bigg)\Bigg| \end{split} $$ (2)
其中, $ ({\boldsymbol{x}}_t^*, \,{\boldsymbol{y}}_t^*) \in {\arg\min}_{{\boldsymbol{x}} \in {\boldsymbol{X}}} \, {\arg\max}_{{\boldsymbol{y}} \in {\boldsymbol{Y}}} \sum_{i=1}^n f_{i, \,t} ({\boldsymbol{x}}, \;{\boldsymbol{y}}) $ 为问题(1)在$ t $时刻的鞍点, 且对任意$ {\boldsymbol{x}} \in {\boldsymbol{X}} $和$ {\boldsymbol{y}} \in {\boldsymbol{Y}} $, 满足式(3)中的鞍点性质. 根据文献[39], 当$ f_{i, \;t} ({\boldsymbol{x}}, \;{\boldsymbol{y}}) $满足凸−凹和连续条件, 且$\; {\boldsymbol{X}} $和$\; {\boldsymbol{Y}} $是凸紧集时, 问题(1) 的鞍点始终存在.
$$ {\sum\limits_{i=1}^n{f_{i, \;t}}}( {\boldsymbol{x}}_t^*, \;{\boldsymbol{y}} ) \leq {\sum\limits_{i=1}^n{f_{i, \;t}}}( {\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^* ) \leq {\sum\limits_{i=1}^n{f_{i, \;t}}}( {\boldsymbol{x}}, \;{\boldsymbol{y}}_t^* ) $$ (3)
一般地, 动态遗憾的分析与在线优化问题的某些变化特征相关[21]. 为此, 定义变差 (Variation) $ V_T^x $和$ V_T^y $如下:
$$ \left\{ \begin{aligned}[l] V_T^x= \sum_{t=1}^T \| {\boldsymbol{x}}_{t+1}^* -B_t {\boldsymbol{x}}_t^* \| \\ V_T^y= \sum_{t=1}^T \| {\boldsymbol{y}}_{t+1}^* - C_t{\boldsymbol{y}}_t^* \| \end{aligned} \right.$$ (4)
其中, $ B_t\in\bf{R}^{d\times d} $和$ C_t\in\bf{R}^{m\times m} $为两个可设计的时变预测映射矩阵, 旨在通过减小$ {\boldsymbol{x}}_{t+1}^* - B_t {\boldsymbol{x}}_t^*$和$ {\boldsymbol{y}}_{t+1}^* - C_t{\boldsymbol{y}}_t^* $来获得更小的$ V_T^x $ 和$ V_T^y $. 当$ B_t=I_d, \; C_t=I_m $, 式(4)退化到其常规的形式[10, 37, 40]. 特别地, 对于一类最优决策对之间带有动态关系的优化问题, 例如文献[11]中的目标跟踪问题, 恰当的预测映射能够建立比单位映射更小的变差, 进而获得更紧的动态遗憾上界. 为便于下文分析, 记$ V_T =\max \{V_T^x, \; V_T^y\} $.
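作为一个简要的说明性例子(这里假设最优决策满足线性动态$ {\boldsymbol{x}}_{t+1}^* = P_t {\boldsymbol{x}}_t^* $, 与第4节的目标跟踪设置一致), 若取$ B_t = P_t $, 则
$$ V_T^x= \sum_{t=1}^T \| {\boldsymbol{x}}_{t+1}^* - P_t {\boldsymbol{x}}_t^* \| = 0 $$
而单位映射$ B_t=I_d $给出$ V_T^x= \sum_{t=1}^T \| (P_t - I_d) {\boldsymbol{x}}_t^* \| $, 后者一般随$ T $增长, 这直观地体现了恰当预测映射的收益.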
本文目标为设计一个Bandit反馈下的分布式在线鞍点优化算法, 使得定义在式(2)的期望动态鞍点遗憾呈次线性增长, 即对任意$ j \in {\cal{V}} $, $ \lim_{T \rightarrow \infty} ({ESP-Regret}_d^j(T) / T) =0 $. 换句话说, 本文期望设计一种高效在线算法, 在时间$ T $的平均意义下实现产生接近于$ ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*) $的决策对序列的目标.
接下来, 围绕多智能体网络$ {\cal{G}}_t $, 约束集$ {\boldsymbol{X}}, \; {\boldsymbol{Y}} $及损失函数$ f_{i, \;t}(\cdot) $, 给出几个常用假设和一个基本引理.
假设 1[9, 25]. 1) 对任意的 $ t \in [T] $和$ i, \;j \in {\cal{V }} $, 有$ \sum_{j=1}^{n} [A_{t}]_{ij}=\sum_{i=1}^{n} [A_{t}]_{ij}=1 $.
2) 存在$ Q \in \bf{Z}_{+} $, 使得图$ ({\cal{V}}, \; \bigcup_{i=k Q+1}^{(k+1)Q} {\cal{E}}_{i}) $对任意的整数$ k \geq 0 $是强连通的.
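作为假设1第1)条的一个构造性示例, 下面给出无向(即边对称的有向)图上经典Metropolis权重的简要Python草图; 该构造并非本文算法的组成部分, 仅用于说明双随机权重矩阵的一种常见生成方式, 其中固定的强连通图以$ Q=1 $平凡地满足第2)条.

```python
import numpy as np

def metropolis_weights(edges, n):
    """Build a doubly stochastic weight matrix A on an undirected graph.

    edges: list of (i, j) pairs with i != j; n: number of agents.
    Metropolis rule: A[i, j] = 1 / (1 + max(deg_i, deg_j)) for (i, j) in E;
    the diagonal absorbs the remaining mass so every row sums to 1, and
    symmetry of A then makes every column sum to 1 as well.
    """
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    A = np.zeros((n, n))
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        A[i, j] = A[j, i] = w
    A += np.diag(1.0 - A.sum(axis=1))  # [A]_{ii} = 1 - sum_{j != i} [A]_{ij}
    return A

# A ring of 4 agents: rows and columns all sum to 1.
A = metropolis_weights([(0, 1), (1, 2), (2, 3), (3, 0)], 4)
assert np.allclose(A.sum(axis=0), 1) and np.allclose(A.sum(axis=1), 1)
```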
引理 1[25]. 在假设1成立的条件下, 对于所有$ i, \; j \in {\cal{V}} $和满足$ t \geq s \geq 1 $的$ t, \; s $, 有
$$ \left|[\Phi(t, \; s)]_{i j}-\frac{1}{n}\right| \leq \Gamma \sigma^{(t-s)} $$
其中, $ \Phi(t, s)\;=\;A_{t} A_{t-1} \cdots A_{s} $, $ \sigma=(1-\zeta / (4 n^{2}))^{{1}/{Q}} $, $ \Gamma\;=\;(1\;-\;\zeta / (4 n^{2}))^{{(1-2Q)}/{ Q}} $.
假设 2[19−20, 41]. 约束集$ {\boldsymbol{X}} $和$ {\boldsymbol{Y}} $满足
$$ \begin{array}{*{20}{l}} \mathbb{B}_{r_1}^d \subseteq {\boldsymbol{X}} \subseteq \mathbb{B}_{R_1}^d, \; \mathbb{B}_{r_2}^m \subseteq {\boldsymbol{Y}} \subseteq \mathbb{B}_{R_2}^m \end{array} $$ (5)
其中, $ r_1,\; r_2,\; R_1,\; R_2 $表示球的半径, 并满足 $ 0 < r_1 \leq R_1 $, $ 0 < r_2 \leq R_2 $.
假设 3[9, 12, 19, 42]. 1) 函数$ f_{i, \;t}({\boldsymbol{x}}, \; {\boldsymbol{y}}) $满足Lipschitz连续性, 即对任意$ {\boldsymbol{x}}_1, \; {\boldsymbol{x}}_2 \in {\boldsymbol{X}} $和$ {\boldsymbol{y}}_1, \; {\boldsymbol{y}}_2 \in {\boldsymbol{Y}} $,
$$ \begin{split} &\left|f_{i, \;t}({\boldsymbol{x}}_{1}, \; {\boldsymbol{y}}_1)-f_{i, \;t}({\boldsymbol{x}}_{2}, \; {\boldsymbol{y}}_2)\right|\leq \\ &\quad \quad L_{X} \|{\boldsymbol{x}}_{1}-{\boldsymbol{x}}_{2}\|+L_{Y} \|{\boldsymbol{y}}_{1}-{\boldsymbol{y}}_{2}\| \end{split} $$ (6)
其中, $ L_{X}>0 $和$ L_{Y}>0 $是两个已知常数.
2) 对于所有$ i\in [n], \; t\in [T] $, $ f_{i, \;t} ({\boldsymbol{x}}, \; {\boldsymbol{y}}) $在$ {\boldsymbol{X}} $和$ {\boldsymbol{Y}} $上有界, 即存在一个常数$ F>0 $满足
$$ |f_{i, \;t} ( {\boldsymbol{x}}, \; {\boldsymbol{y}}) |\leq F $$ (7)
1.1 Bregman 散度
Bregman散度作为一种距离测量函数, 是镜面下降方法的关键组成部分. 设$ {\cal{R}}^x: \bf{R}^d \rightarrow \bf{R} $和$ {\cal{R}}^y : \bf{R}^m \rightarrow \bf{R} $分别是在约束集$\; {\boldsymbol{X}} $和$ \;{\boldsymbol{Y}} $上的$ \varrho_x $- 和$ \varrho_y $-强凸函数, 即对任意的$ {\boldsymbol{x}}, \;{\boldsymbol{s}}\in {\boldsymbol{X}} $和$ {\boldsymbol{y}}, \; {\boldsymbol{z}}\in {\boldsymbol{Y}} $有
$$\left\{ \begin{array}{l} {\cal{R}}^x( {\boldsymbol{x}})-{\cal{R}}^x({\boldsymbol{s}})\geq \langle \nabla{\cal{R}}^x({\boldsymbol{s}}), \; {\boldsymbol{x}}-{\boldsymbol{s}} \rangle +\dfrac{\varrho_x}{2} \| {\boldsymbol{x}}-{\boldsymbol{s}} \|^2 \; \\ {\cal{R}}^y( {\boldsymbol{y}})-{\cal{R}}^y( {\boldsymbol{z}})\geq \langle \nabla{\cal{R}}^y( {\boldsymbol{z}}), \; {\boldsymbol{y}}- {\boldsymbol{z}} \rangle +\dfrac{\varrho_y}{2} \| {\boldsymbol{y}}- {\boldsymbol{z}} \|^2 \end{array}\right. $$ (8)
其中, 符号$ \langle\cdot,\; \cdot \rangle $表示向量的内积. 基于距离生成函数$ {\cal{R}}^x $和$ {\cal{R}}^y $, 定义相关的Bregman散度如下:
$$\left\{ \begin{array}{l} \Psi_{{\cal{R}}}^x ( {\boldsymbol{x}}, \;{\boldsymbol{s}}):={\cal{R}}^x( {\boldsymbol{x}})-{\cal{R}}^x({\boldsymbol{s}})- \langle \nabla{\cal{R}}^x({\boldsymbol{s}}), \; {\boldsymbol{x}}-{\boldsymbol{s}} \rangle \; \\ \Psi_{{\cal{R}}}^y ( {\boldsymbol{y}}, \; {\boldsymbol{z}}):={\cal{R}}^y( {\boldsymbol{y}})-{\cal{R}}^y( {\boldsymbol{z}})- \langle \nabla{\cal{R}}^y( {\boldsymbol{z}}), \; {\boldsymbol{y}}- {\boldsymbol{z}} \rangle \end{array} \right.$$ (9)
其中, $ \Psi_{{\cal{R}}}^x $和 $ \Psi_{{\cal{R}}}^y $彼此独立. 进一步, 通过观察式(8) 和式(9), Bregman散度满足
$$\left\{ \begin{array}{l} \Psi_{{\cal{R}}}^x ( {\boldsymbol{x}}, \;{\boldsymbol{s}})\geq \dfrac{\varrho_x}{2} \| {\boldsymbol{x}}-{\boldsymbol{s}} \|^2, \; \forall {\boldsymbol{x}}, \;{\boldsymbol{s}}\in {\boldsymbol{X}} \\ \Psi_{{\cal{R}}}^y ( {\boldsymbol{y}}, \; {\boldsymbol{z}})\geq \dfrac{\varrho_y}{2} \| {\boldsymbol{y}}- {\boldsymbol{z}} \|^2, \; \forall {\boldsymbol{y}}, \; {\boldsymbol{z}} \in {\boldsymbol{Y}} \end{array} \right.$$ (10)
作为一类非欧氏距离测量函数, 不同的Bregman散度产生于不同的距离生成函数. 特别地, 以$ {\cal{R}}^x $和$ \Psi_{{\cal{R}}}^x $为例, Bregman散度包含了两个经典特例: 欧氏距离$ \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \;{\boldsymbol{s}}) = \frac{1}{2} \| {\boldsymbol{x}} - {\boldsymbol{s}} \|_2^2 $ 和 KL 散度$ \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \;{\boldsymbol{s}})= \sum_{i=1}^d [{\boldsymbol{x}}]_i (\ln [{\boldsymbol{x}}]_i - \ln [{\boldsymbol{s}}]_i) $, 其中两者分别满足$ {\cal{R}}^x({\boldsymbol{x}})=\| {\boldsymbol{x}} \|_2^2/{2} , \; {\boldsymbol{x}} \in \bf{R}^d $ 和 $ {\cal{R}}^x({\boldsymbol{x}})= \sum_{i=1}^d [{\boldsymbol{x}}]_i \ln [{\boldsymbol{x}}]_i $, $ {\boldsymbol{x}} \in \triangle_d $. 此外, 其他的Bregman散度设置包括但不限于$ {\cal{R}}^x({\boldsymbol{x}})= \| {\boldsymbol{x}} \|_p^2 /{2}$, $ {\boldsymbol{x}} \in \bf{R}^d, \; p \in (1, \; 2] $ 和 $ {\cal{R}}^x({\boldsymbol{x}})={\boldsymbol{x}}^{\rm T} Q {\boldsymbol{x}}/{2} $, $ {\boldsymbol{x}} \in \bf{R}^d $ [43−45], 其中 $ Q\in \bf{R}^{d\times d} $ 是对称且正定的.
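为直观起见, 下面用一段简短的Python草图实现上述两个经典特例(欧氏距离与KL散度), 并数值验证式(10)中的下界性质(取$ \varrho_x=1 $, 分别对应$ \ell_2 $范数与$ \ell_1 $范数(Pinsker不等式)意义下的强凸性); 该草图仅作示意, 并非本文算法实现.

```python
import numpy as np

def bregman_euclidean(x, s):
    # R(x) = ||x||_2^2 / 2  =>  Psi(x, s) = ||x - s||_2^2 / 2
    return 0.5 * np.sum((x - s) ** 2)

def bregman_kl(x, s):
    # R(x) = sum_i x_i ln x_i on the simplex  =>  Psi(x, s) = KL(x || s)
    return np.sum(x * (np.log(x) - np.log(s)))

rng = np.random.default_rng(0)
x, s = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))

# Lower bound (10): Psi(x, s) >= (rho/2) * ||x - s||^2 with rho = 1,
# using the l2 norm for the Euclidean case and l1 (Pinsker) for KL.
assert bregman_euclidean(x, s) >= 0.5 * np.sum((x - s) ** 2)
assert bregman_kl(x, s) >= 0.5 * np.sum(np.abs(x - s)) ** 2
```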
围绕$ {\cal{R}}^x $, $ {\cal{R}}^y $及相关的Bregman散度$ \Psi_{{\cal{R}}}^x $和$ \Psi_{{\cal{R}}}^y $, 给出以下假设.
假设 4. 对于任意$ {\boldsymbol{x}}_1, \; {\boldsymbol{x}}_2, \; {\boldsymbol{z}}\in {\boldsymbol{X}} $和$ {\boldsymbol{y}}_1, \; {\boldsymbol{y}}_2, \; $ $ {\boldsymbol{v}} \in {\boldsymbol{Y}} $, Bregman 散度 $ \Psi_{{\cal{R}}}^x $和$ \Psi_{{\cal{R}}}^y $ 满足
$$\left\{ \begin{array}{l} |\Psi_{{\cal{R}}}^x( {\boldsymbol{x}}_1, \; {\boldsymbol{z}})-\Psi_{{\cal{R}}}^x( {\boldsymbol{x}}_2, \; {\boldsymbol{z}})| \leq K_{X} \| {\boldsymbol{x}}_1- {\boldsymbol{x}}_2 \| \; \\ | \Psi_{{\cal{R}}}^y( {\boldsymbol{y}}_1, \; {\boldsymbol{v}})-\Psi_{{\cal{R}}}^y( {\boldsymbol{y}}_2, \; {\boldsymbol{v}})| \leq K_{Y} \| {\boldsymbol{y}}_1- {\boldsymbol{y}}_2 \| \end{array} \right.$$ (11)
其中, $ K_X, \; K_Y $是两个已知的正常数.
假设 5. Bregman散度$ \Psi_{{\cal{R}}}^x $和$ \Psi_{{\cal{R}}}^y $在第二个参数上是凸的, 即对于所有的$ {\boldsymbol{x}}, \; {\boldsymbol{s}}_i \in \bf{R}^d $和$ {\boldsymbol{y}}, \; {\boldsymbol{z}}_i \in \bf{R}^m $, 有
$$\left\{ \begin{array}{l} \Psi_{{\cal{R}}}^x \left( {\boldsymbol{x}}, \; \sum\limits_{i=1}^d [{\boldsymbol{a}}]_i {\boldsymbol{s}}_i \right) \leq \sum\limits_{i=1}^d [{\boldsymbol{a}}]_i \Psi_{{\cal{R}}}^x \left( {\boldsymbol{x}}, \; {\boldsymbol{s}}_i \right), \; \forall {\boldsymbol{a}} \in {\triangle_{d}} \;\\ \Psi_{{\cal{R}}}^y \left( {\boldsymbol{y}}, \; \sum\limits_{i=1}^m [{\boldsymbol{b}}]_i {\boldsymbol{z}}_i \right) \leq \sum\limits_{i=1}^m [{\boldsymbol{b}}]_i \Psi_{{\cal{R}}}^y \left( {\boldsymbol{y}}, \; {\boldsymbol{z}}_i \right), \; \forall {\boldsymbol{b}} \in {\triangle_{m}} \end{array} \right.$$ (12)
注 1. 注意到当$ {\cal{R}}^x $和$ {\cal{R}}^y $满足Lipschitz连续性时, 则假设4中的条件自动被满足[11, 46-47]. 一般地, 假设4和假设5常用于 Bregman散度, 并且是在线镜面下降算法动态遗憾分析的标准假定条件[11, 47-48]. 举个例子, 通过设置参数$ K_{X}=2R_1 $和使用凸性性质, 欧氏距离满足假设4和假设5; 根据文献[11], 元素下界不为零的KL散度满足假设4和假设5; 基于有限维实向量空间上范数的等价性和凸性, $ p $范数散度是Lipschitz连续的, 且根据文献[8]可知, $ p $范数散度满足假设5.
假设 6. 映射$ B_t $和$ C_t $, $ \forall t\in [T] $是非扩张的, 即
$$ \left\{\begin{array}{l} \Psi_{{\cal{R}}}^x \left(B_t {\boldsymbol{x}}, \; B_t {\boldsymbol{s}} \right) \leq \Psi_{{\cal{R}}}^x \left( {\boldsymbol{x}}, \; {\boldsymbol{s}} \right), \; \forall {\boldsymbol{x}}, \; {\boldsymbol{s}} \in {\boldsymbol{X}} \;\\ \Psi_{{\cal{R}}}^y \left( C_t {\boldsymbol{y}}, \; C_t {\boldsymbol{z}} \right) \leq \Psi_{{\cal{R}}}^y \left( {\boldsymbol{y}}, \; {\boldsymbol{z}} \right), \; \forall {\boldsymbol{y}}, \; {\boldsymbol{z}} \in {\boldsymbol{Y}} \; \end{array}\right. $$ (13)
且$ \| B_t\| \leq 1, \; \|C_t \|\leq 1, \; \forall t \in [T] $.
假设6确保随着算法的运行, 某一时刻糟糕预测的影响不会持续加剧. 单位映射作为一种特例满足这一条件且文献[11, 48]中也要求类似的假设. 从另一个角度看, 本文期望通过使用比单位映射更有效的预测序列 $ \{ B_t\} $和$ \{ C_t\} $来提高算法的性能.
2. 算法设计与收敛性分析
在本节中, 提出并分析一种基于单点梯度估计的分布式在线Bandit鞍点优化算法, 以解决所考虑的分布式在线Bandit鞍点问题. 在Bandit反馈下, 智能体$ i $提交决策$ {\boldsymbol{x}}_{i, \;t} $和$ {\boldsymbol{y}}_{i, \;t} $后, 仅有决策变量处或附近的函数值被揭示. 因此, 当更新下一时刻的决策时, 因缺乏梯度信息而面临新的困难和挑战. 为此, 本文采用单点梯度估计方法(一种使用单个函数值构造梯度估计器的方法)解决这一难题.
2.1 基于单点梯度估计的分布式在线 Bandit 鞍点优化算法
算法 1. 基于单点梯度估计的分布式在线 Bandit 鞍点优化算法
初始化. 初始向量 $ {\boldsymbol{x}}_{i, \;1} \in (1-\xi_1^x){\boldsymbol{X}}, \;{\boldsymbol{y}}_{i, \;1} \in (1-\xi_1^y) \times {\boldsymbol{Y}}, \; $非递增正参数$ \alpha_t, \;\eta_t, \; \delta_t^x, \; $ $ \delta_t^y, \;\xi_t^x, \; \xi_t^y $, 映射$ B_t, \; C_t. $
1: for $ t=1, \;2, \;\cdots, \;T $ do
2: for 每个智能体$ i \in {\cal{V}} $ do
3: 梯度估计步骤:
$ \bullet $ 采样$ f_{i, \, t}({\boldsymbol{x}}_{i, \, t} + {\delta_t^x} {\boldsymbol{u}}_{i, \, t}^x, \, {\boldsymbol{y}}_{i, \, t}), \; f_{i, \, t}({\boldsymbol{x}}_{i, \, t}, \, {\boldsymbol{y}}_{i, \, t} + {\delta_t^y} {\boldsymbol{u}}_{i, \;t}^y) $;
$ \;\; \bullet $ 计算
$$\qquad \begin{split}&\tilde{{\boldsymbol{g}}}_{i, \;t}^{x} =\frac{d} {{\delta_t^x}} f_{i, \;t}({\boldsymbol{x}}_{i, \;t}+ {\delta_t^x} {\boldsymbol{u}}_{i, \;t}^x, \;{\boldsymbol{y}}_{i, \;t}) {\boldsymbol{u}}_{i, \;t}^x \\ &\tilde{{\boldsymbol{g}}}_{i, \;t}^{y} =\frac{m} {{\delta_t^y}} f_{i, \;t}({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}+ {\delta_t^y} {\boldsymbol{u}}_{i, \;t}^y) {\boldsymbol{u}}_{i, \;t}^y \end{split} $$
4: 镜面下降步骤: 分别计算下面优化问题并获得解$ \tilde{ {\boldsymbol{x}}}_{i, \;t} $和$ \tilde{ {\boldsymbol{y}}}_{i, \;t} $
$$\quad\;\; \begin{split}&\arg \underset{ {\boldsymbol{x}}\in(1-\xi_t^x) {\boldsymbol{X}}}{\min} \left\{ \langle {\boldsymbol{x}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t})\right\} \\ &\arg \underset{ {\boldsymbol{y}}\in(1-\xi_t^y) {\boldsymbol{Y}}}{\min} \left\{ - \langle {\boldsymbol{y}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ({\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) \right\} \end{split} $$
5: 预测步骤:
$$\qquad{\boldsymbol{p}}_{i, \;t}^x= B_t \tilde{ {\boldsymbol{x}}}_{i, \;t}, \;{\boldsymbol{p}}_{i, \;t}^y= C_t \tilde{ {\boldsymbol{y}}}_{i, \;t} $$
6: 一致性步骤:
$$\qquad \begin{split}&{\boldsymbol{x}}_{i, \;t+1}= \sum_{j \in {\cal{N}}_{i}^{\rm in}(t)}{[A_t]_{ij} {\boldsymbol{p}}_{j, \;t}^x} \\ &{\boldsymbol{y}}_{i, \;t+1}= \sum_{j \in {\cal{N}}_{i}^{\rm in}(t)}{[A_t]_{ij} {\boldsymbol{p}}_{j, \;t}^y} \end{split} $$
7: end for
8: end for
算法1中展示了基于单点梯度估计的分布式在线Bandit鞍点优化过程, 其中包含以下关键步骤:
1) 梯度估计步骤(步骤3): 仅需分别采样一个带有扰动$ {\delta_t^x} {\boldsymbol{u}}_{i, \;t}^x $和$ {\delta_t^y} {\boldsymbol{u}}_{i, \;t}^y $的函数值, 梯度值$\; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} $和$ \tilde{{\boldsymbol{g}}}_{i, \;t}^{y } $能够通过使用单点估计器得到, 其中$ {\delta_t^x}, \; {\delta_t^y} $是探索参数, $ {\boldsymbol{u}}_{i, \;t}^x, \; {\boldsymbol{u }}_{i, \;t}^y $是满足$ \| {\boldsymbol{u}}_{i, \;t}^x\|_2= \| {\boldsymbol{u}}_{i, \;t}^y\|_2 =1 $的随机向量.
2) 镜面下降步骤(步骤4): 相比于传统的欧氏距离, 算法1采用Bregman散度作为更加一般的距离测量函数, 即$ \Psi_{{\cal{R}}}^x $和$ \Psi_{{\cal{R}}}^y $. 不同的$ {\cal{R}}^x, \; {\cal{R}}^y $能产生不同的$ \Psi_{{\cal{R}}}^x $和$ \Psi_{{\cal{R}}}^y $, 这将会极大地增强算法的灵活性. 此外, 为了保证决策不违反约束, $ \tilde{ {\boldsymbol{x}}}_{i, \;t} $ 和 $ \tilde{ {\boldsymbol{y}}}_{i, \;t} $ 被要求投影到时变收缩集$\; {\boldsymbol{x}}\in(1-\xi_t ^x) {\boldsymbol{X}} $和$ {\boldsymbol{y}}\in(1\,- \xi_t^y) {\boldsymbol{Y}} $上, 其中$ \xi_t^x, \;\xi_t^y \in (0, \;1), \;t\in [T] $是收缩参数. 进一步, 当$ {\delta_t^x} \in [0, \; \xi_t^x r_1] $和$ {\delta_t^y} \in [0, \; \xi_t^y r_2] $时, 则$ {\boldsymbol{x}}_{i, \;t} +\; {\delta_t^x} {\boldsymbol{u}} _{i, \;t}^x \in {\boldsymbol{X}} $, $ {\boldsymbol{y}}_{i, \;t}+ {\delta_t^y} {\boldsymbol{u}}_{i, \;t}^y \in {\boldsymbol{Y}} $成立[19].
3) 预测步骤(步骤5): 为提高决策质量, 算法1考虑了基于人为经验的一步预测, 这在理论上可实现比单位映射($ B_t=I_{d \times d}, \;C_t= I_{m\times m} $)更好的收敛效果.
4) 一致性步骤(步骤6): 智能体$ i $与邻居智能体通信并执行一致性步骤以产生$ t+1 $时刻的决策. 上述四个步骤在欧氏距离设置下的一个简化实现草图见下.
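下面给出算法1单次迭代的一个简化Python草图, 取欧氏距离$ {\cal{R}}^x({\boldsymbol{x}})=\frac{1}{2}\|{\boldsymbol{x}}\|_2^2 $(此时步骤4退化为收缩集上的投影梯度步)与箱式约束$ {\boldsymbol{X}}=[-R_1, R_1]^d $, $ {\boldsymbol{Y}}=[-R_2, R_2]^m $; 其中箱式约束、步长常数等均为便于演示所做的示例假设, 并非本文的标准设定.

```python
import numpy as np

def sample_unit_sphere(rng, dim):
    u = rng.standard_normal(dim)
    return u / np.linalg.norm(u)

def algorithm1_step(f, x, y, A, t, R1=6.0, R2=0.3, B=None, C=None, rng=None):
    """One round of Algorithm 1 with the Euclidean Bregman divergence.

    f[i](x, y) -> float is agent i's loss at time t (only values are queried);
    x, y: (n, d) and (n, m) decision arrays; A: (n, n) doubly stochastic weights.
    """
    rng = rng or np.random.default_rng()
    n, d = x.shape
    m = y.shape[1]
    # Parameter schedules in the spirit of Corollary 1 (theta, epsilon = example values).
    xi_x = xi_y = 0.5 * t ** -0.25
    dx, dy = R1 * xi_x, R2 * xi_y               # exploration radii delta_t
    alpha = eta = 1.0 / (max(d, m) * t ** 0.75)
    px, py = np.empty_like(x), np.empty_like(y)
    for i in range(n):
        ux, uy = sample_unit_sphere(rng, d), sample_unit_sphere(rng, m)
        # Step 3: one-point gradient estimates (one function value each).
        gx = (d / dx) * f[i](x[i] + dx * ux, y[i]) * ux
        gy = (m / dy) * f[i](x[i], y[i] + dy * uy) * uy
        # Step 4: mirror descent = projected gradient step onto the shrunk box.
        x_tilde = np.clip(x[i] - alpha * gx, -(1 - xi_x) * R1, (1 - xi_x) * R1)
        y_tilde = np.clip(y[i] + eta * gy, -(1 - xi_y) * R2, (1 - xi_y) * R2)
        # Step 5: prediction mappings (identity when none is supplied).
        px[i] = x_tilde if B is None else B @ x_tilde
        py[i] = y_tilde if C is None else C @ y_tilde
    # Step 6: consensus over in-neighbors.
    x_new = np.stack([A[i] @ px for i in range(n)])
    y_new = np.stack([A[i] @ py for i in range(n)])
    return x_new, y_new
```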
为了便于下文分析, 记$ h_{i, \;t}^x({\boldsymbol{x}})= f_{i, \;t} ({\boldsymbol{x}}, \; {\boldsymbol{y}}_{i, \;t}), $ $ h_{i, \;t}^y({\boldsymbol{y}})=f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \; {\boldsymbol{y}}) $. 相应地, 记其光滑版本为
$$ \begin{split} \left\{\begin{aligned} & \hat{h}_{i, \;t}^x( {\boldsymbol{x}})={{\rm{E}}}_{{\boldsymbol{v}}_{i, \;t}^x \in {\mathbb{B}}_1^d} h_{i, \;t}^x ({\boldsymbol{x}}+ {\delta_t^x} {\boldsymbol{v}}_{i, \;t}^x) \; \\ &\hat{h}_{i, \;t}^y( {\boldsymbol{y}})={{\rm{E}}}_{{\boldsymbol{v}}_{i, \;t}^y \in {\mathbb{B}}_1^m} h_{i, \;t}^y ({\boldsymbol{y}}+ {\delta_t^y} {\boldsymbol{v}}_{i, \;t}^y) \end{aligned}\right. \end{split} $$ (14)
其中, $ {\boldsymbol{v}}_{i, \;t}^x\in\bf{R}^d $和$ {\boldsymbol{v}}_{i, \;t}^y\in\bf{R}^m $分别是从单位球$ \mathbb{B}_1^d $和 $ \mathbb{B}_1^m $ 上均匀获取的随机变量. 结合式(14)和算法1, 进一步得
$$ \begin{aligned} \left\{\begin{aligned} & \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} =\frac{d} {{\delta_t^x}} h_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t}+ {\delta_t^x} {\boldsymbol{u}}_{i, \;t}^x) {\boldsymbol{u}}_{i, \;t}^x \; \\ &\tilde{{\boldsymbol{g}}}_{i, \;t}^{y} =\frac{m} {{\delta_t^y}} h_{i, \;t}^y ({\boldsymbol{y}}_{i, \;t}+ {\delta_t^y} {\boldsymbol{u}}_{i, \;t}^y) {\boldsymbol{u}}_{i, \;t}^y \end{aligned}\right. \end{aligned} $$ (15)
基于上述分析, 给出一个基本性质.
引理 2. 若假设2和假设3成立, 则向量$ {\boldsymbol{x}}_{i, \;t} \in (1- \xi_t^x){\boldsymbol{X}} $和${\boldsymbol{y}}_{i, \;t} \in (1-\xi_t^y){\boldsymbol{Y}}, \; t \in [T], \; i\in {\cal{V}} $满足如下性质:
$$ \begin{aligned} {{1})}\;\;& {\rm{E}}_{{\boldsymbol{u}}_{i, \;t}^x}[\tilde{{\boldsymbol{g}}}_{i, \;t}^x] = \nabla \hat{h}_{i, \;t}^x ({\boldsymbol{{x}}}_{i, \;t}), \; {\rm{E}}_{{\boldsymbol{u}}_{i, \;t}^y}[\tilde{{\boldsymbol{g}}}_{i, \;t}^y] = \\ & \nabla \hat{h}_{i, \;t}^y ({\boldsymbol{{y}}}_{i, \;t}) \end{aligned} $$ $$ {{2})}\;\; \|\tilde{{\boldsymbol{g}}}_{i, \;t}^x\|_* \leq \frac{dF\bar{p}_*} {{\delta_t^x}}, \; \|\tilde{{\boldsymbol{g}}}_{i, \;t}^y\|_* \leq \frac{mF\bar{p}_*} {{\delta_t^y}} $$ $$ \begin{aligned} {{3})}\;\;&\left|\hat{h}_{i, \;t}^x ({\boldsymbol{x}}) -h_{i, \;t}^x ({\boldsymbol{x}})\right| \leq L_X \bar{p} {\delta_t^x}, \\ &\left|\hat{h}_{i, \;t}^y ({\boldsymbol{y}}) -h_{i, \;t}^y ({\boldsymbol{y}})\right| \leq L_Y \bar{p} {\delta_t^y} \end{aligned} $$
其中, $ \bar{p} $和$ \bar{p}_* $是两个分别满足$ \| {\boldsymbol{x}} \| \leq \bar{p} \| {\boldsymbol{x}} \|_2 $和$ \| {\boldsymbol{x}} \|_* \leq \bar{p}_* \| {\boldsymbol{x}} \|_2 $的常数.
证明. $ {1)} $ 基于式(14)、式(15), 性质$ {1)} $能够通过类比文献[19]中的引理1建立.
$ {2)} $ 根据算法1和式(15), 可得
$$ \begin{split} \|\tilde{{\boldsymbol{g}}}_{i, \;t}^x\|_* =\;& \left\|\frac{d} {{\delta_t^x}} h_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t}+ {\delta_t^x} {\boldsymbol{u}}_{i, \;t}^x) {\boldsymbol{u}}_{i, \;t}^x \right\|_* \leq\\ & \frac{dF} {{\delta_t^x}} \left\| {\boldsymbol{u}}_{i, \;t}^x \right\|_* \leq \frac{dF\bar{p}_*} {{\delta_t^x}} \end{split} $$ (16)
其中, 式 (16) 中的第一个不等式结合了函数$ h_{i, \;t}^x $的定义和假设3; 第二个不等式使用了$ \|{\boldsymbol{u}}_{i, \;t}^x \|_2=1 $. 同理可得, $ \|\tilde{{\boldsymbol{g}}}_{i, \;t}^y\|_* \leq \frac{mF\bar{p}_*}{{\delta_t^y}} $.
$ {3)} $ 根据$ \hat{h}_{i, \;t}^x ({\boldsymbol{x}}) $的定义, 有
$$ \begin{split} &\Big| \hat{h}_{i, \;t}^x ({\boldsymbol{x}}) -h_{i, \;t}^x ({\boldsymbol{x}})\Big| = \\ & \quad\; \Big| {\rm{E}}_{{\boldsymbol{v}}_{i, \;t}^x \in {\mathbb{B}}_1^d} h_{i, \;t}^x ({\boldsymbol{x}}+ {\delta_t^x} {\boldsymbol{v}}_{i, \;t}^x) -h_{i, \;t}^x ({\boldsymbol{x}})\Big| \leq\\ & \quad\; {\rm{E}}_{{\boldsymbol{v}}_{i, \;t}^x \in {\mathbb{B}}_1^d} \Big| h_{i, \;t}^x ({\boldsymbol{x}}+ {\delta_t^x} {\boldsymbol{v}}_{i, \;t}^x) -h_{i, \;t}^x ({\boldsymbol{x}})\Big| {\leq} \\ & \quad\; {\rm{E}}_{{\boldsymbol{v}}_{i, \;t}^x \in {\mathbb{B}}_1^d} \Big| f_{i, \;t} ({\boldsymbol{x}}+ {\delta_t^x} {\boldsymbol{v}}_{i, \;t}^x, \; {\boldsymbol{y}}_{i, \;t}) \;-\\ &\quad\; f_{i, \;t} ({\boldsymbol{x}}, \; {\boldsymbol{y}}_{i, \;t})\Big| \leq L_X \bar{p} {\delta_t^x} \end{split} $$ (17)
其中, 式 (17) 中的第一个不等式使用了$ |{{\rm{E}}}(a_1)|\leq {{\rm{E}}}(|a_1|), \; \forall a_1\in \bf{R} $; 最后一个不等式结合了假设3和$ {\boldsymbol{v}}_{i, \;t}^x \in \mathbb{B}_1^d $. 同理可得$ |\hat{h}_{i, \;t}^y ({\boldsymbol{y}}) - h_{i, \;t}^y ({\boldsymbol{y}})| \leq L_Y \bar{p} {\delta_t^y} $.
□
2.2 算法收敛性分析
以期望动态鞍点遗憾作为性能指标, 本节给出算法1的收敛性结果. 为了便于分析, 定义两个辅助遗憾: 动态鞍点部分遗憾
$$ \left\{ \begin{array}{l} {P-Reg}_d^x (T) = \sum\limits_{t=1}^T \sum\limits_{i=1}^n (f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}) - f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_{i, \;t})) \; \\ {P-Reg}_d^y (T) = \sum\limits_{t=1}^T \sum\limits_{i=1}^n ( f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_t^*) - f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t})) \end{array}\right. $$ (18)
式(18)表示在固定其中一个决策变量的条件下, 另一个决策变量在时间$ t $上跟踪其最优决策所造成的累积差异.
下面, 首先给出包含优化误差、一致性误差和动态鞍点部分遗憾上界在内的必要引理.
引理 3. 由算法1产生的序列 $ \{{\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}\} $和 $ \{\tilde{ {\boldsymbol{x}}}_{i, \;t}, \; $ $ \tilde{ {\boldsymbol{y}}}_{i, \;t} \}, \;i\in {\cal{V}}, \;t\in[T] $满足下式:
$$ \qquad\qquad\left\{ \begin{aligned} &\| \tilde{ {\boldsymbol{x}}}_{i, \;t} -{ {\boldsymbol{x}}}_{i, \;t}\| \leq \dfrac{\alpha_t} {\varrho_x}\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{x}\|_* \;\qquad&&\;\;\;\; ({19{\mathrm{a}}})\\ &\| \tilde{ {\boldsymbol{y}}}_{i, \;t} -{ {\boldsymbol{y}}}_{i, \;t}\| \leq \dfrac{\eta_t}{\varrho_y}\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{y}\|_*\qquad&&\;\;\;\; ({19{\mathrm{b}}}) \end{aligned} \right. $$
证明. 见附录 A.
□
引理 4. 假定假设1和假设6成立. 记$ {\boldsymbol{x}}_{i, \;t} $, $ {\boldsymbol{y}}_{i, \;t} $的平均版本分别为$ {\boldsymbol{x}}_{avg, \;t}= \frac{1}{n} \sum_{i=1}^n {\boldsymbol{x}}_{i, \;t}, \; {\boldsymbol{y}}_{avg, \;t} =\frac{1}{n} \sum_{i=1}^n {\boldsymbol{y}}_{i, \;t}. $ 则对所有$ i\in[n], \; t \in [T], \; T\geq 2 $, 有
$$ \left\{\begin{aligned} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n\| {\boldsymbol{x}}_{i, \;t} - {\boldsymbol{x}}_{avg, \;t}\| \leq D_1 + D_2 k \sum\limits_{t=1}^{T-1}\frac{\alpha_t}{{\delta_t^x}}\quad\; &&(20\rm a) \\ & \sum\limits_{t=1}^T \sum\limits_{i=1}^n\| {\boldsymbol{y}}_{i, \;t} - {\boldsymbol{y}}_{avg, \;t}\| \leq D_3 + D_4 k \sum\limits_{t=1}^{T-1} \frac{\eta_t}{{\delta_t^y}}\quad\; && (20\rm b) \end{aligned}\right. $$
其中, $ k=\max\{d, \; m \}, $ $ D_1=\sum_{i=1}^n\| {\boldsymbol{x}}_{i, \;1} - {\boldsymbol{x}}_{avg, \;1}\| + \frac{ n \Gamma}{1-\sigma} \sum_{j=1}^n \| {\boldsymbol{x}}_{j, \;1}\| $, $ D_2=\frac{n^2 F\bar{p}_* \Gamma}{ \varrho_x(1-\sigma)}, \; $ $ D_3= \sum_{i=1}^n\| {\boldsymbol{y}}_{i, \;1} - {\boldsymbol{y}}_{avg, \;1}\| + \frac{ n \Gamma}{1-\sigma} \sum_{j=1}^n \| {\boldsymbol{y}}_{j, \;1}\| $, $ D_4=\frac{n^2 F\bar{p}_* \Gamma}{ \varrho_y(1-\sigma)}. $
证明. 见附录B.
□
引理 5. 假定假设1 ~ 6成立. 则对于$ T \geq 2 $, 由算法1产生的序列 $ \{{\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}\}, \;i\in {\cal{V}}, \;t\in[T] $满足下式:
$$\left\{ \begin{aligned} {\rm{E}}&({P-Reg}_d^x (T)) \leq E_1k^2 \sum\limits_{t=1}^T \frac{\alpha_t}{(\delta_t^x)^2} + \frac{n R_X}{\alpha_T}\;+ \\ &n K_X \sum\limits_{t=1}^T \frac{1}{\alpha_t} \| {\boldsymbol{x}}_{t+1}^*-B_t {\boldsymbol{x}}_t^*\|+ E_2 \sum\limits_{t=1}^T (\xi_t^x + {\delta_t^x})\;+ \\ &n K_X R_1 \sum\limits_{t=1}^T \frac{\xi_t^x -\xi_{t+1}^x}{\alpha_t}:=\Lambda_{PR}^x(T) \qquad\qquad\;\;\;(21\rm a) \\ {\rm{E}} &( {P-Reg}_d^y (T) ) \leq E_3 k^2 \sum\limits_{t=1}^T \frac{\eta_t}{(\delta_t^y)^2} + \frac{n R_Y}{\eta_T}\;+ \\ & n K_Y \sum\limits_{t=1}^T \frac{1}{\eta_t} \| {\boldsymbol{y}}_{t+1}^*-C_t {\boldsymbol{y}}_t^*\| + E_4 \sum\limits_{t=1}^T (\xi_t^y+ \delta_t^y)\;+ \\ & n K_Y R_2 \sum\limits_{t=1}^T \frac{\xi_t^y -\xi_{t+1}^y}{\eta_t}:=\Lambda_{PR}^y(T) \qquad\qquad\;\;\;(21\rm b)\\[-1pt]\end{aligned} \right.$$
其中, $ E_1= \frac{nF^2 \bar{p}_*^2}{\varrho_x}, \, E_3 = \frac{n F^2 \bar{p}_*^2}{\varrho_y}, \, E_2 = n\bar{p}L_X \max\{R_1, 2\}, \, $ $ E_4 = n\bar{p}L_Y\max\{R_2, \,2\}, \, R_X = \sup_{ {\boldsymbol{x}}, \, {\boldsymbol{x}}_1 \in {\boldsymbol{X}}} \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, {\boldsymbol{x}}_1), \; R_Y = \sup_{ {\boldsymbol{y}}, \; {\boldsymbol{y}}_1 \in {\boldsymbol{Y}}} \Psi_{{\cal{R}}}^y ({\boldsymbol{y}}, \; {\boldsymbol{y}}_1) $, $ \Lambda_{PR}^x(T), \; \Lambda_{PR}^y(T) $是两个关于$\, T $ 的函数.
证明. 见附录C.
□
引理3中的优化误差上界描述了算法1中$ \tilde{ {\boldsymbol{x}}}_{i, \;t} $ 和$ { {\boldsymbol{x}}}_{i, \;t} $ ($ \tilde{ {\boldsymbol{y}}}_{i, \;t} $和$ { {\boldsymbol{y}}}_{i, \;t} $)之间的差异. 引理4描述了网络中智能体之间决策不一致所带来的一致性惩罚. 引理5中两个动态鞍点部分遗憾上界是连接主要收敛结果的关键, 对期望动态鞍点遗憾的分析起着重要作用. 接下来, 详细建立算法1的期望动态鞍点遗憾上界.
定理 1. 假定假设1 ~ 6成立, $ \{{\boldsymbol{x}}_{i, \;t}, \; {\boldsymbol{y}}_{i, \;t}\}, \; i\in[n] $是由算法1产生的决策序列. 则对于$ T \geq 2 $ 和任意$ j \in {\cal{V}} $, 算法1的期望动态鞍点遗憾满足下式:
$$ \begin{split} & {ESP-Regret}_d^j(T) \leq \max \left\{ \Lambda_{PR}^x(T), \; \Lambda_{PR}^y(T) \right\} +\\ &\quad E_5 + E_{6}k \sum\limits_{t=1}^{T-1} \frac{\alpha_t}{{\delta_t^x}} + E_{7}k\sum\limits_{t=1}^{T-1} \frac{\eta_t}{{\delta_t^y}} \; \end{split} $$ (22)
其中, $ E_5=(n+2)L_X D_1+(n+2)L_Y D_3 $, $ E_6= (n+2)L_X D_2 $, $ E_7=(n+2) L_Y D_4 $.
证明. 本文将$ {ESP-Regret}_d^j(T) $的上界证明划分为两个部分, 首先分析 $ {\rm{E}}(\sum_{t=1}^T \sum_{i=1}^n{f_{i, \; t} ({\boldsymbol{x}}_{j, \;t}, \; {\boldsymbol{y}}_{j, \;t})} \;- \sum_{t=1}^T\sum_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*)}) $的上界.
$$ \begin{aligned} &\sum\limits_{i=1}^n {f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})} - \sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*)}\leq \\ & \sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})} - \sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_{avg, \;t})}\leq \\ & \sum\limits_{i=1}^n (f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}) - f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_{i, \;t})) + \sum\limits_{i=1}^n \left( L_X \| {\boldsymbol{x}}_{i, \;t} \;-\right. \\ &\left. {\boldsymbol{x}}_{j, \;t}\| +L_Y\|{\boldsymbol{y}}_{i, \;t}-{\boldsymbol{y}}_{j, \;t}\| +L_Y\|{\boldsymbol{y}}_{i, \;t}-{\boldsymbol{y}}_{avg, \;t}\| \right)\leq \\ & \sum\limits_{i=1}^n(f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}) - f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_{i, \;t})) + \sum\limits_{i=1}^n \left( (n+1)\times\right. \\ & \left. L_X \| {\boldsymbol{x}}_{i, \;t}-{\boldsymbol{x}}_{avg, \;t}\|\;+\;(n+2)L_Y\|{\boldsymbol{y}}_{i, \;t} - {\boldsymbol{y}}_{avg, \;t}\| \right) \end{aligned} $$ (23)
其中, 式(23)中的第一个不等式使用了式(3); 第二个不等式结合了假设3; 最后一个不等式结合了$ \sum_{i=1}^n \| {\boldsymbol{x}}_{i,\, t}- {\boldsymbol{x}}_{j, \,t} \| \leq \sum_{i=1}^n (\| {\boldsymbol{x}}_{i,\, t} - {\boldsymbol{x}}_{avg, \,t}\| + \| {\boldsymbol{x}}_{avg, \,t} -{\boldsymbol{x}}_{j, \;t} \|) \leq (n+1) \sum_{i=1}^n \| {\boldsymbol{x}}_{i, \;t}- {\boldsymbol{x}}_{avg, \;t}\| $.
对式(23)从$ t=1 $到$ T $求和并取期望, 可得
$$ \begin{split} {\rm{E}}& \left( \sum\limits_{t=1}^T \sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})} - \sum\limits_{t=1}^T\sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*)} \right) \leq\\ & {\rm{E}}\left( {P-Reg}_d^x (T)\right) + {\rm{E}}\Bigg( \sum\limits_{t=1}^T \sum\limits_{i=1}^n ( (n+1)L_X \times \| {\boldsymbol{x}}_{i, \;t}\,-\\ &{\boldsymbol{x}}_{avg, \;t}\| +(n+2)L_Y\|{\boldsymbol{y}}_{i, \;t}-{\boldsymbol{y}}_{avg, \;t}\| )\Bigg)\\[-1pt] \end{split} $$ (24)
其次, 类似式(23), $ {\rm{E}}(\sum_{t=1}^T \sum_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*)} - \sum_{t=1}^T \sum_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})}) $的上界可建立如下.
$$ \begin{aligned} & {\rm{E}} \left( \sum\limits_{t=1}^T\sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_t^*)} - \sum\limits_{t=1}^T\sum\limits_{i=1}^n{f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})} \right)\leq \\ & \;\quad {\rm{E}} \left( \sum\limits_{t=1}^T\sum\limits_{i=1}^n \left(f_{i, \;t} ({\boldsymbol{x}}_{avg, \;t}, \;{\boldsymbol{y}}_t^*) - f_{i, \;t} ({\boldsymbol{x}}_{j, \;t}, \;{\boldsymbol{y}}_{j, \;t})\right) \right)\leq \\ & \;\quad {\rm{E}}({P-Reg}_d^y (T)) +{\rm{E}} \Bigg( \sum\limits_{t=1}^T \sum\limits_{i=1}^n ( (n+2)L_X \,\times \\ &\quad\; \| {\boldsymbol{x}}_{i, \;t}-{\boldsymbol{x}}_{avg, \;t}\| +(n+1)L_Y\|{\boldsymbol{y}}_{i, \;t}-{\boldsymbol{y}}_{avg, \;t}\| )\Bigg) \end{aligned} $$ (25)
进一步, 结合式(25)、式(24)及引理5, 可得
$$ \begin{split} & {ESP-Regret}_d^j(T) \leq \max \left\{ \Lambda_{PR}^x(T), \; \Lambda_{PR}^y(T) \right\}+ \\ &\qquad {\rm{E}} \Bigg( (n+2) \sum\limits_{t=1}^T \sum\limits_{i=1}^n (L_X \| {\boldsymbol{x}}_{i, \;t}-{\boldsymbol{x}}_{avg, \;t}\| \;+\\ &\qquad L_Y\|{\boldsymbol{y}}_{i, \;t}-{\boldsymbol{y}}_{avg, \;t}\| )\Bigg) \end{split} $$ (26)
最后, 通过结合式(26)和引理4, 即可得到式(22).
□
定理1清晰地展示了期望动态鞍点遗憾上界对问题参数、可设置参数及动态鞍点部分遗憾上界的依赖关系. 同时, 也表明算法1的自由参数设计对遗憾上界有着重要的影响. 为此, 通过设置参数$ \alpha_t, \; \eta_t, \; \delta_t^x, \; \delta_t^y, \; \xi_t^x, \; \xi_t^y $, 在下文中具体探究这一影响关系.
推论 1. 假定定理1的假设和条件$ V_T = o(T) $成立. 取常数$ \theta_1, \; \theta_2 \in (0, \;1), \; \epsilon_1, \; \epsilon_2 > 0, \; \gamma_1, \; \gamma_2 \;\in (\frac{1}{2}, \; 1), \; $ 设
$$ \begin{split} &\xi_t^x=\theta_1 t^{-\frac{1}{4}}, \; \xi_t^y=\theta_2 t^{-\frac{1}{4}}, \; {\delta_t^x}=r_1 \theta_1 t^{-\frac{1}{4}}, \; \\ &{\delta_t^y}= r_2 \theta_2 t^{-\frac{1}{4}}, \; \alpha_t=\frac{1}{\epsilon_1k} t^{- \gamma_1}, \; \eta_t=\frac{1}{\epsilon_2 k} t^{- \gamma_2} \; \end{split} $$ (27)
则对于$ T \geq 2 $和 $ j \in {\cal{V}} $, 有
$$ \begin{split} &{ESP-Regret}_d^j(T) \leq\\ &\quad{\cal{O}} \left(\max \{kT^{\lambda_1}, \; kT^{\lambda_2} (1+V_T ) \}\right) \end{split} $$ (28)
其中, $ \lambda_1=\max \{\frac{3}{2}-\gamma_1, \; \frac{3}{2}-\gamma_2 \}, \; \lambda_2=\max \{\gamma_1, \; \gamma_2 \} $. 特别地, 当$ \gamma_{1} $和$ \gamma_{2} $设为$ 3/4 $ 或$ 3/4-\log_T {\sqrt{(1+V_T})} $ 时, 期望动态鞍点遗憾上界分别为$ {\cal{O}}(k T^{3/4} (1+V_T)) $ 或 $ {\cal{O}}(k T^{3/4}\sqrt{1+V_T }) $.
证明. 将推论1中的参数条件代入式(22), 可得
$$ \begin{split} &\Lambda_{PR}^x(T)\leq \\ & \quad E_1 k \epsilon_1^{-1} (r_1 \theta_1)^{-2} \sum\limits_{t=1}^T t^{\frac{1}{2}-\gamma_1} + n R_X\epsilon_1k T^{\gamma_1}\;+ \\ &\quad nK_X \epsilon_1k T^{\gamma_1} V_T + E_2 \theta_1 \sum\limits_{t=1}^T t^{-\frac{1}{4}} + E_2 r_1 \theta_1 \sum\limits_{t=1}^T t^{-\frac{1}{4}}\;+ \\ &\quad n K_X R_1 \theta_1 \epsilon_1 k \sum\limits_{t=1}^T t^{\gamma_1} \left( t^{-\frac{1}{4}} -(t+1)^{-\frac{1}{4}}\right)\leq \\ &\quad {\cal{O}} \left(k T^{\frac{3}{2}-\gamma_1} + k T^{\gamma_1} + k T^{\gamma_1} V_T+ k T^{\frac{3}{4}} \right)\\[-1pt] \end{split} $$ (29)
其中, 式(29)中的最后一个不等式结合了下面两个事实:
$$\left\{ \begin{split} &\sum\limits_{t=1}^T {\frac{1}{t^{\gamma_1-\frac{1}{2}}}} \leq 1+ \int_1^T \frac{1}{t^{\gamma_1-\frac{1}{2}}} {\rm d}t \leq \frac{1}{\frac{3}{2}-\gamma_1} T^{\frac{3}{2}-\gamma_1} \\ & \sum\limits_{t=1}^T t^{\gamma_1} \left( t^{-\frac{1}{4}} -(t+1)^{-\frac{1}{4}} \right)\leq \\ & \quad \sum\limits_{t=1}^T t^{\gamma_1}t^{-\frac{1}{4}}\left(1-\left(1-\frac{1}{t+1}\right)\right)\leq \frac{4}{3} T^{\frac{3}{4}}\\[-1pt] \end{split} \right.$$ (30)
类似地, $ \Lambda_{PR}^y(T) $满足
$$ \Lambda_{PR}^y(T) \leq {\cal{O}} \left( kT^{\frac{3}{2}-\gamma_2} + k T^{\gamma_2} + k T^{\gamma_2} V_T + k T^{\frac{3}{4}} \right) $$ (31)
基于两个事实: $ E_{6}k \sum_{t=1}^{T-1} \frac{\alpha_t}{{\delta_t^x}} \leq {\cal{O}} (T^{\frac{5}{4}-\gamma_1}) $, $ E_{7}k\sum_{t=1}^{T-1} \frac{\eta_t}{{\delta_t^y}}\leq {\cal{O}} (T^{\frac{5}{4}-\gamma_2}) $, 并结合式(29)、式(31)、式(22)和$ \lambda_1, \; \lambda_2 $的定义, 有
$$ \begin{split} &{ESP-Regret}_d^j(T) \leq \\ &\qquad{\cal{O}} \left( kT^{\lambda_1} +kT^{\lambda_2}(1+V_T)+kT^{\frac{3}{4}} \right) \end{split}$$ (32)
进而易得式(28).
□
注 2. 根据推论1, 当应用带有$ V_T $知识的步长时, 所提算法的遗憾上界$ {\cal{O}}(k T^{3/4}\sqrt{1+V_T }) $匹配了单点Bandit反馈情况下的集中式结果[38]. 作为对比, 当$ V_T $未知时, 步长的设计是自由的, 即参数设置不要求$ T, \; V_T $的先验知识, 但这同时也导致了更优参数调整的困难. 此外, 与文献[37]中基于两点梯度估计的算法相比, 算法1中每个单点估计器仅要求一个函数值, 且其估计参数不依赖于$ T $的先验知识, 这有效增强了算法的可操作性和实用性. 但就算法收敛性能而言, 式(28)中遗憾界的收敛速率差于文献[37]的结果.
注 3. 观察式(28)可知, $ V_T $的阶次对遗憾上界的阶次有着直接的影响. 根据式(4), $ V_T $依赖于优化问题的同时, 也依赖于预测映射序列$ \{ B_t \} $和$ \{ C_t \}, \; t\in [T] $. 特别地, 恰当的$ \{ B_t \} $和$ \{ C_t \}, \; t\in [T] $能够获得比无预测映射设定更小的$ V_T $阶次, 进而建立更紧的遗憾上界. 在实际中, 预测映射的选取依赖于优化问题的先验知识, 且良好的预测效果通常取决于对相邻时刻最优决策变化规律的掌握程度.
3. 基于近似计算方法的变种算法
在每个时间$ t $, 决策变量$ \;{\boldsymbol{x}}_{i, \;t+1} $和$ \;{\boldsymbol{y}}_{i, \;t+1} $的生成都依赖于算法1中两个优化子程序(步骤4)的精确求解. 然而, 在某些情况下, 如大维度$ d $或$ m $下的优化问题, 计算其精确解的成本非常昂贵. 此外, 优化子程序的解通常无法在迭代优化中达到无限精度. 因此, 为解决这个问题, 本文开发了算法1的近似计算变种算法, 称为基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法.
不同于算法1步骤4中的计算, 基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法仅要求智能体$ i $在$ \rho_t^x, \; \rho_t^y $精确度的意义下分别获得下面优化问题的近似解$ \breve{ {\boldsymbol{x}}}_{i, \;t}, \; \breve{ {\boldsymbol{y}}}_{i, \;t} $:
$$ \left\{ \begin{split} &\arg \underset{ {\boldsymbol{x}}\in(1-\xi_t^x) {\boldsymbol{X}}}{\min} \left\{ \langle {\boldsymbol{x}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ( {\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t})\right\} \; \\ &\arg \underset{ {\boldsymbol{y}}\in(1-\xi_t^y) {\boldsymbol{Y}}}{\min} \left\{ - \langle {\boldsymbol{y}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ( {\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t})\right\} \end{split} \right.$$ (33)
与此同时, 原来的步骤5相应地修改为
$$ {\boldsymbol{p}}_{i, \;t}^x= B_t \breve{ {\boldsymbol{x}}}_{i, \;t}, \; \ {\boldsymbol{p}}_{i, \;t}^y= C_t \breve{ {\boldsymbol{y}}}_{i, \;t} $$ (34)
在式(33)中, $ \rho_t^x $-近似解$ \breve{ {\boldsymbol{x}}}_{i, \;t}\in (1-\xi_t^x) {\boldsymbol{X}} $ 和 $ \rho_t^y $-近似解$ \breve{ {\boldsymbol{y}}}_{i, \;t}\in (1-\xi_t^y) {\boldsymbol{Y}} $分别等价于下面的不等式[8]:
$$\left\{ \begin{split} & \langle \breve{ {\boldsymbol{x}}}_{i, \;t}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x (\breve{ {\boldsymbol{x}}}_{i, \;t}, \; {\boldsymbol{x}}_{i, \;t})\leq \\ &\qquad \langle \breve{ {\boldsymbol{x}}}_{i, \;t}^*, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x (\breve{ {\boldsymbol{x}}}_{i, \;t}^*, \; {\boldsymbol{x}}_{i, \;t}) + \rho_t^x \\ &- \langle \breve{ {\boldsymbol{y}}}_{i, \;t}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y (\breve{ {\boldsymbol{y}}}_{i, \;t}, \; {\boldsymbol{y}}_{i, \;t})\leq \\ & \qquad - \langle \breve{ {\boldsymbol{y}}}_{i, \;t}^*, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y (\breve{ {\boldsymbol{y}}}_{i, \;t}^*, \; {\boldsymbol{y}}_{i, \;t}) + \rho_t^y \end{split} \right.$$ (35)
其中, $ \breve{ {\boldsymbol{x}}}_{i, \;t}^* ={\arg\min}_{ {\boldsymbol{x}}\in(1-\xi_t^x) {\boldsymbol{X}}} \{ \langle {\boldsymbol{x}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t})\} $, $ \breve{ {\boldsymbol{y}}}_{i, \;t}^* ={\arg\min}_{ {\boldsymbol{y}}\in(1-\xi_t^y) {\boldsymbol{Y}}} \{ - \langle {\boldsymbol{y}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ({\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) \} $. 特别地, 在实际优化问题中, 当$ \rho_t^x=0 $, $ \rho_t^y=0 $满足时, 式(33)中的优化子程序将退化为算法1中的精确版本, 并且在相同的参数设置下, $ \breve{ {\boldsymbol{x}}}_{i, \;t}=\breve{ {\boldsymbol{x}}}_{i, \;t}^*=\tilde{ {\boldsymbol{x}}}_{i, \;t} $. 因此, 近似解的设定比无限精确度的设定更加实际、灵活且节约计算成本.
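以欧氏距离和箱式收缩集为例(此时式(33)的精确解有闭式投影表达), 下面的Python草图演示如何按式(35)验证一个候选点是否为$ \rho_t^x $-近似解; 其中子问题目标的具体形式与候选解的构造方式均为示例假设.

```python
import numpy as np

def subproblem_obj(x, g, x_center, alpha):
    # phi(x) = <x, g> + (1/alpha) * Psi(x, x_center) with the Euclidean Bregman.
    return x @ g + 0.5 * np.sum((x - x_center) ** 2) / alpha

def is_rho_approximate(x_cand, g, x_center, alpha, lo, hi, rho):
    """Check criterion (35): phi(x_cand) <= phi(x_exact) + rho on the box [lo, hi]."""
    x_exact = np.clip(x_center - alpha * g, lo, hi)   # closed-form minimizer
    return (subproblem_obj(x_cand, g, x_center, alpha)
            <= subproblem_obj(x_exact, g, x_center, alpha) + rho)

rng = np.random.default_rng(1)
g, x_c = rng.standard_normal(6), rng.uniform(-1, 1, 6)
x_exact = np.clip(x_c - 0.1 * g, -5.4, 5.4)
# Emulate an inexact oracle as in Experiment 4: perturb the exact solution.
rho = 1e-2
x_approx = np.clip(x_exact + rho * np.ones(6), -5.4, 5.4)
assert is_rho_approximate(x_approx, g, x_c, 0.1, -5.4, 5.4, rho)
```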
接下来, 分析基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法的期望动态鞍点遗憾. 首先, 对函数$ {\cal{R}}^x $和$ {\cal{R}}^y $做出如下假设.
假设 7. 函数$ {\cal{R}}^x $和$ {\cal{R}}^y $在$ {\boldsymbol{X}} $ 和$ {\boldsymbol{Y}} $上分别存在 $ G_{RX} $- 和 $ G_{RY} $-Lipschitz梯度, 即对任意$ {\boldsymbol{x}}_1, \;{\boldsymbol{x}}_2 \in {\boldsymbol{X}} $ 和 $ {\boldsymbol{y}}_1, \; {\boldsymbol{y}}_2 \in {\boldsymbol{Y}} $, 有
$$\left\{ \begin{split} &\| \nabla {\cal{R}}^x( {\boldsymbol{x}}_1)-\nabla{\cal{R}}^x( {\boldsymbol{x}}_2)\|_* \leq G_{RX} \| {\boldsymbol{x}}_1- {\boldsymbol{x}}_2 \| \;\\ &\| \nabla {\cal{R}}^y( {\boldsymbol{y}}_1)-\nabla{\cal{R}}^y( {\boldsymbol{y}}_2)\|_* \leq G_{RY} \| {\boldsymbol{y}}_1- {\boldsymbol{y}}_2 \| \end{split} \right.$$ (36)
注 4. 假设7通常用于分布式镜像下降优化中的收敛分析[8, 49-50]. 结合第1.1节中的例子, 可以验证欧氏距离、由$ {\cal{R}}^x({\boldsymbol{x}})=\frac{1}{2} {\boldsymbol{x}}^{\rm T} Q {\boldsymbol{x}} $生成的 Mahalanobis距离、带有非零下界的KL散度均满足假设7.
式(33)中的近似优化子程序在带来算法优势的同时, 也带来了一些优化误差. 引理6和引理7分别建立了一些近似误差界和类似于引理5的关键上界.
引理 6. 由基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法产生的序列 $ \{{\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}\} $ 和 $\{\breve{ {\boldsymbol{x}}}_{i, \;t},\; \breve{ {\boldsymbol{y}}}_{i, \;t} \},\;i\in {\cal{V}}, \;t\in[T] $满足下式:
$$\qquad\qquad \left\{ \begin{split} & \|\breve{ {\boldsymbol{x}}}_{i, \;t} - \breve{ {\boldsymbol{x}}}_{i, \;t}^*\| \leq \sqrt{ \frac{2\alpha_t \rho_t^x}{\varrho_x} } \qquad\;\;\;&&(37\rm a) \\ & \|\breve{ {\boldsymbol{y}}}_{i, \;t} - \breve{ {\boldsymbol{y}}}_{i, \;t}^*\| \leq \sqrt{ \frac{2\eta_t \rho_t^y}{\varrho_y} } \qquad\;\;\;&&(37\rm b) \\ &\| \breve{ {\boldsymbol{x}}}_{i, \;t}^* -{ {\boldsymbol{x}}}_{i, \;t}\| \leq \frac{\alpha_t} {\varrho_x}\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{x}\|_* \qquad\;\;\;&&(37\rm c) \\ & \| \breve{ {\boldsymbol{y}}}_{i, \;t}^* -{ {\boldsymbol{y}}}_{i, \;t}\| \leq \frac{\eta_t}{\varrho_y}\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{y}\|_* \qquad\;\;\;&&(37\rm d)\end{split} \right.$$
证明. 见附录D.
□
引理 7. 假定假设1 ~ 7成立. 则对于$ T \geq 2 $, 由基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法产生的序列 $ \{{\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}\}, \;i\in {\cal{V}}, \;t\in[T] $满足下式:
$$ \qquad\; \left\{ \begin{aligned} {\rm{E}} &\left({P-Reg}_d^x (T)\right)\leq \\ & \Lambda_{PR}^x(T)+ E_8 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^x}{ \alpha_t}}:=\Xi_{PR}^x(T) \quad&& (38\rm a)\\ {\rm{E}} &\left( {P-Reg}_d^y (T) \right)\leq \\ & \Lambda_{PR}^y (T)+ E_9 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^y}{ \eta_t}}:=\Xi_{PR}^y(T) \quad&& (38\rm b) \end{aligned}\right. $$
其中, $ E_8= \frac{4\sqrt{2} n R_1G_{RX}}{\sqrt{\varrho_x}} $, $ E_9=\frac{4\sqrt{2} n R_2G_{RY}}{\sqrt{\varrho_y}} $.
证明. 见附录E.
□
接下来, 给出算法的期望动态鞍点遗憾上界.
定理 2. 假定假设1 ~ 7成立. 则对于$ T \geq 2 $, 由基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法产生的决策序列$ \{{\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}\}, \; i\in[n] $ 满足下式:
$$ \begin{split} &{ESP-Regret}_d^j(T) \leq \max \left\{ \Xi_{PR}^x(T), \; \Xi_{PR}^y(T) \right\}+\\ &\qquad E_5+E_{6}k \sum\limits_{t=1}^{T-1} \frac{\alpha_t}{{\delta_t^x}} + E_{7}k\sum\limits_{t=1}^{T-1} \frac{\eta_t}{{\delta_t^y}} \;+ \\ &\qquad E_{10} \sum\limits_{t=1}^{T-1} \sqrt{\alpha_t \rho_t^{x}}+ E_{11} \sum\limits_{t=1}^{T-1} \sqrt{\eta_t \rho_t^{y}} \end{split} $$ (39)
其中, $ E_{10}= \frac{\sqrt{2}n^2 (n+2)L_X \Gamma}{ \sqrt{\varrho_x}(1-\sigma)}, \; E_{11}=\frac{\sqrt{2}n^2 (n+2)L_Y \Gamma}{ \sqrt{\varrho_y}(1-\sigma)}. $
证明. 类比于式(26)并结合引理7, 可得
$$ \begin{split} & {ESP-Regret}_d^j(T) \leq\\ &\quad \max \Bigg\{ \Lambda_{PR}^x(T)+ E_8 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^x}{ \alpha_t}}, \; \Lambda_{PR}^y(T) \;+ \\ &\quad E_9 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^y}{ \eta_t}} \Bigg\} +{\rm{E}} \Bigg( (n+2) \sum\limits_{t=1}^T \sum_{i=1}^n (L_X \| {\boldsymbol{x}}_{i, \;t}\;-\\ &\quad {\boldsymbol{x}}_{avg, \;t}\| +L_Y\|{\boldsymbol{y}}_{i, \;t}-{\boldsymbol{y}}_{avg, \;t}\| )\Bigg) \\[-1pt]\end{split} $$ (40)
接下来, 针对式(40)中不等号右边的第二项进行分析. 首先, 根据引理6, 可得
$$ \left\{ \begin{aligned} \|\breve{ {\boldsymbol{x}}}_{i, \;t} - {\boldsymbol{x}}_{i, \;t}\| & \leq \|\breve{ {\boldsymbol{x}}}_{i, \;t} - \breve{ {\boldsymbol{x}}}_{i, \;t}^*\|+ \| \breve{ {\boldsymbol{x}}}_{i, \;t}^* -{ {\boldsymbol{x}}}_{i, \;t}\|\leq \\ & \sqrt{\frac{2\alpha_t \rho_t^x}{\varrho_x}} +\frac{\alpha_t} {\varrho_x}\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{x}\|_* \; \\ \|\breve{ {\boldsymbol{y}}}_{i, \;t} - {\boldsymbol{y}}_{i, \;t}\| & \leq \|\breve{ {\boldsymbol{y}}}_{i, \;t} - \breve{ {\boldsymbol{y}}}_{i, \;t}^*\| + \| \breve{ {\boldsymbol{y}}}_{i, \;t}^* -{ {\boldsymbol{y}}}_{i, \;t}\|\leq \\ & \sqrt{\frac{2\eta_t \rho_t^y}{\varrho_y}}+ \frac{\eta_t}{\varrho_y}\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{y}\|_* \end{aligned} \right.$$ (41)
然后, 基于上述不等式, 在新定义$ {\boldsymbol{e}}_{i, \;t}^x=\breve{ {\boldsymbol{x}}}_{i, \;t}- {\boldsymbol{x}}_{i, \;t}, \; {\boldsymbol{e}}_{i, \;t}^y=\breve{ {\boldsymbol{y}}}_{i, \;t}- {\boldsymbol{y}}_{i, \;t} $下, 沿用引理4的证明过程, 可得
$$ \left\{ \begin{aligned} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n\| {\boldsymbol{x}}_{i, \;t} - {\boldsymbol{x}}_{avg, \;t}\| \leq D_1 +\frac{n \Gamma}{ 1-\sigma} \sum\limits_{t=1}^{T-1}\bigg( n \;\times \\ &\quad \quad \sqrt{\frac{2}{\varrho_x}}\sqrt{\alpha_t \rho_t^{x}}+ \frac{\alpha_t} {\varrho_x} \sum\limits_{i=1}^n \left\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{x}\right\|_*\bigg) \; \\ & \sum\limits_{t=1}^T \sum\limits_{i=1}^n\| {\boldsymbol{y}}_{i, \;t} - {\boldsymbol{y}}_{avg, \;t}\| \leq D_3 +\frac{n \Gamma}{ 1-\sigma} \sum\limits_{t=1}^{T-1}\bigg( n \;\times \\ &\quad \quad \sqrt{\frac{2}{\varrho_y}} \sqrt{\eta_t \rho_t^{y}}+ \frac{\eta_t} {\varrho_y} \sum\limits_{i=1}^n \left\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{y}\right\|_*\bigg) \end{aligned} \right. $$ (42)
最后, 通过结合式(40)、式(42)和引理2, 即可得到式(39).
□
定理2中的收敛结果表明, 基于单点梯度估计的近似计算分布式在线Bandit鞍点优化算法的期望动态鞍点遗憾上界, 不仅依赖于像定理1中的自由设置参数$ \alpha_t, \; \eta_t, \; \delta_t^x, \; \delta_t^y, \; \xi_t^x, \; \xi_t^y $, 还依赖于精确度$ \rho_t^x $和$ \rho_t^y $. 特别地, 当$ \rho_t^x=0 $和$ \rho_t^y=0 $满足时, 式(39)中的遗憾上界将退化为式(22)中的遗憾上界, 这充分体现和验证了近似优化子程序的灵活性. 接下来, 通过具体设置自由参数, 给出具体的遗憾上界.
推论 2. 假定定理2的假设和条件$ V_T = o(T) $成立. 在与推论1相同的参数$ {\delta_t^x}, \; {\delta_t^y}, \; \xi_t^x, \; $ $ \xi_t^y, \; \alpha_t, \; \eta_t $选择下, 取
$$ \rho_t^x= t^{-\omega_1}, \;\rho_t^y=t^{-\omega_2}, \; \ \omega_1> \gamma_1, \; \omega_2>\gamma_2 \; $$ (43)
则对于$ T \geq 2 $和$ j \in {\cal{V}} $, 有
$$ \begin{align} &{ESP-Regret}_d^j(T)\leq \\ &\left \{ \begin{split} &{\cal{O}} \left(\max \{kT^{\lambda_3}, \; kT^{\lambda_2} (1+V_T )\}\right), \; \\ &\quad \text{当} \ \omega_1 \in(\gamma_1, \; 3 \gamma_1- 1] \ \text{或}\ \omega_2 \in (\gamma_2, \; 3\gamma_2- 1] \;\\ &{\cal{O}} \left(\max \{kT^{\lambda_1}, \; kT^{\lambda_2} (1+V_T )\}\right), \; \\ &\quad \text{当} \ \omega_1> 3\gamma_1-1 \ \text{且}\ \omega_2>3\gamma_2-1 \end{split} \right. \end{align} $$ (44)
其中, $ \lambda_3 = \max \{\frac{3}{2}-\gamma_1, \; \frac{3}{2}-\gamma_2, \; 1 - \frac{\omega_1-\gamma_1}{2}, \; 1 - \frac{\omega_2-\gamma_2}{2}\} $. 特别地, 表1直观地展示了在不同步长和精确度下的遗憾界.
表 1 不同步长和精确度下的遗憾界
Table 1 The regret bounds under different step sizes and precisions

| $ \alpha_t $ | $ \eta_t $ | $ \rho_t^x $ | $ \rho_t^y $ | $ {ESP-Regret}_d^j(T) $ |
| --- | --- | --- | --- | --- |
| $ \dfrac{1}{\epsilon_1k} t^{-\frac{3}{4}} $ | $ \dfrac{1}{\epsilon_2 k} t^{-\frac{3}{4}} $ | $ t^{-1} $ | $ t^{-1} $ | $ {\cal{O}}(\max \{kT^{\frac{7}{8}}, \; kT^{\frac{3}{4}} V_T \}) $ |
| | | $ t^{-\frac{5}{4}} $ | $ t^{-\frac{5}{4}} $ | $ {\cal{O}}(kT^{\frac{3}{4}} (1+V_T)) $ |
| | | $ t^{-2} $ | $ t^{-2} $ | $ {\cal{O}}(kT^{\frac{3}{4}} (1+V_T)) $ |
| $ \dfrac{1}{\epsilon_1k} t^{-\varpi_1} $ $ \dagger $ | $ \dfrac{1}{\epsilon_2 k} t^{-\varpi_1} $ $ \dagger $ | $ t^{-1} $ | $ t^{-1} $ | $ {\cal{O}}(\max \{kT^{\frac{3}{4}}\sqrt{1+V_T}, \; kT^{\frac{7}{8}} (1+V_T)^{-\frac{1}{4}} \}) $ |
| | | $ t^{-\frac{5}{4}} $ | $ t^{-\frac{5}{4}} $ | $ {\cal{O}}(kT^{\frac{3}{4}} \sqrt{1+V_T}) $ |
| | | $ t^{-2} $ | $ t^{-2} $ | $ {\cal{O}}(kT^{\frac{3}{4}} \sqrt{1+V_T}) $ |

注: $ \dagger $表示$ \varpi_1=3/4-\log_T {\sqrt{1+V_T}} $.
证明. 使用式(43)中的参数设置, 式(39)中含有$ \rho_t^x $ 和$ \rho_t^y $的项满足
$$\left\{ \begin{split} &E_8 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^x}{ \alpha_t}} = E_8 \sqrt{\epsilon_1 k} \sum\limits_{t=1}^T t^{-\frac{\omega_1-\gamma_1}{2}} \; \\ &E_9 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^y}{ \eta_t}} = E_9 \sqrt{\epsilon_2 k} \sum\limits_{t=1}^T t^{-\frac{\omega_2-\gamma_2}{2}} \; \\ &E_{10} \sum\limits_{t=1}^{T-1} \sqrt{\alpha_t \rho_t^{x}} = E_{10}(\epsilon_1 k)^{-\frac{1}{2}} \sum\limits_{t=1}^T t^{-\frac{\gamma_1+\omega_1}{2}} \; \\ &E_{11} \sum\limits_{t=1}^{T-1} \sqrt{\eta_t \rho_t^{y}} = E_{11} (\epsilon_2 k)^{-\frac{1}{2}} \sum\limits_{t=1}^T t^{-\frac{\gamma_2+\omega_2}{2}} \end{split} \right.$$ (45)
基于不等式$ t^{-\frac{\omega_1-\gamma_1}{2}} > t^{-\frac{\gamma_1+\omega_1}{2}} $和$ t^{-\frac{\omega_2-\gamma_2}{2}} > t^{- \frac{\gamma_2+\omega_2}{2}} $, $ t \in [T] $, 并结合式(45)、式(32)和式(39)可得
$$ \begin{split} & {ESP-Regret}_d^j(T) \leq {\cal{O}} \bigg( kT^{\lambda_1} +kT^{\lambda_2}(1+V_T) \,+\\ &\quad \sum\limits_{t=1}^T t^{-\frac{\omega_1-\gamma_1}{2}} + \sum\limits_{t=1}^T t^{-\frac{\omega_2-\gamma_2}{2}}+ T^{\frac{3}{4}} \bigg) \end{split} $$ (46)
然后, 基于式(46), 分析$ \omega_1 $和$ \gamma_1 $之间以及$ \omega_2 $和$ \gamma_2 $之间的数值关系, 即可得式(44).
□
注 5. 由推论2可知, 近似优化子程序的引入在有效增强算法的实用性、灵活性并节约计算成本的同时, 也影响了所提算法的收敛性能, 即遗憾上界的阶次. 这进一步表明, 在算法的遗憾界和由精确度$ \rho_t^x, \; \rho_t^y $引起的近似误差之间存在权衡. 根据式(44), 当$ \omega_1 \in (\gamma_1, \; 3\gamma_1-1) $和$ \omega_2 \in (\gamma_2, \; 3\gamma_2-1) $时, 较低的精确度可能会使得遗憾界的阶次比不使用近似技术时差, 例如表1中$ \rho_t^x= \rho_t^y=t^{-1} $下的情况; 当$ \omega_1 > 3\gamma_1-1 $和$ \omega_2 > 3\gamma_2-1 $ 时, 遗憾界的最大阶数将不受影响.
4. 仿真实验
考虑式(47)描述的多智能体网络上的目标跟踪问题作为仿真案例来验证所提算法的有效性和先进性. 在时刻$ t $, 智能体$ i\in[n] $做出一个动作$ {\boldsymbol{x}}_{i, \;t} $去调整自己的位置以接近目标, 同时使用尽可能少的移动成本$ \langle {\boldsymbol{\pi}}_{i, \;t}, \; {\boldsymbol{x}}_{i, \;t} \rangle $, 其中$ {\boldsymbol{\pi}}_{i, \;t}\in \bf{R}^d $ 是一个价格向量[48]. 假定目标动作的观测值$ \hat{{\boldsymbol{\varphi}}}_{i, \;t} $受到一个有界扰动$ {\boldsymbol{y}}_{i, \;t}\;\in\; {\boldsymbol{Y}} $ 的恶意破坏. 为保证决策序列$ \{ {\boldsymbol{x}}_{i, \;t}\}, \; t\in [T] $的鲁棒性和有效性, 每个智能体生成在最坏情况[51-52]下的最优决策, 即求解下面的分布式鲁棒目标跟踪问题, 以协作追踪一个自主移动目标.
$$ \begin{split} &\min\limits_{{\boldsymbol{x}}_t } \max\limits_{ {\boldsymbol{y}}_t}\, \;\, \;\sum\limits_{t=1}^T \sum\limits_{i=1}^n f_{i, \;t} ( {\boldsymbol{x}}_t, \; {\boldsymbol{y}}_t) \\ & \ \text{s.t.}\;\; {\boldsymbol{x}}_t \in {\boldsymbol{X}}, \; {\boldsymbol{y}}_t \in {\boldsymbol{Y}} \end{split} $$ (47)
其中, $ f_{i, \,t} ({\boldsymbol{x}}_t, \, {\boldsymbol{y}}_t) = c_{i, \;1} \langle {\boldsymbol{\pi}}_{i, \;t}, \; {\boldsymbol{x}}_t \rangle +c_{i, \;2} \| {\boldsymbol{x}}_t - (\hat{{\boldsymbol{\varphi}}}_{i, \;t}+ {\boldsymbol{y}}_t) \|_2^2 + \lambda_{i, \;1} \| {\boldsymbol{x}}_t \|_1 -\lambda_{i, \;2} \| {\boldsymbol{y}}_t \|_2^2, $ $ {\boldsymbol{X}}:=\{ {\boldsymbol{x}} | \ [{\boldsymbol{x}}]_i \in [-6, 6], \; i\in [d] \} $, $ {\boldsymbol{Y}}:=\{ {\boldsymbol{y}} | \ [{\boldsymbol{y}}]_i \in [-0.3, \;0.3], \; i\in [d] \} $, $ c_{i, \;1}, c_{i, \;2} $ 是两个非负权衡系数, $ \lambda_{i, \;1}, \; \lambda_{i, \;2} $ 是两个非负正则化系数.
在下面的仿真中, 设置价格向量$\; {\boldsymbol{\pi}}_{i, \;t} $ 的元素为$ [{\boldsymbol{\pi}}_{i, \;t}]_l=\text{sgn}([{\boldsymbol{x}}_{i, \;t}]_l)\times [(1-\frac{a_1}{\sqrt{t}}) {\boldsymbol{\pi}}_i^1 + \frac{a_1}{\sqrt{t}} {\boldsymbol{\pi}}_{i, \;t}^2]_l $, $ l\in [d], a_1 \in [0, \;1) $. 向量$ {\boldsymbol{\pi}}_i^1 $和$ {\boldsymbol{\pi}}_{i, \;t}^2 $分别随机均匀地产生且其元素分布在$ [2, \; 4] $和$ [0, \; 3] $上, 其中这两个向量是独立的且后者随时间 $ t $ 变化. 令 $ \hat{{\boldsymbol{\varphi}}}_{i, \;t}= {\boldsymbol{\varphi}}_{t} + {\boldsymbol{\varepsilon}}_{i, \;t}/{\sqrt{t}} $, 其中目标动作$ {\boldsymbol{\varphi}}_{t+1}= P_t {\boldsymbol{\varphi}}_{t} $, $ \| P_t\|_2 \leq 1 $, $ {\boldsymbol{\varepsilon}}_{i, \;t} $是智能体$ i $随机均匀产生的观察噪声且其元素分布在$ [-0.1, \;0.1]$上, 向量$ {\boldsymbol{\varphi}}_{1} $随机产生于$ {\boldsymbol{X}} $. 设置 $ n=10 $, $ d=6 $, $ c_{i, \;1}=0.1 $, $c_{i, \;2}=0.3 $, $ \lambda_{i, \;1}=0.1 $, $ \lambda_{i, \;2}=0.5\, $和$ {\cal{R}}^x({\boldsymbol{x}})= {\cal{R}}^y({\boldsymbol{x}})=\frac{1}{2} \| {\boldsymbol{x}} \|_2^2 $. 根据文献[53]可知当$ \lambda_{i, \;2}>c_{i, \;2} $时, 式(47) 的损失函数是强凸强凹的. 为有效验证所提算法的有效性, 定义并使用$ M_j(T):= {ESP-Regret}_d^j (T)/T $的全局平均、上包络线和下包络线作为衡量标准, 即$ \frac{1}{n}\sum_{j=1}^n M_j(T) $, $ \sup_j\{ M_j(T) \} $, $ \inf_j \{ M_j(T)\} $.
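在复现上述仿真设置时, 局部损失函数$ f_{i,t} $可按下面的Python草图实现(参数取自上文设置; 价格向量与观测值的具体生成细节从略, 此处以随机示例代替):

```python
import numpy as np

def f_it(x, y, pi_it, phi_hat_it, c1=0.1, c2=0.3, lam1=0.1, lam2=0.5):
    """Local loss in (47): convex in x, strongly concave in y when lam2 > c2."""
    tracking = np.sum((x - (phi_hat_it + y)) ** 2)   # ||x - (phi_hat + y)||_2^2
    return (c1 * pi_it @ x + c2 * tracking
            + lam1 * np.sum(np.abs(x))               # l1 regularizer on x
            - lam2 * np.sum(y ** 2))                 # -lam2 * ||y||_2^2

rng = np.random.default_rng(2)
d = 6
x = rng.uniform(-6, 6, d)          # x in X = [-6, 6]^d
y = rng.uniform(-0.3, 0.3, d)      # y in Y = [-0.3, 0.3]^d
pi = rng.uniform(2, 4, d)          # example price vector
phi_hat = rng.uniform(-6, 6, d)    # example observed target action
print(f_it(x, y, pi, phi_hat))     # a scalar loss value
```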
首先, 实验一探究了$ B_t = C_t =I_d $情形下算法1的收敛性. 根据推论1中的步长设置, 选择$ \alpha_t, \; \eta_t $ 为 $ 1/(17t^{0.55}) $, $ 1/(15t^{0.65}) $并运行算法1. 通过观察图1中$ M_j(T) $的全局平均、上包络线和下包络线可知, 算法1是收敛的. 接着, 实验二分析了预测映射技术对算法1收敛性能的影响, 并与无预测映射($ B_t = C_t =I_d $)的情况相对比. 从图2可知, 相比于单位映射, 恰当的预测矩阵能明显地增强算法1的收敛性能.
其次, 实验三对比研究了距离生成函数为$ {\cal{R}}^x({\boldsymbol{x}})= {\cal{R}}^y({\boldsymbol{x}})=\frac{1}{2} \| {\boldsymbol{x}} \|_2^2 $和 $ {\cal{R}}^x({\boldsymbol{x}})={\cal{R}}^y({\boldsymbol{x}})=\frac{1}{2} \| {\boldsymbol{x}} \|_p^2 $时算法1的收敛性能. 设置 $ B_t = C_t =I_d $, $ \lambda_{i, \;1}=0.3 $和$ p=1.9 $. 图3展示了算法1在欧氏距离和$ p $范数的情况下对于三种维度$ d=9, \; d= 15, \; d=21 $的全局平均期望动态鞍点遗憾曲线. 仿真结果表明使用$ p $范数的算法1比使用欧氏范数实现了更好的最优性. 此外, 对于小的问题维度$ d $, 两种距离测量函数下的算法都表现出了更好的收敛性能, 且随着$ d $的明显增大, 对应的收敛性能将变差.
接下来, 实验四探究了算法1的近似计算变种算法的收敛性能, 并研究精确度序列$ \{\rho_t^x\}_{t=1}^T, \; \{\rho_t^y\}_{t=1}^T $的设置对收敛性能的影响. 设置$ B_t = P_t $, $ C_t =I_d $, 以及与实验一相同的步长. 对$ \tilde{ {\boldsymbol{x}}}_{i, \;t} $和$\; \tilde{ {\boldsymbol{y}}}_{i, \;t} $分别增加噪声, 即$ \breve{ {\boldsymbol{x}}}_{i, \;t}=\tilde{ {\boldsymbol{x}}}_{i, \;t} +\rho_t^x {\boldsymbol{1}} $ 和 $ \breve{ {\boldsymbol{y}}}_{i, \;t}=\tilde{ {\boldsymbol{y}}}_{i, \;t} +\rho_t^y {\boldsymbol{1}} $, 以求解式(33)中的近似优化子程序[8]. 在这个仿真实验中, 考虑$ \rho_t^x $ 和$ \rho_t^y $ 的四种不同的时变设置$ \rho_t^x=\rho_t^y= \rho_t=0 $, $ \frac{1}{t^{0.8}} $, $ \frac{1}{t} $, $ \frac{1}{t^{1.3}}, \; t\in [T] $和两种固定设置$ \frac{1}{2T} $, $ 0.5 $, 其中第一种固定设置依赖于$ T $的先验知识. 图4的仿真结果表明: 1) 精确度递减系数$\; \omega_1 $和$ \omega_2 $越大, 算法的最优性就越好, 特别是对于无误差情况($ \rho_t =0 $); 2) 在固定精确度0.5下, 算法不收敛, 但对于设置$ \frac{1}{2T} $, 算法是收敛的.
最后, 围绕算法1和它的集中式版本(基于单点估计的集中式在线Bandit算法), 实验五对其收敛性能开展对比性研究. 不失一般性, 假定在所有智能体中存在一个中心节点能够收集其他智能体的局部损失函数, 并在每个时刻$ t $优化目标函数$ \sum_{i=1}^n f_i({\boldsymbol{x}}, \; {\boldsymbol{y}}) $. 图5展示出两个算法有着相似的收敛趋势, 这一现象对应于注2中的理论分析. 此外, 基于单点估计的集中式在线Bandit算法实现了比算法1更好的收敛性能. 从分布式算法中智能体之间存在额外协调误差的角度看, 这是可解释的.
5. 结束语
对于一类多智能体时变网络上的分布式在线鞍点问题, 本文研究损失函数具体信息不可用的情形, 即Bandit反馈. 通过引入单点梯度估计器并结合预测映射技术, 开发一种高效的非欧几里得意义上的分布式在线Bandit镜面下降鞍点优化算法. 对于一般的凸凹目标函数, 证明了所提算法在某些预设条件下可实现次线性动态遗憾. 值得注意的是, 所提算法的每个梯度估计仅需一个函数值且算法参数消除了对总迭代时间$ T $先验知识的依赖, 这极大地增强了算法的实用性. 此外, 考虑到在迭代优化中计算优化子程序的精确解通常较为困难且计算成本昂贵, 本文进一步设计一种基于近似计算的算法变种, 并严格分析精确度设置对算法遗憾上界的影响. 最后, 以目标跟踪问题作为仿真案例, 对所开发算法的有效性和先进性进行有效的验证. 未来, 放宽凸凹条件将是一个有趣但富有挑战的研究方向.
附录 A. 引理3的证明
证明. 由一阶最优性条件可得: 对任意$ {\boldsymbol{z}} \in {\boldsymbol{X}} $, $ \langle\nabla f ({\boldsymbol{z}}^*), \; {\boldsymbol{z}} - {\boldsymbol{z}}^*\rangle\geq0 $成立, 其中$ {\boldsymbol{z}}^* = \arg\min_{{\boldsymbol{z}} \in {\boldsymbol{X}}} f({\boldsymbol{z}}) $. 故根据$ \tilde{ {\boldsymbol{x}}}_{i, \;t} $的定义可得对于任意$ {\boldsymbol{x}} \in (1-\xi_t^x) {\boldsymbol{X}} $, 有
$$ \begin{aligned} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} + \dfrac{1}{\alpha_t} \nabla \Psi_{{\cal{R}}}^x (\tilde{ {\boldsymbol{x}}}_{i, \;t}, \; {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \tilde{ {\boldsymbol{x}}}_{i, \;t} \rangle \geq0 \end{aligned} $$ (A1)
由于$ \nabla \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t})= \nabla {\cal{R}}^x ({\boldsymbol{x}}) - \nabla {\cal{R}}^x({\boldsymbol{x}}_{i, \;t}) $, 式(A1)可进一步转化为
$$ \begin{aligned} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^{x}, \; \tilde{ {\boldsymbol{x}}}_{i, \;t} - {\boldsymbol{x}} \rangle \leq\\ &\quad \frac{1}{\alpha_t}\langle \nabla {\cal{R}}^x ( \tilde{ {\boldsymbol{x}}}_{i, \;t} ) - \nabla {\cal{R}}^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \tilde{ {\boldsymbol{x}}}_{i, \;t} \rangle =\\ &\quad \frac{1}{\alpha_t}\left( \Psi_{{\cal{R}}}^x( {\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t}) -\Psi_{{\cal{R}}}^x( {\boldsymbol{x}}, \; \tilde{ {\boldsymbol{x}}}_{i, \;t})- \Psi_{{\cal{R}}}^x(\tilde{ {\boldsymbol{x}}}_{i, \;t}, \; {\boldsymbol{x}}_{i, \;t}) \right) \end{aligned} $$ (A2)
其中, 式(A2)中的等式使用了$ \langle \nabla {\cal{R}}^x ({\boldsymbol{x}})- \nabla {\cal{R}}^x ({\boldsymbol{z}}), \; {\boldsymbol{y}}- {\boldsymbol{z}} \rangle= \Psi_{{\cal{R}}}^x({\boldsymbol{y}}, \; {\boldsymbol{z}})+ \Psi_{{\cal{R}}}^x({\boldsymbol{z}}, \; {\boldsymbol{x}})-\Psi_{{\cal{R}}}^x({\boldsymbol{y}}, \; {\boldsymbol{x}}) $.
由算法1及参数$ \xi_{t}^x $的非增长性可知, 当$ t=1 $时, $ {\boldsymbol{x}}_{i, \;1} \in (1-\xi_1^x) {\boldsymbol{X}} $; 当$ t\geq 2 $时, $ {\boldsymbol{x}}_{i, \;t} \in (1-\xi_{t-1}^x) {\boldsymbol{X}} \subseteq (1-\xi_{t}^x) {\boldsymbol{X}} $. 故可得 $ {\boldsymbol{x}}_{i, \;t} \in (1-\xi_t^x) {\boldsymbol{X}} $. 进一步, 将$ {\boldsymbol{x}}= {\boldsymbol{x}}_{i, \;t} $代入式(A2)并结合式(10), $ \alpha_t>0 $, 有
$$ \begin{split} \varrho_x \| {\boldsymbol{x}}_{i, \;t}- \tilde{ {\boldsymbol{x}}}_{i, \;t} \|^2 \leq\;& \langle\alpha_t \tilde{{\boldsymbol{g}}}_{i, \;t}^{x}, \; {\boldsymbol{x}}_{i, \;t} -\tilde{ {\boldsymbol{x}}}_{i, \;t} \rangle\leq \\ & \alpha_t\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{x}\|_* \| {\boldsymbol{x}}_{i, \;t} -\tilde{ {\boldsymbol{x}}}_{i, \;t} \| \end{split} $$ (A3)
因此, 通过简化上式可得式(19a).
类似于式(19a)的证明, 对任意$ {\boldsymbol{y}} \in (1 - \xi_t^y) {\boldsymbol{Y}} $, 有
$$ \begin{aligned} &\langle \eta_t \tilde{{\boldsymbol{g}}}_{i, \;t}^{y}, \; {\boldsymbol{y}} -\tilde{ {\boldsymbol{y}}}_{i, \;t} \rangle\leq \\ &\quad \langle \nabla {\cal{R}}^y ( \tilde{ {\boldsymbol{y}}}_{i, \;t} ) - \nabla {\cal{R}}^y( {\boldsymbol{y}}_{i, \;t}), \; {\boldsymbol{y}}- \tilde{ {\boldsymbol{y}}}_{i, \;t} \rangle\leq \\ &\quad \Psi_{{\cal{R}}}^y( {\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) -\Psi_{{\cal{R}}}^y( {\boldsymbol{y}}, \; \tilde{ {\boldsymbol{y}}}_{i, \;t})- \Psi_{{\cal{R}}}^y(\tilde{ {\boldsymbol{y}}}_{i, \;t}, \; {\boldsymbol{y}}_{i, \;t}) \end{aligned} $$ (A4)
同理, 设$ {\boldsymbol{y}}= {\boldsymbol{y}}_{i, \;t} \in (1-\xi_t^y) {\boldsymbol{Y}} $并结合式(A4)、式(10), 得
$$ \begin{split} \varrho_y \| {\boldsymbol{y}}_{i, \;t}- \tilde{ {\boldsymbol{y}}}_{i, \;t} \|^2\leq\; & \langle\eta_t \tilde{{\boldsymbol{g}}}_{i, \;t}^{y}, \; \tilde{ {\boldsymbol{y}}}_{i, \;t} - {\boldsymbol{y}}_{i, \;t} \rangle \leq\\ & \eta_t\| \tilde{ {\boldsymbol{g}}}_{i, \;t}^{y}\|_* \|\tilde{ {\boldsymbol{y}}}_{i, \;t} - {\boldsymbol{y}}_{i, \;t} \| \end{split} $$ (A5) Simplifying the above inequality as before yields (19b). □
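The update analyzed in this appendix is a Bregman-proximal (mirror descent) step of the form $ \tilde{{\boldsymbol{x}}}_{i,t} = \arg\min_{{\boldsymbol{x}}} \langle \tilde{{\boldsymbol{g}}}_{i,t}^x, \; {\boldsymbol{x}} \rangle + \frac{1}{\alpha_t}\Psi_{{\cal{R}}}^x({\boldsymbol{x}}, \; {\boldsymbol{x}}_{i,t}) $. As a purely illustrative sketch (not Algorithm 1 itself), the following Python code implements this step for the KL-divergence regularizer on the probability simplex, where the minimizer has a closed-form exponentiated-gradient expression; the function names and the toy data are our own choices.

```python
import numpy as np

def kl_mirror_step(x_prev, g, alpha):
    """One Bregman-proximal step
        x_new = argmin_{x in simplex} <g, x> + (1/alpha) * KL(x, x_prev),
    which for the KL divergence has the closed-form
    exponentiated-gradient solution (illustrative sketch only)."""
    w = x_prev * np.exp(-alpha * g)   # multiplicative-weights update
    return w / w.sum()                # normalization = projection onto the simplex

# Tiny usage example with an arbitrary stand-in gradient estimate.
rng = np.random.default_rng(0)
x = np.ones(5) / 5                    # start at the simplex center
g_tilde = rng.normal(size=5)          # stands in for a one-point estimate
x_next = kl_mirror_step(x, g_tilde, alpha=0.1)
print(x_next, x_next.sum())           # remains a probability vector
```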
Appendix B. Proof of Lemma 4
Proof. For convenience of analysis, denote the errors between $ \tilde{ {\boldsymbol{x}}}_{i, \;t} $ and $ {\boldsymbol{x}}_{i, \;t} $ ($ \tilde{ {\boldsymbol{y}}}_{i, \;t} $ and $ {\boldsymbol{y}}_{i, \;t} $, respectively) and their averages by
$$ \begin{split} \left\{\begin{aligned} {\boldsymbol{e}}_{i, \;t}^x=\tilde{ {\boldsymbol{x}}}_{i, \;t}- {\boldsymbol{x}}_{i, \;t}, \;& {\boldsymbol{e}}_{avg, \;t}^x=\frac{1}{n} \sum\limits_{i=1}^n {\boldsymbol{e}}_{i, \;t}^x \; \\ {\boldsymbol{e}}_{i, \;t}^y=\tilde{ {\boldsymbol{y}}}_{i, \;t}- {\boldsymbol{y}}_{i, \;t}, \;& {\boldsymbol{e}}_{avg, \;t}^y=\frac{1}{n} \sum\limits_{i=1}^n {\boldsymbol{e}}_{i, \;t}^y \end{aligned}\right. \end{split} $$ (B1) Write $ {\boldsymbol{x}}[{\boldsymbol{y}}] $ for $ {\boldsymbol{x}} $ or, within brackets, $ {\boldsymbol{y}} $; for example, $ {\boldsymbol{x}}[{\boldsymbol{y}}]_{i, \;t+1} = \sum_{j=1}^n[A_{t}]_{ij} B[C]_t \tilde{ {\boldsymbol{x}}}[\tilde{ {\boldsymbol{y}}}]_{j, \;t}$ denotes both $ {\boldsymbol{x}}_{i, \;t+1} = \sum_{j=1}^n [A_{t}]_{ij} B_t \tilde{ {\boldsymbol{x}}}_{j, \;t} $ and $ {\boldsymbol{y}}_{i, \;t+1} = \sum_{j=1}^n [A_{t}]_{ij} C_t \tilde{ {\boldsymbol{y}}}_{j, \;t} $. Define
$$\left\{ \begin{split} &{\boldsymbol{x}}[ {\boldsymbol{y}}]_t:= [ {\boldsymbol{x}}[ {\boldsymbol{y}}]_{1, \;t}^{\rm T}, \; {\boldsymbol{x}}[ {\boldsymbol{y}}]_{2, \;t}^{\rm T}, \;\cdots, \; {\boldsymbol{x}}[ {\boldsymbol{y}}]_{n, \;t}^{\rm T}]^{\rm T} \; \\ &{\boldsymbol{e}}_t^{x[y]}:=[{ {\boldsymbol{e}}_{1, \;t}^{x[y]}}^{\rm T}, \; { {\boldsymbol{e}}_{2, \;t}^{x[y]}}^{\rm T}, \; \cdots, \; { {\boldsymbol{e}}_{n, \;t}^{x[y]}}^{\rm T}]^{\rm T} \; \\ &\Pi_{B[C]}(t, \;s):=B[C]_t B[C]_{t-1}\cdots B[C]_s, \\ &\quad\forall t\geq s\geq 1 \end{split}\right.$$ (B2) By Algorithm 1, we obtain
$$ \begin{split} {\boldsymbol{x}}[ {\boldsymbol{y}}]_{t+1} =\;& (A_t \otimes B[C]_t) \tilde{ {\boldsymbol{x}}}[\tilde{ {\boldsymbol{y}}}]_{t} =\\ & (A_t \otimes B[C]_t) {\boldsymbol{x}}[ {\boldsymbol{y}}]_{t} + (A_t \otimes B[C]_t) {\boldsymbol{e}}_t^{x[y]}= \\ & (\Phi(t, \; t-1) \otimes \Pi_{B[C]}(t, \;t-1)) {\boldsymbol{x}}[ {\boldsymbol{y}}]_{t-1}\;+ \\ & ( \Phi(t, \; t-1) \otimes \Pi_{B[C]}(t, \;t-1)) {\boldsymbol{e}}_{t-1}^{x[y]} \;+\\ & (A_t \otimes B[C]_t) {\boldsymbol{e}}_t^{x[y]} = \\ &(\Phi(t, \; 1) \otimes \Pi_{B[C]}(t, \;1)) {\boldsymbol{x}}[ {\boldsymbol{y}}]_{1}\;+ \\ & \sum\limits_{\tau=1}^t \left( \Phi(t, \; \tau) \otimes \Pi_{B[C]}(t, \;\tau)\right) {\boldsymbol{e}}_{\tau}^{x[y]}\\[-1pt] \end{split} $$ (B3) Further, (B3) is equivalent to
$$ \begin{split} & {\boldsymbol{x}}[ {\boldsymbol{y}}]_{i, \;t+1}= \sum\limits_{j=1}^n [\Phi(t, \;1)]_{ij} \Pi_{B[C]}(t, \;1) {\boldsymbol{x}}[ {\boldsymbol{y}}]_{j, \;1} \;+\\ & \qquad \sum\limits_{\tau=1}^t \sum\limits_{j=1}^n [\Phi(t, \; \tau)]_{ij} \Pi_{B[C]}(t, \;\tau) {\boldsymbol{e}}_{j, \;\tau}^{x[y]} \end{split} $$ (B4) Using the definition of $ {\boldsymbol{x}}[{\boldsymbol{y}}]_{avg, \;t} $ together with (B4), we can establish
$$ \begin{split} {\boldsymbol{x}}[ {\boldsymbol{y}}]_{avg, \;t+1} =\;&\Pi_{B[C]}(t, \;1) {\boldsymbol{x}}[ {\boldsymbol{y}}]_{avg, \;1}\;+ \\ & \sum\limits_{\tau=1}^t \Pi_{B[C]}(t, \;\tau) {\boldsymbol{e}}_{avg, \;\tau}^{x[y]} \end{split} $$ (B5) where (B5) uses the double stochasticity of $ \Phi(t, \; \tau), 1\leq \tau\leq t $ [25].
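To make the averaging step explicit: multiplying (B4) by $ \frac{1}{n}\sum_{i=1}^n $ and using the fact that every column of the doubly stochastic $ \Phi(t, \;\tau) $ sums to one, for any vectors $ \{{\boldsymbol{v}}_j\}_{j=1}^n $,
$$ \frac{1}{n}\sum\limits_{i=1}^n \sum\limits_{j=1}^n [\Phi(t, \;\tau)]_{ij}\, {\boldsymbol{v}}_j = \frac{1}{n}\sum\limits_{j=1}^n \left( \sum\limits_{i=1}^n [\Phi(t, \;\tau)]_{ij} \right) {\boldsymbol{v}}_j = \frac{1}{n}\sum\limits_{j=1}^n {\boldsymbol{v}}_j $$
Applying this with $ {\boldsymbol{v}}_j = \Pi_{B[C]}(t, \;1) {\boldsymbol{x}}[{\boldsymbol{y}}]_{j, \;1} $ and $ {\boldsymbol{v}}_j = \Pi_{B[C]}(t, \;\tau) {\boldsymbol{e}}_{j, \;\tau}^{x[y]} $ yields (B5).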
Combining (B5) and (B4), we obtain
$$ \begin{split} \| {\boldsymbol{x}}&[ {\boldsymbol{y}}]_{i, \;t+1} - {\boldsymbol{x}}[ {\boldsymbol{y}}]_{avg, \;t+1}\| \leq\\ & \sum\limits_{j=1}^n \left| [\Phi (t, \;1)]_{ij}- \frac{1}{n} \right| \| \Pi_{B[C]}(t, \;1)\| \left\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{j, \;1}\right\|+ \\ & \sum\limits_{\tau=1}^t \sum\limits_{j=1}^n \left| [\Phi (t, \;\tau)]_{ij}- \frac{1}{n} \right| \|\Pi_{B[C]}(t, \;\tau)\| \left\| {\boldsymbol{e}}_{j, \;\tau}^{x[y]} \right\| \leq\\ & \Gamma \sigma^{t-1} \sum\limits_{j=1}^n \left\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{j, \;1}\right\| + \Gamma \sum\limits_{\tau=1}^t \sigma^{t-\tau} \sum\limits_{j=1}^n \left\| {\boldsymbol{e}}_{j, \;\tau}^{x[y]} \right\| \end{split} $$ (B6) where the second inequality in (B6) uses Lemma 1 and Assumption 6, i.e.,
$$ \begin{split} \| &\Pi_{B[C]}(t, \;\tau)\| \leq \| B[C]_t\| \|B[C]_{t-1}\| \cdots \|B[C]_{\tau}\|\leq 1, \\ & \quad 1\leq \tau \leq t \end{split} $$ (B7) Summing (B6) over $ i $ from 1 to $ n $ and $ t $ from 1 to $ T $, we have
$$ \begin{split} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{i, \;t} - {\boldsymbol{x}}[ {\boldsymbol{y}}]_{avg, \;t}\| \leq \sum\limits_{i=1}^n\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{i, \;1}\; -\\ &\quad {\boldsymbol{x}}[ {\boldsymbol{y}}]_{avg, \;1}\| + n\Gamma \sum\limits_{t=1}^{T-1} \sigma^{t-1} \sum\limits_{j=1}^n \left\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{j, \;1}\right\| + \\ & \quad n\Gamma \sum\limits_{t=1}^{T-1} \sum\limits_{\tau=1}^t \sigma^{t-\tau} \sum\limits_{j=1}^n \left\| {\boldsymbol{e}}_{j, \;\tau}^{x[y]} \right\| \end{split} $$ (B8) Next, the terms on the right-hand side of (B8) are bounded as follows:
$$ \begin{aligned}[b] &n\Gamma \sum\limits_{t=1}^{T-1} \sigma^{t-1} \sum\limits_{j=1}^n \left\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{j, \, 1}\right\| \leq\\ &\qquad\frac{ n \Gamma}{1-\sigma} \sum\limits_{j=1}^n \left\| {\boldsymbol{x}}[ {\boldsymbol{y}}]_{j, \, 1}\right\| \end{aligned} $$ (B9) $$ \begin{split} & n\Gamma \sum\limits_{t=1}^{T-1} \sum\limits_{\tau=1}^t \sigma^{t-\tau} \sum\limits_{j=1}^n \left\| {\boldsymbol{e}}_{j, \;\tau}^{x[y]} \right\|\leq \\ &\quad n \Gamma \left( \sum\limits_{\tau=1}^{T-1} \sigma^{\tau-1} \right) \sum\limits_{t=1}^{T-1} \sum\limits_{j=1}^n \left\| {\boldsymbol{e}}_{j,\; t}^{x[y]} \right\| \leq\\ &\quad \frac{n \Gamma}{ 1-\sigma} \sum\limits_{t=1}^{T-1} \sum\limits_{i=1}^n \left\| \tilde{ {\boldsymbol{x}}}[ \tilde{ {\boldsymbol{y}}}]_{i, \;t}- {\boldsymbol{x}}[ {\boldsymbol{y}}]_{i, \;t} \right\|\leq \\ &\quad \frac{n^2 d[m]F\bar{p}_* \Gamma}{ \varrho_{x[y]} (1-\sigma)} \sum\limits_{t=1}^{T-1} \frac{\alpha[\eta]_t} {{\delta_t^{x[y]}}} \end{split} $$ (B10) where the last inequality in (B10) combines Lemma 3 and Lemma 2.
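The geometric-series step in (B10) is a standard exchange of the order of summation: for any nonnegative scalars $ a_\tau $ and $ \sigma \in (0, \;1) $,
$$ \sum\limits_{t=1}^{T-1} \sum\limits_{\tau=1}^t \sigma^{t-\tau} a_\tau = \sum\limits_{\tau=1}^{T-1} a_\tau \sum\limits_{t=\tau}^{T-1} \sigma^{t-\tau} \leq \frac{1}{1-\sigma} \sum\limits_{\tau=1}^{T-1} a_\tau $$
applied here with $ a_\tau = \sum_{j=1}^n \| {\boldsymbol{e}}_{j, \;\tau}^{x[y]} \| $.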
Finally, substituting (B9) and (B10) into (B8) yields (20a) and (20b). □
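As a purely illustrative check of the disagreement decay behind (B6), the following sketch runs the perturbation-free consensus recursion $ {\boldsymbol{x}}_{t+1} = (A_t \otimes I)\, {\boldsymbol{x}}_t $ (taking $ B_t = I $, a simplifying assumption of ours) with randomly generated doubly stochastic $ A_t $ and prints the decay of $ \max_i \| {\boldsymbol{x}}_{i,t} - {\boldsymbol{x}}_{avg,t} \| $; the construction of $ A_t $ is our own choice, not part of the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_doubly_stochastic(n, iters=50):
    """Sinkhorn-style normalization of a positive matrix; the result is
    (approximately) doubly stochastic, which suffices for illustration."""
    A = rng.random((n, n)) + 0.1
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)   # normalize rows to sum to 1
        A /= A.sum(axis=0, keepdims=True)   # normalize columns to sum to 1
    return A

n, d, T = 8, 3, 30
x = rng.normal(size=(n, d))                 # initial local states x_{i,1}
for t in range(1, T + 1):
    x = random_doubly_stochastic(n) @ x     # x_{i,t+1} = sum_j [A_t]_{ij} x_{j,t}
    gap = np.abs(x - x.mean(axis=0)).max()  # max_i ||x_{i,t} - x_avg,t||_inf
    if t % 5 == 0:
        print(f"t = {t:2d}, disagreement = {gap:.2e}")  # geometric decay, as in (B6)
```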
Appendix C. Proof of Lemma 5
Proof. By Assumption 2, Assumption 3 and the definition of $ h_{i, \;t}^x $, we can obtain
$$ \begin{split} & f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}) - f_{i, \;t} ({\boldsymbol{x}}_t^*, \;{\boldsymbol{y}}_{i, \;t})\leq \\ &\quad h_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t}) - h_{i, \;t}^x ((1-\xi_t^x){\boldsymbol{x}}_t^*) + \bar{p}L_XR_1\xi_t^x \leq\\ &\quad \hat{h}_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t}) - \hat{h}_{i, \;t}^x ((1-\xi_t^x){\boldsymbol{x}}_t^*) + \bar{p}L_XR_1\xi_t^x \;+ \\ &\quad 2L_X \bar{p} {\delta_t^x} \leq \langle \nabla \hat{h}_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle\; + \\ & \quad \bar{p}L_XR_1\xi_t^x + 2L_X \bar{p} {\delta_t^x}= \langle \nabla \hat{h}_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t})-\tilde{{\boldsymbol{g}}}_{i, \;t}^x, \\ & \quad {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle + 2L_X \bar{p} {\delta_t^x} \;+\\ &\quad \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle+ \bar{p}L_XR_1\xi_t^x \\[-1pt]\end{split} $$ (C1) Next, the two inner-product terms on the right-hand side of (C1) are analyzed one by one. On the one hand, by property 1) of Lemma 2 and the fact that $ {\boldsymbol{u}}_{i, \;t}^x,\; \forall i\in [n], t\in [T] $, are independent and identically distributed, we can obtain
$$ {\rm{E}} \langle \nabla \hat{h}_{i, \;t}^x ({\boldsymbol{x}}_{i, \;t})-\tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle=0 $$ (C2) On the other hand, by adding and subtracting the variable $ \tilde{ {\boldsymbol{x}}}_{i, \;t} $ and applying the Cauchy-Schwarz inequality, we obtain
$$ \begin{aligned} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle =\\ &\quad \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-\tilde{ {\boldsymbol{x}}}_{i, \;t} \rangle + \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; \tilde{ {\boldsymbol{x}}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle \leq\\ &\quad \left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^x\right\|_* \| {\boldsymbol{x}}_{i, \;t}-\tilde{ {\boldsymbol{x}}}_{i, \;t} \| + \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; \tilde{ {\boldsymbol{x}}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle \end{aligned} $$ (C3) By (A2), $ (1-\xi_t^x) {\boldsymbol{x}}_t^* \in (1-\xi_t^x) {\boldsymbol{X}} $ and Lemma 3, (C3) can be further transformed into
$$ \begin{split} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle \leq\\ &\quad \frac{\alpha_t}{\varrho_x}\left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^x\right\|_*^2 +\frac{1}{\alpha_t} ( \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) \;-\\ &\quad \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t})) \end{split} $$ (C4) where the inequality follows from (10).
Next, we bound the term $ \Psi_{{\cal{R}}}^x ((1- \xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) - \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t}) $ in (C4). By adding and subtracting the terms $ \Psi_{{\cal{R}}}^x ((1 - \xi_{t + 1}^x) {\boldsymbol{x}}_{t+1}^*, \; {\boldsymbol{x}}_{i, \;t + 1}) $ and $ \Psi_{{\cal{R}}}^x(B_t (1 - \xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t+1}) $, we obtain
$$ \begin{split} &\Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) - \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t})= \\ &\quad M_{i, \;t}^1 +M_{i, \;t}^2 +M_{i, \;t}^3 \end{split} $$ (C5) where $ M_{i, \;t}^1 = \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) -\Psi_{{\cal{R}}}^x((1-\xi_{t+1}^x) {\boldsymbol{x}}_{t+1}^*, \; {\boldsymbol{x}}_{i, \;t+1}) $, $ M_{i, \;t}^2=\Psi_{{\cal{R}}}^x((1-\xi_{t+1}^x) {\boldsymbol{x}}_{t+1}^*, \; {\boldsymbol{x}}_{i, \;t+1}) - \Psi_{{\cal{R}}}^x(B_t (1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t+1}) $, and $ M_{i, \;t}^3 = \Psi_{{\cal{R}}}^x(B_t (1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t+1}) -\Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t}) $. Further, after summing over $ i $ and $ t $, these three terms satisfy, respectively,
$$ \begin{split} \sum\limits_{t=1}^T & \sum\limits_{i=1}^n \frac{M_{i, \;t}^1 }{\alpha_t} \leq \frac{1}{\alpha_1} \sum\limits_{i=1}^n \Psi_{{\cal{R}}}^x((1-\xi_1^x) {\boldsymbol{x}}_1^*, \; {\boldsymbol{x}}_{i, \;1}) \;+\\ & \sum_{t=2}^T \sum\limits_{i=1}^n \left(\frac{1}{\alpha_t} -\frac{1}{\alpha_{t-1}} \right) \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) \leq\\ & \frac{n R_X}{\alpha_T} \\[-1pt]\end{split} $$ (C6) $$ \begin{aligned} \sum\limits_{t=1}^T & \sum\limits_{i=1}^n \frac{M_{i, \;t}^2 }{\alpha_t} \leq \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{K_X }{\alpha_t} ( \| (1-\xi_{t+1}^x) {\boldsymbol{x}}_{t+1}^* \;- \\ & (1-\xi_{t}^x) {\boldsymbol{x}}_{t+1}^*\| + \| (1-\xi_{t}^x) {\boldsymbol{x}}_{t+1}^* - (1 - \xi_t^x) B_t {\boldsymbol{x}}_t^*\| ) \leq\\ & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{K_X }{\alpha_t} ( R_1 (\xi_{t}^x-\xi_{t+1}^x) + \| {\boldsymbol{x}}_{t+1}^* - B_t {\boldsymbol{x}}_t^*\|) \; \end{aligned} $$ (C7) $$ \begin{split} \sum\limits_{t=1}^T & \sum\limits_{i=1}^n \frac{M_{i, \;t}^3}{\alpha_t} \leq \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{1}{\alpha_t} \sum\limits_{j=1}^n [A_t]_{ij} \,\times \\ & \left( \Psi_{{\cal{R}}}^x (B_t (1 - \xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{p}}_{j, \;t}^x) - \Psi_{{\cal{R}}}^x((1 - \xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t}) \right)= \\ & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{1}{\alpha_t} \left( \Psi_{{\cal{R}}}^x \left(B_t (1-\xi_t^x) {\boldsymbol{x}}_t^*, \; B_t\tilde{ {\boldsymbol{x}}}_{i, \;t}\right) \right. -\\ & \left. \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t})\right)\leq 0\\[-1pt] \end{split} $$ (C8) where (C6) combines $ \frac{1}{\alpha_t}-\frac{1}{\alpha_{t-1}}\geq 0 $ and $ \Psi_{{\cal{R}}}^x({\boldsymbol{x}}_1, \; {\boldsymbol{x}}_2) \geq 0, \; \forall {\boldsymbol{x}}_1, \; {\boldsymbol{x}}_2 \in {\boldsymbol{X}} $; (C7) uses (11) and the triangle inequality; and (C8) uses the double stochasticity of $ A_t $ and Assumption 6.
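The final step of (C6) is a telescoping bound: with $ R_X $ the uniform upper bound on the Bregman divergence over $ {\boldsymbol{X}} $ used in (C6), and since the coefficients $ \frac{1}{\alpha_t} - \frac{1}{\alpha_{t-1}} $ are nonnegative,
$$ \frac{n R_X}{\alpha_1} + n R_X \sum\limits_{t=2}^{T} \left( \frac{1}{\alpha_t} - \frac{1}{\alpha_{t-1}} \right) = \frac{n R_X}{\alpha_T} $$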
Substituting (C6) ~ (C8) into (C5) and summing over $ i $ from 1 to $ n $ and $ t $ from 1 to $ T $, we can deduce
$$ \begin{split} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{1}{\alpha_t} \left( \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) \right. -\\ & \quad \left. \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \tilde{ {\boldsymbol{x}}}_{i, \;t}) \right) \leq\\ & \quad \frac{n R_X}{\alpha_T} + n K_X \sum\limits_{t=1}^T \frac{1}{\alpha_t} \| {\boldsymbol{x}}_{t+1}^*-B_t {\boldsymbol{x}}_t^*\| \;+\\ & \quad n K_X R_1 \sum\limits_{t=1}^T \frac{\xi_t^x -\xi_{t+1}^x}{\alpha_t} \end{split} $$ (C9) Then, summing (C4) over $ i $ from 1 to $ n $ and $ t $ from 1 to $ T $ and combining it with (C9), we obtain
$$ \begin{split} \sum\limits_{t=1}^T&{\sum\limits_{i=1}^n {\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle }} \leq\\ & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{\alpha_t}{\varrho_x}\left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^x\right\|_*^2 + n K_X \sum\limits_{t=1}^T \frac{1}{\alpha_t} \| {\boldsymbol{x}}_{t+1}^*-B_t {\boldsymbol{x}}_t^*\| \;+\\ & \frac{n R_X}{\alpha_T}+n K_X R_1 \sum\limits_{t=1}^T \frac{\xi_t^x -\xi_{t+1}^x}{\alpha_t} \end{split} $$ (C10) Finally, summing (C1) over $ i $ from 1 to $ n $ and $ t $ from 1 to $ T $, taking expectations, and combining (C2) and (C10), we establish (21a).
Analogously to the proof of (21a), by Assumption 2, Assumption 3 and the definition of $ h_{i, \;t}^y $, we can obtain
$$ \begin{aligned} & f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{t}^*) - f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}) \leq\\ &\quad f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;(1-\xi_t^y ){\boldsymbol{y}}_{t}^*) - f_{i, \;t} ({\boldsymbol{x}}_{i, \;t}, \;{\boldsymbol{y}}_{i, \;t}) \;+ \\ & \quad \bar{p}L_Y R_2 \xi_t^y \leq\hat{h}_{i, \;t}^y ((1-\xi_t^y ){\boldsymbol{y}}_{t}^*) - \hat{h}_{i, \;t}^y ({\boldsymbol{y}}_{i, \;t})\;+ \\ &\quad \bar{p}L_YR_2\xi_t^y + 2L_Y \bar{p} {\delta_t^y} \leq\\ & \quad\langle \nabla \hat{h}_{i, \;t}^y ({\boldsymbol{y}}_{i, \;t})-\tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^* - {\boldsymbol{y}}_{i, \;t} \rangle \;+ \\ &\quad \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^* - {\boldsymbol{y}}_{i, \;t} \rangle + \bar{p}L_YR_2\xi_t^y+ 2L_Y \bar{p} {\delta_t^y} \end{aligned} $$ (C11) Next, the two inner-product terms on the right-hand side of (C11) are analyzed one by one. On the one hand, by property 2) of Lemma 2 and the fact that $ {\boldsymbol{u}}_{i, \;t}^y, \; \forall i\in [n], \; t \in [T] $, are independent and identically distributed, we obtain
$$ {\rm{E}} \langle \nabla \hat{h}_{i, \;t}^y ({\boldsymbol{y}}_{i, \;t})-\tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^* - {\boldsymbol{y}}_{i, \;t}\rangle=0 $$ (C12) On the other hand, analogously to (C3) and (C4), we have
$$ \begin{split} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^* - {\boldsymbol{y}}_{i, \;t} \rangle\leq \\ &\quad \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^*-\tilde{ {\boldsymbol{y}}}_{i, \;t} \rangle + \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; \tilde{ {\boldsymbol{y}}}_{i, \;t}- {\boldsymbol{y}}_{i, \;t} \rangle\leq \\ &\quad \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^*-\tilde{ {\boldsymbol{y}}}_{i, \;t} \rangle+ \left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^y\right\|_* \| {\boldsymbol{y}}_{i, \;t}-\tilde{ {\boldsymbol{y}}}_{i, \;t} \| \leq\\ &\quad \frac{\eta_t}{\varrho_y} \left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^y\right\|_*^2 +\frac{1}{\eta_t} \left( \Psi_{{\cal{R}}}^y((1-\xi_t^y) {\boldsymbol{y}}_t^*, \; {\boldsymbol{y}}_{i, \;t}) \right. -\\ &\quad \left.\Psi_{{\cal{R}}}^y((1-\xi_t^y) {\boldsymbol{y}}_t^*, \; \tilde{ {\boldsymbol{y}}}_{i, \;t})\right)\\[-1pt] \end{split} $$ (C13) where the last inequality in (C13) combines (A4), $ (1-\xi_t^y) {\boldsymbol{y}}_t^* \in (1-\xi_t^y) {\boldsymbol{Y}} $, (10) and Lemma 3. Next, analogously to the analysis of (C5) ~ (C8), the term $ \Psi_{{\cal{R}}}^y ((1- \xi_t^y) {\boldsymbol{y}}_t^*, \; {\boldsymbol{y}}_{i, \;t}) - \Psi_{{\cal{R}}}^y((1- \xi_t^y) {\boldsymbol{y}}_t^*,\; \tilde{ {\boldsymbol{y}}}_{i, \;t}) $ satisfies
$$ \begin{split} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{1}{\eta_t} \left( \Psi_{{\cal{R}}}^y((1-\xi_t^y) {\boldsymbol{y}}_t^*, \; {\boldsymbol{y}}_{i, \;t}) \right. -\\ &\quad \left. \Psi_{{\cal{R}}}^y((1-\xi_t^y) {\boldsymbol{y}}_t^*, \; \tilde{ {\boldsymbol{y}}}_{i, \;t}) \right)\leq \\ &\quad \frac{n R_Y}{\eta_T} + n K_Y \sum\limits_{t=1}^T \frac{1}{\eta_t} \| {\boldsymbol{y}}_{t+1}^*-C_t {\boldsymbol{y}}_t^*\| \;+\\ &\quad n K_Y R_2 \sum\limits_{t=1}^T \frac{\xi_t^y -\xi_{t+1}^y}{\eta_t} \end{split} $$ (C14) Further, summing (C13) over $ i $ from 1 to $ n $ and $ t $ from 1 to $ T $, we obtain
$$ \begin{split} & \sum\limits_{t=1}^T{\sum\limits_{i=1}^n {\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^* - {\boldsymbol{y}}_{i, \;t} \rangle }} \leq\\ &\quad \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{\eta_t}{\varrho_y}\left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^y\right\|_*^2 + n K_Y \sum\limits_{t=1}^T \frac{1}{\eta_t} \| {\boldsymbol{y}}_{t+1}^*-C_t {\boldsymbol{y}}_t^*\| \;+\\ & \quad \frac{n R_Y}{\eta_T} +n K_Y R_2 \sum\limits_{t=1}^T \frac{\xi_t^y -\xi_{t+1}^y}{\eta_t} \end{split} $$ (C15) Finally, summing (C11) over $ i $ from 1 to $ n $ and $ t $ from 1 to $ T $, taking expectations, and combining (C12) and (C15), we establish (21b). □
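The unbiasedness used in (C2) and (C12) mirrors the classical property of the single-point gradient estimator of [19]: $ \tilde{{\boldsymbol{g}}} = \frac{d}{\delta} f({\boldsymbol{x}} + \delta {\boldsymbol{u}})\, {\boldsymbol{u}} $, with $ {\boldsymbol{u}} $ drawn uniformly from the unit sphere, is an unbiased estimate of the gradient of the smoothed surrogate of $ f $. The following Monte Carlo sketch illustrates this numerically on a toy quadratic (for which the smoothing bias vanishes); the test function, the constants, and all names here are our own choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
d, delta = 4, 0.2
Q = np.diag([1.0, 2.0, 3.0, 4.0])

def f(x):
    return 0.5 * x @ Q @ x               # toy loss with gradient Q x

def one_point_estimate(x):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)               # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u

x0 = np.array([1.0, -1.0, 0.5, 2.0])
est = np.mean([one_point_estimate(x0) for _ in range(200_000)], axis=0)
print("averaged estimate:", np.round(est, 2))
print("true gradient Qx :", Q @ x0)      # agreement up to Monte Carlo noise
```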
Appendix D. Proof of Lemma 6
Proof. Since the function $ \langle {\boldsymbol{x}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t}) $ is $ \frac{\varrho_x}{\alpha_t} $-strongly convex on the set $ {\boldsymbol{X}} $, we have
$$ \begin{split} & \langle \breve{ {\boldsymbol{x}}}_{i, \;t}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}, \; {\boldsymbol{x}}_{i, \;t}) \geq \langle \breve{ {\boldsymbol{x}}}_{i, \;t}^*, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle \;+\\ &\quad \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}^*, \; {\boldsymbol{x}}_{i, \;t}) + \frac{\varrho_x}{2\alpha_t} \|\breve{ {\boldsymbol{x}}}_{i, \;t} - \breve{ {\boldsymbol{x}}}_{i, \;t}^* \|^2 \end{split} $$ (D1) By the definition of $ \rho_t^x $ in (35), combined with the above inequality, we obtain
$$ \frac{\varrho_x}{2\alpha_t} \|\breve{ {\boldsymbol{x}}}_{i, \;t} - \breve{ {\boldsymbol{x}}}_{i, \;t}^* \|^2 \leq \rho_t^x $$ (D2) Rearranging then yields (37a), as made explicit below.
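Explicitly, multiplying (D2) by $ 2\alpha_t/\varrho_x $ and taking square roots gives
$$ \|\breve{ {\boldsymbol{x}}}_{i, \;t} - \breve{ {\boldsymbol{x}}}_{i, \;t}^* \| \leq \sqrt{\frac{2\alpha_t \rho_t^x}{\varrho_x}} $$
which is the form invoked later in (E3).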
Analogously to the proof of (37a), by the $ \frac{\varrho_y}{\eta_t} $-strong convexity of the function $ -\langle {\boldsymbol{y}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ({\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) $ on $ {\boldsymbol{Y}} $, we obtain
$$ \begin{split} & - \langle \breve{ {\boldsymbol{y}}}_{i, \;t}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ( \breve{ {\boldsymbol{y}}}_{i, \;t}, \; {\boldsymbol{y}}_{i, \;t}) \geq -\langle \breve{ {\boldsymbol{y}}}_{i, \;t}^*, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle \;+ \\ &\quad \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ( \breve{ {\boldsymbol{y}}}_{i, \;t}^*, \; {\boldsymbol{y}}_{i, \;t}) + \frac{\varrho_y}{2\eta_t} \|\breve{ {\boldsymbol{y}}}_{i, \;t} - \breve{ {\boldsymbol{y}}}_{i, \;t}^* \|^2 \end{split} $$ (D3) Combining the definition of $ \rho_t^y $ with (D3) gives $ {\varrho_y}\|\breve{ {\boldsymbol{y}}}_{i, \;t} - \breve{ {\boldsymbol{y}}}_{i, \;t}^* \|^2 / ({2\eta_t}) \leq \rho_t^y $. Simplifying then yields (37b).
By the definitions of $ \breve{ {\boldsymbol{x}}}_{i, \;t}^* $ and $ \breve{ {\boldsymbol{y}}}_{i, \;t}^* $ in Section 3 and the first-order optimality conditions of the functions $ \langle {\boldsymbol{x}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{x} \rangle + \frac{1}{\alpha_t} \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}, \; {\boldsymbol{x}}_{i, \;t}) $ and $ -\langle {\boldsymbol{y}}, \; \tilde{{\boldsymbol{g}}}_{i, \;t}^{y} \rangle + \frac{1}{\eta_t} \Psi_{{\cal{R}}}^y ({\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) $, Lemma 3 still holds after replacing $ \tilde{ {\boldsymbol{x}}}_{i, \;t} $ and $ \tilde{ {\boldsymbol{y}}}_{i, \;t} $ with $ \breve{ {\boldsymbol{x}}}_{i, \;t}^* $ and $ \breve{ {\boldsymbol{y}}}_{i, \;t}^* $, i.e., (37c) and (37d) hold. □
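Lemma 6 quantifies how far an inexact solution $ \breve{{\boldsymbol{x}}}_{i,t} $ of the proximal subproblem can be from the exact minimizer $ \breve{{\boldsymbol{x}}}_{i,t}^* $. As an illustration only, the following sketch solves the unconstrained Euclidean instance $ \phi({\boldsymbol{x}}) = \langle {\boldsymbol{g}}, \; {\boldsymbol{x}}\rangle + \frac{1}{2\alpha}\|{\boldsymbol{x}} - {\boldsymbol{x}}_{\rm prev}\|^2 $ by gradient descent until the objective gap is below $ \rho $, then checks the (37a)-style bound $ \|\breve{{\boldsymbol{x}}} - \breve{{\boldsymbol{x}}}^*\| \leq \sqrt{2\alpha\rho/\varrho} $ with $ \varrho = 1 $; the stopping rule and the problem instance are our own assumptions.

```python
import numpy as np

alpha, rho = 0.5, 1e-4
g = np.array([1.0, -2.0, 0.5])
x_prev = np.array([0.2, 0.1, -0.3])

phi = lambda x: g @ x + np.linalg.norm(x - x_prev)**2 / (2 * alpha)
x_star = x_prev - alpha * g                  # exact minimizer of phi

# Gradient descent on phi until the objective gap drops below rho.
x = x_prev.copy()
while phi(x) - phi(x_star) > rho:
    x -= 0.25 * (g + (x - x_prev) / alpha)   # step 0.25 < alpha, so GD converges

gap = np.linalg.norm(x - x_star)
print(f"||x - x*|| = {gap:.2e}  <=  sqrt(2*alpha*rho) = {np.sqrt(2*alpha*rho):.2e}")
```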
Appendix E. Proof of Lemma 7
Proof. Analogously to (A2), by the definition of $ \breve{ {\boldsymbol{x}}}_{i, \;t}^* $, for any $ {\boldsymbol{x}} \in (1-\xi_t^x) {\boldsymbol{X}} $ we have
$$ \begin{split} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^{x}, \; \breve{ {\boldsymbol{x}}}_{i, \;t}^* - {\boldsymbol{x}} \rangle \leq\frac{1}{\alpha_t}\langle \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}^* )\; -\\ &\qquad \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t}^* \rangle \end{split} $$ (E1) By adding and subtracting $ \breve{ {\boldsymbol{x}}}_{i, \;t} $, (E1) satisfies
$$ \begin{split} &\langle \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}^* ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t}^* \rangle =\\ &\quad \langle \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}^* ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t} +\breve{ {\boldsymbol{x}}}_{i, \;t}\;- \\ & \quad\breve{ {\boldsymbol{x}}}_{i, \;t}^* \rangle\leq \langle \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t} ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t} \rangle \;+\\ &\quad \left\| \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}^* ) - \nabla R^x(\breve{ {\boldsymbol{x}}}_{i, \;t})\right\|_* \left\| {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t} \right\| \;+\\ &\quad \left\| \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t}^* ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t})\right\|_* \|\breve{ {\boldsymbol{x}}}_{i, \;t}- \breve{ {\boldsymbol{x}}}_{i, \;t}^* \| \leq\\ &\quad \langle \nabla R^x (\breve{ {\boldsymbol{x}}}_{i, \;t} ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t} \rangle \;+\\ &\quad G_{RX} \left\| \breve{ {\boldsymbol{x}}}_{i, \;t}^* - \breve{ {\boldsymbol{x}}}_{i, \;t} \right\| \left( \| {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t} \| +\left\| \breve{ {\boldsymbol{x}}}_{i, \;t}^* - {\boldsymbol{x}}_{i, \;t}\right\| \right) \leq \\ &\quad \langle \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t} ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; {\boldsymbol{x}}- \breve{ {\boldsymbol{x}}}_{i, \;t} \rangle \;+\\ &\quad 4 R_1G_{RX} \left\| \breve{ {\boldsymbol{x}}}_{i, \;t}^* - \breve{ {\boldsymbol{x}}}_{i, \;t} \right\| \\[-1pt]\end{split} $$ (E2) where the second and third inequalities in (E2) use Assumption 7 and Assumption 2, respectively.
Combining (E1) and (E2), and setting $ {\boldsymbol{x}}= (1-\xi_t^x) {\boldsymbol{x}}_t^* \in (1-\xi_t^x) {\boldsymbol{X}} $, we obtain
$$ \begin{split} &\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^{x}, \; \breve{ {\boldsymbol{x}}}_{i, \;t}^* -(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle\leq \\ &\;\;\; \frac{1}{\alpha_t}\langle \nabla R^x ( \breve{ {\boldsymbol{x}}}_{i, \;t} ) - \nabla R^x( {\boldsymbol{x}}_{i, \;t}), \; (1-\xi_t^x) {\boldsymbol{x}}_t^*- \breve{ {\boldsymbol{x}}}_{i, \;t} \rangle \;+\\ & \;\;\;\frac{4 R_1G_{RX}}{\alpha_t} \left\| \breve{ {\boldsymbol{x}}}_{i, \;t}^* - \breve{ {\boldsymbol{x}}}_{i, \;t} \right\|\leq\frac{1}{\alpha_t}( \Psi_{{\cal{R}}}^x((1\;- \\ &\;\;\; \xi_t^x) {\boldsymbol{x}}_t^* , \; {\boldsymbol{x}}_{i, \;t}) -\Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \breve{ {\boldsymbol{x}}}_{i, \;t} ) ) \;+\\ & \;\;\;\frac{4 R_1G_{RX}}{\alpha_t} \sqrt{\frac{2\alpha_t \rho_t^x}{\varrho_x}} \end{split} $$ (E3) where the last inequality in (E3) combines Lemma 6 and $ \Psi_{{\cal{R}}}^x ({\boldsymbol{x}}_1, {\boldsymbol{x}}_2)\geq \frac{\varrho_x}{2} \| {\boldsymbol{x}}_1- {\boldsymbol{x}}_2 \|^2 $. Analogously to (C5) ~ (C8), the term $ \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) - \Psi_{{\cal{R}}}^x((1- \xi_t^x) {\boldsymbol{x}}_t^*,\; \breve{ {\boldsymbol{x}}}_{i, \;t}) $ on the right-hand side of (E3) satisfies
$$ \begin{split} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{1}{\alpha_t} \left( \Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; {\boldsymbol{x}}_{i, \;t}) \right. -\\ &\quad \left.\Psi_{{\cal{R}}}^x((1-\xi_t^x) {\boldsymbol{x}}_t^*, \; \breve{ {\boldsymbol{x}}}_{i, \;t}) \right) \leq\\ & \quad \frac{n R_X}{\alpha_T} + n K_X \sum\limits_{t=1}^T \frac{1}{\alpha_t} \| {\boldsymbol{x}}_{t+1}^*-B_t {\boldsymbol{x}}_t^*\| \;+\\ &\quad n K_X R_1\sum\limits_{t=1}^T \frac{\xi_t^x -\xi_{t+1}^x}{\alpha_t} \end{split} $$ (E4) Further, by introducing the variable $ \breve{ {\boldsymbol{x}}}_{i, \;t}^* $ and combining (37c), we have
$$ \begin{split} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; {\boldsymbol{x}}_{i, \;t}-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle \leq \\ & \quad \sum\limits_{t=1}^T \sum\limits_{i=1}^n \left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^x\right\|_* \| {\boldsymbol{x}}_{i, \;t}-\breve{ {\boldsymbol{x}}}_{i, \;t}^* \| + \sum\limits_{t=1}^T \sum\limits_{i=1}^n \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^x, \; \breve{ {\boldsymbol{x}}}_{i, \;t}^*-(1-\xi_t^x) {\boldsymbol{x}}_t^* \rangle \leq\\ & \quad \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{\alpha_t}{\varrho_x}\left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^x\right\|_*^2 + n K_X \sum\limits_{t=1}^T \frac{1}{\alpha_t} \| {\boldsymbol{x}}_{t+1}^*-B_t {\boldsymbol{x}}_t^*\| +\\ & \quad \frac{n R_X}{\alpha_T} +n K_X R_1\sum\limits_{t=1}^T \frac{\xi_t^x -\xi_{t+1}^x}{\alpha_t} + E_8 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^x}{ \alpha_t}} \end{split} $$ (E5) Finally, combining (C1), (C2), (E5) and Lemma 2 establishes (38a).
Analogously to (A4), by the definition of $ \breve{ {\boldsymbol{y}}}_{i, \;t}^* $, for any $ {\boldsymbol{y}} \in (1-\xi_t^y) {\boldsymbol{Y}} $ we have
$$ \begin{split} &\eta_t\langle \tilde{{\boldsymbol{g}}}_{i, \;t}^{y}, \; {\boldsymbol{y}} -\breve{ {\boldsymbol{y}}}_{i, \;t}^* \rangle \leq\\ & \quad\langle \nabla R^y ( \breve{ {\boldsymbol{y}}}_{i, \;t}^* ) - \nabla R^y( {\boldsymbol{y}}_{i, \;t}), \; {\boldsymbol{y}}- \breve{ {\boldsymbol{y}}}_{i, \;t}^* \rangle \leq\\ & \quad\langle \nabla R^y ( \breve{ {\boldsymbol{y}}}_{i, \;t} ) - \nabla R^y( {\boldsymbol{y}}_{i, \;t}), \; {\boldsymbol{y}}- \breve{ {\boldsymbol{y}}}_{i, \;t} \rangle \;+\\ &\quad \left\| \nabla R^y ( \breve{ {\boldsymbol{y}}}_{i, \;t}^* ) - \nabla R^y(\breve{ {\boldsymbol{y}}}_{i, \;t})\right\|_* \left\| {\boldsymbol{y}}- \breve{ {\boldsymbol{y}}}_{i, \;t} \right\| +\\ &\quad \left\| \nabla R^y ( \breve{ {\boldsymbol{y}}}_{i, \;t}^* ) - \nabla R^y( {\boldsymbol{y}}_{i, \;t})\right\|_* \|\breve{ {\boldsymbol{y}}}_{i, \;t}- \breve{ {\boldsymbol{y}}}_{i, \;t}^* \| \leq\\ & \quad\Psi_{{\cal{R}}}^y( {\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) -\Psi_{{\cal{R}}}^y( {\boldsymbol{y}}, \; \breve{ {\boldsymbol{y}}}_{i, \;t}) \;+\\ &\quad 4 R_2G_{RY} \left\| \breve{ {\boldsymbol{y}}}_{i, \;t}^* - \breve{ {\boldsymbol{y}}}_{i, \;t} \right\| \end{split} $$ (E6) Setting $ {\boldsymbol{y}}= (1-\xi_t^y) {\boldsymbol{y}}_t^* $ and proceeding as in (C5) ~ (C8), the term $ \Psi_{{\cal{R}}}^y({\boldsymbol{y}}, \; {\boldsymbol{y}}_{i, \;t}) - \Psi_{{\cal{R}}}^y({\boldsymbol{y}}, \; \breve{ {\boldsymbol{y}}}_{i, \;t}) $ on the right-hand side of (E6) satisfies
$$ \begin{split} & \sum\limits_{t=1}^T \sum\limits_{i=1}^n\frac{1}{\eta_t} \left( \Psi_{{\cal{R}}}^y((1-\xi_t^y) {\boldsymbol{y}}_t^*, \; {\boldsymbol{y}}_{i, \;t}) \right. -\\ &\quad \left. \Psi_{{\cal{R}}}^y((1-\xi_t^y) {\boldsymbol{y}}_t^*, \; \breve{ {\boldsymbol{y}}}_{i, \;t}) \right)\leq\frac{n R_Y}{\eta_T} \;+ \\ & \quad n K_Y \sum\limits_{t=1}^T \frac{1}{\eta_t} \| {\boldsymbol{y}}_{t+1}^*-C_t {\boldsymbol{y}}_t^*\| +n K_Y R_2 \sum\limits_{t=1}^T \frac{\xi_t^y -\xi_{t+1}^y}{\eta_t} \\ \end{split} $$ (E7) Further, by introducing the variable $ \breve{ {\boldsymbol{y}}}_{i, \;t}^* $ and combining (37d), we have
$$ \begin{aligned}[b] & \sum\limits_{t=1}^T \sum\limits_{i=1}^n \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^* - {\boldsymbol{y}}_{i, \;t} \rangle \leq\\ & \quad\sum\limits_{t=1}^T \sum\limits_{i=1}^n \left\| \tilde{{\boldsymbol{g}}}_{i, \;t}^y\right\|_* \| {\boldsymbol{y}}_{i, \;t}-\breve{ {\boldsymbol{y}}}_{i, \;t}^* \| \;+\\ &\quad \sum\limits_{t=1}^T \sum\limits_{i=1}^n \langle \tilde{{\boldsymbol{g}}}_{i, \;t}^y, \; (1-\xi_t^y) {\boldsymbol{y}}_t^*-\breve{ {\boldsymbol{y}}}_{i, \;t}^* \rangle \leq\\ &\quad \sum\limits_{t=1}^T \sum\limits_{i=1}^n \frac{\eta_t}{\varrho_y}\| \tilde{{\boldsymbol{g}}}_{i, \;t}^y\|_*^2 + n K_Y \sum\limits_{t=1}^T \frac{1}{\eta_t} \| {\boldsymbol{y}}_{t+1}^*-C_t {\boldsymbol{y}}_t^*\| \;+\\ &\quad \frac{n R_Y}{\eta_T} +n K_Y R_2\sum\limits_{t=1}^T \frac{\xi_t^y -\xi_{t+1}^y}{\eta_t}+ E_9 \sum\limits_{t=1}^T \sqrt{\frac{ \rho_t^y}{ \eta_t}} \\[-1pt]\end{aligned} $$ (E8) Finally, combining (C11), (C12), (E8) and Lemma 2 yields (38b).
□
Table 1. The regret bounds under different step sizes and precisions
| $ \alpha_t $ | $ \eta_t $ | $ \rho_t^x $ | $ \rho_t^y $ | $ {ESP-Regret}_d^j(T) $ |
|---|---|---|---|---|
| $ \dfrac{1}{\epsilon_1k} t^{-\frac{3}{4}} $ | $ \dfrac{1}{\epsilon_2 k} t^{-\frac{3}{4}} $ | $ t^{-1} $ | $ t^{-1} $ | $ {\cal{O}}(\max \{kT^{\frac{7}{8}}, \; kT^{\frac{3}{4}} V_T \}) $ |
| $ \dfrac{1}{\epsilon_1k} t^{-\frac{3}{4}} $ | $ \dfrac{1}{\epsilon_2 k} t^{-\frac{3}{4}} $ | $ t^{-\frac{5}{4}} $ | $ t^{-\frac{5}{4}} $ | $ {\cal{O}}(kT^{\frac{3}{4}} (1+V_T)) $ |
| $ \dfrac{1}{\epsilon_1k} t^{-\frac{3}{4}} $ | $ \dfrac{1}{\epsilon_2 k} t^{-\frac{3}{4}} $ | $ t^{-2} $ | $ t^{-2} $ | $ {\cal{O}}(kT^{\frac{3}{4}} (1+V_T)) $ |
| $ \dfrac{1}{\epsilon_1k} t^{-\varpi_1} $ $ \dagger $ | $ \dfrac{1}{\epsilon_2 k} t^{-\varpi_1} $ | $ t^{-1} $ | $ t^{-1} $ | $ {\cal{O}}(\max \{kT^{\frac{3}{4}}\sqrt{1+V_T}, \; kT^{\frac{7}{8}} (1+V_T)^{-\frac{1}{4}} \}) $ |
| $ \dfrac{1}{\epsilon_1k} t^{-\varpi_1} $ | $ \dfrac{1}{\epsilon_2 k} t^{-\varpi_1} $ | $ t^{-\frac{5}{4}} $ | $ t^{-\frac{5}{4}} $ | $ {\cal{O}}(kT^{\frac{3}{4}} \sqrt{1+V_T}) $ |
| $ \dfrac{1}{\epsilon_1k} t^{-\varpi_1} $ | $ \dfrac{1}{\epsilon_2 k} t^{-\varpi_1} $ | $ t^{-2} $ | $ t^{-2} $ | $ {\cal{O}}(kT^{\frac{3}{4}} \sqrt{1+V_T}) $ |

Note: $ \dagger $ indicates $ \varpi_1=3/4-\log_T {\sqrt{1+V_T}} $.
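As a quick check of the dagger note, the horizon-dependent exponent can be unpacked directly: with $ \varpi_1=3/4-\log_T {\sqrt{1+V_T}} $,
$$ T^{\varpi_1} = T^{\frac{3}{4}}\, T^{-\log_T \sqrt{1+V_T}} = \frac{T^{\frac{3}{4}}}{\sqrt{1+V_T}} $$
which is exactly how the $ \sqrt{1+V_T} $ factors in the corresponding bounds arise.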
[1] Shalev-Shwartz S. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 2012, 4(2): 107−194
[2] Hazan E. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2016, 2(3−4): 157−325
[3] Li X X, Xie L H, Li N. A survey on distributed online optimization and online games. Annual Reviews in Control, 2023, 56: Article No. 100904 doi: 10.1016/j.arcontrol.2023.100904
[4] Yuan De-Ming, Zhang Bao-Yong, Xia Jian-Wei. Recent advance in online distributed optimization. Journal of Liaocheng University (Natural Science Edition), 2023, 36(5): 1−12 (in Chinese)
[5] Zinkevich M. Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th International Conference on Machine Learning (ICML). Washington, USA: AAAI Press, 2003. 928−935
[6] Cao X Y, Liu K J R. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on Automatic Control, 2019, 64(7): 2665−2680 doi: 10.1109/TAC.2018.2884653
[7] Wu Qing-Tao, Zhu Jun-Long, Ge Quan-Bo, Zhang Ming-Chuan. An accelerated distributed online learning algorithm based on conditional gradient. Acta Automatica Sinica, 2024, 50(2): 386−402 (in Chinese)
[8] Yuan D M, Hong Y G, Ho D W C, Xu S Y. Distributed mirror descent for online composite optimization. IEEE Transactions on Automatic Control, 2021, 66(2): 714−729 doi: 10.1109/TAC.2020.2987379
[9] Yi X L, Li X X, Yang T, Xie L H, Chai T Y, Johansson K H. Distributed bandit online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Automatic Control, 2021, 66(10): 4620−4635 doi: 10.1109/TAC.2020.3030883
[10] Dixit R, Bedi A S, Rajawat K. Online learning over dynamic graphs via distributed proximal gradient algorithm. IEEE Transactions on Automatic Control, 2021, 66(11): 5065−5079 doi: 10.1109/TAC.2020.3033712
[11] Shahrampour S, Jadbabaie A. Distributed online optimization in dynamic environments using mirror descent. IEEE Transactions on Automatic Control, 2018, 63(3): 714−725 doi: 10.1109/TAC.2017.2743462
[12] Beznosikov A, Samokhin V, Gasnikov A. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms. arXiv preprint arXiv: 2010.13112, 2022.
[13] Raghavan K, Garg S, Jagannathan S, Samaranayake V A. Distributed min-max learning scheme for neural networks with applications to high-dimensional classification. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(10): 4323−4333 doi: 10.1109/TNNLS.2020.3017434
[14] Akimoto Y, Miyauchi Y, Maki A. Saddle point optimization with approximate minimization oracle and its application to robust berthing control. ACM Transactions on Evolutionary Learning and Optimization, 2022, 2(1): Article No. 2
[15] Ho-Nguyen N, Kılınç-Karzan F. Exploiting problem structure in optimization under uncertainty via online convex optimization. Mathematical Programming, 2019, 177(1−2): 113−147 doi: 10.1007/s10107-018-1262-8
[16] Rivera Cardoso A, Wang H, Xu H. The online saddle point problem and online convex optimization with knapsacks. Mathematics of Operations Research, doi: 10.1287/moor.2018.0330
[17] Xu Y, Jiang Y, Xie X, Li D Q. An online saddle point optimization algorithm with regularization. IOP Conference Series: Materials Science and Engineering, 2019, 569(5): Article No. 052035
[18] Wood K R, Dall'Anese E. Online saddle point tracking with decision-dependent data. In: Proceedings of the 5th Annual Learning for Dynamics and Control Conference. Istanbul, Turkey: PMLR, 2023. 1416−1428
[19] Flaxman A D, Kalai A T, McMahan H B. Online convex optimization in the Bandit setting: Gradient descent without a gradient. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. Vancouver, Canada: Society for Industrial and Applied Mathematics, 2005. 385−394
[20] Agarwal A, Dekel O, Xiao L. Optimal algorithms for online convex optimization with multi-point Bandit feedback. In: Proceedings of the 23rd Conference on Learning Theory (COLT). Haifa, Israel: Omnipress, 2010. 28−40
[21] Besbes O, Gur Y, Zeevi A. Non-stationary stochastic optimization. Operations Research, 2015, 63(5): 1227−1244 doi: 10.1287/opre.2015.1408
[22] Roy A, Chen Y F, Balasubramanian K, Mohapatra P. Online and Bandit algorithms for nonstationary stochastic saddle-point optimization. arXiv preprint arXiv: 1912.01698, 2019.
[23] Cardoso A R, Abernethy J, Wang H, Xu H. Competing against Nash equilibria in adversarially changing zero-sum games. In: Proceedings of the 36th International Conference on Machine Learning (ICML). Long Beach, USA: PMLR, 2019. 921−930
[24] Yang T, Yi X L, Wu J F, Yuan Y, Wu D, Meng Z Y, et al. A survey of distributed optimization. Annual Reviews in Control, 2019, 47: 278−305 doi: 10.1016/j.arcontrol.2019.05.006
[25] Nedic A, Olshevsky A, Ozdaglar A, Tsitsiklis J N. Distributed subgradient methods and quantization effects. In: Proceedings of the 47th IEEE Conference on Decision and Control. Cancun, Mexico: IEEE, 2008. 4177−4184
[26] Hong Yi-Guang, Zhang Yan-Qiong. Distributed optimization: Algorithm design and convergence analysis. Control Theory & Applications, 2014, 31(7): 850−857 doi: 10.7641/CTA.2014.40012 (in Chinese)
[27] Wang Long, Lu Kai-Hong, Guan Yong-Qiang. Distributed optimization via multi-agent systems. Control Theory & Applications, 2019, 36(11): 1820−1833 doi: 10.7641/CTA.2019.90502 (in Chinese)
[28] Yang Tao, Chai Tian-You. Research status and prospects of distributed collaborative optimization. Scientia Sinica Technologica, 2020, 50(11): 1413−1425 (in Chinese)
[29] Xie Pei, You Ke-You, Hong Yi-Guang, Xie Li-Hua. A survey of distributed convex optimization algorithms over networks. Control Theory & Applications, 2018, 35(7): 918−927 doi: 10.7641/CTA.2018.80205 (in Chinese)
[30] Yi Peng, Hong Yi-Guang. Distributed cooperative optimization and its applications. Scientia Sinica Mathematica, 2016, 46(10): 1547−1564 (in Chinese)
[31] Yang Tao, Xu Lei, Yi Xin-Lei, Zhang Sheng-Jun, Chen Rui-Juan, Li Yu-Zhe. Event-triggered distributed optimization algorithms. Acta Automatica Sinica, 2022, 48(1): 133−143 (in Chinese)
[32] Liu Yi-Ting, Ma Ming-Jun, Fu Jun. Distributed continuous-time non-smooth convex optimization analysis with coupled constraints over directed graphs. Acta Automatica Sinica, 2024, 50(1): 66−75 (in Chinese)
[33] Shi Xia-Sheng, Ren Lu, Sun Chang-Yin. Distributed adaptive generalized Nash equilibrium algorithm for aggregative games. Acta Automatica Sinica, 2024, 50(6): 1210−1220 (in Chinese)
[34] Chen Gang, Li Zhi-Yong. Distributed fixed-time optimization control for multi-agent systems with set constraints. Acta Automatica Sinica, 2022, 48(9): 2254−2264 (in Chinese)
[35] Xu J M, Zhu S Y, Soh Y C, Xie L H. Convergence of asynchronous distributed gradient methods over stochastic networks. IEEE Transactions on Automatic Control, 2018, 63(2): 434−448 doi: 10.1109/TAC.2017.2730481
[36] Liu C X, Li H P, Shi Y. Resource-aware exact decentralized optimization using event-triggered broadcasting. IEEE Transactions on Automatic Control, 2021, 66(7): 2961−2974 doi: 10.1109/TAC.2020.3014316
[37] Zhang W T, Shi Y, Zhang B Y, Yuan D M, Xu S Y. Dynamic regret of distributed online saddle point problem. IEEE Transactions on Automatic Control, 2024, 69(4): 2522−2529 doi: 10.1109/TAC.2023.3312033
[38] Zhao P, Wang G H, Zhang L J, Zhou Z H. Bandit convex optimization in non-stationary environments. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. Palermo, Italy: PMLR, 2020. 1508−1518
[39] Borwein J M, Lewis A S. Convex Analysis and Nonlinear Optimization: Theory and Examples. New York: Springer, 2000.
[40] Zhang Y, Ravier R J, Zavlanos M M, Tarokh V. A distributed online convex optimization algorithm with improved dynamic regret. In: Proceedings of the IEEE 58th Conference on Decision and Control (CDC). Nice, France: IEEE, 2019. 2449−2454
[41] Dai M C, Ho D W C, Zhang B Y, Yuan D M, Xu S Y. Distributed online Bandit linear regressions with differential privacy. Journal of the Franklin Institute, 2023, 360(16): 11736−11759 doi: 10.1016/j.jfranklin.2023.09.001
[42] Yuan D M, Zhang B Y, Ho D W C, Zheng W X, Xu S Y. Distributed online Bandit optimization under random quantization. Automatica, 2022, 146: Article No. 110590 doi: 10.1016/j.automatica.2022.110590
[43] Banerjee A, Merugu S, Dhillon I S, Ghosh J. Clustering with Bregman divergences. The Journal of Machine Learning Research, 2005, 6: 1705−1749
[44] Bauschke H H, Borwein J M. Joint and separate convexity of the Bregman distance. Studies in Computational Mathematics, 2001, 8: 23−36
[45] Dhillon I S, Tropp J A. Matrix nearness problems with Bregman divergences. SIAM Journal on Matrix Analysis and Applications, 2008, 29(4): 1120−1146 doi: 10.1137/060649021
[46] Jadbabaie A, Rakhlin A, Shahrampour S, Sridharan K. Online optimization: Competing with dynamic comparators. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA: JMLR.org, 2015. 398−406
[47] Li J Y, Li C J, Yu W W, Zhu X M, Yu X H. Distributed online Bandit learning in dynamic environments over unbalanced digraphs. IEEE Transactions on Network Science and Engineering, 2021, 8(4): 3034−3047 doi: 10.1109/TNSE.2021.3093536
[48] Yi X L, Li X X, Xie L H, Johansson K H. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 2020, 68: 731−746 doi: 10.1109/TSP.2020.2964200
[49] Xiong M H, Zhang B Y, Ho D W C, Yuan D M, Xu S Y. Event-triggered distributed stochastic mirror descent for convex optimization. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(9): 6480−6491 doi: 10.1109/TNNLS.2021.3137010
[50] Yu Z, Ho D W C, Yuan D M. Distributed randomized gradient-free mirror descent algorithm for constrained optimization. IEEE Transactions on Automatic Control, 2022, 67(2): 957−964 doi: 10.1109/TAC.2021.3075669
[51] Ben-Tal A, El Ghaoui L, Nemirovski A. Robust Optimization. Princeton: Princeton University Press, 2009.
[52] Zhang S Q, Choudhury S, Stich S U, Loizou N. Communication-efficient gradient descent-accent methods for distributed variational inequalities: Unified analysis and local updates. arXiv preprint arXiv: 2306.05100, 2024.
[53] Thekumparampil K K, He N, Oh S. Lifted primal-dual method for bilinearly coupled smooth minimax optimization. In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. Valencia, Spain: PMLR, 2022. 4281−4308