2.845

2023影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

分布式在线鞍点问题的Bandit反馈优化算法

张文韬 张保勇 袁德明 徐胜元

张文韬, 张保勇, 袁德明, 徐胜元. 分布式在线鞍点问题的Bandit反馈优化算法. 自动化学报, 2025, 51(3): 1−18 doi: 10.16383/j.aas.c240289
引用本文: 张文韬, 张保勇, 袁德明, 徐胜元. 分布式在线鞍点问题的Bandit反馈优化算法. 自动化学报, 2025, 51(3): 1−18 doi: 10.16383/j.aas.c240289
Zhang Wen-Tao, Zhang Bao-Yong, Yuan De-Ming, Xu Sheng-Yuan. Bandit feedback optimization algorithm for distributed online saddle point problem. Acta Automatica Sinica, 2025, 51(3): 1−18 doi: 10.16383/j.aas.c240289
Citation: Zhang Wen-Tao, Zhang Bao-Yong, Yuan De-Ming, Xu Sheng-Yuan. Bandit feedback optimization algorithm for distributed online saddle point problem. Acta Automatica Sinica, 2025, 51(3): 1−18 doi: 10.16383/j.aas.c240289

分布式在线鞍点问题的Bandit反馈优化算法

doi: 10.16383/j.aas.c240289 cstr: 32138.14.j.aas.c240289
基金项目: 国家自然科学基金(62273181, 62373190, 62221004)资助
详细信息
    作者简介:

    张文韬:南京理工大学自动化学院博士研究生. 主要研究方向为多智能体分布式优化, 鞍点问题. E-mail: iswt.zhang@gmail.com

    张保勇:南京理工大学自动化学院教授.主要研究方向为多智能体分布式优化与控制, 时滞系统和非线性系统的分析与控制. 本文通信作者. E-mail: baoyongzhang@njust.edu.cn

    袁德明:南京理工大学自动化学院教授. 主要研究方向为多智能体分布式优化与控制, 分布式机器学习. E-mail: dmyuan1012@gmail.com

    徐胜元:南京理工大学自动化学院教授. 主要研究方向为广义系统、时滞系统和非线性系统的分析与控制. E-mail: syxu@njust.edu.cn

Bandit Feedback Optimization Algorithm for Distributed Online Saddle Point Problem

Funds: Supported by National Natural Science Foundation of China (62273181, 62373190, 62221004)
More Information
    Author Bio:

    ZHANG Wen-Tao Ph.D. candidate at the School of Automation, Nanjing University of Science and Technology. His research interest covers multi-agent distributed optimization and saddle point problem

    ZHANG Bao-Yong   Professor at the School of Automation, Nanjing University of Science and Technology. His research interest covers multi-agent distributed optimization and control, and the analysis and control of time-delay systems and nonlinear systems. Corresponding author of this paper

    YUAN De-Ming Professor at the School of Automation, Nanjing University of Science and Technology. His research interest covers multi-agent distributed optimization and control, and distributed machine learning

    XU Sheng-Yuan Professor at the School of Automation, Nanjing University of Science and Technology. His research interest covers the analysis and control of singular systems, time-delay systems, and nonlinear systems

  • 摘要: 本文研究了多智能体时变网络上基于Bandit反馈的分布式在线鞍点问题, 其中每个智能体通过本地计算和局部信息交流去协作最小化全局损失函数. 在Bandit反馈下, 包括梯度在内的损失函数信息是不可用的, 每个智能体仅能获得和使用在某决策或其附近产生的函数值. 为此, 结合单点梯度估计方法和预测映射技术, 提出一种非欧几里得意义上的分布式在线Bandit鞍点优化算法. 以动态鞍点遗憾作为性能指标, 对于一般的凸−凹损失函数, 建立了遗憾上界并在某些预设条件下确保所提算法的次线性收敛. 此外, 考虑到在迭代优化中计算优化子程序的精确解通常较为困难, 进一步扩展一种基于近似计算方法的算法变种, 并严格分析精确度设置对扩展算法遗憾上界的影响. 最后, 通过一个目标跟踪案例对算法的有效性和先进性进行仿真验证.
  • 图  1  算法1的收敛性

    Fig.  1  The convergence of Algorithm 1

    图  2  不同$ B_t $和$ C_t $下算法1的对比结果

    Fig.  2  The comparison results of Algorithm 1 under different $ B_t $ and $ C_t $

    图  3  不同$ d $选择下带有欧氏范数与$ p $范数的算法1对比

    Fig.  3  The comparison of Algorithm 1 using Euclidean norm and $ p $-norm under the different choices of$ d $

    图  4  不同$ \rho_t^x $和$ \rho_t^y $选择下算法1的近似计算变种性能对比

    Fig.  4  The comparison of the low-computation variant of Algorithm 1 with different choices of $ \rho_t^x $ and $ \rho_t^y $

    图  5  算法1和其集中式算法的对比

    Fig.  5  The comparison between Algorithm 1 and its centralized algorithm

    表  1  不同步长和精确度下的遗憾界

    Table  1  The regret bounds under different step sizes and precisions

    $ \alpha_t $ $ \eta_t $ $ \rho_t^x $ $ \rho_t^y $ $ {ESP-Regret}_d^j(T) $
    $ \dfrac{1}{\epsilon_1k} t^{-\frac{3}{4}} $ $ \dfrac{1}{\epsilon_2 k} t^{-\frac{3}{4}} $ $ t^{-1} $ $ t^{-1} $ $ {\cal{O}}(\max \{kT^{\frac{7}{8}}, \; kT^{\frac{3}{4}} V_T \}) $
    $ t^{-\frac{5}{4}} $ $ t^{-\frac{5}{4}} $ $ {\cal{O}}(kT^{\frac{3}{4}} (1+V_T)) $
    $ t^{-2} $ $ t^{-2} $ $ {\cal{O}}(kT^{\frac{3}{4}} (1+V_T)) $
    $ \dagger $ $ \dfrac{1}{\epsilon_1k} t^{-\varpi_1} $ $ \dfrac{1}{\epsilon_2 k} t^{-\varpi_1} $ $ t^{-1} $ $ t^{-1} $ ${\cal{O}}(\max \{kT^{\frac{3}{4}}\sqrt{1+V_T}, \; kT^{\frac{7}{8}} (1+V_T)^{-\frac{1}{4}} \})$
    $ t^{-\frac{5}{4}} $ $ t^{-\frac{5}{4}} $ $ {\cal{O}}(kT^{\frac{3}{4}} \sqrt{1+V_T}) $
    $ t^{-2} $ $ t^{-2} $ $ {\cal{O}}(kT^{\frac{3}{4}} \sqrt{1+V_T}) $
    注: $ \dagger $表示$ \varpi_1=3/4-\log_T {\sqrt{1+V_T}} $.
    下载: 导出CSV
  • [1] Shalev-Shwartz S. Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 2012, 4(2): 107−194
    [2] Hazan E. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2016, 2(3−4): 157−325
    [3] Li X X, Xie L H, Li N. A survey on distributed online optimization and online games. Annual Reviews in Control, 2023, 56: Article No. 100904 doi: 10.1016/j.arcontrol.2023.100904
    [4] 袁德明, 张保勇, 夏建伟. 在线分布式优化研究进展. 聊城大学学报(自然科学版), 2023, 36(5): 1−12

    Yuan De-Ming, Zhang Bao-Yong, Xia Jian-Wei. Recent advance in online distributed optimization. Journal of Liaocheng University (Natural Science Edition), 2023, 36(5): 1−12
    [5] Zinkevich M. Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th International Conference on Machine Learning (ICML). Washington, USA: AAAI Press, 2003. 928−935
    [6] Cao X Y, Liu K J R. Online convex optimization with time-varying constraints and bandit feedback. IEEE Transactions on Automatic Control, 2019, 64(7): 2665−2680 doi: 10.1109/TAC.2018.2884653
    [7] 吴庆涛, 朱军龙, 葛泉波, 张明川. 一种基于条件梯度的加速分布式在线学习算法. 自动化学报, 2024, 50(2): 386−402

    Wu Qing-Tao, Zhu Jun-Long, Ge Quan-Bo, Zhang Ming-Chuan. An accelerated distributed online learning algorithm based on conditional gradient. Acta Automatica Sinica, 2024, 50(2): 386−402
    [8] Yuan D M, Hong Y G, Ho D W C, Xu S Y. Distributed mirror descent for online composite optimization. IEEE Transactions on Automatic Control, 2021, 66(2): 714−729 doi: 10.1109/TAC.2020.2987379
    [9] Yi X L, Li X X, Yang T, Xie L H, Chai T Y, Johansson K H. Distributed bandit online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Automatic Control, 2021, 66(10): 4620−4635 doi: 10.1109/TAC.2020.3030883
    [10] Dixit R, Bedi A S, Rajawat K. Online learning over dynamic graphs via distributed proximal gradient algorithm. IEEE Transactions on Automatic Control, 2021, 66(11): 5065−5079 doi: 10.1109/TAC.2020.3033712
    [11] Shahrampour S, Jadbabaie A. Distributed online optimization in dynamic environments using mirror descent. IEEE Transactions on Automatic Control, 2018, 63(3): 714−725 doi: 10.1109/TAC.2017.2743462
    [12] Beznosikov A, Samokhin V, Gasnikov A. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms. arXiv preprint arXiv: 2010.13112, 2022.
    [13] Raghavan K, Garg S, Jagannathan S, Samaranayake V A. Distributed min-max learning scheme for neural networks with applications to high-dimensional classification. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(10): 4323−4333 doi: 10.1109/TNNLS.2020.3017434
    [14] Akimoto Y, Miyauchi Y, Maki A. Saddle point optimization with approximate minimization oracle and its application to robust berthing control. ACM Transactions on Evolutionary Learning and Optimization, 2022, 2(1): Article No. 2
    [15] Ho-Nguyen N, Kılınç-Karzan F. Exploiting problem structure in optimization under uncertainty via online convex optimization. Mathematical Programming, 2019, 177(1−2): 113−147 doi: 10.1007/s10107-018-1262-8
    [16] Rivera Cardoso A, Wang H, Xu H. The online saddle point problem and online convex optimization with knapsacks. Mathematics of Operations Research, DOI: 10.1287/moor.2018.0330
    [17] Xu Y, Jiang Y, Xie X, Li D Q. An online saddle point optimization algorithm with regularization. IOP Conference Series: Materials Science and Engineering, 2019, 569(5): Article No. 052035
    [18] Wood K R, Dall'Anese E. Online saddle point tracking with decision-dependent data. In: Proceedings of the 5th Annual Learning for Dynamics and Control Conference. Istanbul, Turkey: PMLR, 2023. 1416−1428
    [19] Flaxman A D, Kalai A T, McMahan H B. Online convex optimization in the Bandit setting: Gradient descent without a gradient. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. Vancouver, Canada: Society for Industrial and Applied Mathematics, 2005. 385−394
    [20] Agarwal A, Dekel O, Xiao L. Optimal algorithms for online convex optimization with multi-point Bandit feedback. In: Proceedings of the 23rd Conference on Learning Theory (COLT). Haifa, Israel: Omnipress, 2010. 28−40
    [21] Besbes O, Gur Y, Zeevi A. Non-stationary stochastic optimization. Operations Research, 2015, 63(5): 1227−1244 doi: 10.1287/opre.2015.1408
    [22] Roy A, Chen Y F, Balasubramanian K, Mohapatra P. Online and Bandit algorithms for nonstationary stochastic saddle-point optimization. arXiv preprint arXiv: 1912.01698, 2019.
    [23] Cardoso A R, Abernethy J, Wang H, Xu H. Competing against Nash equilibria in adversarially changing zero-sum games. In: Proceedings of the 36th International Conference on Machine Learning (ICML). Long Beach, USA: PMLR, 2019. 921−930
    [24] Yang T, Yi X L, Wu J F, Yuan Y, Wu D, Meng Z Y, et al. A survey of distributed optimization. Annual Reviews in Control, 2019, 47: 278−305 doi: 10.1016/j.arcontrol.2019.05.006
    [25] Nedic A, Olshevsky A, Ozdaglar A, Tsitsiklis J N. Distributed subgradient methods and quantization effects. In: Proceedings of the 47th IEEE Conference on Decision and Control. Cancun, Mexico: IEEE, 2008. 4177−4184
    [26] 洪奕光, 张艳琼. 分布式优化: 算法设计和收敛性分析. 控制理论与应用, 2014, 31(7): 850−857 doi: 10.7641/CTA.2014.40012

    Hong Yi-Guang, Zhang Yan-Qiong. Distributed optimization: Algorithm design and convergence analysis. Control Theory & Applications, 2014, 31(7): 850−857 doi: 10.7641/CTA.2014.40012
    [27] 王龙, 卢开红, 关永强. 分布式优化的多智能体方法. 控制理论与应用, 2019, 36(11): 1820−1833 doi: 10.7641/CTA.2019.90502

    Wang Long, Lu Kai-Hong, Guan Yong-Qiang. Distributed optimization via multi-agent systems. Control Theory & Applications, 2019, 36(11): 1820−1833 doi: 10.7641/CTA.2019.90502
    [28] 杨涛, 柴天佑. 分布式协同优化的研究现状与展望. 中国科学: 技术科学, 2020, 50(11): 1413−1425

    Yang Tao, Chai Tian-You. Research status and prospects of distributed collaborative optimization. Scientia Sinica Technologica, 2020, 50(11): 1413−1425
    [29] 谢佩, 游科友, 洪奕光, 谢立华. 网络化分布式凸优化算法研究进展. 控制理论与应用, 2018, 35(7): 918−927 doi: 10.7641/CTA.2018.80205

    Xie Pei, You Ke-You, Hong Yi-Guang, Xie Li-Hua. A survey of distributed convex optimization algorithms over networks. Control Theory & Applications, 2018, 35(7): 918−927 doi: 10.7641/CTA.2018.80205
    [30] 衣鹏, 洪奕光. 分布式合作优化及其应用. 中国科学: 数学, 2016, 46(10): 1547−1564

    Yi Peng, Hong Yi-Guang. Distributed cooperative optimization and its applications. Scientia Sinica Mathematica, 2016, 46(10): 1547−1564
    [31] 杨涛, 徐磊, 易新蕾, 张圣军, 陈蕊娟, 李渝哲. 基于事件触发的分布式优化算法. 自动化学报, 2022, 48(1): 133−143

    Yang Tao, Xu Lei, Yi Xin-Lei, Zhang Sheng-Jun, Chen Rui-Juan, Li Yu-Zhe. Event-triggered distributed optimization algorithms. Acta Automatica Sinica, 2022, 48(1): 133−143
    [32] 刘奕葶, 马铭莙, 付俊. 基于有向图的分布式连续时间非光滑耦合约束凸优化分析. 自动化学报, 2024, 50(1): 66−75

    Liu Yi-Ting, Ma Ming-Jun, Fu Jun. Distributed continuous-time non-smooth convex optimization analysis with coupled constraints over directed graphs. Acta Automatica Sinica, 2024, 50(1): 66−75
    [33] 时侠圣, 任璐, 孙长银. 自适应分布式聚合博弈广义纳什均衡算法. 自动化学报, 2024, 50(6): 1210−1220

    Shi Xia-Sheng, Ren Lu, Sun Chang-Yin. Distributed adaptive generalized Nash equilibrium algorithm for aggregative games. Acta Automatica Sinica, 2024, 50(6): 1210−1220
    [34] 陈刚, 李志勇. 集合约束下多智能体系统分布式固定时间优化控制. 自动化学报, 2022, 48(9): 2254−2264

    Chen Gang, Li Zhi-Yong. Distributed fixed-time optimization control for multi-agent systems with set constraints. Acta Automatica Sinica, 2022, 48(9): 2254−2264
    [35] Xu J M, Zhu S Y, Soh Y C, Xie L H. Convergence of asynchronous distributed gradient methods over stochastic networks. IEEE Transactions on Automatic Control, 2018, 63(2): 434−448 doi: 10.1109/TAC.2017.2730481
    [36] Liu C X, Li H P, Shi Y. Resource-aware exact decentralized optimization using event-triggered broadcasting. IEEE Transactions on Automatic Control, 2021, 66(7): 2961−2974 doi: 10.1109/TAC.2020.3014316
    [37] Zhang W T, Shi Y, Zhang B Y, Yuan D M, Xu S Y. Dynamic regret of distributed online saddle point problem. IEEE Transactions on Automatic Control, 2024, 69(4): 2522−2529 doi: 10.1109/TAC.2023.3312033
    [38] Zhao P, Wang G H, Zhang L J, Zhou Z H. Bandit convex optimization in non-stationary environments. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. Palermo, Italy: PMLR, 2020. 1508−1518
    [39] Borwein J M, Lewis A S. Convex Analysis and Nonlinear Optimization: Theory and Examples. New York: Springer, 2000.
    [40] Zhang Y, Ravier R J, Zavlanos M M, Tarokh V. A distributed online convex optimization algorithm with improved dynamic regret. In: Proceedings of the IEEE 58th Conference on Decision and Control (CDC). Nice, France: IEEE, 2019. 2449−2454
    [41] Dai M C, Ho D W C, Zhang B Y, Yuan D M, Xu S Y. Distributed online Bandit linear regressions with differential privacy. Journal of the Franklin Institute, 2023, 360(16): 11736−11759 doi: 10.1016/j.jfranklin.2023.09.001
    [42] Yuan D M, Zhang B Y, Ho D W C, Zheng W X, Xu S Y. Distributed online Bandit optimization under random quantization. Automatica, 2022, 146: Article No. 110590 doi: 10.1016/j.automatica.2022.110590
    [43] Banerjee A, Merugu S, Dhillon I S, Ghosh J. Clustering with Bregman divergences. The Journal of Machine Learning Research, 2005, 6: 1705−1749
    [44] Bauschke H H, Borwein J M. Joint and separate convexity of the Bregman distance. Studies in Computational Mathematics, 2001, 8: 23−36
    [45] Dhillon I S, Tropp J A. Matrix nearness problems with Bregman divergences. SIAM Journal on Matrix Analysis and Applications, 2008, 29(4): 1120−1146 doi: 10.1137/060649021
    [46] Jadbabaie A, Rakhlin A, Shahrampour S, Sridharan K. Online optimization: Competing with dynamic comparators. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA: JMLR.org, 2015. 398−406
    [47] Li J Y, Li C J, Yu W W, Zhu X M, Yu X H. Distributed online Bandit learning in dynamic environments over unbalanced digraphs. IEEE Transactions on Network Science and Engineering, 2021, 8(4): 3034−3047 doi: 10.1109/TNSE.2021.3093536
    [48] Yi X L, Li X X, Xie L H, Johansson K H. Distributed online convex optimization with time-varying coupled inequality constraints. IEEE Transactions on Signal Processing, 2020, 68: 731−746 doi: 10.1109/TSP.2020.2964200
    [49] Xiong M H, Zhang B Y, Ho D W C, Yuan D M, Xu S Y. Event-triggered distributed stochastic mirror descent for convex optimization. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(9): 6480−6491 doi: 10.1109/TNNLS.2021.3137010
    [50] Yu Z, Ho D W C, Yuan D M. Distributed randomized gradient-free mirror descent algorithm for constrained optimization. IEEE Transactions on Automatic Control, 2022, 67(2): 957−964 doi: 10.1109/TAC.2021.3075669
    [51] Ben-Tal A, El Ghaoui L, Nemirovski A. Robust Optimization. Princeton: Princeton University Press, 2009.
    [52] Zhang S Q, Choudhury S, Stich S U, Loizou N. Communication-efficient gradient descent-accent methods for distributed variational inequalities: Unified analysis and local updates. arXiv preprint arXiv: 2306.05100, 2024.
    [53] Thekumparampil K K, He N, Oh S. Lifted primal-dual method for bilinearly coupled smooth minimax optimization. In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. Valencia, Spain: PMLR, 2022. 4281−4308
  • 加载中
计量
  • 文章访问数:  153
  • HTML全文浏览量:  57
  • 被引次数: 0
出版历程
  • 收稿日期:  2024-05-24
  • 录用日期:  2024-08-27
  • 网络出版日期:  2024-10-08

目录

    /

    返回文章
    返回