A New Q Learning Algorithm for Multi-agent Systems


GUO Rui, WU Min, PENG Jun, PENG Jiao, CAO Wei-Hua

Citation:

GUO Rui, WU Min, PENG Jun, PENG Jiao, CAO Wei-Hua. A New Q Learning Algorithm for Multi-agent Systems. ACTA AUTOMATICA SINICA, 2007, 33(4): 367-372. doi: 10.1360/aas-007-0367


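1. School of Information Science and Engineering, Central South University, Changsha 410083
2. Expressway Development Company of Guizhou, Guiyang 550003

Corresponding author: WU Min

CLC number: TP18
Metrics
- Article views: 4218
- HTML full-text views: 187
- PDF downloads: 1997
- Times cited: 0
Publication history
- Received: 2005-11-10
- Revised: 2006-04-28
- Published: 2007-04-20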


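Abstract (Chinese version): For multi-agent systems in nondeterministic Markov environments, a new multi-agent Q-learning algorithm is proposed. The algorithm learns the behavior policies of the other agents from statistics of the joint actions, and uses the full probability distribution of the agents' policy vectors to guarantee selection of the jointly optimal action. The convergence and learning performance of the algorithm are also analyzed. Its application to RoboCup, a multi-agent system, further demonstrates the effectiveness and generalization ability of the algorithm.
- Multi-agent /
- reinforcement learning /
- Q-learning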
Abstract: Due to the presence of other agents, the environment of a multi-agent system (MAS) cannot simply be treated as a Markov decision process (MDP). Current reinforcement learning algorithms, which are based on MDPs, must be adapted before they can be applied to MAS. Based on an agent's independent learning ability, this paper proposes a novel Q-learning algorithm for MAS in which an agent learns the other agents' action policies by observing the joint action. The policies of the other agents are expressed as action probability distribution matrices, and a concise yet useful updating method for these matrices is proposed. The full joint probability of the distribution matrices guarantees that the learning agent chooses its optimal action. The convergence and performance of the proposed algorithm are analyzed theoretically. When applied to RoboCup, the algorithm showed high learning efficiency and good generalization ability. Finally, we briefly point out some directions for multi-agent reinforcement learning.
- Multi-agent systems /
- reinforcement learning /
- Q-learning
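
The idea summarized in the abstract (estimate the other agents' action distributions from observed joint actions, then choose the action with the best expected Q-value under those distributions) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not give the update rule for the distribution matrices, so the sketch assumes a simple frequency count with a uniform prior and a single other agent. The class name `JointActionQLearner`, its method names, and the default values of `alpha`, `gamma`, and `epsilon` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class JointActionQLearner:
    """Q-learner that models ONE other agent by counting its observed actions.

    Assumption: frequency counts stand in for the paper's (unspecified)
    update rule for the action probability distribution matrices.
    """

    def __init__(self, my_actions, other_actions, alpha=0.1, gamma=0.9):
        self.my_actions = list(my_actions)
        self.other_actions = list(other_actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        # q[(state, my_action, other_action)] -> value of the joint action
        self.q = defaultdict(float)
        # counts[state][other_action] -> observations, starting at 1 (uniform prior)
        self.counts = defaultdict(lambda: {b: 1 for b in self.other_actions})

    def other_policy(self, state):
        """Estimated action distribution of the other agent in `state`."""
        c = self.counts[state]
        total = sum(c.values())
        return {b: c[b] / total for b in self.other_actions}

    def expected_value(self, state, a):
        """Value of own action `a`, averaged over the other agent's estimated policy."""
        p = self.other_policy(state)
        return sum(p[b] * self.q[(state, a, b)] for b in self.other_actions)

    def best_action(self, state):
        """Own action with the highest expected joint-action value."""
        return max(self.my_actions, key=lambda a: self.expected_value(state, a))

    def choose_action(self, state, epsilon=0.1):
        """Epsilon-greedy selection over the expected joint-action values."""
        if random.random() < epsilon:
            return random.choice(self.my_actions)
        return self.best_action(state)

    def update(self, state, a, b, reward, next_state):
        """Observe joint action (a, b), update the opponent model and the Q-table."""
        self.counts[state][b] += 1
        next_value = max(self.expected_value(next_state, a2) for a2 in self.my_actions)
        target = reward + self.gamma * next_value
        key = (state, a, b)
        self.q[key] += self.alpha * (target - self.q[key])
```

In use, the agent would call `choose_action(state)` each step, observe the other agent's action `b`, the reward, and the next state, and then call `update(state, a, b, reward, next_state)`. Extending the sketch to several other agents would require one count table per agent and a product over their estimated distributions, which is closer to the "full joint probability" the abstract mentions.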
