-
摘要: 针对强化学习系统收敛速度慢的问题, 提出一种适用于连续状态、离散动作空间的基于协同最小二乘支持向量机的Q学习. 该Q学习系统由一个最小二乘支持向量回归机(Least squares support vector regression machine, LS-SVRM)和一个最小二乘支持向量分类机(Least squares support vector classification machine, LS-SVCM)构成. LS-SVRM用于逼近状态--动作对到值函数的映射, LS-SVCM则用于逼近连续状态空间到离散动作空间的映射, 并为LS-SVRM提供实时、动态的知识或建议(建议动作值)以促进值函数的学习. 小车爬山最短时间控制仿真结果表明, 与基于单一LS-SVRM的Q学习系统相比, 该方法加快了系统的学习收敛速度, 具有较好的学习性能.Abstract: In order to solve the problem of slow convergence speed in reinforcement learning systems, a Q learning system based on a cooperative least squares support vector machine for continuous state space and discrete action space is proposed. The proposed Q learning system is composed of a least squares support vector regression machine (LS-SVRM) and a least squares support vector classification machine (LS-SVCM). The LS-SVRM is used to approximate a mapping from a state-action pair to a value function, and the LS-SVCM is used to approximate a mapping from a continuous state space to a discrete action space. In addition, the LS-SVCM supplies the LS-SVRM with dynamic and real-time knowledge or advice (suggested action) to accelerate its learning process. Simulation studies involving a mountain car control illustrate that compared with a Q learning system based on a single LS-SVRM, the proposed Q learning system has a faster convergence speed and a better learning performance.
计量
- 文章访问数: 3022
- HTML全文浏览量: 67
- PDF下载量: 1836
- 被引次数: 0