Temporal Difference Learning Algorithms for Average Reward Problem
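Abstract: A family of on-line temporal difference (TD) learning algorithms is considered for stochastic dynamic programming (SDP) problems under the average criterion. During learning, the relative value function of the average problem is the target function that the controller has to learn. The proposed algorithms generalize the existing TD(λ) algorithm and the R-learning algorithm.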
Keywords:
- Temporal difference learning
- Reinforcement learning
- Dynamic programming
- Monte Carlo methods
Abstract: This paper presents a family of on-line TD(λ) learning algorithms for average reward stochastic dynamic programming problems. During learning, the relative value function of the average reward problem is the target to be predicted by the agent. The algorithms extend and generalize earlier TD(λ) methods and R-learning algorithms.
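As an illustration only, the following is a minimal sketch of a tabular TD(λ) update for the average reward setting. It is not taken from the paper and may differ from the algorithms it proposes. It assumes accumulating eligibility traces, a running-mean estimate `rho` of the average reward, and a hypothetical two-state toy chain; the names `average_reward_td_lambda` and `two_state_cycle` are invented for this example.

```python
def average_reward_td_lambda(step, n_states, start, n_steps, alpha=0.1, lam=0.5):
    """Estimate a relative value function V and the average reward rho.

    step(s) must return (next_state, reward). This is an illustrative
    sketch, not the algorithm from the paper.
    """
    V = [0.0] * n_states   # relative value estimates (defined up to a constant)
    e = [0.0] * n_states   # accumulating eligibility traces
    rho = 0.0              # running estimate of the average reward
    s = start
    for t in range(1, n_steps + 1):
        s_next, r = step(s)
        # Average reward TD error: the reward is measured relative to rho,
        # and there is no discount factor.
        delta = r - rho + V[s_next] - V[s]
        # Decay all traces by lam, then bump the trace of the visited state.
        for i in range(n_states):
            e[i] *= lam
        e[s] += 1.0
        # Move every state's value along its trace.
        for i in range(n_states):
            V[i] += alpha * delta * e[i]
        # Assumption: rho is tracked as the running mean of observed rewards.
        rho += (r - rho) / t
        s = s_next
    return V, rho


def two_state_cycle(s):
    # Hypothetical toy chain: 0 -> 1 -> 0, with reward 1 when leaving
    # state 0 and reward 3 when leaving state 1.
    return (1, 1.0) if s == 0 else (0, 3.0)


V, rho = average_reward_td_lambda(two_state_cycle, n_states=2, start=0, n_steps=20000)
print(rho, V[0] - V[1])
```

On this toy chain the average reward is 2, and the relative values satisfy V(0) - V(1) = -1. Only differences between entries of `V` are meaningful, since a relative value function is determined up to an additive constant.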