Inventory Control Using Q-Learning
Abstract: Q-learning, proposed by Watkins, is a reinforcement learning method for solving Markov decision problems with incomplete information. This paper presents a novel exploration strategy and combines it with Q-learning to solve a typical inventory control problem with continuous state and decision spaces. Simulation results show that the resulting control policy closely approximates the optimal policy computed by value iteration when the model is known, which supports the potential of Q-learning for engineering control problems in which the system model is unknown.
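To make the idea concrete, the following is a minimal sketch of tabular Q-learning on a toy inventory problem. It is not the paper's method: the abstract does not describe the novel exploration strategy, so plain epsilon-greedy exploration is used as a stand-in, and the continuous state and decision spaces are replaced by integer stock levels and order quantities. The function name `q_learning_inventory`, the cost coefficients, the uniform demand model, and all hyperparameters are hypothetical choices made only for illustration.

```python
import random


def q_learning_inventory(episodes=2000, steps=50, max_stock=10, max_demand=4,
                         alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy discretized inventory problem (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical cost coefficients; none of these come from the paper.
    hold_cost, shortage_cost, order_cost = 1.0, 5.0, 2.0
    n = max_stock + 1
    # Q[s][a]: estimated discounted reward of ordering a units at stock level s.
    Q = [[0.0] * n for _ in range(n)]

    for _ in range(episodes):
        s = rng.randint(0, max_stock)
        for _ in range(steps):
            # Orders may not push the stock above the warehouse capacity.
            feasible = list(range(max_stock - s + 1))
            if rng.random() < epsilon:
                a = rng.choice(feasible)                   # explore (epsilon-greedy stand-in)
            else:
                a = max(feasible, key=lambda x: Q[s][x])   # exploit current estimate

            # The learner only observes the sampled demand, not its distribution.
            demand = rng.randint(0, max_demand)
            available = s + a
            sold = min(available, demand)
            s_next = available - sold
            reward = -(order_cost * a
                       + hold_cost * s_next
                       + shortage_cost * (demand - sold))

            # Watkins' update: move Q[s][a] toward reward + gamma * max_a' Q[s'][a'].
            best_next = max(Q[s_next][x] for x in range(max_stock - s_next + 1))
            Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
            s = s_next

    # Greedy policy: order quantity with the highest Q-value at each stock level.
    policy = [max(range(max_stock - s + 1), key=lambda x: Q[s][x]) for s in range(n)]
    return Q, policy
```

Calling `Q, policy = q_learning_inventory()` returns the learned table and a greedy order quantity for each stock level. Because the update uses only sampled transitions, the demand distribution never has to be known to the learner, which is the model-free property the abstract emphasizes. Comparing `policy` against a value-iteration solution of the same toy model would mirror the paper's evaluation, though that comparison is not included here.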