AN OPTIMAL POLICY FOR CONTROLLING THE CONTROLLABLE MARKOV CHAINS
Abstract: This paper studies an optimal control problem for a Markov controlled system. The system is described by a Markov chain whose statistical law depends on a sequence of decisions, which we call a policy. There is a target state with the property that once the system reaches it, the state never changes again. Our aim is to choose a policy that maximizes, for every initial state simultaneously, the probability that the system eventually reaches this target state. We first give a policy-iteration method for finding an optimal policy within the set of stationary policies. We then prove that this policy is also optimal over the larger set containing both stationary and nonstationary policies.
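The abstract does not spell out the iteration itself, so the following is only a generic sketch of policy iteration for maximizing the probability of reaching an absorbing target state, not the paper's own algorithm. It considers policies that fix one action per state. The model layout, the state and action names, and the tolerances are all assumptions made for illustration. Evaluation iterates upward from zero so that states which can never reach the target keep probability 0, and an action is switched only when it is strictly better, which avoids drifting into a closed loop that never reaches the target.

```python
from typing import Dict, Tuple

# Hypothetical model layout: model[state][action] = {next_state: probability}.
Model = Dict[int, Dict[str, Dict[int, float]]]


def evaluate(model: Model, policy: Dict[int, str], target: int,
             tol: float = 1e-12, max_sweeps: int = 100_000) -> Dict[int, float]:
    """Probability of eventually reaching `target` under a fixed policy.

    Starts from zero and sweeps upward, so the result approximates the
    smallest nonnegative solution, which is the reach probability.
    """
    v = {s: 0.0 for s in model}
    v[target] = 1.0
    for _ in range(max_sweeps):
        delta = 0.0
        for s in model:
            if s == target:
                continue
            new = sum(p * v[t] for t, p in model[s][policy[s]].items())
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v


def policy_iteration(model: Model, target: int,
                     eps: float = 1e-9) -> Tuple[Dict[int, str], Dict[int, float]]:
    """Alternate evaluation and strict improvement until no action changes."""
    # Arbitrary initial policy: the alphabetically first action in each state.
    policy = {s: sorted(model[s])[0] for s in model}
    while True:
        v = evaluate(model, policy, target)
        changed = False
        for s in model:
            if s == target:
                continue
            best_a, best_q = policy[s], v[s]
            for a in sorted(model[s]):
                q = sum(p * v[t] for t, p in model[s][a].items())
                if q > best_q + eps:  # switch only on strict improvement
                    best_a, best_q = a, q
            if best_a != policy[s]:
                policy[s] = best_a
                changed = True
        if not changed:
            return policy, v


# Toy example (made up): state 3 is the target, state 2 is a dead end.
model: Model = {
    0: {"direct": {3: 0.5, 2: 0.5}, "via1": {1: 1.0}},
    1: {"loop": {1: 1.0}, "try": {3: 0.9, 2: 0.1}},
    2: {"stay": {2: 1.0}},
    3: {"stay": {3: 1.0}},
}
policy, v = policy_iteration(model, target=3)
```

In this toy model the initial policy uses `direct` in state 0 and `loop` in state 1. The first improvement step replaces `loop` with `try`, and the second routes state 0 through state 1, after which no action changes.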