-
摘要: An alpha-uniformized Markov chain is defined by the concept of equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average- and discounted-criteria. According to the relations of their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is shown as there are approximate error and improvement error in each iteration step. The obtained results may be extended to Markov systems, and have much applicability. Finally, a numerical example is provided.Abstract: An alpha-uniformized Markov chain is defined by the concept of equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average- and discounted-criteria. According to the relations of their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is shown as there are approximate error and improvement error in each iteration step. The obtained results may be extended to Markov systems, and have much applicability. Finally, a numerical example is provided.
计量
- 文章访问数: 2621
- HTML全文浏览量: 71
- PDF下载量: 1604
- 被引次数: 0