A Simulation Optimization Algorithm for CTMDPs Based on Randomized Stationary Policies
Abstract: Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm transforms the continuous-time process into its uniformized Markov chain and estimates the gradient of the average-cost performance measure with respect to the policy parameters from a single sample path of that chain, with the goal of finding a suboptimal randomized stationary policy. The approach is suited to performance optimization of systems with large state spaces. Finally, a numerical example for a controlled Markov process is provided.
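The uniformization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook construction P = I + Q/rate for a fixed policy, where Q is the generator matrix and rate is at least the largest exit rate; the abstract does not give the paper's exact construction, and the function names `uniformize` and `stationary` as well as the 3-state generator are hypothetical, chosen only for illustration.

```python
def uniformize(Q, rate=None):
    """Return P = I + Q / rate, the transition matrix of the uniformized chain.

    Q is a generator matrix (rows sum to 0, off-diagonal entries >= 0).
    rate defaults to the largest exit rate max_i(-Q[i][i]).
    """
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if rate is None:
        rate = max_exit
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and at least the largest exit rate")
    return [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
            for i in range(n)]


def stationary(P, iters=1000):
    """Approximate the stationary distribution of P by power iteration.

    Assumes the chain is irreducible and aperiodic (true for the example below).
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical 3-state generator under some fixed policy (illustration only).
Q = [[-2.0, 2.0, 0.0],
     [1.0, -3.0, 2.0],
     [0.0, 4.0, -4.0]]

P = uniformize(Q)   # discrete-time chain on the same state space
pi = stationary(P)  # also the stationary distribution of the original process
```

Because the uniformized chain shares its stationary distribution with the continuous-time process, long-run averages can be estimated by simulating the discrete-time chain, which is what makes a single-sample-path estimator possible.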