Performance Potential-based Neuro-dynamic Programming for SMDPs

TAN Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan

Acta Automatica Sinica, 2005, 31(4): 642-645

Citation:

TAN Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan. Performance Potential-based Neuro-dynamic Programming for SMDPs. ACTA AUTOMATICA SINICA, 2005, 31(4): 642-645.

1. School of Computer and Information, Hefei University of Technology, Hefei 230009

Corresponding author: TAN Hao

Metrics
- Article views: 2631
- HTML full-text views: 72
- PDF downloads: 1615
- Citations: 0
Publication history
- Received: 2004-01-18
- Revised: 2004-07-14
- Published: 2005-07-20

Abstract: Using the concept of an equivalent infinitesimal generator, an alpha-uniformized Markov chain is defined for a semi-Markov decision process (SMDP) under both the average and discounted criteria. Based on the relations between the performance measures and performance potentials of the two processes, the SMDP can be optimized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is given for the case where each iteration step incurs an approximation error and an improvement error. The results can be extended to Markov systems and are broadly applicable. Finally, a numerical example is provided.

Keywords: Semi-Markov decision processes; performance potentials; neuro-dynamic programming
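
The abstract does not give the construction itself. As a rough illustration of the underlying idea only, the sketch below applies standard uniformization to an ordinary continuous-time Markov chain, then computes the average reward and performance potentials of the resulting discrete-time chain by iteration. The generator `A`, the rewards `f`, the rate `lam`, and all function names are assumptions made for illustration; the paper's alpha-uniformized chain for SMDPs, built from an equivalent infinitesimal generator, is not reproduced here.

```python
# Illustrative sketch: standard uniformization of a continuous-time Markov
# chain. This is NOT the paper's alpha-uniformized chain for SMDPs; all
# names and numbers here are hypothetical.

def uniformize(A, lam):
    """Return P = I + A / lam for a generator A whose rows sum to zero."""
    n = len(A)
    if lam <= 0 or lam < max(-A[i][i] for i in range(n)):
        raise ValueError("lam must be positive and at least the largest exit rate")
    return [[(1.0 if i == j else 0.0) + A[i][j] / lam for j in range(n)]
            for i in range(n)]

def stationary(P, iters=200):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def potentials(P, f, iters=200):
    """Return the average reward eta and potentials g with (I - P) g = f - eta."""
    n = len(P)
    pi = stationary(P, iters)
    eta = sum(pi[i] * f[i] for i in range(n))
    g = [0.0] * n
    for _ in range(iters):
        # Fixed-point iteration g <- (f - eta) + P g, started from zero.
        g = [f[i] - eta + sum(P[i][j] * g[j] for j in range(n))
             for i in range(n)]
    return eta, g

A = [[-2.0, 2.0], [1.0, -1.0]]   # hypothetical two-state generator
f = [3.0, 0.0]                   # hypothetical reward rates
P = uniformize(A, lam=4.0)       # lam above the largest exit rate keeps P aperiodic
eta, g = potentials(P, f)
print(P, eta, g)
```

In this toy case the uniformized chain has the same stationary distribution as the continuous-time chain, so the average reward can be estimated from the discrete-time chain alone. The fixed-point iteration assumes an ergodic, aperiodic `P`; a direct linear solve would be the usual choice for larger models.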
