

Performance Potential-based Neuro-dynamic Programming for SMDPs

TAN Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan

Citation: TAN Hao, YUAN Ji-Bin, LU Yang, CHENG Wen-Juan. Performance Potential-based Neuro-dynamic Programming for SMDPs. ACTA AUTOMATICA SINICA, 2005, 31(4): 642-645.


More Information
    Corresponding author: TAN Hao
  • Abstract: An alpha-uniformized Markov chain is defined, via the concept of the equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both the average- and discounted-cost criteria. Based on the relations between the performance measures and performance potentials of the two processes, the SMDP can be optimized by simulating this chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is derived for the case where both an approximation error and an improvement error occur at each iteration step. The results extend to Markov systems and are broadly applicable. Finally, a numerical example is provided.
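The abstract above describes the NPI scheme only at a high level. As a rough illustration of the general idea, the following is a minimal, hypothetical Python sketch: it simulates the alpha-uniformized chain of a toy SMDP, estimates performance potentials with a TD(0) critic (a tabular stand-in for the paper's neural-network critic), and performs greedy policy improvement. All model data (rates, P, cost), step sizes, and helper names (td_potential_estimate, improve) are illustrative assumptions and do not come from the paper.

```python
import numpy as np

# Hypothetical sketch of potential-based neuro-policy iteration (NPI) for a
# toy SMDP. Exponential sojourn times are assumed, so the alpha-uniformized
# chain reduces to a discrete-time chain with self-loops. None of the model
# data below comes from the paper.

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 2
ALPHA = 10.0  # uniformization rate; must dominate every sojourn rate below

# Illustrative SMDP data: sojourn rates, embedded transitions, cost rates.
rates = rng.uniform(1.0, 5.0, size=(N_STATES, N_ACTIONS))
P = rng.dirichlet(np.ones(N_STATES), size=(N_ACTIONS, N_STATES))  # P[a][s] is a row
cost = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

def features(s):
    """One-hot features: a tabular critic standing in for the neural network."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

def td_potential_estimate(policy, episodes=200, steps=400, lr=0.05):
    """Estimate performance potentials g of the uniformized chain by TD(0),
    relative to a running average-cost estimate eta (Poisson-equation form)."""
    w = np.zeros(N_STATES)  # critic weights: g(s) is approximated by w @ phi(s)
    eta = 0.0               # running estimate of the average cost
    for _ in range(episodes):
        s = rng.integers(N_STATES)
        for _ in range(steps):
            a = policy[s]
            # Uniformized step: with prob rates/ALPHA follow the embedded
            # chain, otherwise self-loop (the dummy transition of uniformization).
            if rng.random() < rates[s, a] / ALPHA:
                s_next = rng.choice(N_STATES, p=P[a, s])
            else:
                s_next = s
            # cost[s, a] is treated as the per-step cost of the uniformized
            # chain here -- an illustrative simplification.
            eta += 0.01 * (cost[s, a] - eta)
            delta = cost[s, a] - eta + w @ features(s_next) - w @ features(s)
            w += lr * delta * features(s)
            s = s_next
    return w, eta

def improve(policy, g):
    """Greedy policy improvement using the estimated potentials g."""
    new_policy = policy.copy()
    for s in range(N_STATES):
        # The action-independent term g[s] is dropped; it cannot change the argmin.
        q = [cost[s, a] + (rates[s, a] / ALPHA) * (P[a, s] @ g - g[s])
             for a in range(N_ACTIONS)]
        new_policy[s] = int(np.argmin(q))
    return new_policy

policy = np.zeros(N_STATES, dtype=int)
for it in range(5):
    g, eta = td_potential_estimate(policy)
    policy = improve(policy, g)
    print(f"iteration {it}: avg-cost estimate {eta:.4f}, policy {policy}")
```

The self-loop branch is what uniformization buys: the continuous-time process is observed at a fixed rate ALPHA, so ordinary discrete-time TD learning applies. The paper's error bound concerns exactly the two noise sources visible in this sketch: the critic's approximation error in g and the improvement error from acting greedily on that estimate.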
Metrics
  • Article views: 2620
  • Full-text HTML views: 71
  • PDF downloads: 1604
  • Citations: 0
Publication History
  • Received: 2004-01-18
  • Revised: 2004-07-14
  • Published: 2005-07-20
