CHEN Xing-Guo, YU Yang

CHEN Xing-Guo, YU Yang. Reinforcement Learning and Its Application to the Game of Go. ACTA AUTOMATICA SINICA, 2016, 42(5): 685-695. doi: 10.16383/j.aas.2016.y000003
Reinforcement Learning and Its Application to the Game of Go

National Natural Science Foundation of China 61403208, 61375061

Science Foundation of Nanjing University of Posts and Telecommunications NY214014

    Author Bio:

    CHEN Xing-Guo Lecturer at the School of Computer Science & Technology and the School of Software, Nanjing University of Posts and Telecommunications. He received his Ph.D. degree from the Department of Computer Science and Technology, Nanjing University in 2014. His research interest covers machine learning and reinforcement learning. E-mail: chenxg@njupt.edu.cn

    Corresponding author: YU Yang Associate professor in the Department of Computer Science and Technology, Nanjing University. He received his Ph.D. degree from the Department of Computer Science and Technology, Nanjing University in 2011. His research interest covers machine learning, evolutionary learning, and reinforcement learning. E-mail: yuy@nju.edu.cn
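  • Abstract: Reinforcement learning is a particular kind of machine learning in which an agent learns a decision policy through autonomous interaction with its environment, so that the long-term cumulative reward received by the policy is maximized. Recently, reinforcement learning has been used successfully to reach human-level performance in domains such as Go and video games, and has attracted wide attention. This paper gives a brief introduction to reinforcement learning, focusing on methods based on function approximation and on their applications in Go and other domains.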
  • Fig. 1  Illustration of reinforcement learning

    Fig. 2  Generalized policy iteration: the value function and the policy interact until both are optimal

    Fig. 3  Monte-Carlo tree search

    Table 1  Updating formulas in classical reinforcement learning

    Algorithm | f | \omega_t | \omega'
    Q-learning | Q | (s_t, a_t) | (s_{t+1}, \arg\max_{a' \in A} Q(s_{t+1}, a'))
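    The Q-learning row of Table 1 can be illustrated with a minimal tabular sketch. It assumes the standard Q-learning update, Q(s_t, a_t) ← Q(s_t, a_t) + α[r + γ Q(ω') − Q(s_t, a_t)], which is not spelled out in this excerpt. The function name, the step size `alpha`, the discount `gamma`, and the toy states and actions are hypothetical choices for illustration only.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (assumed standard form, not taken from the paper).

    Q is a mapping from (state, action) pairs to value estimates.
    """
    # omega' in Table 1: the next state paired with its greedy action.
    greedy_next = max(actions, key=lambda a2: Q[(s_next, a2)])
    target = r + gamma * Q[(s_next, greedy_next)]
    # Move the current estimate a fraction alpha toward the target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Toy usage with hypothetical states and actions.
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 1.0
new_value = q_learning_update(Q, "s0", "left", r=0.5, s_next="s1", actions=actions)
```

    In this toy call the target is 0.5 + 0.9 × 1.0 = 1.4, so the estimate for ("s0", "left") moves from 0 to roughly 0.14. The sketch does not treat terminal states specially; a fuller version would presumably drop the bootstrap term when `s_next` is terminal.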
