[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998
[2] Gao Yang, Chen Shi-Fu, Lu Xin. Research on reinforcement learning technology: a review. Acta Automatica Sinica, 2004, 30(1): 86-100 (高阳, 陈世福, 陆鑫. 强化学习研究综述. 自动化学报, 2004, 30(1): 86-100)
[3] Zhao Dong-Bin, Liu De-Rong, Yi Jian-Qiang. An overview on the adaptive dynamic programming based urban city traffic signal optimal control. Acta Automatica Sinica, 2009, 35(6): 676-681 (赵冬斌, 刘德荣, 易建强. 基于自适应动态规划的城市交通信号优化控制方法综述. 自动化学报, 2009, 35(6): 676-681)
[4] Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 2003, 13(4): 341-379
[5] Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359
[6] Taylor M E, Stone P. Transfer learning for reinforcement learning domains: a survey. Journal of Machine Learning Research, 2009, 10: 1633-1685
[7] Wang Hao, Gao Yang, Chen Xing-Guo. Transfer of reinforcement learning: the state of the art. Acta Electronica Sinica, 2008, 36(12A): 39-43 (王皓, 高阳, 陈兴国. 强化学习中的迁移: 方法和进展. 电子学报, 2008, 36(12A): 39-43)
[8] Mahadevan S, Maggioni M. Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 2007, 8: 2169-2231
[9] Chiu C C, Soo V W. Automatic complexity reduction in reinforcement learning. Computational Intelligence, 2010, 26(1): 1-25
[10] Simsek O, Wolfe A P, Barto A G. Identifying useful subgoals in reinforcement learning by local graph partitioning. In: Proceedings of the 22nd International Conference on Machine Learning. New York, USA: ACM, 2005. 816-823
[11] Ferguson K, Mahadevan S. Proto-transfer Learning in Markov Decision Processes Using Spectral Methods. Technical Report, University of Massachusetts, Amherst, USA, 2008
[12] Luo Si-Wei, Zhao Lian-Wei. Manifold learning algorithms based on spectral graph theory. Journal of Computer Research and Development, 2006, 43(7): 1174-1179 (罗四维, 赵连伟. 基于谱图理论的流形学习算法. 计算机研究与发展, 2006, 43(7): 1174-1179)
[13] Shi J B, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(8): 888-905
[14] Lagoudakis M G, Parr R. Least-squares policy iteration. Journal of Machine Learning Research, 2003, 4: 1107-1149
[15] Wang Xue-Song, Tian Xi-Lan, Cheng Yu-Hu, Yi Jian-Qiang. Q-learning system based on cooperative least squares support vector machine. Acta Automatica Sinica, 2009, 35(2): 214-219 (王雪松, 田西兰, 程玉虎, 易建强. 基于协同最小二乘支持向量机的Q学习. 自动化学报, 2009, 35(2): 214-219)
[16] Xu X, Hu D W, Lu X C. Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 2007, 18(4): 973-992
[17] Chung F R K. Spectral Graph Theory. Providence, RI: American Mathematical Society, 1997
[18] Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999, 112(1-2): 181-211