Coordinate Control of Multiple CSPS System Based on State Aggregation Method

TANG Hao, PEI Rong, ZHOU Lei, TAN Qi

Citation: TANG Hao, PEI Rong, ZHOU Lei, TAN Qi. Coordinate Control of Multiple CSPS System Based on State Aggregation Method. ACTA AUTOMATICA SINICA, 2014, 40(5): 901-908. doi: 10.3724/SP.J.1004.2014.00901

doi: 10.3724/SP.J.1004.2014.00901

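Details
    About the author:

    PEI Rong  Master's student at the School of Computer and Information, Hefei University of Technology. Received a bachelor's degree from the same school in 2010. Research interests: reinforcement learning and production line optimization.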

Funds: 

Supported by National Natural Science Foundation of China (61174186, 71231004), the International Science and Technology Cooperation Program of China (2011FA10440), Program for New Century Excellent Talents in University (NCET-11-0626), and Specialized Research Fund for the Doctoral Program of Higher Education (20130111110007)

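  • Abstract: In a single conveyor-serviced production station (CSPS) system, reinforcement learning can explore the state-action space effectively to search for a near-optimal look-ahead distance control policy. In the coordinated control problem of a multi-station CSPS system, however, the size of the state space grows exponentially (geometrically) with the number of stations and with the buffer capacity. This causes the curse of dimensionality and degrades both the convergence speed and the optimization quality of the learning algorithm. To address this, the paper introduces state aggregation on top of a local information exchange mechanism among stations, reducing the size and complexity of each station's learning space. First, the stations are treated as relatively independent learning agents, each of which considers only the buffer state of its adjacent downstream station and incorporates it into its own performance value learning. Second, the original state space is partitioned into several disjoint subsets, each represented by a single abstract state. A multi-station feedback Q-learning algorithm based on state aggregation is then built on this partition. With this method, the look-ahead distance policy of each station can be learned and optimized over the abstract state space so as to maximize the production rate of the whole system. Simulation results show that, compared with the ordinary multi-station feedback Q-learning method, the state-aggregation-based method not only converges faster but also improves the system production rate to some extent.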
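The state aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' feedback Q-learning algorithm, whose details are not given on this page. It is a plain discounted tabular Q-learning update run on abstract states, and everything named here is an assumption made for illustration: the `aggregate` function, the `AggregatedQLearner` class, the equal-width binning of buffer levels, the representation of a raw state as the pair (own buffer level, downstream buffer level), and the parameter values.

```python
from collections import defaultdict


def aggregate(own_level, downstream_level, capacity, bins=3):
    """Map a raw (own buffer, downstream buffer) pair to an abstract state.

    Assumption: each buffer level in 0..capacity is cut into `bins`
    equal-width intervals, so the (capacity + 1) ** 2 raw pairs collapse
    into at most bins ** 2 disjoint cells.
    """
    def bucket(level):
        return min(level * bins // (capacity + 1), bins - 1)

    return (bucket(own_level), bucket(downstream_level))


class AggregatedQLearner:
    """Tabular Q-learning whose table is indexed by abstract states.

    Hypothetical illustration: actions stand for discretized look-ahead
    distances, and the update is the standard discounted Q-learning rule.
    """

    def __init__(self, actions, capacity, bins=3, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.capacity = capacity
        self.bins = bins
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(float)  # (abstract_state, action) -> value

    def abstract(self, raw_state):
        own, downstream = raw_state
        return aggregate(own, downstream, self.capacity, self.bins)

    def greedy_action(self, raw_state):
        z = self.abstract(raw_state)
        return max(self.actions, key=lambda a: self.q[(z, a)])

    def update(self, raw_state, action, reward, next_raw_state):
        # All raw states in the same cell share one Q-value per action.
        z = self.abstract(raw_state)
        z_next = self.abstract(next_raw_state)
        best_next = max(self.q[(z_next, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(z, action)] += self.alpha * (target - self.q[(z, action)])


learner = AggregatedQLearner(actions=[0, 1, 2], capacity=5)
learner.update((0, 4), 1, 1.0, (1, 5))
```

With a buffer capacity of 5 and 3 bins, the 36 raw pairs collapse into 9 abstract states, so experience gathered in one raw state is shared with every other raw state in the same cell. That sharing is the mechanism by which aggregation shrinks the learning space, at the cost of treating the states inside a cell as indistinguishable.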
Publication history
  • Received: 2013-01-28
  • Revised: 2013-05-11
  • Published: 2014-05-20
