Review of Robot Manipulation Skill Models (机器人操作技能模型综述)

QIN Fang-Bo, XU De

Cite this article (Chinese): 秦方博, 徐德. 机器人操作技能模型综述. 自动化学报, 2019, 45(8): 1401-1418. doi: 10.16383/j.aas.c180836
Citation: QIN Fang-Bo, XU De. Review of Robot Manipulation Skill Models. ACTA AUTOMATICA SINICA, 2019, 45(8): 1401-1418. doi: 10.16383/j.aas.c180836

Review of Robot Manipulation Skill Models

doi: 10.16383/j.aas.c180836

Funds:

National Natural Science Foundation of China 61873266

National Natural Science Foundation of China 61733004

National Key Research and Development Program of China 2018YFD0400902

More Information
    Author Bio:

    QIN Fang-Bo   Ph. D. candidate at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor degree from the School of Electronic and Information Engineering, Beijing Jiaotong University in 2013. His research interest covers robot vision based perception and control, and precision assembly. E-mail: qinfangbo2013@ia.ac.cn

    Corresponding author: XU De   Professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor and master degrees from Shandong University of Technology in 1985 and 1990, respectively, and received his Ph. D. degree from Zhejiang University in 2001. His research interest covers robotics and automation such as visual measurement, visual control, intelligent control, visual positioning, microscopic vision, and microassembly. Corresponding author of this paper. E-mail: de.xu@ia.ac.cn
  • Abstract: Robot skill learning lies at the intersection of artificial intelligence and robotics. Its goal is to let a robot obtain experience data through interaction with the environment and with users, autonomously acquire and optimize skills from these data via learning from demonstration or reinforcement learning, and apply the skills to subsequent related tasks. Skill learning makes robot task deployment more flexible, rapid, and user-friendly, and gives the robot the ability to improve itself. The skill model is the basis and prerequisite of skill learning and determines the upper bound of skill performance. Increasingly complex and diverse robot manipulation tasks pose many challenges for the design and implementation of manipulation skill models. This paper presents the concept and properties of manipulation skill models, describes four skill representation modes (procedure, motion, policy, and effect prediction), and summarizes their typical applications and future trends.
    1) Associate Editor in charge of this paper: HE Wei
  • Fig. 1  Diagram of the robot manipulation skill model

    Fig. 2  Behavior tree based skill procedure representation [14]
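
    A behavior tree composes a skill procedure from action and condition leaves arranged under sequence and fallback control nodes. The Python sketch below is a minimal, hypothetical illustration of this idea: the pick-and-place leaves, node names, and the SUCCESS/FAILURE/RUNNING convention are illustrative assumptions, not the CoSTAR formulation used in [14].

```python
# Minimal behavior-tree sketch for a hypothetical pick-and-place skill.
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    FAILURE = 1
    RUNNING = 2

class Action:
    """Leaf node: wraps a primitive skill step (e.g. a motion or gripper command)."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return self.fn()

class Sequence:
    """Ticks children in order; stops and reports as soon as one child is not SUCCESS."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Fallback:
    """Ticks children in order; stops and reports as soon as one child is not FAILURE."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

# Hypothetical leaves: in a real system these would call perception and motion skills.
detect = Action("detect_object", lambda: Status.SUCCESS)
relocalize = Action("re_localize", lambda: Status.SUCCESS)
grasp = Action("grasp", lambda: Status.SUCCESS)
place = Action("place", lambda: Status.SUCCESS)

skill = Sequence([Fallback([detect, relocalize]), grasp, place])
print(skill.tick())  # Status.SUCCESS
```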

    Fig. 3  Probabilistic movement primitive (ProMP) based trajectory encoding [31]
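
    A ProMP encodes each demonstrated trajectory as a weight vector over basis functions and then models a Gaussian distribution over those weights. The NumPy sketch below illustrates that encoding on synthetic 1-D demonstrations; the basis count, bandwidth, and ridge regularization are illustrative assumptions, not the settings used in [31].

```python
# ProMP-style trajectory encoding on synthetic 1-D demonstrations.
import numpy as np

T, n_demos, n_basis = 100, 5, 15
z = np.linspace(0, 1, T)                               # phase variable

# Normalized Gaussian basis functions Phi (T x n_basis)
centers = np.linspace(0, 1, n_basis)
h = 0.5 * (centers[1] - centers[0]) ** 2
Phi = np.exp(-(z[:, None] - centers[None, :]) ** 2 / (2 * h))
Phi /= Phi.sum(axis=1, keepdims=True)

# Synthetic demonstrations: noisy reaching motions
rng = np.random.default_rng(0)
demos = np.stack([np.sin(np.pi * z / 2) + 0.02 * rng.standard_normal(T)
                  for _ in range(n_demos)])

# Fit basis weights per demonstration by ridge regression: y ~ Phi w
lam = 1e-6
W = np.stack([np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_basis), Phi.T @ y)
              for y in demos])                         # (n_demos x n_basis)

# The ProMP is the Gaussian over weights: w ~ N(mu_w, Sigma_w)
mu_w = W.mean(axis=0)
Sigma_w = np.cov(W, rowvar=False)

# Mean trajectory and per-step variance induced by the weight distribution
mean_traj = Phi @ mu_w
var_traj = np.einsum('tb,bc,tc->t', Phi, Sigma_w, Phi)
print(mean_traj.shape, var_traj.shape)                 # (100,) (100,)
```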

    Fig. 4  Multivariate dynamical system based framework for motion skill execution, where $q$ and $u$ denote the robot's joint angles and motor commands, and the state variable of the dynamical system is the end-effector position in Cartesian space [61]
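
    In such models the motion is generated by integrating a state-dependent velocity field that converges to the task goal. The sketch below rolls out a small, hand-set mixture of stable linear dynamics of the general form xdot = sum_k h_k(x) A_k (x - x_goal) used in GMM-based stable dynamical systems; the matrices, Gaussian centers, and goal are illustrative assumptions, not parameters learned from demonstrations as in [61].

```python
# Closed-loop rollout of a hand-set, state-dependent dynamical system motion skill.
import numpy as np

x_goal = np.array([0.5, 0.3])                       # attractor (target end-effector position)

# Two local linear dynamics; both symmetric parts are negative definite (stable)
A = [np.array([[-2.0, 0.0], [0.0, -1.0]]),
     np.array([[-1.0, 0.5], [-0.5, -2.0]])]
mu = [np.array([0.0, 0.0]), np.array([0.4, 0.4])]   # Gaussian activation centers
sigma2 = [0.05, 0.05]

def velocity(x):
    """Weighted combination of local linear dynamics converging to x_goal."""
    w = np.array([np.exp(-np.sum((x - m) ** 2) / (2 * s))
                  for m, s in zip(mu, sigma2)])
    w /= w.sum()
    return sum(wk * Ak @ (x - x_goal) for wk, Ak in zip(w, A))

x, dt = np.array([0.0, 0.0]), 0.01
for _ in range(1000):                               # integrate the velocity field
    x = x + dt * velocity(x)
print(np.round(x, 3))                               # converges to approximately x_goal
```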

    Fig. 5  LSTM based assembly policy model [72]
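
    A recurrent policy of this kind maps a short history of force/torque and pose observations to discrete corrective actions during insertion. The PyTorch sketch below shows one plausible structure; the class name, the 12-D observation, the 8 discrete actions, and the layer sizes are illustrative assumptions, not the network reported in [72].

```python
# Recurrent policy sketch: observation history -> discrete action scores.
import torch
import torch.nn as nn

class LSTMAssemblyPolicy(nn.Module):
    def __init__(self, obs_dim=12, hidden_dim=64, n_actions=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)     # action logits or Q-values

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) history of force/torque and pose readings
        out, hidden = self.lstm(obs_seq, hidden)
        return self.head(out[:, -1]), hidden             # decision from the last step

policy = LSTMAssemblyPolicy()
obs = torch.randn(1, 10, 12)                             # 10-step observation history
logits, _ = policy(obs)
action = torch.argmax(logits, dim=-1)
print(logits.shape, action.item())                       # torch.Size([1, 8]), chosen action
```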

    Fig. 6  Deep neural network based end-to-end policy model [80]
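
    An end-to-end policy maps the raw camera image and the robot state directly to motor commands. The PyTorch sketch below uses a small CNN with a spatial softmax that extracts expected feature-point coordinates, in the spirit of the architecture popularized by [80]; all layer sizes and input/output dimensions are illustrative assumptions.

```python
# End-to-end visuomotor policy sketch: image + robot state -> motor command.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisuomotorPolicy(nn.Module):
    def __init__(self, state_dim=7, action_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, stride=2), nn.ReLU(),
        )
        # 32 feature maps -> 32 expected (x, y) feature points = 64 numbers
        self.fc = nn.Sequential(
            nn.Linear(64 + state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def spatial_softmax(self, feat):
        # Expected image coordinates of each feature map's activation.
        b, c, h, w = feat.shape
        prob = F.softmax(feat.view(b, c, -1), dim=-1).view(b, c, h, w)
        xs = torch.linspace(-1, 1, w).view(1, 1, 1, w)
        ys = torch.linspace(-1, 1, h).view(1, 1, h, 1)
        ex = (prob * xs).sum(dim=(2, 3))
        ey = (prob * ys).sum(dim=(2, 3))
        return torch.cat([ex, ey], dim=1)                # (b, 2 * c)

    def forward(self, image, robot_state):
        points = self.spatial_softmax(self.conv(image))
        return self.fc(torch.cat([points, robot_state], dim=1))

policy = VisuomotorPolicy()
u = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 7))
print(u.shape)                                           # torch.Size([1, 7])
```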

    Fig. 7  Typical applications of robot manipulation skill models: (a) peg-in-hole assembly [72]; (b) door opening [8]; (c) resection surgery [95]

  • [1] Hirzinger G, Landzettel K. Sensory feedback structures for robots with supervised learning. In: Proceedings of the 1985 IEEE International Conference on Robotics and Automation. St. Louis, MO, USA: IEEE, 1985. 627-635
    [2] Asada H, Asari Y. The direct teaching of tool manipulation skills via the impedance identification of human motions. In: Proceedings of the 1988 IEEE International Conference on Robotics and Automation. Philadelphia, PA, USA: IEEE, 1988. 1269-1274
    [3] Zeng Yi, Liu Cheng-Lin, Tan Tie-Niu. Retrospect and outlook of brain-inspired intelligence research. Chinese Journal of Computers, 2016, 39(1): 212-223 http://d.old.wanfangdata.com.cn/Periodical/jsjxb201601015
    [4] Tao Jian-Hua, Chen Yun-Ji. Current status and consideration on brain-like computing chip and brain-like intelligent robot. Bulletin of Chinese Academy of Sciences, 2016, 31(7): 803-811 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=zgkxyyk201607009
    [5] Ersen M, Oztop E, Sariel S. Cognition-enabled robot manipulation in human environments: requirements, recent work, and open problems. IEEE Robotics and Automation Magazine, 2017, 24(3): 108-122 doi: 10.1109/MRA.2016.2616538
    [6] Argall B D, Chernova S, Veloso M, Browning B. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 2009, 57(5): 469-483 doi: 10.1016/j.robot.2008.10.024
    [7] Kober J, Bagnell J A, Peters J. Reinforcement learning in robotics: a survey. The International Journal of Robotics Research, 2013, 32(11): 1238-1274 doi: 10.1177/0278364913495721
    [8] Yahya A, Li A, Kalakrishnan M, Chebotar Y, Levine S. Collective robot reinforcement learning with distributed asynchronous guided policy search. In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, BC, Canada: IEEE, 2017. 79-86 https://arxiv.org/pdf/1610.00673.pdf
    [9] Foukarakis M, Leonidis A, Antona M, Stephanidis C. Combining finite state machine and decision-making tools for adaptable robot behavior. In: Proceedings of the 8th International Conference on Universal Access in Human-Computer Interaction. Heraklion, Crete, Greece: Springer, 2014. 625-635 http://hobbit.acin.tuwien.ac.at/publications/HCII2014.pdf
    [10] Zhou H T, Min H S, Lin Y H, Zhang S N. A robot architecture of hierarchical finite state machine for autonomous mobile manipulator. In: Proceedings of the 10th International Conference on Intelligent Robotics and Applications. Wuhan, China: Springer, 2017. 425-436 https://www.researchgate.net/publication/318924520_A_Robot_Architecture_of_Hierarchical_Finite_State_Machine_for_Autonomous_Mobile_Manipulator
    [11] Colledanchise M, Parasuraman R, Ögren P. Learning of behavior trees for autonomous agents. IEEE Transactions on Games, 2019, 11(2): 183-189 doi: 10.1109/TG.2018.2816806
    [12] Guerin K R, Lea C, Paxton C, Hager G D. A framework for end-user instruction of a robot assistant for manufacturing. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 6167-6174 https://jhu.pure.elsevier.com/en/publications/a-framework-for-end-user-instruction-of-a-robot-assistant-for-man-4
    [13] Paxton C, Hundt A, Jonathan F, Guerin K, Hager G D. CoSTAR: instructing collaborative robots with behavior trees and vision. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore, Singapore: IEEE, 2017. 564-571 https://arxiv.org/pdf/1611.06145.pdf
    [14] Paxton C, Jonathan F, Hundt A, Mutlu B, Hager G D. Evaluating methods for end-user creation of robot task plans. In: Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE, 2018. 6086-6092 https://cpaxton.github.io/public/paxton2018evaluating.pdf
    [15] Bagnell J A, Cavalcanti F, Cui L, Galluzzo T, Hebert M, Kazemi M, et al. An integrated system for autonomous robotics manipulation. In: Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vilamoura, Portugal: IEEE, 2012. 2955-2962 https://ieeexplore.ieee.org/abstract/document/6385888
    [16] Colledanchise M, Marzinotto A, Ögren P. Performance analysis of stochastic behavior trees. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation. Hong Kong, China: IEEE, 2014: 3265-3272 http://www.csc.kth.se/~miccol/Michele_Colledanchise/Publications_files/ICRA14_cmo_final.pdf
    [17] Akgun B, Thomaz A. Simultaneously learning actions and goals from demonstration. Autonomous Robots, 2016, 40(2): 211-227 doi: 10.1007/s10514-015-9448-x
    [18] Akgun B, Thomaz A L. Self-improvement of learned action models with learned goal models. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 5259-5264 https://ieeexplore.ieee.org/abstract/document/7354119
    [19] Kroemer O, Daniel C, Neumann G, van Hoof H, Peters J. Towards learning hierarchical skills for multi-phase manipulation tasks. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 1503-1510 https://ieeexplore.ieee.org/document/7139389
    [20] Medina J R, Billard A. Learning stable task sequences from demonstration with linear parameter varying systems and hidden Markov models. In: Proceedings of the 2017 Conference on Robot Learning. Mountain View, California, USA, 2017: 175-184 http://proceedings.mlr.press/v78/medina17a/medina17a.pdf
    [21] Pardowitz M, Knoop S, Dillmann R, Zollner R D. Incremental learning of tasks from user demonstrations, past experiences, and vocal comments. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(2): 322-332 doi: 10.1109/TSMCB.2006.886951
    [22] Nicolescu M N, Mataric M J. Natural methods for robot task learning: instructive demonstrations, generalization and practice. In: Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multiagent Systems. Melbourne, Australia: ACM, 2003. 241-248 https://www.cse.unr.edu/~monica/Research/Publications/agents03.pdf
    [23] Hayes B, Scassellati B. Autonomously constructing hierarchical task networks for planning and human-robot collaboration. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation. Stockholm, Sweden: IEEE, 2016. 5469-5476 https://scazlab.yale.edu/sites/default/files/files/hayes_icra16.pdf
    [24] Ahmadzadeh S R, Kormushev P, Caldwell D G. Interactive robot learning of visuospatial skills. In: Proceedings of the 2013 International Conference on Advanced Robotics. Montevideo, Uruguay: IEEE, 2013: 1-8 https://www.researchgate.net/publication/258832541_Interactive_Robot_Learning_of_Visuospatial_Skills
    [25] Ahmadzadeh S R, Paikan A, Mastrogiovanni F, Natale L, Kormushev P, Caldwell D G, et al. Learning symbolic representations of actions from human demonstrations. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 3801-3808 https://www.researchgate.net/publication/273755287_Learning_Symbolic_Representations_of_Actions_from_Human_Demonstrations
    [26] Dornhege C, Hertle A. Integrated symbolic planning in the tidyup-robot project. In: Proceedings of the 2013 Designing Intelligent Robots: Reintegrating AI: Papers Form the AAAI Spring Symposium. Palo Alto, California, USA: AAAI, 2013. https://www.researchgate.net/publication/289304978_Integrated_symbolic_planning_in_the_tidyup-robot_project
    [27] Beetz M, Mösenlechner L, Tenorth M. CRAM — a cognitive robot abstract machine for everyday manipulation in human environments. In: Proceedings of the 2010 IEEE/ RSJ International Conference on Intelligent Robots and Systems. Taipei, China: IEEE, 2010. 1012-1017
    [28] Tenorth M, Beetz M. KnowRob: a knowledge processing infrastructure for cognition-enabled robots. The International Journal of Robotics Research, 2013, 32(5): 566- 590 doi: 10.1177/0278364913481635
    [29] Bozcuoǧlu A K, Kazhoyan G, Furuta Y, Stelter S, Michael B, Kei O, et al. The exchange of knowledge using cloud robotics. IEEE Robotics and Automation Letters, 2018, 3(2): 1072-1079 doi: 10.1109/LRA.2018.2794626
    [30] Calinon S, Guenter F, Billard A. On learning, representing, and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(2): 286-298 doi: 10.1109/TSMCB.2006.886952
    [31] Maeda G J, Neumann G, Ewerton M, Lioutikov R, Kroemer O, Peters J. Probabilistic movement primitives for coordination of multiple human-robot collaborative tasks. Autonomous Robots, 2017, 41(3): 593-612 doi: 10.1007/s10514-016-9556-2
    [32] Calinon S, Li Z B, Alizadeh T, Tsagarakis N G, Caldwell D G. Statistical dynamical systems for skills acquisition in humanoids. In: Proceedings of the 12th IEEE-RAS International Conference on Humanoid Robots. Osaka, Japan: IEEE, 2012. 323-329 https://www.researchgate.net/publication/234154957_Statistical_dynamical_systems_for_skills_acquisition_in_humanoids
    [33] Huang Y L, Silvério J, Rozo L, Caldwell D G. Generalized task-parameterized skill learning. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Brisbane, QLD, Australia: IEEE, 2018. 5667- 5674 https://www.researchgate.net/publication/318255627_Generalized_Task-Parameterized_Skill_Learning
    [34] Tanwani A K, Calinon S. Learning robot manipulation tasks with task-parameterized semitied hidden semi-Markov model. IEEE Robotics and Automation Letters, 2016, 1(1): 235-242 doi: 10.1109/LRA.2016.2517825
    [35] Silvério J, Rozo L, Calinon S, Caldwell D G. Learning bimanual end-effector poses from demonstrations using task-parameterized dynamical systems. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 464-470 https://ieeexplore.ieee.org/document/7353413
    [36] Calinon S, Bruno D, Caldwell D G. A task-parameterized probabilistic model with minimal intervention control. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation. Hong Kong, China: IEEE, 2014. 3339-3344 https://www.researchgate.net/publication/261722329_A_task-parameterized_probabilistic_model_with_minimal_intervention_control
    [37] Rozo L, Bruno D, Calinon S, Caldwell D G. Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 1024-1030 http://publications.idiap.ch/downloads/papers/2015/Rozo_IROS_2015.pdf
    [38] Paraschos A, Daniel C, Peters J, Neumann G. Probabilistic movement primitives. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: ACM, 2013. 2616-2624 https://www.researchgate.net/publication/258620153_Probabilistic_Movement_Primitives
    [39] Paraschos A, Daniel C, Peters J, Neumann G. Using probabilistic movement primitives in robotics. Autonomous Robots, 2018, 42(3): 529-551 doi: 10.1007/s10514-017-9648-7
    [40] Paraschos A, Rueckert E, Peters J, Neumann G. Probabilistic movement primitives under unknown system dynamics. Advanced Robotics, 2018, 32(6): 297-310 doi: 10.1080/01691864.2018.1437674
    [41] Colomé A, Neumann G, Peters J, Torras C. Dimensionality reduction for probabilistic movement primitives. In: Proceedings of the 2014 IEEE-RAS International Conference on Humanoid Robots. Madrid, Spain: IEEE, 2014. 794-800 https://ieeexplore.ieee.org/document/7041454
    [42] Lioutikov R, Neumann G, Maeda G, Peters J. Learning movement primitive libraries through probabilistic segmentation. The International Journal of Robotics Research, 2017, 36(8): 879-894 doi: 10.1177/0278364917713116
    [43] Schneider M, Ertel W. Robot learning by demonstration with local Gaussian process regression. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. Taipei, China: IEEE, 2010: 255 -260 https://ieeexplore.ieee.org/document/5650949
    [44] Garrido J, Yu W, Soria A. Human behavior learning for robot in joint space. Neurocomputing, 2015, 155: 22-31 doi: 10.1016/j.neucom.2014.12.068
    [45] Schulman J, Ho J, Lee C, Abbeel P. Learning from demonstrations through the use of non-rigid registration. Robotics Research. Cham: Springer International Publishing, 2016. 339-354 https://people.eecs.berkeley.edu/~pabbeel/papers/SchulmanHoLeeAbbeel_ISRR2013.pdf
    [46] Lee A X, Lu H, Gupta A, Levine S, Abbeel P. Learning force-based manipulation of deformable objects from multiple demonstrations. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 177-184 https://people.eecs.berkeley.edu/~pabbeel/papers/2015-ICRA-TPS-LfD-forces.pdf
    [47] Ijspeert A J, Nakanishi J, Schaal S. Learning attractor landscapes for learning motor primitives. In: Proceedings of the 15th International Conference on Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2002. 1547-1554 https://www.researchgate.net/publication/221617765_Learning_Attractor_Landscapes_for_Learning_Motor_Primitives
    [48] Ijspeert A J, Nakanishi J, Schaal S. Movement imitation with nonlinear dynamical systems in humanoid robots. In: Proceedings of the 2002 IEEE International Conference on Robotics and Automation. Washington, DC, USA: IEEE, 2002. 1398-1403 http://www4.cs.umanitoba.ca/~jacky/Robotics/Papers/movement-imitation-with-nonlinear.pdf
    [49] Ijspeert A J, Nakanishi J, Hoffmann H, Pastor P, Schaal S. Dynamical movement primitives: learning attractor models for motor behaviors. Neural Computation, 2013, 25(2): 328-373 doi: 10.1162/NECO_a_00393
    [50] Kober J, Peters J. Policy search for motor primitives in robotics. Machine Learning, 2011, 84(1-2): 171-203 doi: 10.1007/s10994-010-5223-6
    [51] Kober J, Peters J. Learning motor primitives for robotics. In: Proceedings of the 2009 IEEE International Conference on Robotics and Automation. Kobe, Japan: IEEE, 2009. 2112-2118 https://ieeexplore.ieee.org/document/5152577
    [52] Yang C G, Chen C Z, He W, Cui R X, Li Z J. Robot learning system based on adaptive neural control and dynamic movement primitives. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3): 777-787 doi: 10.1109/TNNLS.2018.2852711
    [53] Kormushev P, Calinon S, Caldwell D G. Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input. Advanced Robotics, 2011, 25(5): 581-603 doi: 10.1163/016918611X558261
    [54] Kupcsik A, Deisenroth M P, Peters J, Loh A P, Vadakkepat P. Model-based contextual policy search for data-efficient generalization of robot skills. Artificial Intelligence, 2017, 247: 415-439 doi: 10.1016/j.artint.2014.11.005
    [55] Pastor P, Kalakrishnan M, Chitta S, Theodorou E, Schaal S. Skill learning and task outcome prediction for manipulation. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation. Shanghai, China: IEEE, 2011. 3828-3834 http://www.cs.cmu.edu/~cga/print.2/Pastor_ICRA_2011.pdf
    [56] Stulp F, Theodorou E A, Schaal S. Reinforcement learning with sequences of motion primitives for robust manipulation. IEEE Transactions on Robotics, 2012, 28(6): 1360- 1370 doi: 10.1109/TRO.2012.2210294
    [57] Mülling K, Kober J, Kroemer O, Peters J. Learning to select and generalize striking movements in robot table tennis. The International Journal of Robotics Research, 2013, 32(3): 263-279 doi: 10.1177/0278364912472380
    [58] Colomé A, Torras C. Dimensionality reduction for dynamic movement primitives and application to bimanual manipulation of clothes. IEEE Transactions on Robotics, 2018, 34(3): 602-615 doi: 10.1109/TRO.2018.2808924
    [59] Deniša M, Gams A, Ude A, Petrič T. Learning compliant movement primitives through demonstration and statistical generalization. IEEE/ASME Transactions on Mechatronics, 2016, 21(5): 2581-2594 doi: 10.1109/TMECH.2015.2510165
    [60] Gribovskaya E, Khansari-Zadeh S M, Billard A. Learning non-linear multivariate dynamics of motion in robotic manipulators. The International Journal of Robotics Research, 2011, 30(1): 80-117 doi: 10.1177/0278364910376251
    [61] Khansari-Zadeh S M, Billard A. Learning stable nonlinear dynamical systems with Gaussian mixture models. IEEE Transactions on Robotics, 2011, 27(5): 943-957 doi: 10.1109/TRO.2011.2159412
    [62] Shukla A, Billard A. Augmented-SVM for gradient observations with application to learning multiple-attractor dynamics. Support Vector Machines Applications. Cham: Springer International Publishing, 2014. 1-21 https://www.researchgate.net/publication/287723495_Augmented-SVM_for_Gradient_Observations_with_Application_to_Learning_Multiple-Attractor_Dynamics
    [63] Neumann K, Steil J J. Learning robot motions with stable dynamical systems under diffeomorphic transformations. Robotics and Autonomous Systems, 2015, 70: 1-15 doi: 10.1016/j.robot.2015.04.006
    [64] Duan J H, Ou Y S, Hu J B, Wang Z Y, Jin S K, Xu C. Fast and stable learning of dynamical systems based on extreme learning machine. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019, 49(6): 1175-1185 doi: 10.1109/TSMC.2017.2705279
    [65] Shukla A, Billard A. Coupled dynamical system based arm-hand grasping model for learning fast adaptation strategies. Robotics and Autonomous Systems, 2012, 60(3): 424-440 doi: 10.1016/j.robot.2011.07.023
    [66] Ureche A L P, Umezawa K, Nakamura Y, Billard A. Task parameterization using continuous constraints extracted from human demonstrations. IEEE Transactions on Robotics, 2015, 31(6): 1458-1471 doi: 10.1109/TRO.2015.2495003
    [67] Gams A, Nemec B, Ijspeert A J, Ude A. Coupling movement primitives: interaction with the environment and bimanual tasks. IEEE Transactions on Robotics, 2014, 30(4): 816-830 doi: 10.1109/TRO.2014.2304775
    [68] Bruno D, Calinon S, Caldwell D G. Learning autonomous behaviours for the body of a flexible surgical robot. Autonomous Robots, 2017, 41(2): 333-347 doi: 10.1007/s10514-016-9544-6
    [69] Sung J, Selman B, Saxena A. Learning sequences of controllers for complex manipulation tasks. In: Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA: JMLR, 2013. https://www.researchgate.net/publication/241279096_Learning_Sequences_of_Controllers_for_Complex_Manipulation_Tasks
    [70] Chernova S, Veloso M. Confidence-based policy learning from demonstration using Gaussian mixture models. In: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems. Honolulu, Hawaii: ACM, 2007. Article No. 233 https://wenku.baidu.com/view/818f5d134431b90d6c85c79d.html
    [71] Edmonds M, Gao F, Xie X, Liu H X, Qi S Y, Zhu Y X, et al. Feeling the force: integrating force and pose for fluent discovery through imitation learning to open medicine bottles. In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, BC, Canada: IEEE, 2017. 3530-3537
    [72] Inoue T, De Magistris G, Munawar A, Yokoya T, Tachibana R. Deep reinforcement learning for high precision assembly tasks. In: Proceedings of the 2017 IEEE/ RSJ International Conference on Intelligent Robots and Systems. Vancouver, BC, Canada: IEEE, 2017. 819-825 https://arxiv.org/pdf/1708.04033.pdf
    [73] Deisenroth M P, Rasmussen C E, Fox D. Learning to control a low-cost manipulator using data-efficient reinforcement learning. In: Proceedings of the 2011 Robotics: Science and Systems Ⅶ. Los Angeles, CA, USA: University of Southern California, 2011. 57-64 https://rse-lab.cs.washington.edu/postscripts/robot-rl-rss-11.pdf
    [74] Deisenroth M P, Fox D, Rasmussen C E. Gaussian processes for data-efficient learning in robotics and control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(2): 408-423 doi: 10.1109/TPAMI.2013.218
    [75] Levine S, Wagener N, Abbeel P. Learning contact-rich manipulation skills with guided policy search. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 156-163 https://ieeexplore.ieee.org/document/7138994
    [76] Han W Q, Levine S, Abbeel P. Learning compound multi-step controllers under unknown dynamics. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 6435-6442 http://rll.berkeley.edu/reset_controller/reset_controller.pdf
    [77] Finn C, Tan X Y, Duan Y, Darrell T, Levine S, Abbeel P. Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv: 1509.06113v1, 2015. https://arxiv.org/abs/1509.06113v1
    [78] Lee J, Ryoo M S. Learning robot activities from first-person human videos using convolutional future regression. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI, USA: IEEE, 2017. 472-473 https://arxiv.org/pdf/1703.01040.pdf
    [79] Gu S X, Holly E, Lillicrap T, Levine S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore, Singapore: IEEE, 2017. 3389-3396 https://arxiv.org/pdf/1610.00633.pdf
    [80] Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 2016, 17(1): 1334-1373 https://arxiv.org/pdf/1504.00702v1.pdf
    [81] Sasaki K, Ogata T. End-to-end visuomotor learning of drawing sequences using recurrent neural networks. In: Proceedings of the 2018 International Joint Conference on Neural Networks. Rio de Janeiro, Brazil: IEEE, 2018. 1-2 https://waseda.pure.elsevier.com/en/publications/end-to-end-visuomotor-learning-of-drawing-sequences-using-recurre
    [82] Kase K, Suzuki K, Yang P C, Mori H, Ogata T. Put-in-box task generated from multiple discrete tasks by a humanoid robot using deep learning. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Brisbane, QLD, Australia: IEEE, 2018. 6447-6452 https://www.researchgate.net/publication/321283962_Put-In-Box_task_generated_from_multiple_discrete_tasks_by_humanoid_robot_using_deep_learning
    [83] Wolpert D M, Diedrichsen J, Flanagan J R. Principles of sensorimotor learning. Nature Reviews Neuroscience, 2011, 12(12): 739-751 doi: 10.1038/nrn3112
    [84] Ghadirzadeh A, Maki A, Kragic D, Björkman M. Deep predictive policy training using reinforcement learning. In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vancouver, BC, Canada: IEEE, 2017. 2351-2358 https://arxiv.org/pdf/1703.00727.pdf
    [85] Schou C, Andersen R S, Chrysostomou D, Bogh S, Madsen O. Skill-based instruction of collaborative robots in industrial settings. Robotics and Computer-Integrated Manufacturing, 2018, 53: 72-80 doi: 10.1016/j.rcim.2018.03.008
    [86] Bekiroglu Y, Laaksonen J, Jorgensen J A, Kyrki V. Assessing grasp stability based on learning and haptic data. IEEE Transactions on Robotics, 2011, 27(3): 616-629 doi: 10.1109/TRO.2011.2132870
    [87] Dang H, Allen P K. Learning grasp stability. In: Proceedings of the 2012 IEEE International Conference on Robotics and Automation. Saint Paul, MN, USA: IEEE, 2012. 2392-2397 https://www.researchgate.net/publication/260289014_Learning_grasp_stability
    [88] Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research, 2018, 37(4-5): 421- 436 doi: 10.1177/0278364917710318
    [89] Finn C, Goodfellow I, Levine S. Unsupervised learning for physical interaction through video prediction. In: Proceedings of the 30th Neural Information Processing Systems. Barcelona, Spain: MIT Press, 2016: 64-72 https://arxiv.org/pdf/1605.07157.pdf
    [90] Finn C, Levine S. Deep visual foresight for planning robot motion. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore, Singapore: IEEE, 2017. 2786-2793 https://arxiv.org/abs/1610.00696
    [91] Petrič T, Gams A, Colasanto L, Ijspeert A J, Ude A. Accelerated sensorimotor learning of compliant movement primitives. IEEE Transactions on Robotics, 2018, 34(6): 1636- 1642 doi: 10.1109/TRO.2018.2861921
    [92] Huang P C, Hsieh Y H, Mok A K. A skill-based programming system for robotic furniture assembly. In: Proceedings of the 16th IEEE International Conference on Industrial Informatics. Porto, Portugal: IEEE, 2018. 355-361
    [93] Qin F, Xu D, Zhang D, Li Y. Robotic skill learning for precision assembly with microscopic vision and force feedback. IEEE/ASME Transactions on Mechatronics, 2019, 24(3): 1117-1128 https://ieeexplore.ieee.org/document/8681089
    [94] Ni Zi-Qiang, Wang Tian-Miao, Liu Da. Vision guide based teaching programming for industrial robot. Journal of Beijing University of Aeronautics and Astronautics, 2016, 42(3): 562-568 http://d.old.wanfangdata.com.cn/Periodical/bjhkhtdxxb201603018
    [95] Hu D Y, Gong Y Z, Hannaford B, Seibel E J. Semi-autonomous simulated brain tumor ablation with RavenⅡ surgical robot using behavior tree. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 3868-3875 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4578323/
    [96] Ewerton M, Neumann G, Lioutikov R, Amor H B, Peters J, Maeda G, et al. Learning multiple collaborative tasks with a mixture of interaction primitives. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 1535-1542
    [97] Silvério J, Calinon S, Rozo L, Caldwell D G. Bimanual skill learning with pose and joint space constraints. In: Proceedings of the 2018 IEEE/RAS International Conference on Humanoid Robots. Beijing, China: IEEE, 2018. 153-159 http://publications.idiap.ch/downloads/papers/2018/Silverio_HUMANOIDS_2018.pdf
    [98] Figueroa N, Ureche A L P, Billard A. Learning complex sequential tasks from demonstration: a pizza dough rolling case study. In: Proceedings of the 11th ACM/IEEE International Conference on Human-Robot Interaction. Christchurch, New Zealand: IEEE, 2016. 611-612 http://lasa.epfl.ch/publications/uploadedFiles/p611-figueroa.pdf
    [99] Calinon S, Sardellitti I, Caldwell D G. Learning-based control strategy for safe human-robot interaction exploiting task and robot redundancies. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. Taipei, China: IEEE, 2010. 249-254 http://vigir.missouri.edu/~gdesouza/Research/Conference_CDs/IEEE_IROS_2010/data/papers/1177.pdf
    [100] Ureche A L P, Billard A. Analyzing human behavior and bootstrapping task constraints from kinesthetic demonstrations. In: Proceedings of the 10th Annual ACM/IEEE International Conference on Human-Robot Interaction Extended Abstracts. Portland, Oregon, USA: ACM, 2015: 199-200 http://lasa.epfl.ch/publications/uploadedFiles/p199-ureche.pdf
    [101] Muhlig M, Gienger M, Hellbach S, Steil J J, Goerick C. Task-level imitation learning using variance-based movement optimization. In: Proceedings of the 2009 IEEE International Conference on Robotics and Automation. Kobe, Japan: IEEE, 2009. 1177-1184 https://www.researchgate.net/publication/224557223_Task-level_imitation_learning_using_variance-based_movement_optimization
    [102] Gupta A, Eppner C, Levine S, Abbeel P. Learning dexterous manipulation for a soft robotic hand from human demonstrations. In: Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, South Korea: IEEE, 2016. 3786-3793 https://arxiv.org/pdf/1603.06348.pdf
    [103] Peters J, Schaal S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 2008, 21(4): 682- 697 doi: 10.1016/j.neunet.2008.02.003
    [104] Xu W J, Chen J, Lau H Y K, Ren H L. Automate surgical tasks for a flexible serpentine manipulator via learning actuation space trajectory from demonstration. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation. Stockholm, Sweden: IEEE, 2016. 4406-4413 https://ieeexplore.ieee.org/document/7487640
    [105] Murali A, Sen S, Kehoe B, Garg A, McFarland S, Patil S, et al. Learning by observation for surgical subtasks: multilateral cutting of 3D viscoelastic and 2D orthotropic tissue phantoms. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA, USA: IEEE, 2015. 1202-1209 https://people.eecs.berkeley.edu/~pabbeel/papers/2015-ICRA-LBO-DVRK.pdf
    [106] Ureche L P, Billard A. Constraints extraction from asymmetrical bimanual tasks and their use in coordinated behavior. Robotics and Autonomous Systems, 2018, 103: 222-235 doi: 10.1016/j.robot.2017.12.011
    [107] Salehian S S M, Khoramshahi M, Billard A. A dynamical system approach for softly catching a flying object: theory and experiment. IEEE Transactions on Robotics, 2016, 32(2): 462-471 doi: 10.1109/TRO.2016.2536749
    [108] Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In: Proceedings of the 2nd Conference on Robot Learning. Zurich, Switzerland: PMLR, 2018. 651-673
    [109] Deng J, Dong W, Socher R, Li L J, Li K, Li F F. Imagenet: a large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE, 2009. 248-255 http://image-net.org/papers/imagenet_cvpr09.pdf
    [110] Du Z H, He L, Chen Y N, Xiao Y, Gao P, Wang T Z. Robot cloud: bridging the power of robotics and cloud computing. Future Generation Computer Systems, 2015, 21(4): 301-312 https://www.sciencedirect.com/science/article/pii/S0167739X16000042
    [111] Kehoe B, Patil S, Abbeel P, Goldberg K. A survey of research on cloud robotics and automation. IEEE Transactions on Automation Science and Engineering, 2015, 12(2): 398-409 doi: 10.1109/TASE.2014.2376492
    [112] Hu G Q, Tay W P, Wen Y G. Cloud robotics: architecture, challenges and applications. IEEE Network, 2012, 26(3): 21-28 doi: 10.1109/MNET.2012.6201212
    [113] Hunziker D, Gajamohan M, Waibel M, D'Andrea R. Rapyuta: the RoboEarth cloud engine. In: Proceedings of the 2013 IEEE International Conference on Robotics and Automation. Karlsruhe, Germany: IEEE, 2013. 438-444
    [114] Saxena A, Jain A, Sener O, Jami A, Misra D K, Koppula H S. Robobrain: large-scale knowledge engine for robots. arXiv: 1412.0691, 2014. https://arxiv.org/pdf/1412.0691.pdf
    [115] Wang Fei-Yue. Knowledge robot and Industry 5.0. In: Proceedings of the 2015 China National Robotics Development Forum. Beijing, China: Chinese Association of Automation, 2015.
    [116] Bai Tian-Xiang, Wang Shuai, Shen Zhen, Cao Dong-Pu, Zheng Nan-Ning, Wang Fei-Yue. Parallel robotics and parallel unmanned systems: framework, structure, process, platform and applications. Acta Automatica Sinica, 2017, 43(2): 161-175 http://www.aas.net.cn/CN/abstract/abstract18998.shtml
    [117] Wang Fei-Yue. Software-defined systems and knowledge automation: a parallel paradigm shift from Newton to Merton. Acta Automatica Sinica, 2015, 41(1): 1-8 doi: 10.3969/j.issn.1003-8930.2015.01.001
Publication history
  • Received: 2018-12-17
  • Accepted: 2019-03-19
  • Published: 2019-08-20
