
机器人运动轨迹的模仿学习综述

黄艳龙 徐德 谭民

引用本文: 黄艳龙, 徐德, 谭民. 机器人运动轨迹的模仿学习综述. 自动化学报, 2022, 48(2): 315−334 doi: 10.16383/j.aas.c210033
Citation: Huang Yan-Long, Xu De, Tan Min. On imitation learning of robot movement trajectories: A survey. Acta Automatica Sinica, 2022, 48(2): 315−334 doi: 10.16383/j.aas.c210033

机器人运动轨迹的模仿学习综述

doi: 10.16383/j.aas.c210033
基金项目: 国家自然科学基金(61873266)资助
详细信息
    作者简介:

    黄艳龙:英国利兹大学计算机系助理教授. 主要研究方向为模仿学习, 强化学习和运动规划. 本文通信作者. E-mail: y.l.huang@leeds.ac.uk

    徐德:中国科学院自动化研究所研究员. 1985年、1990年获得山东工业大学学士、硕士学位. 2001年获得浙江大学博士学位. 主要研究方向为机器人视觉测量, 视觉控制, 智能控制, 视觉定位, 显微视觉, 微装配. E-mail: de.xu@ia.ac.cn

    谭民:中国科学院自动化研究所复杂系统管理与控制国家重点实验室研究员. 主要研究方向为机器人系统和智能控制系统. E-mail: min.tan@ia.ac.cn

On Imitation Learning of Robot Movement Trajectories: A Survey

Funds: Supported by National Natural Science Foundation of China (61873266)
More Information
    Author Bio:

    HUANG Yan-Long University academic fellow at the School of Computing, University of Leeds, Leeds, UK. His interest covers imitation learning, reinforcement learning and motion planning. Corresponding author of this paper

    XU De Professor at the Institute of Automation, Chinese Academy of Sciences. He received his bachelor and master degrees from Shandong University of Technology in 1985 and 1990, respectively. He received his Ph. D. degree from Zhejiang University in 2001. His research interest covers robotics and automation, such as visual measurement, visual control, intelligent control, visual positioning, microscopic vision, and microassembly

    TAN Min Professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interest covers robotics and intelligent control systems

  • 摘要: 作为机器人技能学习中的一个重要分支, 模仿学习近年来在机器人系统中得到了广泛的应用. 模仿学习能够将人类的技能以一种相对直接的方式迁移到机器人系统中, 其思路是先从少量示教样本中提取相应的运动特征, 然后将该特征泛化到新的情形. 本文针对机器人运动轨迹的模仿学习进行综述. 首先详细解释模仿学习中的技能泛化、收敛性和外插等基本问题; 其次从原理上对动态运动基元、概率运动基元和核化运动基元等主要的模仿学习算法进行介绍; 然后深入地讨论模仿学习中姿态和刚度矩阵的学习问题、协同和不确定性预测的问题以及人机交互中的模仿学习等若干关键问题; 最后本文探讨了结合因果推理的模仿学习等几个未来的发展方向.
  • 语音增强的主要目标是从含噪语音中提取原始纯净语音信号, 通过抑制或分离噪声来提升语音感知质量与可懂度, 在语音信号通信、助听器和自动语音识别等领域有着广泛的应用. 经过几十年的发展, 众多语音增强算法相继被提出, 经典的语音增强技术主要包括谱减法、维纳滤波法、基于统计模型的方法以及基于子空间的方法等, 这些方法往往基于噪声平稳或缓变的假设, 在高度非平稳的噪声情况下增强效果会急剧恶化[1-2]. 深度学习[3]的兴起以及在声学建模领域的成功应用, 为解决复杂环境下的语音增强提供了思路. 根据网络学习的目标不同, 基于神经网络的语音增强主要分为基于时频掩蔽的方法与基于特征映射的方法. 基于时频掩蔽的方法将纯净语音与噪声之间的相互关系作为学习目标, 将得到的时频掩蔽估计作用于含噪语音上, 并经由逆变换技术合成增强语音的时域波形. Wang等[4]将深度神经网络(Deep neural networks, DNN)引入语音分离与降噪领域, 通过前馈DNN估计理想二值掩蔽(Ideal binary mask, IBM); 随后, Narayanan等[5]提出在梅尔谱域估计理想浮值掩蔽(Ideal ratio mask, IRM), 在一定程度上提高了语音识别的鲁棒性; Williamson等[6]也提出复数理想浮值掩蔽(Complex ideal ratio mask, cIRM), 并使用DNN同时估计cIRM的实部和虚部, 显著提高了语音的可懂度. 基于特征映射的方法利用神经网络学习含噪语音和纯净语音之间的复杂映射关系. Xu等[7]把深层神经网络视为一个回归模型, 使用带受限玻尔兹曼机(Restricted Boltzmann machine, RBM)预训练的DNN将含噪语音的对数功率谱映射到纯净语音的对数功率谱上; Park等[8]提出冗余卷积编解码网络, 通过删去池化层、加入跳跃连接的方式优化训练过程, 将卷积神经网络(Convolutional neural network, CNN)应用于频谱映射. 这两类方法通常需要将时域波形变换到时频域处理信号的幅度谱或功率谱, 往往会忽略掉语音信号中的相位信息.

    基于端到端的语音增强方法不依赖于频域表示, 可以有效地利用时域信号的相位信息, 避免了信号在时域和时频域之间来回切换, 简化处理流程. Qian等[9]考虑到WaveNet[10]对语音波形的强大建模能力, 提出将语音先验分布引入到WaveNet框架进行语音增强; Rethage等[11]也在WaveNet的基础上开展语音增强研究, 通过非因果的(Non-causal)扩张卷积来预测目标, 在主观评价指标上取得了比维纳滤波更好的效果. Pascual等[12]将生成对抗网络[13-14] (Generative adversarial nets, GAN)引入语音增强领域并提出SEGAN (Speech enhancement generative adversarial network), 并用其对时域波形信号直接处理, 取得了一定的增强效果, 但是在客观评价指标语音质量感知评价(Perceptual evaluation of speech quality, PESQ)上略低于维纳滤波. Fu等[15-16]提出全卷积神经网络并将其作用于整句语音波形信号, 提升了语音增强的性能. 这些基于端到端的方法都是直接将一维时域波形映射到目标语音, 然而时域波形信号本身并不能表现出明显的特征结构信息, 直接对时域信号建模比较困难, 而且低信噪比环境下信号更复杂, 建模难度会进一步提高. 有学者考虑将神经网络作为前端短时傅立叶变换(Short-time Fourier transform, STFT)替代方案[17-19], 我们在其基础上修改扩展, 提出了一个时频分析网络来模拟STFT变换过程的基函数, 将一维时域信息映射到一个类似于时频表示的高维空间中以获取更多的信息; 相比于常见的神经网络方法中使用时频域幅度谱或功率谱值的方式, 时频分析网络能更充分地利用输入信号中的相位信息.

    语音和噪声信号在时域相邻帧以及频域相邻频带间具有很强的相关性, 这种时频域的局部相关性与图像中的相邻像素间的相关性非常相似. 由于在语音增强领域使用卷积神经网络可以获得与深度神经网络和循环神经网络(Recurrent neural network, RNN)相当或更好的增强效果[8, 20-22], 为进一步提高语音增强的性能, 本文考虑使用卷积神经网络中的一种重要网络 — RefineNet[23]来进行端到端的语音增强. 它是一个通用的多路径优化网络, 通过显式利用下采样过程中的所有可用信息, 并使用较长范围的残差连接来实现高分辨率预测. 通过这种方式, 可以利用前期卷积的细粒度特性捕获更深层的高级特征; RefineNet的各个组件使用了带有Identity mappings[24]的残差连接, 这样梯度就可以通过不同跨度的残差连接直接传播, 从而实现高效的端到端训练.

    在语音增强领域的神经网络训练过程中, 通常将均方误差(Mean square error, MSE)作为损失函数, 而在客观评价中往往使用PESQ或STOI等评价指标, 这种损失函数与评价指标之间的差异性并不能保证训练后的模型在应用中能够提供最优的性能; Fu等[16]和Zhao等[25]将STOI评价指标融入到了损失函数中, 一定程度上提高了语音增强性能. 受此启发, 我们提出将STOI和SDR同时融入到损失函数中, 并且采用多目标联合优化策略, 利用神经网络根据不同目标之间的共性和差异性建模.

    本文提出了基于RefineNet的端到端语音增强模型(RefineNet-based speech enhancement, RNSE), 首先利用时频分析网络模仿STFT, 学习时域波形在模拟的二维时频空间表示; 然后利用RefineNet整合不同大小特征图的能力, 对不同粒度的模拟时频空间特征进行综合分析; 最后通过时频分析网络逆处理得到增强语音的估计. 在训练阶段, 我们将STOI与SDR评价指标融入到损失函数中进行联合优化, 从而得到更好的增强效果.

    RNSE模型的网络结构由时频分析网络TFANet (Time-frequency analysis network)和RefineNet两部分构成, 其结构如图1所示. TFANet是一个用于模拟短时傅里叶变换及其逆变换过程的时频分析神经网络: 在RNSE前端, TFANet将一维时域语音信号映射为二维特征表示; 在RNSE后端, TFANet将神经网络输出的增强后特征图重构成一维时域语音信号. RefineNet是RNSE的主体部分, 用于对特征图进行精炼分析, 并与TFANet结合, 实现从时域含噪语音信号到时域纯净语音信号的直接映射.

    图 1  RNSE模型结构图
    Fig. 1  The diagram for RNSE architecture

    Venkataramani等在语音分离任务中提出了实值转换方法[19], 通过卷积和平滑操作对原始时域波形进行预处理, 然后输入到后续神经网络中进行增强. 为了充分保留卷积结果中的原始信息, 我们去除了平滑操作, 提出了时频分析网络TFANet. 该网络包含编码分析阶段和解码生成阶段, 在编码分析阶段将时域信号处理为二维特征图表示并输入到RefineNet中, 在解码生成阶段将RefineNet输出的增强语音的特征图重构成一维语音信号. 假设含噪语音信号为s[n], 那么STFT计算可表示为:

    $$ {{\boldsymbol{x}}_t}[f] = \sum\limits_{i = 0}^{N - 1} {\boldsymbol{s}} [tH + i] \cdot {\boldsymbol{w}}[i] \cdot {{\boldsymbol{b}}_f}[i] $$ (1)

    式(1)中, xt[f]是语音在第t帧第f频点的STFT结果, 最终组成一个T×F的矩阵, N是每帧的采样点个数, H是相邻帧间的位移, w是窗函数系数, bf[i]是对应的STFT变换系数. 令${\boldsymbol{k}}_f = {\boldsymbol{w}} \cdot {\boldsymbol{b}}_f$, 可以将式(1)变换成卷积形式:

    $$ {{\boldsymbol{x}}_t}[f] = \sum\limits_{i = 0}^{N - 1} {\boldsymbol{s}} [tH + i] \cdot {{\boldsymbol{k}}_f}[i] $$ (2)

    TFANet通过一个卷积层来模拟实现上式的计算过程, 其中包含F个大小为N且系数为kf的卷积核, 我们将卷积步长设为H, 输出设为x. 通过实验调参, 本文将H设置为64, T、F、N均为512, 这层卷积的输出为512×512的二维矩阵. 在非端到端的方法中, 通常将时域语音信号通过STFT处理为幅度谱表示, 经由模型增强后, 再结合原始含噪语音的相位谱合成增强后的时域语音波形. 如图1所示, 类比这种语音增强过程, 我们通过对x取绝对值|x|来模拟STFT的幅度谱, 然后将|x|作为特征图输入到RefineNet中学习含噪语音到纯净语音的复杂映射关系. 这里RNSE模型保留了x的正负号表示p, 它是对原始信号相位的模拟, 用于增强语音的重构.
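    作为补充说明, 下面用NumPy给出式(2)卷积实现方式的一个极简示意(非论文源码; 帧长、帧移与核个数均为示意值, 并非正文中H=64、T、F、N均为512的配置; 基函数取余弦基以模拟STFT):

```python
import numpy as np

# 示意参数(非正文配置): 帧长N, 帧移H, "频点"数F即卷积核个数
N, H, F = 64, 16, 32
rng = np.random.default_rng(0)
s = rng.standard_normal(1024)            # 用随机信号代替含噪语音 s[n]

w = np.hanning(N)                        # 窗函数 w[i]
b = np.cos(2*np.pi*np.outer(np.arange(F), np.arange(N))/N)  # 模拟STFT基函数 b_f[i]
k = w * b                                # k_f = w·b_f, 即式(2)中的卷积核, 形状(F, N)

T = (len(s) - N)//H + 1
frames = np.stack([s[t*H:t*H+N] for t in range(T)])  # 以帧移H分帧
x = frames @ k.T                         # 式(2): x_t[f] = Σ_i s[tH+i]·k_f[i]

mag = np.abs(x)                          # |x|: 模拟幅度谱, 作为特征图送入RefineNet
p = np.sign(x)                           # 正负号p: 对相位的模拟, 留作重构
```

    该示意也说明: 当卷积核k_f改为可学习参数后, 网络便不再局限于固定的STFT基函数.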

    由于RefineNet的输出特征图的长和宽是其输入的1/4, 在解码生成阶段, 我们使用步长为4的解卷积层将特征图恢复为原大小, 同时微调特征图. 接着将特征图与编码分析阶段保留的p相乘, 输入到解卷积层, 模拟语音重构过程的短时傅里叶逆变换, 最终得到对时域纯净语音${{\boldsymbol{\hat s}}_t}$的估计.
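    为直观理解"特征图可重构回时域波形"这一点, 下面给出一个假设性的NumPy示意: 用伪逆代替实际中由解卷积层学习得到的逆变换, 并取帧移等于帧长(无重叠)以简化; 随机卷积核与随机信号均为示意数据:

```python
import numpy as np

# 若编码卷积核满秩, 则特征图可线性重构回时域信号(此处不做增强, 应完全还原)
N = F = 32
rng = np.random.default_rng(1)
s = rng.standard_normal(N * 8)                 # 原始时域信号(示意)
k = rng.standard_normal((F, N))                # 假想的编码卷积核(随机阵几乎必然满秩)

frames = s.reshape(-1, N)                      # 无重叠分帧(帧移H=N的简化情形)
x = frames @ k.T                               # 编码: 式(2)的卷积形式
mag, p = np.abs(x), np.sign(x)                 # 幅度特征与正负号
x_hat = mag * p                                # "增强"后与p相乘, 恢复带符号特征
s_hat = (x_hat @ np.linalg.pinv(k).T).ravel()  # 伪逆解码并拼帧, 得到时域估计
```

    实际模型中, 该逆过程由步长为4的解卷积层与TFANet解码卷积共同学习得到, 而非固定的伪逆.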

    RefineNet是在ResNet[26]框架上的改进, 为了在增加神经网络深度的同时不影响模型训练, ResNet采用了残差连接, 将一层或多层叠加的隐含层输出F(x)与输入x相加, 作为最终输出:

    $$ {\boldsymbol{o}} = F\left( {\boldsymbol{x}} \right) + {\boldsymbol{x}} $$ (3)

    本文通过实验最终确定的ResNet结构如图2所示. ResNet的输入依次经过卷积核大小为7×7、步长为2的卷积层和步长为2的池化层, 进入4个叠加的网络块(ResBlock). 每个ResBlock包含7个结构相似的卷积层. 以ResBlock 1为例, 它是一个输出通道为256的堆叠卷积层, 每个卷积层步长均为1; ResBlock 1中包含2个三层堆叠卷积层, 每个三层堆叠卷积层的输出通道与ResBlock相同, 且除第二层卷积核大小为3×3、步长与ResBlock相同外, 其他层卷积核大小均为1×1且步长为1; ResBlock中通过残差连接的方式将输入输出连接起来, 提升网络的表征能力. 其余3个ResBlock的结构与ResBlock 1相似, 不再赘述.
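    式(3)的残差结构可以用如下极简NumPy示意说明(用两层随机线性变换加ReLU代替堆叠卷积层, 权重为示意值, 非训练所得):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
W1 = rng.standard_normal((d, d)) * 0.1   # 示意权重
W2 = rng.standard_normal((d, d)) * 0.1

def res_block(x, W1, W2):
    h = np.maximum(W1 @ x, 0.0)   # 隐含层 + ReLU
    fx = W2 @ h                   # F(x)
    return fx + x                 # 式(3): o = F(x) + x, 梯度可沿恒等路径直接回传

x = rng.standard_normal(d)
o = res_block(x, W1, W2)
```

    当F(x)退化为零映射时, 残差块即为恒等映射, 这正是残差连接便于加深网络的原因.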

    图 2  ResNet模型结构图(Conv后用, 分隔的分别是卷积层的输出通道数、步长, 若未指明步长, 默认为1)
    Fig. 2  The diagram for ResNet architecture

    4个ResBlock输出的特征图逐块缩小, 感受野变大, 特征更加抽象, 从而能捕获更高层次的全局和上下文信息, 并且计算量随之减少, 但是精细特征也会逐渐丢失. RefineBlock是一种神经网络块, 可以把不同大小的特征图融合, 从而利用高层的抽象特征和底层的精细特征, 其结构如图3所示, 包含残差卷积单元RCU (Residual convolution unit)、自适应卷积(Adaptive convolution)模块、多分辨率融合(Multi-resolution fusion)模块、链式残差池化(Chained residual pooling)模块、RefineBlock输出卷积(RefineBlock output convolution)模块等. 自适应卷积模块用于在融合前微调特征图, 由2个RCU构成, 每个RCU包含2层带ReLU激活的卷积, 每个特征图输入与第2层卷积输出相加构成残差连接. RefineBlock 4只有1个特征图输入, 而其他RefineBlock有2个输入.

    多分辨率融合模块用于将不同分辨率的特征图合成一张图. 首先, 特征图通过一层卷积做融合前的微调, 然后以分辨率最高的特征图为基准, 对所有分辨率较低的新特征图通过双线性插值上采样, 最后直接相加, 得到一张高分辨率的特征图. 链式残差池化模块使用更大的感受野从输入特征图中提取抽象特征. 特征图首先经过ReLU激活函数, 池化压缩图大小, 提取主要特征, 再通过卷积层微调, 得到的新特征图在进行下一次的池化和卷积的同时, 通过残差连接与原特征图融合, 形成链式的残差池化结构. RefineBlock输出卷积模块由1个RCU组成.
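    多分辨率融合这一步可以用如下NumPy示意说明(为保持简短, 用最近邻上采样代替双线性插值, 并省略融合前后的RCU与链式残差池化; 特征图为随机示意数据):

```python
import numpy as np

def upsample2x(fm):
    """最近邻上采样2倍, fm形状为(C, H, W)"""
    return fm.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(3)
high = rng.standard_normal((8, 16, 16))   # 高分辨率路径的特征图
low = rng.standard_normal((8, 8, 8))      # 低分辨率路径的特征图
fused = high + upsample2x(low)            # 以最高分辨率为基准, 上采样后逐元素相加
```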

    RefineNet的总体结构如图1所示, ResBlock 4的特征图输入到RefineBlock 4中, 经过微调输入RefineBlock 3, 与ResBlock 3的特征图融合, 再依次通过RefineBlock 2、1与ResBlock 2、1的特征图融合, 最后经过输出卷积模块做最后的微调. 输出卷积模块包含2个RCU, 以及1个卷积核大小为1×1的卷积层.

    基于深度学习的语音增强模型常用均方误差MSE作为优化目标, 在时域可表示为:

    $$ {\cal{L}}_{\rm{MSE}}=\min \frac{1}{NP} \sum\|\boldsymbol{\hat{y}}-{\boldsymbol{y}}\|^{2}_2 $$ (4)

    式中${\boldsymbol{\hat y}} \in {\bf{R}}^{1\times P}$是时域的增强语音, ${\boldsymbol{y}}\in {\bf{R}}^{1\times P}$是纯净语音, $\|\cdot\|_2 $表示L2范数, P和N分别是每条语音的采样点数与语音总数. 虽然MSE在大量模型里得到应用, 但不能保证得到最优的模型训练结果[16], 其值越小不等同于语音可懂度和语音质量越高, 而提升可懂度和质量正是语音增强算法的目标. STOI是语音客观可懂度评估指标, SDR则计算了语音信号与失真信号的比率, 与语音质量高度相关. 本文提出将STOI与SDR两个评估指标共同融合到均方根误差(Root mean square error, RMSE)中进行联合优化的策略, 通过直接优化评价指标来提升语音增强模型的性能, 缓解损失函数与评价指标之间的不匹配问题. PESQ也是常用的语音增强评估指标, 但是其计算方式比STOI复杂得多, 会降低模型训练效率; 而且PESQ计算中的一些函数(比如非对称扰动因子)是不连续的, 不满足梯度下降优化的可微条件[27], 所以本文没有将PESQ融合到损失函数中联合优化. 本文提出模型RNSE的优化目标为:

    图 3  RefineBlock结构图
    Fig. 3  The diagram for RefineBlock architecture
    $$\begin{aligned} {\cal{L}}= & \min \bigg[\alpha \sqrt{\frac{1}{NP}\sum\|\boldsymbol{\hat{y}}-{\boldsymbol{y}}\|^{2}_2}\;+\\ & \frac{1}{N} \sum\left(-\beta C_{\rm{stoi}}(\boldsymbol{\hat{y}}, {\boldsymbol{y}})-\lambda C_{\rm{sdr}}(\boldsymbol{\hat{y}}, {\boldsymbol{y}})\right)\bigg] \end{aligned}$$ (5)

    其中$\alpha $、$\beta $、$\lambda $是各优化目标的权重系数, Cstoi与Csdr分别表示计算STOI、SDR的函数, 下面是对两个优化目标的详细介绍.
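    在分别介绍两个优化目标之前, 先用一个NumPy示意说明式(5)的组合方式(其中C_stoi用整段波形的零均值相关系数简化代替, 完整STOI计算见后文; 权重均取示意值1.0, 并非正文调参结果; 信号为随机数据):

```python
import numpy as np

def c_sdr(y_hat, y):
    """式(8)形式的SDR (dB)"""
    num = (y_hat @ y) ** 2
    return 10*np.log10(num / (np.sum(y**2)*np.sum(y_hat**2) - num))

def c_corr(y_hat, y):
    """简化的可懂度项: 零均值相关系数(代替完整STOI的1/3倍频带包络分析)"""
    yh, yc = y_hat - y_hat.mean(), y - y.mean()
    return (yh @ yc) / (np.linalg.norm(yh) * np.linalg.norm(yc))

def joint_loss(y_hat, y, alpha=1.0, beta=1.0, lam=1.0):
    rmse = np.sqrt(np.mean((y_hat - y) ** 2))        # 式(5)的RMSE项
    return alpha*rmse - beta*c_corr(y_hat, y) - lam*c_sdr(y_hat, y)

rng = np.random.default_rng(4)
y = rng.standard_normal(2048)      # 纯净语音(随机信号示意)
noise = rng.standard_normal(2048)
good = y + 0.05*noise              # 较好的估计
bad = y + 0.8*noise                # 较差的估计
```

    估计质量越好(RMSE越小、相关与SDR越大), 组合损失越小, 与式(5)的优化方向一致.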

    1) SDR优化目标

    SDR是增强语音信号中纯净语音分量${{\boldsymbol{\hat y}}_c}$与其他分量的能量比值. ${{\boldsymbol{\hat y}}_c}$计算公式如下:

    $$\boldsymbol{\hat{y}}_{c}=\frac{\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}}}{{\boldsymbol{y}}{\boldsymbol{y}}^{\rm{T}}} \cdot {\boldsymbol{y}}$$ (6)

    SDR的计算公式为:

    $${\rm{SDR}}=10 \lg \frac{\boldsymbol{\hat{y}}_{c}\boldsymbol{\hat{y}}_{c}^{\rm{T}}}{\left\|\boldsymbol{\hat{y}}-\boldsymbol{\hat{y}}_{c}\right\|^{2}_2}$$ (7)

    将式(6)代入式(7)可得到:

    $${\rm{SDR}}=10 \lg \frac{(\boldsymbol{\hat{y}} {\boldsymbol{y}^{\rm{T}}})^{2}}{\|{\boldsymbol{y}}\|^{2}_2 \|\boldsymbol{\hat{y}}\|^{2}_2-(\boldsymbol{\hat{y}} {\boldsymbol{y}^{\rm{T}}})^{2}}$$ (8)

    对式(5)中SDR优化目标做等价替换以简化计算:

    $$\begin{split} {\cal{L}}_{\rm{sdr}} &=\min -C_{\rm{sdr}}(\boldsymbol{\hat{y}}, {\boldsymbol{y}})=\min -10 \lg \frac{(\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}})^{2}}{\|{\boldsymbol{y}}\|^{2}_2 \|\boldsymbol{\hat{y}}\|^{2}_2-(\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}})^{2}} \\ & \Leftrightarrow \min \frac{\|{\boldsymbol{y}}\|^{2}_2 \|\boldsymbol{\hat{y}}\|^{2}_2-(\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}})^{2}}{(\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}})^{2}} \\ & \Leftrightarrow \min \frac{\|{\boldsymbol{y}}\|^{2}_2 \|\boldsymbol{\hat{y}}\|^{2}_2}{(\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}})^{2}} \Leftrightarrow \min \frac{\|\boldsymbol{\hat{y}}\|^{2}_2}{(\boldsymbol{\hat{y}} {\boldsymbol{y}}^{\rm{T}})^{2}} \end{split}$$ (9)

    在上式的最后一步推导中, 我们丢弃了$\|{\boldsymbol{y}}\|^2_2 $, 因为它对于网络的输出来说是一个正常数, 不影响模型训练.
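    上述等价性可以数值验证: 由式(8)可推出SDR与代理目标$g({\boldsymbol{\hat y}}) = \|{\boldsymbol{\hat y}}\|^2_2/({\boldsymbol{\hat y}}{\boldsymbol{y}}^{\rm T})^2$满足${\rm SDR} = -10\lg\big(\|{\boldsymbol{y}}\|^2_2\, g({\boldsymbol{\hat y}}) - 1\big)$, 即固定y时二者严格单调反向. 下面的NumPy示意(随机信号)验证了这一关系:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.standard_normal(256)             # 纯净信号(随机示意)

def sdr(y_hat):
    """式(8)的SDR (dB)"""
    num = (y_hat @ y) ** 2
    return 10*np.log10(num / (np.sum(y**2)*np.sum(y_hat**2) - num))

def surrogate(y_hat):
    """式(9)最后一步的代理目标 g(ŷ) = ‖ŷ‖²/(ŷyᵀ)²"""
    return np.sum(y_hat**2) / (y_hat @ y) ** 2

# 构造一串质量递增的估计(噪声幅度逐渐减小)
ests = [y + a*rng.standard_normal(256) for a in (1.0, 0.5, 0.1)]
sdrs = [sdr(e) for e in ests]
surr = [surrogate(e) for e in ests]
```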

    2) STOI优化目标

    STOI用于评估语音的可理解性, 输入是纯净语音y和增强语音${\boldsymbol{\hat y}}.$ 首先去除对语音可懂度无贡献的无声区域, 然后对信号做STFT, 对两个信号进行时频分解, 通过将两个信号分割为50 %重叠的带汉宁窗的帧, 得到与听觉系统中语音表征性质相似的特征. 接着进行1/3倍频带分析, 划分共15个1/3倍频带, 其中频带中心频率范围为150 Hz至4.3 kHz. 纯净语音的短时时间包络${{\boldsymbol{z}}_{j,m}}$可表示如下:

    $$ {{\boldsymbol{z}}_{j,m}} = {\left[ {{{\boldsymbol{Y}}_j}(m - L + 1)\;{{\boldsymbol{Y}}_j}(m - L + 2)\; \cdots {{\boldsymbol{Y}}_j}(m)} \right]^{\rm{T}}} $$ (10)

    其中${\boldsymbol{Y}} \in {{{\bf{R}}}^{15\times M}}$是由划分得到的15个1/3倍频带, M代表该段语音的总帧数, $j \in \left\{ {1,2, \cdots, 15} \right\}$是15个1/3倍频带的索引, m为帧的索引, L = 30, 其代表分析语音段长度为384 ms.

    类似地, ${{\boldsymbol{\hat z}}_{j,m}}$表示增强语音${\boldsymbol{\hat y}}$的短时时间包络. 之后对语音进行归一化与裁剪, 归一化用来补偿全局差异, 这种差异不应该对语音的可懂度产生影响, 裁剪限定了严重恶化语音的STOI 取值边界. 增强语音的归一化和裁剪时间包络表示为${{\boldsymbol{\tilde z}}_{j,m}}$. 可懂度的测量被定义为两个时间包络之间的相关系数:

    $${\boldsymbol{d}}_{j, m}=\frac{\left({\boldsymbol{z}}_{j, m}-{\boldsymbol{\mu}}_{{\boldsymbol{z}}_{j, m}}\right)^{{\rm{T}}}\left(\tilde{\boldsymbol{z}}_{j, m}-{\boldsymbol{\mu}}_{\tilde{{\boldsymbol{z}}}_{j, m}}\right)}{\left\|{\boldsymbol{z}}_{j, m}-{\boldsymbol{\mu}}_{{\boldsymbol{z}}_{j, m}}\right\|_{2}\left\|\tilde{{\boldsymbol{z}}}_{j, m}-{\boldsymbol{\mu}}_{\tilde{{\boldsymbol{z}}}_{j, m}}\right\|_{2}}$$ (11)

    ${\boldsymbol{\mu}} _{{\boldsymbol{z}}_{j,m}}$和${\boldsymbol{\mu}} _{\tilde{{\boldsymbol{z}}}_{j,m}}$分别表示${{\boldsymbol{z}}_{j,m}}$和${\tilde{{\boldsymbol{z}}}_{j,m}}$中元素的均值. 最后, STOI通过计算所有波段和帧的可懂度均值得到:

    $$C_{\rm{stoi}}(\boldsymbol{\hat{y}}, {\boldsymbol{y}})=\frac{1}{15 M} \sum_{j, m} {\boldsymbol{d}}_{j, m}$$ (12)

    ${C_{{\rm{stoi}}}}\left( {{\boldsymbol{\hat y}}{\rm{,}}{\boldsymbol{y}}} \right)$即为用于训练神经网络的STOI优化目标.
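    式(10) ~ (12)的计算流程可以用如下NumPy示意概括(用随机包络数据代替真实的1/3倍频带分解结果, 并省略归一化与裁剪步骤; 各维度仅作示意):

```python
import numpy as np

rng = np.random.default_rng(6)
J, M, L = 15, 40, 30                      # 15个1/3倍频带, M帧, 包络段长L=30
Y  = rng.random((J, M + L - 1))           # 纯净语音的1/3倍频带表示(示意)
Yt = Y + 0.05*rng.random((J, M + L - 1))  # 增强语音的对应表示(加少量扰动)

d = np.zeros((J, M))
for j in range(J):
    for m in range(M):
        z  = Y[j,  m:m+L] - Y[j,  m:m+L].mean()   # 式(10)的短时包络, 去均值
        zt = Yt[j, m:m+L] - Yt[j, m:m+L].mean()
        d[j, m] = (z @ zt) / (np.linalg.norm(z)*np.linalg.norm(zt))  # 式(11)

c_stoi = d.mean()                         # 式(12): 所有频带与帧上的均值
```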

    实验中使用的语音数据来自于TIMIT数据集, 噪声数据集采用ESC-50作为训练集, 为了验证本文提出模型的泛化性能, 我们也将Noisex92噪声数据集用于测试. TIMIT数据集总共包含6300条语音, 由630人每人录制10个句子得到, 男女比率为7:3. 其中, 每人录制的句子中有7个是重复的, 为了去除重复句子对模型训练与测试的影响, 本实验只取句子均不相同的1890条语音. 将其中约80 %的语音作为训练集, 另外20 %作为测试语音, 且男女比例与TIMIT总体分布相同. ESC-50数据集包含2000条带标签的环境录音集合, 共分为5个主要类别: 动物、自然音景与水声、非语音人声、室内声音、城区声音. Noisex92是常用于语音增强测试的数据集, 本文使用Babble、Factory1、White、HFChannel四种常见噪声进行不同噪声环境的测试, 用所有15种Noisex92噪声做不可见噪声测试, 用所有ESC-50噪声做可见噪声测试.

    本文选择4个经典算法对比: a) Log-MMSE, 是一种常用的基于统计模型的语音增强方法[28]; b) CNN-SE[29], 采用CNN对语音进行增强, 并且通过添加跳连接的方式融合神经网络低层和高层的信息; c) WaveUnet[30], 基于Unet模型结构所提出的一种应用于时域语音波形信号的神经网络模型[31]; d) AET[19], 通过神经网络模仿STFT前端变换过程, 直接在时域语音波形上建模, 其中b)、c)、d)均为基于端到端的语音增强方法.

    本文采用的评估指标为STOI、PESQ及SDR. 其中STOI是短时客观可懂度, 用于衡量语音可懂度, 得分范围为0 ~ 1, 分值越高表明可懂度越高; PESQ用于语音质量感知评估, 是ITU-T (国际电信联盟电信标准化部门)推荐的语音质量评估指标, 其得分范围为−0.5 ~ 4.5, 值越大表明质量越好; SDR衡量增强语音中纯净语音分量与其他分量的能量比值, 理论取值范围为整个实数域, 值越大表明增强效果越好.

    本文评估了RNSE与其他非端到端方法的性能差异. 图4展示了在几种常见噪声条件下各模型的指标对比. 可以看出, 在不同噪声环境和不同信噪比条件下, RNSE方法相比于时频域方法有显著的性能提升, 在每种评估指标下几乎均取得了最佳结果. 我们还注意到, 即使在−10 dB的极限信噪比下, RNSE方法仍然可以取得与部分基线方法在−5 dB下相当或更好的性能, 这意味着RNSE更适合在低信噪比的复杂环境中挖掘语音信号的信息.

    我们通过在可见和不可见噪声下做测试, 进一步验证RNSE模型的泛化性. 表1和表2分别给出了已知噪声和未知噪声下的客观评价指标. 由表1和表2可以看出, RNSE在已知噪声环境和未知噪声环境下均取得了最佳的结果, 而且远优于其他端到端对比方法; 同时, 相比于其他基线方法, WaveUnet方法在STOI指标上取得了相对更高的客观评估结果.

    为了更加直观地比较各种算法的增强效果, 我们对各个网络增强后语音的语谱图进行了比较分析. 图5为在0 dB的Babble噪声下使用不同算法得到的增强语音的语谱图, 横轴表示时间T, 纵轴表示语音信号频率F. 从语谱图中可以看出, 各种算法都在一定程度上对含噪语音进行了有效的处理. CNN-SE与WaveUnet方法在增强含噪语音的过程中存在相对较多的噪声残留; AET方法对噪声的抑制能力更强, 但在去除大量噪声的过程中也去除了一些语音成分信息. 由于时域波形信号的复杂性, 神经网络直接挖掘时域特征时无法较为准确地辨识语音和噪声, 导致增强过程中会引入噪声或增强过度. RNSE方法利用TFANet将时域信号映射到二维表达空间, 保留其正负号特征并用于后期波形重构, 以此引导神经网络在训练过程中利用原始信息, 可以缓解模型在增强过程中增强不足或增强过度的问题.

    图 4  不同噪声不同信噪比下实验结果图(从第一行到第三行评价指标分别为PESQ、STOI与SDR, 图(a) ~ (c)、图(d) ~ (f)、图(g) ~ (i)、图(j) ~ (l)分别为Babble, Factory1, White, HFChannel噪声下的结果;每簇信噪比中的柱状图从左至右依次对应Log-MMSE, CNN-SE, WaveUNet, AET以及RNSE)
    Fig. 4  Experimental results under different noise and SNR
    表 1  可见噪声的测试结果
    Table 1  The performance of baseline systems compared to the proposed RNSE approach in seen noise condition
    指标   模型   可见噪声信噪比 (dB)
                  −10      −5       0       5
    PESQ   (a)    1.11    1.46    1.79    2.10
           (b)    1.65    1.92    2.24    2.51
           (c)    1.66    1.92    2.23    2.50
           (d)    1.70    2.00    2.25    2.48
           (e)    2.11    2.46    2.73    2.93
    STOI   (a)    0.58    0.68    0.77    0.85
           (b)    0.64    0.72    0.80    0.86
           (c)    0.66    0.74    0.81    0.86
           (d)    0.63    0.72    0.79    0.84
           (e)    0.77    0.85    0.90    0.93
    SDR    (a)   −6.67   −1.72    3.07    7.58
           (b)   −2.24    2.02    6.35    9.76
           (c)   −0.61    3.30    7.25   10.38
           (d)    1.43    5.76    8.67   10.87
           (e)    7.01    9.96   12.16   13.98
    注: (a) Log-MMSE, (b) CNN-SE, (c) WaveUnet, (d) AET, (e) RNSE
    表 2  不可见噪声的测试结果
    Table 2  The performance of baseline systems compared to the proposed RNSE approach in unseen noise condition
    指标   模型   不可见噪声信噪比 (dB)
                  −10      −5       0       5
    PESQ   (a)    1.33    1.70    2.04    2.35
           (b)    1.48    1.77    2.09    2.39
           (c)    1.49    1.76    2.08    2.36
           (d)    1.58    1.87    2.15    2.39
           (e)    1.80    2.24    2.61    2.88
    STOI   (a)    0.52    0.63    0.74    0.83
           (b)    0.56    0.66    0.76    0.83
           (c)    0.59    0.69    0.78    0.85
           (d)    0.57    0.69    0.77    0.83
           (e)    0.67    0.79    0.87    0.92
    SDR    (a)   −0.17    4.77    8.69   12.03
           (b)   −2.97    1.96    6.34    9.81
           (c)   −1.28    3.25    7.05   10.22
           (d)    1.50    5.65    8.66   10.99
           (e)    4.86    8.45   11.39   13.78
    注: (a) Log-MMSE, (b) CNN-SE, (c) WaveUnet, (d) AET, (e) RNSE
    图 5  0 dB的Babble噪声下的语音增强语谱图示例
    Fig. 5  An example of spectrogram of enhanced speech under Babble noise at 0 dB SNR

    在各种噪声和信噪比环境下的测试表明, RNSE模型在复杂环境下具有较强的鲁棒性. 在RNSE模型训练阶段, 我们把评估指标融入到损失函数中. 为了比较融入的评价指标对语音增强性能的影响, 我们比较了在不同组合的损失函数下RNSE模型的增强性能, 图6展示了不同信噪比下的增强效果对比. 从图中可以看出, 在使用单一目标作为损失函数时, 基于SDR的损失函数在PESQ和SDR评价指标上均取得了相对更好的性能, 基于STOI的损失函数在STOI指标上也取得了更好的性能; 但是不同的损失函数存在与其他评估指标不兼容的情况, 比如基于STOI的损失函数在PESQ与SDR指标上的性能较低, 这是由于STOI的计算基于增强语音的时间包络, 其作为训练的损失函数时会引导神经网络模型过多关注增强语音与纯净语音之间的时间包络关系, 导致在PESQ和SDR方面的性能不佳. 同时我们注意到, 两两组合的损失函数相比于单一目标损失函数可以取得相对更好的性能, 其中基于STOI与SDR融合的损失函数取得了比其他组合或单一目标损失函数更好的评估结果. 进一步地, 沿着这个思路, 我们将STOI和SDR与RMSE按照一定的权重组合起来联合训练优化调参.

    图 6  基于不同损失函数的测试结果
    Fig. 6  Results based on different objective functions

    在调参的过程中, 先单独使用STOI、SDR以及RMSE作为损失函数进行训练, 观察它们各自训练时的损失函数值, 当其收敛到某一数量级时, 再通过调节超参数$\alpha $、$\beta $以及$\lambda $对相应的损失函数值进行收缩, 将它们的范围都限制到−1 ~ +1之内, 然后在此基础上微调, 从而得到各超参数的最佳匹配. 图中STOI+SDR+MSE组合对应于式(5)中的超参数$\alpha = 10$、$\beta = 1$、$\lambda = 5 \times {10^3}$. 由此, 我们从实验上直观地证明了损失函数与评价指标的不匹配会导致语音增强性能无法达到最佳; 通过将评估指标与损失函数按照一定的权重比例组合进行联合训练, 可以显著提高语音增强的性能. 而且, 本文提出的将评估指标融合到损失函数中联合训练的思想并非只适用于语音增强领域, 还可以推广应用到其他领域.

    本文提出了一个端到端的语音增强算法. 首先构建一个时频分析网络对语音信号编码分析, 然后利用RefineNet网络学习含噪语音到纯净语音的特征映射, 最后解码生成增强的语音信号. 在此基础上, 我们提出将评价指标与训练损失函数相融合的改进方法以及将STOI与SDR同时作为优化目标的多目标学习策略. 在不同噪声环境和不同信噪比下的测试中, 本文提出的方法在STOI、PESQ以及SDR方面的指标显著优于具有代表性的传统方法和端到端的深度学习方法, 证明它能更好地提高语音的清晰度和可懂度; 通过对不同损失函数的对比实验, 本文验证了将评价指标与损失函数融合的策略在深度学习模型上的有效性.


  • 1 在一些文献中轨迹的模仿学习被归类为BC, 然而考虑到其研究内容的差异, 本文采用不同的划分方式.
  • 2 将式(2)中的$ \boldsymbol{{s}} $和$ \boldsymbol{{\xi}} $分别用$ \boldsymbol{{\xi}} $和$ \dot{\boldsymbol{{\xi}}} $进行替换即可.
  • 3 该协方差可以控制自适应轨迹经过期望点$ \boldsymbol{{\mu}}_t^{*} $的误差: $ \boldsymbol{{\Sigma}}_t^{*} $越小则误差越小, 反之则误差变大.
  • 4 根据文献([35], 第3.6节), 固定基函数的数量常随输入变量维度的增加呈指数级增加.
  • 5 关于从GMM中采样的方法可以参考文献[59].
  • 6 对于期望点输入和参考轨迹存在重叠的情况, 可参考文献[6]中的轨迹更新策略.
  • 7 在预测之前需要获得足够多的训练样本对$ \{\boldsymbol{{s}}, \tilde{\boldsymbol{{w}}}\} $.
  • 8 分割后的轨迹片段一般不等同于MP, 常常不同的轨迹片段可能对应相同的MP, 因此需要对轨迹片段进行聚类.
  • 9 向量值GP通过恰当的可分离核函数可以表征多维轨迹之间的协同关系, 然而其未考虑轨迹本身的方差, 故这里未将其包括在内.
  • 10 这里使用“协方差”是为了表明i)和ii)使用相同的预测模型.
  • 11 这些工作中对应的控制器被称作最小干涉控制(Minimal intervention control).
  • 12 利用泛函梯度得到的导数为函数, 该导数用来对函数本身进行优化.
  • 13 该更新同时也需要机器人的观测轨迹, 然而该轨迹恰是需要预测的, 因此文献[20]在更新$ \boldsymbol{{w}} $时将机器人的观测值设成零向量, 同时将拟合机器人轨迹的基函数设成零矩阵.
  • 图  1  KMP在粉刷任务中的应用[30]. 第一行表示技能的示教, 第二行和第三行分别对应新情形下的泛化

    Fig.  1  The application of KMP in painting tasks[30]. The first row illustrates kinesthetic teaching of a painting task while the second and third rows correspond to skill adaptations in unseen situations

    图  2  粉刷任务中的示教轨迹(a) ~ (b)以及泛化轨迹(c) ~ (f), 其中(c) ~ (d)和(e) ~ (f)对应不同情形下的泛化[30]. $[p_x \ p_y \ p_z]^{\rm{T}} $$[q_s \ q_x \ q_y \ q_z]^{\rm{T}}$分别表示机器人末端的位置和四元数姿态. 圆圈为泛化时对应的期望路径点

    Fig.  2  Demonstrations (a) ~ (b) and adapted trajectories (c) ~ (f) in painting tasks, where (c) ~ (d) and (e) ~ (f) correspond to different adaptations. $[p_x \ p_y \ p_z]^{\rm{T}} $ and $[q_s \ q_x \ q_y \ q_z]^{\rm{T}}$ denote Cartesian position and quaternion, respectively. Circles depict various desired points

    图  3  DMP在书写字母中的应用. (a)表示技能的复现, (b) ~ (c)均表示技能的泛化, 其中实线对应DMP生成的轨迹, 虚线为示教轨迹并用 ‘*’ 和 ‘+’ 分别表示其起点和终点, 圆圈表示泛化轨迹需要经过的期望位置点

    Fig.  3  The application of DMP in writing tasks. (a) corresponds to skill reproduction, (b) ~ (c) represent skill adaptations with different desired points. Solid curves are generated via DMP, while the dashed curves denote the demonstration with ‘*’ and ‘+’ respectively marking its starting and ending points. Circles depict desired points which the adapted trajectories should go through

    图  4  KMP在书写字母中的应用. (a)对应二维轨迹, (b) ~ (e)分别表示轨迹的$x,$ $y,$ $\dot{x}$$\dot{y}$分量. 实线对应KMP生成的轨迹, 虚线为通过GMR对示教轨迹进行建模得到的均值, 圆圈表示不同的期望点

    Fig.  4  The application of KMP in a writing task. (a) plots the corresponding 2D trajectories, while (b) ~ (e) show the $x,$ $y,$ $\dot{x}$ and $\dot{y}$ components of trajectories, respectively. Solid curves are planned via KMP while the dashed curves are retrieved by GMR after modelling demonstrations. Circles denote various desired points

    图  5  应用GMM和GMR对多条示教轨迹进行概率建模. (a) ~ (b)分别对应示教轨迹的$x$$y$分量, (c) ~ (d)表示GMM和GMR的建模结果, 其中(c)中椭圆表示GMM中的高斯成分, (d)中的实线和阴影部分分别表示多条轨迹的均值和方差

    Fig.  5  The modeling of multiple demonstrations using GMM and GMR. (a) ~ (b) plot the $x$ and $y$ components of demonstrations. (c) ~ (d) depict the probabilistic features obtained via GMM and GMR, where the ellipses in (c) denote the Gaussian components in GMM, the solid curve and shaded area in (d) represent the mean and covariance of demonstrations, respectively

    图  6  DMP 的外插应用

    Fig.  6  The extrapolation application of DMP

    图  7  KMP在人机交互中的应用[34]. 第一行表示技能示教, 第二行为技能复现, 第三行对应新情形下的技能泛化

    Fig.  7  The application of KMP in handover tasks[34]. The first row shows kinesthetic teaching of a handover task, while the second and third rows illustrate skill reproduction and adaptation, respectively

    表  1  几种主要模仿学习方法的对比

    Table  1  Comparison among the state-of-the-art approaches in imitation learning

    技能复现 多轨迹概率 中间点 目标点 外插 收敛性 时间输入 多维输入
    位置 速度 位置 速度
    GMM[35]
    HMM/HSMM[40]
    GP[45]
    DMP[2]
    SEDS[3]
    ProMP[4]
    KMP[6]
    TP-GMM[5]
    下载: 导出CSV

    表  2  几种主要姿态学习方法的对比

    Table  2  Comparison among the state-of-the-art approaches in orientation learning

    单位范数 多轨迹概率 中间姿态 目标姿态 收敛性 时间输入 多维输入
    单个基元 姿态 角速度 姿态 角速度
    Pastor 等[62]
    Silverio 等[63]
    Ude 等[64]
    Abu-Dakka 等[65]
    Ravichandar 等[66]
    Zeestraten 等[67]
    Huang 等[34]
    Saveriano 等[68]
    下载: 导出CSV
  • [1] Schaal S. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 1999, 3(6): 233-242 doi: 10.1016/S1364-6613(99)01327-3
    [2] Ijspeert A J, Nakanishi J, Hoffmann H, Pastor P, Schaal S. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 2013, 25(2): 328-373 doi: 10.1162/NECO_a_00393
    [3] Khansari-Zadeh S M, Billard A. Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Transactions on Robotics, 2011, 27(5): 943-957 doi: 10.1109/TRO.2011.2159412
    [4] Paraschos A, Daniel C, Peters J, Neumann G. Probabilistic movement primitives. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. Nevada, USA: NIPS, 2013. 2616−2624
    [5] Calinon S, Bruno D, Caldwell D G. A task-parameterized probabilistic model with minimal intervention control. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation. Hong Kong, China: IEEE, 2014. 3339−3344
    [6] Huang Y L, Rozo L, Silverio J, Caldwell D G. Kernelized movement primitives. The International Journal of Robotics Research, 2019, 38(7): 833-852 doi: 10.1177/0278364919846363
    [7] Muhlig M, Gienger M, Hellbach S, Steil J J, Goerick C. Task-level imitation learning using variance-based movement optimization. In: Proceedings of the 2009 IEEE International Conference on Robotics and Automation. Kobe, Japan: IEEE, 2009. 1177−1184
    [8] Huang Y L, Buchler D, Koc O, Scholkopf B, Peters J. Jointly learning trajectory generation and hitting point prediction in robot table tennis. In: Proceedings of the 2016 IEEE-RAS 16th International Conference on Humanoid Robots. Cancun, Mexico: IEEE, 2016. 650−655
    [9] Huang Y L, Silverio J, Rozo L, Caldwell D G. Hybrid probabilistic trajectory optimization using null-space exploration. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE, 2018. 7226−7232
    [10] Stulp F, Theodorou E, Buchli J, Schaal S. Learning to grasp under uncertainty. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation. Shanghai, China: IEEE, 2011. 5703−5708
    [11] Mylonas G P, Giataganas P, Chaudery M, Vitiello V, Darzi A, Yang G Z. Autonomous eFAST ultrasound scanning by a robotic manipulator using learning from demonstrations. In: Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tokyo, Japan: IEEE, 2013. 3251−3256
    [12] Reiley C E, Plaku E, Hager G D. Motion generation of robotic surgical tasks: Learning from expert demonstrations. In: Proceedings of the 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology. Buenos Aires, Argentina: IEEE, 2010. 967−970
    [13] Colome A, Torras C. Dimensionality reduction in learning Gaussian mixture models of movement primitives for contextualized action selection and adaptation. IEEE Robotics and Automation Letters, 2018, 3(4): 3922-3929 doi: 10.1109/LRA.2018.2857921
    [14] Canal G, Pignat E, Alenya G, Calinon S, Torras C. Joining high-level symbolic planning with low-level motion primitives in adaptive HRI: Application to dressing assistance. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE, 2018. 3273−3278
    [15] Joshi R P, Koganti N, Shibata T. A framework for robotic clothing assistance by imitation learning. Advanced Robotics, 2019, 33(22): 1156-1174 doi: 10.1080/01691864.2019.1636715
    [16] Motokura K, Takahashi M, Ewerton M, Peters J. Plucking motions for tea harvesting robots using probabilistic movement primitives. IEEE Robotics and Automation Letters, 2020, 5(2): 3275-3282 doi: 10.1109/LRA.2020.2976314
    [17] Ding J T, Xiao X H, Tsagarakis N, Huang Y L. Robust gait synthesis combining constrained optimization and imitation learning. In: Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA: IEEE, 2020. 3473−3480
    [18] Zou C B, Huang R, Cheng H, Qiu J. Learning gait models with varying walking speeds. IEEE Robotics and Automation Letters, 2020, 6(1): 183-190
    [19] Huang R, Cheng H, Guo H L, Chen Q M, Lin X C. Hierarchical interactive learning for a human-powered augmentation lower exoskeleton. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation. Stockholm, Sweden: IEEE, 2016. 257−263
    [20] Maeda G, Ewerton M, Neumann G, Lioutikov R, Peters J. Phase estimation for fast action recognition and trajectory generation in human–robot collaboration. The International Journal of Robotics Research, 2017, 36(13-14): 1579-1594 doi: 10.1177/0278364917693927
    [21] Silverio J, Huang Y L, Abu-Dakka F J, Rozo L, Caldwell D G. Uncertainty-aware imitation learning using kernelized movement primitives. In: Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE, 2019. 90−97
    [22] Pomerleau D A. ALVINN: An autonomous land vehicle in a neural network. In: Proceedings of the 1st International Conference on Neural Information Processing Systems. Denver, USA: NIPS, 1989. 305−313
    [23] Ross S, Gordon G J, Bagnell D. A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, USA: JMLR.org, 2011. 627−635
    [24] Abbeel P, Ng A Y. Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st International Conference on Machine Learning. Banff, Canada: 2004. 1−8
    [25] Ho J, Ermon S. Generative adversarial imitation learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS, 2016. 4572−4580
    [26] Liu Nai-Jun, Lu Tao, Cai Ying-Hao, Wang Shuo. A review of robot manipulation skills learning methods. Acta Automatica Sinica, 2019, 45(3): 458-470
    [27] Qin Fang-Bo, Xu De. Review of robot manipulation skill models. Acta Automatica Sinica, 2019, 45(8): 1401-1418
    [28] Billard A, Epars Y, Cheng G, Schaal S. Discovering imitation strategies through categorization of multi-dimensional data. In: Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA: IEEE, 2003. 2398−2403
    [29] Calinon S, Guenter F, Billard A. On learning, representing, and generalizing a task in a humanoid robot. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(2): 286-298 doi: 10.1109/TSMCB.2006.886952
    [30] Huang Y L, Abu-Dakka F J, Silverio J, Caldwell D G. Generalized orientation learning in robot task space. In: Proceedings of the 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE, 2019. 2531−2537
    [31] Matsubara T, Hyon S H, Morimoto J. Learning stylistic dynamic movement primitives from multiple demonstrations. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. Taipei, China: IEEE, 2010. 1277−1283
    [32] Giusti A, Zeestraten M J A, Icer E, Pereira A, Caldwell D G, Calinon S, et al. Flexible automation driven by demonstration: Leveraging strategies that simplify robotics. IEEE Robotics & Automation Magazine, 2018, 25(2): 18-27
    [33] Huang Y L, Scholkopf B, Peters J. Learning optimal striking points for a ping-pong playing robot. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 4587−4592
    [34] Huang Y L, Abu-Dakka F J, Silverio J, Caldwell D G. Toward orientation learning and adaptation in Cartesian space. IEEE Transactions on Robotics, 2021, 37(1): 82-98 doi: 10.1109/TRO.2020.3010633
    [35] Bishop C M. Pattern Recognition and Machine Learning. Heidelberg: Springer, 2006.
    [36] Cohn D A, Ghahramani Z, Jordan M I. Active learning with statistical models. Journal of Artificial Intelligence Research, 1996, 4: 129-145 doi: 10.1613/jair.295
    [37] Calinon S. A tutorial on task-parameterized movement learning and retrieval. Intelligent Service Robotics, 2016, 9(1): 1-29 doi: 10.1007/s11370-015-0187-9
    [38] Guenter F, Hersch M, Calinon S, Billard A. Reinforcement learning for imitating constrained reaching movements. Advanced Robotics, 2007, 21(13): 1521-1544 doi: 10.1163/156855307782148550
    [39] Peters J, Vijayakumar S, Schaal S. Natural actor-critic. In: Proceedings of the 16th European Conference on Machine Learning. Porto, Portugal: Springer, 2005. 280−291
    [40] Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, 77(2):257-286 doi: 10.1109/5.18626
    [41] Yu S Z. Hidden semi-Markov models. Artificial Intelligence, 2010, 174(2): 215-243 doi: 10.1016/j.artint.2009.11.011
    [42] Calinon S, D’halluin F, Sauser E L, Caldwell D G, Billard A G. Learning and reproduction of gestures by imitation. IEEE Robotics & Automation Magazine, 2010, 17(2): 44-54
    [43] Osa T, Pajarinen J, Neumann G, Bagnell J A, Abbeel P, Peters J. An algorithmic perspective on imitation learning. Foundations and Trends® in Robotics, 2018, 7(1-2): 1-79 doi: 10.1561/2300000053
    [44] Zeestraten M J A, Calinon S, Caldwell D G. Variable duration movement encoding with minimal intervention control. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation. Stockholm, Sweden: IEEE, 2016. 497−503
    [45] Rasmussen C E, Williams C K I. Gaussian Processes for Machine Learning. Cambridge: MIT Press, 2006.
    [46] Hofmann T, Scholkopf B, Smola A J. Kernel methods in machine learning. The Annals of Statistics, 2008, 36(3): 1171-1220
    [47] Alvarez M A, Rosasco L, Lawrence N D. Kernels for vector-valued functions: A review. Foundations and Trends® in Machine Learning, 2012, 4(3): 195-266 doi: 10.1561/2200000036
    [48] Solak E, Murray-Smith R, Leithead W E, Leith D J, Rasmussen C E. Derivative observations in Gaussian process models of dynamic systems. In: Proceedings of the 15th International Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2002. 1057−1064
    [49] Atkeson C G, Moore A W, Schaal S. Locally weighted learning. Artificial Intelligence Review, 1997, 11(1-5): 11-73
    [50] Kober J, Mulling K, Kromer O, Lampert C H, Scholkopf B, Peters J. Movement templates for learning of hitting and batting. In: Proceedings of the 2010 IEEE International Conference on Robotics and Automation. Anchorage, USA: IEEE, 2010. 853−858
    [51] Fanger Y, Umlauft J, Hirche S. Gaussian processes for dynamic movement primitives with application in knowledge-based cooperation. In: Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. Daejeon, Korea: IEEE, 2016. 3913−3919
    [52] Calinon S, Li Z B, Alizadeh T, Tsagarakis N G, Caldwell D G. Statistical dynamical systems for skills acquisition in humanoids. In: Proceedings of the 12th IEEE-RAS International Conference on Humanoid Robots. Osaka, Japan: IEEE, 2012. 323−329
    [53] Stulp F, Sigaud O. Robot skill learning: From reinforcement learning to evolution strategies. Paladyn, Journal of Behavioral Robotics, 2013, 4(1): 49-61
    [54] Kober J, Oztop E, Peters J. Reinforcement learning to adjust robot movements to new situations. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence. Barcelona, Spain: IJCAI/AAAI, 2011. 2650−2655
    [55] Zhao T, Deng M D, Li Z J, Hu Y B. Cooperative manipulation for a mobile dual-arm robot using sequences of dynamic movement primitives. IEEE Transactions on Cognitive and Developmental Systems, 2020, 12(1): 18−29
    [56] Li Z J, Zhao T, Chen F, Hu Y B, Su C Y, Fukuda T. Reinforcement learning of manipulation and grasping using dynamical movement primitives for a humanoidlike mobile manipulator. IEEE/ASME Transactions on Mechatronics, 2018, 23(1): 121-131 doi: 10.1109/TMECH.2017.2717461
    [57] Paraschos A, Rueckert E, Peters J, Neumann G. Model-free probabilistic movement primitives for physical interaction. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 2860−2866
    [58] Havoutis I, Calinon S. Supervisory teleoperation with online learning and optimal control. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE, 2017. 1534−1540
    [59] Hershey J R, Olsen P A. Approximating the Kullback Leibler divergence between Gaussian mixture models. In: Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing. Honolulu, USA: IEEE, 2007. IV-317−IV-320
    [60] Goldberg P W, Williams C K I, Bishop C M. Regression with input-dependent noise: A Gaussian process treatment. In: Proceedings of the 10th International Conference on Neural Information Processing Systems. Denver, USA: NIPS, 1998. 493−499
    [61] Kersting K, Plagemann C, Pfaff P, Burgard W. Most likely heteroscedastic Gaussian process regression. In: Proceedings of the 24th International Conference on Machine Learning. Corvalis, USA: ACM, 2007. 393−400
    [62] Pastor P, Hoffmann H, Asfour T, Schaal S. Learning and generalization of motor skills by learning from demonstration. In: Proceedings of the 2009 IEEE International Conference on Robotics and Automation. Kobe, Japan: IEEE, 2009. 763−768
    [63] Silverio J, Rozo L, Calinon S, Caldwell D G. Learning bimanual end-effector poses from demonstrations using task-parameterized dynamical systems. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 464−470
    [64] Ude A, Nemec B, Petric T, Morimoto J. Orientation in Cartesian space dynamic movement primitives. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation. Hong Kong, China: IEEE, 2014. 2997−3004
    [65] Abu-Dakka F J, Nemec B, Jørgensen J A, Savarimuthu T R, Kruger N, Ude A. Adaptation of manipulation skills in physical contact with the environment to reference force profiles. Autonomous Robots, 2015, 39(2): 199-217 doi: 10.1007/s10514-015-9435-2
    [66] Ravichandar H, Dani A. Learning position and orientation dynamics from demonstrations via contraction analysis. Autonomous Robots, 2019, 43(4): 897-912 doi: 10.1007/s10514-018-9758-x
    [67] Zeestraten M J A, Havoutis I, Silverio J, Calinon S, Caldwell D G. An approach for imitation learning on Riemannian manifolds. IEEE Robotics and Automation Letters, 2017, 2(3): 1240-1247 doi: 10.1109/LRA.2017.2657001
    [68] Saveriano M, Franzel F, Lee D. Merging position and orientation motion primitives. In: Proceedings of the 2019 International Conference on Robotics and Automation. Montreal, Canada: IEEE, 2019. 7041−7047
    [69] Abu-Dakka F J, Kyrki V. Geometry-aware dynamic movement primitives. In: Proceedings of the 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE, 2020. 4421−4426
    [70] Abu-Dakka F J, Huang Y L, Silverio J, Kyrki V. A probabilistic framework for learning geometry-based robot manipulation skills. Robotics and Autonomous Systems, 2021, 141: Article No. 103761 doi: 10.1016/j.robot.2021.103761
    [71] Calinon S. Gaussians on Riemannian manifolds: Applications for robot learning and adaptive control. IEEE Robotics & Automation Magazine, 2020, 27(2): 33-45
    [72] Kronander K, Billard A. Learning compliant manipulation through kinesthetic and tactile human-robot interaction. IEEE Transactions on Haptics, 2014, 7(3): 367-380 doi: 10.1109/TOH.2013.54
    [73] Wu Y Q, Zhao F, Tao T, Ajoudani A. A framework for autonomous impedance regulation of robots based on imitation learning and optimal control. IEEE Robotics and Automation Letters, 2021, 6(1): 127-134 doi: 10.1109/LRA.2020.3033260
    [74] Forte D, Gams A, Morimoto J, Ude A. On-line motion synthesis and adaptation using a trajectory database. Robotics and Autonomous Systems, 2012, 60(10): 1327-1339 doi: 10.1016/j.robot.2012.05.004
    [75] Kramberger A, Gams A, Nemec B, Chrysostomou D, Madsen O, Ude A. Generalization of orientation trajectories and force-torque profiles for robotic assembly. Robotics and Autonomous Systems, 2017, 98: 333-346 doi: 10.1016/j.robot.2017.09.019
    [76] Stulp F, Raiola G, Hoarau A, Ivaldi S, Sigaud O. Learning compact parameterized skills with a single regression. In: Proceedings of the 13th IEEE-RAS International Conference on Humanoid Robots. Atlanta, USA: IEEE, 2013. 417−422
    [77] Huang Y L, Silverio J, Rozo L, Caldwell D G. Generalized task-parameterized skill learning. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Brisbane, Australia: IEEE, 2018. 5667−5674
    [78] Kulic D, Ott C, Lee D, Ishikawa J, Nakamura Y. Incremental learning of full body motion primitives and their sequencing through human motion observation. The International Journal of Robotics Research, 2012, 31(3): 330-345 doi: 10.1177/0278364911426178
    [79] Manschitz S, Gienger M, Kober J, Peters J. Learning sequential force interaction skills. Robotics, 2020, 9(2): Article No. 45 doi: 10.3390/robotics9020045
    [80] Kober J, Gienger M, Steil J J. Learning movement primitives for force interaction tasks. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, USA: IEEE, 2015. 3192−3199
    [81] Medina J R, Billard A. Learning stable task sequences from demonstration with linear parameter varying systems and hidden Markov models. In: Proceedings of the 1st Annual Conference on Robot Learning. Mountain View, USA: PMLR, 2017. 175−184
    [82] Meier F, Theodorou E, Stulp F, Schaal S. Movement segmentation using a primitive library. In: Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. San Francisco, USA: IEEE, 2011. 3407−3412
    [83] Lee S H, Suh I H, Calinon S, Johansson R. Autonomous framework for segmenting robot trajectories of manipulation task. Autonomous Robots, 2015, 38(2): 107-141 doi: 10.1007/s10514-014-9397-9
    [84] Stulp F, Schaal S. Hierarchical reinforcement learning with movement primitives. In: Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots. Bled, Slovenia: IEEE, 2011. 231−238
    [85] Daniel C, Neumann G, Kroemer O, Peters J. Learning sequential motor tasks. In: Proceedings of the 2013 IEEE International Conference on Robotics and Automation. Karlsruhe, Germany: IEEE, 2013. 2626−2632
    [86] Duan A Q, Camoriano R, Ferigo D, Huang Y L, Calandriello D, Rosasco L, et al. Learning to sequence multiple tasks with competing constraints. In: Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE, 2019. 2672−2678
    [87] Silverio J, Huang Y L, Rozo L, Calinon S, Caldwell D G. Probabilistic learning of torque controllers from kinematic and force constraints. In: Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE, 2018. 1−8
    [88] Schneider M, Ertel W. Robot learning by demonstration with local Gaussian process regression. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. Taipei, China: IEEE, 2010. 255−260
    [89] Umlauft J, Fanger Y, Hirche S. Bayesian uncertainty modeling for programming by demonstration. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE, 2017. 6428−6434
    [90] Wilson A G, Ghahramani Z. Generalised Wishart processes. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. Barcelona, Spain: AUAI Press, 2011. 1−9
    [91] Medina J R, Lee D, Hirche S. Risk-sensitive optimal feedback control for haptic assistance. In: Proceedings of the 2012 IEEE International Conference on Robotics and Automation. Saint Paul, USA: IEEE, 2012. 1025−1031
    [92] Huang Y L, Silverio J, Caldwell D G. Towards minimal intervention control with competing constraints. In: Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE, 2018. 733−738
    [93] Calinon S, Billard A. A probabilistic programming by demonstration framework handling constraints in joint space and task space. In: Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems. Nice, France: IEEE, 2008. 367−372
    [94] Calinon S, Billard A. Statistical learning by imitation of competing constraints in joint space and task space. Advanced Robotics, 2009, 23(15): 2059-2076 doi: 10.1163/016918609X12529294461843
    [95] Paraschos A, Lioutikov R, Peters J, Neumann G. Probabilistic prioritization of movement primitives. IEEE Robotics and Automation Letters, 2017, 2(4): 2294-2301 doi: 10.1109/LRA.2017.2725440
    [96] Fajen B R, Warren W H. Behavioral dynamics of steering, obstacle avoidance, and route selection. Journal of Experimental Psychology: Human Perception and Performance, 2003, 29(2): 343-362 doi: 10.1037/0096-1523.29.2.343
    [97] Hoffmann H, Pastor P, Park D H, Schaal S. Biologically-inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance. In: Proceedings of the 2009 IEEE International Conference on Robotics and Automation. Kobe, Japan: IEEE, 2009. 2587−2592
    [98] Duan A Q, Camoriano R, Ferigo D, Huang Y L, Calandriello D, Rosasco L, et al. Learning to avoid obstacles with minimal intervention control. Frontiers in Robotics and AI, 2020, 7: Article No. 60 doi: 10.3389/frobt.2020.00060
    [99] Park D H, Hoffmann H, Pastor P, Schaal S. Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In: Proceedings of the 8th IEEE-RAS International Conference on Humanoid Robots. Daejeon, Korea: IEEE, 2008. 91−98
    [100] Maciejewski A A, Klein C A. Obstacle avoidance for kinematically redundant manipulators in dynamically varying environments. The International Journal of Robotics Research, 1985, 4(3): 109-117 doi: 10.1177/027836498500400308
    [101] Shyam R B, Lightbody P, Das G, Liu P C, Gomez-Gonzalez S, Neumann G. Improving local trajectory optimisation using probabilistic movement primitives. In: Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE, 2019. 2666−2671
    [102] Zucker M, Ratliff N, Dragan A D, Pivtoraiko M, Klingensmith M, Dellin C M, et al. CHOMP: Covariant Hamiltonian optimization for motion planning. The International Journal of Robotics Research, 2013, 32(9-10): 1164-1193 doi: 10.1177/0278364913488805
    [103] Huang Y L, Caldwell D G. A linearly constrained nonparametric framework for imitation learning. In: Proceedings of the 2020 IEEE International Conference on Robotics and Automation. Paris, France: IEEE, 2020. 4400−4406
    [104] Saveriano M, Lee D. Learning barrier functions for constrained motion planning with dynamical systems. In: Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau, China: IEEE, 2019. 112−119
    [105] Huang Y L. EKMP: Generalized imitation learning with adaptation, nonlinear hard constraints and obstacle avoidance. arXiv: 2103.00452, 2021.
    [106] Osa T, Esfahani A M G, Stolkin R, Lioutikov R, Peters J, Neumann G. Guiding trajectory optimization by demonstrated distributions. IEEE Robotics and Automation Letters, 2017, 2(2): 819-826 doi: 10.1109/LRA.2017.2653850
    [107] Marinho Z, Boots B, Dragan A, Byravan A, Srinivasa S, Gordon G J. Functional gradient motion planning in reproducing kernel Hilbert spaces. In: Proceedings of the Robotics: Science and Systems XII. Ann Arbor, USA, 2016. 1−9
    [108] Rana M A, Mukadam M, Ahmadzadeh S R, Chernova S, Boots B. Towards robust skill generalization: Unifying learning from demonstration and motion planning. In: Proceedings of the 1st Annual Conference on Robot Learning. Mountain View, USA: PMLR, 2017. 109−118
    [109] Koert D, Maeda G, Lioutikov R, Neumann G, Peters J. Demonstration based trajectory optimization for generalizable robot motions. In: Proceedings of the 2016 IEEE-RAS 16th International Conference on Humanoid Robots. Cancun, Mexico: IEEE, 2016. 515−522
    [110] Ye G, Alterovitz R. Demonstration-guided motion planning. Robotics Research. Cham: Springer, 2017. 291−307
    [111] Englert P, Toussaint M. Learning manipulation skills from a single demonstration. The International Journal of Robotics Research, 2018, 37(1): 137-154 doi: 10.1177/0278364917743795
    [112] Doerr A, Ratliff N D, Bohg J, Toussaint M, Schaal S. Direct loss minimization inverse optimal control. In: Proceedings of the Robotics: Science and Systems. Rome, Italy, 2015. 1−9
    [113] Hansen N. The CMA evolution strategy: A comparing review. Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms. Berlin, Heidelberg: Springer, 2006, 75−102
    [114] Ewerton M, Neumann G, Lioutikov R, Amor H B, Peters J, Maeda G. Learning multiple collaborative tasks with a mixture of interaction primitives. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, USA: IEEE, 2015. 1535−1542
    [115] Amor H B, Neumann G, Kamthe S, Kroemer O, Peters J. Interaction primitives for human-robot cooperation tasks. In: Proceedings of the 2014 IEEE International Conference on Robotics and Automation. Hong Kong, China: IEEE, 2014. 2831−2837
    [116] Vogt D, Stepputtis S, Grehl S, Jung B, Amor H B. A system for learning continuous human-robot interactions from human-human demonstrations. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE, 2017. 2882−2889
    [117] Silverio J, Huang Y L, Rozo L, Caldwell D G. An uncertainty-aware minimal intervention control strategy learned from demonstrations. In: Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems. Madrid, Spain: IEEE, 2018. 6065−6071
    [118] Khoramshahi M, Billard A. A dynamical system approach to task-adaptation in physical human–robot interaction. Autonomous Robots, 2019, 43(4): 927-946 doi: 10.1007/s10514-018-9764-z
    [119] Kalakrishnan M, Chitta S, Theodorou E, Pastor P, Schaal S. STOMP: Stochastic trajectory optimization for motion planning. In: Proceedings of the 2011 IEEE International Conference on Robotics and Automation. Shanghai, China: IEEE, 2011. 4569−4574
    [120] Schulman J, Duan Y, Ho J, Lee A, Awwal I, Bradlow H, et al. Motion planning with sequential convex optimization and convex collision checking. The International Journal of Robotics Research, 2014, 33(9): 1251-1270 doi: 10.1177/0278364914528132
    [121] Osa T. Multimodal trajectory optimization for motion planning. The International Journal of Robotics Research, 2020, 39(8): 983-1001 doi: 10.1177/0278364920918296
    [122] LaValle S M, Kuffner Jr J J. Randomized kinodynamic planning. The International Journal of Robotics Research, 2001, 20(5): 378-400 doi: 10.1177/02783640122067453
    [123] Kavraki L E, Svestka P, Latombe J C, Overmars M H. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 1996, 12(4): 566-580 doi: 10.1109/70.508439
    [124] Hsu D, Latombe J C, Kurniawati H. On the probabilistic foundations of probabilistic roadmap planning. The International Journal of Robotics Research, 2006, 25(7): 627-643 doi: 10.1177/0278364906067174
    [125] Celemin C, Maeda G, Ruiz-del-Solar J, Peters J, Kober J. Reinforcement learning of motor skills using policy search and human corrective advice. The International Journal of Robotics Research, 2019, 38(14): 1560-1580 doi: 10.1177/0278364919871998
    [126] Maeda G, Ewerton M, Osa T, Busch B, Peters J. Active incremental learning of robot movement primitives. In: Proceedings of the 1st Annual Conference on Robot Learning. Mountain View, USA: PMLR, 2017. 37−46
    [127] Pearl J. Causality. Cambridge: Cambridge University Press, 2009.
    [128] Katz G, Huang D W, Hauge T, Gentili R, Reggia J. A novel parsimonious cause-effect reasoning algorithm for robot imitation and plan recognition. IEEE Transactions on Cognitive and Developmental Systems, 2018, 10(2): 177-193 doi: 10.1109/TCDS.2017.2651643
    [129] Haan P, Jayaraman D, Levine S. Causal confusion in imitation learning. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2019. 11693−11704
  • Figures (7) / Tables (2)
    Metrics
    • Article views:  5699
    • Full-text HTML views:  3152
    • PDF downloads:  1806
    • Cited by: 44
    Publication history
    • Received:  2021-01-12
    • Accepted:  2021-04-29
    • Available online:  2021-11-11
    • Issue date:  2022-02-18
