Vision-based Object Detection and Tracking: A Review

YIN Hong-Peng, CHEN Bo, CHAI Yi, LIU Zhao-Dong

Citation: YIN Hong-Peng, CHEN Bo, CHAI Yi, LIU Zhao-Dong. Vision-based Object Detection and Tracking: A Review. ACTA AUTOMATICA SINICA, 2016, 42(10): 1466-1489. doi: 10.16383/j.aas.2016.c150823

Citation: TANG Hao-Lin, YANG Yang, YANG Kun, LUO Yi, ZHANG Ya-Ying, ZHANG Fang-Yu. Non-rigid Point Set Registration with Mixed Features. ACTA AUTOMATICA SINICA, 2016, 42(11): 1732-1743. doi: 10.16383/j.aas.2016.c150618

doi: 10.16383/j.aas.2016.c150823

Funds: 

Chongqing Nature Science Foundation of Fundamental Science and Frontier Technologies cstc2015jcyjB0569

China Central Universities Foundation 106112015CDJXY170003

National Natural Science Foundation of China 61203321

Chongqing Graduate Student Research Innovation Project CYB14023

China Central Universities Foundation 106112016CDJZR175511

More Information
    Author Bio:

     CHEN Bo  Master student at the College of Automation, Chongqing University. He received his bachelor's degree from Chongqing University in 2015. His research interest covers deep learning and computer vision. E-mail: qiurenbieyuan@gmail.com

     CHAI Yi  Professor at the College of Automation, Chongqing University. He received his Ph.D. degree from Chongqing University in 2001. His research interest covers information processing, integration and control, and computer network and system control. E-mail: chaiyi@cqu.edu.cn

     LIU Zhao-Dong  Ph.D. candidate at the College of Automation, Chongqing University. His research interest covers sparse representation and machine learning. E-mail: liuzhaodong@cqu.edu.cn

    Corresponding author: YIN Hong-Peng  Associate professor at the College of Automation, Chongqing University. He received his Ph.D. degree from Chongqing University in 2009. His research interest covers pattern recognition, image processing, and computer vision. Corresponding author of this paper. E-mail: yinhongpeng@gmail.com
  • Abstract: Vision-based object detection and tracking is a cross-disciplinary research topic spanning image processing, computer vision, pattern recognition and other fields, with important theoretical significance and practical value in video surveillance, virtual reality, human-computer interaction, and autonomous navigation. This paper gives a fairly comprehensive review of the history, state of the art and typical methods of object detection and tracking. First, according to the type of data processed, object detection methods are divided into background-modeling-based and foreground-modeling-based approaches, and background modeling and feature representation methods are summarized respectively. Second, according to whether object detection participates in the tracking process, tracking methods are divided into generative and discriminative ones, and statistics-based appearance modeling methods are summarized. Then, the strengths and weaknesses of typical algorithms are analyzed, and their performance on standard data sets is compared. Finally, open problems in this field are summarized and future research directions are discussed.
  • Non-rigid point set registration is the process of matching a point set (the source point set) to a deformed version of it (the target point set). The technique plays an extremely important role in computer vision, machine learning, medical image processing, pattern recognition, and geographic information systems. Based on the characteristics of current algorithms, non-rigid point set registration methods can be roughly divided into two categories: iterative versus non-iterative methods, and learning-based versus non-learning methods. Since the algorithm proposed in this paper is iterative, we review existing non-rigid point set registration algorithms mainly from the iterative/non-iterative perspective.

    In non-iterative registration algorithms, the correspondences between two point sets are recovered directly after a single similarity evaluation using some high-level structural feature. In non-iterative registration models, features such as lines [1], curves [2], surface structures [3], shape context [4-5], and graphs [6-7] have been used to evaluate the similarity between two point sets. Among non-iterative algorithms, shape context and graphs are the two most popular feature descriptors; their core idea is to recover correspondences by minimizing the distribution difference (for shape context) or the topological difference (for graphs) between the two point sets [4-9]. Recently, some researchers [10-14] have added a learning component to traditional graph-feature-based algorithms: the algorithm parameters are optimized by training on suitable samples before registration, which improves registration accuracy. However, because these algorithms rely on shape context or graph features, they do not achieve good registration when neighboring points lie close together, since the features of nearby points then become almost indistinguishable [8, 15-16].

    Iterative algorithms usually alternate between two processes: correspondence estimation and transformation updating. Compared with non-iterative algorithms, their advantage is that they gradually adjust the initial geometry and spatial position of the source point set so that it becomes increasingly similar to the target point set, which makes it progressively easier to find correspondences from geometric structural features. TPS-RPM [17] was the first algorithm to use this iterative technique for non-rigid point set registration. It uses point-to-point distances, softassign [18-19] and deterministic annealing [20-21] to estimate correspondence probabilities and to control the update of the thin plate spline (TPS) [22]. Myronenko et al. [23] built on the TPS-RPM framework by adding a motion coherence constraint [24] to the transformation update to improve its stability, and used maximum likelihood to estimate the correspondences. Later, Myronenko et al. [25] extended [23] into the well-known coherent point drift (CPD) algorithm: they improved the transformation model so that it applies to both rigid and non-rigid registration, and, when high accuracy is not required, the fast Gauss transform [26] and low-rank matrix approximation [27] can be used to reduce computation and speed up registration. More recently, Jian et al. [16] proposed a non-rigid registration algorithm based on Gaussian mixture models (GMMREG). Instead of registering the two point sets directly in geometric space, it first converts them into two Gaussian mixture models; correspondence estimation is then performed on this representation, and the transformation update is based on minimizing the L2 distance between the two mixtures [28]. Recently, Ma et al. [29] proposed an algorithm based on shape context features and the L2E estimator [30], and Wang et al. [31] captured the asymmetric distribution of a point set with asymmetric Gaussian models and used it as a feature descriptor for non-rigid registration.

    In this paper, we propose a non-rigid point set registration algorithm based on mixed features. Its main contributions are threefold:

    1) A global structural feature descriptor: we describe the global structural feature of each point in a point set by a sum vector.

    2) A local structural feature descriptor: we describe the local structural feature of each point by the summed distances between corresponding neighboring points in local regions of the two point sets.

    3) A mixed-feature correspondence estimation algorithm: by combining the global and local descriptors, we propose a mixed-feature energy function that allows correspondences to be estimated with mixed features, so that registration no longer relies on a single feature. This improves registration accuracy, and our algorithm outperforms current related algorithms in most experiments.

    We first define the global and local feature descriptors and the mixed-feature energy function, then describe the two core steps of our algorithm. At the end of this section we discuss the parameter settings and the differences between our algorithm and current related methods. Let $\{{\pmb a}_{i}, i=1,2,\cdots,n\}$ and $\{{\pmb b}_{j}, j=1,2,\cdots,m\}$ be the two point sets to be registered, where ${\pmb a}$ is the source point set and ${\pmb b}$ the target point set.

    We first define two feature descriptors, used to evaluate the global and local geometric structural differences between the source point set ${\pmb a}$ and the target point set ${\pmb b}$.

    1.1.1   Global structural feature difference

    The global geometric structural difference is defined as

    \begin {equation} G_{{\pmb a}_{i}{\pmb b}_{j}}= |{\pmb v}_{{\pmb a}_{i}}-{\pmb v}_{{\pmb b}_{j}} |\end {equation}

    (1)

    where $G_{\pmb{ab}}$ is the global structural difference matrix; each element is the norm of the difference between the two vectors ${\pmb v}_{{\pmb a}_{i}}$ and ${\pmb v}_{{\pmb b}_{j}}$. $G_{\pmb{ab}}$ is used to evaluate the global structural difference between point sets ${\pmb a}$ and ${\pmb b}$. ${\pmb v}_{{\pmb a}_{i}}$ and ${\pmb v}_{{\pmb b}_{j}}$ are our proposed global structural feature descriptors, defined as

    \begin{equation} {\pmb v}_{{\pmb a}_{i}}=\sum_{k=1,\,k\neq i}^{n}\overrightarrow{{\pmb a}_{i}{\pmb a}_{k}}\end{equation}

    (2)

    \begin{equation} {\pmb v}_{{\pmb b}_{j}}=\sum_{k=1,\,k\neq j}^{m}\overrightarrow{{\pmb b}_{j}{\pmb b}_{k}}\end{equation}

    (3)

    where $\overrightarrow{{\pmb a}_{i}{\pmb a}_{k}}$ and $\overrightarrow{{\pmb b}_{j}{\pmb b}_{k}}$ are the geometric vectors from point ${\pmb a}_{i}$ to ${\pmb a}_{k}$ and from ${\pmb b}_{j}$ to ${\pmb b}_{k}$, respectively. ${\pmb v}_{{\pmb a}_{i}}$ and ${\pmb v}_{{\pmb b}_{j}}$ are the sum vectors that describe the global structural features of points ${\pmb a}_{i}$ and ${\pmb b}_{j}$.
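    To make the definitions concrete, here is a minimal Python/NumPy sketch of the sum-vector descriptors (2)-(3) and the global difference matrix (1). The paper's reference implementation is in Matlab, so all function and variable names here are our own.

```python
import numpy as np

def global_descriptors(points):
    """Sum vectors of Eqs. (2)-(3): for each point, the sum of the
    vectors pointing from it to every other point in the set."""
    n = len(points)
    # sum_{k != i} (p_k - p_i) = (sum_k p_k) - n * p_i
    return points.sum(axis=0) - n * points

def global_difference(a, b):
    """Global structural difference matrix G of Eq. (1):
    G[i, j] = |v_{a_i} - v_{b_j}|."""
    va, vb = global_descriptors(a), global_descriptors(b)
    return np.linalg.norm(va[:, None, :] - vb[None, :, :], axis=2)
```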

    1.1.2   Local structural feature difference

    The local structural feature difference is defined as

    \begin {equation}L_{{\pmb a}_{i}{\pmb b}_{j}}=\sum_{k=1}^{K}\parallel T({ N}({\pmb a}_{i})_{k},{\pmb b}_{j})-{N}({\pmb b}_{j})_{k}\parallel^{2}\end {equation}

    (4)

    where $L_{{\pmb a}{\pmb b}}$ is the local structural difference matrix between point sets ${\pmb a}$ and ${\pmb b}$, and K is the number of neighbors. ${N}({\pmb a}_{i})_{k}$ and ${N}({\pmb b}_{j})_{k}$ are the k-th nearest neighbors of points ${\pmb a}_{i}$ and ${\pmb b}_{j}$, respectively. T is the translation function, defined as

    \begin {equation}T({N}({\pmb a}_{i})_{k},{\pmb b}_{j})={N}({\pmb a}_{i})_{k}+({\pmb b}_{j}-{\pmb a}_{i})\end {equation}

    (5)

    The main idea is that each point of ${\pmb a}$ or ${\pmb b}$, together with its neighbors (${N}({\pmb a}_{i})_{k=1,\cdots,K}$ or ${N}({\pmb b}_{j})_{k=1,\cdots,K}$), forms a small local patch, so estimating correspondences between ${\pmb a}$ and ${\pmb b}$ can be reduced to evaluating the similarity of these local patches. For example, point ${\pmb a}_{i}$ and its K neighbors ${N}({\pmb a}_{i})_{k=1,\cdots,K}$ are first translated to point ${\pmb b}_{j}$ by the translation vector $\overrightarrow{{\pmb a}_{i}{\pmb b}_{j}}$. The geometric distances between the translated neighbors of ${\pmb a}_{i}$ and the neighbors ${N}({\pmb b}_{j})_{k=1,\cdots,K}$ of ${\pmb b}_{j}$ are then accumulated. Finally, the correspondence of ${\pmb a}_{i}$ in ${\pmb b}$ is taken as the point ${N}({\pmb b}_{j})$ with the smallest summed distance. The local structural difference is governed mainly by the number of neighbors K, which thus also determines how local structural similarity between the two point sets is evaluated.
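    A corresponding sketch of the local difference (4) with the translation (5), assuming neighbor sets fixed before registration as the paper does; the quadratic double loop is written for clarity, not speed, and all names are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def neighbor_sets(points, K):
    """Indices of the K nearest neighbors of every point (self excluded)."""
    _, idx = cKDTree(points).query(points, k=K + 1)
    return idx[:, 1:]          # drop column 0, which is the point itself

def local_difference(a, b, na, nb):
    """Local structural difference matrix L of Eq. (4): translate the
    neighborhood of a_i onto b_j with Eq. (5) and accumulate squared
    distances to the k-th neighbors of b_j."""
    L = np.zeros((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            shifted = a[na[i]] + (b[j] - a[i])      # T(N(a_i)_k, b_j)
            L[i, j] = ((shifted - b[nb[j]]) ** 2).sum()
    return L
```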

    Here, we use a linear assignment technique to minimize the global structural difference matrix $G_{\pmb{ab}}$ and the local structural difference matrix $L_{\pmb{ab}}$, which yields two kinds of correspondences: one computed by minimizing the global structural difference, the other by minimizing the local one.

    The mixed-feature energy function proposed in this paper is defined as

    \begin {equation}E(M)=\sum_{i=1}^n\sum_{j=1}^m G_{{\pmb a}_{i}{\pmb b}_{j}}M_{ij}+\alpha \sum_{i=1}^n\sum_{j=1}^m L_{{\pmb a}_{i}{\pmb b}_{j}}M_{ij}\end {equation}

    (6)

    where $\sum_{i=1}^n \sum_{j=1}^m G_{{\pmb a}_{i}{\pmb b}_{j}}M_{ij}$ and $\sum_{i=1}^n\sum_{j=1}^m L_{{\pmb a}_{i}{\pmb b}_{j}}M_{ij}$ are the energies obtained by minimizing the global and local structural differences, respectively, each of which can be regarded as a linear assignment problem. $G_{{\pmb a}{\pmb b}}$ and $L_{{\pmb a}{\pmb b}}$ are each normalized to the interval [0, 1]. $n$ and $m$ are the numbers of points in ${\pmb a}$ and ${\pmb b}$. $M_{ij}$ is the correspondence matrix representing the correspondences between ${\pmb a}$ and ${\pmb b}$: it is 1 when ${\pmb a}_{i}$ corresponds to ${\pmb b}_{j}$ and 0 otherwise, and it always satisfies ${\sum_{j=1}^m}M_{ij}=1$ and ${\sum_{i=1}^n}M_{ij}=1$. $\alpha$ is a weight parameter that balances the two energy terms during optimization. During registration, $\alpha$ is gradually decreased by an annealing schedule at each iteration and finally approaches 0.
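    Because (6) is linear in $M$, minimizing it for a fixed $\alpha$ is a single linear assignment over the combined cost $G+\alpha L$. The sketch below uses SciPy's linear_sum_assignment (a Jonker-Volgenant-style solver) in place of the C++ routine [32] called by the paper's Matlab code; this substitution and the names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def normalize01(C):
    """Normalize a cost matrix to [0, 1], as required for G and L in Eq. (6)."""
    return (C - C.min()) / (C.max() - C.min() + 1e-12)

def estimate_correspondence(G, L, alpha):
    """Step 1: minimize the mixed-feature energy (6) as one linear
    assignment over the combined cost, returning the binary matrix M."""
    cost = normalize01(G) + alpha * normalize01(L)
    rows, cols = linear_sum_assignment(cost)
    M = np.zeros_like(cost)
    M[rows, cols] = 1.0
    return M

# Eq. (7): the current corresponding point set is then b_c = M @ b
```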

    We first create a deformable proxy point set ${\pmb a}^{w}$ and set ${\pmb a}^{w}={\pmb a}$ at the start of registration. The main procedure of our algorithm is: 1) in each iteration, the mixed-feature energy function above is used to estimate the correspondences between ${\pmb a}^{w}$ and ${\pmb b}$ (in (1)-(6), ${\pmb a}_{i}$ should be read as ${\pmb a}^{w}_{i}$); 2) the spatial position and geometry of ${\pmb a}^{w}$ are then updated by a TPS transformation built from the correspondences obtained in step 1). Steps 1) and 2) alternate, so the proxy point set ${\pmb a}^{w}$ gradually approaches the target point set ${\pmb b}$ in position and shape and finally matches its true corresponding points in ${\pmb b}$ accurately. The correspondences estimated with the proxy ${\pmb a}^{w}$ are exactly the correspondences of ${\pmb a}$.

    1.3.1   Step 1: Correspondence estimation

    In each iteration, the correspondences between ${\pmb a}^{w}$ and ${\pmb b}$ are obtained by minimizing the mixed-feature energy function (6). Since this optimization is treated as a linear assignment problem, (6) can be solved with the Jonker-Volgenant algorithm [32]. The Jonker-Volgenant algorithm is widely used for linear assignment; it finds shortest augmenting paths and has $O(N^{3})$ computational complexity.

    To handle the integer-cost requirement of linear assignment, we first scale the coordinates of the point sets into [0, 1] before registration, and in each iteration quantize the computed global and local difference matrices as $\lfloor{G_{{\pmb a}^{w}{\pmb b}}}\times R \rceil$ and $\lfloor {L_{{\pmb a}^{w}{\pmb b}}}\times R \rceil$, with R set to $10^{6}$. For the non-square case (the target set ${\pmb b}$ contains extra points), the non-square matrices ${G_{{\pmb a}^{w}{\pmb b}}}$ and ${L_{{\pmb a}^{w}{\pmb b}}}$ can be made square by assigning dummy entries [33] without affecting the overall optimization; the transformed $E(M)$ can then be solved in the usual way and still yields the optimal solution. Although this provides a way to register targets containing extra points, our algorithm does not handle such cases well, because the sum vectors ${\pmb v}_{{\pmb a}_{i}}$ and ${\pmb v}_{{\pmb b}_{j}}$ used as global descriptors are easily perturbed by the extra points.
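    For a strictly integer-cost assignment solver, such as the C++ Jonker-Volgenant routine the paper uses, the quantization and dummy-entry padding just described might look as follows (a sketch under our naming; SciPy's solver accepts real-valued rectangular costs directly, so this step is unnecessary there):

```python
import numpy as np

R = 10 ** 6   # quantization factor used by the paper

def to_integer_cost(C):
    """Quantize a [0, 1]-normalized cost matrix: round(C * R)."""
    return np.rint(C * R).astype(np.int64)

def pad_square(C, dummy_cost=0):
    """Dummy entries [33]: pad an n x m (n < m) cost matrix with constant
    rows so the extra points of b absorb the unmatched assignments."""
    n, m = C.shape
    if n == m:
        return C
    return np.vstack([C, np.full((m - n, m), dummy_cost, dtype=C.dtype)])
```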

    The correspondence matrix M solved by the Jonker-Volgenant algorithm guarantees a one-to-one correspondence from ${\pmb a}^{w}$ to ${\pmb b}$. The current corresponding point set ${\pmb b}^{c}$ is updated by (7):

    \begin{equation}{\pmb b}^{c}= M \cdot {\pmb b}\end{equation}

    (7)

    The proposed mixed-feature energy function provides a flexible way to estimate correspondences with mixed features. For example, when $\alpha$ is very large, minimizing E is equivalent to minimizing the local structural difference ${L_{{\pmb a}^{w}{\pmb b}}}$, so the point-to-point correspondences are based on minimizing the local structural difference between the two point sets. As $\alpha$ gradually decreases, correspondence estimation shifts toward minimizing the global structural difference, and the correspondences are then based on minimizing the global structural difference between the two point sets.

    1.3.2   Step 2: Spatial transformation update

    Once the current corresponding point set ${\pmb b}^{c}$ has been updated, the spatial transformation is updated from the correspondence between ${\pmb b}^{c}$ and the source set ${\pmb a}$ (since ${\pmb a}$ and ${\pmb a}^{w}$ share the same point ordering, ${\pmb b}^{c}$ is also the corresponding point set of ${\pmb a}$). We use a TPS transformation to build the mapping from ${\pmb a}$ to ${\pmb b}^{c}$:

    \begin{equation}f({\pmb a},{\pmb d},{ w})={\pmb a}\cdot {d} + \phi({\pmb a})\cdot{ w}\end{equation}

    (8)

    where ${d}$ is an affine coefficient matrix and ${w}$ a non-rigid deformation coefficient matrix. $\phi({\pmb a})$ is the TPS kernel function, defined as $\phi({\pmb a})=\|{\pmb a}-{{\pmb a}_{c}}\|^{2}\log \|{\pmb a}-{{\pmb a}_{c}}\|$ for 2D mappings and $\phi({\pmb a})=\|{\pmb a}-{{\pmb a}_{c}}\|$ for 3D mappings. ${{\pmb a}_{c}}$ is a set of control points chosen from ${\pmb a}$.

    To build the mapping from ${\pmb a}$ to its corresponding point set ${\pmb b}^{c}$ with suitable affine coefficients ${d}$ and non-rigid deformation coefficients ${w}$, the TPS energy function is defined as

    \begin{equation}E_{\textrm{TPS}}({d},{w})=\|{\pmb b}^{c}-{\pmb a} {d}- \Phi {w}\|^{2}+\lambda\, \textrm{tr}({w}^{\rm T} \Phi { w})\end{equation}

    (9)

    where the regularization parameter $\lambda$ controls the non-rigid deformation coefficients ${w}$; it is also driven by the same annealing schedule that controls the weight parameter $\alpha$ in (6). ${\Phi}$ is the TPS kernel matrix, computed from the TPS kernel function $\phi({\pmb a})$ above.

    To compute the least-squares solution for ${d}$ and ${w}$, a QR decomposition [34] is used to separate the affine and non-rigid deformation spaces of the point set:

    \begin{equation}{\pmb a}=QR=[Q_{1}|Q_{2}] \left(\begin{array}{c}{R}_{1} \\0 \\\end{array} \right)\end{equation}

    (10)

    where $Q_{1}\in {\bf R}^{N\times D}$, $Q_{2}\in {\bf R}^{N\times(N-D)}$, and $R_{1}\in {\bf R}^{D\times D}$; $Q_{1}$ and $Q_{2}$ have orthonormal columns. Equation (9) can then be rewritten as

    \begin{equation}\begin{aligned}E_{\textrm{TPS}}(\boldsymbol{\gamma},{d})=\|Q^{\rm T}_{2}{\pmb b}^{c}-Q^{\rm T}_{2} \Phi Q_{2} \boldsymbol{\gamma}\|^{2}+ \|Q^{\rm T}_{1}{\pmb b}^{c}-\\{ R}_{1}{ d}-Q^{\rm T}_{1}\Phi Q_{2}\boldsymbol{\gamma}\|^{2} +\lambda\boldsymbol{\gamma}^{\rm T}Q^{\rm T}_{2} \Phi Q_{2}\boldsymbol{\gamma}\end{aligned}\end{equation}

    (11)

    where ${w}=Q_{2}\boldsymbol{\gamma}$ and $\boldsymbol{\gamma}\in{\bf R}^{(N-D-1) \times (D+1)}$. The least-squares solution of (11) is obtained by first minimizing over $\boldsymbol{\gamma}$ and then over ${d}$. The solutions for ${w}$ and ${d}$ are

    \begin{equation}\hat{{ w}}=Q_{2}\boldsymbol{\gamma}=Q_{2}(Q^{\rm T}_{2} \Phi Q_{2}+\lambda I_{N-D-1})^{-1}Q^{\rm T}_{2}{\pmb b}^{c}\end{equation}

    (12)

    \begin{equation}\hat{{ d}}={R}_{1}^{-1}(Q^{\rm T}_{1}{\pmb b}^{c}-Q^{\rm T}_{1}\Phi {w})\end{equation}

    (13)

    The spatial position and geometry of the proxy point set ${\pmb a}^{w}$ are updated as

    \begin{equation}{\pmb a}^{w}={\pmb a}\cdot {d} + \Phi \cdot { w}\end{equation}

    (14)
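    The whole TPS update (8)-(14) can be condensed into a short routine. The sketch below uses the 2D kernel, takes all source points as control points, and augments ${\pmb a}$ with a column of ones for the affine part, bookkeeping the paper leaves implicit; names and details are ours.

```python
import numpy as np

def tps_kernel(x, c):
    """2D TPS kernel phi(r) = r^2 log r between points x and control points c."""
    r = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def fit_tps(a, b_c, lam):
    """Solve Eqs. (9)-(13): least-squares TPS mapping a -> b_c with
    regularization lam; control points are the source points a."""
    n, D = a.shape
    A = np.hstack([np.ones((n, 1)), a])        # homogeneous coordinates
    Phi = tps_kernel(a, a)
    Q, R = np.linalg.qr(A, mode="complete")    # Eq. (10)
    Q1, Q2, R1 = Q[:, : D + 1], Q[:, D + 1 :], R[: D + 1, : D + 1]
    S = Q2.T @ Phi @ Q2 + lam * np.eye(n - D - 1)
    gamma = np.linalg.solve(S, Q2.T @ b_c)     # Eq. (12)
    w = Q2 @ gamma
    d = np.linalg.solve(R1, Q1.T @ (b_c - Phi @ w))   # Eq. (13)
    return d, w, Phi

def apply_tps(a, d, w, Phi):
    """Eq. (14): update the proxy point set a^w = A d + Phi w."""
    A = np.hstack([np.ones((len(a), 1)), a])
    return A @ d + Phi @ w
```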

    After the position and geometry of the proxy point set ${\pmb a}^{w}$ have been updated, the algorithm returns to Step 1 (Section 1.3.1) for correspondence estimation; the two steps alternate until the annealing temperature T reaches the stopping value ($T_{final}$).

    Algorithm 1 gives the pseudocode of our method.

    Algorithm 1. Non-rigid point set registration with mixed features

    Input. Point sets ${\pmb a}$, ${{\pmb a}^{w}}$ and ${\pmb b}$.

    Initialization. Initialize the parameters $T_{init}$, $T_{final}$, r, $\lambda_{init}$ and $\alpha_{init}$. Set K and determine the neighbor sets ${N}({\pmb a}_{i})$ and ${N}({\pmb b}_{j})$ of point sets ${{\pmb a}^{w}}$ and ${\pmb b}$.

    Begin. Energy tradeoff annealing schedule.

    Step 1. Estimate the current correspondences ${\pmb b}^{c}$ using (6) and (7).

    Step 2. Update the TPS transformation using (12) and (13).

    Update ${{\pmb a}^{w}}$ using (14).

    Decrease T according to the schedule, then update $\alpha$ and $\lambda$.

    End. Until $T\leq T_{final}$.

    Output. Proxy point set ${{\pmb a}^{w}}$.
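    Under the assumptions above, the outer loop of Algorithm 1 might be driven as follows, reusing the helper sketches from earlier sections. init_temperatures is a hypothetical helper implementing the defaults of the parameter-setting rules below, and the $\alpha$ and $\lambda$ schedules follow those rules ($\alpha_{init}=K^{2}$, $\lambda_{init}=n$).

```python
def register(a, b, K=5, r=0.7):
    """Algorithm 1: alternate correspondence estimation (Step 1) and TPS
    update (Step 2) under the annealing schedule T <- T * r."""
    aw, M = a.copy(), None                               # proxy point set a^w = a
    na, nb = neighbor_sets(a, K), neighbor_sets(b, K)    # neighbor sets, fixed
    T_init, T_final = init_temperatures(a, b)            # hypothetical helper
    T = T_init
    while T > T_final:
        alpha, lam = K ** 2 * T, len(a) * T              # alpha = alpha_init*T, etc.
        G = global_difference(aw, b)
        L = local_difference(aw, b, na, nb)
        M = estimate_correspondence(G, L, alpha)         # Step 1, Eqs. (6)-(7)
        b_c = M @ b
        d, w, Phi = fit_tps(a, b_c, lam)                 # Step 2, Eqs. (9)-(13)
        aw = apply_tps(a, d, w, Phi)                     # Eq. (14)
        T *= r                                           # annealing schedule
    return aw, M
```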

    The proposed mixed-feature non-rigid registration algorithm has four groups of important parameters: the annealing parameters $T_{init}$, $T_{final}$ and r; the weight parameter $\alpha$; the regularization parameter $\lambda$; and the number of neighbors K. They are set as follows.

    1) Annealing parameters: the temperature T [20-21] used by the energy tradeoff schedule is set to a relatively high value $T_{init}$ before registration and then lowered step by step in each iteration with the linear schedule $T = T \times r$, where r is the annealing rate; the schedule stops when T reaches the lower preset value $T_{final}$. This linear schedule serves two purposes: first, T gradually reduces the weight $\alpha$ in (6), so that the energy minimization transitions from minimizing the local structural difference to minimizing the global one; second, T gradually reduces the regularization parameter $\lambda$ in (9) and (12), so that the TPS update transitions from mostly rigid to increasingly non-rigid deformation. Since the annealing parameters essentially determine the number of iterations, $T_{init}$, $T_{final}$ and r are chosen to allow enough iterations for registration. Based on preliminary trial-and-error experiments on the Fish 1 point set [17], $T_{init}$ is set to 1/10 of the squared maximum distance from ${\pmb a}$ to ${\pmb b}$, $T_{final}$ to 1/8 of the squared mean distance from each point of ${\pmb a}$ to its nearest neighbor, and r is usually set to 0.7 (see the sketch after this list).

    2) Weight parameter: $\alpha$ is gradually decreased in each iteration via $\alpha=\alpha_{init} \times T$. Its initial value is chosen so that, early in registration, the algorithm concentrates on estimating correspondences by minimizing the local structural difference; $\alpha_{init}$ is set to the squared number of neighbors, $K^{2}$.

    3) Regularization parameter: $\lambda$ is gradually decreased in each iteration via $\lambda=\lambda_{init}\times T$. Since $\lambda$ controls the balance between rigid and non-rigid deformation in the TPS (a large $\lambda$ makes the TPS nearly rigid; a small $\lambda$ makes it non-rigid), its initial value is chosen so that the TPS stays rigid early in registration; $\lambda_{init}$ is set to the number of points in ${\pmb a}$.

    4) Number of neighbors: the default K is based on the minimum number of neighbors needed to distinguish local structures. For example, to distinguish a corner (which has 2 neighbors) from a cross (which has 4), at least 4 neighbors must be examined. On this basis, we set the default K to 5 for both 2D and 3D registration.
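    The data-driven temperature defaults in item 1) can be computed directly from the two point sets; a sketch of the hypothetical init_temperatures helper used in the loop above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def init_temperatures(a, b):
    """Defaults from the trial-and-error study: T_init is 1/10 of the
    squared maximum distance from a to b; T_final is 1/8 of the squared
    mean nearest-neighbor distance within a."""
    T_init = cdist(a, b).max() ** 2 / 10.0
    Da = cdist(a, a)
    np.fill_diagonal(Da, np.inf)                 # exclude self-distances
    T_final = Da.min(axis=1).mean() ** 2 / 8.0
    return T_init, T_final
```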

    Five current algorithms are most similar to ours: TPS-RPM [17], CPD [25], GMMREG (L2+TPS) [16], Ma et al. [29], and Wang et al. [31]. Table 1 details the differences between our algorithm and these five.

    Table 1  Methodological differences between our method and the current methods

    Algorithm             | Feature used                 | Correspondence | Constraint             | Transformation function
    Ours                  | Mixed features               | B              | TPS energy function 1  | TPS
    TPS-RPM               | Gaussian probability density | F              | TPS energy function 2  | TPS
    CPD                   | Gaussian probability density | F              | MCC-NLL                | GRBF
    GMMREG                | Gaussian probability density | F              | Minimized L2 distance  | TPS
    Ma et al. [29]        | Shape context                | B              | L2E estimator [30]     | RKHS
    Wang et al. [31]      | MoAG                         | F              | Minimized L2 distance  | RKHS

    Note: B: binary correspondence; F: fuzzy correspondence; GRBF (Gaussian radial basis function); TPS (thin plate spline); MCC-NLL (motion coherence constraint based negative log-likelihood); RKHS (reproducing kernel Hilbert space); MoAG (mixture of asymmetric Gaussian models). In TPS energy function 2, $\lambda_{2}\textrm{tr}[(d-I)^{\rm T}(d-I)]$ is added to TPS energy function 1 to control the affine transformation.

    1) Correspondence estimation: unlike the five algorithms above, which register with a single feature, our algorithm optimizes a mixed-feature energy and allows correspondences between point sets to be estimated with mixed features. Because both our algorithm and that of Ma et al. solve correspondences by linear assignment, both provide a binary correspondence, i.e., the correspondence matrix $M_{ij}$ uses only 0 and 1. In TPS-RPM, CPD, GMMREG and Wang et al. [31], the transformation is built on fuzzy correspondences (correspondence probabilities), so the updates that guide the proxy ${{\pmb a}^{w}}$ in changing its position and shape are fuzzy, and more iterations are needed to complete registration. In our algorithm, the binary correspondences built by minimizing the global or local structural difference give the proxy ${{\pmb a}^{w}}$ a correct and unambiguous update of position and shape.

    2) Spatial transformation update: our algorithm uses the standard TPS energy function, whereas TPS-RPM adds a term $\lambda_{2}\, \textrm{tr}[(d-I)^{\rm T}(d-I)]$ to control the affine parameters. Since our algorithm supplies a fairly accurate binary correspondence to the TPS in each iteration, we only need $\lambda$ to control the role of the ${w}$ coefficients in rigid versus non-rigid deformation; a free affine transformation (i.e., an uncontrolled affine coefficient ${d}$) helps the proxy ${{\pmb a}^{w}}$ quickly (in fewer iterations) reach a position and shape closer to the target ${\pmb b}$ for the subsequent non-rigid registration. In addition, unlike CPD, which forces neighboring points to keep motion coherence, our algorithm protects the topological structure of the proxy ${{\pmb a}^{w}}$ by fixing the neighbor sets ${N}({\pmb a}_{i}^{w})$ and ${N}({\pmb b}_{j})$ throughout registration.

    We implemented the main procedure of our algorithm in Matlab; the Jonker-Volgenant algorithm is written in C++ and called through a Matlab mex function. We first tested the performance of our algorithm in four registration settings:

    1) Contour registration (2D synthetic point sets);

    2) 3D contour registration (3D face point set);

    3) Image sequences (CMU house and CMU hotel sequences);

    4) Feature-point registration on real images (Pascal 2007 challenge data sets).

    In addition, our algorithm was compared with the following eight representative algorithms:

    1) Iterative algorithms: TPS-RPM [17], CPD [25], GMMREG (L2 + TPS) [16], Wang et al. [31];

    2) Graph-based learning algorithms: Caetano et al. [10], Leordeanu et al. [13], Torresani et al. [14];

    3) Graph-based non-learning algorithm: Zhou et al. [9].

    Finally, we evaluate the computational complexity of our algorithm and discuss how it can be reduced.

    Line [17], Fish 1 [17], Fish 2 [25], Chinese character [17] and 3D face [25] are popular point sets widely used to test non-rigid registration on contours; they come from TPS-RPM [17] and CPD [25]. We used these five point sets as source point sets and synthesized a rich series of target point sets as described below, comparing against TPS-RPM, CPD and GMMREG. For a fair comparison, we followed TPS-RPM [17] and CPD [25] in target generation, error measurement and performance evaluation. Since our global feature descriptor (Section 1.1.1) is built from sum vectors, our algorithm does not handle target point sets containing extra (outlier) points well, so we did not test that registration mode in this experiment.

    Target point sets:

    1) Deformation level: we placed 8 control points (6 in the 3D case) on the boundary of each contour point set. To create a series of suitable target point sets with different deformation levels, each control point can move freely in four directions (up, down, left, right) with a step of 0.2; the order and directions of the 8 (or 6) control-point moves were set randomly. A TPS transformation driven by these control points deforms the source point set into a new target point set. Since the number of moved control points reflects the amount of deformation, the deformation level is defined as the number of moved control points (with maxima of 8 in 2D and 6 in 3D).

    2) Noise ratio: we created target point sets at 5 noise levels using Gaussian white noise with zero mean and standard deviations from 0.01 to 0.05.

    3) Rotation angle: we consider testing under moderate rotation necessary, because deformation usually comes with rotation. However, excessive rotation makes the compared algorithms unstable or the results meaningless, so we focus on rotations from $-30^{\circ}$ to $30^{\circ}$ in steps of $15^{\circ}$. In the 3D experiments, the source point set was rotated about the Z axis to create new targets.

    Error measure: many error measures are possible, e.g., the percentage of correct matches or the mean distance between the registered point sets. For a direct and fair comparison, we followed TPS-RPM and CPD and used the mean squared distance between the proxy point set ${{\pmb a}^{w}}$ and the target ${\pmb b}$.

    Performance evaluation: the mean error (i.e., the mean squared distance and its standard deviation over 100 trials) is used to compare the registration performance of the algorithms. For each point set, 100 random trials were run at every deformation level, noise ratio and rotation angle.

    In the first series of experiments, we evaluated our algorithm on different 2D synthetic contour point sets. Compared with the image sequences (CMU sequences and Pascal 2007 challenge) and real-image feature points (Pascal 2007 challenge) used later, these 2D contour point sets contain more points, more densely distributed. In such point sets, neighboring points lie close together and have similar local structures, which makes evaluating the local structural similarity of each point more difficult. The comparisons with related algorithms are as follows.

    3.2.1   Line

    In the Line registration test, our algorithm was compared only with TPS-RPM, because the other algorithms have not been tested on this point set with published parameter settings. The statistics (mean errors and standard deviations) are shown in the first row of Fig. 1. Our algorithm produced accurate registrations in all experiments and gave the best results at all deformation levels, noise ratios and rotation angles. Fig. 2 shows a registration example.

    Fig. 1  Comparison of our results against CPD, TPS-RPM and GMMREG on 2D contour point set registration (The error bars indicate the standard deviations of the mean errors in 100 random experiments. From the top row to the bottom row: Line, Fish 1, Chinese character and Fish 2, respectively.)
    Fig. 2  Registration examples on the Line point set
    3.2.2   Fish 1

    In the Fish 1 test, we compared our algorithm with CPD, TPS-RPM and GMMREG; the second row of Fig. 1 shows the results. All four algorithms registered accurately; ours performed best at all deformation levels and all rotation angles. With noisy targets, all four algorithms registered accurately, with GMMREG doing better. Fig. 3 shows a registration example.

    Fig. 3  Registration examples on the Fish 1 point set
    3.2.3   Chinese character

    In the Chinese character test, our algorithm was compared only with TPS-RPM, because CPD and GMMREG have not tested this point set in non-rigid registration (GMMREG has tested it only in rigid registration). Our algorithm gave the best results at all deformation levels, at noise ratios from 0.01 to 0.03, and at all rotation angles. Fig. 4 shows a registration example.

    Fig. 4  Registration examples on the Chinese character point set
    3.2.4   Fish 2

    The comparison between our algorithm and CPD is shown in the fourth row of Fig. 1. Our algorithm registered accurately in all experiments and performed best at all deformation levels, noise ratios and rotation angles. Fig. 5 shows a registration example.

    Fig. 5  Registration examples on the Fish 2 point set

    In the 2D contour registration tests, all the algorithms registered accurately, but ours clearly outperformed the related algorithms under deformation and rotation.

    In the second series of experiments, we evaluated our algorithm on 3D registration. The 3D face point set used here has been used by CPD, GMMREG and other algorithms to test 3D registration. Fig. 6 compares our algorithm with CPD and GMMREG. Our algorithm registered accurately in all experiments and performed best at all deformation levels, at noise ratios from 0.01 to 0.04, and at all rotation angles. Fig. 7 shows a registration example.

    Fig. 6  Comparison of our results against CPD and GMMREG on 3D face contour point set registration (The error bars indicate the standard deviations of the mean errors in 100 random experiments.)
    Fig. 7  Registration examples on the 3D face point set

    In the third series of experiments, we tested our algorithm on feature-point registration in image sequences. Compared with the 2D and 3D synthetic point sets, image sequences have fewer feature points, sparsely distributed over the image. The CMU house and CMU hotel sequences are currently the most popular data for testing graph-based learning algorithms; they consist of 111 and 101 frames respectively, each frame having 30 labeled feature points. In this experiment we use the percentage of correctly matched points (the matching rate) as the error measure.

    Our algorithm was compared with three graph-based learning algorithms [10, 13-14], one graph-based non-learning algorithm [9], and three iterative algorithms [16, 25, 31] on all possible image pairs of the two sequences.

    Table 2 shows the results. On the house sequence, for Caetano et al. [10] and Zhou et al. [9] we report the upper bounds of their published matching rates; for Leordeanu et al. [13], Torresani et al. [14] and Wang et al. [31] we report their published rates. Our algorithm, Wang et al. [31] and Torresani et al. [14] achieved perfect matching, surpassing the other algorithms. In terms of running time, our algorithm (0.049 s on average) is much faster than the 4.8 s average reported by Torresani et al. [14] (this comparison also takes the hardware used into account). On the CMU hotel sequence, Wang et al. [31], Torresani et al. [14] and Zhou et al. [9] did not provide results. Compared with CPD, GMMREG, Leordeanu et al. [13] and Caetano et al. [10], our algorithm showed better matching accuracy. Fig. 8 shows two registration examples.

    Table 2  Matching rates on the CMU house and CMU hotel for all possible image pairs (%)

    Algorithm             | CMU house | CMU hotel
    Ours                  | 100.0     | 99.3
    CPD                   | 99.6      | 98.9
    GMMREG                | 99.5      | 97.1
    Wang et al. [31]      | 100.0     | -
    Torresani et al. [14] | 100.0     | -
    Zhou et al. [9]       | ≈ 100.0   | -
    Leordeanu et al. [13] | 99.8      | 94.8
    Caetano et al. [10]   | < 96.0    | < 90.0
    Fig. 8  Registration examples on CMU house and CMU hotel

    In the fourth series of experiments, we tested our algorithm on the data of Leordeanu et al. [13], selected from the Pascal 2007 challenge database: 30 pairs of car images and 20 pairs of motorbike images, each pair containing $30\sim60$ feature points. Our algorithm was compared with CPD, GMMREG, Zhou et al. [9] and Leordeanu et al. [13]; the results are listed in Table 3, where for Zhou et al. [9] (A) and Leordeanu et al. [13] (B) we report their published results. Our algorithm achieved the best matching rate. Fig. 9 shows two registration examples.

    Fig. 9  Registration examples on the Pascal 2007 challenge
    Table 3  Matching rates on cars and motorbikes (%)

    Ours | CPD | GMMREG | A  | B
    93   | 80  | 82     | 80 | 80

    The computational complexity of our algorithm depends mainly on two factors: 1) the annealing parameters $T_{init}$, $T_{final}$ and r, which determine convergence; 2) the linear assignment algorithm used to solve the mixed-feature energy function.

    3.6.1   Convergence range

    The convergence range depends mainly on the deformation level and the annealing parameter settings. Among the related algorithms, the convergence range of TPS-RPM is determined by its annealing schedule, while those of CPD and GMMREG are determined by a tolerance stopping criterion and a maximum number of iterations, respectively. We examined the convergence of these four algorithms in the Chinese character deformation experiment; the parameter settings of our algorithm, TPS-RPM, CPD and GMMREG followed those of the Fish 1 experiment above. CPD and TPS-RPM needed on average 43 and 85 iterations to complete registration, while GMMREG required its maximum number of iterations (100): its tolerance stopping criterion is set to $10^{-10}$, and the minimized L2 distance in GMMREG can hardly reach that threshold. Our algorithm completed registration in only 17 iterations.

    We also examined the convergence of our algorithm under different annealing parameter settings. Fig. 10 gives an example from the Chinese character deformation experiment; for each setting, we ran 100 random trials at every deformation level. As Fig. 10 shows, when the initial value $T_{init}$ was lowered to 1/10 of its default, the performance degraded slightly while the iterations needed for registration dropped by 41% (from 17 to 10 on average); when the final value $T_{final}$ was increased to 10 times its default, the performance degraded while the iterations likewise dropped by 41% (from 17 to 10 on average); when the annealing rate r was halved, the performance degraded slightly while the iterations dropped by 65% (from 17 to only 6). Even with these substantial changes to the annealing parameters, all the experiments still showed very high registration accuracy (errors below 0.0013 with standard deviations within $\pm 0.0015$). These results indicate that the computational cost of our algorithm can be reduced substantially by adjusting the annealing parameters while maintaining high registration accuracy.

    Fig. 10  Relationships between performance and different energy tradeoff annealing parameter settings
    3.6.2   Performance of the Jonker-Volgenant algorithm

    To solve the binary correspondence matrix M by linear assignment, our algorithm uses the Jonker-Volgenant algorithm [32], which has $O(N^{3})$ computational complexity. We tested the performance of the C++ Jonker-Volgenant implementation, called through a Matlab mex function, on a computer with 4 GB RAM and a 2.67 GHz Intel(R) Xeon(R) CPU. Table 4 gives the time needed to solve binary correspondence matrices of different sizes. The Jonker-Volgenant algorithm solves the assignments quickly, which supports fast non-rigid point set registration in our algorithm.

    Table 4  Performance of the Jonker-Volgenant algorithm (The cost matrices were generated by the Matlab rand function.)

    Matrix size | 200   | 500   | 1 000 | 2 000 | 3 000
    Time (s)    | 0.002 | 0.016 | 0.100 | 0.316 | 0.588

    We have presented a non-rigid point set registration algorithm based on mixed features: 1) a global structural feature descriptor based on sum vectors; 2) a local structural feature descriptor based on the summed distances of neighboring points in local regions of the point sets; 3) a mixed-feature energy function with an energy tradeoff annealing schedule, which allows correspondences to be estimated with mixed features. We compared our algorithm with eight representative algorithms; it showed the best registration results in the vast majority of deformation and rotation cases.

    Acknowledgements: We thank Chui Hai-Li, Rangarajan Anand, Myronenko Andriy, Song Xu-Bo, Jian Bing, Vemuri Baba, Zhou Feng, De la Torre Fernando, Leordeanu Marius, Torresani Lorenzo and Caetano Tiberio for providing their source code and test data, which greatly facilitated the comparative experiments. The Matlab source code of our algorithm is freely available for academic research.
  • Fig. 1  General framework of vision-based object detection and tracking

    Fig. 2  Flow chart of object detection based on background modeling

    Fig. 3  Flow chart of object detection based on object modeling

    Fig. 4  Restricted Boltzmann machine

    Fig. 5  Feature representation based on auto-encoder

    Fig. 6  Single-layer convolutional neural network

    Fig. 7  Feature representation based on single-layer CNN

    Fig. 8  Flow chart of moving object tracking

    Table 1  Applications of vision-based object detection and tracking

    Application area | Specific applications
    Intelligent surveillance | Public security monitoring (crime prevention, crowd density estimation), parking lots, supermarkets, department stores, vending machines, ATMs, residential communities (visitor access control), traffic scenes, home environments (care of children and the elderly), etc.
    Virtual reality | Interactive virtual worlds, game control, virtual studios, character animation, teleconferencing, etc.
    Advanced human-computer interaction | Sign language translation, gesture-based control, information delivery in high-noise environments (airports, factories, etc.), etc.
    Motion analysis | Content-based sports video retrieval, personalized training for golf and tennis, choreography of dance, clinical studies of orthopedic patients, etc.
    Autonomous navigation | Vehicle navigation, robot navigation, navigation of space probes, etc.
    Robot vision | Industrial robots, home service robots, restaurant service robots, space probes, etc.

    Table 2  Related surveys about object detection and tracking

    Ref. | Title | Main content | Topic | Year | Limitations
    [8] | Vision based hand gesture recognition for human computer interaction: a survey | Reviews hand gesture recognition in terms of detection, tracking and recognition | Detection, tracking, recognition | 2015 | Covers only specific application directions
    [9] | A survey on recent object detection techniques useful for monocular vision-based planetary terrain classification | Summarizes object detection techniques for planetary terrain classification | Object detection | 2014 |
    [10] | Sparse coding based visual tracking: review and experimental comparison | Comprehensively reviews sparse-coding-based tracking, with experimental comparison and analysis | Appearance modeling | 2013 | Discusses only components of detection and tracking
    [11] | A survey of appearance models in visual object tracking | Discusses visual representation in tracking from the perspective of global and local information description | Appearance modeling | 2013 |
    [12] | Recent advances of sparse representation for object detection | Reviews important domestic and international progress of sparse representation in object detection | Appearance modeling | 2015 |
    [13] | Background subtraction techniques: a review | Summarizes several common background subtraction methods | Background modeling | 2004 |
    [14] | Traditional and recent approaches in background modeling for foreground detection: an overview | Discusses background modeling for object detection in detail | Background modeling | 2014 |
    [15] | Visual tracking: an experimental survey | Compares and evaluates 19 state-of-the-art trackers on 315 video sequences | Object tracking | 2014 |
    [16] | Automated human behavior analysis from surveillance videos: a survey | Elaborates detection, classification and tracking as the low-level processing of human behavior understanding | Human behavior understanding | 2014 | Does not discuss detection and tracking in depth
    [17] | Intelligent visual surveillance: a review | Discusses detection and tracking as the low-level part of intelligent video surveillance | Intelligent surveillance | 2015 |
    [18] | Object tracking: a survey | Categorizes object representation and the choice of features and motion models in tracking | Object tracking | 2006 | Published long ago; continually updated theories and methods need to be surveyed
    [19] | A survey of visual tracking | Categorizes visual tracking and discusses its applications in video surveillance, image compression and 3D reconstruction | Object tracking | 2006 |
    [20] | The methods for moving object detection | Discusses, by category, the mainstream moving object detection methods up to 2007 | Object detection | 2006 |
    [21] | Survey of moving object tracking algorithm | Divides moving object tracking into motion detection and target tracking, and surveys tracking algorithms | Object tracking | 2009 |
    [22] | Research on detection and tracking identification algorithm of weak moving target | Discusses detection and tracking of weak moving targets against strong noise backgrounds | Detection and tracking | 2010 |

    Table 3  Human-engineering-based feature representation methods

    No. | Ref. | Typical method | Main idea | Year | Category
    1 | [4] | SIFT | Describes the moving object by gradient information around specific keypoints; rotation- and scale-invariant; improved variants include PCA-SIFT [49], GLOH [50], SURF [51], DAISY [52] | 2004 | Gradient features
    2 | [5] | HOG | Describes the moving object by the gradient magnitudes and orientations over spatial regions; improved variants include v-HOG [53], CoHOG [54], GIST [55] | 2005 | Gradient features
    3 | [56] | Gabor | Obtained by convolving the image with Gabor filters; to some extent simulates the receptive-field mechanism of cells in human vision | 1997 | Pattern features
    4 | [57] | LBP | An illumination-invariant local descriptor computed from the contrast between a pixel and its neighbors; improved variants include CS-LBP [58], NR-LBP [59] | 2004 | Pattern features
    5 | [60] | Haar-like | Describes line, edge, center and diagonal features by differences of pixel sums over adjacent rectangles; improved variants include LAB [66] | 2001 | Pattern features
    6 | [6] | DPM | Essentially a deformable shape model: an object shape model trained by combining HOG features with a latent SVM | 2010 | Shape features
    7 | [69] | Shape context | Describes the object contour by the distance distribution between a reference point and the remaining points on the shape | 2002 | Shape features
    8 | [71] | kAS | Describes the object shape by a group of roughly linear segments; translation- and scale-invariant | 2008 | Shape features
    9 | [77] | Color names | Describes the object by mapping image pixel values to semantic attributes (usually 11); generally used together with gradient features | 2009 | Color features
    10 | [88] | Entropy-based saliency | Obtains regions of interest by computing the gray-level probability distribution of image pixels | 2004 | Color features

    Table 4  Learning-based feature representation methods

    Category | Methods
    Deep-learning-based feature representation | CDBN [102], SBM [104], DeCAF [112], R-CNN [113], SPPNet [114], Fast R-CNN [115], Faster R-CNN [116], segDeepM [117], MatchNet [118], OverFeat [121], NIN [122], GoogLeNet [123], VGGNet [124], DeepID-Net [125], VoxNet [126], SuperCNN [127], MDNet [128], DeepSRDCF [129], SODLT [130]

    Table 5  Typical data sets for object detection

    No. | Ref. | Data set | Scale | Annotated | Description | Homepage | Released
    1 | [139] | MIT CBCL Pedestrian Database | 924 images, 64 × 128, PPM format | Yes | Pedestrians centered in the image; viewpoint limited to frontal or rear | http://cbcl.mit.edu/software-datasets/PedestrianData.html | 2000
    2 | [140-141] | USC Pedestrian Detection Test Set | 359 images, 816 persons | Yes | Unoccluded and partially occluded pedestrians in a single view, and unoccluded pedestrians in multiple views | http://iris.usc.edu/Vision-Users/OldUsers/bowu/DatasetWebpage/dataset.html | 2005/2007
    3 | [5] | INRIA Person Dataset | 1 805 images, 64 × 128 | Yes | A wide variety of backgrounds; no particular requirements on pedestrian pose | http://pascal.inrialpes.fr/data/human/ | 2005
    4 | [45, 142] | ChangeDetection.Net | 51 videos, about 140 000 frames | Yes | Challenges including dynamic background, object motion, night scenes and shadows | http://changedetection.net/ | 2012/2014
    5 | [143] | Caltech Pedestrian Dataset | 10 hours of video, 640 × 480 | Yes | Captured while driving in urban traffic; some occlusion among pedestrians | http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/ | 2009
    6 | [144] | CVC Datasets | 9 data sets | Partially | Various scenarios such as urban and infrared; partial occlusion among pedestrians | http://www.cvc.uab.es/adas/site/?q=node/7 | 2007/2010/2013-2015
    7 | [119] | PASCAL VOC Datasets | 11 540 images, 20 classes | Yes | Tasks include classification, detection, segmentation, action classification and person layout | http://host.robots.ox.ac.uk/pascal/VOC/ | 2005-2012
    8 | [120] | ImageNet | 14 197 122 images | Yes | Large-scale recognition challenge, including detection, localization and scene classification tasks | http://image-net.org/ | 2010-2015
    9 | [145] | Microsoft COCO | About 328 000 images, 91 classes | Yes | Classification, detection and scene understanding in natural scenes; annotates not only categories but also object instances within classes | http://mscoco.org/ | 2014

    Table 6  Typical data sets for object tracking

    No. | Ref. | Data set | Scale | Annotated | Description | Homepage | Released
    1 | [209-210] | Visual Tracker Benchmark | 100 sequences | Yes | Collected from the literature; nine challenges including illumination and scale variation, occlusion and deformation | http://www.visual-tracking.net | 2013
    2 | [211] | VIVID | 9 sequences | Yes | Vehicle tracking from aerial views; targets are small and similar in appearance | http://vision.cse.psu.edu/data/vividEval/datasets/datasets.html | 2005
    3 | [212] | CAVIAR | 28 sequences | Yes | Mainly for human tracking; includes walking, meeting, and entering/leaving scenes | http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/ | 2003/2004
    4 | [213] | BIWI Walking Pedestrians Dataset | 1 sequence | Yes | Pedestrian tracking from a bird's-eye view; usable for evaluating multi-target tracking | http://www.vision.ee.ethz.ch/datasets/ | 2009
    5 | [214] | "Central" Pedestrian Crossing Sequences | 3 sequences | Yes | Pedestrian crossing sequences, annotated every fourth frame | http://www.vision.ee.ethz.ch/datasets/ | 2007
    6 | [215] | MOT16 | 14 sequences | Yes | Multi-target tracking in unconstrained environments; challenges include varying viewpoints, camera motion and weather | http://motchallenge.net/ | 2016
    7 | [216] | PETS2015 | 7 sequences | Yes | Different activities around vehicles in parking lots; for detection and tracking, action recognition and scene analysis | http://www.pets2015.net/ | 2015
    8 | [217] | VOT Challenge | 60 sequences (2015) | Yes | Benchmark for short-term tracking; held annually since 2013 | http://votchallenge.net/ | 2013-2015

    Table 7  Performance comparison of typical tracking algorithms

    No. | Ref.  | Tracker    | Accuracy | Mean failures | Mean overlap | Speed (EFO) | Year | Category
    1   | [128] | MDNet      | 0.60 | 0.69 | 0.38 | 0.87   | 2015 | CNN
    2   | [129] | DeepSRDCF  | 0.56 | 1.05 | 0.32 | 0.38   | 2015 | CNN
    3   | [130] | SODLT      | 0.56 | 1.78 | 0.23 | 0.83   | 2015 | CNN
    4   | [218] | SumShift   | 0.52 | 1.68 | 0.23 | 16.78  | 2011 | Kernel learning
    5   | [219] | ASMS       | 0.51 | 1.85 | 0.21 | 115.09 | 2013 | Kernel learning
    6   | [217] | S3Tracker  | 0.52 | 1.77 | 0.24 | 14.27  | 2015 | Kernel learning
    7   | [161] | IVT        | 0.44 | 4.33 | 0.12 | 8.38   | 2008 | Subspace learning
    8   | [220] | CT         | 0.39 | 4.09 | 0.11 | 12.90  | 2012 | Subspace learning
    9   | [221] | L1APG      | 0.47 | 4.65 | 0.13 | 1.51   | 2012 | Sparse representation
    10  | [222] | OAB        | 0.45 | 4.19 | 0.13 | 8.00   | 2014 | Online boosting
    11  | [223] | MCT        | 0.47 | 1.76 | 0.22 | 2.77   | 2011 | Online boosting
    12  | [224] | CMIL       | 0.43 | 2.47 | 0.19 | 5.14   | 2010 | Online boosting
    13  | [225] | Struck     | 0.47 | 1.61 | 0.25 | 2.44   | 2014 | SVM
    14  | [217] | RobStruck  | 0.48 | 1.47 | 0.22 | 1.89   | 2015 | SVM
    15  | [226] | MIL        | 0.42 | 3.11 | 0.17 | 5.99   | 2011 | Stochastic learning
  • [1] Harold W A. Aircraft warning system, U. S. Patent 3053932, September 1962
    [2] Papageorgiou C P, Oren M, Poggio T. A general framework for object detection. In:Proceedings of the 6th IEEE International Conference on Computer Vision. Bombay, India:IEEE, 1998. 555-562
    [3] Viola P, Jones M J. Robust real-time object detection. International Journal of Computer Vision, 2001, 4:51-52
    [4] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2):91-110 doi: 10.1023/B:VISI.0000029664.99615.94
    [5] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In:Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA:IEEE, 2005. 886-893
    [6] Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9):1627-1645 doi: 10.1109/TPAMI.2009.167
    [7] Everingham M, Van Gool L, Williams C K I, Winn J, Zisserman A. The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2):303-338 doi: 10.1007/s11263-009-0275-4
    [8] Rautaray S S, Agrawal A. Vision based hand gesture recognition for human computer interaction:a survey. Artificial Intelligence Review, 2015, 43(1):1-54 doi: 10.1007/s10462-012-9356-9
    [9] Gao Y, Spiteri C, Pham M T, Al-Milli S. A survey on recent object detection techniques useful for monocular vision-based planetary terrain classification. Robotics and Autonomous Systems, 2014, 62(2):151-167 doi: 10.1016/j.robot.2013.11.003
    [10] Zhang S P, Yao H X, Sun X, Lu X S. Sparse coding based visual tracking:review and experimental comparison. Pattern Recognition, 2013, 46(7):1772-1788 doi: 10.1016/j.patcog.2012.10.006
    [11] Li X, Hu W M, Shen C H, Zhang Z F, Dick A, van den Hengel A. A survey of appearance models in visual object tracking. ACM Transactions on Intelligent Systems and Technology (TIST), 2013, 4(4):Article No. 58
    [12] Gao Shi-Bo, Cheng Yong-Mei, Xiao Li-Ping, Wei Hai-Ping. Recent advances of sparse representation for object detection. Acta Electronica Sinica, 2015, 43(2):320-332 (in Chinese)
    [13] Piccardi M. Background subtraction techniques:a review. In:Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics. The Hague, Holland:IEEE, 2004. 3099-3104
    [14] Bouwmans T. Traditional and recent approaches in background modeling for foreground detection:an overview. Computer Science Review, 2014, 11-12:31-66 doi: 10.1016/j.cosrev.2014.04.001
    [15] Smeulders A W M, Chu D M, Cucchiara R, Calderara S, Dehghan A, Shah M. Visual tracking:an experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7):1442-1468 doi: 10.1109/TPAMI.2013.230
    [16] Gowsikhaa D, Abirami S, Baskaran R. Automated human behavior analysis from surveillance videos:a survey. Artificial Intelligence Review, 2014, 42(4):747-765 doi: 10.1007/s10462-012-9341-3
    [17] Huang Kai-Qi, Chen Xiao-Tang, Kang Yun-Feng, Tan Tie-Niu. Intelligent visual surveillance:a review. Chinese Journal of Computers, 2015, 38(6):1093-1118 (in Chinese)
    [18] Yilmaz A, Javed O, Shah M. Object tracking:a survey. ACM Computing Surveys (CSUR), 2006, 38(4):Article No. 13 doi: 10.1145/1177352
    [19] Hou Zhi-Qiang, Han Chong-Zhao. A survey of visual tracking. Acta Automatica Sinica, 2006, 32(4):603-617 (in Chinese)
    [20] Wan Ying, Han Yi, Lu Han-Qing. The methods for moving object detection. Computer Simulation, 2006, 23(10):221-226 (in Chinese)
    [21] Zhang Juan, Mao Xiao-Bo, Chen Tie-Jun. Survey of moving object tracking algorithm. Application Research of Computers, 2009, 26(12):4407-4410 (in Chinese)
    [22] Niu Xiang-Jie, Huang Yong-Chun. Research on detection and tracking identification algorithm of weak moving target. Computer Simulation, 2010, 27(4):245-247 (in Chinese)
    [23] Gutchess D, Trajkovics M, Cohen-Solal E, Lyons D, Jain A K. A background model initialization algorithm for video surveillance. In:Proceedings of the 8th IEEE International Conference on Computer Vision. Vancouver, BC, Canada:IEEE, 2001. 733-740
    [24] Wang H Z, Suter D. A novel robust statistical method for background initialization and visual surveillance. In:Proceedings of the 7th Asian Conference on Computer Vision (ACCV 2006). Hyderabad, India:Springer, 2006. 328-337
    [25] Colombari A, Fusiello A. Patch-based background initialization in heavily cluttered video. IEEE Transactions on Image Processing, 2010, 19(4):926-933 doi: 10.1109/TIP.2009.2038652
    [26] Lee B, Hedley M. Background estimation for video surveillance. In:Proceedings of the Image and Vision Computing New Zealand. Auckland, New Zealand, 2002. 315-320
    [27] McFarlane N J B, Schofield C P. Segmentation and tracking of piglets in images. Machine Vision and Applications, 1995, 8(3):187-193 doi: 10.1007/BF01215814
    [28] Bouwmans T, El Baf F, Vachon B. Statistical background modeling for foreground detection:a survey. Handbook of Pattern Recognition and Computer Vision. Singapore:World Scientific Publishing, 2010. 181-189
    [29] Bouwmans T. Recent advanced statistical background modeling for foreground detection:a systematic survey. Recent Patents on Computer Science, 2011, 4(3):147-176
    [30] Butler D E, Bove V M Jr, Sridharan S. Real-time adaptive foreground/background segmentation. EURASIP Journal on Advances in Signal Processing, 2005, 2005:2292-2304 doi: 10.1155/ASP.2005.2292
    [31] Kim K, Chalidabhongse T H, Harwood D, Davis L. Background modeling and subtraction by codebook construction. In:Proceedings of the 2004 IEEE International Conference on Image Processing. Singapore:IEEE, 2004. 3061-3064
    [32] Palomo E J, Domínguez E, Luque R M, Muñoz J. Image hierarchical segmentation based on a GHSOM. In:Proceedings of the 16th International Conference on Neural Information Processing. Bangkok, Thailand:Springer, 2009. 743-750
    [33] De Gregorio M, Giordano M. Background modeling by weightless neural networks. In:Proceedings of the 2015 Workshops on New Trends in Image Analysis and Processing (ICIAP 2015). Genoa, Italy:Springer, 2015. 493-501
    [34] Toyama K, Krumm J, Brumitt B, Meyers B. Wallflower:principles and practice of background maintenance. In:Proceedings of the 7th IEEE International Conference on Computer Vision. Kerkyra, Greece:IEEE, 1999. 255-261
    [35] Ridder C, Munkelt O, Kirchner H. Adaptive background estimation and foreground detection using Kalman-filtering. In:Proceedings of the 1995 International Conference on Recent Advances in Mechatronics. Istanbul, Turkey:Boğaziçi University, 1995. 193-199
    [36] Kim W, Kim C. Background subtraction for dynamic texture scenes using fuzzy color histograms. IEEE Signal Processing Letters, 2012, 19(3):127-130 doi: 10.1109/LSP.2011.2182648
    [37] Bouwmans T, Zahzah E H. Robust PCA via principal component pursuit:a review for a comparative evaluation in video surveillance. Computer Vision and Image Understanding, 2014, 122:22-34 doi: 10.1016/j.cviu.2013.11.009
    [38] Cevher V, Sankaranarayanan A, Duarte M F, Reddy D, Baraniuk R G, Chellappa R. Compressive sensing for background subtraction. In:Proceedings of the 10th European Conference on Computer Vision (ECCV 2008). Marseille, France:Springer, 2008. 155-168
    [39] Wren C R, Porikli F. Waviz:spectral similarity for object detection. In:Proceedings of the 2005 IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Breckenridge, Colorado, USA:IEEE, 2005. 55-61
    [40] Baltieri D, Vezzani R, Cucchiara R. Fast background initialization with recursive Hadamard transform. In:Proceedings of the 7th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Boston, USA:IEEE, 2010. 165-171
    [41] Bouwmans T, El Baf F, Vachon B. Background modeling using mixture of gaussians for foreground detection——a survey. Recent Patents on Computer Science, 2008, 1(3):219-237 doi: 10.2174/2213275910801030219
    [42] Lin H H, Liu T L, Chuang J H. A probabilistic SVM approach for background scene initialization. In:Proceedings of the 2002 International Conference on Image Processing. Rochester, New York, USA:IEEE, 2002. 893-896
    [43] Maddalena L, Petrosino A. The SOBS algorithm:what are the limits? In:Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Providence, RI, USA:IEEE, 2012. 21-26
    [44] Maddalena L, Petrosino A. The 3dSOBS+ algorithm for moving object detection. Computer Vision and Image Understanding, 2014, 122:65-73 doi: 10.1016/j.cviu.2013.11.006
    [45] Goyette N, Jodoin P M, Porikli F, Konrad J, Ishwar P. Changedetection.net:a new change detection benchmark dataset. In:Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Providence, RI, USA:IEEE, 2012. 1-8
    [46] Barnich O, Van Droogenbroeck M. ViBe:a powerful random technique to estimate the background in video sequences. In:Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Taipei, China:IEEE, 2009. 945-948 http://cn.bing.com/academic/profile?id=2115352131&encoded=0&v=paper_preview&mkt=zh-cn
    [47] Hofmann M, Tiefenbacher P, Rigoll G. Background segmentation with feedback:the pixel-based adaptive segmenter. In:Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Providence, RI, USA:IEEE, 2012. 38-43
    [48] Sobral A, Bouwmans T. BGS Library:A Library Framework for Algorithm's Evaluation in Foreground/Background Segmentation. London:CRC Press, 2014.
    [49] Ke Y, Sukthankar R. PCA-SIFT:a more distinctive representation for local image descriptors. In:Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, D.C., USA:IEEE, 2004. Ⅱ-506-Ⅱ-513
    [50] Mikolajczyk K, Schmid C. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(10):1615-1630 doi: 10.1109/TPAMI.2005.188
    [51] Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 2008, 110(3):346-359 doi: 10.1016/j.cviu.2007.09.014
    [52] Tola E, Lepetit V, Fua P. Daisy:an efficient dense descriptor applied to wide-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(5):815-830 doi: 10.1109/TPAMI.2009.77
    [53] Zhu Q, Yeh M C, Cheng K T, Avidan S. Fast human detection using a cascade of histograms of oriented gradients. In:Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA:IEEE, 2006. 1491-1498
    [54] Watanabe T, Ito S, Yokoi K. Co-occurrence histograms of oriented gradients for human detection. Information and Media Technologies, 2010, 5(2):659-667
    [55] Torralba A, Oliva A, Castelhano M S, Henderson J M. Contextual guidance of eye movements and attention in real-world scenes:the role of global features in object search. Psychological Review, 2006, 113(4):766-786 doi: 10.1037/0033-295X.113.4.766
    [56] Jain A K, Ratha N K, Lakshmanan S. Object detection using Gabor filters. Pattern Recognition, 1997, 30(2):295-309 doi: 10.1016/S0031-3203(96)00068-4
    [57] Ahonen T, Hadid A, Pietikäinen M. Face recognition with local binary patterns. In:Proceedings of the 8th European Conference on Computer Vision (ECCV 2004). Prague, Czech Republic:Springer, 2004. 469-481
    [58] Heikkilä M, Pietikäinen M, Schmid C. Description of interest regions with local binary patterns. Pattern Recognition, 2009, 42(3):425-436 doi: 10.1016/j.patcog.2008.08.014
    [59] Nguyen D T, Ogunbona P O, Li W Q. A novel shape-based non-redundant local binary pattern descriptor for object detection. Pattern Recognition, 2013, 46(5):1485-1500 doi: 10.1016/j.patcog.2012.10.024
    [60] Viola P, Jones M. Robust Real-time Object Detection, Technical Report CRL-2001-1, Cambridge Research Laboratory, University of Cambridge, United Kingdom, 2001
    [61] Wu J X, Rehg J M. CENTRIST:a visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8):1489-1501 doi: 10.1109/TPAMI.2010.224
    [62] Bourdev L, Malik J. Poselets:body part detectors trained using 3D human pose annotations. In:Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 1365-1372
    [63] Girshick R, Song H O, Darrell T. Discriminatively activated sparselets. In:Proceedings of the 30th International Conference on Machine Learning (ICML-13). Atlanta, GA, USA:ACM, 2013. 196-204
    [64] Kokkinos I. Shufflets:shared mid-level parts for fast object detection. In:Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV). Sydney, Australia:IEEE, 2013. 1393-1400
    [65] Wang X Y, Yang M, Zhu S H, Lin Y Q. Regionlets for generic object detection. In:Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV). Sydney, Australia:IEEE, 2013. 17-24
    [66] Yan S Y, Shan S G, Chen X L, Gao W. Locally assembled binary (LAB) feature with feature-centric cascade for fast and accurate face detection. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Anchorage, Alaska, USA:IEEE, 2008. 1-7
    [67] Arman F, Aggarwal J K. Model-based object recognition in dense-range images——a review. ACM Computing Surveys (CSUR), 1993, 25(1):5-43 doi: 10.1145/151254.151255
    [68] Yang M Q, Kpalma K, Ronsin J. A survey of shape feature extraction techniques. Pattern Recognition. IN-TECH, 2008. 43-90
    [69] Belongie S, Malik J, Puzicha J. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(4):509-522 doi: 10.1109/34.993558
    [70] Kontschieder P, Riemenschneider H, Donoser M, Bischof H. Discriminative learning of contour fragments for object detection. In:Proceedings of the 2011 British Machine Vision Conference. Dundee, Scotland:British Machine Vision Association, 2011. 4.1-4.12
    [71] Ferrari V, Fevrier L, Jurie F, Schmid C. Groups of adjacent contour segments for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(1):36-51 doi: 10.1109/TPAMI.2007.1144
    [72] Chia A Y S, Rahardja S, Rajan D, Leung M K. Object recognition by discriminative combinations of line segments and ellipses. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA:IEEE, 2010. 2225-2232
    [73] Tombari F, Franchi A, Di L. BOLD features to detect texture-less objects. In:Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV). Sydney, Australia:IEEE, 2013. 1265-1272
    [74] Jurie F, Schmid C. Scale-invariant shape features for recognition of object categories. In:Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, D. C., USA:IEEE, 2004. Ⅱ-90-Ⅱ-96
    [75] Dhankhar P, Sahu N. A review and research of edge detection techniques for image segmentation. International Journal of Computer Science and Mobile Computing (IJCSMC), 2013, 2(7):86-92
    [76] Rassem T H, Khoo B E. Object class recognition using combination of color SIFT descriptors. In:Proceedings of the 2011 IEEE International Conference on Imaging Systems and Techniques (IST). Penang, Malaysia:IEEE, 2011. 290-295
    [77] Van De Weijer J, Schmid C, Verbeek J, Larlus D. Learning color names for real-world applications. IEEE Transactions on Image Processing, 2009, 18(7):1512-1523 doi: 10.1109/TIP.2009.2019809
    [78] Vadivel A, Sural S, Majumdar A K. An integrated color and intensity co-occurrence matrix. Pattern Recognition Letters, 2007, 28(8):974-983 doi: 10.1016/j.patrec.2007.01.004
    [79] Walk S, Majer N, Schindler K, Schiele B. New features and insights for pedestrian detection. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA:IEEE, 2010. 1030-1037
    [80] Shechtman E, Irani M. Matching local self-similarities across images and videos. In:Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, Minnesota, USA:IEEE, 2007. 1-8
    [81] Deselaers T, Ferrari V. Global and efficient self-similarity for object classification and detection. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA:IEEE, 2010. 1633-1640
    [82] Tuzel O, Porikli F, Meer P. Human detection via classification on Riemannian manifolds. In:Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, Minnesota, USA:IEEE, 2007. 1-8
    [83] Burghouts G J, Geusebroek J M. Performance evaluation of local colour invariants. Computer Vision and Image Understanding, 2009, 113(1):48-62 doi: 10.1016/j.cviu.2008.07.003
    [84] Bosch A, Zisserman A, Muñoz X. Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(4):712-727 doi: 10.1109/TPAMI.2007.70716
    [85] Van De Weijer J, Schmid C. Coloring local feature extraction. In:Proceedings of the 9th European Conference on Computer Vision (ECCV 2006). Graz, Austria:Springer, 2006. 334-348
    [86] Khan F S, Anwer R M, van de Weijer J, Bagdanov A D, Vanrell M, Lopez A M. Color attributes for object detection. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA:IEEE, 2012. 3306-3313
    [87] Danelljan M, Khan F S, Felsberg M, van de Weijer J. Adaptive color attributes for real-time visual tracking. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH, USA:IEEE, 2014. 1090-1097
    [88] Kadir T, Zisserman A, Brady M. An affine invariant salient region detector. In:Proceedings of the 8th European Conference on Computer Vision (ECCV 2004). Prague, Czech Republic:Springer, 2004. 228-241
    [89] Lee T S, Mumford D, Romero R, Lamme V A F. The role of the primary visual cortex in higher level vision. Vision Research, 1998, 38(15-16):2429-2454 doi: 10.1016/S0042-6989(97)00464-1
    [90] Lee T S, Mumford D. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, 2003, 20(7):1434-1448 doi: 10.1364/JOSAA.20.001434
    [91] Jia Y Q, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe:convolutional architecture for fast feature embedding. In:Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, Florida, USA:ACM, 2014. 675-678
    [92] Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Ranzato M, Senior A, Tucker P, Yang K, Le Q V, Ng A Y. Large scale distributed deep networks. In:Proceedings of the 2012 Advances in Neural Information Processing Systems 25. Lake Tahoe, Nevada, USA:MIT Press, 2012. 1223-1231
    [93] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z F, Citro C, Corrado G S, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y Q, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X Q. TensorFlow:large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467, 2016.
    [94] Collobert R, Kavukcuoglu K, Farabet C. Torch7:a Matlab-like environment for machine learning. In:Proceedings of Annual Conference on Neural Information Processing Systems. Granada, Spain:MIT Press, 2011.
    [95] Krizhevsky A. CUDA-convnet:high-performance C++/CUDA implementation of convolutional neural networks[Online], available:http://code.google.com/p/cuda-convnet/, August 6, 2016
    [96] Vedaldi A, Lenc K. MatConvNet-convolutional neural networks for MATLAB. arXiv:1412.4564, 2014.
    [97] Goodfellow I J, Warde-Farley D, Lamblin P, Dumoulin V, Mirza M, Pascanu R, Bergstra J, Bastien F, Bengio Y. Pylearn2:a machine learning research library. arXiv:1308.4214, 2013.
    [98] The Theano Development Team. Theano:a Python framework for fast computation of mathematical expressions. arXiv:1605.02688, 2016.
    [99] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7):1527-1554 doi: 10.1162/neco.2006.18.7.1527
    [100] Hinton G E, Zemel R S. Autoencoders, minimum description length and Helmholtz free energy. In:Proceedings of the 1993 Advances in Neural Information Processing Systems 6. Cambridge, MA:MIT Press, 1993. 3-10
    [101] Lécun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11):2278-2324 doi: 10.1109/5.726791
    [102] Lee H, Grosse R, Ranganath R, Ng A Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In:Proceedings of the 26th Annual International Conference on Machine Learning. Montréal, Canada:ACM, 2009. 609-616
    [103] Nair V, Hinton G E. 3D object recognition with deep belief nets. In:Proceedings of the 2009 Advances in Neural Information Processing Systems 22. Vancouver, B.C., Canada:MIT Press, 2009. 1339-1347
    [104] Eslami S M A, Heess N, Williams C K I, Winn J. The shape Boltzmann machine:a strong model of object shape. International Journal of Computer Vision, 2014, 107(2):155-176 doi: 10.1007/s11263-013-0669-1
    [105] Salakhutdinov R, Hinton G. Deep Boltzmann machines. In:Proceedings of the 12th International Conference on Artificial Intelligence and Statistics. Clearwater Beach, Florida, USA:ACM, 2009. 448-455
    [106] Zheng Yin, Chen Quan-Qi, Zhang Yu-Jin. Deep learning and its new progress in object and behavior recognition. Journal of Image and Graphics, 2014, 19(2):175-184 (in Chinese)
    [107] Xiong M F, Chen J, Wang Z, Liang C, Zheng Q, Han Z, Sun K M. Deep feature representation via multiple stack auto-encoders. In:Proceedings of the 16th Pacific-Rim Conference on Advances in Multimedia Information Processing (PCM 2015). Gwangju, South Korea:Springer, 2015. 275-284
    [108] Yin H P, Jiao X G, Chai Y, Fang B. Scene classification based on single-layer SAE and SVM. Expert Systems with Applications, 2015, 42(7):3368-3380 doi: 10.1016/j.eswa.2014.11.069
    [109] Bai J, Wu Y, Zhang J M, Chen F Q. Subset based deep learning for RGB-D object recognition. Neurocomputing, 2015, 165:280-292 doi: 10.1016/j.neucom.2015.03.017
    [110] Su S Z, Liu Z H, Xu S P, Li S Z, Ji R R. Sparse auto-encoder based feature learning for human body detection in depth image. Signal Processing, 2015, 112:43-52 doi: 10.1016/j.sigpro.2014.11.003
    [111] Hubel D H, Wiesel T N. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 1962, 160(1):106-154 doi: 10.1113/jphysiol.1962.sp006837
    [112] Donahue J, Jia Y Q, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T. DeCAF:a deep convolutional activation feature for generic visual recognition. arXiv:1310.1531, 2013.
    [113] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, OH, USA:IEEE, 2014. 580-587
    [114] He K M, Zhang X Y, Ren S Q, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916 doi: 10.1109/TPAMI.2015.2389824
    [115] Girshick R. Fast R-CNN. arXiv:1504.08083, 2015.
    [116] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN:towards real-time object detection with region proposal networks. arXiv:1506.01497, 2015.
    [117] Zhu Y K, Urtasun R, Salakhutdinov R, Fidler S. SegDeepM:exploiting segmentation and context in deep neural networks for object detection. arXiv:1502.04275, 2015.
    [118] Han X F, Leung T, Jia Y Q, Sukthankar R, Berg A C. MatchNet:unifying feature and metric learning for patch-based matching. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA:IEEE, 2015. 3279-3286
    [119] Everingham M, Eslami S M A, Van Gool L, Williams C K I, Winn J, Zisserman A. The Pascal visual object classes challenge:a retrospective. International Journal of Computer Vision, 2015, 111(1):98-136 doi: 10.1007/s11263-014-0733-5
    [120] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C, Li F F. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3):211-252 doi: 10.1007/s11263-015-0816-y
    [121] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. OverFeat:integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2013.
    [122] Lin M, Chen Q, Yan S C. Network in network. arXiv:1312.4400, 2013.
    [123] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA:IEEE, 2015. 1-9
    [124] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
    [125] Ouyang W L, Luo P, Zeng X Y, Qiu S, Tian Y L, Li H S, Yang S, Wang Z, Xiong Y J, Qian C, Zhu Z Y, Wang R H, Loy C C, Wang X G, Tang X O. DeepID-Net:multi-stage and deformable deep convolutional neural networks for object detection. arXiv:1409.3505, 2014.
    [126] Maturana D, Scherer S. VoxNet:a 3D convolutional neural network for real-time object recognition. In:Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany:IEEE, 2015. 922-928
    [127] He S F, Lau R W H, Liu W X, Huang Z, Yang Q X. SuperCNN:a superpixelwise convolutional neural network for salient object detection. International Journal of Computer Vision, 2015, 115(3):330-344
    [128] Nam H, Han B. Learning multi-domain convolutional neural networks for visual tracking. arXiv:1510.07945, 2015.
    [129] Danelljan M, Häger G, Shahbaz Khan F, Felsberg M. Learning spatially regularized correlation filters for visual tracking. In:Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile:IEEE, 2015. 4310-4318
    [130] Wang N Y, Li S Y, Gupta A, Yeung D Y. Transferring rich feature hierarchies for robust visual tracking. arXiv:1501.04587, 2015.
    [131] Kotsiantis S B, Zaharakis I D, Pintelas P E. Machine learning:a review of classification and combining techniques. Artificial Intelligence Review, 2006, 26(3):159-190 doi: 10.1007/s10462-007-9052-3
    [132] Schölkopf B, Smola A J. Learning with Kernels:Support Vector Machines, Regularization, Optimization, and Beyond. London, England:MIT Press, 2002.
    [133] Lu Z W, Ip H H S. Image categorization with spatial mismatch kernels. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, Florida, USA:IEEE, 2009. 397-404
    [134] Lazebnik S, Schmid C, Ponce J. Beyond bags of features:spatial pyramid matching for recognizing natural scene categories. In:Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA:IEEE, 2006. 2169-2178
    [135] Yang J C, Yu K, Gong Y H, Huang T. Linear spatial pyramid matching using sparse coding for image classification. In:Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, Florida, USA:IEEE, 2009. 1794-1801
    [136] Kavukcuoglu K, Ranzato M A, Fergus R, LeCun Y. Learning invariant features through topographic filter maps. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, Florida, USA:IEEE, 2009. 1605-1612
    [137] Gao S H, Tsang I W H, Chia L T, Zhao P L. Local features are not lonely-Laplacian sparse coding for image classification. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA:IEEE, 2010. 3555-3561
    [138] Meshram S B, Shinde S M. A survey on ensemble methods for high dimensional data classification in biomedicine field. International Journal of Computer Applications, 2015, 111(11):5-7 doi: 10.5120/19580-1162
    [139] Papageorgiou C, Poggio T. A trainable system for object detection. International Journal of Computer Vision, 2000, 38(1):15-33 doi: 10.1023/A:1008162616689
    [140] Wu B, Nevatia R. Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In:Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China:IEEE, 2005. 90-97
    [141] Wu B, Nevatia R. Cluster boosted tree classifier for multi-view, multi-pose object detection. In:Proceedings of the 11th IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil:IEEE, 2007. 1-8
    [142] Wang Y, Jodoin P M, Porikli F, Konrad J, Benezeth Y, Ishwar P. CDnet 2014:an expanded change detection benchmark dataset. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus, OH, USA:IEEE, 2014. 393-400
    [143] Dollár P, Wojek C, Schiele B, Perona P. Pedestrian detection:a benchmark. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, Florida, USA:IEEE, 2009. 304-311
    [144] González A, Vázquez D, Ramos S, López A M, Amores J. Spatiotemporal stacked sequential learning for pedestrian detection. In:Proceedings of the 7th Iberian Conference on Pattern Recognition and Image Analysis. Santiago de Compostela, Spain:Springer, 2015. 3-12 doi: 10.1007/978-3-319-19390-8
    [145] Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L. Microsoft COCO:common objects in context. In:Proceedings of the 13th European Conference on Computer Vision (ECCV 2014). Zurich, Switzerland:Springer, 2014. 740-755
    [146] Seber G A F, Lee A J. Linear Regression Analysis (Second Edition). New York:John Wiley & Sons, 2003.
    [147] Comaniciu D, Ramesh V, Meer P. Real-time tracking of non-rigid objects using mean shift. In:Proceedings of the 2000 IEEE Conference on Computer Vision and Pattern Recognition. Hilton Head, SC, USA:IEEE, 2000. 142-149
    [148] Chen F S, Fu C M, Huang C L. Hand gesture recognition using a real-time tracking method and hidden Markov models. Image and Vision Computing, 2003, 21(8):745-758 doi: 10.1016/S0262-8856(03)00070-2
    [149] Ali N H, Hassan G M. Kalman filter tracking. International Journal of Computer Applications, 2014, 89(9):15-18 doi: 10.5120/15530-4315
    [150] Chang C, Ansari R. Kernel particle filter for visual tracking. IEEE Signal Processing Letters, 2005, 12(3):242-245 doi: 10.1109/LSP.2004.842254
    [151] Comaniciu D, Ramesh V, Meer P. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(5):564-577 doi: 10.1109/TPAMI.2003.1195991
    [152] Rahmati H, Aamo O M, Stavdahl Ø, Adde L. Kernel-based object tracking for cerebral palsy detection. In:Proceedings of the 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV). United States:CSREA Press, 2012. 17-23
    [153] Melzer T, Reiter M, Bischof H. Appearance models based on kernel canonical correlation analysis. Pattern Recognition, 2003, 36(9):1961-1971 doi: 10.1016/S0031-3203(03)00058-X
    [154] Yilmaz A. Object tracking by asymmetric kernel mean shift with automatic scale and orientation selection. In:Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, Minnesota, USA:IEEE, 2007. 1-6
    [155] Hu J S, Juan C W, Wang J J. A spatial-color mean-shift object tracking algorithm with scale and orientation estimation. Pattern Recognition Letters, 2008, 29(16):2165-2173 doi: 10.1016/j.patrec.2008.08.007
    [156] Levey A, Lindenbaum M. Sequential Karhunen-Loeve basis extraction and its application to images. IEEE Transactions on Image Processing, 2000, 9(8):1371-1374 doi: 10.1109/83.855432
    [157] Brand M. Incremental singular value decomposition of uncertain data with missing values. In:Proceedings of the 7th European Conference on Computer Vision (ECCV 2002). Copenhagen, Denmark:Springer, 2002. 707-720
    [158] De La Torre F, Black M J. A framework for robust subspace learning. International Journal of Computer Vision, 2003, 54(1-3):117-142 doi: 10.1023/A:1023709501986
    [159] Li Y M. On incremental and robust subspace learning. Pattern Recognition, 2004, 37(7):1509-1518 doi: 10.1016/j.patcog.2003.11.010
    [160] Skocaj D, Leonardis A. Weighted and robust incremental method for subspace learning. In:Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France:IEEE, 2003. 1494-1501
    [161] Ross D A, Lim J, Lin R S, Yang M H. Incremental learning for robust visual tracking. International Journal of Computer Vision, 2008, 77(1-3):125-141 doi: 10.1007/s11263-007-0075-7
    [162] Wang Q, Chen F, Xu W L, Yang M H. Object tracking via partial least squares analysis. IEEE Transactions on Image Processing, 2012, 21(10):4454-4465 doi: 10.1109/TIP.2012.2205700
    [163] Li X, Hu W M, Zhang Z F, Zhang X Q, Luo G. Robust visual tracking based on incremental tensor subspace learning. In:Proceedings of the 11th IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil:IEEE, 2007. 1-8
    [164] Wen J, Li X L, Gao X B, Tao D C. Incremental learning of weighted tensor subspace for visual tracking. In:Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics. San Antonio, Texas, USA:IEEE, 2009. 3688-3693
    [165] Khan Z H, Gu I Y H. Nonlinear dynamic model for visual object tracking on Grassmann manifolds with partial occlusion handling. IEEE Transactions on Cybernetics, 2013, 43(6):2005-2019 doi: 10.1109/TSMCB.2013.2237900
    [166] Chin T J, Suter D. Incremental kernel principal component analysis. IEEE Transactions on Image Processing, 2007, 16(6):1662-1674 doi: 10.1109/TIP.2007.896668
    [167] Mei X, Ling H B. Robust visual tracking using l1 minimization. In:Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 1436-1443
    [168] Li H X, Shen C H, Shi Q F. Real-time visual tracking using compressive sensing. In:Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA:IEEE, 2011. 1305-1312
    [169] Jia X, Lu H C, Yang M H. Visual tracking via adaptive structural local sparse appearance model. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA:IEEE, 2012. 1822-1829
    [170] Dong W H, Chang F L, Zhao Z J. Visual tracking with multifeature joint sparse representation. Journal of Electronic Imaging, 2015, 24(1):013006 doi: 10.1117/1.JEI.24.1.013006
    [171] Hu W M, Li W, Zhang X Q, Maybank S. Single and multiple object tracking using a multi-feature joint sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(4):816-833 doi: 10.1109/TPAMI.2014.2353628
    [172] Zhang T Z, Liu S, Xu C S, Yan S C, Ghanem B, Ahuja N, Yang M H. Structural sparse tracking. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA:IEEE, 2015. 150-158
    [173] Zhong W, Lu H C, Yang M H. Robust object tracking via sparse collaborative appearance model. IEEE Transactions on Image Processing, 2014, 23(5):2356-2368 doi: 10.1109/TIP.2014.2313227
    [174] Bai T X, Li Y F. Robust visual tracking with structured sparse representation appearance model. Pattern Recognition, 2012, 45(6):2390-2404 doi: 10.1016/j.patcog.2011.12.004
    [175] Zhang S P, Yao H X, Zhou H Y, Sun X, Liu S H. Robust visual tracking based on online learning sparse representation. Neurocomputing, 2013, 100:31-40 doi: 10.1016/j.neucom.2011.11.031
    [176] Wang N Y, Wang J D, Yeung D Y. Online robust nonnegative dictionary learning for visual tracking. In:Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV). Sydney, Australia:IEEE, 2013. 657-664
    [177] Zhang X, Guan N Y, Tao D C, Qiu X G, Luo Z G. Online multi-modal robust non-negative dictionary learning for visual tracking. PLoS One, 2015, 10(5):e0124685
    [178] Oza N C. Online bagging and boosting. In:Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics. Waikoloa, Hawaii, USA:IEEE, 2005. 2340-2345
    [179] Valiant L. Probably Approximately Correct:Nature's Algorithms for Learning and Prospering in a Complex World. New York, USA:Basic Books, 2013.
    [180] Grabner H, Grabner M, Bischof H. Real-time tracking via on-line boosting. In:Proceedings of the 2006 British Machine Vision Conference. Edinburgh, UK:British Machine Vision Association, 2006. 6.1-6.10
    [181] Liu X M, Yu T. Gradient feature selection for online boosting. In:Proceedings of the 11th IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil:IEEE, 2007. 1-8
    [182] Avidan S. Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(2):261-271 doi: 10.1109/TPAMI.2007.35
    [183] Parag T, Porikli F, Elgammal A. Boosting adaptive linear weak classifiers for online learning and tracking. In:Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, Alaska, USA:IEEE, 2008. 1-8
    [184] Visentini I, Snidaro L, Foresti G L. Dynamic ensemble for target tracking. In:Proceedings of the 8th IEEE International Workshop on Visual Surveillance (VS2008). Marseille, France:IEEE, 2008. 1-8
    [185] Okuma K, Taleghani A, De Freitas N, Little J J, Lowe D G. A boosted particle filter:multitarget detection and tracking. In:Proceedings of the 8th European Conference on Computer Vision (ECCV 2004). Prague, Czech Republic:Springer, 2004. 28-39
    [186] Wang J Y, Chen X L, Gao W. Online selecting discriminative tracking features using particle filter. In:Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA:IEEE, 2005. 1037-1042
    [187] Avidan S. Support vector tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(8):1064-1072 doi: 10.1109/TPAMI.2004.53
    [188] Williams O, Blake A, Cipolla R. Sparse Bayesian learning for efficient visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(8):1292-1304 doi: 10.1109/TPAMI.2005.167
    [189] Tian M, Zhang W W, Liu F Q. On-line ensemble SVM for robust object tracking. In:Proceedings of the 8th Asian Conference on Computer Vision (ACCV 2007). Tokyo, Japan:Springer, 2007. 355-364
    [190] Yao R, Shi Q F, Shen C H, Zhang Y N, van den Hengel A. Part-based visual tracking with online latent structural learning. In:Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, OR, USA:IEEE, 2013. 2363-2370
    [191] Bai Y C, Tang M. Robust tracking via weakly supervised ranking SVM. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA:IEEE, 2012. 1854-1861
    [192] Hare S, Saffari A, Torr P H S. Struck:structured output tracking with kernels. In:Proceedings of the 2011 International Conference on Computer Vision (ICCV). Barcelona, Spain:IEEE, 2011. 263-270
    [193] Tang F, Brennan S, Zhao Q, Tao H. Co-tracking using semi-supervised support vector machines. In:Proceedings of the 11th IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil:IEEE, 2007. 1-8
    [194] Zhang S L, Sui Y, Yu X, Zhao S C, Zhang L. Hybrid support vector machines for robust object tracking. Pattern Recognition, 2015, 48(8):2474-2488 doi: 10.1016/j.patcog.2015.02.008
    [195] Zhang X M, Wang M G. Compressive tracking using incremental LS-SVM. In:Proceedings of the 27th Chinese Control and Decision Conference (CCDC). Qingdao, China:IEEE, 2015. 1845-1850
    [196] Breiman L. Random forests. Machine Learning, 2001, 45(1):5-32 doi: 10.1023/A:1010933404324
    [197] Saffari A, Leistner C, Santner J, Godec M, Bischof H. Online random forests. In:Proceedings of the 12th IEEE International Conference on Computer Vision (ICCVW). Kyoto, Japan:IEEE, 2009. 1393-1400
    [198] Leistner C, Saffari A, Bischof H. MIForests:multiple-instance learning with randomized trees. In:Proceedings of the 11th European Conference on Computer Vision (ECCV 2010). Crete, Greece:Springer, 2010. 29-42
    [199] Godec M, Leistner C, Saffari A, Bischof H. On-line random naive Bayes for tracking. In:Proceedings of the 20th International Conference on Pattern Recognition (ICPR). Istanbul, Turkey:IEEE, 2010. 3545-3548
    [200] Wang A P, Wan G W, Cheng Z Q, Li S K. An incremental extremely random forest classifier for online learning and tracking. In:Proceedings of the 16th IEEE International Conference on Image Processing (ICIP). Cairo, Egypt:IEEE, 2009. 1449-1452
    [201] Lin R S, Ross D A, Lim J, Yang M H. Adaptive discriminative generative model and its applications. In:Proceedings of the 2004 Advances in Neural Information Processing Systems 17. Vancouver, British Columbia, Canada:MIT Press, 2004. 801-808
    [202] Nguyen H T, Smeulders A W M. Robust tracking using foreground-background texture discrimination. International Journal of Computer Vision, 2006, 69(3):277-293 doi: 10.1007/s11263-006-7067-x
    [203] Li X, Hu W M, Zhang Z F, Zhang X Q, Zhu M L, Cheng J. Visual tracking via incremental log-Euclidean Riemannian subspace learning. In:Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, Alaska, USA:IEEE, 2008. 1-8
    [204] Wang X Y, Hua G, Han T X. Discriminative tracking by metric learning. In:Proceedings of the 11th European Conference on Computer Vision (ECCV 2010). Heraklion, Crete, Greece:Springer, 2010. 200-214
    [205] Tsagkatakis G, Savakis A. Online distance metric learning for object tracking. IEEE Transactions on Circuits and Systems for Video Technology, 2011, 21(12):1810-1821 doi: 10.1109/TCSVT.2011.2133970
    [206] Xu Z F, Shi P F, Xu X Y. Adaptive subclass discriminant analysis color space learning for visual tracking. In:Proceedings of the 9th Pacific Rim Conference on Advances in Multimedia Information Processing (PCM 2008). Tainan, China:Springer, 2008. 902-905
    [207] Zhang X Q, Hu W M, Chen S Y, Maybank S. Graph-embedding-based learning for robust object tracking. IEEE Transactions on Industrial Electronics, 2014, 61(2):1072-1084 doi: 10.1109/TIE.2013.2258306
    [208] Zha Yu-Fei, Bi Du-Yan, Yang Yuan, Dong Shou-Ping, Luo Ning. Transductive learning with global and local constraints for robust visual tracking. Acta Automatica Sinica, 2010, 36(8):1084-1090 (in Chinese) doi: 10.3724/SP.J.1004.2010.01084
    [209] Wu Y, Lim J, Yang M H. Online object tracking:a benchmark. In:Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA:IEEE, 2013. 2411-2418
    [210] Wu Y, Lim J, Yang M H. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1834-1848 doi: 10.1109/TPAMI.2014.2388226
    [211] Collins R, Zhou X H, Teh S K. An open source tracking testbed and evaluation web site. In:Proceedings of IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Beijing, China:IEEE, 2005.
    [212] Fisher R B. The PETS04 surveillance ground-truth data sets. In:Proceedings of the 6th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance. Prague, Czech Republic:IEEE, 2004. 1-5
    [213] Pellegrini S, Ess A, Schindler K, van Gool L. You'll never walk alone:modeling social behavior for multi-target tracking. In:Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 261-268
    [214] Leibe B, Schindler K, Van Gool L. Coupled detection and trajectory estimation for multi-object tracking. In:Proceedings of the 11th IEEE International Conference on Computer Vision. Rio de Janeiro, Brazil:IEEE, 2007. 1-8
    [215] Milan A, Leal-Taixe L, Reid I, Roth S, Schindler K. MOT16:a benchmark for multi-object tracking. arXiv:1603.00831, 2016.
    [216] Li L Z, Nawaz T, Ferryman J. PETS 2015:datasets and challenge. In:Proceedings of the 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Karlsruhe, Germany:IEEE, 2015. 1-6
    [217] Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G, Pflugfelder R, Gupta A, Bibi A, Lukezic A, Garcia-Martin A, Saffari A, Petrosino A, Montero A S. The visual object tracking VOT 2015 challenge results. In:Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Santiago, Chile:IEEE, 2015. 564-586
    [218] Lee J Y, Yu W. Visual tracking by partition-based histogram backprojection and maximum support criteria. In:Proceedings of the 2011 IEEE International Conference on Robotics and Biomimetics (ROBIO). Karon Beach, Thailand:IEEE, 2011. 2860-2865
    [219] Vojir T, Noskova J, Matas J. Robust scale-adaptive mean-shift for tracking. Pattern Recognition Letters, 2014, 49:250-258 doi: 10.1016/j.patrec.2014.03.025
    [220] Zhang K H, Zhang L, Yang M H. Real-time compressive tracking. In:Proceedings of the 12th European Conference on Computer Vision (ECCV 2012). Florence, Italy:Springer, 2012. 864-877
    [221] Bao C L, Wu Y, Ling H B, Ji H. Real time robust L1 tracker using accelerated proximal gradient approach. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA:IEEE, 2012. 1830-1837
    [222] Binh N D. Online boosting-based object tracking. In:Proceedings of the 12th International Conference on Advances in Mobile Computing and Multimedia. Kaohsiung, China:ACM, 2014. 194-202
    [223] Dinh T B, Vo N, Medioni G. Context tracker:exploring supporters and distracters in unconstrained environments. In:Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI:IEEE, 2011. 1177-1184
    [224] Dollár P, Belongie S, Perona P. The fastest pedestrian detector in the west. In:Proceedings of the 2010 British Machine Vision Conference. Aberystwyth, UK:British Machine Vision Association, 2010. 68.1-68.11
    [225] Hare S, Golodetz S, Saffari A, Vineet V, Cheng M M, Hicks S, Torr P. Struck:structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2015.2509974
    [226] Babenko B, Yang M H, Belongie S. Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8):1619-1632 doi: 10.1109/TPAMI.2010.226
    [227] Kristan M, Pflugfelder R, Leonardis A, Matas J, Čehovin L, Nebehay G, Vojíř T, Fernández G, Lukežič A, Dimitriev A, Petrosino A, Saffari A, Li B, Han B, Heng C, Garcia C, Pangeršič D, Häger G, Khan F S, Oven F, Possegger H, Bischof H, Nam H, Zhu J K, Li J J, Choi J Y, Choi J W, Henriques J F, van de Weijer J, Batista J, Lebeda K, Öfjäll K, Yi K M, Qin L, Wen L Y, Maresca M E, Danelljan M, Felsberg M, Cheng M M, Torr P, Huang Q M, Bowden R, Hare S, Lim S Y, Hong S, Liao S C, Hadfield S, Li S Z, Duffner S, Golodetz S, Mauthner T, Vineet V, Lin W Y, Li Y, Qi Y K, Lei Z, Niu Z H. The visual object tracking VOT2014 challenge results. In:Proceedings of the European Conference on Computer Vision (ECCV 2014), Lecture Notes in Computer Science. Zurich, Switzerland:Springer International Publishing, 2015. 191-217
Publication history
  • Received: 2015-12-14
  • Accepted: 2016-05-16
  • Published: 2016-10-20
