Saliency Detection for RGB-D Images Under Bayesian Framework

WANG Song-Tao, ZHOU Zhen, JIN Wei, QU Han-Bing

National Natural Science Foundation of China 91746207

Beijing Science and Technology Program Z161100001116086

    WANG Song-Tao   Assistant professor at Beijing Institute of New Technology Applications, Beijing Academy of Science and Technology, and Ph.D. candidate at the Higher Educational Key Laboratory for Measuring & Control Technology and Instrumentations of Heilongjiang Province, Harbin University of Science and Technology. His research interests cover computer vision, pattern recognition, and deep learning. E-mail: wangsongtao1983@163.com

    JIN Wei   Ph.D., associate professor at Beijing Institute of New Technology Applications, Beijing Academy of Science and Technology. She is also a committee member of the National Technical Committee for Basic Standards of Public Safety. Her research interests cover machine learning, computer vision, pattern recognition, and biometrics. E-mail: jinwei201002@163.com

    QU Han-Bing   Ph.D., associate professor at Beijing Institute of New Technology Applications, Beijing Academy of Science and Technology. He is also a committee member of the Intelligent Automation Committee of the Chinese Association of Automation (IACAA). His research interests cover machine learning, computer vision, pattern recognition, and biometrics. E-mail: quhanbing@gmail.com

    Corresponding author: ZHOU Zhen   Professor at the School of Measurement-Control Technology and Communications Engineering, Harbin University of Science and Technology. His research interests cover reliability engineering technology and biological information detection. E-mail: zhzh49@126.com
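Abstract: To effectively fuse the color information of RGB images with the depth information of depth images, we propose an RGB-D image saliency detection method based on Bayesian-framework fusion. By analyzing how 3D saliency is distributed over RGB images and depth images, the class-conditional mutual information (CMI) is used to measure the correlation between the color features and the depth features extracted by deep convolutional neural networks, and the posterior probability of RGB-D image saliency is then obtained from Bayes' theorem. Assuming that the color and depth features follow Gaussian distributions, saliency detection is modeled with the DMNB (discriminative mixed-membership naive Bayes) generative model, whose parameters are estimated by the variational expectation-maximization algorithm. Experiments on the public RGB-D saliency detection datasets NLPR and NJU-DS2000 show that the proposed method achieves higher precision and recall.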
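The Bayesian fusion step in the abstract can be illustrated with a Gaussian naive Bayes posterior. This is a minimal sketch, not the paper's DMNB model: it assumes the color features `x_c` and depth features `x_d` are conditionally independent given the label y and that each feature dimension is a single Gaussian per class. The function names and the `params` layout are hypothetical.

```python
import math

def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def saliency_posterior(x_c, x_d, params, prior_salient=0.5):
    """P(y = 1 | x_c, x_d) under a Gaussian naive Bayes assumption.

    params[y] = {"c": [(mean, var), ...], "d": [(mean, var), ...]} for y in (0, 1),
    one (mean, var) pair per color or depth feature dimension (hypothetical layout).
    """
    log_joint = {}
    for y, prior in ((1, prior_salient), (0, 1.0 - prior_salient)):
        lp = math.log(prior)
        # Conditional independence given y lets the likelihood factor per dimension.
        for x, (m, v) in zip(x_c, params[y]["c"]):
            lp += gaussian_log_pdf(x, m, v)
        for x, (m, v) in zip(x_d, params[y]["d"]):
            lp += gaussian_log_pdf(x, m, v)
        log_joint[y] = lp
    # Bayes' theorem, normalized in log space for numerical stability.
    top = max(log_joint.values())
    w1 = math.exp(log_joint[1] - top)
    w0 = math.exp(log_joint[0] - top)
    return w1 / (w1 + w0)
```

The factorization over color and depth is exact only when the class-conditional mutual information between the two feature sets is zero, which is why that quantity is a natural diagnostic for this fusion step.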
    Fig.  1  Categories of RGB-D saliency detection methods

    Fig.  2  Distribution of 3D saliency situations in RGB-D images

    Fig.  3  Overview diagram of the proposed model

    Fig.  4  Superpixel segmentation of RGB-D images (within the rectangle in the RGB image, the boundary between foreground and background obtained by segmentation with both color and depth cues is more accurate than that of color-only segmentation; likewise, within the rectangle in the depth image, it is more accurate than that of depth-only segmentation)
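Fig. 4 motivates segmenting with both cues. One way to do this is sketched below, assuming a SLIC-style clustering distance (SLIC is a swapped-in example; the caption does not name the segmentation algorithm) extended with a separately weighted depth term. The function name, `depth_weight`, and the pixel tuple layout are hypothetical.

```python
import math

def rgbd_slic_distance(p, q, grid_step, compactness=10.0, depth_weight=1.0):
    """Distance between pixel p and cluster center q for RGB-D superpixels (sketch).

    p, q: tuples (L, a, b, depth, x, y) with CIELAB color, depth and image position.
    """
    L1, a1, b1, d1, x1, y1 = p
    L2, a2, b2, d2, x2, y2 = q
    d_color = math.sqrt((L1 - L2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_depth = abs(d1 - d2)
    d_space = math.hypot(x1 - x2, y1 - y2)
    # Spatial distance is normalized by the grid step S and scaled by compactness m,
    # as in standard SLIC; the depth difference is an added, weighted term.
    return math.sqrt(d_color ** 2 + (depth_weight * d_depth) ** 2
                     + (d_space / grid_step) ** 2 * compactness ** 2)
```

Two neighboring pixels with similar color but different depth stay far apart under this distance, which is the effect highlighted by the rectangles; setting `depth_weight=0.0` recovers a color-only distance.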

    Fig.  5  Architecture for supervision transfer ((a) Architecture of the depth CNN that extracts salient features from depth images. Relu denotes the rectified linear function Relu$(x) = \max(x, 0)$, which keeps the feature maps non-negative; Lrn denotes a local response normalization layer; Dropout layers in the fully connected part ignore hidden units with a rate of 0.5 during training to prevent overfitting. (b) Flowchart of salient feature extraction from RGB and depth images with the CNN. A 227 $\times$ 227 crop of an image (with 3 planes) is the input. It is convolved with 96 first-layer filters of size 7 $\times$ 7 using a stride of 2 in both $x$ and $y$; the resulting maps are passed through Relu, max-pooled over 3 $\times$ 3 regions with stride 2, and local-response normalized, giving 96 feature maps of size 55 $\times$ 55. Similar operations are performed in convolution layer 2, pooling layer 2, convolution layers 3, 4, and 5, and pooling layer 5. The output of pooling layer 5 is fed in vector form to fully connected layer 6, followed by fully connected layer 7 and the output layer, a 2-way softmax that indicates whether the image is salient. (c) Using supervision transfer, we start from a Clarifai network trained on RGB images and retrain a CNN for depth images with the depth images paired with those RGB images, teaching it to reproduce the mid-level semantic representation learned from the RGB images)
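The 55 $\times$ 55 size in Fig. 5(b) follows from the usual output-size formula for convolution and pooling windows. The check below assumes no padding and floor rounding (padding is not stated in the caption); the helper name is hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

# Layer 1 of the depth CNN as described in Fig. 5(b)
after_conv1 = conv_output_size(227, kernel=7, stride=2)          # 7x7 filters, stride 2
after_pool1 = conv_output_size(after_conv1, kernel=3, stride=2)  # 3x3 max pooling, stride 2
print(after_conv1, after_pool1)  # 111 55
```

The same helper can be chained through the later layers once their kernel sizes, strides and padding are known.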

    Fig.  6  Distribution of the class-conditional mutual information between color and depth deep features on the NLPR and NJU-DS2000 RGB-D image datasets ((a) Color-depth saliency situation on the NLPR dataset, (b) Color saliency situation on the NLPR dataset, (c) Depth saliency situation on the NLPR dataset, (d) Color-depth saliency situation on the NJU-DS2000 dataset, (e) Color saliency situation on the NJU-DS2000 dataset, (f) Depth saliency situation on the NJU-DS2000 dataset)
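Fig. 6 plots the CMI between color and depth deep features. A minimal plug-in estimator of $I(X_c; X_d \mid Y)$ for discretized features is sketched below. It assumes the continuous CNN features have already been quantized (the quantization scheme is not given here), reports the value in nats, and uses a hypothetical function name; the result would be the quantity compared against the threshold $\tau$ listed in Table 2.

```python
import math
from collections import Counter

def class_conditional_mi(xc, xd, labels):
    """Plug-in estimate of I(Xc; Xd | Y) in nats from paired discrete samples.

    xc, xd: quantized color and depth feature values, one per sample.
    labels: class label y per sample (1 = salient, 0 = non-salient).
    """
    n = len(labels)
    cmi = 0.0
    for y, n_y in Counter(labels).items():
        idx = [i for i in range(n) if labels[i] == y]
        joint = Counter((xc[i], xd[i]) for i in idx)
        count_c = Counter(xc[i] for i in idx)
        count_d = Counter(xd[i] for i in idx)
        mi_y = 0.0
        for (a, b), n_ab in joint.items():
            # p(a,b|y) * log[p(a,b|y) / (p(a|y) p(b|y))], written with counts in class y
            mi_y += (n_ab / n_y) * math.log(n_ab * n_y / (count_c[a] * count_d[b]))
        cmi += (n_y / n) * mi_y  # weight each class by p(y)
    return cmi
```

A value near zero means color and depth carry nearly independent evidence within each class; identical color and depth codes give the maximum for the given alphabet.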

    Fig.  7  Graphical model of DMNB for saliency estimation ($y$ and $\pmb{x}$ are observed variables and $\pmb{z}$ is the hidden variable. $\pmb{x}_{1:N} = (\pmb{x}_c, \pmb{x}_d)$ denotes the salient features of an RGB-D image, and each feature $\pmb{x}_j$ is assumed to be generated from one of $C$ Gaussian distributions with means $\{\mu_{jk}|k = 1, \cdots, C\}$ and variances $\{\sigma_{jk}^2|k = 1, \cdots, C\}$. $y$ is the label of a superpixel and takes the value 1 (salient) or 0 (non-salient))
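To make the generative story of Fig. 7 concrete, a sampling sketch follows. It assumes a common DMNB formulation: $\theta \sim \mathrm{Dir}(\alpha)$, each feature picks a component $z_j \sim \mathrm{Mult}(\theta)$ and is drawn from that component's Gaussian, and the label is Bernoulli with a logistic link on the mean assignment $\bar{z}$. These roles match the parameters of Table 2 ($\alpha$, $\theta$, $\eta$, $\Omega$), but the logistic link and the function name are assumptions; parameter fitting (variational EM in the paper) is not shown.

```python
import math
import random

def sample_dmnb(alpha, means, variances, eta, rng=None):
    """Draw one (x, y) pair from a DMNB-style generative process (illustrative sketch).

    alpha: Dirichlet parameters, length C.
    means, variances: N x C nested lists; feature j under component k
        is drawn from N(means[j][k], variances[j][k]).
    eta: length-C weights of the logistic link for the binary label y.
    """
    if rng is None:
        rng = random.Random()
    C, N = len(alpha), len(means)
    # theta ~ Dirichlet(alpha), sampled as normalized Gamma(alpha_k, 1) draws
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    theta = [gk / total for gk in g]
    x, counts = [], [0] * C
    for j in range(N):
        k = rng.choices(range(C), weights=theta)[0]   # z_j ~ Multinomial(theta)
        counts[k] += 1
        x.append(rng.gauss(means[j][k], math.sqrt(variances[j][k])))
    z_bar = [c / N for c in counts]                    # mean component assignment
    logit = sum(e * z for e, z in zip(eta, z_bar))
    y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0   # y ~ Bernoulli
    return x, y
```

Passing a seeded `random.Random` instance makes draws reproducible.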

    Fig.  8  Determining the number of mixture components $C$ in the DMNB model: generative clusters vs. Dirichlet process mixture model (DPMM) clustering ((a) Generative clusters of the salient features of the NLPR dataset. (b) DPMM clustering of the salient features of the NLPR dataset, where the number of distinct point markers (colors and shapes) gives $C$; on NLPR, $C = 24$. (c) Generative clusters of the salient features of the NJU-DS2000 dataset. (d) DPMM clustering of the salient features of the NJU-DS2000 dataset, where the number of distinct point markers gives $C$; on NJU-DS2000, $C = 28$)

    Fig.  9  Cross-validation of the DMNB mixture-component parameter $C$ on the NLPR dataset: starting from the value of $C$ found by the DPMM, $C$ is varied over a range of values, each evaluated with 10-fold cross-validation
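The selection procedure in Fig. 9 amounts to scoring each candidate $C$ by its mean validation score over 10 folds. A minimal sketch follows; `fit_and_score` is a hypothetical callback standing in for training and evaluating a DMNB model, and the samples are assumed to be shuffled before splitting.

```python
from typing import Callable, List, Sequence, Tuple

def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, val) pairs.

    Assumes the samples were shuffled beforehand.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits

def select_num_components(n: int, candidates: Sequence[int],
                          fit_and_score: Callable[[int, List[int], List[int]], float],
                          k: int = 10) -> int:
    """Return the candidate C with the highest mean validation score over k folds.

    fit_and_score(C, train_idx, val_idx) is hypothetical: it should train a model
    with C mixture components on train_idx and return a score on val_idx.
    """
    splits = k_fold_indices(n, k)
    best_c, best_score = None, float("-inf")
    for c in candidates:
        score = sum(fit_and_score(c, tr, va) for tr, va in splits) / k
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

For NLPR, the candidate range would be centered on the DPMM estimate $C = 24$.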

    Fig.  10  Visual comparison of saliency maps in the color-depth saliency situation on the NLPR dataset ((a) RGB image, (b) Depth image, (c) Ground truth, (d) ACSD, (e) GMR, (f) MC, (g) MDF, (h) LMH, (i) GP, (j) BFSD (ours))

    Fig.  11  ROC curves of different saliency detection models on the NLPR dataset
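Fig. 11 and Table 5 compare methods by ROC curves and AUC. The evaluation protocol is not detailed in these captions; the sketch below assumes pixel-wise saliency scores against a binary ground truth and computes AUC through its equivalence with the Mann-Whitney statistic, the probability that a randomly chosen salient pixel scores higher than a randomly chosen background pixel. The function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted saliency values; labels: 1 for salient, 0 for background.
    Ties between a positive and a negative count as half a correct ordering.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic and only suits small examples; for full-resolution maps, a sort-based rank computation gives the same value.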

    Fig.  12  F-measures of different saliency detection models on the NLPR dataset
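Fig. 12 reports F-measures, which combine precision and recall of a binarized saliency map. The captions do not give the weighting or thresholding; the sketch below assumes $\beta^2 = 0.3$ and, by default, an adaptive threshold of twice the mean saliency, both conventions used in several saliency benchmarks. The function name and defaults are assumptions.

```python
def f_measure(saliency, ground_truth, threshold=None, beta2=0.3):
    """F-measure of a binarized saliency map against a binary ground truth.

    saliency: flat list of values in [0, 1]; ground_truth: flat list of 0/1.
    threshold: if None, twice the mean saliency (capped at 1.0) is used.
    beta2: weight of precision relative to recall (0.3 emphasizes precision).
    """
    if threshold is None:
        threshold = min(1.0, 2.0 * sum(saliency) / len(saliency))
    pred = [1 if s >= threshold else 0 for s in saliency]
    tp = sum(p * g for p, g in zip(pred, ground_truth))
    precision = tp / sum(pred) if sum(pred) else 0.0
    recall = tp / sum(ground_truth) if sum(ground_truth) else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

Sweeping `threshold` over [0, 1] instead of using the adaptive default yields an F-measure curve per method.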

    Fig.  13  Visual comparison of saliency maps in the color saliency situation on the NLPR dataset ((a) RGB image, (b) Depth image, (c) Ground truth, (d) ACSD, (e) GMR, (f) MC, (g) MDF, (h) LMH, (i) GP, (j) BFSD (ours))

    Fig.  14  Visual comparison of saliency maps in the depth saliency situation on the NLPR dataset ((a) RGB image, (b) Depth image, (c) Ground truth, (d) ACSD, (e) GMR, (f) MC, (g) MDF, (h) LMH, (i) GP, (j) BFSD (ours))

    Fig.  15  Visual comparison of saliency maps in the color-depth saliency situation on the NJU-DS2000 dataset ((a) RGB image, (b) Depth image, (c) Ground truth, (d) ACSD, (e) GMR, (f) MC, (g) MDF, (h) BFSD (ours))

    Fig.  16  Visual comparison of saliency maps in the color saliency situation on the NJU-DS2000 dataset ((a) RGB image, (b) Depth image, (c) Ground truth, (d) ACSD, (e) GMR, (f) MC, (g) MDF, (h) BFSD (ours))

    Fig.  17  Visual comparison of saliency maps in the depth saliency situation on the NJU-DS2000 dataset ((a) RGB image, (b) Depth image, (c) Ground truth, (d) ACSD, (e) GMR, (f) MC, (g) MDF, (h) BFSD (ours))

    Fig.  18  ROC curves of different saliency detection models on the NJU-DS2000 dataset

    Fig.  19  F-measures of different saliency detection models on the NJU-DS2000 dataset

    Fig.  20  Some failure cases

    Table  1  Proportions of the 3D saliency situations in the RGB-D image datasets

    Dataset           Color-depth saliency   Color saliency   Depth saliency
    NLPR [40]         76.7%                  20.8%            2.5%
    NJU-DS2000 [38]   69.4%                  16.6%            14.0%

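    Table  2  Summary of parameters

    Variable                    Range                Description
    $\tau$                      (0, 1)               Class-conditional mutual information threshold
    $\alpha$                    (0, 20)              Dirichlet distribution parameter
    $\theta$                    (0, 1)               Multinomial distribution parameter
    $\eta$                      (-10.0, 3.0)         Bernoulli distribution parameter
    $\Omega$                    ((0, 1), (0, 0.2))   Gaussian distribution parameters
    $N$                         $> 2$                Feature dimension
    $C$                         $> 2$                Number of DMNB mixture components
    $\varepsilon_\mathcal{L}$   (0, 1)               EM convergence threshold
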
    Table  3  Statistics of the NLPR and NJU-DS2000 datasets

    Dataset      Images   Salient objects   Scene types   Center bias
    NLPR         1000     One (mostly)      11
    NJU-DS2000   2000     One (mostly)      $>$ 20

    Table  4  Comparison of the average running time per RGB-D image on the NLPR dataset

    NLPR   2.9 s   72.7 s   942.3 s   0.2 s   2.8 s   38.9 s   80.1 s

    Table  5  Comparison of AUC values on the NLPR dataset

    Color-depth saliency   0.61   0.73   0.81   0.82   0.70   0.79   0.83
    Color saliency         0.56   0.74   0.84   0.83   0.61   0.65   0.84
    Depth saliency         0.63   0.71   0.76   0.74   0.75   0.81   0.90
    Overall                0.60   0.73   0.81   0.80   0.69   0.78   0.85
