Generative Model for Partial Multi-view Clustering

ZHAO Bo-Yu; ZHANG Chang-Qing; CHEN Lei; LIU Xin-Wang; LI Ze-Chao; HU Qing-Hua

doi:10.16383/j.aas.c200121

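ZHAO Bo-Yu^1, ZHANG Chang-Qing^{1, 2}, CHEN Lei^{2, 3}, LIU Xin-Wang^4, LI Ze-Chao^5, HU Qing-Hua^1

1. School of Intelligence and Computing, Tianjin University, Tianjin 300350
2. Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023
3. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023
4. School of Computer, National University of Defense Technology, Changsha 410073
5. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094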

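Funds: Supported by the National Natural Science Foundation of China (61976151, 61732011, 61872190) and the Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications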

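Author Bio:
ZHAO Bo-Yu: Master student at the College of Intelligence and Computing, Tianjin University. His main research interest is multi-view learning. E-mail: boyuzhao@tju.edu.cn

ZHANG Chang-Qing: Associate professor at the College of Intelligence and Computing, Tianjin University. His research interest covers machine learning and pattern recognition. Corresponding author of this paper. E-mail: zhangchangqing@tju.edu.cn

CHEN Lei: Professor at the School of Computer Science, Nanjing University of Posts and Telecommunications. His research interest covers artificial intelligence, machine learning, and applications of data mining. E-mail: chenlei@njupt.edu.cn

LIU Xin-Wang: Professor at the School of Computer, National University of Defense Technology. His research interest covers kernel learning, feature selection, spectral clustering, and latent variable learning. E-mail: 1022xinwang.liu@gmail.com

LI Ze-Chao: Professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His research interest covers big media analysis and computer vision. E-mail: zechao.li@njust.edu.cn

HU Qing-Hua: Professor at the College of Intelligence and Computing, Tianjin University. His research interest covers multi-modality learning, metric learning, uncertainty modeling and reasoning with fuzzy sets, rough sets, and probability theory. E-mail: huqinghua@tju.edu.cn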

Publication history
- Received: 2020-03-11
- Accepted: 2020-05-03
- Published online: 2021-02-01
- Issue date: 2021-08-20


Abstract:
Multi-view clustering based on self-representation subspace clustering has attracted growing interest. Most existing algorithms assume that all views of every sample are available. In real applications, however, some views may be missing for various reasons, which yields data with partial views. To cluster such incomplete data, we propose a generative model that performs missing-view imputation and multi-view subspace clustering simultaneously in a unified framework. Specifically, the missing views are generated from a latent representation that is constrained by the observed views. Moreover, a multi-rank tensor is employed to explore the higher-order correlations across different views. In this way, the latent representation and the high-order tensor jointly exploit the correlations across different views and across all samples, including those with incomplete views. We solve the optimization problem with the augmented Lagrangian alternating direction minimization (AL-ADM) method. Experimental results on real-world datasets show that our method outperforms state-of-the-art multi-view clustering algorithms in both clustering accuracy and robustness.
Key words:
- Missing views
- Multi-view clustering
- Tensor
- Generative model
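
As a rough illustration of the self-representation idea that the abstract builds on, the following Python sketch expresses every sample as a linear combination of the samples in the data set and turns the coefficients into an affinity matrix. It is not the GM-PMVC objective: the ridge (Frobenius-norm) regularizer, the function name `self_representation_affinity`, the parameter `lam`, and the toy data are all assumptions, chosen only because they admit a closed-form solution. The paper instead constrains the representations with a low-rank tensor and solves the problem with AL-ADM.

```python
import numpy as np


def self_representation_affinity(X, lam=0.1):
    """Ridge-regularized self-representation (an illustrative stand-in).

    X has shape (d, n) with one sample per column. Returns the coefficient
    matrix Z that minimizes ||X - X Z||_F^2 + lam * ||Z||_F^2 and the
    symmetric affinity W = (|Z| + |Z^T|) / 2.
    """
    n = X.shape[1]
    gram = X.T @ X
    # Setting the gradient to zero gives (X^T X + lam * I) Z = X^T X.
    Z = np.linalg.solve(gram + lam * np.eye(n), gram)
    W = 0.5 * (np.abs(Z) + np.abs(Z.T))
    return Z, W


# Toy data (assumed): three samples on the first axis, three on the second,
# so the two groups lie in orthogonal one-dimensional subspaces.
X = np.array([
    [1.0, 2.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 3.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
Z, W = self_representation_affinity(X)

# Samples from different subspaces receive (numerically) zero affinity,
# while samples from the same subspace are connected.
print(np.round(W, 3))
```

In a full pipeline the affinity matrix `W` would typically be passed to spectral clustering to obtain the final partition.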
Notes:

1) http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html

2) http://www.uk.research.att.com/facedatabase.html

3) http://www.cs.columbia.edu/CAVE/software/softlib/

4) http://mlg.ucd.ie/datasets/


Fig. 1 Illustration of the generative model for partial multi-view clustering (GM-PMVC). Given incomplete multi-view data, we model the latent space $H$ by $P(X|H)$ and simultaneously generate complete features from the latent representation. Based on the completed data, GM-PMVC integrates the subspace representations into a tensor which, equipped with a low-rank constraint, can effectively explore the higher-order correlations of multi-view data
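
The imputation step described in the Fig. 1 caption can be sketched in Python under strong simplifying assumptions. The code below replaces the probabilistic model $P(X|H)$ with a deterministic linear point estimate, $X^{(v)} \approx P^{(v)} H$, fitted by alternating regularized least squares on the observed views only. The function name `impute_missing_views`, the projection matrices `P`, the parameters `k`, `n_iter`, and `reg`, and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def impute_missing_views(views, observed, k, n_iter=50, reg=1e-3, seed=0):
    """Fill missing views from a shared latent representation (a sketch).

    views:    list of (d_v, n) arrays; columns of missing views may hold NaN.
    observed: (V, n) boolean array, observed[v, i] is True if sample i
              has view v.
    Returns the completed views and the latent matrix H of shape (k, n).
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    H = rng.standard_normal((k, n))
    P = [rng.standard_normal((X.shape[0], k)) for X in views]
    ridge = reg * np.eye(k)
    for _ in range(n_iter):
        # Update each projection using only samples that have this view.
        for v, X in enumerate(views):
            Hv = H[:, observed[v]]
            Xv = X[:, observed[v]]
            P[v] = Xv @ Hv.T @ np.linalg.inv(Hv @ Hv.T + ridge)
        # Update each latent code using only the views that sample has.
        for i in range(n):
            A = ridge.copy()
            b = np.zeros(k)
            for v, X in enumerate(views):
                if observed[v, i]:
                    A += P[v].T @ P[v]
                    b += P[v].T @ X[:, i]
            H[:, i] = np.linalg.solve(A, b)
    completed = []
    for v, X in enumerate(views):
        Xc = X.copy()
        # Generate the missing columns from the latent representation.
        Xc[:, ~observed[v]] = P[v] @ H[:, ~observed[v]]
        completed.append(Xc)
    return completed, H


# Synthetic two-view data (assumed) that share a 2-dimensional latent space.
rng = np.random.default_rng(1)
H_true = rng.standard_normal((2, 40))
views = [rng.standard_normal((5, 2)) @ H_true,
         rng.standard_normal((4, 2)) @ H_true]
observed = np.ones((2, 40), dtype=bool)
observed[0, :4] = False    # view 1 missing for samples 0..3
observed[1, 4:8] = False   # view 2 missing for samples 4..7
for v in range(2):
    views[v][:, ~observed[v]] = np.nan  # missing entries are never read

completed, H = impute_missing_views(views, observed, k=2)
```

Every sample here keeps at least one observed view, which is what allows its latent code, and hence its missing view, to be estimated at all.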


Fig. 2 Accuracy (ACC) and normalized mutual information (NMI) on four datasets under different missing rates (mean ± standard deviation)
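
For readers unfamiliar with the two measures reported in Fig. 2, the following standard-library Python sketch shows one common way to compute them. The details are assumptions, since this page does not state them: ACC is taken as the best one-to-one matching between predicted clusters and true classes (found here by brute force, which is only practical for a handful of clusters and assumes there are no more clusters than classes), and NMI is normalized by the geometric mean of the two entropies. The function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy; assumes #clusters <= #classes and few clusters."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    counts = Counter(zip(y_pred, y_true))
    best = 0
    # Try every assignment of predicted clusters to distinct true classes.
    for perm in permutations(true_ids, len(pred_ids)):
        hits = sum(counts[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, hits)
    return best / len(y_true)


def normalized_mutual_information(y_true, y_pred):
    """NMI = I(T; P) / sqrt(H(T) * H(P)), using natural logarithms."""
    n = len(y_true)
    n_true = Counter(y_true)
    n_pred = Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(c / n * log(c * n / (n_true[t] * n_pred[p]))
             for (t, p), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in n_true.values())
    h_pred = -sum(c / n * log(c / n) for c in n_pred.values())
    if h_true == 0 or h_pred == 0:
        # Degenerate case: at least one labeling has a single group.
        return 1.0 if h_true == h_pred else 0.0
    return mi / sqrt(h_true * h_pred)


# A relabeled but otherwise perfect clustering scores 1 on both measures.
labels = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]
acc_perfect = clustering_accuracy(labels, relabeled)
nmi_perfect = normalized_mutual_information(labels, relabeled)

# A clustering that is independent of the classes has zero NMI.
acc_indep = clustering_accuracy([0, 0, 1, 1], [0, 1, 0, 1])
nmi_indep = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Both measures are invariant to how the cluster labels are named, which is why the relabeled example still counts as a perfect result.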


Fig. 3 Model analysis on YaleB with a missing rate of 10%: (a) Effect of parameter tuning on NMI; (b) Convergence-condition values and clustering index curves during iteration (convergence values are normalized)


Table 1 Notations and definitions

Notation	Definition	Notation	Definition
$b$	Scalar	$B$	Matrix
${\boldsymbol{b}}$	Vector	${\cal{B}}$	Tensor
${\cal{I}}$	Identity tensor	$fft$	Fast Fourier transform
${\cal{B}}_{ijk}$	The $(i,j,k)$-th entry of tensor ${\cal{B}}$	${\cal{Q}}$	Orthogonal tensor
${\cal{B}}(i,:,:)$	The $i$-th horizontal slice	${\cal{B}}^{\rm T}$	Transpose of ${\cal{B}}$
${\cal{B}}(:,i,:)$	The $i$-th lateral slice	${\cal{B}}_{f}$	$fft({\cal{B}},[],3)$
${\cal{B}}(:,:,i)$	The $i$-th frontal slice	$B^{(i)}$	${\cal{B}}(:,:,i)$
$\|B\|_{F}$	$\sqrt{\sum\nolimits_{i,j}|B_{ij}|^{2}}$	$\|B\|_{*}$	Sum of the singular values of matrix $B$
$\|{\cal{B}}\|_{F}$	$\sqrt{\sum\nolimits_{i,j,k}|{\cal{B}}_{ijk}|^{2}}$	$\|{\cal{B}}\|_{1}$	$\sum\nolimits_{i,j,k}|{\cal{B}}_{ijk}|$
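
The notation in Table 1 maps directly onto array operations. The NumPy sketch below evaluates each quantity on a small example tensor; the tensor values and variable names are assumptions made for illustration. Note that the table uses 1-based, MATLAB-style indexing, so ${\cal{B}}(1,:,:)$ corresponds to `B[0, :, :]` and $fft({\cal{B}},[],3)$ to an FFT along `axis=2`. The last quantity, the sum of nuclear norms of the Fourier-domain frontal slices, is the kind of term commonly minimized in t-SVD-based methods; whether the paper scales it by a constant is not stated on this page.

```python
import numpy as np

# Example third-order tensor of size n1 x n2 x n3 = 2 x 3 x 4 (assumed).
B = np.arange(24, dtype=float).reshape(2, 3, 4)

horizontal = B[0, :, :]  # B(1,:,:), first horizontal slice, shape (3, 4)
lateral = B[:, 0, :]     # B(:,1,:), first lateral slice, shape (2, 4)
frontal = B[:, :, 0]     # B(:,:,1) = B^{(1)}, first frontal slice, shape (2, 3)

# Tensor Frobenius norm and l1 norm.
fro_norm = np.sqrt(np.sum(np.abs(B) ** 2))
l1_norm = np.sum(np.abs(B))

# Matrix nuclear norm: sum of the singular values of one frontal slice.
nuclear_norm = np.linalg.svd(frontal, compute_uv=False).sum()

# B_f = fft(B, [], 3): FFT along the third dimension.
B_f = np.fft.fft(B, axis=2)

# Sum of nuclear norms over all frontal slices in the Fourier domain.
fourier_nuclear_sum = sum(
    np.linalg.svd(B_f[:, :, k], compute_uv=False).sum()
    for k in range(B.shape[2])
)

print(fro_norm, l1_norm, nuclear_norm, fourier_nuclear_sum)
```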


Table 2 Comparison of algorithm running times (seconds)

Algorithm	ORL	YaleB
MIC	84.67	143.30
IMG	83.02	169.38
PVC	120.68	404.82
DAIMC	157.76	191.27
SRLCs	93.21	193.36
t-SVD-MSC	56.77	107.03
Ours	180.90	288.50

