Fully Overlapped Handwritten Number Recognition and Separation Based on Deep EM Capsule Network
-
Abstract: Building on the vector-neuron idea of capsule networks and on the EM algorithm, this paper designs a deep capsule network that uses EM as its vector clustering algorithm to recognize and separate overlapping handwritten digits. The network has two parts. The first is a six-layer structure made up of two convolutional layers, two primary capsule layers, and two EM clustering capsule layers. It expands the capsule dimension from the conventional 8 to 16, uses pose transformation matrices to predict high-level features from low-level features, and adapts the EM algorithm into an EM vector clustering algorithm that replaces the iterative routing of the original capsule network, which streamlines the network's computation and enables recognition of overlapping targets. The second part is the reconstruction network, which consists of two parallel networks with identical structure that reconstruct the two output vectors in parallel and thereby separate the overlapping targets. Experiments show that the network reaches a recognition rate of 96% on 100% fully overlapped handwritten digit images. By comparison, the existing capsule network CapsNet achieves 95% at an 80% overlap rate and 88% at a 100% overlap rate. The proposed network therefore improves the recognition rate noticeably on a harder task, and it can accurately separate two completely superimposed handwritten digit images.
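The abstract describes replacing iterative routing with an EM vector clustering step. As a rough illustration only, the sketch below clusters lower-capsule prediction vectors into upper capsules with a plain diagonal-Gaussian EM. The function name `em_vector_clustering`, the `votes` layout, and the Gaussian formulation are assumptions made for this example; the paper's actual update rules are not given on this page and may differ.

```python
import math


def em_vector_clustering(votes, n_iters=3, eps=1e-6):
    """Illustrative EM clustering of capsule votes (not the paper's exact method).

    votes[i][j] is the d-dimensional prediction made by lower capsule i
    for upper capsule j. Returns (means, resp): means[j] is the clustered
    vector for upper capsule j, resp[i][j] is the share of lower capsule i
    assigned to upper capsule j.
    """
    n_low, n_up, dim = len(votes), len(votes[0]), len(votes[0][0])
    # Start with every lower capsule split evenly over the upper capsules.
    resp = [[1.0 / n_up] * n_up for _ in range(n_low)]
    means = []
    for _ in range(n_iters):
        # M-step: weighted mean and per-dimension variance per upper capsule.
        means, variances, weights = [], [], []
        for j in range(n_up):
            total = sum(resp[i][j] for i in range(n_low)) + eps
            mu = [sum(resp[i][j] * votes[i][j][k] for i in range(n_low)) / total
                  for k in range(dim)]
            var = [sum(resp[i][j] * (votes[i][j][k] - mu[k]) ** 2
                       for i in range(n_low)) / total + eps
                   for k in range(dim)]
            means.append(mu)
            variances.append(var)
            weights.append(total / n_low)
        # E-step: re-assign each lower capsule by Gaussian log-likelihood.
        for i in range(n_low):
            log_p = []
            for j in range(n_up):
                ll = math.log(weights[j])
                for k in range(dim):
                    diff = votes[i][j][k] - means[j][k]
                    ll -= 0.5 * (math.log(2 * math.pi * variances[j][k])
                                 + diff * diff / variances[j][k])
                log_p.append(ll)
            top = max(log_p)  # subtract the max for numerical stability
            exp_p = [math.exp(v - top) for v in log_p]
            norm = sum(exp_p)
            resp[i] = [v / norm for v in exp_p]
    return means, resp


# Toy data: capsules 0 and 1 agree on upper capsule 0, capsules 2 and 3
# agree on upper capsule 1; the remaining votes are scattered.
votes = [
    [[1.0, 1.0], [-5.0, 4.0]],
    [[1.1, 0.9], [6.0, -7.0]],
    [[5.0, -3.0], [2.0, 2.0]],
    [[-4.0, 6.0], [2.1, 1.9]],
]
means, resp = em_vector_clustering(votes, n_iters=3)
```

Here `n_iters` plays the role of the clustering count R reported in the tables. Lower capsules whose votes agree end up assigned mostly to the same upper capsule, which is the behaviour any routing or clustering step is meant to produce.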
-
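Table 1 Dataset labels

| Input image | Label | Description |
| --- | --- | --- |
| [image] | (0,0,0,0,0,0,0,1,0,0) | No overlap |
| [image] | (0,0,0,0,0,0,0,0,0,2) | Two identical digits overlapped |
| [image] | (0,0,0,1,0,0,0,1,0,0) | Two different digits overlapped |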
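The label scheme in Table 1 amounts to counting how often each digit class appears in the image. A minimal sketch follows, assuming 10 classes; the helper name `make_label` is hypothetical and not taken from the paper.

```python
def make_label(digits, n_classes=10):
    """Build a count-style label: one entry per class, incremented per digit.

    A single digit gives a one-hot label, two different digits give two 1s,
    and two identical digits give a single 2, matching the rows of Table 1.
    """
    label = [0] * n_classes
    for d in digits:
        label[d] += 1
    return tuple(label)


single = make_label([7])      # one digit, no overlap
same = make_label([9, 9])     # two identical digits overlapped
mixed = make_label([3, 7])    # two different digits overlapped
```

With this encoding the order of the two digits does not matter, so `make_label([3, 7])` and `make_label([7, 3])` produce the same label.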
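Table 2 Activation vector norm under different numbers of clustering iterations R

| Network structure and clustering form | Training set | R = 1 | R = 2 | R = 3 |
| --- | --- | --- | --- | --- |
| DCN EM clustering / CapsNet routing clustering | MNIST dataset | 0.0413/0.0536 | 0.5241/0.4122 | 0.9800/0.8792 |
| DCN EM clustering / CapsNet routing clustering | Fully overlapped dataset | 0.0332/0.0423 | 0.4342/0.5865 | 0.9943/0.8653 |
| DCN EM clustering / CapsNet routing clustering | Mixed dataset | 0.0323/0.0354 | 0.4543/0.3252 | 0.9923/0.9173 |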
Table 3 Parameter count and single-epoch time (s) under different numbers of clustering iterations R

| Network structure | Parameters | Clustering algorithm | R = 1 | R = 2 | R = 3 |
| --- | --- | --- | --- | --- | --- |
| CapsNet | 8215568 | Iterative routing | 150±2 | 210±2 | 240±2 |
| DCN | 20128032 | EM | 240±2 | 300±2 | 340±2 |
Table 4 Single-epoch time (s) of DCN with different clustering algorithms

| Clustering algorithm | R = 1 | R = 2 | R = 3 |
| --- | --- | --- | --- |
| Iterative routing | 350±2 | 410±2 | 440±2 |
| EM | 240±2 | 300±2 | 340±2 |
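Table 5 Comparison of DCN handwritten digit recognition results

| Training set | Recognition rate, non-overlapped digits | Recognition rate, fully overlapped digits |
| --- | --- | --- |
| MNIST dataset | 99.6% | 55.2% |
| Fully overlapped handwritten digit dataset | 80.7% | 96.75% |
| Mixed dataset | 95.7% | 96.55% |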
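Table 6 Comparison of recognition rates on overlapping handwritten digits (R = 3)

| Network model | Training set | Overlap rate | Accuracy |
| --- | --- | --- | --- |
| CapsNet | MultiMNIST | 80% | 95% |
| CapsNet | Fully overlapped dataset | 100% | 88% |
| DCN | Fully overlapped dataset | 100% | 96.75% |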
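Table 7 Partial results of classification and reconstruction of fully overlapped handwritten digits

| Classification label | (3, 7) | (9, 1) | (0, 8) | (0, 4) | (9, 7)* | (7, 9)* | (7, 9)* | (5, 9)• |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Classification result | (3, 7) | (9, 1) | (8, 0) | (0, 4) | (7, 9)* | (7, 9)* | (7, 9)* | (8, 9)• |
| Input image | [image] | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| Reconstructed image 1 | [image] | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| Reconstructed image 2 | [image] | [image] | [image] | [image] | [image] | [image] | [image] | [image] |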
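Table 8 Partial recognition and separation results

| Classification label | (不, 专) | (下, 不) | (丑, 下) | (不, 丑) | (下, 世) | (下, 专) | (王, 丑) | (也, 卫) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Classification result | (不, 专) | (下, 不) | (丑, 下) | (不, 丑) | (下, 世) | (下, 专) | (丑, undetermined) | (undetermined, undetermined) |
| Input image | [image] | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| Reconstructed image 1 | [image] | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
| Reconstructed image 2 | [image] | [image] | [image] | [image] | [image] | [image] | [image] | [image] |
-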
