Text-independent Writer Identification Based on Hybrid Codebook and Factor Analysis
Abstract: Existing writer identification methods impose strict requirements on handwriting layout, need time-consuming training, and perform poorly on small samples with unconstrained content. To address these problems, a text-independent writer identification algorithm based on a hybrid codebook and factor analysis is proposed. The algorithm extracts sub-images that occur frequently in handwriting and labels each one with a descriptor, so that the labeled "codes" together form a "codebook". In the feature extraction layer, the weighted directional index histogram (WDIH) method and the distance transform (DT) method are each used to compute feature distances between codes that share the same descriptor. The factors that influence the feature distance are divided into a writing factor and a character factor, and a two-way analysis of variance (TW-ANOVA) is carried out for each writing pattern in the codebook. Experimental results on the IAM and Firemaker benchmark datasets show that, compared with current state-of-the-art methods, the proposed algorithm has certain advantages in accuracy and speed, has practical value for wider application, and is suitable for multilingual writer identification.
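The abstract does not spell out how the DT-based feature distance is computed, so the following is only a minimal sketch under stated assumptions: it uses a two-pass city-block distance transform as a stand-in, and it takes the feature distance between two equally sized binary sub-images to be the mean absolute difference of their distance maps. The function names `distance_transform` and `dt_feature_distance` are hypothetical and do not come from the paper.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 values, 1 = ink pixel (assumed encoding)


def distance_transform(img: Image) -> List[List[int]]:
    """City-block distance from every pixel to the nearest ink pixel.

    Two-pass chamfer scan; this is a stand-in, not necessarily the
    transform used in the paper.
    """
    h, w = len(img), len(img[0])
    inf = h + w  # larger than any city-block distance inside the grid
    d = [[0 if img[y][x] else inf for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


def dt_feature_distance(a: Image, b: Image) -> float:
    """Mean absolute difference of the two distance maps (assumed metric)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("sub-images must have the same size")
    da, db = distance_transform(a), distance_transform(b)
    h, w = len(a), len(a[0])
    total = sum(abs(da[y][x] - db[y][x]) for y in range(h) for x in range(w))
    return total / (h * w)
```

In this sketch, two codes with the same descriptor would first be normalized to a common size and binarized; identical sub-images then give a distance of 0, and the value grows as the stroke positions diverge.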
Key words: writer identification, hybrid codebook, text independent, factor analysis
Table 1 Two-way analysis of variance (TW-ANOVA) table

| Source of variance | Sum of squares | Degrees of freedom | Mean square | F ratio |
|---|---|---|---|---|
| Writing factor | $S_A$ | $N-1$ | $S_A/(N-1)$ | $F_A$ |
| Character factor | $S_B$ | $M-1$ | $S_B/(M-1)$ | $F_B$ |
| Error | $S_E$ | $(N-1)(M-1)$ | $S_E/((N-1)(M-1))$ | |
| Total | $S_T$ | $MN-1$ | | |
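The quantities in Table 1 can be computed as in the sketch below. It assumes the standard two-way ANOVA without replication, which matches the error degrees of freedom $(N-1)(M-1)$ in the table: one feature distance per cell, with the $N$ rows indexing levels of the writing factor and the $M$ columns indexing levels of the character factor. The function name `two_way_anova` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def two_way_anova(x: List[List[float]]) -> Dict[str, float]:
    """Two-way ANOVA without replication.

    x[i][j] is the single observation for writing-factor level i (N rows)
    and character-factor level j (M columns).
    """
    n, m = len(x), len(x[0])
    if n < 2 or m < 2:
        raise ValueError("need at least two levels per factor")
    grand = sum(sum(row) for row in x) / (n * m)
    row_means = [sum(row) / m for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(m)]

    # Sums of squares: total, writing factor (rows), character factor (columns).
    s_t = sum((x[i][j] - grand) ** 2 for i in range(n) for j in range(m))
    s_a = m * sum((r - grand) ** 2 for r in row_means)
    s_b = n * sum((c - grand) ** 2 for c in col_means)
    s_e = s_t - s_a - s_b  # error is the remainder of the decomposition

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_a = s_a / (n - 1)
    ms_b = s_b / (m - 1)
    ms_e = s_e / ((n - 1) * (m - 1))

    # F ratios compare each factor's mean square with the error mean square.
    f_a = ms_a / ms_e if ms_e > 0 else float("inf")
    f_b = ms_b / ms_e if ms_e > 0 else float("inf")
    return {"S_A": s_a, "S_B": s_b, "S_E": s_e, "S_T": s_t,
            "F_A": f_a, "F_B": f_b}
```

For the 2 x 2 input `[[1.0, 2.0], [3.0, 5.0]]`, the decomposition gives $S_A = 6.25$, $S_B = 2.25$, $S_E = 0.25$ and $S_T = 8.75$, so $F_A = 25$ and $F_B = 9$. A large $F_A$ relative to $F_B$ would indicate that the feature distance is driven more by who wrote the sample than by which character was written.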
Table 2 TW-ANOVA results for the WDIH/DT methods

| Source of variance | Sum of squares | Degrees of freedom | Mean square | F ratio |
|---|---|---|---|---|
| Writing factor | 1.76/4.23 | 10 | 0.176/0.423 | 24.11/34.67 |
| Character factor | 28.14/31.52 | 209 | 0.1346/0.1495 | 18.44/12.25 |
| Error | 15.23/25.61 | 2 090 | 0.0073/0.0122 | |
| Total | 45.13/61.36 | 2 309 | | |

Table 3 Performance comparison of methods on the Firemaker dataset (%)

Table 4 Performance comparison of methods on the IAM dataset (%)
Table 5 Performance comparison on three datasets (%)

| Evaluation criterion | TOP-1 | TOP-10 |
|---|---|---|
| Uyghur 2016 dataset | 100 | 100 |
| IAM dataset | 95.69 | 99.69 |
| Firemaker dataset | 94.4 | 98.8 |
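The TOP-1 and TOP-10 figures in Table 5 are read here in the usual way, as the percentage of query samples whose true writer appears among the first $k$ ranked candidates; that reading is an assumption, since this page does not define the criteria. A minimal sketch, with the hypothetical function name `top_k_accuracy`:

```python
from typing import List, Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]],
                   truth: Sequence[str],
                   k: int) -> float:
    """Percentage of queries whose true writer is in the first k candidates.

    ranked[q] lists candidate writer IDs for query q, best match first;
    truth[q] is the true writer ID for that query.
    """
    if len(ranked) != len(truth) or not truth:
        raise ValueError("ranked and truth must be non-empty and equal length")
    hits = sum(1 for cands, t in zip(ranked, truth) if t in cands[:k])
    return 100.0 * hits / len(truth)


# Usage with three made-up queries (illustrative IDs, not data from the paper).
ranked_lists: List[List[str]] = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
]
true_writers = ["a", "a", "a"]
top1 = top_k_accuracy(ranked_lists, true_writers, 1)
top2 = top_k_accuracy(ranked_lists, true_writers, 2)
```

By construction the value never decreases as $k$ grows, which is why every TOP-10 entry in Table 5 is at least as large as the matching TOP-1 entry.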