A Segmentation and Recognition System for Touching and Broken Numeral Strings Based on Viterbi Algorithms
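Abstract: Segmenting and recognizing character lines that contain touching and broken characters is one of the main difficulties in many practical OCR applications. For machine-printed numeral strings with touching and broken characters, this paper proposes a segmentation and recognition scheme based on the Viterbi algorithm, organized as a hierarchical structure with two passes of segmentation and recognition. In the second pass, the scheme first works within the candidate cut-point regions, using a Viterbi search that combines gray-scale image information with binary contour information to find nonlinear cut paths, which yields valid segmentation paths. It then uses the Viterbi algorithm, together with the confidence values output by the classifier, to merge the resulting candidate image blocks, so that segmentation and recognition are performed dynamically. Experiments on a real financial document recognition system show that the proposed method copes well with touching and broken characters in numeral strings and improves the recognition rate and robustness of the system.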
Keywords:
- character segmentation
- OCR
- touching and broken characters
- Viterbi algorithm
- machine-printed numeral strings
Abstract: In many OCR applications, it is difficult to segment and recognize touching and broken characters. This paper proposes a segmentation and recognition system based on Viterbi algorithms to address this problem for touching and broken machine-printed numeral strings. The system performs segmentation and recognition in two steps. In the second step, a segmentation method first finds nonlinear character segmentation paths by combining gray-scale and binary information in a Viterbi search; a recognition method then uses a Viterbi algorithm to dynamically combine and recognize the character candidates, using the reliabilities generated by the recognizer. Experiments on a financial document analysis and recognition system indicate that this Viterbi-based method is effective for segmenting and recognizing touching and broken numeral strings, and that it improves the accuracy and robustness of the recognition system.
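The dynamic combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `best_segmentation` and `toy_classify`, the `max_merge` limit, the log-product scoring rule, and the confidence table are all assumptions made for illustration. The sketch treats the candidate image blocks as an ordered sequence, scores every run of adjacent blocks with a classifier confidence, and uses a Viterbi-style dynamic program to pick the grouping with the highest total log-confidence.

```python
import math

def best_segmentation(n_blocks, classify, max_merge=3):
    """Viterbi-style search over groupings of adjacent candidate blocks.

    classify(start, end) is a hypothetical recognizer interface: it returns
    (label, confidence) for the image formed by merging blocks start..end-1.
    """
    neg_inf = -math.inf
    best = [neg_inf] * (n_blocks + 1)   # best[j]: best log-score covering blocks 0..j-1
    back = [None] * (n_blocks + 1)      # back[j]: (start, label) of the last group
    best[0] = 0.0
    for end in range(1, n_blocks + 1):
        for start in range(max(0, end - max_merge), end):
            if best[start] == neg_inf:
                continue
            label, conf = classify(start, end)
            if conf <= 0.0:
                continue  # the recognizer rejects this grouping
            score = best[start] + math.log(conf)
            if score > best[end]:
                best[end] = score
                back[end] = (start, label)
    if n_blocks > 0 and back[n_blocks] is None:
        raise ValueError("no valid grouping of the candidate blocks")
    # Backtrack from the last block to recover labels and group boundaries.
    labels, groups = [], []
    pos = n_blocks
    while pos > 0:
        start, label = back[pos]
        labels.append(label)
        groups.append((start, pos))
        pos = start
    labels.reverse()
    groups.reverse()
    return "".join(labels), groups, best[n_blocks]

# Made-up confidences: block 0 is a clean "1"; blocks 1 and 2 are the two
# halves of a broken "8", each of which looks weakly like a "0" on its own.
TOY_TABLE = {
    (0, 1): ("1", 0.95),
    (1, 2): ("0", 0.40),
    (2, 3): ("0", 0.45),
    (1, 3): ("8", 0.90),
    (0, 2): ("4", 0.10),
    (0, 3): ("8", 0.05),
}

def toy_classify(start, end):
    # Groupings missing from the table are rejected with zero confidence.
    return TOY_TABLE.get((start, end), ("?", 0.0))

text, groups, log_score = best_segmentation(3, toy_classify)
print(text, groups)
```

With these made-up confidences, the search merges the two halves of the broken digit and reads the string as "18". Summing log-confidences tends to favor groupings with fewer characters, so a practical system would likely normalize or weight the scores; the abstract does not say how the paper handles this.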