Applications of Deep Learning for Handwritten Chinese Character Recognition: A Review
Abstract: Handwritten Chinese character recognition (HCCR) is an important research field of pattern recognition that has attracted extensive study during the past decades. With the emergence of deep learning, breakthrough progress in HCCR has been achieved in recent years, both in methods and in performance. In this paper, we review the applications of deep learning models in the field of HCCR. First, the research background and the current state of the art in HCCR are introduced. Then, we provide a brief overview of several typical deep learning models and introduce some widely used open-source tools for deep learning. On this basis, the approaches to online and offline HCCR based on deep learning are surveyed in detail, with summaries of the principles, technical details, and performance of the related methods. Finally, we analyze the remaining open problems in HCCR and discuss further research directions.
When the human visual system processes a vast amount of input information, the attention mechanism plays an extremely important role [1]. It preferentially allocates limited resources to useful information, so that the most valuable data are processed first. Corresponding to human visual attention behavior, a computer judges the importance of the visual information in an input image by detecting salient regions. Visual saliency detection has wide applications in areas such as object detection, image compression, and content-based image editing, and is a fundamental topic in computer vision research [2].
In the field of salient object detection, region-based saliency detection methods have become mainstream owing to their fast detection speed and high accuracy. Their detection process can be divided into two key steps, region feature representation and contrast computation, and the effectiveness of the region feature representation directly affects the quality of the resulting saliency map. However, almost all current methods represent the pixels of a segmented region with low-level visual features: for example, [3-4] represent region features with CIELab color histograms, and [5] represents regions with RGB color, orientation, and texture features. Compared with low-level visual features, mid-level semantic features are more discriminative; this paper therefore proposes a new salient object detection algorithm based on the bag-of-words model.
1. Related work
Since Koch et al. [6] introduced the definition of the saliency map, a large number of saliency detection algorithms have appeared. Achanta et al. [7] grouped these methods into three broad categories. The first consists of methods based on biological models, of which the classic IT algorithm [8] is a typical representative; because the biological structure of the human visual system is very complex, such methods have very high computational complexity. Purely computational methods, in contrast, implement many stages directly with simple calculations, which greatly improves speed and detection performance, and are currently the mainstream research direction in saliency detection. A third group of methods combines pure computation with biological models, such as the GBVS (Graph-based visual saliency) model proposed by Harel et al. [9].
Contrast is the strongest factor in attracting human visual attention, and purely computational saliency detection methods differ in how they compute contrast. Ma et al. [10] proposed a local-contrast method (MZ) that represents each pixel with its CIELuv color and measures the difference between a pixel and its neighbors with the Euclidean distance. Because MZ fixes the neighborhood size when computing local contrast, it cannot compute saliency at multiple scales; Achanta et al. [11] therefore proposed varying the neighborhood size of the perceptual unit to achieve multi-scale saliency computation. The LC (Luminance-based contrast) method [12] also takes each pixel as the basic processing unit but, unlike MZ, uses the gray-level feature of pixels to compute global contrast over the whole image. The HC (Histogram-based contrast) method proposed by Cheng et al. [3] computes the global contrast of each pixel over the whole image in the three channels of the CIELab color space. The FT (Frequency-tuned) method proposed by Achanta et al. [7] is likewise a global-contrast method whose global information is the mean of the image. The CA (Context-aware) method proposed by Goferman et al. [13] also computes saliency from the differences between perceptual units but, unlike the methods above, takes the spatial relations between perceptual units into account.
All of the above methods compute saliency at the pixel level, whereas region-based saliency detection methods take image regions as the basic processing unit and are faster and more accurate. These methods differ in the segmentation method used, the feature representation of regions, and the saliency computation. The RC (Region-based contrast) method proposed by Cheng et al. [3] segments the image with graph cuts, represents each region with a color histogram, and considers three factors (color contrast, spatial distance, and region size) when computing the global contrast of each image patch. Unlike RC, which obtains image regions from superpixel segmentation, the GC (Global cues) method proposed by Cheng et al. [14] computes color contrast from the cluster centers obtained by initially clustering all pixels, computes the color spatial distribution from the cluster centers obtained by re-clustering the Gaussian components, and finally combines color contrast with color spatial distribution using the method of [15] to obtain the final saliency map. The PD (Patch distinct) method proposed by Margolin et al. [16] analyzes the internal statistics of image patches and represents patches with principal component analysis to compute patch saliency. The CBS (Context-based saliency) method proposed by Jiang et al. [4] quickly partitions the image into sub-regions with graph cuts, represents each region with a CIELab color histogram, and generates the saliency map by computing the difference between each patch and its neighboring patches with a distance function. The LR (Low rank) method proposed by Shen et al. [5] represents regions with RGB color, orientation, and texture features and computes saliency by decomposing the feature matrix with robust PCA (Principal component analysis). Region-based saliency detection can thus be divided into two key steps, region feature representation and contrast computation, and almost all current methods compute contrast with low-level visual features. Compared with low-level features, mid-level semantic features fit the human visual model better, so this paper proposes a new salient object detection method based on the bag-of-words model.
2. Proposed method
2.1 Method overview
Given an image I, saliency detection aims to assign each pixel x to one of two possible states, the salient foreground region or the background region, abbreviated S (Salient) and B (Background), with prior probabilities P(S) and P(B). By Bayesian inference, the saliency of pixel x is computed as:
$P(S|x)=\frac{P(S)P(x|S)}{P(S)P(x|S)+P(B)P(x|B)}, \qquad P(S)+P(B)=1 \qquad (1)$
where P(x|S) is the conditional probability density of observing pixel x given the salient region, and P(x|B) is the conditional probability density of observing pixel x given the background region.
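As a concrete illustration, the sketch below combines a per-pixel prior map with the two likelihood maps according to Eq. (1). It is a minimal NumPy rendering of the formula, not the authors' code; the array names and the small constant added to avoid division by zero are our own assumptions.

```python
import numpy as np

def bayesian_saliency(prior_s, lik_s, lik_b, eps=1e-12):
    """Per-pixel posterior saliency P(S|x) of Eq. (1).

    prior_s : H x W array of priors P(S), e.g. from the objectness cue.
    lik_s   : H x W array of likelihoods P(x|S).
    lik_b   : H x W array of likelihoods P(x|B).
    """
    prior_b = 1.0 - prior_s            # P(B) = 1 - P(S)
    num = prior_s * lik_s              # P(S) P(x|S)
    den = num + prior_b * lik_b + eps  # evidence; eps avoids division by zero
    return num / den
```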
2.2 Objectness-based prior probability
This paper uses objectness to compute the prior probability in Eq. (1). For each pixel x in the image, W windows centered on it are randomly sampled from the image, and following [17] the objectness of each window is computed from the following four cues:
1) Window saliency. First, the saliency value I(p) of every pixel is computed with an arbitrary saliency detection method; the saliency of a window w ∈ W is then
$S(w,\theta_s)=\sum_{\{p\in w \,\mid\, I(p)\ge\theta_s\}} I(p)\times\frac{|\{p\in w \mid I(p)\ge\theta_s\}|}{|w|} \qquad (2)$
where θs is a saliency threshold parameter to be learned.
2) Color contrast. A window w ∈ W is expanded in every direction by a fixed factor θcc to obtain a surrounding rectangular region Surr(w,θcc); the color contrast of w against this region is
$CC(w,\theta_{cc})=\chi^2(h(w),\,h(Surr(w,\theta_{cc}))) \qquad (3)$
where h(w) and h(Surr(w,θcc)) are the color histograms of window w and of the rectangle Surr(w,θcc), and χ²(·) is the chi-square distance.
3) Edge density. A window w ∈ W is shrunk by a fixed factor θED to an inner ring region Inn(w,θED); the edge density of w within this region is
$ED(w,\theta_{ED})=\frac{\sum_{p\in Inn(w,\theta_{ED})} I_{ED}(p)}{Len(Inn(w,\theta_{ED}))} \qquad (4)$
where IED(p) is the binary edge map obtained with the Canny operator and Len(·) computes the perimeter of the region Inn(w,θED).
4) Contour closedness. The image is first segmented into superpixels S; the contour closedness of a window w ∈ W is
$SS(w)=1-\sum_{s\in S}\frac{\min(|s\backslash w|,\,|s\cap w|)}{|w|} \qquad (5)$
where s ∈ S is a superpixel, |s\w| is the area of superpixel s lying outside window w, and |s∩w| is the area of s lying inside w (a code sketch of this cue follows Eq. (6)).
Fusing the window saliency S(w,θs), color contrast CC(w,θcc), edge density ED(w,θED), and contour closedness SS(w) gives the probability P(w) that each window is judged to contain a salient object; the objectness-based prior probability is then
$P_s(x)=\sum_{w\in W,\; x\in w} P(w) \qquad (6)$
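To make the window cues concrete, the sketch below computes the contour-closedness cue SS(w) of Eq. (5) from a superpixel label map and accumulates given window scores P(w) into the pixel-wise prior of Eq. (6). It is a simplified sketch under our own assumptions: windows are axis-aligned boxes, the window scores are assumed to be already fused from the four cues, and the final rescaling to [0, 1] is ours.

```python
import numpy as np

def straddling_cue(labels, window):
    """Contour closedness SS(w) of Eq. (5) for one window.

    labels : H x W integer superpixel map (e.g., from SLIC).
    window : box (x0, y0, x1, y1) in half-open pixel coordinates.
    """
    x0, y0, x1, y1 = window
    area_w = float((x1 - x0) * (y1 - y0))
    total = np.bincount(labels.ravel())                 # |s| for every superpixel
    in_w = np.bincount(labels[y0:y1, x0:x1].ravel(),
                       minlength=total.size)            # |s ∩ w|
    out_w = total - in_w                                # |s \ w|
    return 1.0 - np.minimum(out_w, in_w).sum() / area_w

def objectness_prior(shape, windows, scores):
    """Pixel-wise prior of Eq. (6): sum of P(w) over windows containing x."""
    prior = np.zeros(shape)
    for (x0, y0, x1, y1), p_w in zip(windows, scores):
        prior[y0:y1, x0:x1] += p_w                      # every pixel x in w
    return prior / (prior.max() + 1e-12)                # rescale to [0, 1]
```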
2.3 Superpixel bag-of-features representation
Given an image dataset D={d1,d2,…,dN}, the images are first converted to the CIELab color space, since this color model separates luminance from chromaticity and related work [3-4, 7, 16] shows that saliency maps computed in this space are more accurate. The color features of 300k randomly sampled pixels then form a local feature set X, which is clustered into a visual dictionary V=[v1,v2,…,vK] ∈ R^{D×K}, where vk ∈ R^{D×1}, k=1,2,…,K is the k-th visual word, K is the number of visual words, and D is the dimensionality of the pixel color feature. Given the dictionary, every pixel of an image is encoded with hard-assignment coding [18]: for any image in the dataset, let cj ∈ R^{D×1} be the color feature of its j-th pixel; the k-th entry of the corresponding code vector Uj ∈ R^{K×1} is
$U_{jk}=\begin{cases}1, & \text{if } k=\arg\min_{k'=1,2,\cdots,K}\|c_j-v_{k'}\|_2\\ 0, & \text{otherwise}\end{cases} \qquad (7)$
where the distance between cj and vk is the Euclidean distance.
After all pixels of the image have been encoded, the image is segmented with the SLIC (Simple linear iterative clustering) method into N superpixels of roughly uniform size, as shown in Fig. 1(b). Suppose the n-th superpixel contains Pn pixels; the sum of the code vectors of all pixels in this region is
$BoF_n=\sum_{j=1}^{P_n} U_j \qquad (8)$
where Uj is the code vector of the j-th pixel in the superpixel, whose k-th entry is given by Eq. (7); BoFn is then the bag-of-features representation of the n-th superpixel of the image.
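The following sketch strings together Eqs. (7) and (8): CIELab conversion, a k-means visual dictionary, hard-assignment coding, and per-superpixel pooling. Whereas the paper learns the dictionary from 300k pixels sampled across the whole dataset, this sketch samples the pixels of a single image for brevity; the use of scikit-learn and scikit-image here is likewise our own assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from skimage.color import rgb2lab
from skimage.segmentation import slic

def superpixel_bof(image_rgb, n_words=70, n_segments=200, n_samples=30000, seed=0):
    """Bag-of-features per superpixel, Eqs. (7)-(8)."""
    lab = rgb2lab(image_rgb)                          # CIELab pixel features
    pixels = lab.reshape(-1, 3)
    # Visual dictionary V: k-means over a random sample of pixel colors.
    rng = np.random.RandomState(seed)
    sample = pixels[rng.choice(len(pixels), min(n_samples, len(pixels)),
                               replace=False)]
    km = KMeans(n_clusters=n_words, n_init=4, random_state=seed).fit(sample)
    words = km.predict(pixels)                        # hard assignment, Eq. (7)
    labels = slic(image_rgb, n_segments=n_segments, start_label=0)
    bof = np.zeros((labels.max() + 1, n_words))
    # Eq. (8): summing one-hot codes inside a superpixel = word histogram.
    np.add.at(bof, (labels.ravel(), words), 1)
    return bof, labels
```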
2.4 Conditional probabilities
To estimate the conditional probability densities of the observed pixel x in Eq. (1), this paper assumes that the superpixel regions along the image border are background, as shown in Fig. 1(c). Let Nb be the number of background superpixels, write BoF_B for a background superpixel bag-of-features, and let BoF_{Bj} be the bag-of-features of the j-th background superpixel. The probability density of the background feature BoF_B is estimated with the Parzen window method [19]:
$\hat{P}(BoF_B)=\frac{1}{N_b\,\sigma^K}\sum_{j=1}^{N_b}\mathcal{K}\!\left(\frac{BoF_B-BoF_{Bj}}{\sigma}\right) \qquad (9)$
where $\mathcal{K}(\cdot)$ is the kernel function, σ is the window width, and K is the dimensionality of the background superpixel feature, i.e., of the bag-of-features. With a Gaussian kernel, Eq. (9) becomes:
$\hat{P}(BoF_B)=\frac{1}{N_b\,\sigma^K}\sum_{j=1}^{N_b}\exp\!\left(-\frac{\|BoF_B-BoF_{Bj}\|_2}{2\sigma^2}\right) \qquad (10)$
where ‖·‖₂ denotes the ℓ2 norm. Given the background, the conditional probability density of any superpixel region Rn of the image is then
$\hat{P}(R_n\mid B)=\frac{1}{N_b\,\sigma^K}\sum_{j=1}^{N_b}\exp\!\left(-\frac{\|BoF_n-BoF_{Bj}\|_2}{2\sigma^2}\right), \qquad \hat{P}(R_n\mid S)=1-\hat{P}(R_n\mid B) \qquad (11)$
Propagating each region's saliency value to all pixels in the region yields the conditional-probability saliency maps P(x|B) and P(x|S) based on mid-level semantic features. Substituting the prior of Eq. (6) and the conditional probabilities P(x|B) and P(x|S) of Eq. (11) into Eq. (1) gives the final saliency map of the image.
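A sketch of the background-likelihood estimate of Eqs. (10)-(11) follows, taking the bag-of-features matrix from the previous sketch as input. Which superpixels count as border background, and the rescaling of the density to [0, 1] before taking the complement, are our own assumptions to keep the sketch self-contained.

```python
import numpy as np

def conditional_probabilities(bof, border_idx, sigma=1.0):
    """Parzen-window estimates of P(R_n|B) and P(R_n|S), Eqs. (10)-(11).

    bof        : n_sp x K matrix of superpixel bag-of-features.
    border_idx : indices of superpixels on the image border
                 (assumed background, as in Fig. 1(c)).
    """
    bg = bof[border_idx]                                 # N_b x K
    n_b, k = bg.shape
    # l2 distance from every superpixel to each background superpixel.
    d = np.linalg.norm(bof[:, None, :] - bg[None, :, :], axis=2)
    p_b = np.exp(-d / (2.0 * sigma ** 2)).sum(axis=1) / (n_b * sigma ** k)
    p_b = p_b / (p_b.max() + 1e-12)                      # rescale to [0, 1]
    return 1.0 - p_b, p_b                                # P(R_n|S), P(R_n|B)
```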
3. Experiments and analysis
3.1 Datasets and evaluation criteria
This section evaluates the proposed method on four public salient-object datasets. The first is the ASD dataset [7] built by Achanta et al. at EPFL; a subset of the MSRA-5000 dataset containing 1000 images, it is currently the most widely used benchmark with accurate manual annotations of the salient objects. The second and third are the SED1 and SED2 datasets [20]; each contains 100 images with accurate manual annotations from three different users, and both are also widely used benchmarks. Their main difference is that every image in SED1 contains one object, whereas every image in SED2 contains two. The fourth is the SOD dataset [21], consisting of 300 images from the Berkeley image segmentation dataset with accurate manual annotations from seven different users.
In the first criterion, a fixed threshold t ∈ [0, 255] is used to binarize the saliency map. The binary segmentation (BS) image is compared with the ground-truth (GT) annotation to obtain precision and recall:
$Precision=\frac{\sum_{(x,y)} GT(x,y)\,BS(x,y)}{\sum_{(x,y)} BS(x,y)} \qquad (12)$
$Recall=\frac{\sum_{(x,y)} GT(x,y)\,BS(x,y)}{\sum_{(x,y)} GT(x,y)} \qquad (13)$
where GT and BS denote the manually annotated image and the binarized image. Setting the threshold t successively to 1 through 255 and binarizing all saliency maps in a dataset gives the corresponding average precision and recall; plotting precision against recall (recall on the x-axis, precision on the y-axis) yields the PR (Precision-recall) curve over the whole dataset.
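For completeness, this is how the PR curve of Eqs. (12)-(13) can be computed over a dataset; the list-based interface and the guard against empty masks are our own assumptions.

```python
import numpy as np

def pr_curve(sal_maps, gt_maps):
    """Average PR curve over a dataset, Eqs. (12)-(13).

    sal_maps : list of H x W saliency maps with values in [0, 255].
    gt_maps  : list of H x W binary ground-truth masks.
    """
    precisions, recalls = [], []
    for t in range(1, 256):                    # thresholds t = 1..255
        p, r = [], []
        for sal, gt in zip(sal_maps, gt_maps):
            bs = (sal >= t).astype(float)      # binary segmentation BS
            tp = (bs * gt).sum()               # overlap with GT
            p.append(tp / max(bs.sum(), 1.0))  # Eq. (12)
            r.append(tp / max(gt.sum(), 1.0))  # Eq. (13)
        precisions.append(np.mean(p))
        recalls.append(np.mean(r))
    return np.array(recalls), np.array(precisions)
```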
In the second criterion, the image is binarized with the adaptive-threshold scheme of [3, 5, 7] and again compared with the ground truth to obtain precision and recall, from which the F-measure is computed:
$F_\beta=\frac{(1+\beta^2)\times Precision\times Recall}{\beta^2\times Precision+Recall} \qquad (14)$
Consistent with [3, 5, 7], β² is set to 0.3, and the adaptive threshold is set proportional to the mean saliency value of the image:
$t_\alpha=\frac{K}{W\times H}\sum_{x=1}^{W}\sum_{y=1}^{H} S(x,y) \qquad (15)$
where W and H are the width and height of the saliency map, S is the saliency map, and K is empirically set to 2. To further evaluate the overall behavior of the F-measure, a series of K values is sampled uniformly over [0.1, 6] with step 0.1, the average F-measure for each K is computed with Eq. (14), and the Fβ-K curve is plotted with K on the x-axis and F on the y-axis. Because precision and recall cannot measure the exact numbers of pixels correctly labeled as foreground and background, the mean absolute error (MAE) of [22] is used as the third criterion for a more balanced objective evaluation; it is the average absolute difference between the continuous (non-binarized) saliency map S and the ground truth GT over all pixels:
$MAE=\frac{1}{W\times H}\sum_{x=1}^{W}\sum_{y=1}^{H}\left|S(x,y)-GT(x,y)\right| \qquad (16)$
where W and H are the width and height of S and GT.
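The remaining three quantities are short formulas; the sketch below implements Eqs. (14)-(16) under the assumption that saliency maps are stored in [0, 255] and ground truth as binary masks.

```python
import numpy as np

def f_measure(precision, recall, beta2=0.3):
    """Eq. (14), with beta^2 = 0.3 as in the paper."""
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)

def adaptive_threshold(sal, k=2.0):
    """Eq. (15): K times the mean saliency value (K = 2 by default)."""
    return k * sal.mean()

def mae(sal, gt):
    """Eq. (16): mean absolute error between the continuous map and GT."""
    return np.abs(sal / 255.0 - gt).mean()
```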
3.2 Parameters
The important parameters of the proposed method are the number of superpixels N and the number of visual words K; the average F-measure of the second criterion is used to measure their effect on detection performance. First, with K fixed at 50, N is set to 100, 150, 200, 250, 300, 350, 400, 450, 500, and 600; the average F-measures for the different numbers of superpixels are shown in Fig. 2. On all four datasets (ASD, SED1, SED2, and SOD), performance differs little once N exceeds 200. The highest F-values are obtained at N = 200, 350, 250, and 350 respectively, so the following experiments use these values of N on the four datasets. Next, K is set to 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100; the average F-measures for the different numbers of words are shown in Fig. 3. On ASD the performance differences between word counts are very small, with the highest F-value at 70 words. On SED1 and SED2 the gaps between the highest and lowest F-values are 0.011 and 0.013, with the best results at 80 and 20 words respectively. On SOD the gap exceeds 0.02, mainly because this dataset is more complex: when the number of visual words is small, the color features of the images cannot be adequately encoded, which widens the performance gap between word counts; the best result is obtained with 90 words. In the following experiments, the numbers of words on ASD, SED1, SED2, and SOD are therefore set to 70, 80, 20, and 90 respectively.
3.3 Comparison with other saliency detection algorithms
The proposed method is compared with 16 popular saliency detection methods, grouped for ease of comparison into: 1) methods computing saliency at the pixel level, namely IT [8], MZ [10], AC [11], LC [12], HC [3], FT [7], CA [13], and GBVS [9], classic and highly cited methods in this field; 2) methods computing saliency at the region level, namely RC [3], GC [14], PD [16], CBS [4], and LR [5], which appeared in top venues within the last three years; and 3) Bayesian-model-based methods, namely SUN (Saliency using natural statistics) [23], SEG (Segmentation) [24], and CHB (Convex hull and Bayesian) [25], the methods most closely related to ours.
3.3.1 Quantitative comparison
Figs. 4-7 show the PR curves of our algorithm and the 16 popular algorithms. The proposed method achieves the best performance on all four datasets (ASD, SED1, SED2, and SOD). When the segmentation threshold t is 0, all methods have the same precision: 0.1985, 0.2674, 0.2137, and 0.2748 on ASD, SED1, SED2, and SOD respectively, meaning that on average 19.85%, 26.74%, 21.37%, and 27.48% of the pixels in each dataset belong to salient regions. When t is 255, recall reaches its minimum; the precision of our method then reaches 0.9418, 0.8808, 0.9088, and 0.7781 on the four datasets. At a recall of 0.85, the precision of our method stays above 0.9 on ASD, above 0.75 on SED1 and SED2, and above 0.5 on SOD, showing that it detects salient regions with higher precision while covering a larger portion of them.
In addition, with K = 2 in Eq. (15) for the adaptive threshold, the average precision, recall, and F-value are computed with Eqs. (12)-(14); Fig. 8 compares our method with the 16 popular algorithms. The results on ASD and SED1 are consistent: compared with the 13 pixel-based and region-based methods, our method has the highest precision, recall, and F-value, indicating that it detects salient objects with the highest precision while covering the largest portion of the salient region. Compared with the Bayesian methods, our method has the highest precision, and its recall is lower only than that of CHB. This is mainly because CHB uses corner detection for its salient-region prior, and many corners fall in the background, so the detected salient region becomes too large, as shown in the 4th row of Fig. 22(d) and the 2nd row of Fig. 23(d); our method still has the highest F-measure, indicating better overall detection performance. The results on SED2 and SOD are likewise consistent: compared with all baselines, our method has the highest recall and F-value, though its precision is lower than that of SEG and CBS respectively.
To further evaluate the overall F-measure, K in Eq. (15) is varied over [0.1:0.1:6] to compute adaptive thresholds and the corresponding F-values from Eq. (14), giving Fβ-K curves with K on the x-axis and F on the y-axis. Figs. 9-12 show the Fβ-K curves of our method and the 16 popular algorithms. On ASD, compared with the 13 pixel-based and region-based methods, our method has the highest F-value at every K; compared with the Bayesian methods, its F-value falls below that of CHB for K ∈ [5.7, 6] (see Fig. 9(c)) and remains the highest at all other K, again because CHB tends to detect overly large salient regions. The results on SED1, SED2, and SOD are consistent: our method has the highest F-value at every K against all baselines.
For a comprehensive evaluation, the MAE between the saliency maps and the ground-truth annotations is computed with Eq. (16); Fig. 13 compares our method with the 16 popular algorithms on all datasets. The results are consistent across the four datasets: our method has the lowest MAE, and SUN the highest. On ASD, SED1, and SOD, GC has the lowest MAE among the baselines, and our method reduces it by a further 22%, 12%, and 17% respectively; on SED2, HC has the lowest MAE among the baselines, and our method reduces it by a further 13%.
3.3.2 Visual comparison
Figs. 14-17 show visual comparisons between our method and the pixel-based saliency detection algorithms. As Figs. 14(b) and 14(i), 15(b) and 15(i), 16(b) and 16(i), and 17(b) and 17(i) show, the saliency maps produced by IT and GBVS have low resolution: IT implements multi-scale saliency via downsampling, and GBVS computes the equilibrium state of a Markov chain, which is computationally expensive, so it likewise reduces the image resolution for speed. As Figs. 14(c), 15(c), 16(c), and 17(c) show, the MZ saliency maps over-emphasize the edges of the salient object because the neighborhood used for local contrast is small. In contrast, AC is a multi-scale local-contrast method with a wide range of scales; as Figs. 14(d), 15(d), 16(d), and 17(d) show, it can detect the entire salient object. LC and HC both use global color contrast, which favors rare colors, so they detect only parts of the salient object: in the 4th row of Figs. 14(e) and 14(f), both methods pick out only the brightest colors of the eggs, and the same occurs in the 1st and 2nd rows of Figs. 15(e) and 15(f), where the brightly colored water surface and grass are wrongly detected as salient. Compared with MZ, CA considers the distance between pixels and performs much better, but it still computes local contrast from K nearest neighbors and therefore also over-emphasizes object edges, as shown in Figs. 14(h), 15(h), 16(h), and 17(h). Compared with these typical pixel-based algorithms, our method works at the region level; as Figs. 14(j), 15(j), 16(j), and 17(j) show, its saliency maps have high resolution and highlight the salient objects uniformly. On ASD, compared with the ground truth in Fig. 14(k), the detection accuracies of the saliency maps in Fig. 14(j) from top to bottom are 0.9610, 0.6512, 0.9905, and 0.9961; on SED1 (Fig. 15(j) vs. 15(k)) they are 0.1911 and 0.9828; on SED2 (Fig. 16(j) vs. 16(k)) they are 0.9924 and 0.9939; and on SOD (Fig. 17(j) vs. 17(k)) they are 0.9987 and 0.9999.
Figs. 18-21 show visual comparisons between our method and the region-based algorithms. RC fails when the colors of the salient region and the background are very close, as in the 1st row of Fig. 18(b), the 1st row of Fig. 20(b), and the 2nd row of Fig. 21(b). Compared with RC, GC considers not only global color contrast but also the color spatial distribution; however, it still has the following defects. 1) It detects only the most color-salient region of the image: in the egg image in the 4th row of Fig. 18(c), GC detects only the brightest region of the eggs rather than the whole eggs; the same occurs on SED1 and SOD, e.g., in the 1st row of Fig. 19(c) GC wrongly detects the brightly colored water surface as salient, and in the 1st row of Fig. 21(c) it detects only the bright center of the flower. 2) Neither GC nor RC can effectively detect large salient objects, e.g., the 2nd row of Fig. 21(c). PD combines pattern and color contrast and produces clear object boundaries, but it is also a global-contrast method and cannot highlight the whole object: in the flower image in the 3rd row of Fig. 18(d) it detects only the most salient pistil rather than the whole flower, and the same occurs on SOD, e.g., in the 1st row of Fig. 21(d). CBS computes local color contrast and exploits a center prior, so it fails when the salient object is off-center, as in the 1st rows of Figs. 18(e), 19(e), and 20(e). LR is essentially a global-contrast method that ignores spatial position; as all the examples in Figs. 18(f), 19(f), 20(f), and 21(f) show, its saliency maps are very uneven and detect only parts of the salient object. Compared with these methods, ours uses the image border regions as a background prior and detects objects of different sizes very well, as in the 1st row of Fig. 20(g), the 4th row of Fig. 18(g), the 2nd row of Fig. 19(g), and the 2nd row of Fig. 21(g). As the 1st rows of Figs. 18(g), 19(g), and 20(g) show, our method also performs well when the salient object is off-center. All the examples in Figs. 18(g), 19(g), 20(g), and 21(g) show that our saliency maps are very uniform and highlight the salient objects consistently.
Figs. 22-25 show visual comparisons with the Bayesian-model-based algorithms. As all the examples in Figs. 22(b), 23(b), 24(b), and 25(b) show, the saliency maps of SUN over-emphasize object edges rather than the whole object. Because SEG makes prior assumptions about the background and salient regions within every sliding window over the whole image, it cannot bring out the saliency difference between object and background, as shown in all the examples of Figs. 22(c), 23(c), and 25(c). For CHB, the accuracy of the saliency map depends on the region covered by the convex hull: when the background becomes complex, more corners fall in the background and the detected salient region becomes too large, as in the 4th row of Fig. 22(d) and the 2nd row of Fig. 23(d). Compared with these methods, ours uses the more accurate objectness as the prior probability, highlights the whole salient object consistently, and produces salient-region boundaries that fit the object boundaries, as shown in Figs. 22(e), 23(e), 24(e), and 25(e).
4. Conclusion
This paper proposes a new salient object detection algorithm based on the bag-of-words model, applying a more discriminative mid-level semantic feature, the bag-of-words representation, to salient object detection for the first time. Specifically, the method first computes a prior saliency map from objectness, then computes conditional-probability saliency maps from the bag-of-features of superpixel regions, and finally combines the two by Bayes' rule. Comparative experiments on several public datasets show that the method achieves higher precision and better recall and highlights the salient objects in an image uniformly and consistently.
Table 1 Some mainstream open-source deep learning toolboxes and their download addresses at present

| Toolkit | Description | Download address |
|---|---|---|
| Caffe [112] | Open-source deep learning framework released by the BVLC lab at UC Berkeley; one of the most widely used deep learning platforms | https://github.com/BVLC/caffe |
| Theano [113-114] | Python-based open-source deep learning toolkit | https://github.com/Theano/Theano |
| Torch [115] | Lua-based toolkit; supports embedded platforms such as iOS and Android | http://torch.ch/ |
| Purine [116] | Supports multiple GPUs with linear speedup | https://github.com/purine/purine2 |
| MXNet [117] | C++ deep learning library released by the Baidu-led Deep Machine Learning Community (DMLC) | https://github.com/dmlc/mxnet |
| DIGITS [118] | Web-based visual deep learning tool developed by NVIDIA; supports Caffe and Torch projects | https://github.com/NVIDIA/DIGITS |
| ConvNet [119] | One of the earliest GPU-enabled open-source CNN tools; the code released by the winner of ILSVRC 2012 | https://code.google.com/p/cuda-convnet/ |
| Cuda-ConvNet2 [109] | ConvNet with multi-GPU support | https://github.com/akrizhevsky/cuda-convnet2 |
| DeepCNet [120] | Open-source CNN tool released by Prof. Graham of the University of Warwick; winner of the ICDAR 2013 online HCCR competition | https://github.com/btgraham/SparseConvNet |
| Petuum [121] | Distributed machine learning platform from CMU for multi-CPU/GPU clusters; implements many traditional machine learning algorithms in addition to common deep learning ones, and can be deployed on cloud platforms | https://github.com/petuum/bosen/wiki |
| CURRENNT [122] | GPU-enabled recurrent neural network library | http://sourceforge.net/projects/currennt/ |
| Minerva [123] | Multi-GPU parallel deep learning tool released by DMLC | https://github.com/dmlc/minerva |
| TensorFlow [124] | Machine learning tool released by Google; supports multi-CPU and multi-GPU parallelism and deep models such as CNNs and RNNs | https://github.com/tensorflow/tensorflow |
| DMTK [125] | General distributed deep learning toolkit released by Microsoft | https://github.com/Microsoft/DMTK |
Table 2 Comparison of different methods on the CASIA-OLHWDB1.1 online handwritten Chinese character dataset
| Method | Accuracy (%) | Distorted pseudo-samples | Model ensemble (no. of models) |
|---|---|---|---|
| Best traditional method: DFE+DLQDF [10] | 94.85 | × | × |
| HDNN-SSM-MCE [66] | 89.39 | × | × |
| MCDNN [127] | 94.39 | √ | √ (35) |
| DeepCNet [40] | 96.42 | √ | × |
| DeepCNet + 8-direction histogram features [40] | 96.18 | √ | × |
| DCNN (fusion of 4 kinds of domain knowledge) [60] | 96.35 | √ | × |
| HSP-DCNN (ensemble over 4 kinds of domain knowledge) [64] | 96.87 | √ | √ (8) |
| DeepCNet-FMP (single test) [132] | 96.74 | √ | × |
| DeepCNet-FMP (multiple tests) [132] | 97.03 | √ | √ (12 tests) |
| DropSample-DCNN [61] | 96.55 | √ | × |
| DropSample-DCNN (ensemble) [61] | 97.06 | √ | √ (9) |
Table 3 Comparison of different deep learning methods on CASIA-OLHWDB1.0-1.1 and the ICDAR 2013 online competition dataset (%)
Table 4 Comparison of deep-learning-based and typical traditional methods on the ICDAR 2013 offline HCCR competition dataset
| Method | Top-1 (%) | Top-5 (%) | Top-10 (%) | Model size |
|---|---|---|---|---|
| HCCR-Gradient-GoogLeNet [77] | 96.28 | 99.56 | 99.80 | 27.77 MB |
| HCCR-Gabor-GoogLeNet [77] | 96.35 | 99.60 | 99.80 | 27.77 MB |
| HCCR-Ensemble-GoogLeNet [77] (average of 4 models) | 96.64 | 99.64 | 99.83 | 110.91 MB |
| HCCR-Ensemble-GoogLeNet [77] (average of 10 models) | 96.74 | 99.65 | 99.83 | 277.25 MB |
| CNN-Fujitsu [39] | 94.77 | - | 99.59 | 2460 MB |
| MCDNN-IDSIA [74] | 95.79 | - | 99.54 | 349 MB |
| MQDF-HIT [39] | 92.61 | - | 98.99 | 120 MB |
| MQDF-THU [39] | 92.56 | - | 99.13 | 198 MB |
| DLQDF [39] | 92.72 | - | - | - |
| ART-CNN [76] | 95.04 | - | - | 51.64 MB² |
| R-CNN Voting [76] | 95.55 | - | - | 51.64 MB² |
| ATR-CNN Voting [76] | 96.06 | - | - | 206.56 MB² |
| MQDF-CNN [78] | 94.44 | - | - | - |
| Multi-CNN Voting [129] | 96.79 | - | - | - |

² Estimated from the model parameters reported in [76] (number of CNN layers, kernel sizes and counts per layer, pooling sizes and counts, number of fully connected units), assuming each parameter is stored as a 4-byte float.
Table 5 Comparison of different methods on the ICDAR 2013 offline text competition dataset (%)
[1] Hildebrandt T H, Liu W T. Optical recognition of handwritten Chinese characters: advances since 1980. Pattern Recognition, 1993, 26(2): 205-225. doi: 10.1016/0031-3203(93)90030-Z
[2] Suen C Y, Berthod M, Mori S. Automatic recognition of handprinted characters: the state of the art. Proceedings of the IEEE, 1980, 68(4): 469-487. doi: 10.1109/PROC.1980.11675
[3] Tai J W. Some research achievements on Chinese character recognition in China. International Journal of Pattern Recognition and Artificial Intelligence, 1991, 5(1-2): 199-206. doi: 10.1142/S0218001491000132
[4] Liu C L, Jaeger S, Nakagawa M. Online recognition of Chinese characters: the state-of-the-art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(2): 198-213. doi: 10.1109/TPAMI.2004.1262182
[5] Cheriet M, Kharma N, Liu C L, Suen C Y. Character Recognition Systems: a Guide for Students and Practitioners. USA: John Wiley & Sons, 2007.
[6] Plamondon R, Srihari S N. Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(1): 63-84. doi: 10.1109/34.824821
[7] Dai R W, Liu C L, Xiao B H. Chinese character recognition: history, status and prospects. Frontiers of Computer Science in China, 2007, 1(2): 126-136. doi: 10.1007/s11704-007-0012-5
[8] Liu C L. High accuracy handwritten Chinese character recognition using quadratic classifiers with discriminative feature extraction. In: Proceedings of the 18th International Conference on Pattern Recognition. Hong Kong, China: IEEE, 2006. 942-945
[9] Long T, Jin L W. Building compact MQDF classifier for large character set recognition by subspace distribution sharing. Pattern Recognition, 2008, 41(9): 2916-2925. doi: 10.1016/j.patcog.2008.02.009
[10] Liu C L, Yin F, Wang D H, Wang Q F. Online and offline handwritten Chinese character recognition: benchmarking on new databases. Pattern Recognition, 2013, 46(1): 155-162. doi: 10.1016/j.patcog.2012.06.021
[11] Zhang H G, Guo J, Chen G, Li C G. HCL2000: a large-scale handwritten Chinese character database for handwritten character recognition. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. Barcelona, Spain: IEEE, 2009. 286-290
[12] Qian Yue-Liang, Lin Shou-Xun, Liu Qun, Liu Yang, Liu Hong, Xie Ying. Design and construction of HTRDP corpora resources for Chinese language processing and intelligent human-machine interaction. Chinese High Technology Letters, 2005, 15(1): 107-110 (in Chinese)
[13] Jin L W, Gao Y, Liu G, Liu G Y, Li Y Y, Ding K. SCUT-COUCH2009: a comprehensive online unconstrained Chinese handwriting database and benchmark evaluation. International Journal on Document Analysis and Recognition, 2011, 14(1): 53-64. doi: 10.1007/s10032-010-0116-6
[14] Liu C L, Sako H, Fujisawa H. Handwritten Chinese character recognition: alternatives to nonlinear normalization. In: Proceedings of the 7th International Conference on Document Analysis and Recognition. Edinburgh, UK: IEEE, 2003. 524-528
[15] Liu C L, Marukawa K. Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition. Pattern Recognition, 2005, 38(12): 2242-2255. doi: 10.1016/j.patcog.2005.04.019
[16] Jin L W, Huang J C, Yin J X, He Q H. Deformation transformation for handwritten Chinese character shape correction. In: Proceedings of the 3rd International Conference on Advances in Multimodal Interfaces. Beijing, China: Springer, 2000. 450-457
[17] Miyao H, Maruyama M. Virtual example synthesis based on PCA for off-line handwritten character recognition. In: Proceedings of the 7th International Workshop on Document Analysis Systems VII. Nelson, New Zealand: Springer, 2006. 96-105
[18] Chen G, Zhang H G, Guo J. Learning pattern generation for handwritten Chinese character using pattern transform method with cosine function. In: Proceedings of the 2006 International Conference on Machine Learning and Cybernetics. Dalian, China: IEEE, 2006. 3329-3333
[19] Leung K C, Leung C H. Recognition of handwritten Chinese characters by combining regularization, Fisher's discriminant and distorted sample generation. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. Barcelona, Spain: IEEE, 2009. 1026-1030
[20] Okamoto M, Nakamura A, Yamamoto K. Direction-change features of imaginary strokes for on-line handwriting character recognition. In: Proceedings of the 14th International Conference on Pattern Recognition. Brisbane, QLD: IEEE, 1998. 1747-1751
[21] Okamoto M, Yamamoto K. On-line handwriting character recognition using direction-change features that consider imaginary strokes. Pattern Recognition, 1999, 32(7): 1115-1128. doi: 10.1016/S0031-3203(98)00153-8
[22] Ding K, Deng G Q, Jin L W. An investigation of imaginary stroke technique for cursive online handwriting Chinese character recognition. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. Barcelona, Spain: IEEE, 2009. 531-535
[23] Jin L W, Wei G. Handwritten Chinese character recognition with directional decomposition cellular features. Journal of Circuits, Systems, and Computers, 1998, 8(4): 517-524. doi: 10.1142/S0218126698000316
[24] Bai Z L, Huo Q. A study on the use of 8-directional features for online handwritten Chinese character recognition. In: Proceedings of the 8th International Conference on Document Analysis and Recognition. Seoul, Korea: IEEE, 2005. 262-266
[25] Liu C L, Zhou X D. Online Japanese character recognition using trajectory-based normalization and direction feature extraction. In: Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition. La Baule, France: IEEE, 2006.
[26] Ge Y, Huo Q, Feng Z D. Offline recognition of handwritten Chinese characters using Gabor features, CDHMM modeling and MCE training. In: Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. Orlando, FL, USA: IEEE, 2002. I-1053-I-1056
[27] Liu C L. Normalization-cooperated gradient feature extraction for handwritten character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(8): 1465-1469. doi: 10.1109/TPAMI.2007.1090
[28] Kimura F, Takashina K, Tsuruoka S, Miyake Y. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987, PAMI-9(1): 149-153
[29] Mangasarian O L, Musicant D R. Data discrimination via nonlinear generalized support vector machines. Complementarity: Applications, Algorithms and Extensions. US: Springer, 2001. 233-251
[30] Kim H J, Kim K H, Kim S K, Lee J K. On-line recognition of handwritten Chinese characters based on hidden Markov models. Pattern Recognition, 1997, 30(9): 1489-1500. doi: 10.1016/S0031-3203(96)00161-6
[31] Liu C L, Sako H, Fujisawa H. Discriminative learning quadratic discriminant function for handwriting recognition. IEEE Transactions on Neural Networks, 2004, 15(2): 430-444. doi: 10.1109/TNN.2004.824263
[32] Jin X B, Liu C L, Hou X W. Regularized margin-based conditional log-likelihood loss for prototype learning. Pattern Recognition, 2010, 43(7): 2428-2438. doi: 10.1016/j.patcog.2010.01.013
[33] Srihari S N, Yang X S, Ball G R. Offline Chinese handwriting recognition: an assessment of current technology. Frontiers of Computer Science in China, 2007, 1(2): 137-155. doi: 10.1007/s11704-007-0015-2
[34] Su T H, Zhang T W, Guan D J, Huang H J. Off-line recognition of realistic Chinese handwriting using segmentation-free strategy. Pattern Recognition, 2009, 42(1): 167-182. doi: 10.1016/j.patcog.2008.05.012
[35] Wang Q F, Yin F, Liu C L. Handwritten Chinese text recognition by integrating multiple contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1469-1481. doi: 10.1109/TPAMI.2011.264
[36] Zhou X D, Wang D H, Tian F, Liu C L, Nakagawa M. Handwritten Chinese/Japanese text recognition using semi-Markov conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(10): 2413-2426. doi: 10.1109/TPAMI.2013.49
[37] Qiu L Q, Jin L W, Dai R F, Zhang Y X, Li L. An open source testing tool for evaluating handwriting input methods. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 136-140
[38] Liu C L, Yin F, Wang Q F, Wang D H. ICDAR 2011 Chinese handwriting recognition competition. In: Proceedings of the 11th International Conference on Document Analysis and Recognition. Beijing, China: IEEE, 2011. 1464-1469
[39] Yin F, Wang Q F, Zhang X Y, Liu C L. ICDAR 2013 Chinese handwriting recognition competition. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013. 1464-1470
[40] Graham B. Spatially-sparse convolutional neural networks. arXiv:1409.6070, 2014.
[41] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507. doi: 10.1126/science.1127647
[42] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828. doi: 10.1109/TPAMI.2013.50
[43] Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85-117. doi: 10.1016/j.neunet.2014.09.003
[44] LeCun Y, Boser B, Denker J S, Howard R E, Habbard W, Jackel L D, Henderson D. Handwritten digit recognition with a back-propagation network. In: Proceedings of Advances in Neural Information Processing Systems 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1990. 396-404
[45] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324. doi: 10.1109/5.726791
[46] Ranzato M A, Poultney C, Chopra S, LeCun Y. Efficient learning of sparse representations with an energy-based model. In: Proceedings of the 2007 Advances in Neural Information Processing Systems. USA: MIT Press, 2007. 1137-1144
[47] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[48] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems 25. Lake Tahoe, Nevada, USA: Curran Associates, Inc., 2012. 1097-1105
[49] Ouyang W L, Wang X G, Zeng X Y, Qiu S, Luo P, Tian Y L, Li H S, Yang S, Wang Z, Loy C C, Tang X O. DeepID-Net: deformable deep convolutional neural networks for object detection. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 2403-2412
[50] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
[51] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
[52] Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, BC, Canada: IEEE, 2013. 6645-6649
[53] Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R, Bengio Y. Show, attend and tell: neural image caption generation with visual attention. arXiv:1502.03044, 2015.
[54] Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: a neural image caption generator. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 3156-3164
[55] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539
[56] Tang Y C, Mohamed A R. Multiresolution deep belief networks. In: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics. La Palma, Canary Islands, Spain: Microtome Publishing, 2012. 1203-1211
[57] Srivastava N, Salakhutdinov R. Multimodal learning with deep Boltzmann machines. In: Proceedings of Advances in Neural Information Processing Systems 25. Lake Tahoe, Nevada, USA: Curran Associates, Inc., 2012. 2222-2230
[58] Shao J, Kang K, Loy C C, Wang X G. Deeply learned attributes for crowded scene understanding. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015. 4657-4666
[59] Oquab M, Bottou L, Laptev I, Sivic J. Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 1717-1724
[60] Yang W X, Jin L W, Xie Z C, Feng Z Y. Improved deep convolutional neural network for online handwritten Chinese character recognition using domain-specific knowledge. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 551-555
[61] Yang W X, Jin L W, Tao D C, Xie Z C, Feng Z Y. DropSample: a new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition. arXiv:1505.05354, 2015.
[62] Yang W X, Jin L W, Liu M F. Character-level Chinese writer identification using path signature feature, DropStroke and deep CNN. arXiv:1505.04922, 2015.
[63] Yang W X, Jin L W, Liu M F. DeepWriterID: an end-to-end online text-independent writer identification system. arXiv:1508.04945, 2015.
[64] Su T H, Liu C L, Zhang X Y. Perceptron learning of modified quadratic discriminant function. In: Proceedings of the 2011 International Conference on Document Analysis and Recognition. Beijing, China: IEEE, 2011. 1007-1011
[65] Du J, Hu J S, Zhu B, Wei S, Dai L R. A study of designing compact classifiers using deep neural networks for online handwritten Chinese character recognition. In: Proceedings of the 22nd International Conference on Pattern Recognition. Stockholm, Sweden: IEEE, 2014. 2950-2955
[66] Du J. Irrelevant variability normalization via hierarchical deep neural networks for online handwritten Chinese character recognition. In: Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition. Heraklion, Greece: IEEE, 2014. 303-308
[67] Du J, Huo Q, Chen K. Designing compact classifiers for rotation-free recognition of large vocabulary online handwritten Chinese characters. In: Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan: IEEE, 2012. 1721-1724
[68] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554. doi: 10.1162/neco.2006.18.7.1527
[69] Du J, Hu J S, Zhu B, Wei S, Dai L R. Writer adaptation using bottleneck features and discriminative linear regression for online handwritten Chinese character recognition. In: Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition. Heraklion, Greece: IEEE, 2014. 311-316
[70] Liwicki M, Graves A, Bunke H. A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks. In: Proceedings of the 9th International Conference on Document Analysis and Recognition. Curitiba, Paraná, Brazil, 2007. 367-371
[71] Frinken V, Bhattacharya N, Uchida S, Pal U. Improved BLSTM neural networks for recognition of on-line Bangla complex words. Structural, Syntactic, and Statistical Pattern Recognition. Berlin Heidelberg, Germany: Springer, 2014. 404-413
[72] Wu W, Gao G L. Online cursive handwriting Mongolia words recognition with recurrent neural networks. International Journal of Information Processing and Management, 2011, 2(3): 20-26
[73] Graves A. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.
[74] Cireşan D, Meier U. Multi-column deep neural networks for offline handwritten Chinese character classification. In: Proceedings of the 2015 International Joint Conference on Neural Networks. Killarney, Ireland: IEEE, 2015. 1-6
[75] Cireşan D C, Meier U, Gambardella L M, Schmidhuber J. Convolutional neural network committees for handwritten character classification. In: Proceedings of the 2011 International Conference on Document Analysis and Recognition. Beijing, China: IEEE, 2011. 1135-1139
[76] Wu C P, Fan W, He Y, Sun J, Naoi S. Handwritten character recognition by alternately trained relaxation convolutional neural network. In: Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition. Crete, Greece: IEEE, 2014. 291-296
[77] Zhong Z Y, Jin L W, Xie Z C. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 846-850
[78] Wang Y W, Li X, Liu C S, Ding X Q, Chen Y X. An MQDF-CNN hybrid model for offline handwritten Chinese character recognition. In: Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition. Heraklion, Greece: IEEE, 2014. 246-249
[79] Gao Xue, Wang You-Wang. Recognition of similar handwritten Chinese characters based on CNN and random elastic deformation. Journal of South China University of Technology: Natural Science Edition, 2014, 42(1): 72-76 (in Chinese)
[80] Yang Zhao, Tao Da-Peng, Zhang Shu-Ye, Jin Lian-Wen. Similar handwritten Chinese character recognition based on deep neural networks with big data. Journal on Communications, 2014, 35(9): 184-189 (in Chinese)
[81] Feng B Y, Ren M W, Zhang X Y, Suen C Y. Automatic recognition of serial numbers in bank notes. Pattern Recognition, 2014, 47(8): 2621-2634. doi: 10.1016/j.patcog.2014.02.011
[82] He M J, Zhang S Y, Mao H Y, Jin L W. Recognition confidence analysis of handwritten Chinese character with CNN. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 61-65
[83] Bengio Y, Goodfellow I J, Courville A. Deep learning [Online], available: http://www.deeplearningbook.org, May 11, 2016
[84] LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, Jackel L D. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1(4): 541-551. doi: 10.1162/neco.1989.1.4.541
[85] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1-9
[86] Lin M, Chen Q, Yan S C. Network in network. arXiv:1312.4400, 2013.
[87] Orr G B, Müller K R. Neural Networks: Tricks of the Trade. Germany: Springer, 1998.
[88] Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580, 2012.
[89] Wan L, Zeiler M, Zhang S X, LeCun Y, Fergus R. Regularization of neural networks using DropConnect. In: Proceedings of the 30th International Conference on Machine Learning. Atlanta, USA, 2013. 1058-1066
[90] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S A, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C, Li F F. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[91] Sun Y, Chen Y H, Wang X G, Tang X O. Deep learning face representation by joint identification-verification. In: Proceedings of Advances in Neural Information Processing Systems 27. Montréal, Canada: MIT Press, 2014. 1988-1996
[92] Taigman Y, Yang M, Ranzato M A, Wolf L. DeepFace: closing the gap to human-level performance in face verification. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014. 1701-1708
[93] Toshev A, Szegedy C. DeepPose: human pose estimation via deep neural networks. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014. 1653-1660
[94] Williams R J, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1989, 1(2): 270-280. doi: 10.1162/neco.1989.1.2.270
[95] Graham B. Sparse arrays of signatures for online character recognition. arXiv:1308.0371, 2013.
[96] Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Synthetic data and artificial neural networks for natural scene text recognition. arXiv:1406.2227, 2014.
[97] Jaderberg M, Vedaldi A, Zisserman A. Deep features for text spotting. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 512-528
[98] Wu Y C, Yin F, Liu C L. Evaluation of neural network language models in handwritten Chinese text recognition. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 166-170
[99] Bengio Y, Schwenk H, Senécal J S, Morin F, Gauvain J L. Neural probabilistic language models. Innovations in Machine Learning. Berlin Heidelberg, Germany: Springer, 2006. 137-186
[100] Chen X, Tan T, Liu X, Lanchantin P, Wan M, Gales M J F, Woodland P C. Recurrent neural network language model adaptation for multi-genre broadcast speech recognition. In: Proceedings of Interspeech 2015. Dresden, Germany, 2015. 3511-3515
[101] Sak H, Senior A, Rao K, Irsoy O, Graves A, Beaufays F, Schalkwyk J. Learning acoustic frame labeling for speech recognition with recurrent neural networks. In: Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, QLD: IEEE, 2015. 4280-4284
[102] De Mulder W, Bethard S, Moens M F. A survey on the application of recurrent neural networks to statistical language modeling. Computer Speech & Language, 2015, 30(1): 61-98
[103] He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1026-1034
[104] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
[105] Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980, 36(4): 193-202. doi: 10.1007/BF00344251
[106] Werbos P J. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 1990, 78(10): 1550-1560. doi: 10.1109/5.58337
[107] Littman M L. Reinforcement learning improves behaviour from evaluative feedback. Nature, 2015, 521(7553): 445-451. doi: 10.1038/nature14540
[108] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533. doi: 10.1038/nature14236
[109] Cuda-ConvNet2 [Online], available: https://github.com/akrizhevsky/cuda-convnet2, May 11, 2016
[110] Bengio Y, LeCun Y, Nohl C, Burges C. LeRec: a NN/HMM hybrid for on-line handwriting recognition. Neural Computation, 1995, 7(6): 1289-1303. doi: 10.1162/neco.1995.7.6.1289
[111] Simard P Y, Steinkraus D, Platt J C. Best practices for convolutional neural networks applied to visual document analysis. In: Proceedings of the 7th International Conference on Document Analysis and Recognition. Edinburgh, UK: IEEE, 2003. 958-963
[112] Caffe [Online], available: http://caffe.berkeleyvision.org/, May 11, 2016
[113] Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I, Bergeron A, Bouchard N, Warde-Farley D, Bengio Y. Theano: new features and speed improvements. arXiv:1211.5590, 2012.
[114] Bergstra J, Breuleux O, Bastien F, Lamblin P, Pascanu R, Desjardins G, Turian J, Warde-Farley D, Bengio Y. Theano: a CPU and GPU math expression compiler. In: Proceedings of the 9th Python for Scientific Computing Conference. Austin, TX, USA, 2010. 1-7
[115] Torch [Online], available: http://torch.ch/, May 11, 2016
[116] Lin M, Li S, Luo X, Yan S C. Purine: a bi-graph based deep learning framework. arXiv:1412.6249, 2014.
[117] MXNet [Online], available: https://github.com/dmlc/mxnet, May 11, 2016
[118] DIGITS [Online], available: https://developer.nvidia.com/digits, May 11, 2016
[119] ConvNet [Online], available: https://code.google.com/p/cuda-convnet/, May 11, 2016
[120] DeepCNet [Online], available: http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/graham/, May 11, 2016
[121] Xing E P, Ho Q R, Dai W, Kim J K, Wei J L, Lee S, Zheng X, Xie P T, Kumar A, Yu Y L. Petuum: a new platform for distributed machine learning on big data. IEEE Transactions on Big Data, 2015, 1(2): 49-67. doi: 10.1109/TBDATA.2015.2472014
[122] Weninger F, Bergmann J, Schuller B. Introducing CURRENNT: the Munich open-source CUDA recurrent neural network toolkit. The Journal of Machine Learning Research, 2015, 16(1): 547-551
[123] Minerva [Online], available: https://github.com/dmlc/minerva, May 11, 2016
[124] TensorFlow [Online], available: https://github.com/tensorflow/tensorflow, May 11, 2016
[125] DMTK [Online], available: https://github.com/Microsoft/DMTK, May 3, 2016
[126] Cireşan D C, Meier U, Schmidhuber J. Transfer learning for Latin and Chinese characters with deep neural networks. In: Proceedings of the 2012 International Joint Conference on Neural Networks. Brisbane, QLD: IEEE, 2012. 1-6
[127] Cireşan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, Rhode Island: IEEE, 2012. 3642-3649
[128] Bastien F, Bengio Y, Bergeron A, Boulanger-Lewandowski N, Breuel T, Chherawala Y, Cisse M, Côté M, Erhan D, Eustache J, Glorot X, Muller X, Lebeuf S P, Pascanu R, Rifai S, Savard F, Sicard G. Deep self-taught learning for handwritten character recognition. arXiv:1009.3589, 2010.
[129] Chen L, Wang S, Fan W, Sun J, Naoi S. Beyond human recognition: a CNN-based framework for handwritten character recognition. In: Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition. Kuala Lumpur, Malaysia: IEEE, 2015. 695-699
[130] Chen K T. Integration of paths: a faithful representation of paths by noncommutative formal power series. Transactions of the American Mathematical Society, 1958, 89(2): 395-407
[131] Lyons T. Rough paths, signatures and the modelling of functions on streams. arXiv:1405.4537, 2014.
[132] Graham B. Fractional max-pooling. arXiv:1412.6071, 2014.
[133] Graves A, Fernández S, Gomez F, Schmidhuber J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, Pennsylvania, USA: ACM, 2006. 369-376
[134] Graves A, Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Proceedings of Advances in Neural Information Processing Systems 21. Vancouver, BC, Canada: Curran Associates, Inc., 2009. 545-552
[135] Zhang X, Wang M, Wang L J, Huo Q, Li H F. Building handwriting recognizers by leveraging skeletons of both offline and online samples. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 406-410
[136] Simistira F, Ul-Hassan A, Papavassiliou V, Gatos B, Katsouros V, Liwicki M. Recognition of historical Greek polytonic scripts using LSTM networks. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 766-770
[137] Frinken V, Uchida S. Deep BLSTM neural networks for unconstrained continuous handwritten text recognition. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 911-915
[138] Messina R, Louradour J. Segmentation-free handwritten Chinese text recognition with LSTM-RNN. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 171-175
[139] Mioulet L, Garain U, Chatelain C, Barlas P, Paquet T. Language identification from handwritten documents. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 676-680
[140] Huang S M, Jin L W, Lv J. A novel approach for rotation free online handwritten Chinese character recognition. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. Barcelona, Spain: IEEE, 2009. 1136-1140
[141] Moysset B, Kermorvant C, Wolf C, Louradour J. Paragraph text segmentation into lines with recurrent neural networks. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 456-460
[142] He P, Huang W L, Qiao Y, Loy C C, Tang X O. Reading scene text in deep convolutional sequences. arXiv:1506.04395, 2015.
[143] Shi B G, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. arXiv:1507.05717, 2015.
[144] iiMedia Research. 2015Q2 report of input methods for mobile phone in China market [Online], available: http://www.iimedia.com.cn/, May 11, 2016
[145] General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China, Standardization Administration of the People's Republic of China. GB/T 18790-2010 Requirements and test procedure of on-line handwriting Chinese character recognition system, 2011 (in Chinese)
[146] Long T, Jin L W. A novel orientation free method for online unconstrained cursive handwritten Chinese word recognition. In: Proceedings of the 19th International Conference on Pattern Recognition. Tampa, FL, USA: IEEE, 2008. 1-4
[147] He T T, Huo Q. A character-structure-guided approach to estimating possible orientations of a rotated isolated online handwritten Chinese character. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. Barcelona, Spain: IEEE, 2009. 536-540
[148] Huang S. A Study on Recognition for Rotated Isolated Online Handwritten Chinese Character [Master dissertation], South China University of Technology, China, 2010 (in Chinese)
[149] Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar V R, Lu S J, Shafait F, Uchida S, Valveny E. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition. Tunis: IEEE, 2015. 1156-1160