Abstract: Sequence labeling is one of the basic problems in natural language processing. The task is to map a variable-length input sequence to a label sequence of the same length. Working within the basic framework of online sequence labeling methods, and targeting the sparse features typical of sequence labeling tasks, this paper applies the idea of confidence-weighted classification to propose a new linear discriminative online sequence labeling method: the confidence-weighted online sequence labeling algorithm. The method attaches a probabilistic measure of confidence to each feature weight parameter and performs better than other related algorithms. Experiments on Chinese word segmentation, Chinese named entity recognition, and English chunking validate the effectiveness of the proposed method.
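The idea of keeping a confidence value per feature weight can be illustrated with a minimal sketch. The code below is not the algorithm proposed in this paper: it is an AROW-style binary update with a diagonal covariance, a related confidence-weighted method swapped in here for brevity. The function name `arow_update` and the regularization parameter `r` are assumptions made for illustration only.

```python
import numpy as np

def arow_update(mu, sigma, x, y, r=1.0):
    """One AROW-style update with a diagonal covariance (illustrative only).

    mu    : mean weight per feature
    sigma : variance per feature (high variance = low confidence)
    x     : feature vector of the current example
    y     : gold label, +1 or -1
    r     : regularization constant (assumed value)
    """
    margin = y * float(mu @ x)
    if margin >= 1.0:
        # Example is classified with enough margin: no change.
        return mu, sigma
    sx = sigma * x                 # diagonal Sigma times x
    v = float(x @ sx)              # variance of the margin
    beta = 1.0 / (v + r)
    alpha = (1.0 - margin) * beta  # step size from the hinge loss
    # Low-confidence (high-variance) features move the most.
    new_mu = mu + alpha * y * sx
    # Features seen in x become more confident (variance shrinks).
    new_sigma = sigma - beta * sx * sx
    return new_mu, new_sigma

mu = np.zeros(3)
sigma = np.ones(3)
x = np.array([1.0, 0.0, 1.0])      # sparse example: feature 1 is absent
mu, sigma = arow_update(mu, sigma, x, +1)
```

In this sketch only the features active in `x` have their weights and variances changed, which is why confidence-weighted updates suit sparse feature sets: rarely seen features keep a high variance and can still be updated aggressively when they finally occur.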