Summary:
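To reduce the performance degradation of speaker-independent speech recognition systems
caused by differences in vocal tract shape among speakers, two methods are studied: a
speaker adaptation method based on maximum-likelihood frequency normalization, and a new
speech feature based on the Mellin transform. Preliminary experiments on a speaker-independent
isolated-word recognition system show that both methods improve the system's robustness
to different speakers. Of the two, the Mellin-transform feature performs better: it not
only improves recognition performance across speakers but also greatly reduces the spread
of the error rate across speakers.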
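The first method chooses a frequency-normalization (warping) factor by maximum likelihood. As a minimal sketch, assuming a toy model in which the speaker-independent model is a single diagonal Gaussian over the first three formant frequencies of one vowel (the paper's actual models and features are not described here), the warp factor is picked from a grid by maximizing the likelihood of the warped observation. The function names, the linear warp f -> alpha * f, the model values, and the 0.80 to 1.20 grid are all hypothetical.

```python
import math


def gaussian_loglik(x, means, variances):
    """Log-likelihood of feature vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def warp(freqs, alpha):
    """Hypothetical linear frequency warping: f -> alpha * f."""
    return [alpha * f for f in freqs]


def ml_warp_factor(observed, means, variances, grid):
    """Return the warp factor in `grid` whose warped observation is most likely under the model."""
    return max(grid, key=lambda a: gaussian_loglik(warp(observed, a), means, variances))


# Hypothetical speaker-independent model of one vowel: means/variances of F1..F3 in Hz.
means = [500.0, 1500.0, 2500.0]
variances = [50.0 ** 2, 100.0 ** 2, 150.0 ** 2]

# Hypothetical short-tract speaker whose formants are 15% above the model means.
observed = [f * 1.15 for f in means]

grid = [round(0.80 + 0.02 * i, 2) for i in range(21)]  # 0.80, 0.82, ..., 1.20
alpha = ml_warp_factor(observed, means, variances, grid)
print(f"selected warp factor: {alpha:.2f}")
```

For this speaker the search selects 0.86, the grid point closest to 1/1.15 (about 0.87). In a real recognizer the likelihood would come from the acoustic models and the transcription of the adaptation utterances rather than from a single Gaussian.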
Abstract:
One major source of interspeaker variability in speaker-independent
(SI) speech recognition is variation in vocal tract shape among speakers, especially in
vocal tract length (VTL). If the vocal tract is modeled as a uniform tube of length L,
then the formant frequencies of utterances of a given sound are inversely proportional
to L. Since the VTL ranges from approximately 13 cm for females to over 18 cm for males,
formant center frequencies can vary by as much as 25% among speakers. This variability
causes even state-of-the-art SI speech recognizers to perform poorly on outlier speakers
whose vocal tract shapes differ significantly from those of the speakers in the training
set. To reduce the degradation in recognition performance caused by VTL variation among
speakers, this paper investigates two methods. One removes the variability through
speaker normalization based on maximum-likelihood frequency normalization.
The other extracts a new feature based on the Mellin transform (MT). Because of the
scale-invariance property of the MT, the new feature is insensitive to VTL variation
among speakers. Experiments show that both methods improve the performance of an SI
recognizer and that the latter approach is more effective than the former.
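The inverse proportionality between formant frequencies and L follows from the quarter-wavelength resonances of a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1) c / (4L). The sketch below assumes c = 350 m/s (a typical value for warm, humid air; not stated in the source) and evaluates the first three resonances for the 13 cm and 18 cm tract lengths quoted above. The name `tube_formants` is illustrative.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumption: about 350 m/s in warm, humid air


def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L), for n = 1..n_formants."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


short_tract = tube_formants(13.0)  # L = 13 cm
long_tract = tube_formants(18.0)   # L = 18 cm
for n, (f_short, f_long) in enumerate(zip(short_tract, long_tract), start=1):
    # The ratio is 18/13 for every formant: a change in L scales the whole frequency axis.
    print(f"F{n}: {f_short:7.1f} Hz vs {f_long:7.1f} Hz, ratio {f_short / f_long:.3f}")
```

Every formant of the shorter tube is higher by the same factor, 18/13, so a change in VTL acts as a uniform scaling of the frequency axis. That uniform scaling is what frequency normalization tries to undo and what a scale-invariant transform tries to ignore.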
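The scale-invariance argument can be made concrete. Writing the Mellin transform as M(s) = ∫₀^∞ f(t) t^(s-1) dt and substituting t = e^x gives M(s) = ∫ f(e^x) e^(sx) dx, so on the line Re(s) = 0 it is a Fourier transform of f sampled on a logarithmic axis. Scaling f(t) -> f(at) multiplies M(s) by a^(-s), which has unit magnitude when Re(s) = 0, so |M(jω)| is unchanged. The sketch below is a hypothetical illustration, not the paper's feature extraction (whose details are not given here): it assumes a toy spectral envelope with one log-domain Gaussian peak per formant, an analysis band of 100 to 8000 Hz, 256 exponentially spaced samples, and a naive DFT.

```python
import cmath
import math

F_MIN, F_MAX, N = 100.0, 8000.0, 256  # assumed analysis band (Hz) and grid size


def envelope(freq_hz, formants, sigma=0.08):
    """Toy spectral envelope: one Gaussian peak per formant on a log-frequency axis.
    Scaling every formant by a equals evaluating the envelope at freq_hz / a."""
    return sum(math.exp(-(math.log(freq_hz / fk) ** 2) / (2 * sigma ** 2)) for fk in formants)


def exp_sampled(formants):
    """Sample the envelope at exponentially spaced frequencies F_MIN * e^(i * step)."""
    step = math.log(F_MAX / F_MIN) / N
    return [envelope(F_MIN * math.exp(i * step), formants) for i in range(N)]


def mellin_magnitude(samples):
    """Discretized |M(jw)|: DFT magnitude of the exponentially sampled signal.
    A frequency scaling becomes a shift on this grid, which (apart from negligible
    edge and sampling effects here) changes only the DFT phase."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2)
    ]


long_tract = [486.1, 1458.3, 2430.6]                 # uniform-tube formants, L = 18 cm, c = 350 m/s
short_tract = [f * 18.0 / 13.0 for f in long_tract]  # same vowel from a 13 cm tract

a, b = exp_sampled(long_tract), exp_sampled(short_tract)
ma, mb = mellin_magnitude(a), mellin_magnitude(b)
print("max |log-spectrum difference|:", max(abs(x - y) for x, y in zip(a, b)))
print("max |Mellin magnitude difference|:", max(abs(x - y) for x, y in zip(ma, mb)))
```

The raw log-frequency samples of the two tract lengths differ substantially because every peak moves by ln(18/13), while the Mellin magnitudes agree up to numerical error. The analysis band needs enough margin that peaks do not slide off the ends of the grid; otherwise the shift is no longer a pure phase change.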