Similar literature
 Found 19 similar documents (search time: 354 ms)
1.
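Zhou Feng, Yu Yibiao. Journal of Signal Processing (《信号处理》), 2017, 33(9): 1215-1220
Spoken Mandarin digits are highly confusable with one another, which directly degrades Mandarin digit speech recognition, and conventional recognition methods struggle to separate the confusable ones. This paper proposes a multi-parameter, multi-stage recognition strategy: a first-stage digit recognizer uses Mel-spectrum parameters with an HMM, and confusable digit pairs are then reclassified by an SVM using a new group-delay spectral parameter, RRCGD-CC (Reflected Roots Chirp Group Delay-Cepstral Coefficients). Experiments show that the method raises the recognition rate for the digits "2" and "8" by 8% and the overall recognition rate of the digit recognition system by 2.3%. These results indicate that the multi-parameter, multi-stage method improves Mandarin digit speech recognition and that RRCGD-CC is effective for confusable digits.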
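
The two-stage decision flow described in this abstract can be sketched as follows. This is only an illustration of the control logic, not the authors' implementation: the function name `two_stage_recognize`, the `first_stage` scorer (standing in for the HMM) and the `pair_classifiers` mapping (standing in for the per-pair SVMs) are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Sequence

Features = Sequence[float]


def two_stage_recognize(
    features: Features,
    first_stage: Callable[[Features], Dict[str, float]],
    pair_classifiers: Dict[FrozenSet[str], Callable[[Features], str]],
) -> str:
    """Return a digit label using a first-stage scorer plus pairwise re-checks.

    first_stage      -- hypothetical stand-in for the HMM: maps features to
                        a score per digit (higher is better).
    pair_classifiers -- hypothetical stand-in for the SVMs: one binary
                        classifier per confusable digit pair.
    """
    scores = first_stage(features)
    best = max(scores, key=scores.get)  # first-stage decision
    for pair, classify in pair_classifiers.items():
        if best in pair:
            # The first-stage winner belongs to a confusable pair, so the
            # dedicated second-stage classifier makes the final call.
            return classify(features)
    return best
```

Only utterances whose first-stage result falls in a confusable pair pay the cost of the second classifier; all others keep the first-stage label.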

2.
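Chen Bin, Niu Tong, Zhang Lianhai, Qu Dan, Li Bicheng. Acta Electronica Sinica (《电子学报》), 2016, 44(12): 2924-2931
To improve the stability of frame-based feature transformation methods, a segment-based discriminative feature transformation method is proposed. The method treats feature transformation as a sparse approximation problem for high-dimensional signals. State tying is used to train region dependent linear transform (RDLT) matrices and mean-offset feature minimum phone error (m-fMPE) transformation matrices, and the two sets of matrices together form an overcomplete dictionary. The speech signal is segmented by forced alignment; with likelihood maximization as the objective function, a matching pursuit algorithm optimizes the objective iteratively and automatically determines the transformation matrices and their coefficients for each speech segment. To keep the transformation stable, a correlation measure is introduced when selecting transformation matrices so that correlated feature basis vectors are removed. Experiments show that, compared with the conventional RDLT method, recognition performance improves by 1.63% and 2.23% when the acoustic model is trained with the maximum likelihood and discriminative criteria, respectively. The method can also be applied to speech enhancement and to discriminative model training.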
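
For readers unfamiliar with matching pursuit, a generic version is sketched below. Note the substitution: the abstract uses likelihood maximization as the objective and transformation matrices as dictionary elements, whereas this sketch uses the textbook least-squares form with unit-norm vector atoms. The function name and arguments are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def matching_pursuit(
    signal: Sequence[float],
    atoms: Sequence[Sequence[float]],
    n_iter: int,
) -> Tuple[List[float], List[float]]:
    """Greedy sparse approximation of `signal` over unit-norm `atoms`.

    Returns (coefficients, residual). Textbook least-squares variant,
    not the likelihood-based objective used in the paper.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        projections = [
            sum(r * a for r, a in zip(residual, atom)) for atom in atoms
        ]
        # Pick the atom that explains the most residual energy.
        best = max(range(len(atoms)), key=lambda i: abs(projections[i]))
        c = projections[best]
        coeffs[best] += c
        # Remove the selected component from the residual.
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual
```

With an orthonormal dictionary the procedure recovers the exact coefficients in as many iterations as there are non-zero components; with an overcomplete dictionary it yields a greedy approximation.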

3.
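A high-performance recognition algorithm for continuous Mandarin digit speech that incorporates prosodic information is proposed. The algorithm is based on the CHMM (continuous hidden Markov model), uses MFCC (Mel-frequency cepstral coefficients) as the main speech feature, and uses prosodic information to segment continuous digits precisely, which allows it to distinguish confusable digits effectively. A two-stage framework is used to raise the recognition rate: the first stage segments the continuous digits, recognizes each digit, and outputs candidate results; the second stage identifies confusable digit pairs among the candidates and uses prosodic information to select the correct result. Experiments show a large improvement in the final recognition rate for continuous Mandarin digit speech.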

4.
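A digit speech recognition model based on a self-organizing neural network is proposed. Speech features are first extracted with a method based on the wavelet transform and linear prediction, and the self-organizing neural network then makes the recognition decision. This approach suits small-vocabulary isolated-word recognition: the network structure is simple, very little training data is required, and real-time performance is good. In MATLAB simulation experiments the recognition rate reached 98%.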

5.
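Fan Jing, Chen Yongning, Liu Huihua. Journal of Signal Processing (《信号处理》), 2005, 21(Z1): 180-183
This paper proposes a new model for Mandarin speech, the multi-group state-transition sequential clustering model (MSSC). The model uses the state-transition mechanism of a Markov process to describe the temporal evolution of the feature-vector sequence of Mandarin speech, and it adopts the alignment concept of dynamic time warping (DTW) together with directly observed feature states rather than the hidden states of an HMM. Because the model consists of multiple groups of sub-models, it adapts well to changes in speaking rate and in loudness. For describing state transitions, it additionally records the number of self-transitions of each state and uses that count as a weight, which makes better use of the feature information and raises the recognition rate. The method also has a clear physical interpretation, so different feature vectors can be given appropriate weights in the decision and new types of feature vector can be added easily, making better use of the useful information in Mandarin speech and further improving recognition accuracy. Both the theory and the measured results confirm the effectiveness of the new method.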
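
The DTW alignment concept that this abstract borrows can be illustrated with the classic dynamic-programming recurrence. The sketch below is the textbook algorithm for scalar sequences with an absolute-difference local cost, not the MSSC model itself; the function name and the choice of local cost are assumptions.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two scalar sequences.

    Local cost is |a[i] - b[j]|; allowed steps are match, insertion and
    deletion. Real systems would compare feature vectors instead.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # stretch b (repeat b[j-1])
                cost[i][j - 1],      # stretch a (repeat a[i-1])
            )
    return cost[n][m]
```

Because one frame may be matched to several frames of the other sequence, a slower or faster rendition of the same pattern still aligns at low cost, which is the property the abstract relies on for robustness to speaking rate.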

6.
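An HMM speech recognition model based on duration distribution   Cited by: 23 (self-citations: 0, citations by others: 23)
Wang Zuoying, Xiao Xi. Acta Electronica Sinica (《电子学报》), 2004, 32(1): 46-49
To address the shortcomings of the homogeneous HMM speech recognition model in using state-duration information, this paper formally defines a left-to-right inhomogeneous hidden Markov model suited to describing speech signals, proves that its state-transition-probability representation is equivalent to its state-duration representation, and on that basis proposes the duration distribution based HMM (DDBHMM). Speaker-independent continuous speech experiments show that the DDBHMM, using only state-duration information, clearly outperforms the classical HMM (the error rate is reduced by 17.8%). This demonstrates the good performance of the DDBHMM and opens the way to describing and exploiting important properties of speech signals such as duration, speaking rate, temporal discontinuity, and the correlation among speech features.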
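
The kind of equivalence between a transition-probability description and a duration description mentioned above can be illustrated with a standard identity. The notation is assumed for illustration and is not taken from the paper: $a_{ii}(d)$ is the probability of remaining in state $i$ after having spent $d$ frames there, and $P_i(d)$ is the probability that state $i$ lasts exactly $d$ frames.

```latex
% Homogeneous HMM: a constant self-transition probability a_{ii}
% forces a geometric duration distribution.
P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots

% Inhomogeneous case: a duration-dependent self-transition a_{ii}(d)
% determines the duration distribution ...
P_i(d) = \bigl(1 - a_{ii}(d)\bigr) \prod_{k=1}^{d-1} a_{ii}(k)

% ... and, conversely, any duration distribution determines a_{ii}(d).
a_{ii}(d) = 1 - \frac{P_i(d)}{\sum_{k \ge d} P_i(k)}
```

The first line shows why a homogeneous HMM can only model geometrically decaying durations; the last two show that allowing the self-transition to depend on elapsed time is enough to represent an arbitrary duration distribution.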

7.
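A Mandarin speech recognition system based on the continuous-density hidden Markov model was designed. With live recording, different speakers used the system to recognize 10 voice commands of 2 to 4 characters each; accuracy reached 90%, with a recognition time of 1.5 to 3 s.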

8.
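A Mandarin digit speech recognition scheme is studied. Linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC) of Mandarin digit speech are extracted together with their first-order differences and combined into a new feature. A first dimensionality reduction takes the mean and variance of the coefficient matrix, a second reduction applies a feature selection algorithm based on association rules, and recognition is performed with the C4.5 decision tree algorithm. Experiments show that the proposed method effectively reduces the feature dimension, removes redundant information, and improves the speech recognition rate.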
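
The first reduction step (collapsing a frames-by-coefficients matrix into per-coefficient means and variances) might look like the sketch below. The abstract does not say whether population or sample variance is used; population variance is assumed here, and the function name is hypothetical.

```python
from typing import List, Sequence


def mean_variance_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (num_frames x num_coeffs) matrix into a fixed-length vector.

    Output layout: [mean_0, ..., mean_{C-1}, var_0, ..., var_{C-1}].
    Population variance (divide by num_frames) is an assumption.
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    means = [
        sum(frame[c] for frame in frames) / num_frames
        for c in range(num_coeffs)
    ]
    variances = [
        sum((frame[c] - means[c]) ** 2 for frame in frames) / num_frames
        for c in range(num_coeffs)
    ]
    return means + variances
```

The result has length 2 x num_coeffs regardless of utterance duration, which is what makes it usable as input to a fixed-input classifier such as a decision tree.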

9.
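Recording device source identification is a technique that determines which device recorded a digital speech signal by analyzing the signal itself; it is a form of blind digital audio forensics. This paper proposes a recording device identification method based on improved PNCC features and two-step discriminative training. Because the silent portions of a recording carry complete device information and are unaffected by speaker and text, the improved PNCC features are extracted from silent segments, and the long-term frame analysis of PNCC is used to remove the influence of background noise on the device information. On the modeling side, GMM-UBM is the baseline model, and two-step discriminative training adjusts the in-set device models and the universal background model to improve discrimination. In closed-set identification of 30 devices the method achieves an average correct identification rate of 90.23%; in a test with 15 in-set and 15 out-of-set devices the equal error rate is 15.17% and the average in-set correct identification rate is 96.65%, which validates the proposed algorithm.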

10.
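Modern Electronics Technique (《现代电子技术》), 2015, (13): 59-62
The Gaussian mixture model (GMM) is widely used in speech recognition because it can approximate any probability distribution by varying the number of Gaussian components. GMMs are commonly trained by maximum likelihood estimation (MLE), which fits the distribution of all samples as closely as possible but ignores the interaction between models, so confusions arise during recognition. Discriminative model training algorithms are suited to discrimination problems with large amounts of data and complex combinations of classes. The discriminative training method proposed here follows the principle of minimizing the risk of classification error: by describing the decision boundaries between models more precisely, it improves recognition. Experiments show that this training method gives better recognition results than MLE training in a multi-class speech detection task.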
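
The contrast with MLE can be made concrete with a smoothed classification-error loss. The abstract does not give its exact objective, so the sketch below uses a common simplified form (best-competitor misclassification measure passed through a sigmoid); the function name, the `gamma` slope parameter and the use of only the single best competitor are assumptions.

```python
import math
from typing import Dict


def mce_loss(scores: Dict[str, float], correct: str, gamma: float = 1.0) -> float:
    """Smoothed 0/1 classification loss for one training sample.

    scores  -- per-class discriminant values (e.g. GMM log-likelihoods).
    correct -- label of the true class; at least one other class is required.
    gamma   -- hypothetical slope of the sigmoid smoothing.
    """
    # Strongest competing class.
    competitor = max(s for label, s in scores.items() if label != correct)
    # Misclassification measure: positive means the sample is misclassified.
    d = competitor - scores[correct]
    # Sigmoid turns the hard error count into a differentiable loss.
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

MLE would raise `scores[correct]` without regard to the competitor; this loss depends only on the margin between the two, which is what pushes training toward the decision boundary.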

11.
To utilize the supra-segmental nature of Mandarin tones, this article proposes a feature extraction method for hidden Markov model (HMM) based tone modeling. The method uses linear transforms to project F0 (fundamental frequency) features of neighboring syllables as compensations, and adds them to the original F0 features of the current syllable. The transforms are discriminatively trained using an objective function termed "minimum tone error", which is a smooth approximation of tone recognition accuracy. Experiments show that the new tonal features achieve a 3.82% improvement in tone recognition rate over the baseline, which uses a maximum-likelihood-trained HMM on the normal F0 features. Further experiments show that discriminative HMM training on the new features is 8.78% better than the baseline.

12.
Discriminative metric design for robust pattern recognition   Cited by: 2 (self-citations: 0, citations by others: 2)
Motivated by the development of discriminative feature extraction (DFE), many researchers have come to realize the importance of designing a front-end feature extraction unit with an appropriate link to back-end classification. This paper proposes an advanced formalization of DFE, which we call discriminative metric design (DMD), and elaborates on an exemplar implementation that uses a simple, linear feature transformation matrix. The resulting DMD implementation is shown to have a close relationship to various discriminative pattern recognizers, including artificial neural networks. The utility of the proposed method is demonstrated in speech pattern recognition experiments.

13.
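A differential composite sub-band speech recognition method in noise   Cited by: 4 (self-citations: 0, citations by others: 4)
Jiang Wenjian, Wei Gang. Journal on Communications (《通信学报》), 2002, 23(1): 18-24
Sub-band features reflect the local characteristics of a speech signal, while full-band features reflect its overall characteristics. Based on this, the paper proposes a new differential composite sub-band speech recognition method. Spectral differencing is first used to reduce noise interference, and the recognition probabilities of the multiple sub-band features are then combined with that of the full-band feature in a joint decision that yields the final result. The method is applied to recognition experiments on the ten English digits 0-9 and the E-Set from the TIMIT data package under the white noise and F16 fighter noise of NoiseX92. The results show that the new method performs much better than conventional methods.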

14.
Mandarin speech is known for its tonal characteristic, and prosodic information plays an important role in Mandarin speech recognition. Driven by this property, phonetic and prosodic information are integrated and used for Mandarin telephone speech keyword spotting. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 132 subsyllable models, two general acoustic filler models and one background/silence model are separately trained and used as the basic recognition units. For utterance verification, 12 anti-subsyllable models, 175 context-dependent prosodic models and five anti-prosodic models are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 3088 conversational speech utterances from 33 speakers (20 males and 13 females) and a vocabulary of 2583 faculty names, at 8.5% false rejection, the proposed verification method results in an 18.3% false alarm rate. Furthermore, this method correctly rejects 90.9% of non-keywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information can benefit the verification performance.

15.
Phonotactic and acoustic systems are currently the most widely used approaches to spoken language recognition (SLR). Parallel phone recognition followed by vector space modeling (PPRVSM) is a typical phonotactic SLR system. To achieve better performance, researchers have tried to extract more complementary information from the training data by using multiple language-specific phone recognizers, different acoustic models and different acoustic features. These methods perform well but usually incur a high computational cost and use only the complementary information of the training data. In this paper, we explore a novel approach to discriminative vector space model (VSM) training that uses a boosting framework to exploit the discriminative information of the test data effectively, in which an ensemble of VSMs is trained sequentially. The effectiveness of our boosting variation comes from its emphasis on working with high-confidence test data to achieve discriminatively trained models. Our variant of boosting also uses the original training data in VSM training. The discriminative boosting algorithm (DBA) is applied to the National Institute of Standards and Technology (NIST) language recognition evaluation (LRE) 2009 task and shows performance improvements. The experimental results demonstrate that the proposed DBA achieves relative reductions in equal error rate (EER) of 1.8%, 11.72% and 15.35% for 30 s, 10 s and 3 s test utterances, respectively, compared with the baseline system.

16.
The current study puts forward a supervised within-class-similar discriminative dictionary learning (SCDDL) algorithm for face recognition. Popular discriminative dictionary learning schemes for recognition tasks typically either incorporate a linear classification error term into the objective function or impose discriminative restrictions on the representation coefficients. In the presented SCDDL algorithm, we propose to directly restrict the representation coefficients to be similar within the same class and simultaneously include the linear classification error term in the supervised dictionary learning scheme to derive a more discriminative dictionary for face recognition. The experimental results on three large well-known face databases suggest that our approach can enhance the Fisher ratio of representation coefficients when compared with several dictionary learning algorithms that incorporate linear classifiers. In addition, the learned discriminative dictionary, the large Fisher ratio of representation coefficients and the simultaneously learned classifier can improve the recognition rate compared with some state-of-the-art dictionary learning algorithms.

17.
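Thanks to its strong performance, the Conformer model has attracted growing attention and is becoming a mainstream model in speech recognition. However, it extracts information from the input with an attention mechanism that computes interactions between all points of the input sequence, so the computational complexity of the network is quadratic in the input length; recognizing long utterances therefore consumes more computing resources and is slow. To address this, the paper proposes a speech recognition method based on linear attention. First, a new gated linear attention structure reduces multi-head attention to a single head and makes the attention complexity linear in the sequence length, which effectively lowers the cost of attention. Second, to compensate for the loss of modeling capacity caused by linear attention, local attention and global attention are used together within the linear attention computation and combined with linear attention encoding to improve recognition accuracy. Finally, to improve recognition further, an attention-guided loss and an intermediate CTC loss are added to the attention loss and the connectionist temporal classification (CTC) loss to form the training objective. Experiments on the Mandarin Chinese dataset AISHELL-1 and the English dataset LibriSpeech show that the improved model clearly outperforms the baseline, uses less GPU memory, and trains and decodes considerably faster.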
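
How attention cost can become linear in sequence length is easiest to see in the generic kernelized form sketched below. This is not the gated structure proposed in the paper: it is the well-known feature-map formulation, with `elu(x) + 1` assumed as the feature map, written in plain Python for readability. All names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def feature_map(x: float) -> float:
    """elu(x) + 1: a positive feature map (an assumed, common choice)."""
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> List[List[float]]:
    """Non-causal kernelized attention in O(N * d_k * d_v) time.

    Keys and values are summarized once into S and z, so each query costs
    a fixed amount of work regardless of sequence length N.
    """
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                        # sum_j phi(k_j)
    for k_row, v_row in zip(K, V):
        phi_k = [feature_map(x) for x in k_row]
        for a in range(d_k):
            z[a] += phi_k[a]
            for b in range(d_v):
                S[a][b] += phi_k[a] * v_row[b]
    out = []
    for q_row in Q:
        phi_q = [feature_map(x) for x in q_row]
        denom = sum(phi_q[a] * z[a] for a in range(d_k))  # normalizer
        out.append(
            [sum(phi_q[a] * S[a][b] for a in range(d_k)) / denom
             for b in range(d_v)]
        )
    return out
```

Standard softmax attention builds an N x N weight matrix; here the key/value summary has size d_k x d_v, so doubling the utterance length roughly doubles the work instead of quadrupling it.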

18.
This paper presents a combined microphone array and model adaptation algorithm for hands-free speech recognition. Our purpose is to remove the inconvenience of using a head-mounted or hand-held microphone with a conventional speech recognizer. To improve speech quality under car noise interference, a linear microphone array is used as a robust acquisition system. A time-domain coherence measure (TDCM) is applied to reliably estimate the time delay between speech signals collected by different microphones. The estimated delay is used in a delay-and-sum beamformer for speech enhancement. Further, we adapt the speech hidden Markov models to bring them closer to the acoustic conditions of the enhanced test speech for robust speech recognition. In acquisition and recognition experiments using connected Chinese digits, we found that TDCM can effectively estimate the time delay. Increasing the speech sampling rate helps determine the time delay. Incorporating the model adaptation scheme significantly reduces recognition errors with moderate computational overhead.
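
The delay-and-sum step can be sketched as follows. The delay estimator here is a plain cross-correlation peak search, substituted for the paper's TDCM (whose details are not given in the abstract), and delays are restricted to whole samples. Function names and the `max_lag` parameter are assumptions.

```python
from typing import List, Sequence


def estimate_delay(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `sig` best matches `ref`.

    Plain cross-correlation peak search; a stand-in for TDCM.
    A positive result means `sig` lags behind `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[n] * sig[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(sig)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align each channel by its integer delay and average the channels."""
    length = len(channels[0])
    out = []
    for t in range(length):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d  # read ahead to undo the channel's delay
            if 0 <= idx < len(ch):
                acc += ch[idx]
        out.append(acc / len(channels))
    return out
```

After alignment the speech components add coherently while uncorrelated noise partially cancels in the average. The abstract's remark about sampling rate fits this picture: a higher rate gives finer delay resolution than the whole-sample steps used here.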

19.
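Based on the theory of joint Fourier transform correlation recognition, the recognition process is first simulated numerically. Then, taking into account the characteristics of optical Fourier transform correlation recognition systems, digital image processing methods are applied to improve the optical joint Fourier transform correlation recognition system. Experiments show that the improved opto-electronic joint Fourier transform correlation recognition system not only enhances the correlation output to some extent but can also recognize objects that conventional optical Fourier transform correlation recognition systems cannot.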
