Similar literature
18 similar documents found.
1.
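A multi-stream subband speech recognition method for noisy conditions is proposed. Conventional subband feature methods improve recognition in noise but usually degrade performance on noise-free speech. The new method extracts perceptual linear prediction (PLP) features and subband features, runs recognition on each separately, and then combines the two at the recognition-probability level. Recognition experiments on E-Set with white noise from NoiseX92 show that the new method not only has better noise robustness but also improves recognition performance in the noise-free case.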
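Combining two recognizers at the probability level, as described above, can be sketched as a weighted sum of per-word log-likelihoods. This is only an illustration under assumptions: the abstract does not give the combination rule, and the function names, the `weight` parameter, and the scores below are hypothetical.

```python
def fuse_log_likelihoods(plp_scores, subband_scores, weight=0.5):
    """Weighted sum of per-word log-likelihoods from two recognizers.

    plp_scores, subband_scores: dict mapping word -> log-likelihood.
    weight: hypothetical weight of the PLP stream, between 0 and 1.
    """
    return {
        word: weight * plp_scores[word] + (1.0 - weight) * subband_scores[word]
        for word in plp_scores
    }


def recognize(plp_scores, subband_scores, weight=0.5):
    """Return the word with the highest fused score."""
    fused = fuse_log_likelihoods(plp_scores, subband_scores, weight)
    return max(fused, key=fused.get)


# Made-up scores for three E-Set letters, for illustration only.
plp = {"b": -10.0, "d": -12.0, "e": -11.0}
sub = {"b": -14.0, "d": -9.0, "e": -13.0}
best = recognize(plp, sub, weight=0.5)
```

With `weight=1.0` the decision falls back to the PLP stream alone, and with `weight=0.0` to the subband stream alone, so the weight controls how much each stream is trusted.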

2.
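A robust feature extraction algorithm for speech recognition is proposed. The algorithm is based on subband dominant-frequency information: it combines subband dominant-frequency information with subband energy information, so that the feature parameters retain the positions of the subband peaks in the spectrogram. A noise-robust isolated-word speech recognition system was built with this algorithm and compared with conventional feature algorithms at several signal-to-noise ratios, in white Gaussian noise and in background speech noise. The experimental results show that the algorithm improves the recognition rate to varying degrees in both noise environments and has good noise robustness.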

3.
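To raise the recognition rate of speech recognition with a multi-channel skin-hearing system, a speech enhancement algorithm based on multi-band spectral subtraction is proposed. In multi-channel skin-hearing experiments, colored noise severely degrades speech quality and therefore lowers the recognition rate of the skin-hearing system, so multi-band spectral subtraction is introduced into a skin-hearing system for the first time to reduce colored noise. Multi-band spectral subtraction divides the speech band into several subbands and applies spectral subtraction with a different coefficient in each subband to enhance the speech. The algorithm was simulated in Matlab and implemented on DSP hardware, and the enhanced speech signal was fed to the skin-hearing system. Experiments show that the design effectively suppresses colored noise and improves the reliability and practicality of the skin-hearing system.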
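The per-subband subtraction described above can be sketched for a single frame as follows. This is a minimal illustration, not the paper's implementation: the band edges, the over-subtraction factors `alphas`, and the spectral `floor` are hypothetical values, and the noise power is assumed to be known.

```python
def multiband_spectral_subtraction(noisy_power, noise_power, band_edges, alphas, floor=0.01):
    """Subtract a scaled noise estimate from each subband of one frame.

    noisy_power, noise_power: per-bin power values for the frame.
    band_edges: list of (start, stop) bin indices, one pair per subband.
    alphas: subtraction coefficient for each subband (hypothetical values).
    floor: fraction of the noisy power kept as a lower bound, so that
           the result never becomes negative.
    """
    enhanced = list(noisy_power)
    for (start, stop), alpha in zip(band_edges, alphas):
        for k in range(start, stop):
            cleaned = noisy_power[k] - alpha * noise_power[k]
            enhanced[k] = max(cleaned, floor * noisy_power[k])
    return enhanced


# Toy frame with 4 bins split into 2 subbands; the second band is subtracted harder.
noisy = [4.0, 4.0, 2.0, 2.0]
noise = [1.0, 1.0, 1.0, 1.0]
out = multiband_spectral_subtraction(noisy, noise, [(0, 2), (2, 4)], [1.0, 3.0])
```

In the toy frame the first subband keeps most of its power, while the second subband is over-subtracted and falls back to the spectral floor.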

4.
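Research on a PAC speaker recognition method based on subband processing   Total citations: 1 (self-citations: 1, citations by others: 0)
At present, speaker recognition systems achieve high performance on clean speech, but their performance drops sharply in noisy environments. A speaker recognition method based on subband processing is presented that uses phase autocorrelation (PAC) coefficients and their energy as features. The wideband speech signal is passed through a Mel filter bank to obtain multiple subband signals; PAC coefficients are extracted as feature parameters after a DCT is applied to each subband's data; an HMM is then built for each subband for recognition; finally, the HMM outputs are combined at the recognition-probability level to give the final recognition result. Experiments show that the method greatly improves recognition performance both in noise at different signal-to-noise ratios and in the noise-free case.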

5.
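To improve the noise robustness of speech feature parameters in speaker recognition, this paper decomposes the speech with wavelet packets, analyses the characteristics of the noise, and performs weighted spectral subtraction in the different subbands, yielding a new speech feature parameter: multi-layer Mel-frequency cepstral coefficients (ML-MFCC). Simulation experiments show that, compared with MFCC features, ML-MFCC gives better noise robustness and a higher speaker recognition rate in noisy environments.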

6.
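Real-time speech endpoint detection algorithm based on order statistics filtering   Total citations: 1 (self-citations: 0, citations by others: 1)
An efficient real-time speech endpoint detection algorithm is proposed for embedded speech recognition systems. The algorithm uses subband spectral entropy as the feature that separates speech from noise. The spectrum of each frame is first divided into several subbands and the spectral entropy of each subband is computed; the subband spectral entropies of several consecutive frames are then passed through a bank of order statistics filters to obtain the spectral entropy of each frame, and the input is classified according to that value. Experimental results show that the algorithm separates speech from noise effectively, significantly improves the performance of the speech recognition system, and is robust across noise environments and signal-to-noise ratios. In addition, the algorithm has a low computational cost and is simple to implement, which makes it suitable for real-time embedded speech recognition systems.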
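The pipeline described above (subband entropy per frame, then order statistics filtering across consecutive frames) can be sketched as below. This is one possible reading of the abstract, not the paper's algorithm: the per-band entropy term, the window length `half_window`, the `rank_fraction` (0.5 gives a median-like filter), and the toy spectra are all assumptions.

```python
import math


def subband_entropy_terms(power_spectrum, num_bands):
    """Per-subband entropy terms -p*log(p), with p the band's share of frame energy."""
    size = len(power_spectrum) // num_bands
    energies = [sum(power_spectrum[b * size:(b + 1) * size]) for b in range(num_bands)]
    total = sum(energies)
    if total <= 0.0:
        return [0.0] * num_bands
    return [-(e / total) * math.log(e / total) if e > 0.0 else 0.0 for e in energies]


def order_statistic_filter(values, half_window, rank_fraction=0.5):
    """Replace each value by a chosen order statistic of its neighbourhood."""
    out = []
    for t in range(len(values)):
        window = sorted(values[max(0, t - half_window):t + half_window + 1])
        index = min(len(window) - 1, int(rank_fraction * len(window)))
        out.append(window[index])
    return out


def frame_entropies(frames, num_bands, half_window=1):
    """Filter each subband's entropy track over time, then sum the bands per frame."""
    terms = [subband_entropy_terms(frame, num_bands) for frame in frames]
    filtered = [
        order_statistic_filter([terms[t][b] for t in range(len(frames))], half_window)
        for b in range(num_bands)
    ]
    return [sum(filtered[b][t] for b in range(num_bands)) for t in range(len(frames))]


# Toy data: flat "noise" spectra followed by peaky "speech-like" spectra.
noise_frame = [1.0] * 8
speech_frame = [8.0, 8.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
frames = [noise_frame] * 3 + [speech_frame] * 3
ent = frame_entropies(frames, num_bands=4)
```

In this toy setup the flat frames give the maximum entropy for four bands, and the peaky frames give a clearly lower value, so a threshold on `ent` would separate the two groups. A real detector would need a threshold estimated from the data.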

7.
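For speech recognition in multi-noise environments, a hierarchical speech recognition model is proposed that treats the environmental noise as context for recognition. The model has two layers: a noisy-speech classification model and acoustic models for specific noise environments. The noisy-speech classification model reduces the mismatch between training and test data, removes the restriction on noise stationarity imposed by feature-space approaches, and overcomes the low recognition accuracy of conventional multi-condition training in some noise environments. Modelling the acoustic models with deep neural networks (DNN) further strengthens their ability to discriminate noise, which improves the noise robustness of speech recognition in the model space. In experiments against a baseline model obtained by multi-condition training, the proposed hierarchical model achieves a relative word error rate (WER) reduction of 20.3%, showing that it helps improve the noise robustness of speech recognition.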

8.
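Experimental study of speech signal endpoint detection   Total citations: 1 (self-citations: 0, citations by others: 1)
Xu Gang, Xu Huazhong. Fujian Computer (《福建电脑》), 2006, (1): 77-77, 53
Endpoint detection is a key technique in speech recognition, and its accuracy strongly affects recognition performance, especially for recognition algorithms that are sensitive to endpoint detection. The experiments in this paper show that an improved noisy-speech endpoint detection method using LPCMFCC clearly outperforms detection methods based on energy and on the conventional cepstral distance in white noise at low signal-to-noise ratios. The method removes the influence of noise, is highly robust, and has strong practical value.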

9.
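To separate speech from non-speech (noise) in complex noise environments, a noisy-speech endpoint detection method based on wavelets and energy entropy is proposed. The method exploits the multi-resolution property of wavelets and their ability to represent local features of non-stationary signals: the noisy speech signal is wavelet-transformed, and the mean of the energy entropy values of the decomposition levels is used to distinguish speech segments from non-speech segments. Experimental results under different background noises and signal-to-noise ratios show that the proposed algorithm achieves a high detection accuracy.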
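The idea of averaging an energy entropy over wavelet levels can be sketched as follows. This is an assumption-heavy illustration: the abstract does not name the wavelet or define the per-level energy entropy, so a Haar wavelet is used here and the entropy is taken over the normalized squared detail coefficients of each level. All function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def energy_entropy(coeffs):
    """Entropy of the normalized energy distribution of a coefficient list."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total <= 0.0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0.0)


def mean_wavelet_energy_entropy(frame, levels=3):
    """Average the detail-coefficient energy entropy over the decomposition levels."""
    approx = list(frame)
    entropies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        entropies.append(energy_entropy(detail))
    return sum(entropies) / len(entropies) if entropies else 0.0


flat_value = mean_wavelet_energy_entropy([1.0] * 8)
alternating_value = mean_wavelet_energy_entropy([1.0, -1.0] * 4, levels=1)
```

A detector along these lines would compute the value for each frame and compare it with a threshold; the threshold and the direction of the comparison would have to be tuned on real data.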

10.
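Multilingual language identification of broadcast speech based on subband GMM-UBM   Total citations: 2 (self-citations: 0, citations by others: 2)
A language identification method based on probabilistic statistical models and independent of linguistic content is proposed; it can identify dozens of languages without expert linguistic knowledge of each one. Because broadcast speech is subject to strong noise interference, a GMM-UBM model is used as the language model, which improves the noise robustness of the system. Since the background noise of broadcast speech is not simply full-band additive white noise, the paper builds a language identification system with multiple subsystems based on subband GMM-UBM models, with a neural network at the back end for system-level fusion. Identification experiments on 37 languages and dialects demonstrate the effectiveness of the subband GMM-UBM method.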

11.
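An efficient real-time speech endpoint detection algorithm suited to complex environments is proposed, together with a method for estimating the noise power spectrum of each frame during filtering. The spectrum of each frame is first processed by iterative Wiener filtering and then divided into several subbands, and the spectral entropy of each subband is computed. The subband spectral entropies of several consecutive frames are passed through a bank of median filters to obtain the spectral entropy of each frame, and the input is classified according to that value. Experimental results show that the algorithm separates speech from noise effectively, significantly improves the performance of the speech recognition system, and is robust across noise environments. The algorithm has a low computational cost and is simple to implement, which makes it suitable for real-time speech recognition systems.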

12.
Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two-phase approach for robust speech recognition in such environments. First, a front-end subband speech enhancement stage with adaptive noise estimation (ANE) is used to filter the noisy speech. The whole noisy speech spectrum is partitioned into eighteen dissimilar subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a non-parametric frequency-domain algorithm based on human perception, and the back end builds a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level and over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.

13.
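Liu Yan, Ni Wanshun. Journal of Computer Applications (《计算机应用》), 2015, 35(3): 868-871
Front-end noise processing directly affects the accuracy and stability of speech recognition. Because the signal separated by wavelet denoising algorithms is not the best estimate of the original signal, a bionic wavelet transform (BWT) denoising algorithm based on subband spectral entropy is proposed. It takes full advantage of the accuracy of subband spectral entropy endpoint detection to separate the noisy-speech parts from the noise-only parts, updates the threshold of the bionic wavelet transform in real time, and accurately identifies the wavelet coefficients of the noise, thereby enhancing the speech. Experimental results show that, compared with Wiener filtering, the proposed method raises the signal-to-noise ratio (SNR) by about 8% on average and significantly enhances speech signals in noisy environments.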

14.
In this paper, we introduce Subband LIkelihood-MAximizing BEAMforming (S-LIMABEAM), a new microphone-array processing algorithm specifically designed for speech recognition applications. The proposed algorithm is an extension of the previously developed LIMABEAM array processing algorithm. Unlike most array processing algorithms, which operate according to some waveform-level objective function, the goal of LIMABEAM is to find the set of array parameters that maximizes the likelihood of the correct recognition hypothesis. Optimizing the array parameters in this manner results in significant improvements in recognition accuracy over conventional array processing methods when speech is corrupted by additive noise and moderate levels of reverberation. Despite the success of the LIMABEAM algorithm in such environments, little improvement was achieved in highly reverberant environments. In such situations, where the noise is highly correlated with the speech signal and the number of filter parameters to estimate is large, subband processing has been used to improve the performance of LMS-type adaptive filtering algorithms. We use subband processing principles to design a novel array processing architecture in which select groups of subbands are processed jointly to maximize the likelihood of the resulting speech recognition features, as measured by the recognizer itself. By creating a subband filtering architecture that explicitly accounts for the manner in which recognition features are computed, we can effectively apply the LIMABEAM framework to highly reverberant environments. By doing so, we are able to achieve improvements in word error rate of over 20% compared to conventional methods in highly reverberant environments.

15.
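To suppress environmental noise in speech signals, a speech enhancement method based on subband spectral subtraction is proposed. A filter bank first splits the time-domain signal into several frequency bands (subbands), and an improved spectral subtraction technique is then applied independently in each subband. Because the background noise in real environments is rarely distributed uniformly over frequency, estimating the noise and subtracting the spectrum separately in each band is better targeted and more accurate. Experiments on real speech show that the proposed method suppresses noise while preserving the structure of the speech well, so the enhanced speech is more comfortable to listen to and more intelligible.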

16.
A novel spatio-temporal filter for video denoising, which operates entirely in the wavelet domain, is proposed. For effective noise reduction, the spatial and temporal redundancies that exist in the wavelet domain representation of a video signal are exploited. First, a 2D discrete wavelet transform is applied to the input noisy frames. This is followed by a discrete cosine transform (DCT), which is applied to the temporal subband coefficients to minimise the redundancy among the consecutive frames. The DCT transformed, noise-free coefficients in the different wavelet domain subbands for the original image sequence are modelled using a prior having a generalised Gaussian distribution. On the basis of this prior, filtering of the noisy wavelet coefficients in each subband is carried out using a new, low-complexity wavelet shrinkage method, which utilises the correlation that exists between subsequent resolution levels. Experimental results show that the proposed scheme outperforms several state-of-the-art spatio-temporal filters in terms of both the peak signal-to-noise ratio and the visual quality.

17.
A new robust microphone array method to enhance speech signals generated by a moving person in a noisy environment is presented. This blind approach is based on a two-stage scheme. First, a subband time-delay estimation method is used to localize the dominant speech source. The second stage involves speech enhancement, based on the acquired spatial information, by means of a soft-constrained subband beamformer. The novelty of the proposed method lies in considering the spatial spreading of the sound source as equivalent to a time-delay spreading, thus allowing the estimated intersensor time-delays to be used directly in the beamforming operations. In comparison to previous approaches, this new method requires no special array geometry, knowledge of the array manifold, or acquisition of calibration data to adapt the array weights. Furthermore, such a scheme allows the beamformer to adapt efficiently to speaker movement. The robustness of the time-delay estimation of speech signals in high noise levels is improved by making use of the non-Gaussian nature of speech through a subband kurtosis-weighted structure. Evaluation in a real environment with a moving speaker shows promising results, with suppression levels of up to 16 dB for background noise and interfering (speech) signals, together with relatively little speech distortion.

18.
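Noise-robust speech feature techniques and MFCC-based model compensation techniques both give low recognition rates at low signal-to-noise ratios. To address this, noise-robust features and model compensation are combined in a model-compensation method for noisy speech recognition based on one-sided autocorrelation (OSA) MFCC features, with the aim of improving system performance at low signal-to-noise ratios. Recognition experiments on the ten English digits 0-9 with white noise, F16 noise, and FACTORY noise from NOISEX92 show that the proposed method effectively improves the recognition rate of the OSA-MFCC recognizer in noisy environments, and that at low signal-to-noise ratios it clearly outperforms an MFCC recognizer given the same compensation.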
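As background for the OSA-MFCC features mentioned above, the one-sided autocorrelation sequence of a frame can be sketched as the autocorrelation at non-negative lags only. This is a generic illustration, not the paper's front end: the biased normalization by the frame length and the `max_lag` parameter are assumptions, and the later MFCC stage is omitted.

```python
def one_sided_autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag (non-negative lags only)."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


# Small worked example on a three-sample frame.
osa = one_sided_autocorrelation([1.0, 2.0, 3.0], max_lag=2)
```

One common motivation for working in the autocorrelation domain is that uncorrelated white noise contributes mainly to lag 0, so the higher lags are less affected by it.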
