Similar literature
 20 similar documents found (search time: 156 ms)
1.
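To address the residual noise and speech distortion produced by spectral-subtraction speech enhancement algorithms based on a Gaussian distribution, a minimum mean square error (MMSE) spectral subtraction algorithm based on the Laplacian distribution is proposed. First, the original noisy speech signal is split into frames and windowed, and each processed frame is Fourier transformed to obtain the discrete Fourier transform (DFT) coefficients of the short-time speech. Then, the log spectral energy and spectral flatness of each frame are computed to detect noise frames and update the noise estimate. Next, under the assumption that the speech DFT coefficients follow a Laplacian distribution, the optimal spectral subtraction coefficient is derived under the MMSE criterion and used for spectral subtraction to obtain the enhanced signal spectrum. Finally, the enhanced spectrum is inverse Fourier transformed and the frames are reassembled to obtain the enhanced speech. Experimental results show that the signal-to-noise ratio (SNR) of speech enhanced by the proposed algorithm improves by 4.3 dB on average, a gain of 2 dB over the over-subtraction method; the Perceptual Evaluation of Speech Quality (PESQ) score is on average 10% higher than that of the over-subtraction method. The algorithm suppresses noise better with less speech distortion, and improves markedly on both SNR and PESQ.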
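
The noise-frame detection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the thresholds, and the smoothing factor are all assumptions, and the Laplacian MMSE subtraction coefficient itself is not reproduced.

```python
import math

def log_energy(power):
    """Natural log of the total spectral energy of one frame."""
    return math.log(sum(power) + 1e-12)

def spectral_flatness(power):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1]."""
    eps = 1e-12
    n = len(power)
    geo = math.exp(sum(math.log(p + eps) for p in power) / n)
    arith = sum(power) / n + eps
    return geo / arith

def is_noise_frame(power, energy_thresh, flatness_thresh):
    """Hypothetical rule: low energy and a flat spectrum mark a noise frame."""
    return (log_energy(power) < energy_thresh
            and spectral_flatness(power) > flatness_thresh)

def update_noise(noise_psd, power, smoothing=0.9):
    """Recursive average of the noise estimate; call only on noise frames."""
    return [smoothing * n + (1.0 - smoothing) * p
            for n, p in zip(noise_psd, power)]
```

A flat, low-energy spectrum passes the test and feeds the noise update, while a frame dominated by one strong bin (as voiced speech tends to be) is rejected.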

2.
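The generalized Gamma model is a speech distribution model proposed in recent years that is more general and flexible than traditional Gaussian or super-Gaussian models. A speech enhancement algorithm based on the generalized Gamma speech model and speech presence probability correction is proposed. Assuming that the amplitude spectral coefficients of speech and noise follow a generalized Gamma distribution and a Gaussian distribution respectively, an MMSE estimator of the speech log-spectrum is derived; the speech presence probability is then derived under the same model and used to correct the MMSE estimate. Simulation results show that, compared with traditional short-time spectral estimation algorithms, the algorithm not only further raises the SNR of the enhanced speech but also effectively reduces its distortion and improves its subjective perceptual quality.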

3.
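In speech enhancement algorithms based on statistical models, different distribution models correspond to different gain functions. Because speech signals are uncertain, no single distribution function can accurately model the distributions of the speech and noise spectra, so any fixed statistical model carries some error. A speech enhancement algorithm based on gain dictionary lookup is therefore proposed. The algorithm trains gains on a speech and noise corpus under a log-spectral distortion criterion to obtain a gain dictionary whose inputs are the estimated a priori SNR and a posteriori SNR. The algorithm was evaluated with ITU-T P.862 PESQ, segmental SNR, overall SNR, and log-spectral distortion, and compared with algorithms based on the Gaussian and Laplacian distribution models. Experimental results show that the algorithm enhances speech better than the other algorithms in both stationary and non-stationary noise, and that musical noise and residual background noise are also well suppressed.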
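
A gain dictionary indexed by the a priori and a posteriori SNR might be looked up as in the sketch below. The grid values, the nearest-bin lookup, and all function names are assumptions; in particular, the table is filled here with the Wiener gain purely as a stand-in, whereas the abstract's entries are trained on a speech and noise corpus.

```python
def wiener_gain(xi):
    """Placeholder gain xi / (1 + xi); the real entries would be trained."""
    return xi / (1.0 + xi)

def db_to_linear(db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (db / 10.0)

def build_gain_table(prior_grid_db, post_grid_db):
    """Table indexed by (a priori SNR bin, a posteriori SNR bin)."""
    return [[wiener_gain(db_to_linear(xi_db)) for _ in post_grid_db]
            for xi_db in prior_grid_db]

def nearest_index(grid, value):
    """Index of the grid point closest to value."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - value))

def lookup_gain(table, prior_grid_db, post_grid_db, xi_db, gamma_db):
    """Return the stored gain for the nearest (xi, gamma) bin."""
    i = nearest_index(prior_grid_db, xi_db)
    j = nearest_index(post_grid_db, gamma_db)
    return table[i][j]
```

At run time, each frequency bin needs only two SNR estimates and one table read, which is what makes a dictionary attractive compared with evaluating a closed-form gain function.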

4.
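A subband noise estimation method based on the χ² distribution is proposed. The noisy speech signal is decomposed into critical bands, the subband signals are assumed to follow a χ² distribution, and in each subband the noise is estimated with an improved minima controlled recursive averaging (IMCRA) method based on the χ² distribution. Compared with conventional IMCRA noise estimation, the subband method can exploit the perceptual characteristics of the human ear and greatly reduces the amount of computation. Experimental results show that the proposed method tracks noise well with low computational demand. A speech enhancement system using this noise estimate suppresses noise more strongly and yields better enhanced speech quality.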
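
The recursive-averaging step shared by minima controlled recursive averaging estimators can be sketched as below. This shows only the generic update driven by a speech presence probability; the χ²-based probability computation and the critical-band decomposition from the abstract are not reproduced, and the function name and default `alpha_d` are assumptions.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha_d=0.85):
    """One frame of probability-controlled recursive averaging per subband.

    speech_prob[k] near 1 freezes the estimate in subband k;
    speech_prob[k] near 0 lets it track the noisy power.
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        # Time-varying smoothing factor: alpha_d when speech is absent, 1 when present.
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * lam + (1.0 - alpha) * y2)
    return updated
```

Running the update over a few dozen critical bands rather than every DFT bin is where the computational saving mentioned in the abstract would come from.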

6.
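For single-channel speech enhancement, commonly used methods such as spectral subtraction and its improved variants cannot remove residual musical noise well and cannot balance noise suppression against speech distortion. To address this, an algorithm combining Wiener filtering with the masking effect is proposed, and its DSP implementation is described. Results show that, compared with conventional and improved spectral subtraction, the algorithm enhances speech markedly, with low distortion and almost no musical noise; its computational cost is also low, so it is easy to implement in DSP-based speech enhancement.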

7.
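To address the speech distortion and musical noise caused by noise estimation and a priori SNR estimation algorithms in speech enhancement under strong noise, an estimate of the noise amplitude, obtained from the symmetry of the statistical models of speech and noise, is used as a reference to propose a noise estimation algorithm and to improve the a priori SNR estimation algorithm. Together these form a new enhancement algorithm suited to speech enhancement in strong noise. The objective scores from simulation experiments show that at 0 dB and even -5 dB, the proposed SNR estimation algorithm effectively reduces signal distortion and leaves essentially no residual musical noise.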
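
For context, the classic decision-directed a priori SNR estimator, which is the usual baseline that improved estimators start from, can be sketched as follows. This is the standard baseline and not the improved estimator of the abstract; the function name, `alpha`, and `xi_min` are assumptions.

```python
def decision_directed_xi(prev_gain, prev_gamma, gamma, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate for one frequency bin.

    prev_gain:  spectral gain applied in the previous frame
    prev_gamma: a posteriori SNR of the previous frame
    gamma:      a posteriori SNR of the current frame
    """
    # Previous clean-speech power relative to the noise power.
    prev_clean_ratio = (prev_gain ** 2) * prev_gamma
    # Instantaneous estimate, floored at zero.
    ml_term = max(gamma - 1.0, 0.0)
    xi = alpha * prev_clean_ratio + (1.0 - alpha) * ml_term
    return max(xi, xi_min)
```

A value of `alpha` close to 1 smooths the estimate and reduces musical noise at the cost of slower tracking, which is the trade-off that estimators for strong-noise conditions try to improve.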

8.
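Subspace speech enhancement algorithm combined with the auditory masking effect   Total citations: 1 (self-citations: 0, citations by others: 1)
In the classical subspace speech enhancement algorithm, bias in the estimated speech eigenvalues causes speech distortion and musical noise. To address this, a speech enhancement algorithm combined with the auditory masking effect is proposed. The algorithm uses the masking threshold to adaptively adjust the suppression coefficients of the noise eigenvalues and, exploiting the ability of Wiener filtering to suppress musical noise, corrects these eigenvalues in parallel, finally recovering clean speech. Experimental results show that, against both white and colored noise backgrounds, the algorithm raises the SNR and reduces speech distortion and musical noise compared with the classical subspace speech enhancement algorithm.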

9.
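To address the severe speech distortion left by traditional wavelet packet speech enhancement algorithms, this paper proposes a wavelet packet speech enhancement algorithm based on an adaptive threshold and a new threshold function. In the wavelet packet domain, the algorithm windows and frames the noisy speech, computes the probability that each frame contains speech from the cross-correlation of the fast Fourier transform power spectra of adjacent frames, and then uses this speech presence probability to adjust the traditional universal wavelet packet threshold so that it is larger in non-speech frames and smaller in speech frames. This adaptive adjustment removes as much noise as possible while preserving as much speech as possible, reducing speech distortion. The paper also designs a new threshold function that overcomes the discontinuity of the traditional hard threshold function and the constant bias introduced by the soft threshold function, further reducing speech distortion. Extensive simulation experiments were carried out with speech and noise from the TIMIT and NOISEX-92 databases. Both subjective and objective evaluations show that the proposed algorithm enhances speech better than two existing algorithms, with less distortion and better perceived quality.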
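
The two shortcomings named in this abstract are easy to see in code. The sketch below gives the standard universal threshold and the hard and soft threshold functions; `adaptive_threshold` is a purely hypothetical scaling rule (including its `floor` parameter) that only illustrates "larger in non-speech frames, smaller in speech frames", and the paper's new threshold function is not reproduced.

```python
import math

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))

def hard_threshold(w, thr):
    """Keep or kill: discontinuous at |w| = thr."""
    return w if abs(w) > thr else 0.0

def soft_threshold(w, thr):
    """Shrink toward zero: continuous, but biased by thr for large |w|."""
    if abs(w) <= thr:
        return 0.0
    return math.copysign(abs(w) - thr, w)

def adaptive_threshold(base_thr, speech_prob, floor=0.2):
    """Hypothetical rule: scale the threshold down as speech becomes likely."""
    return base_thr * (floor + (1.0 - floor) * (1.0 - speech_prob))
```

With a threshold of 2, a coefficient of 3 stays at 3 under hard thresholding but becomes 1 under soft thresholding, which is the constant bias the abstract refers to.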

10.
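A robust speech recognition method based on noise cancellation and cepstral mean subtraction   Total citations: 1 (self-citations: 0, citations by others: 1)
A noise-robust speech recognition method based on a speech enhancement algorithm is proposed. In the preprocessing stage of speech recognition, noise-cancellation speech enhancement suppresses noise and raises the SNR. Mel-frequency cepstral feature parameters are then extracted from the enhanced speech, and cepstral mean subtraction is applied in the cepstral domain to compensate for distortion components and residual noise in the enhanced speech. Experimental results show that at low SNR (-12 to 0 dB) the method achieves a good recognition rate for spoken digit recognition, and clearly outperforms a baseline Mel-frequency cepstral parameter recognizer, conventional spectral subtraction, and noise-cancellation speech enhancement.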
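
Cepstral mean subtraction itself is a short operation: the per-coefficient mean over all frames of an utterance is removed from every frame. A minimal sketch, assuming the cepstral features are already available as a list of equal-length frames (the function name is illustrative):

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over time from each cepstral frame."""
    if not frames:
        return []
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(f[c] for f in frames) / len(frames) for c in range(n_coeffs)]
    return [[f[c] - means[c] for c in range(n_coeffs)] for f in frames]
```

Because a slowly varying channel or residual distortion appears as an additive offset in the cepstral domain, removing the mean cancels much of it.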

11.
This paper presents a robust algorithm for a voice activity detector (VAD) based on a generalized autoregressive conditional heteroscedasticity (GARCH) filter, the variance gamma distribution (VGD), and an adaptive threshold function. GARCH models are statistical methods used especially for economic time series. There is a consensus that speech signals exhibit variances that change through time, and GARCH models are a popular choice for modeling such changing variances. The speech signal is assumed to follow a VGD because the VGD has heavier tails than the Gaussian distribution (GD). The noise signal is assumed to be Gaussian. In the proposed method, heteroscedasticity is modeled by GARCH, and the parameters of the distributions are then estimated recursively. Finally, a hard decision is obtained by comparing a multiple observation likelihood ratio test (MOLRT) with an adaptive threshold function. The simulation results show that the proposed VAD is able to operate down to -5 dB and in nonstationary environments.
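
The time-varying variance that a GARCH model tracks can be illustrated with the GARCH(1,1) recursion. The abstract does not state the model order, so the (1,1) form, the parameter names, and the initialization below are assumptions for illustration only.

```python
def garch11_variance(residuals, omega, alpha, beta, initial_variance):
    """Conditional variances of a GARCH(1,1) process.

    variance[t] = omega + alpha * residuals[t-1]**2 + beta * variance[t-1],
    with variance[0] = initial_variance. Returns one value per residual.
    """
    if not residuals:
        return []
    variances = [initial_variance]
    for eps in residuals[:-1]:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    return variances
```

A large residual raises the predicted variance of the next sample, which is how the model follows the bursts of energy typical of speech.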

12.
董胡, 钱盛友. 《微计算机信息》, 2007, 23(30): 228-229, 264
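A new speech endpoint detection method based on wavelets and time-frequency decomposition is proposed. The noisy signal is first enhanced by wavelet decomposition, and the denoised signal is then decomposed in time and frequency with the matching pursuit algorithm, giving the signal a distinct Wigner energy distribution on the time-frequency plane. Finally, a suitable threshold is set using this property to detect speech endpoints. Experimental results show that the method remains effective for endpoint detection at low SNR.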

13.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.

15.
This paper proposes a speech enhancement approach that statistically determines an adaptive threshold from the Teager energy operated wavelet packet (WP) coefficients of noisy speech. The threshold is then applied to the WP coefficients of the noisy speech through a modified hard thresholding function. Extensive simulations in the presence of different noises indicate that the method is very effective at reducing both white and colored noise in speech, yielding enhanced speech of better quality. Several standard objective measures and subjective observations show that the proposed method outperforms recent state-of-the-art thresholding-based approaches from high to low SNRs.
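
The discrete Teager energy operator referred to here has a simple three-sample form, sketched below for a one-dimensional sequence such as the WP coefficients of one subband. The function name is illustrative, and the paper's statistical threshold derivation is not reproduced.

```python
def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Defined for interior samples only, so the output has len(x) - 2 values.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

The operator responds to both amplitude and local frequency, so it tends to emphasize speech-dominated coefficients over noise-like ones before a threshold is chosen.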

16.
This paper proposes a method for enhancing speech signals contaminated by room reverberation and additive stationary noise. The following conditions are assumed. 1) Short-time spectral components of speech and noise are statistically independent Gaussian random variables. 2) A room's convolutive system is modeled as an autoregressive system in each frequency band. 3) The short-time power spectral density of speech is modeled as an all-pole spectrum, while that of noise is assumed to be time-invariant and known in advance. Under these conditions, the proposed method estimates the parameters of the convolutive system and those of the all-pole speech model by maximum likelihood estimation. The estimated parameters are then used to calculate the minimum mean square error estimates of the speech spectral components. The proposed method has two significant features. 1) The parameter estimation part performs noise suppression and dereverberation alternately. 2) Noise-free reverberant speech spectrum estimates, which are transferred by the noise suppression process to the dereverberation process, are represented in the form of a probability distribution. This paper reports the experimental results of 1500 trials conducted using 500 different utterances. The reverberation time RT60 was 0.6 s, and the reverberant signal-to-noise ratio was 20, 15, or 10 dB. The experimental results show the superiority of the proposed method over the sequential performance of the noise suppression and dereverberation processes.

17.
Voice activity detection (VAD) is essential for processing with multiple microphone arrays, in which a large number of potential devices, such as microphone devices for far-field voice-based interaction in smart home environments, are activated when sound sources appear. Because sound source activity is sparse, VAD can save a great deal of computing resources in massive microphone array processing. However, it may not be feasible to obtain an accurate VAD in harsh environments, such as far-field conditions and time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to obtain labeled data for initializing a Gaussian mixture model (GMM), which is used to fit the log-energy distribution of noise and (noisy) speech. Finally, combining the LTSI and the GMM parameters of the noise and speech distributions, this paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that the proposed VAD method yields a remarkable improvement for a massive microphone network.

18.
何志勇, 朱忠奎. 《计算机应用》, 2011, 31(12): 3441-3445
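The goal of speech enhancement is to extract clean speech from a noisy signal. In some environments clean speech is corrupted by impulse noise, whose time-domain distribution makes enhancement difficult, so traditional methods struggle to achieve satisfactory results. A new method is proposed for speech enhancement in stationary impulse noise. The method finds the frequency band in which the ratio of the energy of impulse noise samples to the energy of noisy signal samples is largest, and uses the energy distribution in that band to decide frame by frame whether the speech signal is corrupted by impulse noise. The Kalman filtering algorithm is then applied for denoising only in the corrupted frames, and the autoregressive (AR) model parameter estimation used in the conventional algorithm is improved. In the experiments, speech signals were corrupted with white and colored impulse noise, and signals with low input SNR were enhanced. The results show that the proposed algorithm significantly improves the SNR and suppresses impulse noise.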

19.
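A wavelet entropy adaptive threshold denoising method is proposed for speech signal denoising. The noisy speech signal is first decomposed with the wavelet transform, and the wavelet entropy of each subband interval of the decomposed signal is computed. The wavelet entropy is then combined with an adaptive threshold to set the threshold for the high-frequency coefficients at each level, a compromise exponential threshold function is applied to denoise those coefficients, and the denoised speech signal is reconstructed. Finally, the performance of the wavelet entropy adaptive threshold, minimax threshold, fixed threshold, and unbiased risk threshold denoising methods is compared. Experimental results show that at an input SNR of 5 dB the wavelet entropy adaptive threshold method gives the highest output SNR, and its input-output SNR curve lies above those of the other three methods, confirming that the algorithm denoises better.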
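
One common definition of wavelet entropy is the Shannon entropy of the relative energies of the subband intervals. The sketch below assumes that definition and takes precomputed subband energies as input; the abstract does not give its exact formula, so this may differ from the paper's, and the function name is illustrative.

```python
import math

def wavelet_entropy(subband_energies):
    """Shannon entropy (natural log) of the relative subband energies."""
    total = sum(subband_energies)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for e in subband_energies:
        p = e / total
        if p > 0.0:  # 0 * log(0) is taken as 0
            entropy -= p * math.log(p)
    return entropy
```

Energy spread evenly across intervals, as with broadband noise, gives high entropy, while energy concentrated in a few intervals gives low entropy, which is the cue an adaptive threshold can use.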

20.
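A new threshold-based speech denoising algorithm in the wavelet domain is proposed. The noisy speech is decomposed with wavelet packets, which overcomes the shortcomings of the traditional orthogonal wavelet transform. An adaptive threshold is used to remove the largest amount of noise at each scale while retaining the useful signal, which further raises the SNR. Simulation experiments show that the method denoises better.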
