1.

Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE–STSA estimation in various noise environments

Hacı Ergun 《Digital Signal Processing》2008,18(5):797-812

In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank with minimum mean square error–short time spectral amplitude (MMSE–STSA) estimation modified according to speech presence uncertainty. The perceptual filterbank is designed by adjusting the undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psychoacoustic model of the human auditory system. The modified MMSE–STSA estimator is used to estimate speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual speech quality, and low computational load. The MMSE–STSA estimator relies on the a priori SNR, a key parameter that is estimated with the "decision-directed" method; when correctly tuned, this method provides a trade-off between noise reduction and signal distortion. Experiments were conducted for various noise types. The results of the proposed method were compared with those of other popular methods, namely Wiener estimation and MMSE–log spectral amplitude (MMSE–LSA) estimation in the frequency domain. To test performance, three objective quality measures (SNR, segmental SNR (segSNR), and Itakura–Saito distance (ISd)) were computed for various noise types and SNRs. The experimental and objective results demonstrated the performance of the proposed system, which provided sufficient noise reduction, good intelligibility, and good perceptual quality without causing considerable signal distortion or musical background noise.
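The decision-directed a priori SNR estimate mentioned in this abstract can be sketched for a single frequency bin as follows. This is a minimal illustration of the commonly used form of the rule, not the paper's implementation; the function name, the smoothing constant `alpha`, and the floor `xi_min` are assumptions chosen for the example.

```python
def decision_directed_snr(noisy_power, noise_power, prev_clean_power,
                          alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate for one bin of the current frame.

    noisy_power      -- |Y|^2 of the current noisy coefficient
    noise_power      -- estimated noise power for this bin
    prev_clean_power -- |X_hat|^2 of the previous frame's enhanced coefficient
    alpha, xi_min    -- assumed tuning values, not taken from the paper
    """
    gamma = noisy_power / noise_power      # a posteriori SNR
    ml_term = max(gamma - 1.0, 0.0)        # instantaneous SNR estimate
    xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
    return max(xi, xi_min)                 # floor limits musical noise
```

A value of `alpha` close to 1 smooths the estimate (less musical noise, more smearing of speech onsets), while a smaller value tracks changes faster; this is the trade-off the abstract refers to.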

2.

Speech enhancement algorithm based on wavelet packets and adaptive Wiener filtering

董胡徐雨明马振中李列文任可《计算机技术与发展》2020,(1):50-53

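Speech enhancement is mainly used to improve the intelligibility and quality of speech corrupted by noise, and its main application is improving mobile communication quality in noisy environments. Traditional speech enhancement methods include spectral subtraction, Wiener filtering, and wavelet coefficient methods. To address the poor speech quality and residual musical noise left by traditional enhancement algorithms in complex noise environments, a speech enhancement algorithm combining the wavelet packet transform with adaptive Wiener filtering is proposed. The role of wavelet packet multiresolution analysis in partitioning the signal spectrum is analyzed; the noisy signal is decomposed at multiple scales with wavelet packets, adaptive Wiener filtering is applied to the wavelet packet coefficients at each scale, and the filtered coefficients are used for reconstruction to obtain the enhanced speech signal. Simulation results show that, compared with traditional enhancement algorithms, the proposed algorithm not only raises the SNR of noisy speech more effectively in low-SNR nonstationary noise environments but also better preserves the spectral characteristics of speech, improving the quality of the noisy speech.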
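The per-band Wiener step can be illustrated with a short sketch. It assumes the wavelet packet coefficients of one subband are already available as a list and that a noise variance estimate for that band is given; the function name and the simple power-based signal estimate are illustrative assumptions, not the adaptive scheme of the paper.

```python
def wiener_shrink(coeffs, noise_var):
    """Scale one subband's coefficients by a Wiener gain s / (s + noise_var).

    The signal variance s is estimated (as an assumption) by subtracting
    the noise variance from the mean power of the noisy coefficients.
    """
    if not coeffs:
        return []
    band_power = sum(c * c for c in coeffs) / len(coeffs)
    signal_var = max(band_power - noise_var, 0.0)
    gain = signal_var / (signal_var + noise_var) if noise_var > 0 else 1.0
    return [gain * c for c in coeffs]
```

Bands dominated by noise receive a gain near zero, while bands with strong speech energy pass almost unchanged.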

3.

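Multitaper spectrum estimation speech enhancement algorithm under wavelet packet decomposition  Total citations: 1, self-citations: 0, citations by others: 1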

查诚杨平潘平《计算机工程》2012,38(5):291-292

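Traditional spectral subtraction is a single-resolution algorithm based on the short-time Fourier transform and has a large variance. To address this, a multitaper spectrum estimation speech enhancement algorithm based on wavelet packet decomposition is proposed. The noisy speech is decomposed into different frequency bands by wavelet packets, multitaper spectral subtraction is performed in each band, and the bands are reconstructed one by one through wavelet packet reconstruction to obtain the denoised speech signal. Simulation results show that the algorithm raises the SNR of noisy speech and reduces speech distortion.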

4.

Speech enhancement based on stationary bionic wavelet transform and maximum a posterior estimator of magnitude-squared spectrum

Talbi Mourad 《International Journal of Speech Technology》2017,20(1):75-88

Numerous efforts have focused on reducing the impact of noise on the performance of speech systems such as speech coding, speech recognition, and speaker recognition. These approaches consider alternative speech features, improved speech modeling, or alternative training for acoustic speech models. In this paper, we propose a new speech enhancement technique that integrates a newly proposed wavelet transform, which we call the stationary bionic wavelet transform (SBWT), with the maximum a posteriori estimator of the magnitude-squared spectrum (MSS-MAP). The SBWT is introduced to solve the perfect reconstruction problem associated with the bionic wavelet transform. MSS-MAP estimation is used to estimate speech in the SBWT domain. The experiments were conducted for various noise types and different speech signals. The results of the proposed technique were compared with those of other popular methods such as Wiener filtering and MSS-MAP estimation in the frequency domain. To test performance, four objective quality measures (signal-to-noise ratio (SNR), segmental SNR, Itakura–Saito distance, and perceptual evaluation of speech quality) were computed for various noise types and SNRs. The experimental and objective results demonstrated the performance of the proposed technique, which provided sufficient noise reduction, good intelligibility, and good perceptual quality without causing considerable signal distortion or musical background noise.

5.

Speech enhancement using Teager energy operated ERB-like perceptual wavelet packet decomposition

Anirban Bhowmick Mahesh Chandra Astik Biswas 《International Journal of Speech Technology》2017,20(4):813-827

In the recent past, wavelet packet (WP) based speech enhancement techniques have been gaining popularity because of their inherent noise-minimizing nature. WP-based techniques have proved more robust and efficient than short-time Fourier transform based methods. In the present work, a speech enhancement method using Teager energy operated equivalent rectangular bandwidth (ERB)-like WP decomposition is proposed. A twenty-four sub-band perceptual wavelet packet decomposition (PWPD) structure is implemented according to the auditory ERB scale. The ERB-scale decomposition structure is used because the central frequency of the ERB scale distribution is similar to the frequency response of the human cochlea. The Teager energy operator is applied to estimate the threshold value for the PWPD coefficients. Lastly, Wiener filtering is applied to remove low-frequency noise before the final reconstruction stage. The proposed method has been evaluated on a Hindi sentence database corrupted with six noise conditions. Its performance is analysed with respect to several speech quality parameters and output signal-to-noise ratio levels. The results indicate that the proposed technique outperforms some traditional speech enhancement algorithms at all SNR levels.
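For reference, the discrete Teager energy operator named in this abstract is usually written as ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below computes only that operator on a list of samples or coefficients; how the paper turns the result into a threshold is not stated in the abstract and is not shown here. The function name is an illustrative choice.

```python
def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for the interior samples only (indices 1 .. len(x)-2),
    so the output is two elements shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

Because the operator responds to both amplitude and instantaneous frequency, it tends to emphasize speech-like coefficients over broadband noise.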

6.

Speech denoising method with Bark-subband wavelet packet adaptive thresholding

田玉静左红伟董玉民魏德生《计算机应用》2010,30(11):3111-3114

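To overcome the loss of weak unvoiced components that speech enhancement causes at low input SNR, which distorts the reconstructed signal, a new speech enhancement method is proposed. The method uses wavelet packets to fit the critical bands of the speech perception model, separates unvoiced and voiced speech according to subband energy, and then applies 8-level and 4-level wavelet packet decomposition to the unvoiced and voiced signals, respectively. For threshold calculation it uses a Bark-subband wavelet packet adaptive node threshold algorithm that tracks the noise level in each Bark subband in real time, effectively protecting the weak mid- and high-frequency components of unvoiced speech and reducing distortion. Simulation comparisons with traditional speech enhancement methods confirm that the method has a clear advantage at low input SNR, with high output SNR and low speech distortion. Combining the method with spectral subtraction for a second stage of enhancement further improves the quality of the enhanced speech.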

7.

Research on speech enhancement based on the Hilbert-Huang transform

夏敏磊徐振俞祁焰范影乐《计算机工程与应用》2010,46(17):139-141

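A speech enhancement method based on the Hilbert-Huang transform is proposed. First, empirical mode decomposition (EMD) is used and suitable intrinsic mode functions are selected to perform preliminary denoising of the noisy speech; then over-subtraction factors are determined according to the SNR and spectral subtraction is carried out. Experimental results show that, compared with traditional spectral subtraction, the output SNR of the method is more than 5 dB higher, with an even more pronounced improvement under nonstationary noise. The features obtained from the Hilbert-Huang transform describe the nonlinear and nonstationary characteristics of speech signals fairly effectively.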
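The abstract does not say how the over-subtraction factor depends on the SNR. As an assumption, the sketch below uses the widely cited linear rule from classical over-subtraction (factor 4 at 0 dB, falling to 1 at 20 dB, clamped outside −5 to 20 dB) together with a spectral floor; the function names and the `floor` value are illustrative.

```python
def oversubtraction_factor(snr_db, alpha0=4.0):
    """Assumed linear rule: alpha = alpha0 - (3/20) * SNR, SNR clamped to [-5, 20] dB."""
    snr_db = min(max(snr_db, -5.0), 20.0)
    return alpha0 - 3.0 * snr_db / 20.0


def spectral_subtract(noisy_power, noise_power, snr_db, floor=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    noisy_power, noise_power -- per-bin power spectra (lists of equal length)
    floor                    -- fraction of the noise power kept as a minimum
    """
    alpha = oversubtraction_factor(snr_db)
    return [max(y - alpha * n, floor * n)
            for y, n in zip(noisy_power, noise_power)]
```

Subtracting more at low SNR suppresses residual noise peaks, and the floor keeps bins from dropping to zero, which is a common way to limit musical noise.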

8.

Speech enhancement using subband spectral subtraction

蔡宇郝程鹏侯朝焕《计算机应用》2014,34(2):567-571

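To suppress environmental noise in speech signals, a speech enhancement method that performs noise suppression by subband spectral subtraction is proposed. First, a filter bank divides the time-domain signal into several frequency bands (subbands); then, in each subband, an improved spectral subtraction technique is applied independently for speech enhancement. Because the background noise in real environments is in most cases not uniformly distributed over frequency, performing noise estimation and spectral subtraction separately in different bands is more targeted and more accurate. Experiments on real speech show that the proposed method suppresses noise while preserving the structure of speech well, giving the enhanced speech greater listening comfort and intelligibility.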

9.

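A new threshold-based speech enhancement algorithm in the wavelet domain  Total citations: 1, self-citations: 0, citations by others: 1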

徐爽韩芳芳郑德忠《传感技术学报》2004,17(1):150-153

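A new threshold-based wavelet-domain speech enhancement algorithm is proposed. The noisy speech is decomposed with Bark-scale wavelet packets to model the auditory characteristics of the human ear. A node threshold method is adopted, with the node noise estimated by a spectral-entropy-based method. Experiments show that the algorithm enhances speech well under many kinds of noise, especially colored and nonstationary noise.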

10.

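Speech enhancement method based on the wavelet transform and Kalman filtering  Total citations: 1, self-citations: 0, citations by others: 1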

张恩东黄文浩《模式识别与人工智能》2009,22(1):28-31

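For speech signals corrupted by additive noise, an effective speech enhancement method is proposed that applies Kalman filtering based on the wavelet transform. Problems encountered in practice, such as the dyadic wavelet transform, filter parameter estimation, and Kalman filter divergence, are analyzed. The enhancement is evaluated by SNR. Simulations show that the method is effective when the additive noise is either white Gaussian noise or colored noise.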

11.

A new threshold-based speech denoising algorithm in the wavelet domain

付炜许山川《计算机与数字工程》2005,33(11):80-83

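A new threshold-based wavelet-domain speech denoising algorithm is proposed. The noisy speech is decomposed with wavelet packets, which overcomes the shortcomings of the traditional orthogonal wavelet transform. An adaptive threshold is used to remove as much noise as possible at each scale while retaining the useful signal, which further raises the SNR. Simulations show that the method denoises better.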

12.

Bionic wavelet speech enhancement based on subband spectral entropy

刘艳倪万顺《计算机应用》2015,35(3):868-871

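Front-end noise processing directly affects the accuracy and stability of speech recognition. Because the signal separated by wavelet denoising algorithms is not the best estimate of the original signal, a bionic wavelet transform (BWT) denoising algorithm based on subband spectral entropy is proposed. It exploits the accuracy of subband-spectral-entropy endpoint detection to distinguish noisy speech segments from noise-only segments, updates the threshold in the bionic wavelet transform in real time, and accurately identifies the wavelet coefficients of the noise, thereby enhancing the speech. Experimental results show that, compared with Wiener filtering, the proposed method improves the signal-to-noise ratio (SNR) by about 8% on average and significantly enhances speech in noisy environments.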
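The spectral entropy that underlies this kind of endpoint detection can be sketched as below. The example treats the normalized power spectrum of one frame (or one subband) as a probability distribution and computes its Shannon entropy; the function name is illustrative, and the decision threshold that separates speech from noise is left out because the abstract does not give one.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a power spectrum normalized to sum to 1.

    A flat, noise-like spectrum gives high entropy; a peaky, speech-like
    spectrum concentrated in a few bins gives low entropy.
    """
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    probs = [p / total for p in power_spectrum if p > 0]
    return -sum(p * math.log(p) for p in probs)
```

Frames whose entropy stays near the maximum, log(number of bins), are candidates for noise-only segments in which the noise estimate or threshold can be updated.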

13.

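Wavelet packet speech enhancement algorithm with a new threshold function  Total citations: 1, self-citations: 1, citations by others: 0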

任永梅张雪英贾海蓉《计算机应用研究》2013,30(1):114-116

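To address the distortion in speech enhanced by traditional soft and hard threshold denoising, a wavelet packet speech enhancement algorithm with a new threshold function is proposed, together with the new threshold function and a new Bark-scale wavelet packet decomposition structure. Where the absolute value of a wavelet packet coefficient exceeds the given threshold, the new threshold function flexibly combines the soft and hard threshold functions; where it is below the threshold, a nonlinear function replaces the simple zeroing of traditional threshold functions, giving the threshold function a smooth transition. The new 60-band Bark-scale wavelet packet decomposition structure models the auditory perception of the human ear better. Simulation results show that, under white Gaussian noise and colored noise, the new algorithm raises the SNR and segmental SNR of the enhanced speech and reduces speech distortion compared with traditional soft and hard threshold denoising, giving better denoising.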
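The two traditional threshold functions that serve as the baseline here are standard and can be written in a few lines. The sketch shows only these baselines; the new threshold function itself is not specified in the abstract, so it is not reproduced. Function names are illustrative.

```python
import math


def hard_threshold(c, t):
    """Keep the coefficient unchanged if |c| > t, otherwise set it to zero."""
    return c if abs(c) > t else 0.0


def soft_threshold(c, t):
    """Zero the coefficient if |c| <= t, otherwise shrink its magnitude by t."""
    if abs(c) <= t:
        return 0.0
    return math.copysign(abs(c) - t, c)
```

The hard function jumps from 0 to ±t at the threshold, while the soft function is continuous but reduces every retained coefficient by the constant t; these are the two drawbacks that modified threshold functions try to balance.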

14.

A novel fast nonstationary noise tracking approach based on MMSE spectral power estimator

《Digital Signal Processing》2019

Estimating the noise power spectral density (PSD) from the corrupted speech signal is an essential component of speech enhancement algorithms. In this paper, a novel noise PSD estimation algorithm based on minimum mean-square error (MMSE) is proposed. The noise PSD estimate is obtained by recursively smoothing the MMSE estimate of the current noise spectral power. For the noise spectral power estimation, a spectral weighting function is derived that depends on the a priori signal-to-noise ratio (SNR). Since the speech spectral power is highly important for the a priori SNR estimate, this paper proposes an MMSE spectral power estimator incorporating speech presence uncertainty (SPU) for the speech spectral power estimate to improve the a priori SNR estimate. Moreover, a bias correction factor is derived to compensate for the bias of the speech spectral power estimate. The estimated speech spectral power is then used in the "decision-directed" (DD) estimator of the a priori SNR to achieve fast noise tracking. Compared with three state-of-the-art approaches, i.e., minimum statistics (MS), the MMSE-based approach, and the speech presence probability (SPP)-based approach, experimental results show that the proposed algorithm tracks noise better under various nonstationary noise environments and SNR conditions. When employed in a speech enhancement system, improved performance in terms of segmental SNR improvement (SSNR+) and perceptual evaluation of speech quality (PESQ) can be observed.

15.

Wavelet based speech presence probability estimator for speech enhancement

Daniel Pak-Kong Lun Tak-Wai Shen Tai-Chiu Hsung Dominic K.C. Ho 《Digital Signal Processing》2012,22(6):1161-1173

A reliable speech presence probability (SPP) estimator is important to many frequency-domain speech enhancement algorithms. It is known that a good estimate of SPP can be obtained by having a smooth a posteriori signal-to-noise ratio (SNR) function, which can be achieved by reducing the noise variance when estimating the speech power spectrum. Recently, wavelet denoising combined with multitaper spectrum (MTS) estimation was suggested for this purpose. However, traditional approaches directly use a wavelet shrinkage denoiser that has not been fully optimized for denoising the MTS of noisy speech signals. In this paper, we first propose a two-stage wavelet denoising algorithm for estimating the speech power spectrum. In the first stage, we apply the wavelet transform to the periodogram of a noisy speech signal. Using the resulting wavelet coefficients, an oracle is developed to indicate the approximate locations of the noise floor in the periodogram. In the second stage, we use this oracle to selectively remove the wavelet coefficients of the noise floor in the log MTS of the noisy speech. The remaining wavelet coefficients are then used to reconstruct a denoised MTS and, in turn, to generate a smooth a posteriori SNR function. To adapt to the enhanced a posteriori SNR function, we further propose a new method to estimate the generalized likelihood ratio (GLR), which is an essential parameter for SPP estimation. Simulation results show that the new SPP estimator outperforms the traditional approaches and improves both the quality and the intelligibility of the enhanced speech.

16.

Noise Tracking Using DFT Domain Subspace Decompositions

Hendriks R.C. Jensen J. Heusdens R. 《IEEE transactions on audio, speech, and language processing》2008,16(3):541-553

All discrete Fourier transform (DFT) domain-based speech enhancement gain functions rely on knowledge of the noise power spectral density (PSD). Since the noise PSD is unknown in advance, estimation from the noisy speech signal is necessary. An overestimation of the noise PSD will lead to a loss in speech quality, while an underestimation will lead to an unnecessarily high level of residual noise. We present a novel approach for noise tracking, which updates the noise PSD for each DFT coefficient in the presence of both speech and noise. This method is based on the eigenvalue decomposition of correlation matrices that are constructed from time series of noisy DFT coefficients. The presented method is well capable of tracking gradually changing noise types. In comparison to state-of-the-art noise tracking algorithms, the proposed method reduces the estimation error between the estimated and the true noise PSD. In combination with an enhancement system, the proposed method improves the segmental SNR by several decibels for gradually changing noise types. Listening experiments show that the proposed system is preferred over the state-of-the-art noise tracking algorithm.

17.

A pitch detection algorithm based on wavelet packet transform weighted autocorrelation

孙婷婷章小兵《计算机工程与科学》2017,39(8):1525-1529

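Pitch detection in noisy environments plays an important role in speech signal processing. To extract the pitch period of speech effectively at low SNR, a detection method based on wavelet-packet-transform-weighted linear prediction autocorrelation is proposed. The method first removes noise with wavelet packet adaptive thresholding, sums the approximation components of the multilevel wavelet packet transform to emphasize pitch information, and uses the autocorrelation of the linear prediction error weighted by wavelet packet coefficients to emphasize the peak at the pitch period, improving the accuracy of pitch period detection. Experimental results show that, compared with the traditional autocorrelation method and the wavelet-weighted autocorrelation method, the method is robust, gives a smooth pitch contour, and is more accurate, achieving fairly good results even at an SNR of -5 dB.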
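The traditional autocorrelation method used as a baseline in this abstract can be sketched as follows. This shows only the plain autocorrelation peak search, without the wavelet packet weighting or linear prediction steps of the proposed method; the function name and the lag-range parameters are illustrative assumptions.

```python
def autocorr_pitch_period(x, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation.

    x                -- one frame of samples
    min_lag, max_lag -- search range, normally set from the expected pitch
                        range and the sampling rate
    """
    def autocorr(lag):
        # Unnormalized autocorrelation over the overlapping part of the frame
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    return max(range(min_lag, max_lag + 1), key=autocorr)
```

In noise, spurious peaks in the autocorrelation can exceed the true pitch peak, which is the weakness that weighting schemes such as the one in this abstract aim to reduce.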

18.

Tracking of Nonstationary Noise Based on Data-Driven Recursive Noise Power Estimation

Erkelens J.S. Heusdens R. 《IEEE transactions on audio, speech, and language processing》2008,16(6):1112-1123

This paper considers estimation of the noise spectral variance from speech signals contaminated by highly nonstationary noise sources. The method can accurately track fast changes in noise power level (up to about 10 dB/s). In each time frame, for each frequency bin, the noise variance estimate is updated recursively with the minimum mean-square error (MMSE) estimate of the current noise power. A time- and frequency-dependent smoothing parameter is used, which is varied according to an estimate of speech presence probability. In this way, the amount of speech power leaking into the noise estimates is kept low. For the estimation of the noise power, a spectral gain function is used, which is found by an iterative data-driven training method. The proposed noise tracking method is tested on various stationary and nonstationary noise sources, for a wide range of signal-to-noise ratios, and compared with two state-of-the-art methods. When used in a speech enhancement system, improvements in segmental signal-to-noise ratio of more than 1 dB can be obtained for the most nonstationary noise sources at high noise levels.
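The recursive update with a speech-presence-dependent smoothing parameter can be sketched for one time-frequency bin as below. The linear mapping from speech presence probability to the smoothing parameter and the value of `beta_min` are assumptions made for illustration; the paper's data-driven gain function is not reproduced.

```python
def update_noise_psd(prev_noise_psd, noise_power_estimate, speech_prob,
                     beta_min=0.8):
    """One recursive noise PSD update for a single time-frequency bin.

    speech_prob -- estimated speech presence probability in [0, 1]
    beta_min    -- assumed smoothing value used when speech is absent
    """
    # More smoothing (beta closer to 1) when speech is likely present,
    # so that speech power leaks less into the noise estimate.
    beta = beta_min + (1.0 - beta_min) * speech_prob
    return beta * prev_noise_psd + (1.0 - beta) * noise_power_estimate
```

With `speech_prob` equal to 1 the previous estimate is held unchanged, and with `speech_prob` equal to 0 the estimate moves toward the new observation at the fastest rate allowed by `beta_min`.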

19.

Research on wavelet packet speech enhancement based on a new threshold function and adaptive thresholding

刘冲冲邹翔周正仙《计算机应用研究》2017,34(11)

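To address the severe distortion of speech enhanced by traditional wavelet packet speech enhancement algorithms, this paper proposes a wavelet packet speech enhancement algorithm based on an adaptive threshold and a new threshold function. The algorithm windows the noisy speech into frames in the wavelet packet domain, computes the probability that each frame contains speech from the cross-correlation of the fast Fourier transform power spectra of adjacent frames, and then uses this speech presence probability to adjust the traditional universal wavelet packet threshold so that the threshold is larger in non-speech frames and smaller in speech frames. This adaptive adjustment removes as much noise as possible while preserving as much speech as possible, reducing speech distortion. The paper also designs a new threshold function that overcomes the discontinuity of the traditional hard threshold function and the constant bias introduced by the soft threshold function, further reducing speech distortion. Extensive simulations were carried out with speech and noise from the TIMIT and NOISEX-92 databases; both subjective and objective evaluations show that the proposed algorithm enhances speech better than two existing algorithms, yielding less distortion and better perceived quality.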

20.

MMSE spectral subtraction speech enhancement algorithm under the Laplacian distribution

王永彪张文喜王亚慧孔新新吕彤《计算机应用》2020,40(3):878-882

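To address the residual noise and speech distortion in speech enhanced by spectral subtraction algorithms based on the Gaussian distribution, a minimum mean square error (MMSE) spectral subtraction algorithm based on the Laplacian distribution is proposed. First, the original noisy speech signal is framed and windowed, and the Fourier transform of each processed frame gives the discrete Fourier transform (DFT) coefficients of the short-time speech. Next, the log spectral energy and spectral flatness of each frame are computed to detect noise frames and update the noise estimate. Then, assuming the speech DFT coefficients follow a Laplacian distribution, the optimal spectral subtraction coefficient is derived under the MMSE criterion and used for spectral subtraction to obtain the enhanced signal spectrum. Finally, the enhanced spectrum is inverse Fourier transformed and the frames are reassembled to obtain the enhanced speech. Experimental results show that the proposed algorithm raises the speech signal-to-noise ratio (SNR) by 4.3 dB on average, 2 dB more than over-subtraction; in perceptual evaluation of speech quality (PESQ) scores, it averages 10% higher than over-subtraction. The algorithm suppresses noise better with less speech distortion and improves substantially on the SNR and PESQ measures.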
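The spectral flatness measure used in the noise-frame detection step can be sketched as the ratio of the geometric mean to the arithmetic mean of a frame's power spectrum. The function name and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions; the detection thresholds used in the paper are not given in the abstract.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a spectrum dominated by a few strong components.
    """
    n = len(power_spectrum)
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(p + eps for p in power_spectrum) / n
    return math.exp(log_mean) / arithmetic_mean
```

A frame with high flatness and low log energy is a natural candidate for a noise-only frame, in which the noise estimate can be updated.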