Similar literature
1.
Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two-phase approach for robust speech recognition in such environments. First, a front-end subband speech enhancement stage with adaptive noise estimation (ANE) filters the noisy speech. The whole noisy speech spectrum is partitioned into eighteen dissimilar subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a nonparametric frequency-domain algorithm based on human perception, and the back end builds a robust classifier to recognize the utterance. A suite of experiments evaluates the performance of the speech recognizer in a variety of real environments, with and without the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level and over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.
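The Bark-scale partition mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the band edges, so everything here is an assumption: Traunmüller's closed-form approximation of the Bark scale, an 8 kHz sampling rate, a 256-point FFT, and the hypothetical helper names `hz_to_bark`, `bark_band_index`, and `group_bins_by_band`.

```python
import math

def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

def bark_band_index(freq_hz):
    """Integer Bark band of a frequency, clamped so that 0 Hz falls in band 0."""
    return max(0, int(math.floor(hz_to_bark(freq_hz))))

def group_bins_by_band(sample_rate, fft_size):
    """Map each non-negative-frequency FFT bin to its Bark band."""
    bands = {}
    for k in range(fft_size // 2 + 1):
        freq = k * sample_rate / fft_size
        bands.setdefault(bark_band_index(freq), []).append(k)
    return bands

# Assumed settings: telephone-band speech sampled at 8 kHz, 256-point FFT.
bands = group_bins_by_band(sample_rate=8000, fft_size=256)
for index in sorted(bands):
    print(index, bands[index][0], bands[index][-1])
```

Under these assumed settings the bins fall into Bark bands 0 through 17, with narrow bands at low frequencies and wide bands at high frequencies. A per-band noise estimator would then operate on the summed power of each bin group.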

2.
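MMSE speech enhancement algorithm based on the auditory masking effect   Total citations: 2, self-citations: 2, citations by others: 0
To address the large speech distortion that the MMSE speech enhancement algorithm produces at low signal-to-noise ratios, an MMSE speech enhancement algorithm that incorporates the masking effect of human hearing is proposed. The algorithm uses the masking threshold to adjust the gain in the MMSE algorithm, so that the enhanced speech has less residual noise and less speech distortion. Computer simulation, with signal-to-noise ratio analysis of the speech before and after enhancement and subjective listening tests, shows that the improved MMSE algorithm not only raises the signal-to-noise ratio of the speech signal but also reduces speech distortion and improves intelligibility.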

3.
This paper addresses the problem of acoustic noise reduction and speech enhancement by adaptive filtering algorithms. Most speech enhancement methods that use an adaptive filtering structure are expressed in fullband form. One widespread structure is the forward blind source separation (FBSS) structure, which is often used to separate speech from noise and thereby enhance the speech signal at the processing output. In this paper, we propose a new subband implementation of the FBSS structure. To make the proposed structure more robust, we adapt and apply to it a new combination of criteria based on minimizing the system mismatch and the smoothing filtering errors. Combining the proposed subband structure with these optimal criteria yields a new two-channel subband forward (2CSF) algorithm that improves the convergence speed of the cross adaptive filters used to separate speech from noise. Objective tests under various environments show the good behavior of the proposed 2CSF algorithm.

4.
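Application of blind source separation in a single-channel speech enhancement algorithm   Total citations: 1, self-citations: 1, citations by others: 0
A single-channel speech enhancement algorithm is proposed. First, a hypothetical noise source is constructed from the noisy part of the received single-channel speech signal. This noise source and the noisy signal are used as inputs to the multichannel adaptive decorrelation (MAD) blind separation algorithm, which yields an enhanced speech signal. This enhanced speech is then decomposed with Daubechies wavelets, filtered in the wavelet domain with a suitably chosen threshold function, and resynthesized into a time-domain speech signal. The enhanced speech obtained by these steps has a relatively high signal-to-noise ratio and intelligibility.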
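The wavelet-domain filtering step can be sketched as follows. The abstract does not state the wavelet order, the decomposition depth, or the threshold function, so this sketch assumes a single decomposition level, the Haar wavelet (the simplest member of the Daubechies family), and soft thresholding. All function names are hypothetical.

```python
import math

def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; magnitudes below the threshold become 0."""
    magnitude = abs(value) - threshold
    if magnitude <= 0.0:
        return 0.0
    return math.copysign(magnitude, value)

def denoise(signal, threshold):
    """Threshold only the detail coefficients, then resynthesize."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, threshold) for d in detail]
    return haar_inverse(approx, detail)

# Toy input: a step signal with small perturbations standing in for noise.
noisy = [1.0, 1.1, 0.9, 1.0, 4.0, 4.1, 3.9, 4.0]
print(denoise(noisy, threshold=0.2))
```

With a threshold of zero, `denoise` returns the input up to rounding error, which is a quick check that the transform pair is consistent.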

5.
In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied separately in each subband to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score), together with spectrograms and informal listening tests, we show that the proposed speech enhancement method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
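The SNR-controlled smoothing described in this abstract can be sketched for a single subband. The abstract does not give the control function, so the sigmoid below, its parameters `slope` and `snr_center`, and the initialization from the first frame are all assumptions for illustration.

```python
import math

def update_noise_power(noise_prev, noisy_power, slope=1.0, snr_center=1.5):
    """One recursive update of the noise power estimate in a subband.

    The smoothing factor alpha is an assumed sigmoid of the estimated SNR:
    high SNR (speech likely present) pushes alpha toward 1 and holds the
    estimate, low SNR lets the estimate follow the noisy signal power.
    """
    eps = 1e-12  # guards against division by zero
    snr = noisy_power / (noise_prev + eps)
    alpha = 1.0 / (1.0 + math.exp(-slope * (snr - snr_center)))
    return alpha * noise_prev + (1.0 - alpha) * noisy_power

def track_noise(noisy_powers):
    """Run the update over a sequence of frame powers for one subband."""
    noise = noisy_powers[0]  # assumes the first frame contains noise only
    history = []
    for power in noisy_powers:
        noise = update_noise_power(noise, power)
        history.append(noise)
    return history

# Toy sequence: noise floor, a loud burst standing in for speech, noise floor.
frames = [1.0] * 20 + [100.0] * 5 + [1.0] * 20
print(track_noise(frames)[-1])
```

No speech pause detector is involved: the estimate is simply frozen while the frame power is far above the current noise estimate. One known limitation of a rule of this kind is that a large, sudden rise in the true noise level is treated as speech and tracked slowly.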

6.
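The receptive field of a convolutional neural network depends on the size of its convolution kernels. Conventional convolution uses kernels of fixed size, which limits the model's ability to perceive features. In addition, convolutional neural networks share parameters and therefore apply the same feature extraction to every sample point in a spatial region. In a noisy spectrogram, however, the noise and the clean speech are distributed differently, especially in complex noise environments, which makes it hard for conventional convolution to extract and filter speech features with high quality. To solve these problems, a multi-scale region-adaptive convolution module is proposed. Multi-scale information improves the model's feature perception, and regional convolution weights are assigned adaptively according to the feature value at each sampling point, which gives region-adaptive convolution and improves the model's ability to filter noise. Experiments on the public TIMIT dataset show that the proposed algorithm achieves better results on speech quality and intelligibility metrics.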

7.
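In complex environments, noise interference severely degrades the quality of the speech signal and prevents its meaning from being conveyed correctly, so speech enhancement is essential. Traditional speech enhancement techniques adapt poorly and converge slowly when the input signal is highly correlated. Combining the advantages of the variable step-size least mean square (VSSLMS) algorithm and decorrelation, an improved speech enhancement algorithm is proposed that optimizes the step size and the direction of the weight-vector update in adaptive filtering, which speeds up the convergence of speech denoising. The algorithm also introduces continuous block processing theory to normalize the weight vector, which improves its stability when implemented on embedded systems. Simulation tests show that the algorithm converges quickly, tracks well, effectively removes noise from heavily corrupted speech, and improves speech clarity and intelligibility.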
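The variable step-size idea can be sketched with the classical rule of Kwong and Johnston, in which the step size is driven by the squared error. This is a swapped-in textbook variant, not the algorithm of the abstract: the decorrelation of the update direction and the block normalization are omitted, and all parameter values and the function name `vss_lms` are assumptions.

```python
import random

def vss_lms(inputs, desired, num_taps, mu_init=0.05, mu_min=0.005,
            mu_max=0.05, alpha=0.97, gamma=0.01):
    """LMS with an error-driven step size (Kwong-Johnston style rule)."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent input sample first
    mu = mu_init
    errors = []
    for x, d in zip(inputs, desired):
        window = [x] + window[:-1]
        y = sum(w * v for w, v in zip(weights, window))
        e = d - y
        weights = [w + mu * e * v for w, v in zip(weights, window)]
        # Large errors raise the step size, small errors let it decay,
        # and the result is clipped to [mu_min, mu_max] for stability.
        mu = min(mu_max, max(mu_min, alpha * mu + gamma * e * e))
        errors.append(e)
    return weights, errors

# Toy system identification: recover a two-tap filter from its output.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
true_w = [0.5, -0.3]
d = [true_w[0] * x[n] + (true_w[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
w, e = vss_lms(x, d, num_taps=2)
print(w)
```

The step size starts at its upper bound for fast initial convergence and shrinks as the error falls, which trades speed early on for low steady-state misadjustment later.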

8.
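To address the loss of recognition accuracy and communication quality when a speech system is subject to strong external noise, a speech enhancement method based on adaptive noise estimation is proposed. Endpoint detection divides the speech signal into speech and non-speech segments, and the noise magnitude spectrum is estimated adaptively in each case. The assumptions in spectral subtraction that do not hold in general are examined, and the underlying formula is revised accordingly. Experimental results show that, compared with traditional spectral subtraction, the method suppresses musical noise better while maintaining high clarity and intelligibility, and improves speech recognition accuracy and communication quality in strong noise.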
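For context, the traditional spectral subtraction that this abstract uses as its baseline can be sketched in the power domain. This is the classical form with an over-subtraction factor and a spectral floor, not the revised formula of the paper, which the abstract does not give. The parameter values and function names are assumptions.

```python
import math

def spectral_subtract(noisy_power, noise_power, over_subtraction=2.0, floor=0.01):
    """Classical power spectral subtraction for one frame.

    Each bin keeps at least floor * noise power, which limits the isolated
    spectral peaks that are heard as musical noise.
    """
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        enhanced.append(max(y - over_subtraction * n, floor * n))
    return enhanced

def subtraction_gain(noisy_power, enhanced_power):
    """Per-bin magnitude gain to apply to the noisy spectrum (noisy phase kept)."""
    return [math.sqrt(s / y) if y > 0.0 else 0.0
            for y, s in zip(noisy_power, enhanced_power)]

frame = [10.0, 1.0]   # toy per-bin noisy power
noise = [1.0, 1.0]    # toy per-bin noise power estimate
clean = spectral_subtract(frame, noise)
print(clean, subtraction_gain(frame, clean))
```

In the first bin the speech dominates and the subtraction removes twice the noise estimate. In the second bin the subtraction would go negative, so the floor takes over.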

9.
Speaker recognition faces many practical difficulties, among which signal inconsistency due to environmental and acquisition channel factors is the most challenging. The noise imposed on the voice signal varies greatly, and an a priori noise model is usually unavailable. In this article, we propose a robust speaker recognition method that employs a novel adaptive wavelet shrinkage method for noise suppression. In our method, wavelet subband coefficient thresholds are computed automatically and are proportional to the noise contamination. In applying wavelet shrinkage for noise removal, a dual-threshold strategy is developed to suppress noise, preserve signal coefficients, and minimize the introduction of artifacts. Recognition is achieved using a modification of the Mel-frequency cepstral coefficients of overlapped voice signal segments. The efficacy of our method is evaluated with voice signals from two publicly available speech signal databases and is compared with state-of-the-art methods. It is demonstrated that our proposed method exhibits great robustness in various noise conditions. The improvement is especially significant when noise dominates the underlying speech.

10.
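In frequency-domain constrained subspace speech enhancement, the enhancement matrix is built with a fixed Lagrange multiplier, which leaves residual musical noise while reducing speech distortion and improving intelligibility. To address this, an algorithm with a variable Lagrange multiplier is proposed. It exploits the auditory property that stronger frequency components mask noise: through a transformation between the frequency domain of the masking threshold and the subspace eigenvalues, a variable controls the subspace Lagrange multiplier used to compute the diagonal matrix of the gain function. Comparative experiments and listening tests show that speech enhanced by the proposed algorithm has a substantially higher signal-to-noise ratio and noticeably better subjectively perceived quality.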

11.
We present a new speech enhancement scheme for a single-microphone system to meet the demand for high-quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and that its overall performance is superior to several competitive methods.

12.
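Li Yansheng, Liu Yuan, Zhang Yi. 《计算机应用》 (Journal of Computer Applications), 2019, 39(3): 894-898
To address the residual noise left by non-negative matrix factorization (NMF) speech enhancement in non-stationary environments at low signal-to-noise ratio (SNR), a single-channel speech enhancement algorithm based on perceptual masking and reconstructed NMF (PM-RNMF) is proposed. First, psychoacoustic masking properties are applied to the NMF speech enhancement algorithm. Second, different masking thresholds are used for different frequency bins to build an adaptive perceptual masking gain function, with the thresholds constraining the residual noise energy and the speech distortion energy. Finally, the perceptual gain is corrected using the speech presence probability (SPP) and the NMF algorithm is reconstructed, which establishes a new objective function. Simulation results show that, in three non-stationary noise environments at different SNRs, compared with the NMF, reconstructed NMF (RNMF), and perceptual masking deep neural network (PM-DNN) algorithms, PM-RNMF raises the average perceptual evaluation of speech quality (PESQ) score by 0.767, 0.474, and 0.162, and the average source-to-distortion ratio (SDR) by 2.785, 1.197, and 0.948, respectively. The experimental results show that PM-RNMF denoises better at both low and high frequencies.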

13.
This paper addresses the problem of speech enhancement and acoustic noise reduction by adaptive filtering algorithms. Recently, we proposed a new forward blind source separation algorithm that enhances very noisy speech signals with a subband approach. In this paper, we propose a new algorithm with variable subband step sizes that improves the behaviour of the previous algorithm when the number of subbands is high. The proposed algorithm uses recursive formulas to compute the variable step sizes of the cross-coupling filters, based on the decorrelation criterion between the estimated sub-signals at each subband output. The new algorithm shows an important improvement in the steady-state and mean square error values. Throughout this paper, we present simulation results for the proposed algorithm that confirm its superiority over its original version, which employs fixed step sizes for the cross-coupling adaptive filters, and over another fullband algorithm.

14.
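Speech enhancement is mainly used to improve the intelligibility and quality of speech corrupted by noise, and its main application is improving mobile communication quality in noisy environments. Traditional speech enhancement methods include spectral subtraction, Wiener filtering, and wavelet coefficient methods. Because traditional algorithms yield poor speech quality and musical noise in complex noise environments, a speech enhancement algorithm that combines the wavelet packet transform with adaptive Wiener filtering is proposed. The role of wavelet packet multiresolution in partitioning the signal spectrum is analyzed. The noisy signal is decomposed at multiple scales by wavelet packets, the wavelet packet coefficients at each scale are adaptively Wiener filtered, and the filtered coefficients are used to reconstruct the enhanced speech signal. Simulation results show that, compared with traditional enhancement algorithms, in non-stationary noise at low signal-to-noise ratio the algorithm not only raises the signal-to-noise ratio of noisy speech more effectively but also preserves the spectral characteristics of speech well, which improves the quality of the noisy speech.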

15.
This paper presents the results of three studies of the intelligibility and quality of speech recorded through a bone conduction microphone (BCM). All speech signals were captured and recorded using a Temco HG-17 BCM. Twelve locations on or close to the skull were selected for the BCM placement. In the first study, listeners evaluated the intelligibility and quality of the bone-conducted speech signals presented through traditional earphones. Listeners in the second study evaluated the intelligibility and quality of signals presented through a loudspeaker. In the third study the signals were reproduced through a bone conduction headset; however, signal evaluation was limited to speech intelligibility only. In all three studies, the Forehead and Temple BCM locations yielded the highest intelligibility and quality rating scores. The Collarbone location produced the least intelligible and lowest quality signals across all tested BCM locations.

16.
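The Wiener filtering algorithm is one of the common algorithms for improving speech understanding in noise for hearing-impaired patients. To address the large bias of noise spectrum estimation in the traditional Wiener filtering algorithm, a hearing-aid speech denoising algorithm based on an improved multichannel Wiener filtering algorithm is proposed. First, taking into account the characteristics of human hearing and of hearing-aid loudness compensation, the speech signal is decomposed by a Gammatone filterbank into multiple subband signals. Then, in each subband, a Wiener filter based on a priori signal-to-noise ratio estimation performs speech enhancement. Finally, the subband signals are synthesized to obtain the enhanced speech. In addition, to improve noise spectrum estimation for the Wiener filter, a voice activity detection algorithm based on envelope estimation is proposed and used to improve Wiener filtering performance. Experimental results show that, compared with traditional Wiener filtering, the method suppresses residual noise more effectively, improves speech intelligibility, and has high practical value.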
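A Wiener gain driven by an a priori SNR estimate can be sketched for one subband. The abstract does not say how the a priori SNR is estimated, so this sketch assumes the widely used decision-directed rule with a smoothing constant of 0.98 and a fixed, known noise power. The Gammatone decomposition and the envelope-based voice activity detector are not modeled, and the function names are hypothetical.

```python
def decision_directed_snr(prev_clean_power, noisy_power, noise_power, smoothing=0.98):
    """A priori SNR as a weighted mix of the previous frame's clean-speech
    estimate and the current a posteriori SNR minus one (floored at 0)."""
    eps = 1e-12  # guards against division by zero
    posterior = noisy_power / (noise_power + eps)
    return (smoothing * prev_clean_power / (noise_power + eps)
            + (1.0 - smoothing) * max(posterior - 1.0, 0.0))

def wiener_gain(prior_snr):
    """Wiener gain for a given a priori SNR: xi / (1 + xi)."""
    return prior_snr / (1.0 + prior_snr)

def enhance_band(noisy_powers, noise_power):
    """Per-frame Wiener gains for one subband with a fixed noise power."""
    prev_clean = 0.0
    gains = []
    for y in noisy_powers:
        xi = decision_directed_snr(prev_clean, y, noise_power)
        g = wiener_gain(xi)
        prev_clean = g * g * y  # clean-speech power estimate for the next frame
        gains.append(g)
    return gains

# Toy sequence: ten noise-only frames followed by ten frames with speech.
gains = enhance_band([1.0] * 10 + [50.0] * 10, noise_power=1.0)
print(gains[0], gains[10], gains[-1])
```

The gain stays near zero while the band contains only noise and rises over a few frames once speech appears. That smoothing is what the decision-directed rule contributes compared with using the instantaneous SNR directly.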

17.
A human factors experiment was conducted to assess the intelligibility of synthesized speech under a variety of noise conditions for both hearing-impaired and normal-hearing subjects. Modified Rhyme Test stimuli were used to determine intelligibility at four speech-to-noise (S/N) ratios (0, 5, 10, and 15 dB) and in three noise types: flat-by-octaves (pink) noise, interior noise of a currently produced heavy truck, and truck cab noise with added background speech. A quiet condition was also investigated. During recording of the truck noise for the experiment, in-cab noise measurements were obtained. According to OSHA standards, these data indicated that drivers of the sampled trucks have a minimal risk of noise-induced hearing loss due to in-cab noise exposure when driving at freeway speeds, because noise levels were below 80 dBA. In the intelligibility experiment, subjects with hearing loss had significantly lower intelligibility than normal-hearing subjects, both in quiet and in noise, but no interaction with noise type or S/N ratio was found. Intelligibility was significantly lower for the noise with background speech than for the other noises, but the truck noise produced intelligibility equal to that of the pink noise. An analytical prediction of intelligibility using Articulation Index calculations exhibited a high positive correlation with the empirically obtained intelligibility data for both groups of subjects.

18.
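In dual-microphone noise cancellation, crosstalk severely degrades the denoising performance of traditional adaptive algorithms. To improve the denoising performance of the dual-microphone algorithm, a two-stage adaptive filtering system is used to eliminate the crosstalk. To improve the convergence of the adaptive filter, a master-slave LMS algorithm adaptively adjusts the step-size factor. To suit narrowband processing, the input signal is first preprocessed by subband analysis, crosstalk-resistant adaptive processing is applied to each subband independently, and the enhanced subband signals are merged to obtain the enhanced speech signal. Experimental results show that the method removes a large amount of noise with little damage to the speech, giving a marked speech enhancement effect.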

19.
This paper introduces a robust voiced/non-voiced (VnV) speech classification method using bivariate empirical mode decomposition (bEMD). Fractional Gaussian noise (fGn) is employed as the reference signal to derive a data-adaptive threshold for VnV discrimination. The speech signal under analysis and the fGn are combined to generate a complex signal, which is decomposed into a finite number of complex-valued intrinsic mode functions (IMFs) by bEMD. The real and imaginary parts of the IMFs represent the IMFs of the observed speech and of the fGn, respectively. The log-energies of both types of IMFs are calculated. There are similarities between the IMF log-energy representations of fGn and of unvoiced speech signals. Hence, the upper confidence limit of the IMF log-energies of fGn is used as the data-adaptive threshold for VnV classification. If the subband log-energy of a speech segment exceeds the threshold, the segment is classified as voiced; otherwise it is classified as unvoiced. The experimental results show that the proposed algorithm performs better than recently reported methods over a wide range of SNRs, without requiring any training data.

20.
This paper presents a new approach to speech enhancement based on a modified least mean square multi-notch adaptive digital filter (MNADF). This approach differs from traditional speech enhancement methods in that no a priori knowledge of the noise source statistics is required. Specifically, the proposed method is applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition systems are designed to act on clean speech signals. Therefore, speech signals corrupted by noise must be enhanced before they are processed. The proposed method uses a primary input containing the corrupted speech signal and a reference input containing noise only. The new computationally efficient algorithm is based on tracking the significant frequencies of the noise and implementing the MNADF at those frequencies. To track the frequencies of the noise, a time-frequency analysis method such as the short-time frequency transform is used. Different types of noise from the Noisex-92 database are used to degrade real speech signals. Objective measures, the study of the speech spectrograms, global signal-to-noise ratio (SNR), segmental SNR (segSNR), and a subjective listening test consistently demonstrate superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction.
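The primary-input and reference-input arrangement described in this abstract is that of a classical adaptive noise canceller, sketched here with a normalized LMS (NLMS) filter. This is a swapped-in textbook structure, not the multi-notch filter of the paper: the noise frequency tracking is not modeled, and the filter length, step size, toy signals, and function name are all assumptions.

```python
import math
import random

def nlms_noise_canceller(primary, reference, num_taps=4, mu=0.5, eps=1e-8):
    """Two-input adaptive noise canceller with a normalized LMS update.

    The filter learns to predict the noise in the primary input from the
    reference input; the prediction error is the enhanced output.
    """
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent reference sample first
    output = []
    for d, x in zip(primary, reference):
        window = [x] + window[:-1]
        noise_estimate = sum(w * v for w, v in zip(weights, window))
        e = d - noise_estimate  # enhanced sample
        norm = eps + sum(v * v for v in window)
        weights = [w + mu * e * v / norm for w, v in zip(weights, window)]
        output.append(e)
    return output, weights

# Toy data: a low-level sinusoid stands in for speech, and the noise reaches
# the primary microphone through an assumed two-tap acoustic path.
rng = random.Random(1)
count = 2000
reference = [rng.gauss(0.0, 1.0) for _ in range(count)]
speech = [0.1 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(count)]
primary = [speech[n] + 0.8 * reference[n]
           - 0.4 * (reference[n - 1] if n > 0 else 0.0)
           for n in range(count)]
enhanced, weights = nlms_noise_canceller(primary, reference)
print(weights)
```

No noise statistics are supplied in advance, which matches the abstract's point about a priori knowledge: the filter adapts from the reference input alone. The residual noise depends on the step size, since a larger `mu` converges faster but lets the speech perturb the weights more.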
