Found 20 similar documents; search time: 31 ms
1.
Improved Subspace-Based Single-Channel Speech Enhancement Using Generalized Super-Gaussian Priors    Cited by: 1 (self-citations: 0, cited by others: 1)
Jesper Jensen, Richard Heusdens 《IEEE transactions on audio, speech, and language processing》2007,15(3):862-872
Traditional single-channel subspace-based schemes for speech enhancement rely mostly on linear minimum mean-square error estimators, which are globally optimal only if the Karhunen-Loève transform (KLT) coefficients of the noise and speech processes are Gaussian distributed. We derive in this paper subspace-based nonlinear estimators assuming that the speech KLT coefficients are distributed according to a generalized super-Gaussian distribution, which has as special cases the Laplacian and the two-sided Gamma distribution. As with the traditional linear estimators, the derived estimators are functions of the a priori signal-to-noise ratio (SNR) in the subspaces spanned by the KLT transform vectors. We propose a scheme for estimating these a priori SNRs, which is in fact a generalization of the "decision-directed" approach well known from short-time Fourier transform (STFT)-based enhancement schemes. We show that the proposed a priori SNR estimation scheme leads to a significant reduction of the residual noise level, a conclusion confirmed in extensive objective speech quality evaluations as well as subjective tests. We also show that the derived estimators based on the super-Gaussian KLT coefficient distribution lead to improvements for different noise sources and levels compared to when a Gaussian assumption is imposed.
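The "decision-directed" a priori SNR rule that the abstract above generalizes can be sketched in its classical STFT-domain form (a minimal illustration, not the paper's subspace variant; function and argument names are my own):

```python
import numpy as np

def decision_directed_snr(prev_amp2, prev_noise_var, noisy_amp2, noise_var, alpha=0.98):
    """Decision-directed a priori SNR: blend the SNR implied by the previous
    frame's speech estimate with the current maximum-likelihood estimate.
    alpha = 0.98 is a typical smoothing weight."""
    # ML estimate of the a priori SNR from the current a posteriori SNR
    ml_snr = np.maximum(noisy_amp2 / noise_var - 1.0, 0.0)
    return alpha * prev_amp2 / prev_noise_var + (1.0 - alpha) * ml_snr
```

With alpha close to 1 the estimate follows the previous frame, which is what suppresses residual musical noise.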
2.
In this paper we discuss an unsupervised approach for co-channel speech separation, where two speakers speak simultaneously over the same channel. We propose a two-stage separation process whose initial stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together known as the Hilbert–Huang transform. EMD decomposes the mixed signal into oscillatory functions known as intrinsic mode functions. The Hilbert transform is applied to find the instantaneous amplitudes, and Fuzzy C-Means clustering is applied to group the speakers at this initial stage. In the second stage, the speaker groups are transformed into the time–frequency domain using the short-time Fourier transform (STFT). Time–frequency ratios are computed by dividing the STFT matrix of the mixed speech signal by the STFT matrices of the stage-1 recovered speech signals. A histogram of the resulting ratios is used to estimate an ideal binary mask for each speaker. These masks are applied to the speech mixture and the underlying speakers are estimated. The masks, being estimated from the speech mixture, help impute the values missing after the stage-1 grouping of speakers. Results show significant improvement in objective measures over other existing single-channel speech separation methods.
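The mask-estimation step above can be illustrated with a simplified stand-in: instead of the histogram-based threshold the paper describes, each time–frequency cell is assigned to the speaker whose stage-1 estimate better matches the mixture (all names are illustrative):

```python
import numpy as np

def binary_masks_from_ratios(mix_stft, est1_stft, est2_stft, eps=1e-12):
    """Assign each time-frequency cell to the speaker whose stage-1 estimate
    explains more of the mixture energy (a simplified stand-in for the
    histogram-based threshold in the paper)."""
    r1 = np.abs(mix_stft) / (np.abs(est1_stft) + eps)
    r2 = np.abs(mix_stft) / (np.abs(est2_stft) + eps)
    mask1 = (r1 <= r2).astype(float)  # smaller ratio: estimate closer to the mixture
    return mask1, 1.0 - mask1
```

Applying each mask elementwise to the mixture STFT and inverting yields the estimated speakers.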
3.
《IEEE transactions on audio, speech, and language processing》2009,17(7):1420-1434
4.
《IEEE transactions on audio, speech, and language processing》2009,17(6):1109-1123
5.
《IEEE transactions on audio, speech, and language processing》2010,18(1):158-170
6.
A piecewise linear prediction algorithm is used to estimate the formant frequencies of speech. A multi-channel filter bank first divides the speech signal into frequency bands; a suitable inverse filter is then chosen to approximate the short-time spectrum of each band, and the formant frequencies are finally estimated from that inverse filter. Experimental results show that, compared with traditional methods, this approach improves the resolution and accuracy of formant frequency estimation and is less affected by noise.
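A standard way to realize the formant-from-inverse-filter idea above is to fit linear prediction coefficients and read formant candidates off the pole angles of 1/A(z); this sketch uses the plain autocorrelation method rather than the paper's piecewise variant:

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method linear prediction coefficients, with a[0] = 1."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    w = np.linalg.solve(R, r[1:order + 1])  # Yule-Walker predictor weights
    return np.concatenate(([1.0], -w))

def formants_from_lpc(a, fs):
    """Formant candidates: angles of the complex poles of 1/A(z) in the upper half-plane."""
    poles = [z for z in np.roots(a) if z.imag > 0]
    return sorted(np.angle(z) * fs / (2 * np.pi) for z in poles)
```

With a pole pair placed at 1000 Hz, the recovered formant candidate lands on that frequency.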
7.
This paper discusses the minimum variance distortionless response (MVDR) modeling method and compares it with linear prediction; the comparison shows that the MVDR filter provides a better envelope of the original speech. Building on a study of the ICA principle and the FastICA algorithm, the MVDR feature extraction method is then combined with independent component analysis and compared against traditional speech recognition methods under both noisy and noise-free conditions, and the recognition rate, computation time, and other results are analyzed. MVDR feature extraction can raise the recognition rate of a speech recognition system, but it increases the average recognition time; a recognition system with ICA-transformed features shows better robustness.
8.
To address the permutation ambiguity of frequency-domain methods for convolutive blind source separation of speech, a multi-band energy sorting algorithm is proposed. First, the short-time Fourier transform (STFT) of the mixed signals is taken and an instantaneous mixing model is built at each frequency bin for independent component analysis. The permutation ambiguity is then resolved by combining energy-correlation sorting with direction-of-arrival (DOA) sorting, and the amplitude ambiguity is resolved with a split-spectrogram method, yielding correctly separated subband signals at each frequency bin. Finally, the inverse STFT (ISTFT) recovers the separated source signals. Simulation results show that, compared with Murata's sorting algorithm, the improved algorithm improves the signal-to-distortion ratio, signal-to-interference ratio, and system-error ratio.
9.
A speech processing GUI system based on the time-frequency analysis method LPFT    Cited by: 1 (self-citations: 1, cited by others: 0)
Because speech is non-stationary and time-varying, time-frequency analysis is an important tool for processing speech signals. However, the linear short-time Fourier transform has poor time-frequency concentration, while the bilinear Wigner distribution suffers from cross-term interference when processing multi-component signals. To overcome the drawbacks of both methods, the local polynomial Fourier transform (LPFT), an extension of the short-time Fourier transform, is used to process speech signals. A GUI system for speech processing based on the LPFT is built, enabling analysis and comparison of speech in the time, frequency, and time-frequency domains. Speech processing examples are given that verify the advantages of the LPFT over the other methods. The system is simple and intuitive, and provides a good platform for speech processing.
10.
11.
Yiteng Huang, J. Benesty, Jingdong Chen 《IEEE transactions on audio, speech, and language processing》2008,16(5):957-968
Noise reduction for speech enhancement is a useful technique, but in general it is a challenging problem. While a single-channel algorithm is easy to use in practice, it inevitably introduces distortion to the desired speech signal while reducing noise. Today, the explosive growth in computational power and the continuous drop in the cost and size of acoustic electric transducers are driving interest in employing multiple microphones in speech processing systems. This opens new opportunities for noise reduction. In this paper, we present an analysis of three multichannel noise reduction algorithms, namely Wiener filter, subspace, and spatial-temporal prediction, in a common framework. We investigate whether it is possible for multichannel noise reduction algorithms to reduce noise without speech distortion. Finally, we corroborate the theoretical analyses with simulations using real impulse responses measured in the varechoic chamber at Bell Labs.
12.
Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors    Cited by: 1 (self-citations: 0, cited by others: 1)
J. S. Erkelens, R. C. Hendriks, R. Heusdens, J. Jensen 《IEEE transactions on audio, speech, and language processing》2007,15(6):1741-1752
This paper considers techniques for single-channel speech enhancement based on the discrete Fourier transform (DFT). Specifically, we derive minimum mean-square error (MMSE) estimators of speech DFT coefficient magnitudes as well as of complex-valued DFT coefficients based on two classes of generalized gamma distributions, under an additive Gaussian noise assumption. The resulting generalized DFT magnitude estimator has as a special case the existing scheme based on a Rayleigh speech prior, while the complex DFT estimators generalize existing schemes based on Gaussian, Laplacian, and Gamma speech priors. Extensive simulation experiments with speech signals degraded by various additive noise sources verify that significant improvements are possible with the more recent estimators based on super-Gaussian priors. The increase in perceptual evaluation of speech quality (PESQ) over the noisy signals is about 0.5 points for street noise and about 1 point for white noise, nearly independent of input signal-to-noise ratio (SNR). The assumptions made for deriving the complex DFT estimators are less accurate than those for the magnitude estimators, leading to a higher maximum achievable speech quality with the magnitude estimators.
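The simplest member of the estimator family in the abstract above is the Wiener gain, the MMSE estimator of complex DFT coefficients under a Gaussian speech prior; a minimal sketch (xi denotes the a priori SNR per frequency bin; names are my own):

```python
import numpy as np

def wiener_gain(xi):
    """Wiener gain xi/(1+xi): the MMSE estimator of complex DFT coefficients
    under a Gaussian speech prior, the simplest special case of the family."""
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)

def enhance_frame(noisy_dft, xi):
    """Apply the gain to the noisy DFT coefficients of one frame."""
    return wiener_gain(xi) * noisy_dft
```

The super-Gaussian priors in the paper replace this gain with nonlinear functions of the same a priori SNR.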
13.
《Digital Signal Processing》2006,16(5):597-606
We propose a new method for adaptively removing noise and interference from a signal. In this method unwanted components are removed from the short time Fourier transform (STFT) surface, and the clean signal is estimated by integrating the modified STFT with respect to frequency. Isolation of the signal and interference components is facilitated by a concentration process based on the phase of the STFT differentiated with respect to time. The concentrated STFT is a linear representation, free of cross terms and having the property that signal and interference components are easily recognized because their distributions are more concentrated in frequency. Interference removal may be accomplished by removing unwanted components from the concentrated STFT, and the clean signal may be estimated by integration of the modified concentrated STFT. We demonstrate the advantages of the proposed method over conventional methods.
14.
《IEEE transactions on audio, speech, and language processing》2008,16(8):1633-1641
15.
Despite great developments in the field of acoustic echo cancellation (AEC), the presence of double-talk remains a difficult problem. The main role of double-talk detection (DTD) is to control adaptation of the filter coefficients by halting their update in double-talk situations. In this paper, we propose a new method of DTD based on a time–frequency analysis that uses the Stockwell transform (ST). The ST is a time–frequency spectral localization method that combines the characteristics of the short-time Fourier transform and the wavelet transform. This method provides better time–frequency resolution, especially for non-stationary signals. In the experimental tests, the normalized least mean squares (NLMS) algorithm is used to update the filter coefficients, along with speech signals taken from the TIMIT database. The obtained results show better performance compared to existing methods in terms of misalignment convergence and speech intelligibility enhancement.
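The NLMS adaptation the abstract above uses to update the echo-canceller coefficients can be sketched in its textbook system-identification form (parameter names are illustrative):

```python
import numpy as np

def nlms(x, d, order=8, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt FIR weights w so that w . [x[n], x[n-1], ...]
    tracks the desired signal d. Returns final weights and the error signal."""
    w = np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order - 1, len(d)):
        xn = x[n - order + 1:n + 1][::-1]      # newest sample first
        e[n] = d[n] - w @ xn
        w += mu * e[n] * xn / (xn @ xn + eps)  # step size normalized by input power
    return w, e
```

A double-talk detector would freeze the `w` update whenever near-end speech is present, which is exactly what the proposed ST-based DTD controls.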
16.
Chin-Teng Lin 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2003,33(1):137-143
Discusses the problem of single-channel speech enhancement in variable noise-level environments. Commonly used single-channel subtractive-type speech enhancement algorithms always assume that the background noise level is fixed or slowly varying. In fact, the background noise level may vary quickly. This condition usually results in wrong speech/noise detection and a faulty enhancement process. To solve this problem, we propose a subtractive-type speech enhancement scheme. This new scheme uses the RTF (refined time-frequency parameter)-based RSONFIN (recurrent self-organizing neural fuzzy inference network) algorithm we developed previously to detect word boundaries under variable background noise levels. In addition, a new parameter (MiFre) is proposed to estimate the varying background noise level. Based on this parameter, the noise level information used for subtractive-type speech enhancement can be estimated not only during speech pauses, but also during speech segments. This new subtractive-type enhancement scheme has been tested and found to perform well, not only under variable background noise levels, but also under fixed background noise levels.
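The subtractive-type enhancement the abstract above builds on can be sketched in its basic magnitude-domain form (the over-subtraction factor and spectral floor are common refinements, not specifics of this paper):

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, over_sub=1.0, floor=0.02):
    """Magnitude-domain spectral subtraction: subtract an estimate of the noise
    magnitude spectrum, with a spectral floor to limit musical noise."""
    clean = noisy_mag - over_sub * noise_mag
    return np.maximum(clean, floor * noisy_mag)
```

The paper's contribution is in how `noise_mag` is tracked when the noise level changes quickly, including during speech segments.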
17.
《Computer Speech and Language》2007,21(1):187-205
In this paper, a set of features derived by filtering and spectral peak extraction in the autocorrelation domain is proposed. We focus on the effect of additive noise on speech recognition. Assuming that the channel characteristics and additive noises are stationary, these new features improve the robustness of speech recognition in noisy conditions. In this approach, the autocorrelation sequence of a speech signal frame is computed first. Filtering of the autocorrelation of the speech signal is carried out in the second step, and then the short-time power spectrum of speech is obtained through the fast Fourier transform. The power spectrum peaks are then located by differentiating the power spectrum with respect to frequency. The magnitudes of these peaks are projected onto the mel scale and passed through the filter bank. Finally, a set of cepstral coefficients is derived from the outputs of the filter bank. The effectiveness of the new features for speech recognition in noisy conditions is shown through a number of speech recognition experiments. A task of multi-speaker isolated-word recognition and another of multi-speaker continuous speech recognition, with various artificially added noises such as factory, babble, car and F16, were used in these experiments. A further set of experiments was carried out on the Aurora 2 task. Experimental results show significant improvements under noisy conditions in comparison to traditional feature extraction methods. We also report results obtained by applying cepstral mean normalization to the methods, to obtain features robust against both additive noise and channel distortion.
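The first steps of the pipeline above (autocorrelation, power spectrum, and peak picking by differentiation) can be sketched as follows; the filtering and mel filter-bank stages are omitted, and names are my own:

```python
import numpy as np

def autocorr_power_spectrum(frame, nfft=512):
    """Power spectrum computed from the one-sided autocorrelation of a frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return np.abs(np.fft.rfft(r, nfft))

def spectral_peaks(power):
    """Peak bins located by a sign change of the first difference
    (the 'differentiating the power spectrum' step)."""
    d = np.diff(power)
    return [k + 1 for k in range(len(d) - 1) if d[k] > 0 and d[k + 1] <= 0]
```

Working in the autocorrelation domain is what gives the features their noise robustness: stationary additive noise mainly affects the zero lag.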
18.
Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura 《Applied Intelligence》2011,34(3):360-371
We describe an architecture that gives a robot the capability to recognize speech by cancelling ego noise, even while the robot is moving. The system consists of three blocks: (1) a multi-channel noise reduction block, comprising successive stages of microphone-array-based sound localization, geometric source separation and post-filtering; (2) a single-channel noise reduction block utilizing template subtraction; and (3) an automatic speech recognition block. In this work, we specifically investigate a missing feature theory-based automatic speech recognition (MFT-ASR) approach in block (3). This approach makes use of spectro-temporal elements derived from (1) and (2) to measure the reliability of the acoustic features, and generates masks to filter unreliable acoustic features. We then evaluated this system on a robot using word correct rates. Furthermore, we present a detailed analysis of recognition accuracy to determine optimal parameters. Implementation of the proposed MFT-ASR approach resulted in significantly higher recognition performance than single or multi-channel noise reduction methods.
19.
The S-transform (ST) is a popular linear time-frequency (TF) transform with hybrid characteristics from the short-time Fourier transform (STFT) and the wavelet transform. It enables a multi-resolution TF analysis and returns globally referenced local phase information, but its expensive computational requirements often overshadow its other desirable features. In this paper, we develop a fully discrete ST (DST) with a controllable TF sampling scheme based on a filter-bank interpretation. The presented DST splits the analyzed signal into subband channels whose bandwidths increase progressively in a fully controllable manner, providing a frequency resolution that can be varied and made as high as required, which is a desirable property for processing oscillatory signals lacked by previously presented DSTs. Thanks to its flexible sampling scheme, the behavior of the developed transform in the TF domain can be adjusted easily; with specific parameter settings, for example, it samples the TF domain dyadically, while by choosing different settings, it may act as a STFT. The spectral partitioning is performed through asymmetric raised-cosine windows whose collective amplitude is unitary over the signal spectrum to ensure that the transform is easily and exactly invertible. The proposed DST retains all the appealing properties of the original ST, representing a local image of the Fourier transform; it requires low computational complexity and returns a modest number of TF coefficients. To confirm its effectiveness, the developed transform is utilized for different applications using real-world and synthetic signals.
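The partition-of-unity property that the raised-cosine windows above rely on for exact inversion is easy to illustrate with a complementary crossfade pair (a symmetric simplification of the paper's asymmetric windows; names are my own):

```python
import numpy as np

def raised_cosine_pair(n_trans):
    """A complementary raised-cosine crossfade: the rising and falling halves
    sum exactly to 1 over the transition band, the property the DST windows
    need so that the subband channels reassemble the spectrum exactly."""
    t = np.arange(n_trans) / n_trans
    rise = 0.5 * (1.0 - np.cos(np.pi * t))
    return rise, 1.0 - rise
```

Stacking such transitions between adjacent subband edges keeps the collective window amplitude unitary over the whole spectrum.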
20.
A noise-robust speech recognition method based on MVDR and CCBC    Cited by: 1 (self-citations: 0, cited by others: 1)
A method suited to noise-robust speech recognition is proposed. Its feature extraction is based on minimum variance distortionless response (MVDR) spectral estimation; the features are frequency-warped to improve their perceptual resolution and then adapted with canonical-correlation-based compensation (CCBC), a spectral transformation that improves the robustness of the system. In experiments under exhibition-hall, crowd, and car noise, the method was compared with a system based on traditional Mel-frequency cepstral coefficient (MFCC) features; the results show that the recognition rate of the speech recognition system using the proposed method improves significantly.
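The MVDR spectral estimate underlying the features above can be sketched directly from the Capon formula P(w) = 1 / (e(w)^H R^{-1} e(w)), where R is the autocorrelation matrix and e(w) a steering vector (a generic illustration, not the paper's warped variant):

```python
import numpy as np

def mvdr_spectrum(r, order, freqs):
    """MVDR (Capon) spectral estimate built from the first `order`
    autocorrelation lags r[0..order-1], evaluated at the given frequencies
    (radians per sample)."""
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    Rinv = np.linalg.inv(R)
    P = []
    for w in freqs:
        e = np.exp(-1j * w * np.arange(order))  # steering vector at frequency w
        P.append(1.0 / np.real(e.conj() @ Rinv @ e))
    return np.array(P)
```

Compared with linear prediction, this estimate tends to follow the spectral envelope more smoothly, which is the property the paper exploits for feature extraction.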