Similar Literature
20 similar documents found.
1.
A perceptually motivated objective measure for evaluating speech quality is presented. The measure, computed from the original and coded versions of an utterance, exhibits statistically a monotonic relationship with the mean opinion score, a widely used criterion for speech coder assessment. For each 10-ms segment of an utterance, a weighted spectral vector is computed via 15 critical band filters for telephone bandwidth speech. The overall distortion, called Bark spectral distortion (BSD), is the average squared Euclidean distance between spectral vectors of the original and coded utterances. The BSD takes into account auditory frequency warping, critical band integration, amplitude sensitivity variations with frequency, and subjective loudness.
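A minimal sketch of how a BSD-style measure might be computed, assuming 8 kHz telephone-bandwidth speech, 10-ms frames, approximate Bark band edges, and a simple power-law compression standing in for the subjective-loudness mapping; the paper's exact filter shapes and weighting are not reproduced.

```python
import numpy as np

# Approximate critical-band edges (Hz) giving 15 bands for telephone bandwidth.
BARK_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920,
                 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150]

def bark_spectra(x, fs=8000, frame_len=80, nfft=256, edges=BARK_EDGES_HZ):
    """Per-frame critical-band power spectra with loudness-like compression."""
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    n_frames = len(x) // frame_len
    bands = []
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame, nfft)) ** 2
        band_power = [power[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in zip(edges[:-1], edges[1:])]
        bands.append(band_power)
    return np.array(bands) ** 0.33          # power-law compression as a loudness proxy

def bark_spectral_distortion(original, coded, fs=8000):
    """Average squared Euclidean distance between Bark spectral vectors."""
    ref, deg = bark_spectra(original, fs), bark_spectra(coded, fs)
    n = min(len(ref), len(deg))
    return np.mean(np.sum((ref[:n] - deg[:n]) ** 2, axis=1))
```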

2.
It is demonstrated that multiple sources of speech information can be integrated at a subsymbolic level to improve vowel recognition. Feedforward and recurrent neural networks are trained to estimate the acoustic characteristics of a vocal tract from images of the speaker's mouth. These estimates are then combined with the noise-degraded acoustic information, effectively increasing the signal-to-noise ratio and improving the recognition of these noise-degraded signals. Alternative symbolic strategies such as direct categorization of the visual signals into vowels are also presented. The performances of these neural networks compare favorably with human performance and with other pattern-matching and estimation techniques.

3.
Under low signal-to-noise-ratio conditions, the distortion introduced by speech enhancement severely degrades recognition performance. A distortion-compensation technique is therefore proposed that combines cepstral mean normalization (CMN) in the feature space with parallel model combination (PMC) in the model space. Experimental results show that the method effectively improves the recognition rate of speech signals at low signal-to-noise ratios.
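A minimal sketch of the feature-space half of this scheme, utterance-level cepstral mean normalization; the model-space PMC step is not shown, as it requires the clean-speech HMMs and a noise model.

```python
import numpy as np

def cepstral_mean_normalize(cepstra: np.ndarray) -> np.ndarray:
    """cepstra: (n_frames, n_coeffs) MFCC-like features for one utterance.

    Subtracting the per-utterance mean removes stationary channel/enhancement bias.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```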

4.
李伟  李媛媛 《电声技术》2011,35(7):42-44
To address the problem of recognizing English within Mandarin continuous speech recognition, a mixed Chinese-English model is built using a joint Chinese-English modeling approach. Based on an analysis of existing speech recognition systems and prior phonetic knowledge, an acoustic model combining primary vowels with English phone sequences is proposed. The acoustic model trained under the maximum-likelihood criterion is then discriminatively retrained with the minimum phone error criterion to obtain the final acoustic model. Results on the test set show that ...

5.
Encoding frequency modulation to improve cochlear implant performance in noise
Different from traditional Fourier analysis, a signal can be decomposed into amplitude and frequency modulation components. The speech processing strategy in most modern cochlear implants only extracts and encodes amplitude modulation in a limited number of frequency bands. While amplitude modulation encoding has allowed cochlear implant users to achieve good speech recognition in quiet, their performance in noise is severely compromised. Here, we propose a novel speech processing strategy that encodes both amplitude and frequency modulations in order to improve cochlear implant performance in noise. By removing the center frequency from the subband signals and additionally limiting the frequency modulation's range and rate, the present strategy transforms the fast-varying temporal fine structure into a slowly varying frequency modulation signal. As a first step, we evaluated the potential contribution of additional frequency modulation to speech recognition in noise via acoustic simulations of the cochlear implant. We found that while amplitude modulation from a limited number of spectral bands is sufficient to support speech recognition in quiet, frequency modulation is needed to support speech recognition in noise. In particular, improvement by as much as 71 percentage points was observed for sentence recognition in the presence of a competing voice. The present result strongly suggests that frequency modulation be extracted and encoded to improve cochlear implant performance in realistic listening situations. We have proposed several implementation methods to stimulate further investigation. Index Terms: Amplitude modulation, cochlear implant, fine structure, frequency modulation, signal processing, speech recognition, temporal envelope.
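A rough sketch, in the spirit of the strategy above, of extracting slowly varying AM and FM from one subband: the analytic signal gives the envelope and instantaneous frequency, the band's center frequency is removed, and the FM track is range-limited and low-pass filtered. Band edges, FM range/rate limits, and filter orders here are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def am_fm_from_subband(x, fs, f_lo, f_hi, fm_range_hz=400.0, fm_rate_hz=400.0):
    # Band-pass the signal to one analysis band.
    b, a = butter(4, [f_lo / (fs / 2), f_hi / (fs / 2)], btype="band")
    sub = filtfilt(b, a, x)
    analytic = hilbert(sub)
    am = np.abs(analytic)                           # amplitude modulation (envelope)
    phase = np.unwrap(np.angle(analytic))
    inst_freq = np.gradient(phase) * fs / (2 * np.pi)   # instantaneous frequency in Hz
    fc = 0.5 * (f_lo + f_hi)
    fm = np.clip(inst_freq - fc, -fm_range_hz, fm_range_hz)  # remove center freq, limit range
    b2, a2 = butter(2, fm_rate_hz / (fs / 2))
    fm_slow = filtfilt(b2, a2, fm)                  # limit FM rate: slowly varying FM signal
    return am, fm_slow
```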

6.
An improved method based on minimum mean square error short-time spectral amplitude (MMSE-STSA) estimation is proposed to cancel background noise in whispered speech. Using the acoustic characteristics of whispered speech, the algorithm can track changes in non-stationary background noise effectively. Compared with the original MMSE-STSA algorithm and the method in the selectable mode vocoder (SMV), the improved algorithm further suppresses residual noise at low signal-to-noise ratios (SNRs) while avoiding excessive suppression. Simulations show that, under non-stationary noise, the proposed algorithm not only achieves better enhancement performance but also reduces speech distortion.
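A compact sketch of the standard MMSE-STSA (Ephraim-Malah) spectral gain that the improved algorithm builds on; the noise tracking and whisper-specific modifications of the paper are not reproduced. Here xi is the a priori SNR (e.g. from the decision-directed rule) and gamma the a posteriori SNR per frequency bin.

```python
import numpy as np
from scipy.special import i0e, i1e

def mmse_stsa_gain(xi, gamma):
    """Per-bin spectral amplitude gain under the MMSE-STSA criterion."""
    v = xi / (1.0 + xi) * gamma
    # exp(-v/2)*I0(v/2) and exp(-v/2)*I1(v/2) via the exponentially scaled Bessel functions
    bessel_term = (1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0)
    return (np.sqrt(np.pi) / 2.0) * (np.sqrt(v) / gamma) * bessel_term

# Enhanced amplitude = gain * noisy amplitude; the noisy phase is reused for synthesis.
```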

7.
With the development of large-vocabulary continuous speech recognition, more and more researchers have adopted initials and finals as recognition units. In Mandarin continuous speech recognition based on initials and finals, accurate segmentation of these units is a crucial step. Drawing on the acoustic-phonetic characteristics of Mandarin, a strategy is proposed that combines an initial-based segmentation method with an inter-segment distance method. Experimental results show that the method achieves accurate segmentation.

8.
Results from a series of experiments that use neural networks to process the visual speech signals of a male talker are presented. In these preliminary experiments, the results are limited to static images of vowels. It is demonstrated that these networks are able to extract speech information from the visual images and that this information can be used to improve automatic vowel recognition. The structure of speech and its corresponding acoustic and visual signals are reviewed. The specific data that was used in the experiments along with the network architectures and algorithms are described. The results of integrating the visual and auditory signals for vowel recognition in the presence of acoustic noise are presented.

9.
We report continuous frequency shifting of a train of mode-locked pulses of light by up to ±400 GHz using electrooptic frequency shifting (EOFS). Dynamic, continuous, and accurate shifting of the optical pulses is achieved by controlling the phase and power of a single frequency microwave signal in a traveling wave phase modulator structure. Operation of the device under different microwave power and optical pulse length settings was investigated in order to study spectral distortion predicted to occur at high levels of frequency shifting and when using long pulses. Possible techniques for future device improvement were proposed and preliminary tests were performed. The use of a second-harmonic component of the microwave drive signal leads to a longer region of constant amplitude gradient and results in reduced distortion.

10.
A customizable voice-controlled dialer based on speaker-independent voice command recognition was studied on a personal digital assistant (PDA) running the embedded operating system Pocket PC. To cope with the PDA's limited memory and computing power while preserving accuracy, the work focuses on tightly controlling the search space and speeding up decoding: a dynamic histogram-pruning strategy is proposed that adjusts the beam width in real time from the score differences between search paths, and a lookup table is used to accelerate likelihood computation. In addition, as validated experimentally, lower-dimensional features and acoustic modeling with extended initials and finals are adopted, effectively addressing the above constraints. Experiments on an actual PDA show that, for a vocabulary of 200 names, recognition accuracy reaches 98.70%, decoding is about 80 times faster than a reference system using the standard algorithm, and search memory is reduced by about 30%.
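A simplified sketch of combining score-based beam pruning with histogram pruning. The paper's real-time beam adjustment from path-score differences is only mimicked here by narrowing the beam when scores are widely spread; the actual adjustment rule and the likelihood lookup table are not reproduced, and all thresholds are illustrative.

```python
def prune_hypotheses(hyps, base_beam=80.0, max_active=500):
    """hyps: list of (state, score) pairs, higher score = better."""
    best = max(score for _, score in hyps)
    worst = min(score for _, score in hyps)
    # Adjust the beam from the spread of path scores (stand-in for the score-difference rule).
    spread = best - worst
    beam = base_beam * (0.5 if spread > 2 * base_beam else 1.0)
    survivors = [(s, sc) for s, sc in hyps if best - sc <= beam]
    # Histogram pruning: cap the number of active hypotheses.
    survivors.sort(key=lambda p: p[1], reverse=True)
    return survivors[:max_active]
```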

11.
A new approach to the design of isolated word speech recognition (IWSR) systems is proposed in this paper. It involves a dynamic matching strategy based on the nature of the input speech segment, called signal-dependent matching. The computational complexity of the proposed algorithm is significantly reduced by adopting a two-stage approach to matching. In the first stage, the warping path between the test utterance and a reference utterance is determined. In the second stage, the distance between the utterances is computed along that path. There is a slight degradation in the performance of the two-stage approach compared with the single-stage approach, but this can be tolerated in view of the significant computational advantage; the degradation is more than compensated by the signal-dependent matching strategy in the second stage. To measure the improvement in recognition performance, a new index of performance is defined that reflects the characteristics of the distance matrix for a given vocabulary, rather than the characteristics of the confusion matrix. The performance of the signal-dependent matching algorithm is significantly better than that of the standard dynamic time warping algorithm for both confusable and non-confusable vocabularies. We also develop a signal-dependent matching algorithm that takes some distortions of the input speech into account; as an example, we present the algorithm with the same test utterance twice, once undistorted and once distorted. Our research to date indicates an improvement in automatic isolated word speech recognition when signal-dependent parameter measurement and signal-dependent matching are used.
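A minimal reference implementation of standard DTW, the baseline that the signal-dependent scheme modifies. The two-stage variant would first find the warping path with a cheap local distance and then re-score along that path with a signal-dependent distance; only the single-stage baseline is sketched here.

```python
import numpy as np

def dtw_distance(test, ref):
    """test, ref: (n_frames, n_dims) feature matrices; returns the accumulated cost."""
    n, m = len(test), len(ref)
    # Local Euclidean distances between all frame pairs.
    local = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                  acc[i, j - 1],      # deletion
                                                  acc[i - 1, j - 1])  # match
    return acc[n, m]
```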

12.
Design of a MATLAB-based speech enhancement system
冯岩  唐普英 《通信技术》2010,43(5):187-188,191
Speech enhancement is an important part of the signal processing field. In many speech processing applications, such as mobile communications, speech recognition, and hearing aids, speech signals must be processed in noisy environments. Over the past few decades, many methods have been proposed to remove noise and reduce speech distortion, such as spectral subtraction, wavelet-based methods, hidden Markov model methods, and signal-subspace methods. Because wavelet analysis can analyze a signal in the time and frequency domains simultaneously, it can denoise signals effectively. This paper presents the design of a speech enhancement system that combines the least mean square (LMS) algorithm with the wavelet transform to denoise noisy speech, and builds a model of the system in MATLAB's Simulink environment. Simulation of the model shows that the method provides a clear denoising effect and lays a theoretical foundation for a hardware implementation of the system.
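A minimal sketch of the LMS half of such a scheme: an adaptive filter driven by a noise reference cancels the correlated noise component in the primary (noisy speech) channel. The wavelet-domain processing and the Simulink model of the paper are not reproduced; the filter length and step size here are illustrative.

```python
import numpy as np

def lms_noise_canceller(primary, noise_ref, n_taps=32, mu=0.01):
    """primary: noisy speech samples; noise_ref: correlated noise reference."""
    w = np.zeros(n_taps)
    out = np.zeros_like(primary, dtype=float)
    for n in range(n_taps, len(primary)):
        x = noise_ref[n - n_taps:n][::-1]   # most recent reference samples, newest first
        y = w @ x                           # estimate of the noise in the primary channel
        e = primary[n] - y                  # error signal = enhanced speech sample
        w += mu * e * x                     # LMS weight update
        out[n] = e
    return out
```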

13.
In this paper, we propose an approach for the analysis and detection of acoustic events in speech signals using the Bessel series expansion. The acoustic events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that the Bessel functions with their damped sinusoid-like basis functions are better suited for representing the speech signals than the sinusoidal basis functions used in the conventional Fourier representation. The speech signal is band-pass filtered by choosing the appropriate range of Bessel coefficients to obtain a narrow-band signal, which is decomposed further into amplitude modulated (AM) and frequency modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant vowel (SCV) and the GCIs are derived by processing the AE of the signal. The proposed approach for the detection of the VOT using the Bessel expansion is shown to perform better than the conventional Fourier representation. The performance of the proposed GCI detection method using the Bessel series expansion is compared against some of the existing methods for various noise environments and signal-to-noise ratios.
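A minimal sketch of the Teager energy operator and a discrete energy separation algorithm (the DESA-2 variant is used here for brevity) for obtaining the amplitude envelope of a narrow-band AM-FM component. The Bessel-series band-pass filtering step is not reproduced; the input is assumed to be an already narrow-band signal.

```python
import numpy as np

def teager(x):
    """Teager-Kaiser energy: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1), for samples 1..N-2."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x):
    """Return (amplitude envelope, digital frequency in rad/sample) for a narrow-band signal."""
    psi_x = teager(x)
    y = x[2:] - x[:-2]                    # symmetric difference signal
    psi_y = np.maximum(teager(y), 0.0)
    psi_x = psi_x[1:-1]                   # align with psi_y
    eps = 1e-12
    omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x + eps), -1.0, 1.0))
    amp = 2.0 * psi_x / (np.sqrt(psi_y) + eps)   # amplitude envelope estimate
    return amp, omega
```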

14.
In this paper, we propose an efficient approach to spotting and recognition of consonant-vowel (CV) units from continuous speech using accurate detection of vowel onset points (VOPs). Existing methods for VOP detection suffer from lack of high accuracy, spurious VOPs, and missed VOPs. The proposed VOP detection is designed to overcome most of the shortcomings of the existing methods and provide accurate detection of VOPs for improving the performance of spotting and recognition of CV units. The proposed method for VOP detection is carried out in two levels. At the first level, VOPs are detected by combining the complementary evidence from excitation source, spectral peaks, and modulation spectrum. At the second level, hypothesized VOPs are verified (genuine or spurious), and their positions are corrected using the uniform epoch intervals present in the vowel regions. The spotted CV units are recognized using a two-stage CV recognizer. The two-stage CV recognition system consists of hidden Markov models (HMMs) at the first stage for recognizing the vowel category of a CV unit and support vector machines (SVMs) for recognizing the consonant category of a CV unit at the second stage. Performance of spotting and recognition of CV units from continuous speech is evaluated using the Telugu broadcast news speech corpus.

15.
Transformation of a segment of acoustic signal, by processing into a vectorial representation such as the spectrum, can permit the identification of the constituent phonemes within spoken speech. Subsequent comparison against a previously stored representation using techniques such as dynamic time warping or hidden Markov modelling then permits a speech recognition operation to be accomplished. These signal-processor-intensive transform and graph-search-based pattern-matching techniques are reviewed and currently achievable recognition accuracies are reported.

16.
A segment-based speech recognition scheme is proposed. The basic idea is to model explicitly the correlation among successive frames of speech signals by using features representing contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each constituent acoustic segment is of variable length in nature and represented by a fixed dimensional feature vector formed by coefficients of discrete orthonormal polynomial expansions for approximating its spectral parameter contours. In the training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. In the testing, a frame-based dynamic programming procedure is employed to calculate the matching score of comparing the test utterance with each reference template. Performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for 408 highly confusing isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved for the case using 5-segment, 8-reference template models with cepstral and delta-cepstral coefficients as the recognition features. It is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta-cepstral, and delta-delta cepstral coefficients.
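A sketch of the segment-level feature idea: each spectral-parameter contour within a variable-length segment is approximated by a low-order orthogonal polynomial expansion, and the expansion coefficients form a fixed-size vector. Legendre polynomials over a normalized time axis are used here as the orthogonal basis; the paper's exact basis, segmentation, and template-generation algorithm are not reproduced.

```python
import numpy as np
from numpy.polynomial import legendre

def segment_feature(cepstra, order=2):
    """cepstra: (n_frames, n_coeffs) for one segment -> fixed vector of (order+1)*n_coeffs values."""
    n_frames, n_coeffs = cepstra.shape
    t = np.linspace(-1.0, 1.0, n_frames)   # normalized time axis within the segment
    coeffs = [legendre.legfit(t, cepstra[:, d], order) for d in range(n_coeffs)]
    return np.concatenate(coeffs)
```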

17.
This paper proposes a single-channel speech enhancement algorithm that combines short-time spectral estimation with the auditory masking effect of the human ear. Under the MMSE criterion, the algorithm tracks speech with non-fixed parameters and uses the masking effect to determine the transfer function of the enhancement filter dynamically, adapting to changes in the speech signal. Experimental results show that the algorithm yields low speech distortion in the enhanced signal and suppresses musical noise well.

18.
A frequency-domain independent component analysis (ICA) algorithm is proposed to separate the speech signal from the noise signal and thereby suppress noise. Since ICA performs well when the noise sources are concentrated but degrades when they are spatially spread, the frequency-domain ICA algorithm for noisy signals is developed from the time-domain one. Finally, minimum mean square error (MMSE) short-time spectral amplitude estimation is used to remove residual noise for better enhancement. Extensive experiments show that the proposed dual-microphone speech enhancement algorithm based on ICA and MMSE short-time spectral amplitude estimation achieves good noise reduction at different signal-to-noise ratios (SNRs).
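A bare-bones illustration of the ICA separation step using scikit-learn's FastICA on two time-domain microphone signals. The paper works in the frequency domain (ICA per frequency bin plus permutation/scaling handling) and follows with MMSE spectral-amplitude post-filtering; neither of those steps is shown here, and the source order and scale of the outputs are arbitrary.

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_two_mics(mic1, mic2):
    """mic1, mic2: equal-length time-domain recordings of the two microphones."""
    X = np.column_stack([mic1, mic2])     # (n_samples, 2) observed mixtures
    ica = FastICA(n_components=2, random_state=0)
    S = ica.fit_transform(X)              # estimated sources (speech and noise, in some order)
    return S[:, 0], S[:, 1]
```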

19.
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
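A minimal sketch of the MAP mean update for one Gaussian component, the core idea of such Bayesian adaptation: the prior mean from the general model is interpolated with the sufficient statistics of the adaptation data, with a relevance factor tau controlling how strongly the prior is trusted. Full MAP adaptation of mixture weights, covariances, and HMM transitions is not shown.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """frames: (T, D) adaptation data; posteriors: (T,) occupation probabilities for this component."""
    occ = posteriors.sum()              # soft frame count assigned to this component
    weighted_sum = posteriors @ frames  # (D,) first-order statistics
    return (tau * prior_mean + weighted_sum) / (tau + occ)
```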

20.
In this paper, a warped discrete cosine transform (WDCT)-based approach to enhance the degraded speech under background noise environments is proposed. For developing an effective expression of the frequency characteristics of the input speech, the variable frequency warping filter is applied to the conventional discrete cosine transform (DCT). The frequency warping control parameter is adjusted according to the analysis of spectral distribution in each frame. For a more accurate analysis of spectral characteristics, the split-band approach in which the global soft decision for speech presence is performed in each band separately is employed. A number of subjective and objective tests show that the WDCT-based enhancement method yields better performance than the conventional DCT-based algorithm.
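A skeleton of the conventional DCT-domain enhancement baseline that the WDCT method improves on: analyze each frame with an orthonormal DCT, apply a per-coefficient suppression gain, and invert. The warping filter, the per-frame warping-parameter selection, and the split-band soft decision are not reproduced; the Wiener-style gain here is only a placeholder.

```python
import numpy as np
from scipy.fft import dct, idct

def enhance_frame_dct(noisy_frame, noise_power):
    """noisy_frame: (N,) windowed samples; noise_power: (N,) DCT-domain noise power estimate."""
    Y = dct(noisy_frame, type=2, norm="ortho")
    snr = np.maximum(Y ** 2 - noise_power, 0.0) / (noise_power + 1e-12)
    gain = snr / (1.0 + snr)                       # placeholder Wiener-style suppression gain
    return idct(gain * Y, type=2, norm="ortho")
```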
