Similar Literature
20 similar documents found (search time: 390 ms)
1.
Pitch detection in noisy environments plays an important role in speech signal processing. To extract the pitch period of speech effectively at low signal-to-noise ratios, a detection method based on wavelet-packet-transform weighted linear-prediction autocorrelation is proposed. The method first removes noise with adaptive wavelet-packet thresholding, sums the approximation components of the multi-level wavelet packet transform to emphasize pitch information, and then applies autocorrelation of the wavelet-packet-coefficient-weighted linear prediction error to sharpen the peak at the pitch period, improving the accuracy of pitch detection. Experimental results show that, compared with the conventional autocorrelation method and the wavelet-weighted autocorrelation method, the proposed method is more robust, produces smoother pitch tracks, and is more accurate, yielding satisfactory results even at an SNR of -5 dB.
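A minimal sketch of the general pipeline this abstract describes (wavelet denoising, linear-prediction residual, autocorrelation peak picking). It is not the authors' exact method: the wavelet-packet coefficient weighting and approximation-component summation are replaced by a plain soft-threshold denoiser, and pywt/scipy are assumed to be available.

```python
import numpy as np
import pywt
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def denoise(x, wavelet="db4", level=4):
    """Soft-threshold wavelet denoising (stand-in for the adaptive wavelet-packet thresholding)."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise estimate from the finest scale
    thr = sigma * np.sqrt(2 * np.log(len(x)))                  # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, "soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

def lpc_residual(x, order=12):
    """Linear-prediction residual via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(x, x, "full")[len(x) - 1 : len(x) + order]
    a = solve_toeplitz(r[:-1], r[1:])                          # predictor coefficients a_1..a_p
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)      # e[n] = x[n] - sum a_k x[n-k]

def pitch_period(frame, fs, fmin=60, fmax=400):
    """Pick the autocorrelation peak of the residual within the plausible pitch-lag range."""
    e = lpc_residual(denoise(frame))
    ac = np.correlate(e, e, "full")[len(e) - 1 :]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return lo + int(np.argmax(ac[lo:hi]))                      # pitch period in samples
```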

2.
In this work, an average framing linear prediction coding (AFLPC) technique for text-independent speaker identification systems is presented. Conventionally, linear prediction coding (LPC) has been applied in speech recognition applications; in this study, a combination of modified LPC with the wavelet transform (WT), termed AFLPC, is proposed for speaker identification. The investigation procedure is based on feature extraction and voice classification. In the feature extraction phase, the distinguishing vocal tract characteristics of each speaker were extracted using the AFLPC technique. The size of the speaker feature vector can be optimized for an acceptable recognition rate by means of a genetic algorithm (GA); an LPC order of 30 was found to give the best system performance. In the classification phase, a probabilistic neural network (PNN) is applied because of its rapid response and ease of implementation. In the experimental investigation, the performance of different wavelet transforms used in conjunction with AFLPC was compared, and the proposed system was benchmarked against other systems reported in the literature. The PNN classifier achieves its best recognition rate (97.36%) with the wavelet packet (WP) combined with AFLPC, termed the WPLPCF feature extraction method. The proposed system was also analyzed in additive white Gaussian noise (AWGN) and real noise environments, reaching 58.56% at 0 dB and 70.52% at 5 dB SNR. The recognition rates of the Gaussian mixture model (GMM) over the whole database were the lowest when the number of training samples was small.
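A rough illustration of one way to build an "averaged per-frame LPC" speaker vector. This is only a plausible reading of the abstract, not the published AFLPC definition: the wavelet sub-band processing and GA-based order search are omitted, and librosa is assumed for framing and the LPC routine.

```python
import numpy as np
import librosa

def averaged_lpc_features(y, sr, order=30, frame_len=0.03, hop=0.015):
    """Average per-frame LPC coefficients into one fixed-length utterance feature vector."""
    n, h = int(frame_len * sr), int(hop * sr)
    frames = librosa.util.frame(y, frame_length=n, hop_length=h).T
    coeffs = []
    for f in frames:
        f = f * np.hamming(n)                     # taper each frame
        if np.sum(f ** 2) > 1e-8:                 # skip (near-)silent frames
            coeffs.append(librosa.lpc(f, order=order)[1:])   # drop the leading 1
    return np.mean(coeffs, axis=0)                # (order,)-dimensional speaker vector
```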

3.
The most widely used speech representation is based on the mel-frequency cepstral coefficients, which incorporate biologically inspired characteristics into artificial recognizers. However, recognition performance with these features can still be enhanced, especially in adverse conditions. Recent advances have been made with the introduction of wavelet-based representations for different kinds of signals, which have been shown to improve classification performance. However, finding an adequate wavelet-based representation for a particular problem remains an important challenge. In this work we propose a genetic algorithm to evolve a speech representation, based on a non-orthogonal wavelet decomposition, for phoneme classification. The results, obtained for a set of Spanish phonemes, show that the proposed genetic algorithm is able to find a representation that improves speech recognition results. Moreover, the optimized representation was evaluated under noise conditions.

4.
Isolated-word speech recognition, which relies on pattern matching, is one of the core techniques of speech recognition. First, the user speaks each word in the vocabulary once, and its feature vector is stored as a template in a template library. Then, the feature vector of the input speech is compared for similarity against every template in the library, and the template with the highest similarity is output as the recognition result. This paper reviews the state of the art in isolated-word speech recognition and several common techniques, and discusses its applications and development prospects.

5.
Speaker recognition faces many practical difficulties, among which signal inconsistency due to environmental and acquisition-channel factors is the most challenging. The noise imposed on the voice signal varies greatly, and an a priori noise model is usually unavailable. In this article, we propose a robust speaker recognition method that employs a novel adaptive wavelet shrinkage method for noise suppression. In our method, wavelet subband coefficient thresholds are computed automatically, in proportion to the noise contamination. When applying wavelet shrinkage for noise removal, a dual-threshold strategy is developed to suppress noise, preserve signal coefficients, and minimize the introduction of artifacts. Recognition is achieved using modified Mel-frequency cepstral coefficients of overlapping voice signal segments. The efficacy of our method is evaluated on voice signals from two publicly available speech databases and compared with state-of-the-art methods. The proposed method is shown to be highly robust under various noise conditions, and the improvement is especially significant when noise dominates the underlying speech.
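For reference, a minimal sketch of extracting MFCC-based utterance features from overlapping frames, using standard MFCCs rather than the modified variant and dual-threshold shrinkage described in the abstract; librosa is assumed.

```python
import numpy as np
import librosa

def mfcc_utterance_features(y, sr, n_mfcc=13, frame_len=0.025, hop=0.010):
    """Standard MFCCs over overlapping frames, summarized into one fixed-length vector."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                             n_fft=int(frame_len * sr),
                             hop_length=int(hop * sr))
    # Per-utterance mean and standard deviation are a common fixed-length summary.
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])
```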

6.
Tone modeling is very important for Mandarin speech recognition. In this paper, a Mixture Stochastic Polynomial Tone Model (MSPTM) is proposed for tone modeling in continuous Mandarin speech. In this model the pitch contour, the main representative of the tone pattern, is described as a mixed stochastic trajectory: the mean trajectory is represented by a polynomial function of normalized time, while the variance is time-varying. Effective training and tone recognition algorithms were developed. Experiments with the proposed MSPTM showed a 40.7% relative reduction in tone recognition error rate compared with the traditional Hidden Markov Model (HMM) tone model. We also present a decision-tree-based approach to learning tone pattern variation in continuous speech. The phonetic and linguistic factors that may affect tone patterns were taken into consideration when constructing the tree; after the tree was built, 28 distinct tone patterns were obtained. We found that, in addition to the tones of neighboring syllables, the consonant/vowel type of the syllable and its position in the utterance also contribute substantially to tone pattern variation in continuous speech. Finally, a new approach to integrating tone information into the search process at the word level is discussed. Experiments on continuous Mandarin speech recognition showed that the new tone model and tone information integration method are effective, achieving a 16.2% relative reduction in character error rate.
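A small sketch of the core idea of a polynomial mean trajectory over normalized time: fit a low-order polynomial to an F0 contour. The mixture components and the time-varying variance of the MSPTM are omitted, so this only illustrates the mean-trajectory part.

```python
import numpy as np

def tone_contour_coeffs(f0, degree=3):
    """Fit a polynomial over normalized time [0, 1] to an F0 contour (Hz per frame).
    Unvoiced frames are assumed to be marked with F0 == 0 and are ignored."""
    f0 = np.asarray(f0, dtype=float)
    t = np.linspace(0.0, 1.0, len(f0))     # normalized time axis
    voiced = f0 > 0                        # keep voiced frames only
    return np.polyfit(t[voiced], f0[voiced], degree)

# Example: a rising contour (Mandarin tone-2-like) yields a positive slope term.
coeffs = tone_contour_coeffs([180, 185, 195, 210, 230, 250])
```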

7.
The syllable tone patterns of three-character words share the characteristics of syllable tones in continuous speech, making tone extraction and recognition far more difficult than for isolated characters. The pitch of the speech is extracted with the wavelet transform, and tones are recognized with a Fuzzy ARTMAP neural network, yielding better experimental results than a BP network. The influence of simulation parameters on the recognition results is analyzed, the overfitting problem in Fuzzy ARTMAP networks is discussed, and a Fuzzy-ARTMAP-based tone recognition method for three-character words is presented.

8.
The recognition of emotion in human speech has gained increasing attention in recent years due to the wide variety of applications that benefit from such technology. Detecting emotion from speech can be viewed as a classification task: it consists of assigning an emotion category from a fixed set (e.g. happiness, anger) to a speech utterance. In this paper we address two emotions, happiness and anger. The parameters extracted from a speech signal depend on the speaker, the spoken word, and the emotion; to isolate the emotion, the spoken utterance and the speaker are kept constant and only the emotion is varied. Different features are extracted to identify the parameters responsible for emotion, and the wavelet packet transform (WPT) is found to be emotion-specific. We performed experiments using three methods: the first uses the WPT and compares the number of coefficients above a threshold in different bands; the second uses the WPT and compares the energy ratios of different bands; the third is a conventional method using MFCCs. The recognition rates obtained using the WPT for the angry, happy, and neutral modes are 85%, 65%, and 80% respectively, compared with 75%, 45%, and 60% using MFCCs. Based on the WPT features, a model is proposed for emotion conversion, namely from neutral to angry and from neutral to happy speech.
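A minimal sketch of the band-energy-ratio feature mentioned in the second method: decompose a frame with a wavelet packet and compute the fraction of energy in each terminal band. The specific wavelet, depth, and band grouping of the original study are not given, so the values below are assumptions; pywt is assumed to be available.

```python
import numpy as np
import pywt

def wp_energy_ratios(x, wavelet="db4", level=3):
    """Energy in each terminal wavelet-packet band, normalized by the total energy."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")        # terminal nodes ordered by frequency
    energies = np.array([np.sum(n.data ** 2) for n in nodes])
    return energies / (energies.sum() + 1e-12)       # 2**level ratios summing to ~1
```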

9.
This paper proposes an improved voice activity detection (VAD) algorithm using wavelets and a support vector machine (SVM) for the European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are reimplemented via a wavelet filter bank and a wavelet-based pitch/tone detection algorithm. The wavelet filter bank divides the input speech signal into several frequency bands so that the signal power level in each sub-band can be calculated; in addition, the background noise level in each sub-band can be estimated using wavelet de-noising. The wavelet filter bank is also used to detect correlated complex signals such as music. The proposed algorithm then applies an SVM to train an optimized non-linear VAD decision rule based on the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech. Using the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database under various noise conditions show that the proposed algorithm considerably outperforms the AMR-NB VAD Options 1 and 2 and the AMR-WB VAD.
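To illustrate only the last step, replacing a hand-tuned decision rule with a trained SVM: the sketch below assumes a hypothetical feature matrix X whose rows hold per-frame features (sub-band powers, noise level, pitch period, tone flag, complex-signal flag) and labels y marking speech/non-speech frames. The AMR codec feature extraction itself is not reproduced; scikit-learn is assumed.

```python
import numpy as np
from sklearn.svm import SVC

def train_vad(X, y):
    """Train a non-linear VAD decision rule on per-frame feature vectors (hypothetical data)."""
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X, y)
    return clf

def is_speech(clf, frame_features):
    """Classify a single frame's feature vector as speech (True) or non-speech (False)."""
    return bool(clf.predict(np.atleast_2d(frame_features))[0])
```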

10.
李晶皎  孙杰 《控制与决策》1998,13(6):665-668,699
A method for extracting the pitch of Mandarin speech based on auditory processing and the wavelet transform is proposed. Building on a description of the auditory system, the relationship between human hearing and the wavelet transform is presented, and a wavelet function suited to fundamental-frequency extraction for Mandarin speech is selected. Application examples of fundamental-frequency extraction are given, together with recognition results for the four Mandarin tone values based on FCM fuzzy clustering analysis.

11.
This study focuses on the perception of emotion and attitude in speech. The ability to identify vocal expressions of emotion and/or attitude in speech material was investigated. Systematic perception experiments were carried out to determine optimal values for the acoustic parameters pitch level, pitch range, and speech rate. Speech was manipulated by varying these parameters around the values found in a selected subset of the speech material, which consisted of two sentences spoken by a male speaker expressing seven emotions or attitudes: neutrality, joy, boredom, anger, sadness, fear, and indignation. Listening tests were carried out with this material, and optimal values of pitch level, pitch range, and speech rate were derived for generating speech expressing emotion or attitude from a neutral utterance. These values were perceptually tested in re-synthesized speech and in synthetic speech generated from LPC-coded diphones.

12.
Over the past several years, considerable attention has been focused on the coding and enhancement of speech signals. This interest has progressed towards the development of new techniques capable of producing good-quality speech at the output. Speech coding is the process of converting human speech into an efficient encoded representation that can be decoded to produce a close approximation of the original signal. This paper deals with the problem of speech coding and proposes a novel approach, called Best Tree Encoding (BTE), which encodes the wavelet-packet best tree structure into a vector of four elements. This work applies BTE to a further problem, speech compression and synthesis. Tree node coefficients are encoded using LPC filters and trigonometric features; the encoded vector consists of the four elements from the BTE analysis plus an LPC and trigonometric vector for each leaf node. The reproduced speech is evaluated for both intelligibility and quality, measured in terms of signal-to-noise ratio, log-likelihood ratio, and spectral distortion.

13.
A silent speech recognition system based on electromyographic (EMG) signals is proposed. Because the system recognizes speech from EMG signals rather than acoustic signals, it can be used in high-noise environments and can help people who have lost the ability to speak communicate silently, giving it good application prospects. The system is implemented as follows: in the experiments, the ten Chinese digits 0-9 are repeated silently by the subjects, and EMG signals are collected from three facial muscles; a wavelet transform is applied to the EMG signals, energy values are extracted from the resulting coefficient matrices, and the feature vectors thus constructed are fed to a BP neural network classifier. Experiments show that wavelet-transform-based feature extraction is effective and is well suited to non-stationary physiological signals such as EMG.

14.
This paper presents a wavelet-based feature extraction method for human gait recognition. Selecting the features with the most discriminative information is the key to improving recognition performance. The frequency-domain representation of the gait image is obtained using the fast Fourier transform, and a discrete wavelet transform is then applied to the resulting spectrum. Single-level wavelet decomposition produces four coefficient subbands, and the sum of their entropies yields the wavelet Entropy Image (wEnI), which is used here as the feature for human gait recognition. A template-matching approach is used for classification. The performance of the proposed wEnI feature is evaluated using whole-based and part-based methods. The experimental results show that the wEnI feature outperforms state-of-the-art gait features in common use.
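A sketch of one plausible reading of the wEnI construction (FFT magnitude of the gait image, single-level 2-D DWT, then a per-pixel entropy over the four subbands). The authors' exact recipe is not specified in the abstract, so the normalization and entropy definition below are assumptions; numpy and pywt are assumed.

```python
import numpy as np
import pywt

def wavelet_entropy_image(gait_image):
    """Compute an entropy map from the four subbands of a single-level 2-D DWT
    applied to the FFT magnitude spectrum of a gait image."""
    spectrum = np.abs(np.fft.fft2(gait_image))
    cA, (cH, cV, cD) = pywt.dwt2(spectrum, "haar")
    bands = np.stack([np.abs(cA), np.abs(cH), np.abs(cV), np.abs(cD)])
    p = bands / (bands.sum(axis=0, keepdims=True) + 1e-12)   # normalize across bands per pixel
    return -(p * np.log2(p + 1e-12)).sum(axis=0)             # half-resolution entropy image
```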

15.
Noise-robust speech recognition with wavelet networks and RBF networks (cited by: 1; self-citations: 0; other citations: 1)
To address the poor performance of current speech recognition systems in noisy environments, a wavelet neural network is used to combine the good time-frequency localization of the wavelet transform with the strong classification and identification capabilities of RBF neural networks. A wavelet-RBF network structure in which wavelet bases replace the activation functions of the RBF network is constructed, and a fully supervised training algorithm is adopted to implement a noise-robust speech recognition system based on the wavelet-RBF network. Experimental results show that this system recognizes speech better than an RBF network and is more robust, especially in noisy environments.

16.
Pitch detection based on the squared autocorrelation function and the wavelet transform (cited by: 2; self-citations: 0; other citations: 2)
林琴  郭玉堂  刘亚楠 《计算机应用》2009,29(5):1433-1436
The pitch period of speech signals is studied under background-noise interference, and a pitch detection algorithm combining the squared autocorrelation function with the wavelet transform is proposed. The algorithm first denoises the noisy speech with the wavelet transform and then computes the squared autocorrelation function of the speech to emphasize the peak at the true pitch period, obtaining a more accurate pitch estimate. Experimental results show that, compared with the conventional autocorrelation method, the algorithm is more robust and more accurate, and its low computational complexity makes it suitable for real-time speech synthesis and coding.

17.
When template matching is applied to connected-digit recognition, its main problem is the enormous computational load. Based on an analysis of the articulatory parameters of connected Mandarin digits, this paper proposes a connected-digit recognition algorithm that combines pre-segmentation and non-pre-segmentation methods; its computational load is substantially lower than that of multi-stage matching. The paper also examines how the way the speech-parameter reference templates are built affects recognition performance, and how incorporating pitch information can further improve it.

18.
To address the over-thresholding problem of traditional wavelet speech enhancement algorithms, an improved time-adaptive threshold wavelet-packet denoising algorithm is proposed. The method decomposes the noisy speech with a perceptual wavelet packet to obtain the coefficients at the perceptual wavelet-packet nodes, and automatically adjusts the denoising threshold frame by frame based on an estimate of the speech presence probability. Because the improved threshold better avoids over-thresholding the speech wavelet-packet coefficients, more of the original speech is preserved while noise is suppressed, further improving denoising performance. Experimental results show that the algorithm yields clearer enhanced speech than the conventional adaptive wavelet thresholding algorithm.

19.
To address the pitch-halving and pitch-doubling errors that commonly occur in pitch period detection, a pitch detection algorithm is proposed that combines the widely used wavelet transform and short-time autocorrelation methods, taking into account the strengths and weaknesses of each as well as the gradual change in length between adjacent pitch periods. The speech signal is first subjected to voiced/unvoiced detection and band-pass pre-filtering; the wavelet transform provides an initial estimate, and the autocorrelation method is used to verify cases where the pitch period changes too abruptly. Experimental results show that the method detects pitch periods noticeably more accurately than the plain wavelet-transform method at various signal-to-noise ratios, and it also facilitates quick manual correction of pitch periods.

20.