Found 20 similar documents (search time: 0 ms)
1.
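In intelligent human-machine interaction systems, emotion classification of speech signals is an active research area with wide application. This paper proposes a method based on feature extraction and on emotional cross-correlation, using a support vector machine (SVM) classifier, and applies it to emotional speech recognition. The method is used to classify speech signals of three emotions. The SVM classifier performs the classification on features extracted from the emotional cross-correlation in the emotional speech signals. This automatic classification method, which uses emotional cross-correlation with an SVM classifier, substantially raises the emotion recognition rate, and its accuracy in recognizing anger reaches 95.04%.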
2.
Hyung-Min Park, Ho-Young Jung, Te-Won Lee, Soo-Young Lee 《Electronics letters》1999,35(23):2011-2012
A method for directly extracting clean speech features from noisy speech is proposed. This process is based on independent component analysis (ICA) and a new feature analysis technique for reducing the computational complexity of the frequency domain ICA. For noisy speech signals recorded in real environments, this method yielded a considerable performance improvement.
3.
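To improve the accuracy of emotion recognition, and to address the limitations of using speech-signal features or surface electromyography (sEMG) features alone, an automatic emotion recognition model that integrates both kinds of features is proposed. First, the speech and sEMG signals are preprocessed and the relevant features are extracted from each. Next, support vector machines learn the speech features and the sEMG features separately, yielding one emotion classifier and one recognition result per signal type. Finally, the recognition results are fed into a further support vector machine that determines the weight coefficients of the two feature types, which gives the final emotion recognition result. Simulation results on two standard emotional speech databases show that, compared with other emotion recognition models, the proposed model substantially improves recognition accuracy, and it provides a new research tool for emotion recognition systems in human-computer interaction.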
4.
Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO 《IEEE transactions on bio-medical engineering》2012,59(5):1264-1271
There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.
5.
Yu Shi, Xian-Da Zhang 《Signal Processing, IEEE Transactions on》2001,49(12):2994-3004
A Gabor atom neural network approach is proposed for signal classification. The Gabor atom network uses a multilayer feedforward neural network structure, and its input layer constitutes the feature extraction part, whereas the hidden layer and the output layer constitute the signal classification part. From a physical point of view, it is shown that the time-shifted, frequency-modulated, and scaled Gaussian function can serve as a basic model for high-resolution radar signals. Two experimental examples show that the Gabor atom network approach achieves a higher recognition rate in radar target recognition from range profiles than several existing methods.
6.
7.
8.
A wide variety of speech recognition distortion measures have been proposed and tested, including some especially effective ones. It is shown that there is a general framework, based on the concepts of information theory, linking most of these measures. The distortion measure between any two speech spectra can be defined in terms of the distortions between the associated probability distributions. This general framework defines three broad families of distortion measures for speech recognition and provides a consistent way of combining the energy and the spectral information of a phonetic event. In addition, the cepstral-domain representation for several distortion measures is derived, allowing comparison of these measures in a domain that also yields convenient equations for their practical implementation.
9.
Graphical model architectures for speech recognition (total citations: 3, self-citations: 0, citations by others: 3)
《Signal Processing Magazine, IEEE》2005,22(5):89-100
This article discusses the foundations of the use of graphical models for speech recognition as presented in J. R. Deller et al. (1993), X. D. Huang et al. (2001), F. Jelinek (1997), L. R. Rabiner and B.-H. Juang (1993) and S. Young et al. (1990), giving detailed accounts of some of the more successful cases. Our discussion employs dynamic Bayesian networks (DBNs) and a DBN extension using the Graphical Model Toolkit's (GMTK's) basic template, a dynamic graphical model representation that is more suitable for speech and language systems. While this article concentrates on speech recognition, it should be noted that many of the ideas presented here are also applicable to natural language processing and general time-series analysis.
10.
V. V. Savchenko, P. G. Lukin 《Journal of Communications Technology and Electronics》2006,51(2):192-196
The possibility of enhancing speech-recognition efficiency by using the supplemented-vocabulary method is studied. The minimum-information-mismatch criterion is proposed for selecting one, two, or, in the general case, several realizations of recognition words to be added to a working vocabulary. Using a particular practical example, it is shown that the positive effect is achieved without substantially enlarging the vocabulary or increasing the computational complexity.
11.
Stochastic correlation model for speech recognition (total citations: 1, self-citations: 0, citations by others: 1)
A stochastic model, drawn from the upper bound of the joint probability distributions, is suggested for modelling the spectral correlation in speech. Experiments on a speaker independent E-set database show the effectiveness of this new modelling approach.
12.
The authors propose a frame decorrelation method to cope with background noise in speech recognition. Since noise is modelled as a stationary perturbation in most cases, it is effective in reducing slow-varying components. One example of using this principle is the highpass scheme. The proposed method has the same property as the highpass scheme. It transforms feature vector sequences into decorrelated sequences and enhances transition regions. Simulation results show that this method is effective for speech with significant noise, and works better than other highpass methods.
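The highpass principle mentioned in this abstract can be illustrated with a minimal sketch. This is a generic first-order highpass filter applied to one feature coefficient over time, not the authors' frame decorrelation method; the function name `highpass`, the pole value 0.97, and the sample values are all assumptions made for illustration. It shows that a stationary (constant) offset added to a feature track disappears after filtering, while frame-to-frame transitions are kept.

```python
def highpass(track, pole=0.97):
    """First-order highpass filter: y[t] = x[t] - x[t-1] + pole * y[t-1].

    `pole` is an assumed value; the abstract does not give filter coefficients.
    """
    out = []
    prev_x = track[0]  # initialise with the first sample so a constant offset maps to zero
    prev_y = 0.0
    for x in track:
        y = x - prev_x + pole * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


# One hypothetical feature coefficient over six frames,
# with and without a constant offset standing in for stationary noise.
clean = [0.0, 1.0, 3.0, 2.0, 2.0, 0.5]
offset = [c + 5.0 for c in clean]

filtered_clean = highpass(clean)
filtered_offset = highpass(offset)  # same output: the constant offset is removed
```

Because the filter only sees differences between consecutive frames, the two filtered tracks coincide, which is the sense in which slow-varying components are suppressed.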
13.
14.
The satisfactory estimation of speech autocorrelation by means of generalised zero-crossings indicates that they can be used for efficient feature extraction in speech recognition. In addition, high consistency between the Itakura-Saito distances, calculated before and after clipping, allowed for only a moderate degradation of the related recognition performance, which was compensated by including the excitation distortion into the distance measure.
15.
Cun-Tai Guan, Shu-Hung Leung, Wing-Hong Lan 《Electronics letters》1998,34(1):30-32
A multi-model approach for noisy speech recognition is proposed. This approach comprises an SVD-based preprocessing front-end and a multi-model HMM recognition structure. It can provide a high recognition rate over a large range of SNRs for speech recognition in wide-band additive noise.
16.
Yoshihiko Horio, Shogo Nakamura, Hiroyuki Takase 《Analog Integrated Circuits and Signal Processing》1992,2(2):79-94
A switched-capacitor (SC) preprocessing system (preprocessor) which extracts and emphasizes the local peaks of the spectrum in real time is proposed for speech recognition systems. The main components of the system are a specially designed bandpass filter bank, a low-pass decimation filter bank, two-dimensional local peak extraction (LPE) filters, and an LPE filter selection circuit. Furthermore, an SC cascaded integrator-comb filter design technique is proposed to realize the decimation low-pass filter and the LPE filter. Finally, the system is tested by using two speech recognition systems.
17.
18.
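Speech, as an analog signal that carries specific information, has become an important means of obtaining and spreading information in social life. The purpose of speech signal processing is to extract useful speech information from a complex acoustic environment. Environmental interference has a non-negligible effect on the signal during speech transmission, so the noise robustness of speech signal processing has become an important research direction. Matlab is used in a wide range of fields; in information processing, its powerful data-handling capabilities can convert non-stationary, time-varying speech data into discrete data, which can then be analyzed or processed further. Its signal processing toolbox implements speech signal processing and analysis quickly and effectively, making Matlab a powerful tool for the signal processing field. Here, Matlab is used to process a segment of speech containing environmental noise: Fourier transform, time-domain and frequency-domain analysis, extraction of part of the speech signal, and analysis of the signal.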
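As a rough illustration of the frequency-domain analysis step this abstract describes, the sketch below computes a discrete Fourier transform of a short synthetic signal and locates its strongest frequency bin. It is written in Python rather than Matlab and is not the code from the paper; the test signal (a 5-cycle tone plus a weaker 20-cycle interfering tone over 64 samples) and all names are assumptions chosen for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical signal: the "speech" component is a 5-cycle tone,
# the "interference" is a weaker 20-cycle tone.
N = 64
signal = [
    math.sin(2 * math.pi * 5 * t / N) + 0.3 * math.sin(2 * math.pi * 20 * t / N)
    for t in range(N)
]

spectrum = dft(signal)
# Keep only the first half of the bins: for a real signal the second half mirrors it.
magnitudes = [abs(c) for c in spectrum[: N // 2]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

For real recordings a fast Fourier transform from a numerical library would replace the naive double loop; the naive form is used here only to keep the example self-contained.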
19.
Brown M.K., McGee M.A., Rabiner L.R., Wilpon J.G. 《Signal Processing, IEEE Transactions on》1991,39(6):1268-1281
Two methods for generating training sets for a speech recognition system are studied. The first uses a nondeterministic statistical method to generate a uniform distribution of sentences from a finite state machine (FSM) represented in digraph form. The second method, a deterministic heuristic approach, takes into consideration the importance of word ordering to address the problem of coarticulation effects. The two methods are critically compared. The first algorithm, referred to as MARKOV, converts the FSM into a first-order Markov model. The digraphs are determined, transitive closure computed, transition probabilities are assigned, and stopping criteria established. An efficient algorithm for computing these parameters is described. Statistical tests are conducted to verify performance and demonstrate its utility. A second algorithm for generating training sentences, referred to as BIGRAM, uses heuristics to satisfy three requirements: adequate coverage of basic speech (subword) units; adequate coverage of words in the recognition vocabulary (intraword contextual units); and adequate coverage of word-pair bigrams (interword contextual units).
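The idea of drawing sentences uniformly from a word graph by turning it into a first-order Markov model can be sketched as follows. This is not the MARKOV algorithm from the paper, whose details the abstract does not give; it is one standard way to obtain a uniform distribution over sentences, assuming the graph is acyclic. The grammar `GRAPH`, the state names, and the helper functions are hypothetical. Each arc gets a probability proportional to the number of complete sentences reachable through it.

```python
import random
from functools import lru_cache

# Hypothetical acyclic word graph: state -> list of (word, next_state).
# "END" is the single final state.
GRAPH = {
    "START": [("call", "WHO"), ("dial", "NUM")],
    "WHO": [("home", "END"), ("office", "END"), ("mom", "END")],
    "NUM": [("zero", "END"), ("nine", "END")],
}


@lru_cache(maxsize=None)
def count_sentences(state):
    """Number of distinct sentences that can be completed from `state`."""
    if state == "END":
        return 1
    return sum(count_sentences(nxt) for _, nxt in GRAPH[state])


def transition_probs(state):
    """Arc probabilities proportional to the sentence count behind each arc."""
    total = count_sentences(state)
    return [(word, nxt, count_sentences(nxt) / total) for word, nxt in GRAPH[state]]


def sample_sentence(rng):
    """Walk the Markov chain from START to END, collecting one word per arc."""
    state, words = "START", []
    while state != "END":
        arcs = transition_probs(state)
        word, state, _ = rng.choices(arcs, weights=[p for _, _, p in arcs])[0]
        words.append(word)
    return " ".join(words)


rng = random.Random(0)
samples = [sample_sentence(rng) for _ in range(20)]
```

With plain uniform arc probabilities, "dial zero" would be drawn more often than "call home" because the `NUM` branch has fewer continuations; weighting by sentence counts gives each of the five sentences probability 1/5.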
20.
Quenot G.M., Gauvain J.-L., Gangolf J.-J., Mariani J.J. 《Solid-State Circuits, IEEE Journal of》1989,24(2):349-357
A dynamic programming processor with parallel and pipeline architecture is described. A 2-μm CMOS technology was applied to the DP processor, which is composed of 127309 transistors on a 7.17 × 8.62 mm² die and is housed in an 84-pin PLCC (plastic leaded chip carrier) or PGA (pin grid array) package. The clock frequency is 20 MHz, and the instruction cycle time is 100 ns. Precise electrical simulations permitted the safe use of nonstandard logic and area and power reduction. Implementation of a direct access to all internal registers has proven useful for chip test and software development. A system using one DP processor has given very good results on a wide variety of applications and 0.48% error rate on tests with standard NATO tapes. These results are significantly better than those published for other systems on the same tests.
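The abstract does not say which dynamic programming recurrence the processor implements. As an assumed example of the kind of computation such hardware accelerates in speech recognition, the sketch below shows dynamic time warping (DTW), a classic dynamic programming method for comparing two sequences spoken at different rates. The function name and the scalar "features" are illustrative only.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: skip a frame in a, skip a frame in b, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A template and a time-stretched version of it align at zero cost.
template = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(template, stretched)
```

The inner update depends only on three neighbouring cells, which is the regular structure that makes this kind of recurrence a natural fit for parallel and pipelined hardware.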