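In intelligent human-machine interaction systems, emotion classification of speech signals is currently an active research area and has found wide application. This paper proposes an emotional cross-correlation method based on feature extraction and a support vector machine (SVM) classifier, and applies it to emotional speech recognition. The method is used to classify three kinds of emotional speech signals. The SVM classifier performs the classification using features extracted from the emotional cross-correlation in the emotional speech signals. This automatic classification method substantially improves the emotion recognition rate, and its accuracy reaches 95.04% when recognizing anger.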

Subband-based blind signal separation for noisy speech recognition   Total citations: 1, self-citations: 0, citations by others: 1
A method for directly extracting clean speech features from noisy speech is proposed. This process is based on independent component analysis (ICA) and a new feature analysis technique for reducing the computational complexity of the frequency domain ICA. For noisy speech signals recorded in real environments, this method yielded a considerable performance improvement.

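To improve the accuracy of emotion recognition, and to address the limitations of using speech signal features or surface electromyography (sEMG) signal features alone, an automatic emotion recognition model that integrates speech signal features and sEMG signal features is proposed. First, the speech and sEMG signals are preprocessed and the relevant features of each are extracted. Support vector machines are then trained on the speech features and on the sEMG features to build the corresponding emotion classifiers and obtain their recognition results. Finally, these recognition results are fed into a support vector machine that determines the weight coefficients of the two feature types, yielding the final emotion recognition result. Simulation results on two standard emotion databases show that, compared with other emotion recognition models, the proposed model substantially improves emotion recognition accuracy and provides a new research tool for human-computer interaction emotion recognition systems.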

There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.

A Gabor atom neural network approach is proposed for signal classification. The Gabor atom network uses a multilayer feedforward neural network structure, and its input layer constitutes the feature extraction part, whereas the hidden layer and the output layer constitute the signal classification part. From the physics point of view, it is shown that the time-shifted, frequency-modulated, and scaled Gaussian function can serve as a basic model for the signal of high-resolution radar. Two experimental examples show that the Gabor atom network approach has a higher recognition rate in radar target recognition from range profiles as compared with several existing methods.

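Research on the analysis and recognition of emotional features in speech signals   Total citations: 11, self-citations: 0, citations by others: 11
This paper analyzes the features of the time structure, amplitude structure, fundamental frequency structure, and formant structure of speech signals containing four emotions: joy, anger, surprise, and sadness. By comparison with calm speech signals without emotion, the distribution patterns of the emotional features of the different emotional speech signals are summarized. Based on this analysis, nine emotional features were extracted for emotion recognition experiments, and the recognition results obtained were essentially close to normal human performance.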

胡洋  蒲南江  吴黎慧  高磊 《电子测试》2011,(8):33-35,87
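Speech emotion recognition is an important branch of speech recognition and a theoretical foundation for harmonious human-computer interaction. Because of the limitations of a single classifier in speech emotion recognition, this paper proposes a method that combines hidden Markov models (HMM) with an artificial neural network (ANN). An HMM is designed for each of six emotions (happiness, surprise, anger, sadness, fear, and calm) to obtain the best matching sequence for each emotion, and the ANN is then used as a posterior classifier to classify the test samples, ...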

A wide variety of speech recognition distortion measures have been proposed and tested, including some especially effective ones. It is shown that there is a general framework, based on the concepts of information theory, linking most of these measures. The distortion measure between any two speech spectra can be defined in terms of the distortions between the associated probability distributions. This general framework defines three broad families of distortion measures for speech recognition and provides a consistent way of combining the energy and the spectral information of a phonetic event. In addition, the cepstral-domain representation for several distortion measures is derived, allowing comparison of these measures in a domain that also yields convenient equations for their practical implementation.
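To make the convenience of the cepstral domain concrete, here is a minimal sketch of a truncated cepstral distance between two frames. It uses the standard identity (from Parseval's theorem) that the mean squared difference of two log spectra equals the sum of squared cepstral differences over all quefrencies. The function name, the `include_c0` switch, and the sample coefficients are assumptions for illustration and are not taken from the paper.

```python
import math

def cepstral_distance(c1, c2, include_c0=False):
    """Truncated cepstral distance between two frames.

    c1, c2: real cepstral coefficients [c0, c1, ..., cL] of each frame.
    For real cepstra c[-n] == c[n], so every term with n >= 1 counts twice.
    c0 carries the overall log-gain term and is optional.
    """
    if len(c1) != len(c2):
        raise ValueError("cepstral vectors must have the same length")
    d2 = 2.0 * sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))
    if include_c0:
        d2 += (c1[0] - c2[0]) ** 2
    return math.sqrt(d2)

# Hypothetical coefficient values, chosen only to exercise the function.
frame_a = [1.0, 0.5, -0.25, 0.125]
frame_b = [3.0, 0.5, 0.75, 0.125]
print(cepstral_distance(frame_a, frame_b))                   # spectral shape only
print(cepstral_distance(frame_a, frame_b, include_c0=True))  # shape plus gain term
```

Keeping `c0` separate is one simple way to weigh gain against spectral shape; the framework in the abstract may combine them differently.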

Graphical model architectures for speech recognition   Total citations: 3, self-citations: 0, citations by others: 3
This article discusses the foundations of the use of graphical models for speech recognition as presented in J. R. Deller et al. (1993), X. D. Huang et al. (2001), F. Jelinek (1997), L. R. Rabiner and B.-H. Juang (1993) and S. Young et al. (1990), giving detailed accounts of some of the more successful cases. Our discussion employs dynamic Bayesian networks (DBNs) and a DBN extension using the Graphical Model Toolkit's (GMTK's) basic template, a dynamic graphical model representation that is more suitable for speech and language systems. While this article concentrates on speech recognition, it should be noted that many of the ideas presented here are also applicable to natural language processing and general time-series analysis.

The possibility of enhancing speech-recognition efficiency by using the supplemented-vocabulary method is studied. The minimum-information-mismatch criterion is proposed for selecting one, two, or, in the general case, several realizations of recognition words to be added to a working vocabulary. Using a particular practical example, it is shown that the positive effect is achieved without substantially enlarging the vocabulary or increasing the computational complexity.

Stochastic correlation model for speech recognition   Total citations: 1, self-citations: 0, citations by others: 1
Ming  J. Smith  F.J. 《Electronics letters》1996,32(11):970-971
A stochastic model, drawn from the upper bound of the joint probability distributions, is suggested for modelling the spectral correlation in speech. Experiments on a speaker independent E-set database show the effectiveness of this new modelling approach.

Jung  H.Y. Kim  D.Y. Un  C.K. 《Electronics letters》1996,32(13):1163-1164
The authors propose a frame decorrelation method to cope with background noise in speech recognition. Since noise is modelled as a stationary perturbation in most cases, it is effective in reducing slow-varying components. One example of using this principle is the highpass scheme. The proposed method has the same property as the highpass scheme. It transforms feature vector sequences into decorrelated sequences and enhances transition regions. Simulation results show that this method is effective for speech with significant noise, and works better than other highpass methods.
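The highpass principle mentioned in the abstract can be sketched with a generic first-order DC-blocking filter applied along time to each feature dimension. This is not the authors' frame decorrelation method, whose details the abstract does not give. The function name and the pole value `alpha` are assumptions for illustration.

```python
def highpass_features(frames, alpha=0.95):
    """Filter each feature trajectory with y[t] = x[t] - x[t-1] + alpha * y[t-1].

    frames: list of feature vectors, one per time frame.
    A constant (stationary) offset in any dimension is removed, while
    frame-to-frame transitions are kept and decay at rate alpha.
    """
    if not frames:
        return []
    dim = len(frames[0])
    out = [[0.0] * dim]  # y[0] = 0: there is no earlier frame to difference against
    for t in range(1, len(frames)):
        prev_x, prev_y = frames[t - 1], out[-1]
        out.append([frames[t][d] - prev_x[d] + alpha * prev_y[d] for d in range(dim)])
    return out

# A step in the first dimension, a constant bias in the second (hypothetical values).
features = [[0.0, 4.0], [0.0, 4.0], [1.0, 4.0], [1.0, 4.0]]
for row in highpass_features(features):
    print(row)
```

In the filtered output the constant second dimension vanishes, while the step in the first dimension appears as a decaying pulse. This is the sense in which a highpass scheme emphasizes transition regions.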

Lipovac  V. 《Electronics letters》1989,25(2):90-92
The satisfactory estimation of speech autocorrelation by means of generalised zero-crossings indicates that they can be used for efficient feature extraction in speech recognition. In addition, high consistency between the Itakura-Saito distances, calculated before and after clipping, allowed for only a moderate degradation of the related recognition performance, which was compensated by including the excitation distortion in the distance measure.
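The abstract does not define its generalised zero-crossings, so the following is only a related textbook sketch: for a zero-mean Gaussian signal, the arcsine law links the autocorrelation of the hard-clipped (sign) signal to the true normalized autocorrelation. The function name and the synthetic AR(1) test signal are assumptions. Note that at lag k the clipped autocorrelation equals 1 minus twice the fraction of sign changes between samples k apart, which is where zero-crossing counts enter.

```python
import math
import random

def clipped_autocorr(x, max_lag):
    """Estimate normalized autocorrelation from the signs of x (arcsine law).

    For a zero-mean Gaussian process, E[sign(x_t) sign(x_{t+k})] equals
    (2/pi) * asin(rho_k), so rho_k = sin(pi/2 * r_clip_k).
    """
    s = [1.0 if v >= 0 else -1.0 for v in x]
    n = len(s)
    rho = []
    for k in range(max_lag + 1):
        r_clip = sum(s[t] * s[t + k] for t in range(n - k)) / (n - k)
        rho.append(math.sin(0.5 * math.pi * r_clip))  # invert the arcsine law
    return rho

# Synthetic AR(1) signal with coefficient a, so the true rho_k is a**k (assumed test data).
rng = random.Random(0)
a, x = 0.6, [0.0]
for _ in range(20000):
    x.append(a * x[-1] + rng.gauss(0, 1))
print(clipped_autocorr(x, 2))
```

The estimates should land near 1, 0.6 and 0.36 up to sampling error. Real speech is only approximately Gaussian, so the relation holds less exactly there.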

A multi-model approach for noisy speech recognition is proposed. This approach comprises an SVD-based preprocessing front-end and a multi-model HMM recognition structure. It can provide a high recognition rate over a large range of SNRs for speech recognition in wide-band additive noise.

A switched-capacitor (SC) preprocessing system (preprocessor) which extracts and emphasizes the local peaks of the spectrum in real time is proposed for speech recognition systems. The main components of the system are a specially designed bandpass filter bank, a low-pass decimation filter bank, two-dimensional local peak extraction (LPE) filters, and an LPE filter selection circuit. Furthermore, an SC cascaded integrator-comb filter design technique is proposed to realize the decimation low-pass filter and the LPE filter. Finally, the system is tested by using two speech recognition systems.

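A survey of research on emotional feature analysis and recognition in speech signals   Total citations: 13, self-citations: 0, citations by others: 13
The analysis and recognition of speech emotion is one of the emerging research topics of recent years. This paper describes the state of speech emotion recognition in China and abroad in recent years, explains the various methods of classifying human emotions, and summarizes the methods for extracting speech feature parameters and the significance of each feature parameter for emotion recognition. On this basis, it reviews research progress and the main recognition modeling methods in the field of emotion recognition in China and abroad, and summarizes the advantages and disadvantages of each modeling method. Finally, it outlines the development trends in speech emotion recognition and offers an outlook.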

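As an analog signal carrying specific information, speech has become an important means of obtaining and spreading information in social life. The purpose of speech signal processing is to extract useful speech information in complex acoustic environments. The effect of environmental interference on the signal during speech transmission cannot be underestimated, so the noise robustness of speech signal processing has become an important research direction. Matlab is used in a wide range of fields. In information processing, its powerful data processing capabilities can convert non-stationary, time-varying speech data into discrete data, which can then be analyzed or processed further. Its signal processing toolbox allows speech signals to be processed and analyzed quickly and effectively, making Matlab a powerful tool for signal processing. Here, Matlab is applied to a speech segment containing environmental noise to perform a Fourier transform, time-domain and frequency-domain analysis, extraction of part of the speech signal, and analysis of the signal.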
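The frequency-domain step can be illustrated with a small sketch. The paper works in Matlab and on a real recording; the Python code below is a stand-in that uses only the standard library, and the sampling rate, the 50 Hz tone, and the noise level are all assumed values chosen to keep the example self-contained.

```python
import cmath
import math
import random

def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform for bins 0..N/2 (direct O(N^2) form)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

fs = 400                 # assumed sampling rate in Hz
rng = random.Random(0)   # fixed seed so the run is repeatable
# One second of a 50 Hz tone buried in Gaussian noise (synthetic stand-in for speech).
signal = [math.sin(2 * math.pi * 50 * t / fs) + 0.3 * rng.gauss(0, 1) for t in range(fs)]

spectrum = dft_magnitude(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * fs / len(signal)   # convert bin index to frequency
print(peak_hz)
```

In the time domain the tone is hard to separate from the noise, but in the spectrum its energy concentrates in a single bin, which is why the peak falls at the tone frequency. For real recordings a fast transform such as Matlab's `fft` would replace the direct sum.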

Two methods for generating training sets for a speech recognition system are studied. The first uses a nondeterministic statistical method to generate a uniform distribution of sentences from a finite state machine (FSM) represented in digraph form. The second method, a deterministic heuristic approach, takes into consideration the importance of word ordering to address the problem of coarticulation effects. The two methods are critically compared. The first algorithm, referred to as MARKOV, converts the FSM into a first-order Markov model. The digraphs are determined, transitive closure computed, transition probabilities are assigned, and stopping criteria established. An efficient algorithm for computing these parameters is described. Statistical tests are conducted to verify performance and demonstrate its utility. A second algorithm for generating training sentences, referred to as BIGRAM, uses heuristics to satisfy three requirements: adequate coverage of basic speech (subword) units; adequate coverage of words in the recognition vocabulary (intraword contextual units); and adequate coverage of word-pair bigrams (interword contextual units).
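The abstract does not spell out how MARKOV assigns its transition probabilities. One way to obtain a uniform distribution over sentences from an acyclic word graph is to weight each arc by the number of sentences that can still be completed through it. The sketch below does that on a hypothetical toy grammar; the grammar, node names, and function names are all assumptions.

```python
import random

# Hypothetical toy grammar: node -> list of (word, next_node); "END" is the final state.
GRAMMAR = {
    "START": [("call", "NAME"), ("dial", "DIGIT")],
    "NAME": [("alice", "END"), ("bob", "END"), ("carol", "END")],
    "DIGIT": [("one", "END"), ("two", "END")],
}

def count_sentences(node, grammar, cache=None):
    """Number of distinct word sequences from node to END (graph must be acyclic)."""
    cache = {} if cache is None else cache
    if node == "END":
        return 1
    if node not in cache:
        cache[node] = sum(count_sentences(nxt, grammar, cache) for _, nxt in grammar[node])
    return cache[node]

def sample_sentence(grammar, rng):
    """First-order Markov walk whose arc weights make every sentence equally likely."""
    node, words = "START", []
    while node != "END":
        arcs = grammar[node]
        weights = [count_sentences(nxt, grammar) for _, nxt in arcs]
        word, node = rng.choices(arcs, weights=weights, k=1)[0]
        words.append(word)
    return " ".join(words)

rng = random.Random(0)
print(count_sentences("START", GRAMMAR))  # 5 sentences in this toy grammar
print(sample_sentence(GRAMMAR, rng))
```

Choosing arcs with equal probability instead would over-sample the "dial" branch, since it leads to fewer sentences. Graphs with loops need a stopping criterion, which this sketch does not cover.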

A dynamic programming processor with parallel and pipeline architecture is described. A 2-μm CMOS technology was applied to the DP processor, which is composed of 127309 transistors on a 7.17×8.62-mm² die and is housed in an 84-pin PLCC (plastic leaded chip carrier) or PGA (pin grid array) package. The clock frequency is 20 MHz, and the instruction cycle time is 100 ns. Precise electrical simulations permitted the safe use of nonstandard logic and area and power reduction. Implementation of a direct access to all internal registers has proven useful for chip test and software development. A system using one DP processor has given very good results on a wide variety of applications and 0.48% error rate on tests with standard NATO tapes. These results are significantly better than those published for other systems on the same tests.
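The abstract does not say which dynamic programming recurrence the chip evaluates. A common one in template-based speech recognition is dynamic time warping (DTW), so the sketch below shows that textbook recurrence as an assumption, not as the processor's actual algorithm. The function name and the scalar sequences are illustrative; a recognizer would compare feature vectors with a vector distance.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences (textbook recurrence)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

Each cell depends only on its three neighbours, so cells on the same anti-diagonal can be computed independently. That regular structure is what makes the recurrence a natural fit for parallel and pipelined hardware.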
