Similar Literature
1.
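To let students log in to a registration system by speaking their student ID and name in Chinese or English, a Chinese-English digit speech login system was designed. Using MFCCs (Mel-frequency cepstral coefficients) as the speech features, a Chinese-English continuous digit speech recognition system was built within the hidden Markov model (HMM) framework on the HTK speech recognition toolkit. The pipeline covers speech signal preprocessing, feature extraction, training of the recognition templates, and finally recognition by the recognizer. Acoustic models were built from Chinese, English, and mixed Chinese-English training and test sets, and a relatively high recognition rate was obtained, which improves the stability and robustness of the multimedia registration system.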

2.
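A speech recognition method based on the hidden Markov model (HMM) and a learning vector quantization (LVQ) neural network is proposed. The method first uses the HMM to generate the best speech state sequence, then applies a function approximation technique to time-normalize that state sequence, and finally classifies it with the LVQ neural network. Theory and experimental results indicate that the hybrid model's recognition rate is clearly higher than that of the HMM alone.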

3.
史媛媛  刘加  刘润生 《电子学报》2002,30(7):959-963
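Although Mandarin digit speech recognition involves only ten digits, different digits share identical or similar initials or finals, so Mandarin digits are highly confusable with one another. It is difficult to reach a very high recognition rate using the conventional hidden Markov model (HMM) as the recognition model. To resolve this confusion and improve Mandarin digit recognition performance, this paper applies linear discriminant analysis at the HMM state level: feature samples that are easily confused between different states are grouped into confusion pattern classes, and linear discriminant analysis is performed on each confusion pattern class. Through the linear discriminant transform, only those feature parameters that effectively separate the confusion class are retained in the transformed feature space. This state-based linear discriminant analysis effectively improves the model's ability to discriminate confusable digits. Experiments show that even with a coarse recognition model having very few states, recognition performance improves substantially; after optimization by the linear discriminant transform, the model reaches an isolated Mandarin digit recognition rate of 99.32%.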

4.
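A hybrid speech recognition model is built by combining the dynamic time-series modeling ability of the hidden Markov model (HMM) with the pattern classification ability of neural networks. Because speech signals are non-stationary, wavelet analysis is used to extract the speech feature vectors. A time normalization method converts all variable-length speech feature vectors into feature vectors of the same dimension, which simplifies the neural network structure. Simulation results show that the hybrid model together with time normalization not only improves the recognition rate but also greatly shortens training time, yielding good recognition performance.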
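The time normalization step in this abstract can be illustrated with a minimal sketch. The abstract does not say which normalization method the authors used; the linear interpolation below is an assumption chosen for simplicity, and the function name `time_normalize` is hypothetical.

```python
def time_normalize(frames, target_len):
    """Resample a variable-length list of feature vectors to target_len frames.

    Assumption: plain linear interpolation along the time axis. The abstract
    does not specify the normalization actually used by the authors.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        # Fractional position of output frame i on the input time axis.
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

Every utterance then yields `target_len` frames of equal width, so the frames can be concatenated into one fixed-size input vector for a neural network.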

5.
吴佳龙  李坤  刘中 《电子科技》2015,28(2):22-25,29
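Speech recognition is widely used in contactless control systems, and isolated-word speech recognition on digital platforms is an important research direction. This paper introduces an endpoint detection algorithm based on the short-time energy and zero-crossing difference, adopts Mel-frequency cepstral coefficient (MFCC) feature extraction and dynamic time warping (DTW) template matching, and simulates the system in Matlab. The simulation results show that the system has good real-time performance and a high recognition rate.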
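Dynamic time warping, the matching method named in this abstract, can be sketched as follows. This is a textbook formulation under stated assumptions (Euclidean frame distance, no path constraints or slope weights), not the authors' Matlab implementation; the names `dtw_distance` and `recognize` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions: Euclidean distance between frames and the basic three-way
    step pattern (insertion, deletion, match) with no global path constraint.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

In an isolated-word system of this kind, `templates` would map each vocabulary word to a reference MFCC sequence, and the input utterance would be the MFCC frames between the detected endpoints.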

6.
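Gaussian mixture models built with a fixed number of mixture components do not match the diversity of speakers' speech feature distributions, which leads to overfitting or underfitting and degrades recognition performance. An adaptive Gaussian mixture model with a variable number of mixture components is proposed and applied to speaker recognition. During training, based on the clustering characteristics of the speaker's feature parameter distribution, absorption-merge and split mechanisms dynamically adjust the number of components to obtain a more accurate fit and improve the recognition rate. Experimental results show that the relative error rate falls by 41.41% with MFCC features and by 22.21% with BFCC (Bilinear Frequency Cepstrum Coefficients) features.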

7.
李战明  苏敏  赵正天  李二超 《电声技术》2007,31(12):44-46,50
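A hybrid model for speech recognition is proposed based on the hidden Markov model (HMM) and an improved probabilistic neural network (PNN). The model first uses the HMM to generate the best speech state sequence, then time-normalizes that sequence, and finally classifies it with the PNN. Algorithms for HMM parameter training and time normalization are given. Experimental results show that this model recognizes better than the HMM alone.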

8.
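A State-Codebook-Based Quasi-Continuous Hidden Markov Model   Cited by: 1 (self-citations: 0; by others: 1)
To address the classical HMM's demand for large amounts of training data and its algorithmic complexity, this paper proposes an improved model, the state-codebook-based quasi-continuous HMM (SCBHMM), which describes the acoustic characteristics of speech signals more effectively when training data are limited. By tying the state transition probabilities to the amount of dynamic spectral change, SCBHMM effectively combines the static and dynamic features of the speech signal. Extensive experiments on the standard speech database USTC94 demonstrate the effectiveness of SCBHMM for Mandarin syllable recognition: it eases the model's training data requirements and greatly reduces the computation needed for training and recognition, while still achieving a high recognition rate.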

9.
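Combining the phonetic characteristics and semantic information of Uyghur, and drawing on a large telephone speech corpus, this work aims to build a continuous phoneme recognition platform for Uyghur and implements a Uyghur continuous phoneme recognition algorithm with the Hidden Markov Model Toolkit (HTK). First, a fairly large telephone speech corpus was recorded and annotated according to specific technical specifications. With the phoneme chosen as the basic unit, an HMM acoustic model was trained for each phoneme, and input speech was then recognized, with results obtained for acoustic models using different numbers of Gaussian mixture components. The recognition rates of 32 phonemes were tabulated and analyzed, laying a foundation for further improving the recognition rate.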

10.
马帅  高岳  何翔宇 《电子质量》2011,(4):17-18,21
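Because the hidden Markov model (HMM) models time-series structure well, it has gradually become the mainstream speech recognition technique. This paper first gives an accessible overview of HMM-based speech recognition, then introduces the three basic problems of HMMs, and finally implements an isolated-word speech recognition system in MATLAB.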
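The three basic HMM problems are conventionally listed as evaluation, decoding, and training. As a hedged illustration of the first one only, the sketch below computes the likelihood of an observation sequence with the forward algorithm for a discrete-output HMM. The function name and the toy parameters in the usage note are assumptions for illustration and are not taken from the paper.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    obs   : list of observation symbol indices
    start : start[s] is the initial probability of state s
    trans : trans[p][s] is the probability of moving from state p to state s
    emit  : emit[s][o] is the probability of emitting symbol o in state s
    """
    if not obs:
        raise ValueError("obs must contain at least one symbol")
    n = len(start)
    # Initialisation: alpha_1(s) = pi_s * b_s(o_1)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Induction: alpha_t(s) = (sum_p alpha_{t-1}(p) * a_ps) * b_s(o_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    # Termination: sum over all final states.
    return sum(alpha)
```

For isolated-word recognition, one such model would be trained per word and the word whose model gives the highest likelihood would be chosen. A practical system would work with log probabilities or scaling to avoid underflow on long utterances, which this sketch omits.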

11.
The authors demonstrate the effectiveness of phonemic hidden Markov models with Gaussian mixture output densities (mixture HMMs) for speaker-dependent large-vocabulary word recognition. Speech recognition experiments show that for almost any reasonable amount of training data, recognizers using mixture HMMs consistently outperform those employing unimodal Gaussian HMMs. With a sufficiently large training set (e.g. more than 2500 words), use of HMMs with 25-component mixture distributions typically reduces recognition errors by about 40%. It is also found that the mixture HMMs outperform a set of unimodal generalized triphone models having the same number of parameters. Previous attempts to employ mixture HMMs for speech recognition proved discouraging because of the high complexity and computational cost in implementing the Baum-Welch training algorithm. It is shown how mixture HMMs can be implemented very simply in unimodal transition-based frameworks by allowing multiple transitions from one state to another.

12.
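Research on a Speech Recognition System Based on a Private Branch Exchange   Cited by: 3 (self-citations: 0; by others: 3)
This paper develops a voice-controlled command switching system for a private branch exchange (PBX). The system performs speaker-independent automatic recognition of continuous spoken commands over a small-to-medium vocabulary. User command sentences were collected and analyzed statistically to generate the corresponding recognition grammar network. The recognizer is trained intensively with composite models built from subword models, and recognition uses a token-passing variant of the Viterbi algorithm to improve performance. The paper compares the effect of different speech feature parameters and of the number of HMM states on telephone speech recognition accuracy. A rejection mechanism for the recognition system was also developed. Without rejection, ...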

13.
A neural network system which combines a self-organizing feature map and a multilayer perceptron for the problem of isolated word speech recognition is presented. A new method combining self-organization learning and K-means clustering is used for the training of the feature map, and an efficient adaptive nearby-search coding method based on the 'locality' of the self-organization is designed. The coding method is shown to save about 50% of the computation without degradation in recognition rate compared to full-search coding. Various experiments for different choices of parameters in the system were conducted on the TI 20-word database, with best recognition rates as high as 99.5% for both speaker-dependent and multispeaker-dependent tests.

14.
The authors present a new type of hidden Markov model (HMM) for vowel-to-consonant (VC) and consonant-to-vowel (CV) transitions based on the locus theory of speech perception. The parameters of the model can be trained automatically using the Baum-Welch algorithm, and the training procedure does not require that instances of all possible CV and VC pairs be present. When incorporated into an isolated word recognizer with a 75,000-word vocabulary, it leads to a modest improvement in recognition rates. The authors give recognition results for the state interpolation HMM and compare them to those obtained by standard context-independent HMMs and generalized triphone models.

15.
张晨燕  孙成立 《电信科学》2006,22(10):60-63
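A speaker-independent isolated-word speech recognition system was implemented on the SEED-DEC5502 DSP embedded development platform. Compared with conventional speaker-dependent systems, it requires no user training and is easy to use. The system uses an improved real-time endpoint detection algorithm based on the rate of change of log-domain speech energy, and performs feature extraction and decoding only on the detected speech segments, which reduces the number of frames to process. The state output probability computation was analyzed and optimized to further reduce the computational load. Experiments show that with a 100-word vocabulary the system reaches a recognition rate of 98% with a recognition time of 1.03 times real time.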

16.
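To address the slow processing, computational complexity, and operational difficulty of traditional helium speech processing techniques, a machine-learning-based helium speech recognition method is proposed. By learning high-dimensional information with a deep network and extracting multiple features, it not only resolves the overfitting problem but also offers a low word error rate (WER) and fast convergence. First, isolated-word and continuous helium speech databases are built and the helium speech data are preprocessed; the extracted speech features are mainly formant features, pitch period features, and FBank (Filter Bank) features. The features are then fed into an acoustic model composed of a deep convolutional neural network (DCNN) and connectionist temporal classification (CTC), which maps speech to pinyin, and finally a Transformer language model produces the Chinese character output. Compared with a recognition model using only FBank features, the model using formant, pitch period, and FBank features lowers the WER by 7.91% for isolated helium speech words and by 14.95% for continuous helium speech. The best WER is 1.53% for the isolated-word model and 36.89% for the continuous model. The results indicate that the proposed method can recognize helium speech effectively.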
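The WER figures quoted in this abstract follow the usual definition: the minimum number of substitutions, deletions, and insertions needed to turn the recognized sequence into the reference, divided by the reference length. The sketch below computes it under the assumption that both strings are already split into whitespace-separated tokens. For Chinese output the tokens would typically be individual characters, and the paper's exact scoring protocol is not stated in the abstract. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumption: tokens are separated by whitespace in both strings.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference tokens processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.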

17.
It is well known that a strong relationship exists between human voices and the movement of articulatory facial muscles. In this paper, we utilize this knowledge to implement an automatic speech recognition scheme which uses solely surface electromyogram (EMG) signals. The sequence of EMG signals for each word is modelled by a hidden Markov model (HMM) framework. The main objective of the work involves building a model for state observation density when multichannel observation sequences are given. The proposed model reflects the dependencies between each of the EMG signals, which are described by introducing a global control variable. We also develop an efficient model training method, based on a maximum likelihood criterion. In a preliminary study, 60 isolated words were used as recognition variables. EMG signals were acquired from three articulatory facial muscles. The findings indicate that such a system may have the capacity to recognize speech signals with an accuracy of up to 87.07%, which is superior to the independent probabilistic model.

18.
A multi-HMM speaker-independent isolated word recognition system is described. In this system, three vector quantisation methods, the LBG algorithm, the EM algorithm, and a new MGC algorithm, are used for the classification of the speech space. These quantisations of the speech space are then used to produce three HMMs for each word in the vocabulary. In the recognition step, the Viterbi algorithm is used in the three subrecognisers. The log probabilities of the observation sequences matching the models are multiplied by the weights determined by the recognition accuracies of individual subrecognisers and summed to give the log probability that the utterance is of a particular word in the vocabulary. This multi-HMM system results in a reduction of about 50% in the error rate in comparison with the single model system.
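The score combination described in this abstract, where each subrecogniser's log probability is multiplied by an accuracy-derived weight and the products are summed, can be sketched as below. The function name, the dictionary layout, and the numbers in the usage note are illustrative assumptions; the abstract does not say how the weights are derived from the accuracies.

```python
def combine_subrecognisers(log_probs, weights):
    """Fuse per-word log probabilities from several subrecognisers.

    log_probs : one dict per subrecogniser, mapping word -> Viterbi log probability
    weights   : one weight per subrecogniser (assumed here to be given directly)
    Returns the best-scoring word and the dict of combined scores.
    """
    if len(log_probs) != len(weights):
        raise ValueError("need exactly one weight per subrecogniser")
    combined = {
        word: sum(w * lp[word] for w, lp in zip(weights, log_probs))
        for word in log_probs[0]
    }
    best = max(combined, key=combined.get)
    return best, combined
```

For example, with scores `[{"one": -10.0, "two": -12.0}, {"one": -11.0, "two": -9.0}, {"one": -10.5, "two": -13.0}]` and weights `[0.9, 0.8, 0.95]`, the second subrecogniser alone prefers "two", but the weighted sum selects "one".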

19.
Two methods for generating training sets for a speech recognition system are studied. The first uses a nondeterministic statistical method to generate a uniform distribution of sentences from a finite state machine (FSM) represented in digraph form. The second method, a deterministic heuristic approach, takes into consideration the importance of word ordering to address the problem of coarticulation effects. The two methods are critically compared. The first algorithm, referred to as MARKOV, converts the FSM into a first-order Markov model. The digraphs are determined, transitive closure computed, transition probabilities are assigned, and stopping criteria established. An efficient algorithm for computing these parameters is described. Statistical tests are conducted to verify performance and demonstrate its utility. A second algorithm for generating training sentences, referred to as BIGRAM, uses heuristics to satisfy three requirements: adequate coverage of basic speech (subword) units; adequate coverage of words in the recognition vocabulary (intraword contextual units); and adequate coverage of word-pair bigrams (interword contextual units).

20.
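An SDTS-Based HMM Training Algorithm   Cited by: 7 (self-citations: 0; by others: 7)
Training the HMMs of a speech recognition system with the conventional Baum-Welch (BW) algorithm requires a large amount of speech data. Assuming that the subspace tying structure (SDTS) of the acoustic model system is known, this paper proposes a new training algorithm that effectively reduces the system's training data requirements. Theoretical analysis and simulation show that, compared with the conventional BW algorithm, the new training algorithm (IBW) can compress the model parameters by a factor of 15 and thus greatly reduce the training data needed. Although the new algorithm relies on prior knowledge of the system, it still shows many advantages.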
