1.

Wang Xinmin, Yao Tianren 《Computer Engineering and Applications》 2004, 40(15): 79-81

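Hidden Markov models based on diagonal-covariance Gaussian distributions (HMM-DG) are widely used in modern large-vocabulary continuous speech recognition systems, but HMM-DG is weak at modeling intra-frame feature correlation. This paper combines factor analysis with the Gaussian-mixture modeling of HMM-DG and proposes a flexible framework for modeling intra-frame feature correlation in HMMs: the hidden Markov model based on factor analysis (HMM-FA). The training algorithm for HMM-FA is derived. Simulation experiments show that, under identical conditions, HMM-FA outperforms HMM-DG.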
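The abstract does not give the model equations. Under the usual factor-analysis assumption (an assumption here, not taken from the paper), each mixture component's covariance is Σ = ΛΛᵀ + Ψ, with a d×k loading matrix Λ (k much smaller than d) and a diagonal Ψ. The sketch below, with hypothetical function names, builds that covariance and evaluates a Gaussian log-density with it; setting Λ = 0 recovers the diagonal HMM-DG case.

```python
import numpy as np

def fa_covariance(loading, psi_diag):
    """Covariance of a factor-analysed Gaussian: Lambda @ Lambda.T + diag(psi)."""
    loading = np.asarray(loading, dtype=float)  # shape (d, k), k << d factors
    return loading @ loading.T + np.diag(np.asarray(psi_diag, dtype=float))

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian with a full covariance matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + maha)

# One shared factor correlates the two feature dimensions.
cov = fa_covariance([[1.0], [1.0]], [0.5, 0.5])
print(cov)  # off-diagonal entries are 1.0, which a diagonal model cannot express
print(gaussian_logpdf([0.2, -0.1], [0.0, 0.0], cov))
```

The appeal of this parameterization is its size: a diagonal Gaussian needs d variances, a full covariance needs d(d+1)/2 entries, and the factor-analysed form needs only dk + d, so correlation can be modeled without the full-covariance cost.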

2.

A voice conversion method combining hidden Markov models and Gaussian mixture models

Yue Zhenjun, Zou Xiang, Wang Hao 《Journal of Data Acquisition and Processing》 2009, 24(3)

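Exploiting the strong speech-representation capability of hidden Markov models and the good voice conversion performance of Gaussian mixture models, this paper proposes a method that combines HMMs and GMMs to convert line spectral frequencies, presents the theoretical derivation and the algorithm flow, and uses Gaussian modeling to convert prosodic features. Simulation experiments on two recorded speech passages show that the converted speech has good naturalness and intelligibility; in ABX tests, speech produced by the proposed algorithm was judged closer to the target speaker 90.2% of the time.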
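The abstract does not state the conversion function. A minimal sketch, assuming the widely used joint-GMM mapping for spectral features (Stylianou/Kain style, not confirmed as this paper's exact method, and without the HMM part): each component contributes a linear regression from source to target, weighted by the component's posterior given the source vector. All parameter names are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a full-covariance Gaussian."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map a source feature vector x (e.g. LSFs) to the target speaker's space.

    F(x) = sum_i p(i|x) * (mu_y[i] + cov_yx[i] @ inv(cov_xx[i]) @ (x - mu_x[i]))
    """
    x = np.asarray(x, dtype=float)
    mu_x, mu_y = np.asarray(mu_x, float), np.asarray(mu_y, float)
    cov_xx, cov_yx = np.asarray(cov_xx, float), np.asarray(cov_yx, float)
    # Posterior of each component given x, normalised in the log domain.
    logs = np.array([np.log(w) + log_gauss(x, m, c)
                     for w, m, c in zip(weights, mu_x, cov_xx)])
    post = np.exp(logs - logs.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros(mu_y.shape[1])
    for p, mx, my, cxx, cyx in zip(post, mu_x, mu_y, cov_xx, cov_yx):
        y += p * (my + cyx @ np.linalg.solve(cxx, x - mx))
    return y
```

With a single component the mapping reduces to ordinary linear regression; the mixture lets the regression change smoothly across regions of the source feature space.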

3.

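An end-to-end Chinese speech recognition system based on bidirectional long short-term memory, connectionist temporal classification and weighted finite-state transducers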

Yao Yu, RYAD Chellali 《Journal of Computer Applications》 2018, 38(9): 2495-2499

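To address the unrealistic conditional assumptions that hidden Markov models (HMMs) impose in speech recognition, and building on the sequence-modeling ability of recurrent neural networks, this paper proposes an acoustic model based on a bidirectional long short-term memory network and trains it with the connectionist temporal classification (CTC) criterion, yielding an end-to-end Chinese speech recognition system that does not depend on HMMs. It also designs a decoding method based on weighted finite-state transducers (WFST), which solves the difficulty of integrating the pronunciation lexicon and the language model into decoding. Compared with a conventional GMM-HMM system and a hybrid DNN-HMM system, the end-to-end system both clearly lowers the recognition error rate and substantially speeds up decoding, indicating that the acoustic model improves model discrimination and simplifies the system structure.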
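CTC trains the network to emit, per frame, either a label or a special blank, and maps a frame-level path to a label sequence by merging repeats and then deleting blanks. The sketch below shows that many-to-one map plus simple best-path (greedy) decoding; it is a baseline for illustration, not the paper's WFST decoder, which instead searches with the lexicon and language model composed in. Function names and the blank index 0 are assumptions.

```python
def ctc_collapse(path, blank=0):
    """Apply the CTC many-to-one map: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path decoding: take the top-scoring label per frame, then collapse."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    return ctc_collapse(best, blank)

# Frame-wise path 1 1 _ 1 2 2 _ collapses to [1, 1, 2]; the blank keeps the two 1s apart.
print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```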

4.

Research on a prosody-dependent Mandarin speech recognition system

Ni Chongjia, Liu Wenju, Xu Bo 《Application Research of Computers》 2011, 28(8): 2941-2945

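First, a system framework incorporating prosodic information is presented. Then, taking the characteristics of Mandarin into account, the paper resolves the choice of modeling units and the model training for a prosody-dependent speech recognition system, and builds such a system within the multi-space probability distribution hidden Markov model (MSD-HMM) framework. Finally, speech recognition experiments verify the effectiveness of the method: on the "863" test set it achieves a tonal-syllable recognition accuracy of 76.18%.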
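MSD-HMMs are typically used for F0, which exists only on voiced frames. Assuming that use here (the abstract does not say which prosodic streams are modeled), each state splits its probability mass between a one-dimensional space for voiced log F0 and a zero-dimensional space for unvoiced frames. A minimal sketch of one state's output probability, with hypothetical names:

```python
import math

def msd_f0_likelihood(log_f0, w_voiced, mean, var):
    """Output probability of one MSD-HMM state for an F0 observation.

    log_f0 is None for an unvoiced frame (zero-dimensional space) or a
    float (log F0) for a voiced frame (one-dimensional Gaussian space).
    """
    if log_f0 is None:
        return 1.0 - w_voiced  # weight of the unvoiced space; no density needed
    density = math.exp(-0.5 * (log_f0 - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return w_voiced * density
```

This lets a single HMM stream score sequences in which F0 appears and disappears, instead of inventing F0 values for unvoiced frames.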

5.

Automatic audio classification based on hidden Markov models (cited 27 times: 0 self-citations, 27 by others)

Lu Jian, Chen Yisong, Sun Zhengxing, Zhang Fuyan 《Journal of Software》 2002, 13(8): 1593-1597

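Automatic audio classification, especially the separation of speech and music, is an important means of extracting the structure and semantic content of audio, with significant applications in content-based audio retrieval, video retrieval and summarization, and spoken-document retrieval. Because hidden Markov models capture the temporal statistics of audio signals well, an HMM-based audio classification algorithm is proposed for classifying speech, music and mixtures of the two. Experimental results show that HMM-based classification performs well, with a best classification accuracy of 90.28%.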
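The abstract does not give features or topology. A minimal sketch, assuming one pre-trained HMM per class with discrete (e.g. vector-quantized) observations: score the clip under each model with the forward algorithm in the log domain and pick the class with the highest likelihood. Model parameters and class names below are hypothetical.

```python
import math

def _log(p):
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(obs, start, trans, emit):
    """log P(obs | HMM) via the forward algorithm (discrete emissions)."""
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
                 + _log(emit[j][o]) for j in range(n)]
    return logsumexp(alpha)

def classify(obs, models):
    """models maps a class name to (start, trans, emit); return the best class."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))

models = {
    "speech": ([1.0], [[1.0]], [[0.9, 0.1]]),  # toy one-state models
    "music": ([1.0], [[1.0]], [[0.1, 0.9]]),
}
print(classify([0, 0, 1, 0], models))  # speech
```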

6.

Research on key technologies of speech recognition (cited 11 times: 0 self-citations, 11 by others)

Xi Xiaojing, Lin Kunhui, Zhou Changle, Cai Jun 《Computer Engineering and Applications》 2006, 42(11): 66-69, 115

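Acoustic modeling with hidden Markov models (HMMs) is one of the main reasons for the breakthroughs in large-vocabulary continuous speech recognition, but some unrealistic modeling assumptions inherent in the HMM, together with its non-discriminative training algorithms, are becoming bottlenecks for the further development of speech recognition systems. Neural networks store long-term memory and knowledge in their weights, but retain instantaneous responses to input patterns poorly. A hybrid HMM/ANN model is used to revise some of the HMM's questionable modeling assumptions and training algorithms: the hybrid replaces the Gaussian mixture (GM) with a nonparametric neural-network probability model to compute the observation probabilities required by the HMM states. In addition, the structure of the neural network is optimized, with good results.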
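The abstract says the network replaces the Gaussian mixture in computing state observation probabilities, but not how. A common recipe in hybrid HMM/ANN systems (an assumption here, following the scaled-likelihood approach of Bourlard and Morgan) is to divide the network's state posteriors P(q|x) by the state priors P(q), which by Bayes' rule gives p(x|q)/p(x). A sketch with hypothetical names:

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """log p(x|q) - log p(x) = log P(q|x) - log P(q) for each HMM state q.

    posteriors: network outputs for one frame (softmax over states).
    priors: relative state frequencies, e.g. from a forced alignment.
    floor: avoids log(0) for states the network rules out entirely.
    """
    return [math.log(max(p, floor)) - math.log(prior)
            for p, prior in zip(posteriors, priors)]

print(scaled_log_likelihoods([0.5, 0.5], [0.25, 0.75]))  # [log 2, log(2/3)]
```

Because p(x) is the same for every state at a given frame, these scores can replace GMM log-likelihoods in Viterbi decoding without changing which path wins.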

7.

Construction and simulation of a speech recognition model based on wavelet analysis and HMM

Zhang Li, Wang Fuzhong, Zhang Tao 《Computer and Modernization》 2007, (9): 72-75

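A hybrid speech recognition model is built by combining the strong temporal modeling ability of hidden Markov models (HMMs) with the wavelet transform's ability to analyze signals at multiple scales and effectively extract local signal information. The recognition process accounts for the non-stationarity of speech: classification information is obtained through parallel recognition paths, and a decision is made by the hybrid model's recognition algorithm, which reduces the system's dependence on the environment and improves its adaptability. Simulation results show that the hybrid model outperforms either an HMM or a wavelet model alone, improving both overall recognition speed and recognition rate.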

8.

A speech recognition system based on HMM and a genetic neural network (cited once: 0 self-citations, 1 by others)

Bao Yaping, Zheng Jun, Wu Xiaoguang 《Computer Engineering & Science》 2011, 33(4): 139

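This paper proposes a hybrid speech recognition method combining hidden Markov models (HMMs) with a back-propagation network optimized by a genetic algorithm (GA-BP). HMMs first model the temporal structure of the speech signal and score the utterance's output probability under each HMM; these probability scores are fed to the optimized back-propagation network to obtain classification information, and a final decision is made by the hybrid model's recognition algorithm. The model was trained and tested on existing sample data in Matlab. Simulation results show that, because the design exploits both the strong temporal modeling of HMMs and the strong classification ability of the GA-BP network, the hybrid model is more noise-robust than a pure HMM, avoids the local-optimum problem of neural networks, greatly increases recognition speed, and clearly improves recognition performance.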

9.

Design and simulation of a hybrid speech recognition model

Song Zhizhang, Ma Li, Liu Shengfei, Li Qinan 《Computer Simulation》 2012, 29(5): 152-155

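This paper studies speech recognition accuracy. Speech is a non-stationary signal that carries a great deal of noise; most current recognition algorithms rest on linear theory, cannot correctly capture the nonlinear variation of speech, and therefore achieve low accuracy. A hidden Markov model (HMM) is combined with a support vector machine (SVM) to form a noise-robust hybrid model (HMM-SVM). The HMM models the temporal structure of the speech signal and produces the output probabilities of the utterance to be recognized; these probabilities serve as SVM inputs for learning to obtain classification information, and the final decision is made from the HMM-SVM result. Simulation results show that HMM-SVM raises recognition accuracy and clearly improves system performance, especially at low signal-to-noise ratios.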

10.

Noise-robust speech recognition using an improved HMM and a wavelet neural network

Xiao Yong, Qin Aina 《Computer Engineering and Applications》 2010, 46(22): 162-164

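Because speech is dynamic, frames overlap, and noise is present, the feature coefficients computed as MFCCs cannot fully capture the information in the speech signal. This paper proposes a noise-robust speech recognition method based on a hybrid of a hidden Markov model (HMM) and a wavelet neural network (WNN): the WNN is trained on the MFCC coefficients to produce new MFCC coefficients. Experimental results show that in noisy environments the hybrid model is more noise-robust than a pure HMM and clearly improves recognition performance.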

11.

Acoustic Factor Analysis for Streamed Hidden Markov Modeling

Jen-Tzung Chien, Chuan-Wei Ting 《IEEE Transactions on Audio, Speech, and Language Processing》 2009, 17(7): 1279-1291

This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to explore the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is inherent in common factors. Those features corresponding to the same factor are generated by the identical HMM state. Accordingly, the multiple Markov chains are adopted to characterize the variation trends in different dimensions of cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of standard HMM topology, namely, that all features of a speech frame perform the same state emission. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM) where the streaming was empirically determined. To reduce the number of factor loading matrices in FA, we evaluated the similarity between individual matrices to find the optimal solution to parameter clustering of FA models. A new decoding algorithm was presented to perform FASHMM speech recognition. FASHMM carries out the streamed Markov chains for a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method reduced the recognition error rates significantly when compared with the standard HMM and SFHMM methods.

12.

Continuous speech recognition using linear dynamic models

Tao Ma, Sundararajan Srinivasan, Georgios Lazarou, Joseph Picone 《International Journal of Speech Technology》 2014, 17(1): 11-16

Hidden Markov models (HMMs) with Gaussian mixture distributions rely on an assumption that speech features are temporally uncorrelated, and often assume a diagonal covariance matrix where correlations between feature vectors for adjacent frames are ignored. A Linear Dynamic Model (LDM) is a Markovian state-space model that also relies on hidden state modeling, but explicitly models the evolution of these hidden states using an autoregressive process. An LDM is capable of modeling higher-order statistics and can exploit correlations of features in an efficient and parsimonious manner. In this paper, we present a hybrid LDM/HMM decoder architecture that postprocesses segmentations derived from the first pass of an HMM-based recognizer. This smoothed trajectory model is complementary to existing HMM systems. An Expectation-Maximization (EM) approach for parameter estimation is presented. We demonstrate a 13% relative WER reduction on the Aurora-4 clean evaluation set, and a 13% relative WER reduction on the babble noise condition.
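An LDM of the kind described is usually written x_t = A x_{t-1} + w_t, y_t = H x_t + v_t, with Gaussian noise w_t ~ N(0, Q), v_t ~ N(0, R) and initial state x_1 ~ N(m0, P0). The symbols are assumptions here; the abstract does not give the paper's parameterization. Scoring a segment from the first-pass HMM segmentation then amounts to the Kalman filter's innovation log-likelihood, sketched below (EM training is not shown).

```python
import numpy as np

def ldm_loglik(ys, A, H, Q, R, m0, P0):
    """Log-likelihood of an observation segment under a linear dynamic model."""
    A, H, Q, R = (np.asarray(M, dtype=float) for M in (A, H, Q, R))
    m, P = np.asarray(m0, dtype=float), np.asarray(P0, dtype=float)
    total = 0.0
    for t, y in enumerate(ys):
        if t > 0:                      # predict: x_t = A x_{t-1} + w_t
            m = A @ m
            P = A @ P @ A.T + Q
        y = np.asarray(y, dtype=float)
        S = H @ P @ H.T + R            # innovation covariance
        e = y - H @ m                  # innovation (prediction error)
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (y.size * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))
        K = P @ H.T @ np.linalg.inv(S) # Kalman gain
        m = m + K @ e                  # measurement update
        P = P - K @ S @ K.T
    return total
```

Unlike a Gaussian-mixture HMM state, the score of each frame here depends on the frames before it through the filtered state, which is how the LDM captures correlation between adjacent frames.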

13.

Review of several stochastic speech unit models

Yann Guédon 《Computer Speech and Language》 1992, 6(4)

Hidden Markov models are commonly used for speech unit modelling. This type of model is composed of a non-observable or "hidden" process, representing the temporal structure of the speech unit, and an observation process linking the hidden process with the acoustic parameters extracted from the speech signal. Different types of hidden processes (Markov chain, semi-Markov chain, "expanded-state" Markov chain) as well as different types of observation processes (discrete, continuous, semi-continuous, multiple processes) are reviewed, showing their relationships. The maximum likelihood estimation of two-stage stochastic process parameters is presented in an a posteriori probability formalism. An interpretation of the expectation-maximization algorithm is proposed, and the practical learning algorithms for hidden Markov models and hidden semi-Markov models are compared in terms of computation structure, probabilistic justification and complexity. This presentation is illustrated by experiments on a multi-speaker 130 isolated word recognition system. The implementation techniques are detailed, and the different combinations of state occupancy modelling techniques and observation modelling techniques are studied from a practical point of view.
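One practical difference between the Markov-chain and semi-Markov-chain hidden processes is state occupancy. A state with self-loop probability a implies a geometric duration distribution P(d) = a^(d-1)(1 - a), whose most likely duration is always one frame; a semi-Markov state replaces it with an explicit duration distribution. The short sketch below (function names hypothetical) computes the implied geometric distribution and its mean, 1/(1 - a).

```python
def hmm_duration_pmf(self_loop, max_d):
    """P(d) = a^(d-1) * (1 - a): state duration implied by a self-loop probability a."""
    return [self_loop ** (d - 1) * (1 - self_loop) for d in range(1, max_d + 1)]

def mean_duration(pmf):
    """Mean of a duration distribution given as P(1), P(2), ..."""
    return sum(d * p for d, p in enumerate(pmf, start=1))

print(hmm_duration_pmf(0.5, 3))                   # [0.5, 0.25, 0.125]
print(mean_duration(hmm_duration_pmf(0.8, 500)))  # close to 1 / (1 - 0.8) = 5
```

The geometric distribution is monotonically decreasing, while observed durations of speech units are usually peaked away from one frame; an explicit pmf in a hidden semi-Markov model can represent that peak at the cost of a more expensive forward-backward pass.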

14.

Robust Sequential Data Modeling Using an Outlier Tolerant Hidden Markov Model

Sotirios P. Chatzis, Dimitrios I. Kosmopoulos, Theodora A. Varvarigou 《IEEE Transactions on Pattern Analysis and Machine Intelligence》 2009, 31(9): 1657-1669

Hidden Markov (chain) models using finite Gaussian mixture models as their hidden state distributions have been successfully applied in sequential data modeling and classification applications. Nevertheless, Gaussian mixture models are well known to be highly intolerant to the presence of untypical data within the fitting data sets used for their estimation. Finite Student's t-mixture models have recently emerged as a heavier-tailed, robust alternative to Gaussian mixture models, overcoming these hurdles. To exploit these merits of Student's t-mixture models in the context of a sequential data modeling setting, we introduce, in this paper, a novel hidden Markov model where the hidden state distributions are considered to be finite mixtures of multivariate Student's t-densities. We derive an algorithm for the model parameters estimation under a maximum likelihood framework, assuming full, diagonal, and factor-analyzed covariance matrices. The advantages of the proposed model over conventional approaches are experimentally demonstrated through a series of sequential data modeling applications.
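The robustness claim comes from the tail of the emission density. The multivariate Student's t log-density with ν degrees of freedom, location μ and scale matrix Σ is sketched below (the full-covariance case only; the paper's EM updates and its diagonal and factor-analyzed variants are not shown, and the function name is hypothetical).

```python
import math
import numpy as np

def mvt_logpdf(x, mu, cov, nu):
    """Log-density of a multivariate Student's t distribution."""
    x, mu, cov = (np.asarray(a, dtype=float) for a in (x, mu, cov))
    d = x.shape[0]
    diff = x - mu
    maha = float(diff @ np.linalg.solve(cov, diff))  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            - 0.5 * d * math.log(nu * math.pi) - 0.5 * logdet
            - 0.5 * (nu + d) * math.log1p(maha / nu))

# An outlier 10 standard deviations away: the Gaussian log-density drops by 50,
# the t log-density (nu = 3) only logarithmically in the distance.
gauss = -0.5 * math.log(2 * math.pi) - 0.5 * 10.0 ** 2
print(gauss, mvt_logpdf([10.0], [0.0], [[1.0]], 3.0))
```

As ν grows the t density approaches the Gaussian, so ν effectively controls how much an untypical frame is allowed to dominate parameter estimation.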

15.

Speech driven photo realistic facial animation based on an articulatory DBN model and AAM features

Dongmei Jiang, Yong Zhao, Hichem Sahli, Yanning Zhang 《Multimedia Tools and Applications》 2014, 73(1): 397-415

This paper presents a photo-realistic facial animation synthesis approach based on an audio-visual articulatory dynamic Bayesian network model (AF_AVDBN), in which the maximum asynchronies between the articulatory features, such as lips, tongue and glottis/velum, can be controlled. Perceptual Linear Prediction (PLP) features from audio speech, as well as active appearance model (AAM) features from face images of an audio-visual continuous speech database, are adopted to train the AF_AVDBN model parameters. Based on the trained model, given an input audio speech, the optimal AAM visual features are estimated via a maximum likelihood estimation (MLE) criterion, which are then used to construct face images for the animation. In our experiments, facial animations are synthesized for 20 continuous audio speech sentences, using the proposed AF_AVDBN model, as well as the state-of-the-art methods, namely the audio-visual state synchronous DBN model (SS_DBN) implementing a multi-stream Hidden Markov Model, and the state asynchronous DBN model (SA_DBN). Objective evaluations on the learned AAM features show that much more accurate visual features can be learned from the AF_AVDBN model. Subjective evaluations show that the synthesized facial animations using AF_AVDBN are better than those using the state-based SA_DBN and SS_DBN models, in the overall naturalness and matching accuracy of the mouth movements to the speech content.

16.

Denoising and recognition using hidden Markov models with observation distributions modeled by hidden Markov trees

Diego H. Milone, Leandro E. Di Persia, et al. 《Pattern Recognition》 2010, 43(4): 1577-1589

Hidden Markov models have been found very useful for a wide range of applications in machine learning and pattern recognition. The wavelet transform has emerged as a new tool for signal and image analysis. Learning models for wavelet coefficients have been mainly based on fixed-length sequences, but real applications often require modeling variable-length, very long or real-time sequences. In this paper, we propose a new learning architecture for sequences analyzed on a short-term basis, but not assuming stationarity within each frame. Long-term dependencies will be modeled with a hidden Markov model which, in each internal state, will deal with the local dynamics in the wavelet domain, using a hidden Markov tree. The training algorithms for all the parameters in the composite model are developed using the expectation-maximization framework. This novel learning architecture could be useful for a wide range of applications. We detail two experiments with artificial and real data: model-based denoising and speech recognition. Denoising results indicate that the proposed model and learning algorithm are more effective than previous approaches based on isolated hidden Markov trees. In the case of the 'Doppler' benchmark sequence, with 1024 samples and additive white noise, the new method reduced the mean squared error from 1.0 to 0.0842. The proposed methods for feature extraction, modeling and learning increased the phoneme recognition rates by 28.13%, with better convergence than models based on Gaussian mixtures.

17.

The Hierarchical Hidden Markov Model: Analysis and Applications (cited 20 times: 0 self-citations, 20 by others)

Shai Fine, Yoram Singer, Naftali Tishby 《Machine Learning》 1998, 32(1): 41-62

We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text. In these models different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.

18.

A context-based second-order hidden Markov model

Liu Jiebin, Song Maoqiang, Zhao Fang, Yang Zhiyu 《Computer Engineering》 2010, 36(10): 231-232

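To capture the influence of context on the part of speech of the current word, a context-based second-order hidden Markov model is proposed as an extension of the conventional HMM and applied to Chinese part-of-speech tagging. To deal with the data sparseness that arises because the extended model has too little training data per parameter, an improved smoothing algorithm based on exponential linear interpolation is given to smooth the parameters effectively. Experiments show that the context-based second-order HMM achieves higher tagging accuracy and disambiguation rate than the conventional HMM.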
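The abstract does not give the form of its "exponential linear interpolation". For orientation, the sketch below shows plain linear interpolation of second-order tag transitions (the familiar Jelinek-Mercer/TnT-style scheme, swapped in here, not the paper's variant): P(t3 | t1, t2) = λ1·P(t3) + λ2·P(t3 | t2) + λ3·P(t3 | t1, t2). The λ values and function names are hypothetical.

```python
from collections import Counter

def train_counts(tag_sequences):
    """Unigram, bigram and trigram tag counts from tagged sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tags in tag_sequences:
        for i, t in enumerate(tags):
            uni[t] += 1
            if i >= 1:
                bi[(tags[i - 1], t)] += 1
            if i >= 2:
                tri[(tags[i - 2], tags[i - 1], t)] += 1
    return uni, bi, tri

def interpolated_prob(t1, t2, t3, uni, bi, tri, lambdas=(0.1, 0.3, 0.6)):
    """Linearly interpolated P(t3 | t1, t2); lambdas must sum to 1.

    In this simple sketch an unseen history contributes 0, so its weight is lost.
    """
    l1, l2, l3 = lambdas
    p1 = uni[t3] / sum(uni.values())
    h2 = sum(c for (a, _), c in bi.items() if a == t2)        # count of t2 as history
    p2 = bi[(t2, t3)] / h2 if h2 else 0.0
    h3 = sum(c for (a, b, _), c in tri.items() if (a, b) == (t1, t2))
    p3 = tri[(t1, t2, t3)] / h3 if h3 else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3

uni, bi, tri = train_counts([["N", "V", "N"], ["N", "V", "ADJ"]])
print(interpolated_prob("N", "V", "N", uni, bi, tri))  # 0.05 + 0.15 + 0.3 = 0.5
```

The unigram and bigram terms keep a trigram that never occurred in training from receiving zero probability, which is the data-sparseness problem the abstract describes.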

19.

Research on applications of HMMs in natural language processing

Han Pu, Jiang Jie 《Microcomputer Development》 2010, (2): 245-248, 252

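The hidden Markov model (HMM) is a powerful statistical machine learning technique that has been applied successfully to continuous speech recognition and online handwriting recognition, and is also widely used in bioinformatics. Owing to its strong learning ability, it has gradually been adopted in natural language processing. This paper analyzes the key issues in applying HMMs to part-of-speech tagging, named entity recognition and information extraction, focusing on the main points and difficulties of using HMMs for information extraction, with the aim of helping more researchers understand HMMs. Finally, it discusses the shortcomings of HMMs in practice and research on improving them.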
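In part-of-speech tagging the tags are the hidden states and the words are the observations, and tagging a sentence means finding the most probable tag sequence with the Viterbi algorithm. A minimal first-order sketch with toy, hypothetical probabilities; the fixed floor `unk` for unseen words is a placeholder assumption, not a recommended smoothing method.

```python
import math

def viterbi(words, tags, start, trans, emit, unk=1e-6):
    """Most probable tag sequence for `words` under a first-order HMM."""
    def lg(p):
        return math.log(p) if p > 0 else -math.inf
    # V[i][t]: best log-probability of any tag path ending in tag t at word i.
    V = [{t: lg(start.get(t, 0)) + lg(emit[t].get(words[0], unk)) for t in tags}]
    back = [{}]
    for w in words[1:]:
        col, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: V[-1][s] + lg(trans[s].get(t, 0)))
            col[t] = V[-1][prev] + lg(trans[prev].get(t, 0)) + lg(emit[t].get(w, unk))
            ptr[t] = prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: V[-1][t])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return path[::-1]

tags = ["N", "V"]
start = {"N": 0.8, "V": 0.2}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.6, "swim": 0.4}, "V": {"fish": 0.5, "swim": 0.5}}
print(viterbi(["fish", "swim"], tags, start, trans, emit))  # ['N', 'V']
```

Named entity recognition and information extraction reuse the same decoder; only the state set changes (entity or field labels instead of parts of speech).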