1.

Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences

《Computer Speech and Language》2007,21(1):153-173

In the present paper, a trajectory model, derived from a hidden Markov model (HMM) by imposing explicit relationships between static and dynamic feature vector sequences, is developed and evaluated. The derived model, named a trajectory HMM, can alleviate two limitations of the standard HMM, namely (i) piece-wise constant statistics within a state and (ii) the conditional independence assumption of state output probabilities, without increasing the number of model parameters. A Viterbi-type training algorithm based on the maximum likelihood criterion is also derived. The performance of the trajectory HMM was evaluated in both speech recognition and synthesis. In a speaker-dependent continuous speech recognition experiment, the trajectory HMM achieved an error reduction over the corresponding standard HMM. Subjective listening test results showed that the introduction of the trajectory HMM improved the naturalness of synthetic speech.
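The explicit relationship between static and dynamic features can be illustrated with a small sketch. It assumes a first-order delta window of (-0.5, 0, 0.5) with the edge frames replicated; the abstract does not specify the window, and the function name `append_delta` is hypothetical.

```python
def append_delta(static):
    """Return [(c_t, delta_c_t)] for a 1-D static feature sequence.

    Assumed delta window: delta_c_t = 0.5 * (c_{t+1} - c_{t-1}),
    with the first and last frames replicated at the edges.
    """
    T = len(static)
    out = []
    for t in range(T):
        prev = static[max(t - 1, 0)]
        nxt = static[min(t + 1, T - 1)]
        out.append((static[t], 0.5 * (nxt - prev)))
    return out
```

Under this assumption the dynamic features are a fixed linear function of the static ones, so the static sequence alone determines the full observation sequence.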

2.

Combining diverse on-line and off-line systems for handwritten text line recognition

Marcus Liwicki, Horst Bunke 《Pattern recognition》2009,42(12):3254-3263

In this paper we present a multiple classifier system (MCS) for on-line handwriting recognition. The MCS combines several individual recognition systems based on hidden Markov models (HMMs) and bidirectional long short-term memory networks (BLSTM). Besides using two different recognition architectures (HMM and BLSTM), we use various feature sets based on on-line and off-line features to obtain diverse recognizers. Furthermore, we generate a number of different neural network recognizers by changing the initialization parameters. To combine the word sequences output by the recognizers, we incrementally align these sequences using the recognizer output voting error reduction framework (ROVER). For deriving the final decision, different voting strategies are applied. The best combination ensemble has a recognition rate of 84.13%, which is significantly higher than the 83.64% achieved if only one recognition architecture (HMM or BLSTM) is used for the combination, and even remarkably higher than the 81.26% achieved by the best individual classifier. To demonstrate the high performance of the classification system, the results are compared with two widely used commercial recognizers from Microsoft and Vision Objects.
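One simple voting strategy can be sketched as follows. The sketch assumes the incremental alignment has already been done, so each recognizer contributes a word list of equal length with `None` marking an empty slot. It uses plain majority voting with ties going to the earliest recognizer. This is not the ROVER implementation or the paper's exact strategy, and `vote_aligned` is a hypothetical name.

```python
from collections import Counter

def vote_aligned(aligned):
    """Majority vote over pre-aligned recognizer outputs.

    `aligned` is a list of equal-length word lists, one per recognizer;
    None marks an empty slot introduced by the (assumed) alignment step.
    Ties go to the word proposed by the earliest recognizer in the list.
    """
    result = []
    for slot in zip(*aligned):
        counts = Counter(slot)
        best = max(counts.values())
        # first candidate in recognizer order that reaches the top count
        winner = next(w for w in slot if counts[w] == best)
        if winner is not None:
            result.append(winner)
    return result
```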

3.

Hidden Markov model training based on an improved particle swarm optimization algorithm

Zhu Jiayu (朱嘉瑜), Gao Ying (高鹰) 《计算机工程与设计》 (Computer Engineering and Design) 2010,31(1)

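Traditional training algorithms for hidden Markov models tend to converge to local extrema. To address this, a particle swarm optimization algorithm with extremum disturbance and adaptively adjusted inertia weight and acceleration coefficients is proposed. The improved algorithm is applied to hidden Markov model training to optimize both the number of states and the parameters of the model. Experiments on handwritten digit recognition show that, compared with the traditional Baum-Welch training algorithm, the proposed training algorithm escapes local extrema effectively, so that the trained hidden Markov model achieves higher recognition ability.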

4.

A constrained line search optimization method for discriminative training  Cited by: 1 (self-citations: 0, by others: 1)

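Liu Cong (刘聪), Hu Yu (胡郁), Dai Lirong (戴礼荣), Wang Renhua (王仁华) 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence) 2010,23(4):450-455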

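An optimization method called "constrained line search" is proposed and applied to discriminative training of Gaussian mixture continuous density hidden Markov models (CDHMMs) in speech recognition. The method can be used to optimize the discriminative training objective function based on the maximum mutual information (MMI) criterion. In this method, discriminative training of the hidden Markov model (HMM) is first treated as a constrained optimization problem, with the KL divergence between models used as a constraint during optimization. Then, based on the idea of line search, it is shown that constraining the KL divergence between the models before and after the update allows the HMM parameters to be expressed in a simple quadratic form. The method can be used to optimize any parameter of a Gaussian mixture CDHMM, including means, covariance matrices, and Gaussian weights. The method is applied to two standard speech recognition tasks, one English and one Chinese: the English TIDIGITS database and the Chinese 863 database. Experimental results show that, relative to the conventional extended Baum-Welch method, the method yields consistent improvements in both recognition performance and convergence behavior.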

5.

Large margin hidden Markov models for speech recognition  Cited by: 1 (self-citations: 0, by others: 1)

Hui Jiang Xinwei Li Chaojun Liu 《IEEE transactions on audio, speech, and language processing》2006,14(5):1584-1595

In this paper, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous-density hidden Markov models (CDHMMs) for speech recognition according to the principle of maximizing the minimum multiclass separation margin. The approach is named large margin HMM. First, we show that this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, we propose to solve this constrained minimax optimization problem by using a penalized gradient descent algorithm, where the original objective function, i.e., minimum margin, is approximated by a differentiable function and the constraints are cast as penalty terms in the objective function. The new training method is evaluated in the speaker-independent isolated E-set recognition and the TIDIGITS connected digit string recognition tasks. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods.

6.

A discriminative backpropagation training method for hidden Markov models

Deng Wei (邓伟), Zhao Rongchun (赵荣椿) 《自动化学报》 (Acta Automatica Sinica) 2000,26(4):492-498

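A discriminative training method for hidden Markov models (HMMs) is studied. The forward probability computation of the HMM is implemented within the framework of a multilayer feedforward neural network. Based on this framework, backpropagation of partial derivatives is used in a gradient ascent procedure to maximize mutual information, thereby training the HMM discriminatively. This method is called the backpropagation training method for HMMs. In addition, a numerically robust algorithm for implementing the training method is designed. Speech recognition experiments confirm the superiority of the training method.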
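The forward probability computation is a layer-by-layer recursion over time, which is what makes a feedforward-network view natural. Below is a minimal sketch for a discrete HMM. The function name and the toy parameters in the usage note are illustrative assumptions, not taken from the paper. No scaling is applied, so it suits only short, non-empty sequences, unlike the numerically robust algorithm the abstract mentions.

```python
def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a discrete HMM via the forward recursion.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : non-empty list of symbol indices
    """
    n = len(pi)
    # initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # termination: sum over final states
    return sum(alpha)
```

For example, with `pi = [0.6, 0.4]`, `A = [[0.7, 0.3], [0.4, 0.6]]` and `B = [[0.5, 0.5], [0.1, 0.9]]`, the call `forward_likelihood(pi, A, B, [0, 1])` evaluates to approximately 0.2156.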

7.

A Constrained Line Search Optimization Method for Discriminative Training of HMMs

Liu P. Liu C. Jiang H. Soong F. Wang R.-H. 《IEEE transactions on audio, speech, and language processing》2008,16(5):900-909

In this paper, we propose a novel optimization algorithm called constrained line search (CLS) for discriminative training (DT) of Gaussian mixture continuous density hidden Markov models (CDHMMs) in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including maximum mutual information (MMI), minimum classification error (MCE), minimum phone error (MPE)/minimum word error (MWE), etc. In this method, discriminative training of HMMs is first cast as a constrained optimization problem, where the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based upon the idea of line search, we show that a simple formula for the HMM parameters can be found by constraining the KLD between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights. We have investigated the proposed CLS approach on several benchmark speech recognition databases, including TIDIGITS, Resource Management (RM), and Switchboard. Experimental results show that the new CLS optimization method consistently outperforms the conventional EBW method in both recognition performance and convergence behavior.
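The constraint in CLS bounds the KLD between the models of successive iterations. For intuition only, the KLD between two univariate Gaussians has a closed form, sketched below. The paper's constraint applies to Gaussian mixture CDHMMs, which this single-Gaussian helper does not cover, and `kl_gauss` is a hypothetical name.

```python
import math

def kl_gauss(mu0, var0, mu1, var1):
    """KL( N(mu0, var0) || N(mu1, var1) ) for univariate Gaussians."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)
```

The divergence is zero only when both Gaussians coincide, and it grows quadratically with the shift in the mean, which fits the quadratic form the abstract mentions.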

8.

Noise Condition-Dependent Training Based on Noise Classification and SNR Estimation

Haitian Xu Dalsgaard P. Zheng-Hua Tan Lindberg B. 《IEEE transactions on audio, speech, and language processing》2007,15(8):2431-2443

A condition-dependent training strategy divides a training database into a number of clusters, each corresponding to a noise condition, and subsequently trains a hidden Markov model (HMM) set for each cluster. This paper investigates and compares a number of condition-dependent training strategies in order to achieve a better understanding of the effects on automatic speech recognition (ASR) performance caused by splitting the training databases. Also, the relationship between mismatches in signal-to-noise ratio (SNR) is analyzed. The results show that splitting the training material in terms of both noise type and SNR value is advantageous compared to previously used methods, and that training only a limited number of HMM sets is sufficient for each noise type for robust handling of SNR mismatches. This leads to the introduction of an SNR and noise classification-based training strategy (SNT-SNC). Better ASR performance is obtained on test material containing data from known noise types as compared to either multicondition training or noise-type dependent training strategies. The computational complexity of the SNT-SNC framework is kept low by choosing only one HMM set for recognition. The HMM set is chosen on the basis of results from noise classification and SNR value estimations. However, compared to other strategies, the SNT-SNC framework shows lower performance for unknown noise types. This problem is partly overcome by introducing a number of model and feature domain techniques. Experiments using both artificially corrupted and real-world noisy speech databases are conducted and demonstrate the effectiveness of these methods.

9.

Modelling of the interframe dependence in an HMM using conditional Gaussian mixtures

Ji Ming, F. Jack Smith 《Computer Speech and Language》1996,10(4):229-247

This paper investigates the modelling of the interframe dependence in a hidden Markov model (HMM) for speech recognition. First, a new observation model, assuming dependence on multiple previous frames, is proposed. This model represents such a dependence structure with a weighted mixture of a set of first-order conditional Gaussian densities, each mixture component accounting for a specific conditional frame. Next, an optimization in choosing the conditional frames/segment is performed in both training and recognition, thereby helping to remove the mismatch of the conditional segments due to different observation histories. An EM (Expectation–Maximization) iteration algorithm is developed for the estimation of the model parameters and for the optimization over the dependence structure. Experimental comparisons on a speaker-independent E-set database show that the new model, without optimization on the dependence structure, achieves better performance than the standard HMM, the bigram HMM and the linear-predictive HMM, all in comparable or smaller parameter sizes. The optimization over the dependence structure leads to further improvement in the performance.

10.

An improved speech recognition system based on HMM and a genetic neural network

Wu Yanzhan (吴延占) 《计算机系统应用》 (Computer Systems & Applications) 2016,25(1):204-208

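To handle the overlap between frames in speech signals and to improve adaptability to speech signals, this paper proposes an improved speech recognition system based on the hidden Markov model (HMM) and a genetic-algorithm neural network. The method mainly uses a wavelet neural network to train on Mel-frequency cepstral coefficients (MFCCs), and then uses HMMs to model the temporal structure of the speech signal and to compute scores from the HMM output probabilities of the speech. These scores serve as the input to the genetic neural network, which yields the classification result for the speech. Experimental results show that the improved system is more robust to noise than a pure HMM system and improves speech recognition performance.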

11.

Application of small-vocabulary speaker-independent speech recognition in embedded systems  Cited by: 5 (self-citations: 0, by others: 5)

Liu Zhen'an (刘振安), Sun Jie (孙捷), Wang Jinjun (王晋军) 《计算机工程》 (Computer Engineering) 2006,32(11):213-215

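A design for an embedded small-vocabulary speaker-independent speech recognition system is presented. It is based on a single-chip microcontroller plus digital signal processor (MCU+DSP) architecture and implements speech recognition with the discrete hidden Markov model method. The system is adaptable and extensible, and offers a degree of real-time performance and language independence. When the algorithm is optimized for Mandarin Chinese, the accuracy and real-time performance of Chinese recognition improve further.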

12.

Hidden Markov models for off-line handwritten digit recognition  Cited by: 9 (self-citations: 0, by others: 9)

Liu Gang (刘刚), Zhang Honggang (张洪刚), Guo Jun (郭军) 《计算机研究与发展》 (Journal of Computer Research and Development) 2003,40(8):1252-1257

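When hidden Markov models (HMMs) are applied to off-line handwritten digit recognition, how to build the system's models is a question worth studying. Taking into account the characteristics of handwritten digits and of feature extraction, the training method for the HMMs and the selection of model parameters are studied in order to improve the recognition rate. In a bank bill OCR application, combining this approach with a neural-network-based method reduced the rejection rate for whole bills by 3% and clearly improved the performance of the bank bill OCR system.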

13.

A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition

Zi-Rui Wang Jun Du Wen-Chao Wang Jian-Fang Zhai Jin-Shui Hu 《International Journal on Document Analysis and Recognition》2018,21(4):241-251

This paper proposes an effective segmentation-free approach using a hybrid neural network hidden Markov model (NN-HMM) for offline handwritten Chinese text recognition (HCTR). In the general Bayesian framework, the handwritten Chinese text line is sequentially modeled by HMMs, each representing one character class, while an NN-based classifier is adopted to calculate the posterior probability of all HMM states. The key issues in feature extraction, character modeling, and language modeling are comprehensively investigated to show the effectiveness of the NN-HMM framework for offline HCTR. First, a conventional deep neural network (DNN) architecture is studied with a well-designed feature extractor. As for the training procedure, label refinement using forced alignment and sequence training can yield significant gains on top of the frame-level cross-entropy criterion. Second, a deep convolutional neural network (DCNN) with automatically learned discriminative features demonstrates its superiority to the DNN in the HMM framework. Moreover, to solve the challenging problem of distinguishing quite confusing classes due to the large vocabulary of Chinese characters, the NN-based classifier should output 19900 HMM states as the classification units via a high-resolution modeling within each character. On the ICDAR 2013 competition task of the CASIA-HWDB database, DNN-HMM yields a promising character error rate (CER) of 5.24% by making a good trade-off between computational complexity and recognition accuracy. To the best of our knowledge, DCNN-HMM can achieve a best published CER of 3.53%.

14.

A speech recognition model based on recurrent neural networks  Cited by: 5 (self-citations: 1, by others: 4)

Zhu Xiaoyan (朱小燕), Wang Yu (王昱), Xu Wei (徐伟) 《计算机学报》 (Chinese Journal of Computers) 2001,24(2):213-218

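In recent years, speech recognition based on hidden Markov models (HMMs) has developed considerably. However, the HMM has certain limitations, and how to overcome the problems caused by its first-order assumption and independence assumption has long been a focus of research. Introducing neural networks into speech recognition is one way to overcome these limitations. This paper applies a recurrent neural network to Chinese speech recognition, modifies the original network model, and proposes a corresponding training method. Experimental results show that the model handles continuous signals well and performs comparably to the conventional HMM, and that the new training strategy both speeds up training and clearly improves the classification performance of the model.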

15.

Research on Uyghur online handwriting recognition based on HMM and GMM

Xu Hui (许辉), Reyiman Tursun (热依曼.吐尔逊), Wushour Silamu (吾守尔.斯拉木) 《计算机工程与应用》 (Computer Engineering and Applications) 2014,(11):202-205,222

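An online handwritten whole-word recognition system for Uyghur, based on a dual-engine recognition model combining HMM and GMM, is presented. In the GMM part, the system extracts 8-directional features: it generates 8-directional feature pattern images, locates spatial sampling points, and extracts blurred directional features. Iterative refinement training of the model then produces a GMM model file. In the HMM part, the system uses a stroke-segment feature method to obtain sequences of stroke segmentation point features, and iterative refinement training of the model produces an HMM model file. The GMM and HMM model files are packaged separately and then jointly packaged into a dictionary. The recognition rate reached 97% in the first phase of experiments and 99% in the second phase.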

16.

Optical character recognition for cursive handwriting  Cited by: 5 (self-citations: 0, by others: 5)

Arica N. Yarman-Vural F.T. 《IEEE transactions on pattern analysis and machine intelligence》2002,24(6):801-813

A new analytic scheme, which uses a sequence of image segmentation and recognition algorithms, is proposed for the off-line cursive handwriting recognition problem. First, some global parameters, such as slant angle, baselines, stroke width and height, are estimated. Second, a segmentation method finds character segmentation paths by combining gray-scale and binary information. Third, a hidden Markov model (HMM) is employed for shape recognition to label and rank the character candidates. For this purpose, a string of codes is extracted from each segment to represent the character candidates. The estimation of feature space parameters is embedded in the HMM training stage together with the estimation of the HMM model parameters. Finally, information from a lexicon and from the HMM ranks is combined in a graph optimization problem for word-level recognition. This method corrects most of the errors produced by the segmentation and HMM ranking stages by maximizing an information measure in an efficient graph search algorithm. The experiments indicate higher recognition rates compared to the available methods reported in the literature.

17.

Parametric hidden Markov models for gesture recognition  Cited by: 7 (self-citations: 0, by others: 7)

Wilson A.D. Bobick A.F. 《IEEE transactions on pattern analysis and machine intelligence》1999,21(9):884-900

A method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a point gesture where the relevant parameter is the two-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM (PHMM). During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results that demonstrate the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness in parameter estimation with respect to noise in the input features. Finally, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires the use of a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical coordinate parameterization of pointing direction.

18.

Video-based face recognition using Laplacianfaces and hidden Markov models  Cited by: 1 (self-citations: 2, by others: 1)

Jiang Yanxia (江艳霞), Zhou Hongren (周宏仁), Jing Zhongliang (敬忠良) 《计算机工程》 (Computer Engineering) 2007,33(1):204-206

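A video-based face recognition method using Laplacianfaces and hidden Markov models is proposed. During training, the Laplacianfaces method maps the face images in each video sequence into the Laplacian space. The dimension-reduced features serve as observations, and a hidden Markov model captures the statistical and temporal dynamic characteristics of each training video. During recognition, the hidden Markov model of each training video is used to analyze the temporal dynamics of the test video, and the probability that each trained model generated the sequence is computed. The model with the highest probability gives the class of the sequence to be recognized. Experimental results show that the method performs video-based face recognition well.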

19.

Application of a stereovision neural network to continuous speech recognition

Tetsuro Kitazoe Tomoyuki Ichiki Makoto Funamori 《Artificial Life and Robotics》2001,5(3):165-170

The two- or three-layered neural networks (2LNN, 3LNN) which originated from stereovision neural networks are applied to speech recognition. To accommodate sequential data flow, we consider a window through which the new acoustic data enter and from which the final neural activities are output. Inside the window, a recurrent neural network develops neural activity toward a stable point. The process is called winner-take-all (WTA) with cooperation and competition. The resulting neural activities clearly showed recognition of continuous speech of a word. The string of phonemes obtained is compared with reference words by using a dynamic programming method. The resulting recognition rate was 96.7% for 100 words spoken by nine male speakers, compared with 97.9% by a hidden Markov model (HMM) with three states and a single Gaussian distribution. These results, which are close to those of HMM, seem important because the architecture of the neural network is very simple, and the number of parameters in the neural net equations is small and fixed. This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000.
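The abstract does not say which dynamic programming method compares the phoneme string with the reference words. As one plausible sketch, the code below assumes a unit-cost Levenshtein distance and picks the nearest lexicon entry. The function names and the lexicon layout are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def closest_word(phonemes, lexicon):
    """Return the lexicon key whose phoneme string is nearest to `phonemes`."""
    return min(lexicon, key=lambda w: edit_distance(phonemes, lexicon[w]))
```

Here `lexicon` maps each reference word to its phoneme sequence, so a recognized string with one wrong phoneme still resolves to the intended word when no other entry is closer.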

20.

A hybrid phoneme recognition method combining an improved CP network with HMM  Cited by: 2 (self-citations: 0, by others: 2)

Deng Wei (邓伟), Zhao Rongchun (赵荣椿) 《数据采集与处理》 (Journal of Data Acquisition and Processing) 2000,15(1):6-11

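A hybrid phoneme recognition method combining an improved counterpropagation (CP) neural network with the hidden Markov model (HMM) is proposed. The distinguishing feature of the method is that an improved CP network, with supervised learning vector quantization (LVQ) and dynamic node allocation, generates the codebook in a discrete HMM phoneme recognition system. The codebook in a hybrid phoneme recognition system built this way is therefore in effect a high-performance classifier with strong discriminative ability, trained by the supervised LVQ algorithm. This means that, before the HMM models the speech signal, the observation sequence produced by the codebook ...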