1.

Thai spelling analysis for automatic spelling speech recognition

Chutima Pisarn Thanaruk Theeramunkong 《Information Sciences》2008,178(1):122-136

Spelling speech recognition can be applied for several purposes, including enhancing speech recognition systems and implementing name retrieval systems. This paper presents a Thai spelling analysis for developing a Thai spelling speech recognizer. Thai phonetic characteristics, the alphabet system and spelling methods are analyzed. As training resources, two alternative corpora, a small spelling speech corpus and an existing large continuous speech corpus, are used to train hidden Markov models (HMMs), and their recognition results are compared. To address the difference in utterance speed between spelling utterances and continuous speech utterances, utterance speed adjustment is taken into account. Two alternative language models, bigram and trigram, are used to investigate the performance of spelling speech recognition. Our approach achieves up to 98.0% letter correction rate, 97.9% letter accuracy and 82.8% utterance correction rate when the language model is a trigram and the acoustic model is trained on the small spelling speech corpus with eight Gaussian mixtures.
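The abstract reports letter correction rate and letter accuracy as separate figures. A minimal sketch, assuming the common HTK-style definitions (Correct = (N − D − S)/N, Accuracy = (N − D − S − I)/N, with substitutions S, deletions D and insertions I counted from a minimum-edit alignment); the paper may define its metrics differently, and the function names are illustrative:

```python
from typing import Sequence, Tuple

def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (edit distance, S, D, I) for ref[:i] aligned with hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # delete every reference letter
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # insert every hypothesis letter
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, s, dl, ins = cost[i - 1][j - 1]
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (d + sub, s + sub, dl, ins)         # match or substitution
            d, s, dl, ins = cost[i - 1][j]
            up = (d + 1, s, dl + 1, ins)               # deletion
            d, s, dl, ins = cost[i][j - 1]
            left = (d + 1, s, dl, ins + 1)             # insertion
            cost[i][j] = min(diag, up, left)           # smallest distance wins
    _, s, dl, ins = cost[n][m]
    return s, dl, ins

def letter_scores(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Return (correct rate, accuracy) for one reference/hypothesis pair."""
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    return (n - d - s) / n, (n - d - s - i) / n
```

For reference `ABCD` and hypothesis `ABXDE` (one substitution, one insertion) this gives a correct rate of 0.75 but an accuracy of 0.5, which is why accuracy can never exceed the correct rate.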

2.

Robust speech recognition method based on discriminative environment feature extraction

韩纪庆 高文《Journal of Computer Science and Technology》2001,16(5):458-464

Learning the influence of environmental parameters, such as additive noise and channel distortion, from training data is an effective approach to robust speech recognition. Most previous methods are based on the maximum likelihood estimation criterion; however, these methods do not lead to a minimum error rate. In this paper, a novel discriminative learning method for environmental parameters based on the Minimum Classification Error (MCE) criterion is proposed. In this method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to learn the environmental parameters iteratively. The clean speech features are then estimated from the noisy speech features using the estimated environmental parameters, and these estimates are used in the back-end HMM classifier. Experiments on a task of 18 confusable isolated Korean words show a best error rate reduction of 32.1% relative to a conventional HMM system.
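The MCE criterion and GPD update named above can be illustrated on a toy problem. A minimal sketch, assuming linear discriminants g_j(x) = w_j · x, the single-best-rival form of the misclassification measure (the η → ∞ limit of the usual MCE measure), a sigmoid loss with slope `gamma`, and step size `eps`; it does not reproduce the paper's learning of environmental parameters, and all names are hypothetical:

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def misclassification(weights: List[Vector], x: Vector, label: int) -> Tuple[float, int]:
    """Return (d, rival) with d = -g_label + max_{j != label} g_j and g_j = w_j . x.

    d > 0 means the sample is misclassified.
    """
    scores = [dot(w, x) for w in weights]
    rival = max((j for j in range(len(weights)) if j != label), key=lambda j: scores[j])
    return scores[rival] - scores[label], rival

def mce_loss(weights: List[Vector], x: Vector, label: int, gamma: float = 1.0) -> float:
    """Smoothed 0/1 loss: a sigmoid of the misclassification measure."""
    d, _ = misclassification(weights, x, label)
    return 1.0 / (1.0 + math.exp(-gamma * d))

def gpd_step(weights: List[Vector], x: Vector, label: int,
             gamma: float = 1.0, eps: float = 0.1) -> List[Vector]:
    """One GPD update on a single sample: w <- w - eps * d(loss)/dw."""
    d, rival = misclassification(weights, x, label)
    loss = 1.0 / (1.0 + math.exp(-gamma * d))
    slope = gamma * loss * (1.0 - loss)      # derivative of the sigmoid w.r.t. d
    new = [list(w) for w in weights]
    for k, xk in enumerate(x):
        new[label][k] += eps * slope * xk    # d(d)/dw_label = -x: move toward x
        new[rival][k] -= eps * slope * xk    # d(d)/dw_rival = +x: move away from x
    return new
```

Because the loss is a smooth function of the parameters, gradient descent on it directly targets the classification error rather than the likelihood, which is the motivation the abstract gives for preferring MCE over maximum likelihood.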

3.

Artificial neural networks as speech recognisers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach

《Advanced Engineering Informatics》2014,28(1):102-110

Dysarthria is a neurological impairment in controlling the motor speech articulators that compromises the speech signal. Automatic Speech Recognition (ASR) can be very helpful for speakers with dysarthria because they are often also physically incapacitated. Mel-Frequency Cepstral Coefficients (MFCCs) have been shown to be an appropriate representation of dysarthric speech, but the question of which MFCC-based feature set represents dysarthric acoustic features most effectively has not been answered. Moreover, most current dysarthric speech recognisers are either speaker-dependent (SD) or speaker-adaptive (SA), and they generalise poorly when used as a speaker-independent (SI) model. First, by comparing the results of 28 dysarthric SD speech recognisers, this study identifies the best-performing set of MFCC parameters for representing dysarthric acoustic features in Artificial Neural Network (ANN)-based ASR. Next, this paper studies the application of ANNs as a fixed-length isolated-word SI ASR for individuals with dysarthria. The results show that the speech recognisers trained on the conventional 12-coefficient MFCC features without delta and acceleration features provided the best accuracy, and that the proposed SI ASR recognised the speech of unseen dysarthric evaluation subjects with a word recognition rate of 68.38%.

4.

Research progress on feature parameters in speech emotion recognition

李杰 周萍《Transducer and Microsystem Technologies》2012,31(2):4-7

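Speech emotion recognition is one of the research topics to emerge in recent years. The extraction of feature parameters directly affects the final recognition performance, and feature dimensionality reduction can extract the feature parameters that best distinguish different emotions. This paper highlights the importance of feature parameters in speech emotion recognition, introduces the basic components of a speech emotion recognition system, reviews the current state of research on feature parameters, and describes, analyzes and compares the dimensionality reduction methods commonly used in emotion recognition. Possible development trends in speech emotion recognition are also discussed.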

5.

Invited paper: Automatic speech recognition: History, methods and challenges

Douglas O’Shaughnessy 《Pattern Recognition》2008,41(10):2965-2979

The field of automatic speech recognition (ASR) is discussed from the viewpoint of pattern recognition (PR). This tutorial examines the problem area, its methods, successes and failures, focusing on the nature of the speech signal and techniques to accomplish useful data reduction. Comparison is made with other areas of PR. Suggestions are given for areas of future progress.

6.

An on-line adaptive neural network for speech recognition

Li-Peng Zhang Li-Mei Li Zheru Chi 《International Journal of Speech Technology》1998,2(3):241-248

In this paper, we present an on-line learning neural network model, Dynamic Recognition Neural Network (DRNN), for real-time speech recognition. The property of accumulative learning of the DRNN makes it very suitable for real-time speech recognition with on-line learning. A comparison between the DRNN and Hidden Markov Model (HMM) shows that the computational complexity of the former is lower than that of the latter in both training and recognition. Encouraging results are obtained when the DRNN is tested on a BUPT digit database (Mandarin) and on the on-line learning of twenty isolated English computer command words.

7.

Joint evaluation of multiple speech patterns for speech recognition and training

Nishanth Ulhas Nair T.V. Sreenivas 《Computer Speech and Language》2010,24(2):307-340

We address the novel problem of jointly evaluating multiple speech patterns for automatic speech recognition and training. We propose solutions based on both the non-parametric dynamic time warping (DTW) algorithm and the parametric hidden Markov model (HMM), and show that a hybrid approach is quite effective for noisy speech recognition. We extend the concept to HMM training, in which some patterns may be noisy or distorted. Using the concept of a “virtual pattern” developed for joint evaluation, we propose selective iterative training of HMMs. When these algorithms are evaluated on isolated word recognition under burst/transient noise, they yield significant improvements in recognition accuracy over algorithms that do not use the joint evaluation strategy.
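DTW is the non-parametric building block named above. A minimal sketch of standard two-sequence DTW, assuming Euclidean local distance and the symmetric step pattern (diagonal, horizontal, vertical moves); the paper's joint evaluation extends alignment to several patterns at once, which this sketch does not attempt:

```python
import math
from typing import Sequence

def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Accumulated cost of the best warping path between two feature sequences."""
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])   # Euclidean frame distance
            acc[i][j] = local + min(acc[i - 1][j],      # a advances, b repeats
                                    acc[i][j - 1],      # b advances, a repeats
                                    acc[i - 1][j - 1])  # both advance
    return acc[n][m]
```

A repeated frame costs nothing to absorb: aligning `[[0], [1], [2]]` with `[[0], [1], [1], [2]]` gives a distance of 0.0, which is the time-warping behaviour that makes DTW tolerant of speaking-rate differences.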

8.

Relationship between the weight coefficients of a product HMM and the instantaneous SNR in bimodal speech recognition

赵晖 顾亚强 唐朝京《Journal of Computer Applications》2009,29(Z2)

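To achieve higher speech recognition rates under complex conditions such as noise contamination, a new product hidden Markov model (HMM) is proposed for bimodal speech recognition, and the relationship between the model's weight coefficients and the instantaneous signal-to-noise ratio (SNR) is studied and determined. Building on independently trained audio and video HMMs, the model constructs a two-dimensional training model and uses a re-estimation strategy to ensure higher accuracy. The generalized probabilistic descent (GPD) algorithm is also introduced to adjust the weight coefficients of the audio and video features. Experimental results show that the proposed method delivers good, stable recognition performance in noisy environments.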
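In multi-stream HMMs the audio and video streams are commonly combined by weighting their log-likelihoods, log b = λ·log b_A + (1 − λ)·log b_V. A minimal sketch, assuming that weighted form and a hypothetical sigmoid mapping from instantaneous SNR to λ; the midpoint and slope values are invented for illustration, and the paper determines its own SNR/weight relationship:

```python
import math

def audio_weight(snr_db: float, midpoint_db: float = 10.0, slope: float = 0.3) -> float:
    """Hypothetical SNR-to-weight mapping: trust the audio stream more as SNR rises."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))

def fused_log_likelihood(log_audio: float, log_video: float, snr_db: float) -> float:
    """Stream-weighted emission score for one state and one frame."""
    lam = audio_weight(snr_db)
    return lam * log_audio + (1.0 - lam) * log_video
```

At the assumed 10 dB midpoint both streams count equally; at high SNR the score is dominated by audio, and at very low SNR by video, which is the qualitative behaviour a bimodal recognizer needs in noise.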

9.

A new speech segmentation method using a voting mechanism

黄湘松 赵春晖 陈立伟《Computer Engineering and Applications》2009,45(24):21-24

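To address the marked drop in segmentation performance for continuous speech signals against a noisy background, a new method for segmenting continuous speech is proposed. Instead of relying on a single endpoint detection method, it combines the endpoint detection results of several different methods, including fractal-dimension-based, cepstral-feature-based and HMM-based endpoint detection, through voting to obtain the final endpoints, which are then used to segment the continuous speech signal. Experimental results show that the method noticeably improves segmentation accuracy.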
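The voting step can be sketched at the frame level. A minimal sketch, assuming each detector outputs a per-frame speech/non-speech mask and that the vote is a strict majority per frame; the paper may instead vote on endpoint positions, and the function names are hypothetical:

```python
from typing import List, Optional, Sequence, Tuple

def vote_frames(decisions: Sequence[Sequence[bool]]) -> List[bool]:
    """Strict majority vote per frame; each row holds one detector's decisions."""
    n_detectors = len(decisions)
    n_frames = len(decisions[0])
    return [sum(d[t] for d in decisions) * 2 > n_detectors for t in range(n_frames)]

def to_segments(speech: Sequence[bool]) -> List[Tuple[int, int]]:
    """Convert a per-frame speech mask into [start, end) frame segments."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, is_speech in enumerate(speech):
        if is_speech and start is None:
            start = t                      # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start, t))    # a speech run ends
            start = None
    if start is not None:                  # speech runs to the last frame
        segments.append((start, len(speech)))
    return segments
```

With three detectors, a frame is kept as speech when at least two agree, so an isolated false alarm or miss from any one method is outvoted by the other two.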

10.

A stochastic segment model incorporating articulatory information for Mandarin speech recognition

晁浩 杨占磊 刘文举《Application Research of Computers》2014,(11)

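An approach to integrating articulatory information based on the stochastic segment model is proposed. Following the characteristics of the stochastic segment model, a hierarchical artificial neural network is built to obtain the posterior probability that a speech segment belongs to each phoneme class, and these posteriors are integrated into the stochastic segment model system through one-pass decoding. Mandarin continuous speech recognition experiments on the “863-test” test set show a relative reduction of 5.93% in character error rate. The results demonstrate the feasibility of applying articulatory information to the stochastic segment model.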

11.

A stochastic segment model incorporating articulatory information for Mandarin speech recognition

晁浩 刘志中 薛霄《Application Research of Computers》2015,32(4)

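An approach to integrating articulatory information based on the stochastic segment model is proposed. Following the characteristics of the stochastic segment model, a hierarchical artificial neural network is built to obtain the posterior probability that a speech segment belongs to each phoneme class, and these posteriors are integrated into the stochastic segment model system through one-pass decoding. Mandarin continuous speech recognition experiments on the “863-test” test set show a relative reduction of 5.93% in character error rate. The results demonstrate the feasibility of applying articulatory information to the stochastic segment model.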

12.

A pilot study on augmented speech communication based on Electro-Magnetic Articulography

Panikos Heracleous Pierre Badin Gérard Bailly Norihiro Hagita 《Pattern Recognition Letters》2011,32(8):1119-1125

Speech is the most natural form of communication for human beings. However, in situations where audio speech is not available because of disability or adverse environmental conditions, people may resort to alternative methods such as augmented speech, that is, audio speech supplemented or replaced by other modalities, such as audiovisual speech or Cued Speech. This article introduces augmented speech communication based on Electro-Magnetic Articulography (EMA). Movements of the tongue, lips and jaw are tracked by EMA and used as features to create hidden Markov models (HMMs). In addition, automatic phoneme recognition experiments are conducted to examine the possibility of recognizing speech from articulation alone, that is, without any audio information. The results are promising and confirm that phonetic features characterizing articulation are as discriminating as those characterizing acoustics (except for voicing). This article also describes experiments conducted in noisy environments using fused audio and EMA parameters. When EMA parameters are fused with noisy audio speech, the recognition rate increases significantly compared with using noisy audio speech alone.

13.

A Hidden Markov Model approach for appearance-based 3D object recognition

Manuele Bicego Umberto Castellani Vittorio Murino 《Pattern Recognition Letters》2005,26(16):2588-2599

In this paper, a new appearance-based 3D object classification method based on the Hidden Markov Model (HMM) approach is proposed. Hidden Markov Models are a widely used methodology for modelling sequential data and have grown in importance in recent years. In the proposed approach, each view is subdivided into regular, partially overlapping sub-images, and wavelet coefficients are computed for each window. These coefficients are then arranged sequentially to compose a sequence vector, which is used to train an HMM, with particular attention paid to model selection and to the initialization of the training procedure. A thorough experimental evaluation on a standard database has shown promising results, even in the presence of image distortions and occlusions, the latter being one of the most severe problems for recognition methods. This analysis suggests that the proposed approach is an interesting alternative to classic appearance-based methods for 3D object classification.
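The step that turns an image into an HMM observation sequence can be sketched as follows. A minimal sketch, assuming square windows of `size` pixels moved by `step` pixels in raster order, and a one-level 2-D Haar transform (averaging normalisation) as the wavelet; the window size, step, wavelet and scan order used in the paper are not stated here, so these choices and names are hypothetical:

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]

def haar2d_level1(block: Image) -> List[float]:
    """One level of a 2-D Haar transform on an even-sized block.

    Returns all approximation coefficients followed by the three detail bands.
    """
    h, w = len(block), len(block[0])
    approx, vert, horiz, diag = [], [], [], []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = block[i][j], block[i][j + 1]
            c, d = block[i + 1][j], block[i + 1][j + 1]
            approx.append((a + b + c + d) / 4)   # local average
            vert.append((a + b - c - d) / 4)     # top row minus bottom row
            horiz.append((a - b + c - d) / 4)    # left column minus right column
            diag.append((a - b - c + d) / 4)     # diagonal difference
    return approx + vert + horiz + diag

def window_sequence(image: Image, size: int = 4, step: int = 2) -> List[List[float]]:
    """Raster-scan overlapping size x size windows; one feature vector per window."""
    rows, cols = len(image), len(image[0])
    seq = []
    for top in range(0, rows - size + 1, step):
        for left in range(0, cols - size + 1, step):
            block = [row[left:left + size] for row in image[top:top + size]]
            seq.append(haar2d_level1(block))
    return seq
```

With `step` smaller than `size`, neighbouring windows share pixels, so consecutive observations are correlated, which is what makes a left-to-right sequence model a reasonable fit for the resulting vectors.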

14.

A new simplified neural network structure for hybrid speech recognition systems

邓伟《Journal of Data Acquisition and Processing》2002,17(1):25-28

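This paper studies a simplified neural network structure for small-vocabulary hybrid speech recognition systems that combine hidden Markov models (HMMs) with multilayer perceptrons (MLPs). The regular two-dimensional array formed by the HMM states in a small-vocabulary hybrid system is used to decompose the state observation probabilities. Based on this method, which exploits the two-dimensional structure of the HMM, a simplified neural network structure composed of several simple MLPs is used to estimate the state observation probabilities. Both theoretical analysis and speech recognition experiments show that this simplified structure outperforms the simplified neural network structure proposed by Franco et al.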

15.

Smoothing of HMM parameters for efficient recognition of online handwriting

O. Samanta U. Bhattacharya S.K. Parui 《Pattern Recognition》2014

16.

A comparison of skin history and trajectory-based representation schemes for the recognition of user-specified gestures

Stephen J. McKenna Kenny Morrison 《Pattern Recognition》2004,37(5):999-1009

Gesture recognition error rates and the qualitative nature of the errors made are heavily influenced by the choice of visual representation. A direct empirical comparison of two contrasting approaches, namely trajectory- and history-based representation, is presented. Skin colour is used as a common visual cue and recognition is based on hidden Markov models, moment features and normalised template matching. Two novel representation schemes are proposed and evaluated: (i) skin history images and (ii) composite history images which represent occluded motion. Results are reported for an application in which able-bodied and disabled subjects specify their own gesture vocabularies.
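Normalised template matching, one of the three recognition methods listed, can be sketched with zero-mean normalised cross-correlation. A minimal sketch, assuming history images are flattened to equal-length vectors and that the best-correlating template wins; the paper's exact normalisation may differ, and the function names and gesture labels are hypothetical:

```python
import math
from typing import Dict, Sequence

def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalised cross-correlation of two equal-length vectors, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0       # flat images carry no shape information

def classify(query: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label whose template correlates best with the query image."""
    return max(templates, key=lambda label: normalized_correlation(query, templates[label]))
```

Subtracting the mean and dividing by the norms makes the score insensitive to overall brightness and contrast, so a history image that is uniformly brighter than its template still matches it perfectly.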

17.

Joint variable frame rate and length analysis for speech recognition under adverse conditions

《Computers & Electrical Engineering》2014,40(7):2139-2149

This paper presents a method that combines variable frame length and rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on recognition performance. The method selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of each selected frame according to the number of non-selected preceding frames. It assigns a higher frame rate and a normal frame length to rapidly changing, high-SNR regions of a speech signal, and a lower frame rate and an increased frame length to steady or low-SNR regions. Speech recognition results show that the proposed variable frame rate and length method is more robust to noise than both fixed frame rate and length analysis and standalone variable frame rate analysis.
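The selection-and-lengthening rule described above can be sketched as a threshold on an accumulated distance. A minimal sketch, assuming the SNR-weighted distances between consecutive frames are already computed (the weighting itself is not reproduced), that a frame is selected once the accumulated distance reaches `threshold`, and that frame length grows linearly with the number of skipped frames up to a cap; all parameter values (400 samples, i.e. 25 ms at an assumed 16 kHz) and names are hypothetical:

```python
from typing import List, Sequence, Tuple

def select_frames(distances: Sequence[float], threshold: float,
                  base_len: int = 400, extra_per_skip: int = 80,
                  max_len: int = 800) -> List[Tuple[int, int]]:
    """Return (frame_index, frame_length_in_samples) for each selected frame.

    distances[t] is a precomputed SNR-weighted distance between frames t and t-1.
    """
    selected = []
    accumulated = 0.0
    skipped = 0
    for t, d in enumerate(distances):
        accumulated += d
        if t == 0 or accumulated >= threshold:
            # Longer analysis window after a run of steady, non-selected frames.
            length = min(base_len + extra_per_skip * skipped, max_len)
            selected.append((t, length))
            accumulated = 0.0
            skipped = 0
        else:
            skipped += 1
    return selected
```

Large distances (rapid spectral change at good SNR) cross the threshold every frame, giving a high frame rate at the normal length; small distances accumulate slowly, so fewer but longer frames are emitted in steady or noisy stretches, as the abstract describes.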

18.

Local contrast enhancement and adaptive feature extraction for illumination-invariant face recognition

Wen-Chung Kao Ming-Chai Hsu 《Pattern Recognition》2010,43(5):1736-1747

Recognizing human faces under varying lighting conditions is a difficult problem, and it becomes even more difficult when face images are taken in extremely high dynamic range scenes. Most automatic face recognition systems assume that images are taken under well-controlled illumination, a constraint under which both face segmentation and recognition become much simpler. However, illumination control is not feasible when a surveillance system may be installed at an arbitrary location. Without compensating for uneven illumination, it is impossible to obtain a satisfactory recognition rate. In this paper, we propose an integrated system that first compensates for uneven illumination through local contrast enhancement. The enhanced images are then fed into a robust face recognition system that adaptively selects the most important features among all candidate features and performs classification with support vector machines (SVMs). The dimension of the feature space and the selected feature types are customized for each hyperplane. Three face image databases, namely Yale, Yale Group B and Extended Yale Group B, are used to evaluate performance. The experimental results show that the proposed recognition system gives superior results compared with recently published work.

19.

A Chinese sign language recognition system based on SOFM/SRN/HMM

Wen Gaolin Debin Yiqiang 《Pattern Recognition》2004,37(12):2389-2402

In sign language recognition (SLR), a major current challenge is developing methods that solve signer-independent continuous sign recognition. In this paper, SOFM/HMM is first presented for modeling signer-independent isolated signs. The proposed method uses self-organizing feature maps (SOFM) as a feature extractor shared across signers for continuous hidden Markov models (HMMs), transforming input signs into significant, low-dimensional representations that can be well modeled by the HMM emission probabilities. Based on these isolated sign models, a SOFM/SRN/HMM model is then proposed for signer-independent continuous SLR. This model applies an improved simple recurrent network (SRN) to segment continuous sign language in terms of the transformed SOFM representations, and the SRN outputs are taken as the HMM states, over which the lattice Viterbi algorithm searches for the best-matched word sequence. Experimental results demonstrate that the proposed system performs better than a conventional HMM system, obtaining a word recognition rate of 82.9% over a 5113-sign vocabulary and an accuracy of 86.3% for signer-independent continuous SLR.
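The search step relies on the Viterbi algorithm. A minimal sketch of plain log-domain Viterbi decoding for a discrete-observation HMM, not the lattice variant used in the paper; the toy probabilities in the usage check are invented for illustration:

```python
import math
from typing import List, Sequence, Tuple

def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]], emit: Sequence[Sequence[float]]
            ) -> Tuple[List[int], float]:
    """Return (most likely state path, its log probability)."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else -math.inf

    n_states = len(start)
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back = []                                   # back-pointers, one list per step
    for o in obs[1:]:
        prev = score
        ptr, score = [], []
        for s in range(n_states):
            # Best predecessor of state s given the previous column of scores.
            best = max(range(n_states), key=lambda r: prev[r] + log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):                  # trace back from the final state
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]
```

Working in log probabilities avoids numerical underflow on long sequences, and keeping only the best predecessor per state makes the search linear in sequence length.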

20.

Bayesian shape model for facial feature extraction and recognition

Zhong Stan Z. Eam Khwang 《Pattern Recognition》2003,36(12):2819-2833

A facial feature extraction algorithm using the Bayesian shape model (BSM) is proposed in this paper. A full-face model consisting of contour points and control points is designed to describe the face patch, allowing the extracted face patch to be warped and normalized efficiently. First, the BSM is used to match and extract the contour points of a face. In the BSM, the prototype of the face contour can be adjusted adaptively according to its prior distribution. Moreover, an affine-invariant internal energy term is introduced to describe the local shape deformations between the prototype contour in the shape domain and the deformable contour in the image domain, so that both global and local shape deformations can be tolerated. The control points are then estimated from the matched contour points based on the statistics of the full-face model. Finally, the face patch is extracted and normalized using the piecewise affine triangle warping algorithm. Experimental results on real facial feature extraction demonstrate that the proposed BSM facial feature extraction algorithm is more accurate and effective than the active shape model (ASM).
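Piecewise affine warping maps each triangle of a source mesh onto the corresponding destination triangle with its own affine transform. A minimal sketch of the per-triangle step only, assuming the transform is found by solving the six-unknown linear system from three vertex correspondences (Cramer's rule here); rasterising pixels inside each triangle is omitted, and the function names are hypothetical:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def _det3(a: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def _solve3(m: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve m @ v = rhs for a 3x3 system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (collinear) triangle")
    out = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]          # replace one column with the right-hand side
        out.append(_det3(mc) / d)
    return out

def triangle_affine(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, ...]:
    """Coefficients (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f
    mapping the three src vertices onto the three dst vertices."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = _solve3(m, [p[0] for p in dst])
    d, e, f = _solve3(m, [p[1] for p in dst])
    return a, b, c, d, e, f

def apply_affine(params: Sequence[float], point: Point) -> Point:
    a, b, c, d, e, f = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

Because adjacent triangles share vertices, their affine maps agree along shared edges, so the whole face patch is warped continuously even though each triangle uses a different transform.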