1.
2.
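Civil aviation air-ground communication is critical to flight safety, but because it follows a special grammatical structure and pronunciation, acoustic models built for everyday speech recognition cannot be applied to it effectively. For this special context, this paper proposes a speech recognition method for civil aviation air-ground communication based on a bidirectional long short-term memory network (BiLSTM). First, FBANK features are extracted from the air-ground communication speech as input, and a BiLSTM network is trained with connectionist temporal classification (CTC) as the objective function to obtain a BiLSTM/CTC model. Then, recognition is performed with the acoustic model, a language model, and an air-ground communication lexicon, and the model is further trained with data augmentation and data transfer to improve recognition performance. Experimental results show that the proposed method is suitable for civil aviation air-ground communication speech recognition and that the data-augmented model effectively reduces the word error rate.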
3.
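The large margin estimation (LME) criterion adjusts only the minimum margin within the support set, which makes poor use of the margins. To address this, the paper proposes an improved form of the large-margin objective function: by boosting the likelihood scores of those paths among the competing hypotheses that compete most strongly with the correct transcription, the classification margin of the training data is reduced to some extent, which further improves the training effect of large margin estimation. On this basis, a new approximation-based optimization method is proposed: when the gradients of the objective function and an auxiliary function point in the same direction at some point, optimizing the auxiliary function within a certain neighborhood of that point yields a corresponding improvement in the objective function. Experiments on a Microsoft corpus demonstrate the effectiveness of the proposed algorithm.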
4.
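To improve the stability of frame-based feature transformation methods, a segment-based discriminative feature transformation method is proposed. The method treats feature transformation as a sparse approximation problem for high-dimensional signals. Using state tying, it trains region dependent linear transform (RDLT) matrices and mean-offset feature minimum phone error (m-fMPE) transformation matrices, and combines the two sets of matrices into an overcomplete dictionary. Speech signals are segmented by forced alignment; with likelihood maximization as the objective function, the matching pursuit algorithm iteratively optimizes the objective and automatically determines the transformation matrices and their coefficients for each speech segment. To keep the feature transformation stable, a correlation measure is introduced when selecting transformation matrices so that correlated feature basis vectors are removed. Experimental results show that, compared with the conventional RDLT method, recognition performance improves by 1.63% and 2.23% when the acoustic model is trained with the maximum likelihood criterion and the discriminative criterion, respectively. The method can also be applied to speech enhancement and to discriminative model training.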
5.
Chatterjee A. Pulasinghe K. Watanabe K. Izumi K. 《Industrial Electronics, IEEE Transactions on》2005,52(6):1478-1489
This paper shows the possible development of particle swarm optimization (PSO)-based fuzzy-neural networks (FNNs) that can be employed as an important building block in real robot systems, controlled by voice-based commands. The PSO is employed to train the FNNs that can accurately output the crisp control signals for the robot systems, based on fuzzy linguistic spoken language commands, issued by a user. The FNN is also trained to capture the user-spoken directive in the context of the present performance of the robot system. Hidden Markov model (HMM)-based automatic speech recognizers (ASRs) are developed, as part of the entire system, so that the system can identify important user directives from the running utterances. The system has been successfully employed in two real-life situations, namely: 1) for navigation of a mobile robot; and 2) for motion control of a redundant manipulator.
6.
Gradient-based learning applied to document recognition (cited by: 69 total; self-citations: 0; citations by others: 69)
Lecun Y. Bottou L. Bengio Y. Haffner P. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1998,86(11):2278-2324
Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
7.
We propose an approach to recognize trajectory-based dynamic hand gestures in real time for human–computer interaction (HCI). We also introduce a fast learning mechanism that does not require extensive training data to teach gestures to the system. We use a six-degrees-of-freedom position tracker to collect trajectory data and represent gestures as an ordered sequence of directional movements in 2D. In the learning phase, sample gesture data is filtered and processed to create gesture recognizers, which are basically finite-state machine sequence recognizers. We achieve online gesture recognition by these recognizers without needing to specify gesture start and end positions. The results of the conducted user study show that the proposed method is very promising in terms of gesture detection and recognition performance (73% accuracy) in a stream of motion. Additionally, the assessment of the user attitude survey denotes that the gestural interface is very useful and satisfactory. One of the novel parts of the proposed approach is that it gives users the freedom to create gesture commands according to their preferences for selected tasks. Thus, the presented gesture recognition approach makes the HCI process more intuitive and user specific.
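The idea of representing a gesture as an ordered sequence of directional movements and spotting it with a finite-state machine can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the four-way quantization (`U`, `D`, `L`, `R`), the `min_step` jitter threshold, and the names `to_directions` and `GestureFSM` are all hypothetical.

```python
import math

def to_directions(points, min_step=1.0):
    """Quantize a 2D trajectory into U/D/L/R moves, merging consecutive repeats."""
    moves = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < min_step:
            continue  # ignore jitter below the (assumed) movement threshold
        if abs(dx) >= abs(dy):
            move = "R" if dx > 0 else "L"
        else:
            move = "U" if dy > 0 else "D"
        if not moves or moves[-1] != move:
            moves.append(move)
    return moves

class GestureFSM:
    """Sequence recognizer: one state per matched move of a fixed pattern."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern
        self.state = 0

    def feed(self, move):
        """Consume one move; return True when the full pattern has just been seen."""
        if move == self.pattern[self.state]:
            self.state += 1
        elif move == self.pattern[0]:
            self.state = 1  # simplistic restart, not a full prefix-function fallback
        else:
            self.state = 0
        if self.state == len(self.pattern):
            self.state = 0
            return True
        return False

# A square-like path: up, right, down, left.
square = [(0, 0), (0, 5), (5, 5), (5, 0), (0, 0)]
moves = to_directions(square)
fsm = GestureFSM("right-down-left", ["R", "D", "L"])
hits = [fsm.feed(m) for m in moves]
print(moves, hits)
```

Because the recognizer simply resets on a mismatch, it can run over a continuous stream of moves without being told where a gesture starts or ends, which is the property the abstract highlights.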
8.
《Journal of Visual Communication and Image Representation》2014,25(7):1647-1675
The main issue in personal authentication systems for military, security, industrial and social applications is accuracy. This paper presents a finger knuckle print (FKP) recognition approach to identity authentication. It applies a discriminative common vectors (DCV) based method to obtain the unique feature vectors, called discriminative common vectors, and the Euclidean distance as matching strategy to achieve the identification and verification tasks. The recognition process can be divided into the following phases: capturing the image; pre-processing; extracting the discriminative common vectors; matching and, finally, making a decision. In order to test and evaluate the proposed approach both the most representative FKP public databases and an established non-uniform FKP database were used. Experiments with these databases confirm that the DCV-based FKP recognition method achieves the authentication tasks effectively. The results showed the performance of the system in terms of the recognition rate had 100% accuracy for both training data and unseen test data.
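The matching and decision phases, with Euclidean distance as the matching strategy, might look like the sketch below. It assumes feature vectors have already been extracted; the DCV extraction itself is not shown, and the function names, the toy gallery, and the threshold are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, gallery):
    """Identification: return (label, distance) of the closest enrolled template."""
    return min(
        ((label, euclidean(probe, vec)) for label, vec in gallery.items()),
        key=lambda pair: pair[1],
    )

def verify(probe, template, threshold):
    """Verification: accept the claimed identity if the distance is within threshold."""
    return euclidean(probe, template) <= threshold

# Toy 2D "feature vectors"; real feature vectors would be much longer.
gallery = {"alice": [0.0, 0.0], "bob": [3.0, 4.0]}
label, dist = identify([2.9, 4.2], gallery)
accepted = verify([3.0, 4.0], gallery["alice"], threshold=5.0)
print(label, round(dist, 4), accepted)
```

Identification is a one-to-many nearest-neighbour search, while verification is a one-to-one comparison against a threshold that would have to be tuned on held-out data.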
9.
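Combining acoustic models of different units in Chinese continuous speech recognition (cited by: 1 total; self-citations: 0; citations by others: 1)
This paper studies the combination of acoustic models trained on different acoustic units. In Chinese continuous speech recognition, the popular units include context-dependent initial/final units and phoneme units. Experiments show that some Chinese syllables are recognized more accurately with the initial/final model, while others are recognized more accurately with the phoneme model. The paper proposes a method for combining the two acoustic models that uses both models during recognition while avoiding whichever model causes a low recognition rate. Experiments show that with the proposed method the syllable error rate drops by 9.60% and 6.10% relative to the phoneme model and the initial/final model, respectively.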
10.
Model-based target recognition in pulsed ladar imagery (cited by: 1 total; self-citations: 0; citations by others: 1)
A pulsed ladar based object-recognition system with applications to automatic target recognition (ATR) is presented. The approach used is to fit the sensed range images to range templates extracted through a laser physics based simulation applied to geometric target models. A projection-based prescreener filters out more than 80% of candidate templates. For recognition, an M of N pixel matching scheme for internal shape matching is combined with a silhouette matching scheme. The system was trained on synthetic data obtained from the simulation, and has been blind tested on a data set containing real ladar images of military vehicles at various orientations and ranges. Successful blind testing on real imagery demonstrates the utility of synthetic imagery for training of recognizers operating on ladar imagery.
11.
Li Y Chen X Zhang X Wang K Wang ZJ 《IEEE transactions on bio-medical engineering》2012,59(10):2695-2704
Identification of constituent components of each sign gesture can be beneficial to the improved performance of sign language recognition (SLR), especially for large-vocabulary SLR systems. Aiming at developing such a system using portable accelerometer (ACC) and surface electromyographic (sEMG) sensors, we propose a framework for automatic Chinese SLR at the component level. In the proposed framework, data segmentation, as an important preprocessing operation, is performed to divide a continuous sign language sentence into subword segments. Based on the features extracted from ACC and sEMG data, three basic components of sign subwords, namely the hand shape, orientation, and movement, are further modeled and the corresponding component classifiers are learned. At the decision level, a sequence of subwords can be recognized by fusing the likelihoods at the component level. The overall classification accuracy of 96.5% for a vocabulary of 120 signs and 86.7% for 200 sentences demonstrate the feasibility of interpreting sign components from ACC and sEMG data and clearly show the superior recognition performance of the proposed method when compared with the previous SLR method at the subword level. The proposed method seems promising for implementing large-vocabulary portable SLR systems.
12.
Neural networks for statistical recognition of continuous speech (cited by: 4 total; self-citations: 0; citations by others: 4)
Morgan N. Bourlard H.A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1995,83(5):742-772
In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently simple speech recognition systems (using context-independent phone models) based on this approach have been proved on controlled tests, to be both effective in terms of accuracy (i.e., comparable or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.
13.
In this paper, we propose a rank‐weighted reconstruction feature to improve the robustness of a feed‐forward deep neural network (FFDNN)‐based acoustic model. In the FFDNN‐based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy, or temporal context of the input features. However, we ascertained whether a single parameter is sufficiently able to control the quantity of information. Therefore, we investigated the input feature construction from the perspectives of rank and nullity, and proposed a rank‐weighted reconstruction feature herein, that allows for the retention of speech information components and the reduction in trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. The proposed method reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.
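The baseline input construction described above, vectorizing the frames inside a context window, can be sketched as follows. This shows only the conventional frame splicing, not the proposed rank-weighted reconstruction; the function name `splice_frames` and the choice to pad edges by repeating the first and last frame are assumptions.

```python
def splice_frames(frames, context):
    """Concatenate each frame with `context` neighbours on each side into one vector."""
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            # Clamp the index so edge frames reuse the first/last frame (assumed padding).
            idx = min(max(t + offset, 0), last)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced

# Three frames of 2-dimensional features, one frame of context on each side.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spliced = splice_frames(frames, context=1)
print(len(spliced), len(spliced[0]))
```

Each output vector has `(2 * context + 1) * dim` entries, so the window size is the single parameter that controls how much temporal context, and how much redundancy, reaches the network.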
14.
15.
16.
17.
Discriminative metric design for robust pattern recognition (cited by: 2 total; self-citations: 0; citations by others: 2)
Motivated by the development of discriminative feature extraction (DFE), many researchers have come to realize the importance of designing a front-end feature extraction unit with an appropriate link to backend classification. This paper proposes an advanced formalization of DFE, which we call the discriminative metric design (DMD), and elaborates on its exemplar implementation by using a simple, linear feature transformation matrix. The resulting DMD implementation is shown to have a close relationship to various discriminative pattern recognizers, including artificial neural networks. The utility of the proposed method is clearly demonstrated in speech pattern recognition experiments.
18.
The techniques used to develop an acoustic-phonetic hidden Markov model, the problems associated with representing the whole acoustic-phonetic structure, the characteristics of the model, and how it performs as a phonetic decoder for recognition of fluent speech are discussed. The continuous variable duration model was trained using 450 sentences of fluent speech, each of which was spoken by a single speaker, and segmented and labeled using a fixed number of phonemes, each of which has a direct correspondence to the states of the matrix. The inherent variability of each phoneme is modeled as the observable random process of the Markov chain, while the phonotactic model of the unobservable phonetic sequence is represented by the state transition matrix of the hidden Markov model. The model assumes that the observed spectral data were generated by a Gaussian source. However, an analysis of the data shows that the spectra for most of the phonemes are not normally distributed and that an alternative representation would be beneficial.
19.
This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add‐on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two‐stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.
20.
Chin-Hui Lee Qiang Huo 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2000,88(8):1241-1269
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
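As a minimal illustration of how maximum a posteriori point estimation combines prior knowledge with adaptation data, consider the textbook case (an assumption made here for illustration, not the HMM formulation developed in the paper) of estimating the mean $\mu$ of a Gaussian with known variance $\sigma^2$ from $n$ adaptation samples $x_1, \dots, x_n$, under a Gaussian prior $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$:

```latex
\hat{\mu}_{\mathrm{MAP}}
  = \arg\max_{\mu}\; p(x_1,\dots,x_n \mid \mu)\, p(\mu)
  = \frac{\sigma_0^2 \sum_{i=1}^{n} x_i + \sigma^2 \mu_0}{n\,\sigma_0^2 + \sigma^2}
  = \frac{\tau\,\mu_0 + \sum_{i=1}^{n} x_i}{\tau + n},
\qquad \tau = \frac{\sigma^2}{\sigma_0^2}.
```

With no adaptation data ($n = 0$) the estimate equals the prior mean $\mu_0$, and as $n$ grows it approaches the maximum-likelihood estimate, the sample mean; $\tau$ controls how quickly the data outweigh the prior.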