1.

Three‐Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content

Andrej Zgank 《ETRI Journal》2010,32(5):810-818

Similar articles

2.

A BiLSTM Network Model for Speech Recognition of Civil Aviation Air-Ground Communications

Qiu Yi, Jia Guimin, Yang Jinfeng, Liu Yuanqing 《信号处理 (Signal Processing)》2019,35(2):293-300

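Air-ground radiotelephony communication is critical to civil aviation flight safety, but because it follows a special grammatical structure and special pronunciation conventions, acoustic models built for everyday speech recognition cannot be applied effectively to it. Targeting this special context, this paper proposes a speech recognition method for civil aviation air-ground communications based on a bidirectional long short-term memory (BiLSTM) network. First, FBANK features are extracted from the air-ground speech as input, and a BiLSTM network is trained with connectionist temporal classification (CTC) as the objective function to obtain a BiLSTM/CTC model. Then the acoustic model, a language model, and an air-ground communication lexicon are used to perform recognition, and data augmentation and data transfer are combined to train the model further and improve recognition performance. Experimental results show that the proposed method is suitable for speech recognition of civil aviation air-ground communications, and that the data-augmented model effectively reduces the word error rate.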
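As a small illustration of the CTC objective named in this abstract (a sketch only, not the authors' implementation), the standard CTC output rule maps a frame-level label sequence to an output sequence by merging consecutive repeats and then dropping blanks. The blank symbol `"_"` and the function name `ctc_collapse` are assumptions made for this example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this example


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule to a sequence of per-frame labels."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks; a blank between two equal labels keeps both.
    return [label for label in merged if label != blank]


print(ctc_collapse(list("__cc_l_ii_mm__b")))  # ['c', 'l', 'i', 'm', 'b']
print(ctc_collapse(list("aa_a")))             # ['a', 'a']
```

The second call shows why the blank is needed: without it, a genuinely repeated label could not be distinguished from one label held over several frames.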

3.

A Boosted Large Margin Estimation Criterion Using Approximation Optimization

Xu Shuangyin, Qu Dan 《信号处理 (Signal Processing)》2013,29(6):753-760

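The large margin estimation (LME) criterion adjusts only the minimum margin within the support set, which leads to unreasonable use of the margins. To address this problem, this paper proposes an improved form of the large-margin objective function. By boosting the likelihood scores of those paths among the competing hypotheses that compete most strongly with the correct transcription, the classification margin of the training data is reduced to some extent, which further improves the effectiveness of large margin training. On this basis, a new approximation optimization method is proposed: when the gradients of the objective function and an auxiliary function point in the same direction at some point, optimizing the auxiliary function within a certain neighborhood of that point yields a corresponding improvement in the objective function. Experiments on a Microsoft corpus demonstrate the effectiveness of the proposed algorithm.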

4.

A Segment-Based Discriminative Feature Transformation Method Using Uncorrelated Matching Pursuit

Chen Bin, Niu Tong, Zhang Lianhai, Qu Dan, Li Bicheng 《电子学报 (Acta Electronica Sinica)》2016,44(12):2924-2931

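To improve the stability of frame-based feature transformation methods, a segment-based discriminative feature transformation method is proposed. The method treats feature transformation as a sparse approximation problem for high-dimensional signals. Using state tying, it trains region-dependent linear transform (RDLT) matrices and mean-offset feature minimum phone error (m-fMPE) transform matrices, and combines the two sets of transform matrices into an overcomplete dictionary. The speech signal is segmented by forced alignment. With likelihood maximization as the objective function, the matching pursuit algorithm iteratively optimizes the objective and automatically determines the transform matrices and their coefficients for each speech segment. To ensure the stability of the feature transformation, a correlation measure is introduced when selecting transform matrices so that correlated feature basis vectors are removed. Experimental results show that, compared with the conventional RDLT method, recognition performance improves by 1.63% and 2.23% when the acoustic model is trained with the maximum likelihood criterion and a discriminative criterion, respectively. The method can also be applied to speech enhancement and to discriminative model training.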

5.

A particle-swarm-optimized fuzzy-neural network for voice-controlled robot systems (total citations: 1; self-citations: 0; citations by others: 1)

Chatterjee A. Pulasinghe K. Watanabe K. Izumi K. 《Industrial Electronics, IEEE Transactions on》2005,52(6):1478-1489

This paper shows the possible development of particle swarm optimization (PSO)-based fuzzy-neural networks (FNNs) that can be employed as an important building block in real robot systems, controlled by voice-based commands. The PSO is employed to train the FNNs that can accurately output the crisp control signals for the robot systems, based on fuzzy linguistic spoken language commands, issued by a user. The FNN is also trained to capture the user-spoken directive in the context of the present performance of the robot system. Hidden Markov model (HMM)-based automatic speech recognizers (ASRs) are developed, as part of the entire system, so that the system can identify important user directives from the running utterances. The system has been successfully employed in two real-life situations, namely: 1) for navigation of a mobile robot; and 2) for motion control of a redundant manipulator.

6.

Gradient-based learning applied to document recognition (total citations: 69; self-citations: 0; citations by others: 69)

Lecun Y. Bottou L. Bengio Y. Haffner P. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1998,86(11):2278-2324

Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

7.

A hand gesture recognition technique for human–computer interaction

《Journal of Visual Communication and Image Representation》2015

We propose an approach to recognize trajectory-based dynamic hand gestures in real time for human–computer interaction (HCI). We also introduce a fast learning mechanism that does not require extensive training data to teach gestures to the system. We use a six-degrees-of-freedom position tracker to collect trajectory data and represent gestures as an ordered sequence of directional movements in 2D. In the learning phase, sample gesture data is filtered and processed to create gesture recognizers, which are basically finite-state machine sequence recognizers. We achieve online gesture recognition with these recognizers without needing to specify gesture start and end positions. The results of the conducted user study show that the proposed method is very promising in terms of gesture detection and recognition performance (73% accuracy) in a stream of motion. Additionally, the user attitude survey indicates that the gestural interface is very useful and satisfactory. One of the novel parts of the proposed approach is that it gives users the freedom to create gesture commands according to their preferences for selected tasks. Thus, the presented gesture recognition approach makes the HCI process more intuitive and user specific.
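The representation described in this abstract, gestures as ordered sequences of 2D directional movements matched by finite-state sequence recognizers, can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' system: the four-direction alphabet, the `min_step` jitter threshold, and the names `quantize` and `GestureRecognizer` are all hypothetical, and the restart rule on a mismatch is deliberately naive.

```python
import math

DIRECTIONS = ["R", "U", "L", "D"]  # right, up, left, down (assumed alphabet)


def quantize(points, min_step=1.0):
    """Convert a 2D trajectory into directional movements, merging repeats."""
    if not points:
        return []
    moves = []
    prev = points[0]
    for x, y in points[1:]:
        dx, dy = x - prev[0], y - prev[1]
        if math.hypot(dx, dy) < min_step:
            continue  # treat small displacements as jitter; keep the old anchor
        # Snap the movement angle to the nearest of the four directions.
        idx = int(round(math.atan2(dy, dx) / (math.pi / 2))) % 4
        direction = DIRECTIONS[idx]
        if not moves or moves[-1] != direction:
            moves.append(direction)
        prev = (x, y)
    return moves


class GestureRecognizer:
    """Finite-state recognizer that accepts one fixed sequence of directions."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern
        self.state = 0  # number of pattern symbols matched so far

    def feed(self, move):
        """Consume one movement; return True when the whole pattern is matched."""
        if move == self.pattern[self.state]:
            self.state += 1
        elif move == self.pattern[0]:
            self.state = 1  # naive restart from the first symbol
        else:
            self.state = 0
        if self.state == len(self.pattern):
            self.state = 0
            return True
        return False


moves = quantize([(0, 0), (2, 0), (4, 0), (4, 2), (2, 2)])
recognizer = GestureRecognizer("up-then-left", ["U", "L"])
print(moves)                                # ['R', 'U', 'L']
print([recognizer.feed(m) for m in moves])  # [False, False, True]
```

Because the recognizer simply resets on a mismatch, it can run continuously over a stream of motion, which is one way to detect a gesture without marking where it starts and ends.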

8.

Discriminative common vector based finger knuckle recognition

《Journal of Visual Communication and Image Representation》2014,25(7):1647-1675

The main issue in personal authentication systems for military, security, industrial and social applications is accuracy. This paper presents a finger knuckle print (FKP) recognition approach to identity authentication. It applies a discriminative common vectors (DCV) based method to obtain the unique feature vectors, called discriminative common vectors, and uses the Euclidean distance as the matching strategy for the identification and verification tasks. The recognition process can be divided into the following phases: capturing the image; pre-processing; extracting the discriminative common vectors; matching; and, finally, making a decision. To test and evaluate the proposed approach, both the most representative public FKP databases and an established non-uniform FKP database were used. Experiments with these databases confirm that the DCV-based FKP recognition method achieves the authentication tasks effectively. The results showed that the system achieved a recognition rate of 100% on both training data and unseen test data.

9.

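Combining Acoustic Models Built on Different Units in Mandarin Continuous Speech Recognition (total citations: 1; self-citations: 0; citations by others: 1)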

Zhang Hui, Du Limin 《电子与信息学报 (Journal of Electronics & Information Technology)》2006,28(11):2045-2049

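This paper studies the combination of acoustic models trained on different acoustic units. In Mandarin continuous speech recognition, the commonly used units include context-dependent initial/final units and phoneme units. Experiments show that some Mandarin syllables are recognized more accurately with the initial/final model, while others are recognized more accurately with the phoneme model. The paper proposes a method for combining the two acoustic models: both models are used during recognition, while the model that causes a low recognition rate is avoided. Experiments show that with the proposed method the syllable error rate falls by 9.60% and 6.10% compared with the phoneme model and the initial/final model, respectively.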

10.

Model-based target recognition in pulsed ladar imagery (total citations: 1; self-citations: 0; citations by others: 1)

Qinfen Zheng Der S.Z. Mahmoud H.I. 《IEEE transactions on image processing》2001,10(4):565-572

A pulsed ladar based object-recognition system with applications to automatic target recognition (ATR) is presented. The approach used is to fit the sensed range images to range templates extracted through a laser physics based simulation applied to geometric target models. A projection-based prescreener filters out more than 80% of candidate templates. For recognition, an M of N pixel matching scheme for internal shape matching is combined with a silhouette matching scheme. The system was trained on synthetic data obtained from the simulation, and has been blind tested on a data set containing real ladar images of military vehicles at various orientations and ranges. Successful blind testing on real imagery demonstrates the utility of synthetic imagery for training of recognizers operating on ladar imagery.

11.

A Sign-Component-Based Framework for Chinese Sign Language Recognition Using Accelerometer and sEMG Data

Li Y Chen X Zhang X Wang K Wang ZJ 《IEEE transactions on bio-medical engineering》2012,59(10):2695-2704

Identification of the constituent components of each sign gesture can improve the performance of sign language recognition (SLR), especially for large-vocabulary SLR systems. Aiming at developing such a system using portable accelerometer (ACC) and surface electromyographic (sEMG) sensors, we propose a framework for automatic Chinese SLR at the component level. In the proposed framework, data segmentation, as an important preprocessing operation, is performed to divide a continuous sign language sentence into subword segments. Based on the features extracted from ACC and sEMG data, three basic components of sign subwords, namely the hand shape, orientation, and movement, are further modeled and the corresponding component classifiers are learned. At the decision level, a sequence of subwords can be recognized by fusing the likelihoods at the component level. The overall classification accuracies of 96.5% for a vocabulary of 120 signs and 86.7% for 200 sentences demonstrate the feasibility of interpreting sign components from ACC and sEMG data and clearly show the superior recognition performance of the proposed method when compared with the previous SLR method at the subword level. The proposed method seems promising for implementing large-vocabulary portable SLR systems.

12.

Neural networks for statistical recognition of continuous speech (total citations: 4; self-citations: 0; citations by others: 4)

Morgan N. Bourlard H.A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1995,83(5):742-772

In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently, simple speech recognition systems (using context-independent phone models) based on this approach have been shown in controlled tests to be both effective in terms of accuracy (i.e., comparable to or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.
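The link between a discriminatively trained network and HMM emission probabilities mentioned in this abstract is usually explained through Bayes' rule. The notation below ($x_t$ for the acoustic observation at frame $t$, $q_k$ for HMM state $k$) is assumed here for illustration and does not come from the abstract. A network trained as a classifier estimates the posterior $P(q_k \mid x_t)$; dividing by the state prior gives a scaled likelihood that can stand in for the emission probability during decoding:

```latex
\frac{p(x_t \mid q_k)}{p(x_t)} \;=\; \frac{P(q_k \mid x_t)}{P(q_k)}
```

The factor $p(x_t)$ is the same for every state at a given frame, so it does not change which state sequence scores best.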

13.

Rank‐weighted reconstruction feature for a robust deep neural network‐based acoustic model

Hoon Chung Jeon Gue Park Ho‐Young Jung 《ETRI Journal》2019,41(2):235-241

In this paper, we propose a rank‐weighted reconstruction feature to improve the robustness of a feed‐forward deep neural network (FFDNN)‐based acoustic model. In the FFDNN‐based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy, or temporal context of the input features. However, we examined whether a single parameter is sufficient to control the quantity of information. Therefore, we investigated the input feature construction from the perspectives of rank and nullity, and propose a rank‐weighted reconstruction feature that retains speech information components and reduces trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. The proposed method reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.
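The baseline input construction described in this abstract, slicing the frames inside a context window and vectorizing them, can be sketched as below. This covers only the conventional splicing step, not the proposed rank-weighted reconstruction. The function name `splice_frames`, the `left`/`right` parameters, and the edge handling (repeating the first or last frame) are assumptions made for this example.

```python
def splice_frames(frames, left=2, right=2):
    """Stack each frame with its neighbors inside a context window.

    frames: list of per-frame feature vectors (lists of numbers).
    Returns one flattened vector per frame, of length
    (left + 1 + right) * feature_dimension.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            k = min(max(k, 0), n - 1)  # assumed edge rule: repeat boundary frame
            window.extend(frames[k])
        spliced.append(window)
    return spliced


frames = [[1, 2], [3, 4], [5, 6]]
print(splice_frames(frames, left=1, right=1))
# [[1, 2, 1, 2, 3, 4], [1, 2, 3, 4, 5, 6], [3, 4, 5, 6, 5, 6]]
```

Here the window size is the single parameter that sets how much temporal context, and how much redundancy, each input vector carries, which is the trade-off the abstract examines.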

14.

An Improved Hidden Markov Model Training Method and Its Application to Acoustic Target Recognition

Liu Hui, Yang Jun'an, Xu Xuezhong 《电路与系统学报 (Journal of Circuits and Systems)》2011,16(1):58-63

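An improved hidden Markov model training method based on the maximum relative margin is proposed. The method addresses the limitations of the conventional Baum-Welch training algorithm for hidden Markov models in acoustic target recognition, as well as the insufficient generalization ability of existing discriminative training algorithms. Taking the classical hidden Markov model as the initial model, a relative margin is defined, an optimization problem is formulated by maximizing the minimum relative margin, and the problem is solved iteratively by gradient descent, yielding a relative-margin-based hidden Markov model...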

15.

A Local Discriminant … for Facial Expression Recognition (total citations: 2; self-citations: 0; citations by others: 2)

Jiang Bin, Jia Kebin 《电子学报 (Acta Electronica Sinica)》2014,42(1):155-159

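Building on the discriminative component analysis algorithm, a local discriminative component analysis algorithm is proposed for facial expression recognition. The algorithm first selects a set of nearest-neighbor training samples for each test sample, capturing the local sample structure of the training set. It then maximizes the covariance of the discriminative sample subsets while minimizing the covariance of all data within the sample subsets, thereby effectively extracting the expression features of the test sample. Experimental results on several facial expression databases show that the algorithm not only improves on the expression recognition rate of discriminative component analysis but is also robust.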

16.

Non-Native Speech Recognition Based on a Mixed-Model State Modification Algorithm

Zhang Qingqing, Pan Jielin, Yan Yonghong 《数字通信 (Digital Communication)》2009,36(1):33-37

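Recognition performance on non-native speech is low, and the degradation is more pronounced for speakers who have just begun learning the target language or who have a heavy accent. This paper proposes a new bilingual model modification algorithm to improve recognition of non-native speech. In this algorithm, each state of the baseline acoustic model is modified by auxiliary model states that represent the characteristics of the speaker's native language. The paper presents the state modification criterion and compares performance for different numbers of candidate modification states. Compared with a baseline acoustic model that has already been adapted with non-native training data, the acoustic model modified by the bilingual model achieves an 11.7% relative reduction in phrase error rate while preserving the real-time factor of recognition.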

17.

Discriminative metric design for robust pattern recognition (total citations: 2; self-citations: 0; citations by others: 2)

Watanabe H. Yamaguchi T. Katagiri S. 《Signal Processing, IEEE Transactions on》1997,45(11):2655-2662

Motivated by the development of discriminative feature extraction (DFE), many researchers have come to realize the importance of designing a front-end feature extraction unit with an appropriate link to backend classification. This paper proposes an advanced formalization of DFE, which we call the discriminative metric design (DMD), and elaborates on its exemplar implementation by using a simple, linear feature transformation matrix. The resulting DMD implementation is shown to have a close relationship to various discriminative pattern recognizers, including artificial neural networks. The utility of the proposed method is clearly demonstrated in speech pattern recognition experiments.

18.

Development of an acoustic-phonetic hidden Markov model for continuous speech recognition

Ljolje A. Levinson S.E. 《Signal Processing, IEEE Transactions on》1991,39(1):29-39

The techniques used to develop an acoustic-phonetic hidden Markov model, the problems associated with representing the whole acoustic-phonetic structure, the characteristics of the model, and how it performs as a phonetic decoder for recognition of fluent speech are discussed. The continuous variable duration model was trained using 450 sentences of fluent speech, each of which was spoken by a single speaker, and segmented and labeled using a fixed number of phonemes, each of which has a direct correspondence to the states of the matrix. The inherent variability of each phoneme is modeled as the observable random process of the Markov chain, while the phonotactic model of the unobservable phonetic sequence is represented by the state transition matrix of the hidden Markov model. The model assumes that the observed spectral data were generated by a Gaussian source. However, an analysis of the data shows that the spectra for most of the phonemes are not normally distributed and that an alternative representation would be beneficial.

19.

An Adaptive Utterance Verification Framework Using Minimum Verification Error Training

Sung‐Hwan Shin Ho‐Young Jung Biing‐Hwang Juang 《ETRI Journal》2011,33(3):423-433

This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add‐on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two‐stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.

20.

On adaptive decision rules and decision parameter adaptation for automatic speech recognition

Chin-Hui Lee Qiang Huo 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2000,88(8):1241-1269

Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge in an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
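To make the maximum a posteriori (MAP) point estimation in this abstract concrete, a standard textbook instance is the MAP update of a Gaussian mean under a conjugate prior. The symbols here are assumptions for illustration and are not taken from the abstract: $\mu_0$ is the prior mean from the general model, $\tau$ is a prior weight, $x_t$ are the adaptation frames, and $\gamma_t$ is the occupation probability of the Gaussian at frame $t$.

```latex
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\; p(X \mid \theta)\, g(\theta),
\qquad
\hat{\mu}_{\mathrm{MAP}} = \frac{\tau\,\mu_0 + \sum_{t} \gamma_t\, x_t}{\tau + \sum_{t} \gamma_t}
```

With little adaptation data the estimate stays close to the prior mean $\mu_0$, and as $\sum_t \gamma_t$ grows it approaches the maximum-likelihood estimate. This is the sense in which prior knowledge and condition-specific data are combined.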