Found 20 similar documents (search time: 703 ms)
1.
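An adaptation algorithm for Mandarin digit speech recognition (total citations: 4, self-citations: 0, citations by others: 4)
Speaker adaptation is one of the effective ways to improve the performance of speaker-independent speech recognition. This paper applies the MAP algorithm to Mandarin digit speech recognition and discusses several methods for speeding up adaptation, as well as the effect of adaptation on speakers who were not adapted to. Experiments show that the MAP algorithm effectively reduces the Mandarin digit recognition error rate for the adapted speaker while having very little effect on performance for non-adapted speakers.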
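The abstract above does not give the update equations, so the following is only a minimal sketch of the standard textbook MAP update for a single Gaussian mean, not the paper's implementation. The function name `map_adapt_mean`, the prior weight `tau`, and its default value are hypothetical choices made for illustration.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean vector (illustrative sketch).

    prior_mean: speaker-independent mean, a list of floats
    frames:     adaptation feature vectors, a list of lists of floats
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight (assumed value); larger means trust the prior more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)          # soft count of frames for this Gaussian
    weighted_sum = [0.0] * dim
    for x, g in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += g * x[d]
    # Interpolate between the prior mean and the adaptation-data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With little adaptation data the result stays close to the speaker-independent mean, and with more data it moves toward the speaker's own statistics, which is consistent with the small effect on non-adapted speakers reported above.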
2.
3.
Makhoul J., Kubala F., Leek T., Daben Liu, Long Nguyen, Schwartz R., Srivastava A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2000,88(8):1338-1353
With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.
4.
《Signal Processing Magazine, IEEE》1998,15(3):24-48
This article provides a succinct review of speech research, in particular its history, current trends, and prospects for the future. The research areas covered are speech analysis and synthesis, speech coding, speech enhancement, speech recognition, spoken language understanding, speaker identification and verification, and multimodal communication.
5.
Neural networks for statistical recognition of continuous speech (total citations: 4, self-citations: 0, citations by others: 4)
Morgan N., Bourlard H.A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1995,83(5):742-772
In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently, simple speech recognition systems (using context-independent phone models) based on this approach have been shown in controlled tests to be both effective in terms of accuracy (i.e., comparable to or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.
6.
7.
8.
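An embedded speech recognition system was designed. Its hardware platform is built around the ADSP-BF531, and it uses discrete hidden Markov model (DHMM) detection and recognition algorithms to perform speaker-independent isolated-word speech recognition. Experimental results show that the system's overall recognition rate for speaker-independent short-word vocabulary is above 90%. The system is compact, fast, reliable, and readily extensible; it can be used in many specific settings and has good market prospects. The paper describes the interface design between the DSP and the CODEC, off-chip RAM, ROM, and CPLD, as well as the algorithms used for speech recognition, including vector quantization, Mel cepstral parameters, and Viterbi decoding, together with their practical results.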
9.
Eun Ho Kim, Kyung Hak Hyun, Soo Hyun Kim, Yoon Keun Kwak 《Mechatronics, IEEE/ASME Transactions on》2009,14(3):317-325
Emotion recognition is one of the latest challenges in human-robot interaction. This paper describes the realization of emotional interaction for a Thinking Robot, focusing on speech emotion recognition. In general, speaker-independent systems show a lower accuracy rate compared with speaker-dependent systems, as emotional feature values depend on the speaker and their gender. However, speaker-independent systems are required for commercial applications. In this paper, a novel speaker-independent feature is proposed for constructing a speaker-independent system: the ratio of a spectral flatness measure to a spectral center (RSS), which varies little across speakers. Gender and emotion are hierarchically classified by using the proposed feature (RSS), pitch, energy, and the mel frequency cepstral coefficients. An average recognition rate of 57.2% (±5.7%) at a 90% confidence interval is achieved with the proposed system in the speaker-independent mode.
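The abstract names the RSS feature but does not define its two components, so the sketch below assumes the common textbook definitions: spectral flatness as the geometric mean of the power spectrum divided by its arithmetic mean, and spectral center as the power-weighted mean frequency. The function names and the `eps` floor are hypothetical, and the paper's exact formulation may differ.

```python
import math

def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum (assumed definition)."""
    log_mean = sum(math.log(max(p, eps)) for p in power) / len(power)
    arithmetic_mean = sum(power) / len(power)
    return math.exp(log_mean) / arithmetic_mean

def spectral_center(power, freqs):
    """Power-weighted mean frequency in Hz (assumed definition)."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

def rss(power, freqs):
    """Ratio of the spectral flatness measure to the spectral center."""
    return spectral_flatness(power) / spectral_center(power, freqs)
```

In practice `power` and `freqs` would come from a short-time Fourier transform of each speech frame; that step is omitted here.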
10.
Shaofei Xue, Hui Jiang, Lirong Dai, Qingfeng Liu 《Journal of Signal Processing Systems》2016,82(2):175-185
Recently several speaker adaptation methods have been proposed for deep neural networks (DNNs) in many large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few methods rely on tuning the connection weights in trained DNNs directly to optimize system performance, since doing so is very prone to over-fitting, especially when some class labels are missing in the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices in trained DNNs and then tune rectangular diagonal matrices with the adaptation data. This alleviates the over-fitting problem by updating the weight matrices only slightly, modifying only the singular values. We evaluate the proposed adaptation method in two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition in the Switchboard task. Experimental results have shown that it is effective to adapt large DNN models using only a small amount of adaptation data. For example, recognition results in the Switchboard task have shown that the proposed SVD-based adaptation method may achieve up to 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
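The idea of freezing the singular vectors and tuning only the singular values can be illustrated on a single linear layer. This is a simplified sketch rather than the authors' method: it uses the reduced SVD, a squared-error loss in place of the DNN training criterion, and plain gradient descent. The function names, learning rate, and step count are assumptions.

```python
import numpy as np

def mse_loss(W, X, T):
    """Half mean squared error of the linear layer W @ X against targets T."""
    err = W @ X - T
    return 0.5 * np.sum(err ** 2) / X.shape[1]

def adapt_singular_values(W, X, T, lr=0.1, steps=200):
    """Adapt W by updating only its singular values.

    W: trained weight matrix, shape (out_dim, in_dim)
    X: adaptation inputs, shape (in_dim, n)
    T: adaptation targets, shape (out_dim, n)
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # U and Vt stay fixed
    s = s.copy()
    for _ in range(steps):
        W_hat = (U * s) @ Vt                  # rebuild W from the current singular values
        grad_W = (W_hat @ X - T) @ X.T / X.shape[1]
        # dL/ds_k = u_k^T (dL/dW) v_k, computed for all k at once.
        grad_s = np.sum((U.T @ grad_W) * Vt, axis=1)
        s -= lr * grad_s
    return (U * s) @ Vt, s
```

Only `min(out_dim, in_dim)` parameters per layer are updated, which is why this kind of adaptation is less prone to over-fitting on a few dozen utterances than tuning every weight.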
11.
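Dimensional speech emotion recognition (Dim-SER) is an emerging branch of affective computing. It treats emotion from a multidimensional, continuous perspective and models the SER problem as a regression task that predicts continuous values. Current Dim-SER systems do not take into account the relative order of emotion intensity among utterances when predicting emotion, which seriously impairs a human-computer interaction system's ability to track trends in the speaker's emotional change. Motivated by this need and taking the characteristics of human emotion cognition as a reference, this paper builds a Dim-SER system that is sensitive to the relative order of emotion intensity, and introduces the Gamma statistic to improve the evaluation criteria for SER system performance. In building the system, the paper constructs a top-rank probability distribution to describe the emotional ordering among utterances, uses the Kullback-Leibler distance to measure the loss of order consistency caused by prediction, and finally proposes an order-sensitive neural network algorithm that minimizes the system's prediction loss. Emotion prediction experiments show that, compared with the commonly used k-nearest-neighbor and support vector regression algorithms, the system effectively improves the correctness of the relative order of emotion intensity among utterances.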
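The abstract does not spell out how the top-rank distribution is built. One common construction, used in listwise learning-to-rank, is a softmax over the scores; the sketch below assumes that form and measures the mismatch with the Kullback-Leibler divergence. All function names are hypothetical, and this is not necessarily the paper's formulation.

```python
import math

def top_one_probs(scores):
    """Probability that each utterance ranks first, as a softmax over scores (assumed form)."""
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def order_loss(true_scores, predicted_scores):
    """Order-consistency loss between annotated and predicted emotion intensities."""
    return kl_divergence(top_one_probs(true_scores), top_one_probs(predicted_scores))
```

The loss is zero when the predicted scores induce the same top-one distribution as the annotations and grows as the predicted ordering departs from the annotated one.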
12.
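Research progress in speech recognition and understanding (total citations: 1, self-citations: 0, citations by others: 1)
This paper reviews current trends and the latest advances in speech recognition and understanding. It notes that the United States leads in speaker-independent, large-vocabulary continuous speech recognition based on hidden Markov models, while Japan leads in large-vocabulary continuous speech recognition based on neural networks and in speech post-processing that simulates artificial intelligence. It also introduces the best speech recognition and understanding systems internationally.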
13.
14.
15.
16.
17.
18.
Ke Chen 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》2005,35(3):301-314
Numerous speech representations have been reported to be useful in speaker recognition. However, there is much less agreement on which speech representation provides a perfect representation of speaker-specific information conveyed in a speech signal. Unlike previous work, we propose an alternative approach to speaker modeling by the simultaneous use of different speech representations in an optimal way. Inspired by our previous empirical studies, we present a soft competition scheme on different speech representations to exploit different speech representations in encoding speaker-specific information. On the basis of this soft competition scheme, we present a parametric statistical model, generalized Gaussian mixture model (GGMM), to characterize a speaker identity based on different speech representations. Moreover, we develop an expectation-maximization algorithm for parameter estimation in the GGMM. The proposed speaker modeling approach has been applied to text-independent speaker recognition and comparative results on the KING speech corpus demonstrate its effectiveness.
19.
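The absence of quasi-periodic vocal fold vibration makes Chinese whispered speech a special mode of phonation and means that whispered tones cannot be characterized by the pitch period. The conventional speech features currently used for speech recognition and voiceprint recognition carry little tone information, so they rarely give good results in tone recognition experiments. This paper proposes a new feature parameter that imitates the fundamental-frequency tone contour of normal speech. Starting from the characteristics of human hearing, it studies the tone-sensitive Bark bands and finds that the fitted curve of the normalized energy ratio of the partially spread Bark spectrum exhibits a trajectory similar to the fundamental-frequency contour of normal speech, which suggests that the trajectory carries, to some degree, the tone information of whispered speech. In whispered tone recognition experiments that use this trajectory and the short-time energy curve as features and a neural network as the model, a relatively high recognition accuracy was obtained: the overall accuracy for the four Mandarin tones reached 78%. This also provides strong support for further processing of whispered speech.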
20.
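This paper develops a speech recognition system for verifying parcel information. The system uses medium-to-large-vocabulary, speaker-independent continuous speech recognition to verify parcel information by voice in real time. It can recognize speech in Mandarin (Putonghua) or Sichuan dialect, with a recognizable vocabulary of about 4,500 entries. The system also uses rejection and speaker adaptation techniques, which improve its overall robustness. Experiments show that for Mandarin the top-1 recognition rate reaches 98.7% and the top-3 recognition rate reaches 99.9%; for Sichuan dialect the top-1 rate reaches 95.9% and the top-3 rate reaches 98.6%. The correct rejection rate for irrelevant speech reaches 85%, and for speakers with heavy accents the recognition rate improves by 5-8 percentage points after adaptation.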