1.

李虎生, 杨明杰 《电路与系统学报》1999,4(2):1-6

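Speaker adaptation is one of the effective ways to improve the performance of speaker-independent speech recognition. This paper applies the MAP algorithm to Mandarin digit speech recognition and discusses several methods for speeding up adaptation, as well as the effect of adaptation on speakers other than the one adapted to. Experiments show that the MAP algorithm effectively reduces the Mandarin digit recognition error rate for the adapted speaker while having very little effect on performance for non-adapted speakers.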
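
As a rough illustration of what MAP adaptation does to a single Gaussian mean, the sketch below interpolates between a speaker-independent prior mean and the adaptation data. It assumes the textbook MAP mean update; the paper's own formulation is not given in the abstract, and the function name, the prior weight `tau`, and the use of per-frame occupation probabilities are illustrative assumptions.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Textbook MAP update of one Gaussian mean (illustrative sketch).

    prior_mean : list of floats, the speaker-independent mean
    frames     : list of feature vectors from the adaptation speaker
    posteriors : per-frame occupation probabilities for this Gaussian
    tau        : assumed prior weight; larger values trust the prior more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)
    # Posterior-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With little adaptation data the result stays close to the prior mean, and with more data it moves toward the speaker's own statistics.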

2.

Language identification based on speaker clustering and Gaussian mixture models

屈丹, 侯风雷, 王炳锡, 吴保民 《信号处理》2004,20(3):285-289

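This paper presents a new method for language identification. Language identification systems are generally speaker-independent, yet individual speaker characteristics strongly affect them. The paper adopts a coarse-classification, fine-recognition approach: speaker clustering handles the coarse classification, a model is built for each cluster of similar speakers, and recognition is then performed. Experiments show that the method is effective for speaker-independent language identification.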

3.

Speech and language technologies for audio indexing and retrieval (total citations: 6; self-citations: 0; citations by others: 6)

Makhoul J., Kubala F., Leek T., Daben Liu, Long Nguyen, Schwartz R., Srivastava A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》2000,88(8):1338-1353

With the advent of essentially unlimited data storage capabilities and with the proliferation of the use of the Internet, it becomes reasonable to imagine a world in which it would be possible to access any of the stored information at will with a few keystrokes or voice commands. Since much of this data will be in the form of speech from various sources, it becomes important to develop the technologies necessary for indexing and browsing such audio data. This paper describes some of the requisite speech and language technologies that would be required and introduces an effort aimed at integrating these technologies into a system, called Rough 'n' Ready, which indexes speech data, creates a structural summarization, and provides tools for browsing the stored data. The technologies highlighted in the paper include speaker-independent continuous speech recognition, speaker segmentation and identification, name spotting, topic classification, story segmentation, and information retrieval. The system automatically segments the continuous audio input stream by speaker, clusters audio segments from the same speaker, identifies speakers known to the system, and transcribes the spoken words. It also segments the input stream into stories, based on their topic content, and locates the names of persons, places, and organizations. These structural features are stored in a database and are used to construct highly selective search queries for retrieving specific content from large audio archives.

4.

The past, present, and future of speech processing

《Signal Processing Magazine, IEEE》1998,15(3):24-48

This article provides a succinct review of speech research, in particular its history, current trends, and prospects for the future. The research areas covered are speech analysis and synthesis, speech coding, speech enhancement, speech recognition, spoken language understanding, speaker identification and verification, and multimodal communication.

5.

Neural networks for statistical recognition of continuous speech (total citations: 4; self-citations: 0; citations by others: 4)

Morgan N., Bourlard H.A. 《Proceedings of the IEEE. Institute of Electrical and Electronics Engineers》1995,83(5):742-772

In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently simple speech recognition systems (using context-independent phone models) based on this approach have been proved on controlled tests, to be both effective in terms of accuracy (i.e., comparable or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.

6.

Unsupervised speaker adaptation based on FVQ/HMM

赵力, 邹采荣, 吴镇扬 《电子学报》2002,30(7):967-969

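This paper proposes a new speech recognition method that combines the advantages of VQ, HMM, and unsupervised speaker adaptation. An FVQ/HMM is built by replacing the output probability of a conventional HMM in each state with the vector quantization error, and an unsupervised adaptation algorithm based on fuzzy vector quantization modifies the codewords of each FVQ/HMM state, thereby adapting the codebook to an unknown speaker. In speaker-independent Mandarin digit recognition experiments (isolated and connected digits), the combined method is compared with CHMM-based adaptation and recognition; the results show that its adaptation and recognition performance is better than that of the CHMM-based method.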

7.

Research progress in speaker segmentation and clustering

马勇, 鲍长春 《信号处理》2013,29(9):1190-1199

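Speaker segmentation and clustering is a research direction in speech signal processing that has emerged in recent years. It studies how to locate the start and end times of multiple speakers in continuous speech and how to label each speech segment with its speaker. The work matters for automatic speech recognition, multi-speaker recognition, and content-based audio analysis. According to how segmentation and clustering are carried out, this paper reviews the mainstream algorithms, techniques, and representative systems of the past decade, in China and abroad, from the perspective of asynchronous and synchronous strategies. It compares the results of representative systems in recent NIST Rich Transcription evaluations, and finally discusses open problems and the outlook for future development.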

8.

Application of the ADSP-BF531 in an embedded speech recognition system

王维强《电子设计工程》2012,20(12):186-189

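An embedded speech recognition system is designed. Its hardware platform is built around the ADSP-BF531, and it uses discrete hidden Markov model (DHMM) detection and recognition algorithms to perform speaker-independent isolated-word speech recognition. Tests show an overall recognition rate above 90% for speaker-independent short words. The system is small, fast, reliable, and readily extensible; it can be used in many specific settings and has good market prospects. The paper describes the interface design between the DSP and the CODEC, off-chip RAM, ROM, and CPLD, as well as the vector quantization, Mel cepstral parameters, Viterbi, and related algorithms used for recognition, together with their practical results.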

9.

Improved Emotion Recognition With a Novel Speaker-Independent Feature

Eun Ho Kim, Kyung Hak Hyun, Soo Hyun Kim, Yoon Keun Kwak 《Mechatronics, IEEE/ASME Transactions on》2009,14(3):317-325

Emotion recognition is one of the latest challenges in human-robot interaction. This paper describes the realization of emotional interaction for a Thinking Robot, focusing on speech emotion recognition. In general, speaker-independent systems show a lower accuracy rate compared with speaker-dependent systems, as emotional feature values depend on the speaker and their gender. However, speaker-independent systems are required for commercial applications. In this paper, a novel speaker-independent feature, the ratio of a spectral flatness measure to a spectral center (RSS), with a small variation in speakers when constructing a speaker-independent system is proposed. Gender and emotion are hierarchically classified by using the proposed feature (RSS), pitch, energy, and the mel frequency cepstral coefficients. An average recognition rate of 57.2% (±5.7%) at a 90% confidence interval is achieved with the proposed system in the speaker-independent mode.
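
The RSS feature is described only as the ratio of a spectral flatness measure to a spectral center. The sketch below is one plausible per-frame reading of that description, not the authors' definition: it assumes flatness is the geometric-to-arithmetic mean ratio of the power spectrum and that the spectral center is the power-weighted mean frequency. The function name, the Hann window, and the small floor added to the power spectrum are illustrative assumptions.

```python
import numpy as np

def rss_feature(frame, sample_rate):
    """Ratio of spectral flatness to spectral center for one frame (sketch)."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    # Power spectrum with a small floor so the logarithm stays finite.
    power = np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Spectral flatness: geometric mean divided by arithmetic mean.
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    # Spectral center: power-weighted mean frequency in Hz.
    center = np.sum(freqs * power) / np.sum(power)
    return flatness / center
```

Under these assumptions a noise-like frame yields a larger value than a strongly tonal frame, since its spectrum is flatter.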

10.

Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Shaofei Xue, Hui Jiang, Lirong Dai, Qingfeng Liu 《Journal of Signal Processing Systems》2016,82(2):175-185

Recently several speaker adaptation methods have been proposed for deep neural network (DNN) in many large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few methods rely on tuning the connection weights in trained DNNs directly to optimize system performance since it is very prone to over-fitting especially when some class labels are missing in the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD on the weight matrices in trained DNNs and then tune rectangular diagonal matrices with the adaptation data. This alleviates the over-fitting problem via updating the weight matrices slightly by only modifying the singular values. We evaluate the proposed adaptation method in two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition in the Switchboard task. Experimental results have shown that it is effective to adapt large DNN models using only a small amount of adaptation data. For example, recognition results in the Switchboard task have shown that the proposed SVD-based adaptation method may achieve up to 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
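
The idea of adapting only the singular values can be sketched on a single linear layer. This is a simplified illustration, not the authors' training recipe: it assumes a reduced SVD (so the diagonal is stored as a vector), a squared-error loss, and plain gradient descent, whereas the paper adapts full DNNs. All function names and the `lr` and `steps` defaults are hypothetical.

```python
import numpy as np

def svd_factorize(weight):
    """Factor a trained weight matrix as U * diag(s) * Vt (reduced SVD)."""
    return np.linalg.svd(weight, full_matrices=False)

def reconstruct(u, s, vt):
    """Rebuild the weight matrix from fixed U, Vt and singular values s."""
    return (u * s) @ vt

def adapt_singular_values(u, s, vt, inputs, targets, lr=0.1, steps=50):
    """Gradient descent on s only, for a linear layer y = W x.

    inputs: (N, n_in), targets: (N, n_out). U and Vt stay frozen, so only
    len(s) parameters are tuned instead of the whole weight matrix.
    """
    s = s.copy()
    for _ in range(steps):
        hidden = inputs @ vt.T              # projections onto right vectors
        outputs = (hidden * s) @ u.T        # layer output with current s
        err = outputs - targets
        # d(0.5 * mean squared error) / d s_k
        grad = np.mean((err @ u) * hidden, axis=0)
        s -= lr * grad
    return s
```

For a 6x4 weight matrix this tunes 4 values rather than 24, which is the sense in which the update stays small when adaptation data is scarce.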

11.

Dimensional speech emotion recognition considering the relative order of emotion degree

韩文静, 李海峰, 马琳 《信号处理》2011,27(11):1658-1663

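Dimensional speech emotion recognition (Dim-SER) is an emerging branch of affective computing. It views emotion from a multidimensional, continuous perspective and models SER as a regression task that predicts continuous values. Current Dim-SER systems do not consider the relative order of emotion degree among utterances when predicting emotion, which seriously impairs a human-computer interaction system's grasp of the trend of a speaker's emotional changes. Starting from this need and taking human emotion cognition as a reference, this paper builds a Dim-SER system that is sensitive to the relative order of emotion degree and introduces the Gamma statistic to improve the evaluation criteria for SER systems. In building the system, a Top-rank probability distribution is constructed to describe the emotional order among utterances, the Kullback-Leibler distance is used to measure the order-consistency loss caused by prediction, and an order-sensitive neural network algorithm is proposed to minimize the system's prediction loss. Emotion prediction experiments show that, compared with the commonly used k-nearest-neighbor and support vector regression algorithms, the system effectively improves the correctness of the relative order of emotion degree among utterances.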
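
An order-consistency loss of this kind can be sketched as follows. The abstract does not define its Top-rank distribution, so the sketch assumes the softmax-style "top-one" probability familiar from listwise ranking, and measures the mismatch between labelled and predicted emotion degrees with the Kullback-Leibler divergence. Function names are illustrative.

```python
import math

def top_one_probabilities(scores):
    """Probability of each utterance being ranked first (softmax of scores)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def order_loss(label_degrees, predicted_degrees):
    """Order-consistency loss between labelled and predicted degrees."""
    return kl_divergence(
        top_one_probabilities(label_degrees),
        top_one_probabilities(predicted_degrees),
    )
```

A prediction that preserves the labelled ordering, such as `order_loss([0.9, 0.5, 0.1], [0.8, 0.4, 0.2])`, gives a smaller loss than one that reverses it, such as `order_loss([0.9, 0.5, 0.1], [0.2, 0.4, 0.8])`.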

12.

Research progress in speech recognition and understanding (total citations: 1; self-citations: 0; citations by others: 1)

江铭虎, 袁保宗 《电路与系统学报》1999,4(2):53-59

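This paper surveys current trends and the latest progress in speech recognition and understanding. It points out that the United States leads in speaker-independent, large-vocabulary continuous speech recognition with hidden Markov models, while Japan leads in large-vocabulary continuous speech recognition with neural networks and in speech post-processing that simulates artificial intelligence. It also introduces the best speech recognition and understanding systems internationally.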

13.

A polynomial-fitting speech trajectory model for Mandarin continuous speech recognition

欧智坚, 王作英 《电子学报》2003,31(4):608-611

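Although the HMM is currently the most popular speech recognition model, its assumption that state outputs are independent and identically distributed neglects the dynamic characteristics of the speech trajectory. Building on a more flexible statistical framework for describing speech, the generalized DDBHMM, this paper proposes a concrete polynomial-fitting speech trajectory model together with new training and recognition algorithms that better capture real speech characteristics. It also gives an effective pruning algorithm that yields a practical model. Experiments on large-vocabulary, speaker-independent Mandarin continuous speech recognition show that the pruned polynomial-fitting trajectory model clearly improves recognition performance at a small computational cost.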
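
The core idea of a polynomial trajectory, in which the frames of a segment follow a smooth curve over time rather than being drawn independently around one fixed mean, can be sketched with a least-squares fit. This shows only the curve-fitting step under assumed conventions (time normalized to [0, 1] within the segment, one polynomial per feature dimension); it is not the paper's DDBHMM training or pruning algorithm, and the function name is hypothetical.

```python
import numpy as np

def fit_segment_trajectory(frames, degree=2):
    """Fit one polynomial per feature dimension across a segment's frames.

    frames: array of shape (T, D) with T > degree.
    Returns the coefficients (degree + 1, D), the fitted trajectory (T, D),
    and the per-dimension mean squared residual (D,).
    """
    frames = np.asarray(frames, dtype=float)
    # Normalized time, so segments of different length share one time axis.
    t = np.linspace(0.0, 1.0, len(frames))
    coeffs = np.polyfit(t, frames, degree)       # highest power first
    fitted = np.vander(t, degree + 1) @ coeffs   # evaluate the fitted curves
    residual = np.mean((frames - fitted) ** 2, axis=0)
    return coeffs, fitted, residual
```

With `degree=0` the fit reduces to a constant mean per segment, the stationary case that the trajectory model is meant to relax.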

14.

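A parametric model for introducing inter-frame correlation into HMMs for speech recognition (total citations: 4; self-citations: 1; citations by others: 3)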

杨浩荣, 王作英, 陆大 《电子学报》1998,26(10):50-54,8

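Although the hidden Markov model (HMM) is currently the most popular speech recognition model, it generally assumes that state outputs are independent and therefore has the inherent defect of being unable to describe temporal correlation in speech. The new model proposed here parametrically models the static and dynamic characteristics of the state-output feature vector sequence separately and then combines them, thereby introducing inter-frame correlation into the duration-distribution-based HMM (DDBHMM). An HMM with inter-frame correlation introduced in this way can describe real speech phenomena more accurately. After presenting the framework of the new model, the paper...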

15.

Joint filtering of noisy speech in the fractional Fourier transform domain

包永强, 赵力, 邹采荣 《信号处理》2006,22(6):899-902

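Noise is a major factor affecting speech recognition and speaker recognition performance. Most commonly used noise reduction methods target stationary noise, and few address non-stationary noise, even though noise in real environments is usually non-stationary. This paper transforms noisy speech into the fractional Fourier domain and proposes a joint filtering method that combines linear optimal filtering and median filtering in that domain. Experiments show that, for speech with non-stationary noise, the method denoises clearly better than Wiener filtering, effectively reduces the influence of non-stationary noise, and improves speech recognition and speaker recognition performance in non-stationary noise environments.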

16.

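Design and implementation of an elderly-care companion robot system (total citations: 1; self-citations: 0; citations by others: 1)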

侯锐, 曹宏, 刘加 《电声技术》2011,35(5):41-44

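An elderly-care companion robot system is designed that expresses itself through guided voice interaction and body movements. It uses user-friendly voice interaction together with tactile sensing and interaction technology to provide personalized services. Using speaker-independent speech recognition and limited hardware resources, a robust recognition model is built to realize a speech recognition system-on-chip that delivers high performance and meets real-time response requirements. Building on models of the motion postures of real animals, the paper proposes, using an underactuated flexible structure, a...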

17.

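A phoneme-based algorithm for speaker-independent English command word recognition (total citations: 2; self-citations: 0; citations by others: 2)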

贲俊, 余小清, 万旺根 《信号处理》2002,18(6):535-538

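This paper proposes a new phoneme-based algorithm for speaker-independent English command word recognition and builds a speaker-independent English command word recognition system on it. In keeping with the characteristics of speaker-independent recognition systems, the implementation combines two tools, HTK and Visual C++, which improves development efficiency. In the recognition stage, confidence scoring is combined with incomplete matching, which improves recognition quality to some extent. A recognition rate of 87.8% is achieved with a vocabulary of more than 10 words.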

18.

On the use of different speech representations for speaker modeling

Ke Chen 《IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews》2005,35(3):301-314

Numerous speech representations have been reported to be useful in speaker recognition. However, there is much less agreement on which speech representation provides a perfect representation of speaker-specific information conveyed in a speech signal. Unlike previous work, we propose an alternative approach to speaker modeling by the simultaneous use of different speech representations in an optimal way. Inspired by our previous empirical studies, we present a soft competition scheme on different speech representations to exploit different speech representations in encoding speaker-specific information. On the basis of this soft competition scheme, we present a parametric statistical model, generalized Gaussian mixture model (GGMM), to characterize a speaker identity based on different speech representations. Moreover, we develop an expectation-maximization algorithm for parameter estimation in the GGMM. The proposed speaker modeling approach has been applied to text-independent speaker recognition and comparative results on the KING speech corpus demonstrate its effectiveness.

19.

A study of the tonal features of whispered Chinese vowels

潘欣裕, 赵鹤鸣 《信号处理》2011,27(10):1525-1530

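The absence of quasi-periodic vocal fold vibration makes whispered Chinese speech a special mode of phonation and means that whispered tone cannot be characterized by the pitch period. The conventional speech features currently used for speech recognition and voiceprint recognition carry little tonal information, so they rarely perform well in tone recognition experiments. This paper proposes a new feature parameter that mimics the fundamental-frequency tone contour of normal speech. Starting from human auditory characteristics, it studies the tone-sensitive Bark bands and finds that the fitted curve of the normalized energy ratio of the partially spread Bark spectrum exhibits a contour resembling the fundamental-frequency contour of normal speech, which suggests that the contour carries, to some degree, the tonal information of whispered speech. In whispered tone recognition experiments using this contour and the short-time energy curve as features and a neural network as the model, a relatively high accuracy is obtained: overall recognition accuracy for the four Mandarin tones reaches 78%, which also provides strong support for further processing of whispered speech.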

20.

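Real-time implementation of a speech recognition system for parcel verification (total citations: 6; self-citations: 0; citations by others: 6)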

单翼翔, 张昊天, 李虎生, 钟林, 张进, 刘加, 刘润生 《电子学报》2002,30(4):544-547

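This paper develops a speech recognition system for verifying parcel information. The system uses medium-to-large-vocabulary, speaker-independent continuous speech recognition to verify parcel information by voice in real time. It can recognize Mandarin or Sichuan dialect speech, with a vocabulary of about 4,500 entries. It also employs rejection and speaker adaptation techniques, which improve overall robustness. Experiments show a top-1 recognition rate of 98.7% and a top-3 rate of 99.9% for Mandarin, a top-1 rate of 95.9% and a top-3 rate of 98.6% for Sichuan dialect, a correct rejection rate of 85% for irrelevant speech, and an improvement of 5 to 8 percentage points after adaptation for speakers with heavy accents.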
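
Top-1 and top-3 recognition rates of the kind reported above are usually computed from each utterance's ranked hypothesis list. The sketch below shows that calculation with a hypothetical function name and input layout; it is not taken from the paper.

```python
def top_k_recognition_rate(ranked_hypotheses, references, k):
    """Fraction of utterances whose reference is among the first k hypotheses.

    ranked_hypotheses: one list per utterance, best hypothesis first
    references:        the correct entry for each utterance
    """
    hits = sum(
        1
        for hyps, ref in zip(ranked_hypotheses, references)
        if ref in hyps[:k]
    )
    return hits / len(references)
```

Calling it with `k=1` gives the first-choice rate and with `k=3` the top-three rate, so the second can never be lower than the first.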