Similar literature
20 similar documents found (search time: 15 ms)
1.
Results from a series of experiments that use neural networks to process the visual speech signals of a male talker are presented. In these preliminary experiments, the results are limited to static images of vowels. It is demonstrated that these networks are able to extract speech information from the visual images and that this information can be used to improve automatic vowel recognition. The structure of speech and its corresponding acoustic and visual signals are reviewed. The specific data used in the experiments, along with the network architectures and algorithms, are described. The results of integrating the visual and auditory signals for vowel recognition in the presence of acoustic noise are presented.

2.
There has been progress in improving speech recognition using a tightly-coupled modality such as lip movement, and in using additional input interfaces to improve recognition of commands in multimodal human-computer interfaces such as speech and pen-based systems. However, there has been little work that attempts to improve the recognition of spontaneous, conversational speech by adding information from a loosely-coupled modality. The study investigated this idea by integrating information from gaze into an automatic speech recognition (ASR) system. A probabilistic framework for multimodal recognition was formalised and applied to the specific case of integrating gaze and speech. Gaze-contingent ASR systems were developed from a baseline ASR system by redistributing language model probability mass according to the visual attention. These systems were tested on a corpus of matched eye movement and related spontaneous conversational British English speech segments (n = 1355) for a visual-based, goal-driven task. The best performing systems had similar word error rates to the baseline ASR system and showed an increase in keyword spotting accuracy. The core values of this work may be useful for developing robust speech-centric multimodal decoding system functions.

3.
The paper reviews the work done in speech recognition and understanding, mostly in the years 1976–1977. The attention is focussed on problems of system organization, use and representation of syntax and semantics, control strategies, lexical classification, extraction and emission of hypotheses about acoustic, phonetic and phonemic features.

4.
5.
Distributed acoustic sensing technology was used for real-time speech reproduction and recognition, in which the voiceprint can be extracted by the Mel frequency cepstral coefficient (MFCC) method. A classic ancient Chinese poem “You Zi Yin”, also called “A Traveler’s Song”, was analyzed in both the time and frequency domains, and its real-time reproduction was achieved with a 116.91 ms time delay. The smaller scaled MFCC0 at 1/12 of the MFCC matrix was taken as a feature vector of each ...

6.
In automatic speech recognition, the acoustic signal is the only tangible connection between the talker and the machine. While the signal conveys linguistic information, this information is often encoded in such a complex manner that the signal exhibits a great deal of variability. In addition, variations in environment and speaker can introduce further distortions that are linguistically irrelevant. This paper has three aims: 1) to discuss the nature of variabilities; 2) to describe the kinds of speech knowledge that may help us understand variabilities; and 3) to advocate and suggest specific procedures for the increased utilization of speech knowledge in automatic speech recognition.

7.
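Automatic recognition methods for communication signals   Cited: 7 (self-citations: 0; by others: 7)
Automatic recognition of communication signals is an important research topic in communication signal processing. In recent years, as digital signal processing technology has developed, the number of modulation schemes used for communication signals has grown, placing higher demands on automatic recognition. Many new methods have been applied in this field. This paper surveys recent research in the area, discusses the open problems, and points out directions for future development.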

8.
9.
The effect of phone-level speaking rate on speech recogniser performance is investigated. It is shown that deviation from the mean phone-level rate is correlated with the phone recognition error, and that the interval length for rate calculation is important, with optimal values of approximately four and 15 phones. A rate compensation algorithm is proposed.

10.
11.
Recent advances in the automatic recognition of audiovisual speech   Cited: 11 (self-citations: 0; by others: 11)
Visual speech information from the speaker's mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on a cascade of linear image transforms of an appropriate video region of interest, and subsequently, audiovisual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audiovisual speech asynchrony, and incorporating modality reliability estimates into the bimodal recognition process. We also briefly touch upon the issue of audiovisual adaptation. We apply our algorithms to three multisubject bimodal databases, ranging from small- to large-vocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves ASR over all conditions and data considered, though less so for visually challenging environments and large vocabulary tasks.

12.
This paper describes the implementation of a Speech Understanding System component which tracks the formants of pseudo-syllabic nuclei containing voiced consonants. The nuclei are isolated from continuous speech after a precategorical classification in which feature extraction is carried out by modules organized in a hierarchy of levels. FFT and LPC spectra are the input to the formant tracking system. It works under the control of rules specifying the possible formant evolutions given previously hypothesized phonetic features and produces fuzzy graphs rather than usual formant patterns because formants are not always evident in the spectrogram pattern.

13.
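Traditional human-computer interaction based on the mouse and keyboard can no longer meet users' needs, and automatic speech recognition is an important direction for addressing this problem. This paper describes a dual-threshold endpoint detection method based on short-time energy and zero-crossing rate, the principles of HMM-based automatic speech recognition, and concrete engineering applications: the Intelligent Media Asset Retrieval System V1.0 and the Meteorological Archive Management System V1.0. The system resolves problems found in traditional media asset management systems and improves the management efficiency, retrieval rate, and utilization of media asset files.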
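
The dual-threshold idea named in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure, so everything below is an assumption: the function names `frame_features` and `detect_endpoints`, the threshold parameters, and the rule of seeding a segment on a high energy threshold and then extending it while either a low energy threshold or a zero-crossing-rate threshold is exceeded.

```python
def frame_features(samples, frame_len, hop):
    """Return per-frame short-time energy and zero-crossing rate."""
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Short-time energy: sum of squared samples in the frame.
        energies.append(sum(x * x for x in frame))
        # Zero-crossing rate: fraction of adjacent pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    return energies, zcrs


def detect_endpoints(energies, zcrs, high, low, zcr_thresh):
    """Return (start_frame, end_frame) of the first segment, or None.

    Hypothetical rule: a frame above `high` seeds the segment; the
    segment grows outward while neighbouring frames exceed `low` in
    energy or `zcr_thresh` in zero-crossing rate.
    """
    seed = next((i for i, e in enumerate(energies) if e > high), None)
    if seed is None:
        return None
    start = seed
    while start > 0 and (
        energies[start - 1] > low or zcrs[start - 1] > zcr_thresh
    ):
        start -= 1
    end = seed
    while end + 1 < len(energies) and (
        energies[end + 1] > low or zcrs[end + 1] > zcr_thresh
    ):
        end += 1
    return start, end
```

The high threshold keeps noise from triggering a segment, while the low energy threshold and the zero-crossing check let the boundaries reach weak or unvoiced frames at the edges of the utterance.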

14.
15.
The detection volume of the surface electromyographic (EMG) signal was explored using a finite-element model, to examine the feasibility of obtaining independent myoelectric control signals from regions of reinnervated muscle. The selectivity of the surface EMG signal was observed to decrease with increasing subcutaneous fat thickness. The results confirm that reducing the interelectrode distance or using double-differential electrodes can increase surface EMG selectivity in an inhomogeneous volume conductor. More focal control signals can be obtained, at the expense of increased variability, by using the mean square value, rather than the root mean square or average rectified value.
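
For reference, the three amplitude estimators compared in this abstract can be written out as a short sketch. The function names are illustrative and the input is assumed to be a plain list of EMG samples; this is not code from the paper.

```python
import math


def mean_square(samples):
    """Mean of the squared samples."""
    return sum(v * v for v in samples) / len(samples)


def root_mean_square(samples):
    """Square root of the mean square value."""
    return math.sqrt(mean_square(samples))


def average_rectified_value(samples):
    """Mean of the absolute sample values."""
    return sum(abs(v) for v in samples) / len(samples)
```

Squaring weights large-amplitude samples more heavily than rectification does, so the mean square value responds more strongly to the strongest contributions in the signal.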

16.
17.
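Information Technology (《信息技术》), 2019, (6): 115-120
This paper uses the Eesen framework for acoustic modeling to simplify existing automatic speech recognition (ASR) pipelines, training a single recurrent neural network (RNN) to predict context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, the connectionist temporal classification (CTC) objective function is adopted to infer the alignment between speech and label sequences. In addition, a generalized decoding method based on weighted finite-state transducers (WFSTs) efficiently incorporates lexicons and language models into CTC decoding. Experimental results show that, compared with hybrid HMM/DNN models, the proposed method achieves a lower word error rate (WER) while significantly speeding up decoding.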
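
To make the CTC output convention concrete, here is a minimal sketch of best-path (greedy) CTC decoding. It is a deliberately simpler stand-in for the WFST-based decoding described in the abstract and uses no lexicon or language model; the function name, the label list, and the choice of index 0 for the blank symbol are assumptions for illustration.

```python
def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Collapse the per-frame best path into a label sequence.

    frame_scores: one list of scores per frame, indexed by label id.
    labels: label strings indexed by id; labels[blank] is the blank.
    """
    # Pick the highest-scoring label id in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    decoded = []
    previous = None
    for idx in best_path:
        # Merge repeated ids, then drop blanks.
        if idx != previous and idx != blank:
            decoded.append(labels[idx])
        previous = idx
    return decoded
```

A blank between two identical labels keeps them separate, which is how a CTC path can represent a genuinely doubled symbol.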

18.
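Error-rate evaluation plays a very important role in building a speech recognition system. The traditional word error rate algorithm is based solely on the minimum error rate, which is a significant shortcoming, so it cannot evaluate the system's error rate accurately. An improved word error rate evaluation algorithm based on both the minimum error rate and time information is proposed; it evaluates the system's error rate accurately and provides guidance for optimizing the acoustic model. Applications of the evaluation algorithm in building a speech recognition system are also presented.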
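
As background, the conventional word error rate that this abstract takes as its baseline can be sketched as a word-level edit distance divided by the reference length. This is the standard metric only; the time-aware improvement the paper proposes is not described in enough detail here to reproduce. The function name and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Minimum substitutions, deletions, and insertions per reference word."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because the alignment minimizes the edit count without looking at timestamps, a hypothesis word can be matched to a reference word spoken at a different time, which is the kind of weakness a time-aware variant would target.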

19.
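Modern Electronics Technique (《现代电子技术》), 2018, (10): 179-182
Traditional English pronunciation recognition systems cannot give learners timely feedback on, or correction of, their mispronunciations, which can mislead learners and slow the improvement of their English. A new automatic recognition system for English mispronunciations is therefore designed. It consists of a speech recording module, a speech playback module, an English pronunciation scoring module, and a pronunciation formant image display module. The pronunciation scoring workflow of the scoring module is given, enabling effective scoring of English pronunciation and storage of the scores. Through the formant display module, the system clearly shows how the learner's pronunciation differs from the standard pronunciation and corrects the mispronunciation. An English phoneme error-detection routine uses independent thresholds to improve mispronunciation detection: each phoneme is assessed against its own threshold, making the automatic recognition of mispronounced speech more rigorous and accurate. Experimental results show that the designed system has a high capability for automatically recognizing mispronounced speech.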

20.
In this paper, we introduce wavelet packets as an alternative method for spectral analysis of surface myoelectric (ME) signals. Both computer-synthesized and real ME signals are used to investigate the performance. Our simulation results show that the wavelet packet estimate has slightly lower mean square error (MSE) than the Fourier method, and both methods perform similarly on the real data. Moreover, wavelet packets offer some advantages over the traditional methods, such as multiresolution of frequency, as well as potential use for time-frequency decomposition of nonstationary signals such as ME signals during dynamic contractions. We also introduce a wavelet shrinkage method for improving spectral estimates by significantly reducing the MSEs for both the Fourier and wavelet packet methods.
