期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Using geometric spectral subtraction approach for feature extraction for DSR front-end Arabic system

Zied Sakka Elhem Techini MedSalim Bouhlel 《International Journal of Speech Technology》2017,20(3):645-650

Noise robustness and Arabic language are still considered as the main challenges for speech recognition over mobile environments. This paper contributed to these trends by proposing a new robust Distributed Speech Recognition (DSR) system for Arabic language. A speech enhancement algorithm was applied to the noisy speech as a robust front-end pre-processing stage to improve the recognition performance. While an isolated Arabic word engine was designed, and developed using HMM Model to perform the recognition process at the back-end. To test the engine, several conditions including clean, noisy and enhanced noisy speech were investigated together with speaker dependent and speaker independent tasks. With the experiments carried out on noisy database, multi-condition training outperforms the clean training mode in all noise types in terms of recognition rate. The results also indicate that using the enhancement method increases the DSR accuracy of our system under severe noisy conditions especially at low SNR down to 10 dB. 相似文献

2.

Robust Arabic speech recognition in noisy environments using prosodic features and formant

Anissa Imen Amrous Mohamed Debyeche Abderrahman Amrouche 《International Journal of Speech Technology》2011,14(4):351-359

This paper investigates the contribution of formants and prosodic features such as pitch and energy in Arabic speech recognition under real-life conditions. Our speech recognition system based on Hidden Markov Models (HMMs) is implemented using the HTK Toolkit. The front-end of the system combines features based on conventional Mel-Frequency Cepstral Coefficient (MFFC), prosodic information and formants. The experiments are performed on the ARADIGIT corpus which is a database of Arabic spoken words. The obtained results show that the resulting multivariate feature vectors, in noisy environment, lead to a significant improvement, up to 27%, in word accuracy relative the word accuracy obtained from the state-of-the-art MFCC-based system. 相似文献

3.

An experimental framework for Arabic digits speech recognition in noisy environments

Azzedine Touazi Mohamed Debyeche 《International Journal of Speech Technology》2017,20(2):205-224

相似文献

4.

Robust Audio-Visual Speech Recognition Based on Late Integration

Jong-Seok Lee Cheol Hoon Park 《Multimedia, IEEE Transactions on》2008,10(5):767-779

Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention because of its robustness in noisy environments. In this paper, we present a late integration scheme-based AVSR system whose robustness under various noise conditions is improved by enhancing the performance of the three parts composing the system. First, we improve the performance of the visual subsystem by using the stochastic optimization method for the hidden Markov models as the speech recognizer. Second, we propose a new method of considering dynamic characteristics of speech for improved robustness of the acoustic subsystem. Third, the acoustic and the visual subsystems are effectively integrated to produce final robust recognition results by using neural networks. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system improves robustness over the conventional system under various noise conditions without a priori knowledge about the noise contained in the speech. 相似文献

5.

Arabic speech recognition using SPHINX engine

Hussein Hyassat Raed Abu Zitar 《International Journal of Speech Technology》2006,9(3-4):133-150

相似文献

6.

A Robust Speech Recognition System for Communication Robots in Noisy Environments

Ishi C.T. Matsuda S. Kanda T. Jitsuhiro T. Ishiguro H. Nakamura S. Hagita N. 《Robotics, IEEE Transactions on》2008,24(3):759-763

The application range of communication robots could be widely expanded by the use of automatic speech recognition (ASR) systems with improved robustness for noise and for speakers of different ages. In past researches, several modules have been proposed and evaluated for improving the robustness of ASR systems in noisy environments. However, this performance might be degraded when applied to robots, due to problems caused by distant speech and the robot's own noise. In this paper, we implemented the individual modules in a humanoid robot, and evaluated the ASR performance in a real-world noisy environment for adults' and children's speech. The performance of each module was verified by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70-dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy. 相似文献

7.

多噪声环境下的层级语音识别模型

曹晶晶许洁萍邵聖淇《计算机应用》2018,38(6):1790-1794

针对多噪声环境下的语音识别问题,提出了将环境噪声作为语音识别上下文考虑的层级语音识别模型。该模型由含噪语音分类模型和特定噪声环境下的声学模型两层组成,通过含噪语音分类模型降低训练数据与测试数据的差异,消除了特征空间研究对噪声稳定性的限制,并且克服了传统多类型训练在某些噪声环境下识别准确率低的弊端,又通过深度神经网络（DNN）进行声学模型建模,进一步增强声学模型分辨噪声的能力,从而提高模型空间语音识别的噪声鲁棒性。实验中将所提模型与多类型训练得到的基准模型进行对比,结果显示所提层级语音识别模型较该基准模型的词错率（WER）相对降低了20.3%,表明该层级语音识别模型有利于增强语音识别的噪声鲁棒性。相似文献

8.

Hierarchical Singleton-Type Recurrent Neural Fuzzy Networks for Noisy Speech Recognition

Juang C.-F. Chiou C.-T. Lai C.-L. 《Neural Networks, IEEE Transactions on》2007,18(3):833-843

This paper proposes noisy speech recognition using hierarchical singleton-type recurrent neural fuzzy networks (HSRNFNs). The proposed HSRNFN is a hierarchical connection of two singleton-type recurrent neural fuzzy networks (SRNFNs), where one is used for noise filtering and the other for recognition. The SRNFN is constructed by recurrent fuzzy if-then rules with fuzzy singletons in the consequences, and their recurrent properties make them suitable for processing speech patterns with temporal characteristics. In n words recognition, n SRNFNs are created for modeling n words, where each SRNFN receives the current frame feature and predicts the next one of its modeling word. The prediction error of each SRNFN is used as recognition criterion. In filtering, one SRNFN is created, and each SRNFN recognizer is connected to the same SRNFN filter, which filters noisy speech patterns in the feature domain before feeding them to the SRNFN recognizer. Experiments with Mandarin word recognition under different types of noise are performed. Other recognizers, including multilayer perceptron (MLP), time-delay neural networks (TDNNs), and hidden Markov models (HMMs), are also tested and compared. These experiments and comparisons demonstrate good results with HSRNFN for noisy speech recognition tasks 相似文献

9.

抗噪声语音识别及语音增强算法的应用 总被引：1，自引：0，他引：1

汤玲戴斌《计算机仿真》2006,23(9):80-82,143

提高语音识别系统的鲁棒性是语音识别技术一个重要的研究课题。语音识别系统往往由于训练环境下的数据和识别环境下的数据不匹配造成系统的识别性能下降,为了让语音识别系统在含噪的环境下获得令人满意的工作性能,该文根据人耳听觉特性提出了一种鲁棒语音特征提取方法。在MFCC特征提取之前先对含噪语音特征进行掩蔽特性处理,同时结合语音增强方法对特征进行处理,最后得到鲁棒语音特征。通过4种不同试验结果分析表明,将这种方法用于抗噪声分析可以提高系统的抗噪声能力;同时这种特征的处理方法对不同噪声在不同信噪比有很好的适应性。相似文献

10.

Auditory driven subband speech enhancement for automatic recognition of noisy speech

Navneet Upadhyay Hamurabi Gamboa Rosales 《International Journal of Speech Technology》2016,19(4):869-880

Speech recognizers achieve high recognition accuracy under quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is degraded by additive ambient noise. This paper advocates a two phase approach for robust speech recognition in such environment. Firstly, a front end subband speech enhancement with adaptive noise estimation (ANE) approach is used to filter the noisy speech. The whole noisy speech spectrum is portioned into eighteen dissimilar subbands based on Bark scale and noise power from each subband is estimated by the ANE approach, which does not require the speech pause detection. Secondly, the filtered speech spectrum is processed by the non parametric frequency domain algorithm based on human perception along with the back end building a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the use of a front end speech enhancement stage. Recognition accuracy is evaluated at the word level, and at a wide range of signal to noise ratios for real world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when signal to noise ratio is lower than 5 dB. 相似文献

11.

Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice

《IEEE transactions on audio, speech, and language processing》2006,14(6):2006-2013

It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments. This means that the recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Previously, estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have been proposed. However, their estimation accuracy has not been verified for the case when a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We, therefore, evaluated the effectiveness of these distortion measures by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that in each case the distortion measure correlates well with the word accuracy when the estimators used are optimized for each individual noise reduction algorithm. In addition, it was confirmed that when a single estimator, optimized for all the noise reduction algorithms, is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we have proposed the use of artificial voice of several seconds duration instead of a large amount of real speech and confirmed that a relatively accurate estimate can be obtained by using the artificial voice. 相似文献

12.

Comparing ANN to HMM in implementing limited Arabic vocabulary ASR systems

Yousef Ajami Alotaibi 《International Journal of Speech Technology》2012,15(1):25-32

In this paper we investigated Artificial Neural Networks (ANN) based Automatic Speech Recognition (ASR) by using limited Arabic vocabulary corpora. These limited Arabic vocabulary subsets are digits and vowels carried by specific carrier words. In addition to this, Hidden Markov Model (HMM) based ASR systems are designed and compared to two ANN based systems, namely Multilayer Perceptron (MLP) and recurrent architectures, by using the same corpora. All systems are isolated word speech recognizers. The ANN based recognition system achieved 99.5% correct digit recognition. On the other hand, the HMM based recognition system achieved 98.1% correct digit recognition. With vowels carrier words, the MLP and recurrent ANN based recognition systems achieved 92.13% and 98.06, respectively, correct vowel recognition; but the HMM based recognition system achieved 91.6% correct vowel recognition. 相似文献

13.

基于稀疏编码的鲁棒说话人识别

何勇军孙广路付茂国韩纪庆《数据采集与处理》2014,29(2):198-203

目前的说话人识别系统在噪声环境下性能将急剧下降,为了解决这一问题,提出一种新的基于稀疏编码的说话人识别方法。该方法用一个通用背景字典(Universal Background Dictionary,UBD)刻画说话人语音的共性,并为每个说话人和环境噪声训练相应的字典来刻画说话人和环境的特殊变化。这些字典拼接成一个大字典,然后将待测试语音稀疏分解在这个大字典上以实现说话人识别。为了提高说话人字典的区分能力,通过从说话人字典中移除与通用背景字典原子相似的原子来优化说话人字典。为了跟踪变化的噪声,采用带噪声的语音在线更新噪声字典。在各种噪声条件下的实验表明,所提出的方法在噪声环境下具有较强的鲁棒性。相似文献

14.

A stereophonic acoustic signal extraction scheme for noisy and reverberant environments

Klaus Reindl Yuanhang Zheng Andreas Schwarz Stefan Meier Roland Maas Armin Sehr Walter Kellermann 《Computer Speech and Language》2013,27(3):726-745

In this contribution, a novel two-channel acoustic front-end for robust automatic speech recognition in adverse acoustic environments with nonstationary interference and reverberation is proposed. From a MISO system perspective, a statistically optimum source signal extraction scheme based on the multichannel Wiener filter (MWF) is discussed for application in noisy and underdetermined scenarios. For free-field and diffuse noise conditions, this optimum scheme reduces to a Delay & Sum beamformer followed by a single-channel Wiener postfilter. Scenarios with multiple simultaneously interfering sources and background noise are usually modeled by a diffuse noise field. However, in reality, the free-field assumption is very weak because of the reverberant nature of acoustic environments. Therefore, we propose to estimate this simplified MWF solution in each frequency bin separately to cope with reverberation. We show that this approach can very efficiently be realized by the combination of a blocking matrix based on semi-blind source separation (‘directional BSS’), which provides a continuously updated reference of all undesired noise and interference components separated from the desired source and its reflections, and a single-channel Wiener postfilter. Moreover, it is shown, how the obtained reference signal of all undesired components can efficiently be used to realize the Wiener postfilter, and at the same time, generalizes well-known postfilter realizations. The proposed front-end and its integration into an automatic speech recognition (ASR) system are analyzed and evaluated in noisy living-room-like environments according to the PASCAL CHiME challenge. A comparison to a simplified front-end based on a free-field assumption shows that the introduced system substantially improves the speech quality and the recognition performance under the considered adverse conditions. 相似文献

15.

基于EMG信号的无声语音识别应用及实现

许佳佳姚晓东《计算机与数字工程》2006,34(5):133-136

提出了基于肌电信号（EMG）的无声语音识别系统。由于该系统是通过EMG信号而非声音信号进行识别,因此可应用于高噪声环境和帮助失去发音能力的人实现无声交流,有着良好的应用前景。关于该系统的实现,提出了以下方法：实验时使用0—9十个中文数字,由受试者不发声地重复说出,从三块面部肌肉采集EMG信号;对EMG信号进行小波变换,获取变换系数矩阵后提取其能量值,构造特征矢量送入BP神经网络分类器分类。实验表明,基于小波变换的特征提取方法是一种有效的方法．适用于类似EMC信号的非平稳生理信号。相似文献

16.

Toward enhanced Arabic speech recognition using part of speech tagging

Dia?AbuZeina Email author Wasfi?Al-Khatib Moustafa?Elshafei Husni?Al-Muhtaseb 《International Journal of Speech Technology》2011,14(4):419-426

相似文献

17.

Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique

《Digital Signal Processing》2014

In this paper we introduce a robust feature extractor, dubbed as robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric and level-dependent compressive gammachirp filterbank and a sigmoid shape weighting rule for the enhancement of speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. As a post processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which, by adjusting the scale and mean of cepstral features, reduces the difference of cepstra between the training and test environments. For performance evaluation, in the context of speech recognition, of the proposed feature extractor we use the standard noisy AURORA-2 connected digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions and additive noise as well as different microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), conventional MFCC and PLP features are used for comparison purposes. Experimental speech recognition results demonstrate that the proposed method is robust against both additive and reverberant environments. The proposed method provides comparable results to that of the ETSI-AFE and PNCC on the AURORA-2 as well as AURORA-4 corpora and provides considerable improvements with respect to the other feature extractors on the AURORA-5 corpus. 相似文献

18.

Multiple cameras for audio-visual speech recognition in an automotive environment

Rajitha Navarathna David Dean Sridha Sridharan Patrick Lucey 《Computer Speech and Language》2013,27(4):911-927

Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contribution between the side and central orientated cameras in improving visual speech recognition accuracy. Finally combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database. 相似文献

19.

基于顺序统计滤波的实时语音端点检测算法 总被引：1，自引：0，他引：1

郭丽惠何昕张亚昕吕岳《自动化学报》2008,34(4):419-425

针对嵌入式语音识别系统,提出了一种高效的实时语音端点检测算法. 算法以子带频谱熵为语音/噪声的区分特征, 首先将每帧语音的频谱划分成若干个子带, 计算出每个子带的频谱熵, 然后把相继若干帧的子带频谱熵经过一组顺序统计滤波器获得每帧的频谱熵, 根据频谱熵的值对输入的语音进行分类. 实验结果表明, 该算法能够有效地区分语音和噪声, 可以显著地提高语音识别系统的性能. 在不同的噪声环境和信噪比条件下具有鲁棒性. 此外, 本文提出的算法计算代价小, 简单易实现, 适合实时嵌入式语音识别系统的应用. 相似文献

20.

含噪语音实时迭代维纳滤波 总被引：1，自引：1，他引：0

下载免费PDF全文

王景芳《计算机工程与应用》2011,47(19):132-135

针对传统去噪方法在强背景噪声情况下,提取声音信号的能力变弱甚至失效与对不同噪声环境适应性差,提出了迭代维纳滤波声音信号特征提取方法。给出了语音噪声频谱与功率谱信噪比迭代更新机制与具体实施方案。实验仿真表明,该算法能有效地去噪滤波,显著地提高语音识别系统性能,且在不同的噪声环境和信噪比条件下具有鲁棒性。该算法计算代价小,简单易实现,适用于嵌入式语音识别系统。相似文献