Similar literature
 20 similar documents found
1.
Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance in cochlear implant users who speak tonal languages, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variations of four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
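
The abstract above does not say which F0 extraction algorithm was used. As a rough illustration only, the following minimal sketch estimates F0 from the strongest autocorrelation peak of a single frame, which is one common textbook approach. The function name `estimate_f0`, the search range of 70 to 400 Hz, and the synthetic 200 Hz test frame are all assumptions made for this example, not details from the paper.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=70.0, f0_max=400.0):
    """Estimate F0 (Hz) from the strongest autocorrelation peak of one frame.

    Illustrative only: the search range and the method itself are assumptions.
    """
    lag_min = int(sample_rate / f0_max)  # shortest pitch period considered
    lag_max = int(sample_rate / f0_min)  # longest pitch period considered
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0


# Hypothetical test signal: a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(frame, sample_rate)
```

Tracking such an estimate frame by frame would give a pitch contour of the kind used to characterize the four tones, although real speech would need voicing detection and smoothing that this sketch omits.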

2.
Speaker normalization for Chinese vowel recognition in cochlear implants   Total citations: 1, self-citations: 0, citations by others: 1
Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
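
The filter-bank adjustment described above can be pictured with a small sketch. It assumes, as one plausible reading of the abstract, that both edges of the analysis range are multiplied by the ratio of the speaker's mean F3 to the reference speaker's mean F3. The function name and all numeric values are hypothetical.

```python
def normalized_filter_range(low_hz, high_hz, speaker_f3_hz, reference_f3_hz):
    """Scale the analysis filter bank range by the speaker/reference F3 ratio.

    Assumption: the ratio is applied multiplicatively to both band edges.
    """
    ratio = speaker_f3_hz / reference_f3_hz
    return low_hz * ratio, high_hz * ratio


# Hypothetical values: reference range 100-6000 Hz, speaker mean F3 of 3000 Hz,
# reference speaker mean F3 of 2500 Hz (ratio 1.2).
low, high = normalized_filter_range(100.0, 6000.0, 3000.0, 2500.0)
```

With these made-up numbers the range widens from 100-6000 Hz to roughly 120-7200 Hz, so a speaker with higher formants is analyzed over a proportionally higher range.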

3.
Cochlear implants currently fail to convey phase information, which is important for perceiving music, tonal languages, and for hearing in noisy environments. We propose a bio-inspired asynchronous interleaved sampling (AIS) algorithm that encodes both envelope and phase information, in a manner that may be suitable for delivery to cochlear implant users. Like standard continuous interleaved sampling (CIS) strategies, AIS naturally meets the interleaved-firing requirement, which is to stimulate only one electrode at a time, minimizing electrode interactions. The majority of interspike intervals are distributed over 1-4 ms, thus staying within the absolute refractory limit of neurons, and form a more natural, pseudostochastic pattern of firing due to complex channel interactions. Stronger channels are selected to fire more often but the strategy ensures that weaker channels are selected to fire in proportion to their signal strength as well. The resulting stimulation rates are considerably lower than those of most modern implants, saving power yet delivering higher potential performance. Correlations with original sounds were found to be significantly higher in AIS reconstructions than in signal reconstructions using only envelope information. Two perceptual tests on normal-hearing listeners verified that the reconstructed signals enabled better melody and speech recognition in noise than those processed using tone-excited envelope-vocoder simulations of cochlear implant processing. Thus, our strategy could potentially save power and improve hearing performance in cochlear implant users.

4.
The performance of cochlear implants deteriorates in noisy environments compared to quiet conditions. This paper presents an adaptive cochlear implant system, which is capable of classifying the background noise environment in real time for the purpose of adjusting or tuning its noise suppression algorithm to that environment. The tuning is done automatically with no user intervention. Five objective quality measures are used to show the superiority of this adaptive system compared to a conventional fixed noise-suppression system. Steps taken to achieve the real-time implementation of the entire system, incorporating both the cochlear implant speech processing and the background noise suppression, on a portable PDA research platform are presented along with the timing results.

5.
Cochlear implants (CIs) restore partial hearing to people with severe to profound sensorineural deafness, but there is still a marked performance gap in speech recognition between those who have received a cochlear implant and people with normal hearing. One of the factors that may lead to this performance gap is the inadequate signal processing method used in CIs. This paper investigates the application of an improved signal-processing method called bionic wavelet transform (BWT). This method is based upon the auditory model and allows for signal processing. Comparing the neural network simulations on the same experimental materials processed by wavelet transform (WT) and BWT, the application of BWT to speech signal processing in CI has a number of advantages, including: improvement in recognition rates for both consonants and vowels, reduction of the number of required channels, reduction of the average stimulation duration for words, and high noise tolerance. Consonant recognition results in 15 normal hearing subjects show that the BWT produces significantly better performance than the WT (t = -4.36276, p = 0.00065). The BWT has great potential to reduce the performance gap between CI listeners and people with normal hearing in the future.

6.
This paper presents a new application of the dynamic iterated rippled noise (IRN) algorithm by generating dynamic pitch contours representative of those that occur in natural speech in the context of EEG and the frequency following response (FFR). Besides IRN steady state and linear rising stimuli, curvilinear rising stimuli were modeled after pitch contours of natural productions of Mandarin Tone 2. Electrophysiological data on pitch representation at the level of the brainstem, as reflected in FFR, were evaluated for all stimuli, static or dynamic. Autocorrelation peaks were observed corresponding to the fundamental period (tau) as well as spectral bands at the fundamental and its harmonics for both a low and a high iteration step. At the higher iteration step, both spectral and temporal FFR representations were more robust, indicating that both acoustic properties may be utilized for pitch extraction at the level of the brainstem. By applying curvilinear IRN stimuli to elicit FFRs, we can evaluate the effects of temporal degradation on 1) the neural representation of linguistically-relevant pitch features in a target population (e.g., cochlear implant) and 2) the efficacy of signal processing schemes in conventional hearing aids and cochlear implants to recover these features.
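
To make the link between iteration steps and autocorrelation peaks concrete, here is a minimal sketch of static (fixed-delay) iterated rippled noise built by repeated delay-and-add. It is not the dynamic IRN algorithm of the paper, which varies the delay over time. The function names, the 40-sample delay, the iteration counts of 2 and 8, and the unit gain are assumptions chosen for illustration.

```python
import random


def iterated_rippled_noise(noise, delay_samples, iterations, gain=1.0):
    """Build static IRN by repeatedly adding a delayed copy of the signal."""
    signal = list(noise)
    for _ in range(iterations):
        # Delay the current signal by padding zeros at the start.
        delayed = [0.0] * delay_samples + signal[:-delay_samples]
        signal = [s + gain * d for s, d in zip(signal, delayed)]
    return signal


def normalized_autocorrelation(signal, lag):
    """Autocorrelation at one lag, divided by the signal energy."""
    energy = sum(s * s for s in signal)
    return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag)) / energy


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
delay = 40  # at a hypothetical 8 kHz sample rate this is a 200 Hz pitch
low = iterated_rippled_noise(noise, delay, iterations=2)
high = iterated_rippled_noise(noise, delay, iterations=8)
peak_low = normalized_autocorrelation(low, delay)
peak_high = normalized_autocorrelation(high, delay)
```

In this toy setting the autocorrelation peak at the delay grows with the number of iterations, which mirrors the stronger pitch salience that a higher iteration step gives the stimulus.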

7.
Speech enhancement algorithms play an important role in speech signal processing. Over the past several decades, many algorithms have been studied for speech enhancement. A speech enhancement algorithm uses a noise removal method and a statistical model filter to analyze the speech signal in the frequency domain. Spectral subtraction and Wiener filters have been used as representative algorithms. These algorithms have excellent speech enhancement performance, but their performance deteriorates for specific noise types or in low signal-to-noise ratio (SNR) environments. In addition, when the noise estimate is erroneous, noise remains in the speech signal, so that the spectrum corresponding to the speech signal is distorted or a frame corresponding to the speech signal cannot be retrieved, and speech recognition performance deteriorates. The deterioration in speech recognition performance arises from the mismatch between the recognition conditions and the training model. We use a silence-feature normalization model as a methodology to improve the recognition rate under this mismatch in noisy environments. Conventional silence-feature normalization has a problem in that the energy of the silent part increases, which affects recognition performance because the boundaries that separate speech become unclear. In this study, we use the cepstrum feature of the noise signals in the silence-feature normalization model to improve the performance of silence-feature normalization in a signal with a low SNR by setting a reference value for voiced and unvoiced classification. Recognition experiments confirm that the recognition rate improves compared with other methods.

8.
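To address the obvious residual "musical noise" left in speech enhanced by the traditional spectral subtraction algorithm, a multi-band spectral subtraction algorithm is adopted as an improvement. The improved algorithm divides the noisy speech signal by frequency into non-overlapping bands and, based on the signal-to-noise ratio between the noisy speech and the noise within each band, uses an adaptive algorithm to obtain the over-subtraction factor for that band. Simulation results show that the improved multi-band spectral subtraction algorithm enhances speech better than traditional spectral subtraction.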
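
The abstract above does not give the adaptive rule that maps a band's SNR to its over-subtraction factor. The sketch below uses a piecewise-linear rule that often appears in the spectral subtraction literature (factor 4.75 below -5 dB, 1 above 20 dB, linear in between) purely as a stand-in. The function names, the spectral floor of 0.002, and the example power values are assumptions.

```python
import math


def band_snr_db(noisy_power, noise_power):
    """SNR of one band in dB, from per-bin noisy-speech and noise power."""
    return 10.0 * math.log10(sum(noisy_power) / sum(noise_power))


def over_subtraction_factor(snr_db):
    """Assumed piecewise-linear rule: subtract more noise when the SNR is low."""
    if snr_db < -5.0:
        return 4.75
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 3.0 * snr_db / 20.0


def subtract_band(noisy_power, noise_power, floor=0.002):
    """Spectral subtraction in one band with a band-specific factor."""
    alpha = over_subtraction_factor(band_snr_db(noisy_power, noise_power))
    # The spectral floor keeps bins from going negative or to exactly zero.
    return [max(p - alpha * n, floor * p) for p, n in zip(noisy_power, noise_power)]


# Hypothetical band with two bins at 10 dB SNR.
cleaned = subtract_band([10.0, 10.0], [1.0, 1.0])
```

Applying `subtract_band` independently to each non-overlapping band is what distinguishes the multi-band scheme from a single global factor: bands dominated by noise are subtracted more aggressively than bands dominated by speech.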

9.
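Simulation of noise amplitude-modulation jamming based on Simulink   Total citations: 2, self-citations: 0, citations by others: 2
The design and performance analysis of jamming signals is a difficult point in radar jamming systems. Noise amplitude-modulated (AM) signals are commonly used in radar jamming systems. Taking noise AM jamming as an example, the principle of noise AM jamming is analyzed, a simple noise AM signal model is established, and noise AM jamming is modeled and simulated in Simulink. The jamming effect of the noise AM signal on the radar system is obtained for three cases: frequency aligned, frequency aiming error equal to half the intermediate-frequency (IF) amplifier bandwidth, and frequency aiming error greater than half the IF amplifier bandwidth. The results show that the noise AM signal jams the radar signal most effectively under spot-jamming conditions, indicating that noise AM signals are suited to spot jamming.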
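
The paper builds its model in Simulink and the abstract does not give the signal equations. As a language-neutral illustration, the sketch below generates a textbook-style noise AM signal of the form (U0 + Un(t)) cos(2π f t + φ), where Un(t) is zero-mean noise and φ is a random phase. The function name, parameter names, and default values are assumptions for this example.

```python
import math
import random


def noise_am_jamming(num_samples, sample_rate, carrier_hz,
                     carrier_amp=1.0, noise_std=0.3, seed=0):
    """Generate a noise amplitude-modulated carrier (assumed textbook model)."""
    rng = random.Random(seed)
    phase = rng.uniform(0.0, 2.0 * math.pi)  # random initial carrier phase
    samples = []
    for n in range(num_samples):
        # Clip the modulating noise so the envelope never goes negative.
        noise = max(rng.gauss(0.0, noise_std), -carrier_amp)
        envelope = carrier_amp + noise
        angle = 2.0 * math.pi * carrier_hz * n / sample_rate + phase
        samples.append(envelope * math.cos(angle))
    return samples


# Hypothetical parameters: 1 kHz carrier sampled at 16 kHz.
jamming = noise_am_jamming(1600, 16000, 1000.0)
```

Shifting `carrier_hz` away from the radar's centre frequency by a chosen offset would reproduce, in spirit, the three frequency aiming cases compared in the abstract.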

10.
A discriminative temporal feature processing method for robust speech recognition is presented by combining knowledge-based and statistical methods. The cepstral features are first filtered by a RASTA method based on human hearing perception and then processed using the minimum classification error algorithm. Improved recognition performance can be achieved in both quiet and noisy environments.

11.
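Application of an auditory periphery computational model to underwater target classification and recognition   Total citations: 3, self-citations: 0, citations by others: 3
 Theory and modeling of the auditory periphery have advanced considerably and are widely applied in speech signal processing. This paper integrates the Gammatone auditory filter and the Meddis inner hair cell model to simulate cochlear processing, and modifies the parameters of the Meddis model according to the characteristics of radiated noise from underwater targets. A feature extraction method for underwater targets based on the Gammatone-Meddis auditory periphery computational model is proposed, yielding a 23-dimensional feature vector. Analysis of a large amount of data measured at sea shows that the feature has the following advantages: (1) good classification performance, with a recognition rate above 94% on the test set; (2) strong robustness to convolutional noise, with no drop in recognition ability when convolutional noise is added to the original signal. Finally, experiments show that both the nonlinear frequency selection of the basilar membrane and the inner hair cells suppress noise well.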

12.
We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web (WWW). We compare a server-only processing model where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly. We find that the required bit rate to achieve the recognition performance of high-quality unquantized speech is just 2000 bits per second.

13.
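To address high computational complexity and a bit error rate (BER) that depends on the accuracy of clipping-noise estimation, a low-complexity hybrid asymmetrically clipped optical orthogonal frequency division multiplexing (HACO-OFDM) system and a receiver demodulation method based on time-domain processing are proposed. The composition of the HACO-OFDM system and the structure of its time-domain signal are described in detail, and clipping noise is removed through simple time-domain signal processing. Experimental results show that the scheme significantly reduces computational complexity; when the clipping-noise estimation error is large, the BER performance of the pulse-amplitude-modulated discrete multitone (PAM-DMT) branch of the receiver is significantly better than that of the conventional receiver.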

14.
A combined feature extraction and recognition method is proposed based on higher-order spectrum, cyclic spectrum and time-frequency characteristics. In the application of this method, α-dimensional features, quadratic spectral characteristics and Fourier transform spectral characteristics of the signal are used to extract three characteristic values: the envelope mean (EM) of the α plane, the recursive normalized frequency component detection value (RNFCDV) and the quadratic spectrum normalized frequency component detection value (QSNFCDV). These features have the merits of few identification parameters, insensitivity to noise, low computation, a high recognition rate, and multi-species identification. Simulation results show that the recognition rate is more than 98% when the signal-to-noise ratio (SNR) is not less than 6 dB, and that the method performs better than common recognition algorithms. Eight types of signal, namely amplitude modulation (AM), phase modulation (PM), amplitude shift keying (ASK), frequency shift keying (FSK), phase shift keying (PSK), minimum shift keying (MSK), quadrature amplitude modulation (QAM) and direct sequence spread spectrum (DSSS), were used to validate the feasibility of the method.

15.
A new modification of the spectral subtraction algorithm is presented which enables operating entirely in the time domain and is thus suitable for realization in analog integrated circuits. The noise spectrum is obtained during speechless intervals and stored for spectral subtraction when speech is present in the signal. The frequency range of interest of the speech signal is divided into narrow frequency bands by means of a bank of band-pass filters. For each frequency band the noise model is realized as an auxiliary signal multiplied by a particular weight. A subsystem is presented that produces an output signal whose power is equal to the difference between the input signal power and the noise model power for each frequency channel, thereby realizing the spectral subtraction. Circuits to achieve the described operation are outlined. Finally, simulation results of the noise removal algorithm are shown in the form of a spectrogram and the results showing improvement in automatic speech recognition are given.
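
The per-channel power relation described above can be written out numerically. The sketch below is a digital stand-in, not the analog circuit of the paper: it assumes that each band-pass channel is scaled by a gain chosen so that its output power equals the input power minus the weighted noise-model power, clamped at zero. The function names and sample values are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of one band-pass channel signal."""
    return sum(s * s for s in samples) / len(samples)


def channel_gain(input_power, noise_power, weight=1.0):
    """Gain that makes output power equal input power minus weighted noise power."""
    residual = max(input_power - weight * noise_power, 0.0)  # clamp at zero
    return math.sqrt(residual / input_power) if input_power > 0.0 else 0.0


# Hypothetical channel: input power 4.0, stored noise-model power 1.0.
channel = [2.0, -2.0, 2.0, -2.0]
gain = channel_gain(mean_power(channel), 1.0)
output = [gain * s for s in channel]
```

Here the channel power drops from 4.0 to 3.0, the difference between the input power and the noise-model power, which is the relation the abstract attributes to its subtraction subsystem.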

16.
In this paper, we present a speech recognition system using a throat microphone. The use of this kind of microphone minimizes the impact of environmental noise. Due to the absence of high frequencies and the partial loss of formant frequencies, previous systems using throat microphones have shown a lower recognition rate than systems which use standard microphones. To develop a high performance automatic speech recognition (ASR) system using only a throat microphone, we propose two methods. First, based on Korean phonological feature theory and a detailed throat signal analysis, we show that it is possible to develop an ASR system using only a throat microphone, and propose conditions of the feature extraction algorithm. Second, we optimize the zero-crossing with peak amplitude (ZCPA) algorithm to guarantee the high performance of the ASR system using only a throat microphone. For ZCPA optimization, we propose an intensification of the formant frequencies and a selection of cochlear filters. Experimental results show that this system yields a performance improvement of about 4% and a reduction in time complexity of 25% when compared to the performance of a standard ZCPA algorithm on throat microphone signals.

17.
Because there are many parameters in the cochlear implant (CI) device that can be optimized for individual patients, it is important to estimate a parameter's effect before patient evaluation. In this paper, Mel-frequency cepstrum coefficients (MFCCs) were used to estimate the acoustic vowel space for vowel stimuli processed by the CI simulations. The acoustic space was then compared to vowel recognition performance by normal-hearing subjects listening to the same processed speech. Five CI speech processor parameters were simulated to produce different degrees of spectral resolution, spectral smearing, spectral warping, spectral shifting, and amplitude distortion. The acoustic vowel space was highly correlated with normal-hearing subjects' vowel recognition performance for parameters that affected the spectral channels and spectral smearing. However, the acoustic vowel space was not significantly correlated with perceptual performance for parameters that affected the degree of spectral warping, spectral shifting, and amplitude distortion. In particular, while spectral warping and shifting did not significantly reshape the acoustic space, vowel recognition performance was significantly affected by these parameters. The results from the acoustic analysis suggest that the CI device can preserve phonetic distinctions under conditions of spectral warping and shifting. Auditory training may help CI patients better perceive these speech cues transmitted by their speech processors.

18.
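Design of a MATLAB-based speech enhancement system   Total citations: 1, self-citations: 0, citations by others: 1
Feng Yan (冯岩), Tang Puying (唐普英). Communications Technology (《通信技术》), 2010, 43(5): 187-188, 191
Speech enhancement is an important part of the signal processing field. In many speech processing applications, such as mobile communications, speech recognition, and hearing aids, speech signals have to be processed in noisy environments. Over the past several decades, many methods have been proposed to remove noise and reduce speech distortion, such as spectral subtraction, wavelet-based methods, hidden Markov model methods, and signal subspace methods. Because wavelet analysis can analyze a signal in the time and frequency domains simultaneously, it can denoise signals effectively. This paper introduces a design method for a speech enhancement system that denoises noisy speech by combining the least mean square (LMS) algorithm with the wavelet transform, and builds a model of the system in the MATLAB Simulink environment. Simulation of the model shows that the method denoises effectively, laying a theoretical foundation for hardware implementation of the system.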

19.
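Parallel sub-band HMM maximum a posteriori adaptive nonlinear class estimation algorithm   Total citations: 1, self-citations: 0, citations by others: 1
At present, automatic speech recognition (ASR) systems achieve high recognition rates in laboratory environments, but in real environments their recognition performance deteriorates sharply because of background noise and the transmission channel. Based on auditory experiments, this paper proposes a new independent sub-band parallel maximum a posteriori nonlinear class estimation algorithm to improve the robustness of recognition systems. The algorithm exploits the differences between the power spectra of various noises and of the recognition content, as well as the differing effects of noise on the HMM in different frequency bands, and uses a multilayer perceptron (MLP) to nonlinearly map the maximum a posteriori probability in noisy environments, reducing the loss of recognition performance caused by environmental mismatch. Experiments show that the algorithm clearly outperforms the maximum a posteriori linear regression algorithm and the sub-band speech recognition algorithm proposed by Sangita.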

20.
This paper presents a noise-suppressing speech enhancement method that combines the basic spectral subtraction technique with spectral processing in the frequency domain to provide better noise suppression as well as better enhancement in the speech regions. In contrast to several previous approaches, we do not try to achieve a complete removal of the noise; instead, our goal is to preserve a pre-defined amount of the original noise in the processed signal. This is accomplished by exploiting the masking properties of the human auditory system. The proposed algorithm, named PM ("Proposed Method"), simulates properties of the human auditory system and applies them to the speech recognition system to enhance its robustness. The performance of the speech enhancement algorithm using the proposed masking model was compared with three other speech enhancement methods over four different noise types and five SNRs. The performance of the proposed approach is objectively and subjectively compared to the conventional approaches to highlight the aforementioned improvement. In this paper we also discuss the design and development of a digital signal processor (DSP) implementation to achieve real-time performance of our filter. The target processor is a Texas Instruments TMS320C6713 floating point DSP.
