期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A new optimum feature extraction and classification method for speaker recognition: GWPNN

《Expert systems with applications》2007,32(2):485-498

Speech and speaker recognition is an important topic to be performed by a computer system. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed for speaker recognition by using real speech/voice signal. This study contains both the combination of the new feature extraction and classification approach by using optimum wavelet packet entropy parameter values. These optimum wavelet packet entropy values are obtained from measured real English language speech/voice signal waveforms using speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed in this study. GWPNN includes three layers which are genetic algorithm, wavelet packet and multi-layer perception. The genetic algorithm layer of GWPNN is used for selecting the feature extraction method and obtaining the optimum wavelet entropy parameter values. In this study, one of the four different feature extraction methods is selected by using genetic algorithm. Alternative feature extraction methods are wavelet packet decomposition, wavelet packet decomposition – short-time Fourier transform, wavelet packet decomposition – Born–Jordan time–frequency representation, wavelet packet decomposition – Choi–Williams time–frequency representation. The wavelet packet layer is used for optimum feature extraction in the time–frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, which is a feed-forward neural network, is used for evaluating the fitness function of the genetic algorithm and for classification speakers. The performance of the developed system has been evaluated by using noisy English speech/voice signals. The test results showed that this system was effective in detecting real speech signals. The correct classification rate was about 85% for speaker classification. 相似文献

2.

The speaker identification by using genetic wavelet adaptive network based fuzzy inference system

E. Avci D. Avci 《Expert systems with applications》2009,36(6):9928-9940

In this paper, an intelligent speaker identification system is presented for speaker identification by using speech/voice signal. This study includes both combination of the adaptive feature extraction and classification by using optimum wavelet entropy parameter values. These optimum wavelet entropy values are obtained from measured Turkish speech/voice signal waveforms using speech experimental set. It is developed a genetic wavelet adaptive network based on fuzzy inference system (GWANFIS) model in this study. This model consists of three layers which are genetic algorithm, wavelet and adaptive network based on fuzzy inference system (ANFIS). The genetic algorithm layer is used for selecting of the feature extraction method and obtaining the optimum wavelet entropy parameter values. In this study, one of the eight different feature extraction methods is selected by using genetic algorithm. Alternative feature extraction methods are wavelet decomposition, wavelet decomposition – short time Fourier transform, wavelet decomposition – Born–Jordan time–frequency representation, wavelet decomposition – Choi–Williams time–frequency representation, wavelet decomposition – Margenau–Hill time–frequency representation, wavelet decomposition – Wigner–Ville time–frequency representation, wavelet decomposition – Page time–frequency representation, wavelet decomposition – Zhao–Atlas–Marks time–frequency representation. The wavelet layer is used for optimum feature extraction in the time–frequency domain and is composed of wavelet decomposition and wavelet entropies. The ANFIS approach is used for evaluating to fitness function of the genetic algorithm and for classification speakers. It has been evaluated the performance of the developed system by using noisy Turkish speech/voice signals. The test results showed that this system is effective in detecting real speech signals. The correct classification rate is about 91% for speaker classification. 相似文献

3.

Improvement to speech-music discrimination using sinusoidal model based features

Jalil Shirazi Shahrokh Ghaemmaghami 《Multimedia Tools and Applications》2010,50(2):415-435

This paper addresses a model-based audio content analysis for classification of speech-music mixed audio signals into speech and music. A set of new features is presented and evaluated based on sinusoidal modeling of audio signals. The new feature set, including variance of the birth frequencies and duration of the longest frequency track in sinusoidal model, as a measure of the harmony and signal continuity, is introduced and discussed in detail. These features are used and compared to typical features as inputs to an audio classifier. Performance of these sinusoidal model features is evaluated through classification of audio into speech and music using both the GMM (Gaussian Mixture Model) and the SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful in speech/music discrimination. By using only a set of two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% accuracy in the audio classification. Experimental comparisons also confirm superiority of the sinusoidal model features to the popular time domain and frequency domain features in audio classification. 相似文献

4.

基于小波包变换的自适应门限的语音激活检测

陈明义李微黎华《计算机仿真》2009,26(3)

语音激活检测是语音信号处理的一个重要环节.在低信噪比的情况下,传统的检测方法已不适用.为了提高语音激活检测的性能和鲁棒性,针对主要由白噪声组成的噪声背景,提出了一种基于小波包变换的自适应门限的语音激活检测方法(VAD),它将语音信号进行小波包变换,得到各个子带信号,符个子带信号通过Teager能量算子(TEO)将有声部分强化,同时衰减无声部分,最后进行自适应门限判决.实验结果表明在低信噪比的情况下,算法能够正确判别语音段和噪声段. 相似文献

5.

Genetic wavelet packets for speech recognition

Leandro D. Vignolo Diego H. Milone Hugo L. Rufiner 《Expert systems with applications》2013,40(6):2350-2359

The most widely used speech representation is based on the mel-frequency cepstral coefficients, which incorporates biologically inspired characteristics into artificial recognizers. However, the recognition performance with these features can still be enhanced, specially in adverse conditions. Recent advances have been made with the introduction of wavelet based representations for different kinds of signals, which have shown to improve the classification performance. However, the problem of finding an adequate wavelet based representation for a particular problem is still an important challenge. In this work we propose a genetic algorithm to evolve a speech representation, based on a non-orthogonal wavelet decomposition, for phoneme classification. The results, obtained for a set of spanish phonemes, show that the proposed genetic algorithm is able to find a representation that improves speech recognition results. Moreover, the optimized representation was evaluated in noise conditions. 相似文献

6.

Adaptive network-based fuzzy inference system vs. other classification algorithms for warped LPC-based speech/music discrimination

《Engineering Applications of Artificial Intelligence》2007,20(6):783-793

Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents an effective approach based on an adaptive network-based fuzzy inference system (ANFIS) for the classification stage required in a speech/music discrimination system. A new simple feature, called warped LPC-based spectral centroid (WLPC-SC), is also proposed. Comparison between WLPC-SC and the classical features proposed in the literature for audio classification is performed, aiming to assess the good discriminatory power of the proposed feature. The vector length used to describe the proposed psychoacoustic-based feature is reduced to a few statistical values (mean, variance and skewness). With the aim of increasing the classification accuracy percentage, the feature space is then transformed to a new feature space by LDA. The classification task is performed applying ANFIS to the features in the transformed space. To evaluate the performance of the ANFIS system for speech/music discrimination, comparison to other commonly used classifiers is reported. The classification results for different types of music and speech signals show the good discriminating power of the proposed approach. 相似文献

7.

Improved voice activity detection algorithm using wavelet and support vector machine

Shi-Huang Chen Rodrigo Capobianco Guido Trieu-Kien Truong Yaotsu Chang 《Computer Speech and Language》2010,24(3):531-543

This paper proposes an improved voice activity detection (VAD) algorithm using wavelet and support vector machine (SVM) for European Telecommunication Standards Institution (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet filter bank and the wavelet-based pitch/tone detection algorithm. The wavelet filter bank can divide input speech signal into several frequency bands so that the signal power level at each sub-band can be calculated. In addition, the background noise level can be estimated in each sub-band by using the wavelet de-noising method. The wavelet filter bank is also derived to detect correlated complex signals like music. Then the proposed algorithm can apply SVM to train an optimized non-linear VAD decision rule involving the sub-band power, noise level, pitch period, tone flag, and complex signals warning flag of input speech signals. By the use of the trained SVM, the proposed VAD algorithm can produce more accurate detection results. Various experimental results carried out from the Aurora speech database with different noise conditions show that the proposed algorithm gives considerable VAD performances superior to the AMR-NB VAD Options 1 and 2, and AMR-WB VAD. 相似文献

8.

基于小波变换和支持向量机的音频分类 总被引：2，自引：0，他引：2

下载免费PDF全文

郑继明俞佳《计算机工程与应用》2009,45(11):158-161

音频特征提取是音频分类的基础,而音频分类又是内容的音频检索的关键。综合分析了语音和音乐的区别性特征,提出一种基于小波变换和支持向量机的音频特征提取和分类的方法,用于纯语音、音乐、带背景音乐的语音以及环境音的分类,并且评估了新特征集合在SVM分类器上的分类效果。实验结果表明,提出的音频特征有效、合理,分类性能较好。相似文献

9.

基于前置滤波和小波变换的带噪语音基音周期检测方法 总被引：10，自引：0，他引：10

李辉戴蓓蒨陆伟《数据采集与处理》2005,20(1):100-104

根据语音信号的基音周期范围有限和在声门闭合时刻语音信号出现锐变的特点,提出一种基于前置滤波和小波变换的基音周期检测方法。带噪语音信号经过3阶椭圆低通滤波器滤波后,采用以二次样条小波作为小波函数,进行一级小波变换检测语音信号的锐变点,再计算基音周期。实验表明,本文提出的基音周期检测方法,与平均幅度差函数(AMDF)和自相关函数(ACF)方法相比,提高了提取基音周期的准确率;与多尺度小波变换的基音周期检测方法相比,减小了计算量,削弱了噪声信号和语音的共振峰对基音周期检测的影响。相似文献

10.

基于样本熵的语音/音乐识别

杨松于凤芹《计算机工程与应用》2012,48(23):125-127,154

传统的MFCC及短时能量特征只反映了信号序列的静态特征,目前基于这些特征的语音/音乐识别率为79%～86%。样本熵可以反映信号序列中的新信息量的大小以及新信息量的变化程度。以样本熵作为特征对语音/音乐进行分类识别,提取混合信号的样本熵,计算每段信号样本熵的均值和方差,采用k均值聚类进行识别。仿真实验结果表明,基于样本熵的语音/音乐识别的识别率可提高到88.073%。相似文献

11.

混响环境下的宽带波束形成语音增强方法

王冬霞郑家超范真维周城旭《计算机工程与应用》2012,48(34):136-139,151

结合小波滤波器组理论和自适应波束形成技术,提出了一种基于宽带波束形成的麦克风阵列语音增强方法。该方法利用小波分析滤波器组将含噪语音信号变换到小波域;进行小波域阵列自适应波束形成;通过小波综合滤波器组重构增强后的语音信号。计算机仿真实验验证了该方法的有效性。相似文献

12.

A wavelet- based transform method for quality improvement in noisy speech patterns of Arabic language

Sachin Singh A. M. Mutawa 《International Journal of Speech Technology》2016,19(4):677-685

This paper addresses the problem of single-channel speech enhancement of low (negative) SNR of Arabic noisy speech signals. For this aim, a binary mask thresholding function based coiflet5 mother wavelet transform is proposed for Arabic speech enhancement. The effectiveness of binary mask thresholding function based coiflet5 mother wavelet transform is compared with Wiener method, spectral subtraction, log-MMSE, test-PSC and p-mmse in presence of babble, pink, white, f-16 and Volvo car interior noise. The noisy input speech signals are processed at various levels of input SNR range from ?5 to ?25 dB. Performance of the proposed method is evaluated with the help of PESQ, SNR and cepstral distance measure. The results obtained by proposed binary mask thresholding function based coiflet5 wavelet transform method are very encouraging and shows that the proposed method is much helpful in Arabic speech enhancement than other existing methods. 相似文献

13.

A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments

Navneet Upadhyay Abhijit Karmakar 《International Journal of Speech Technology》2014,17(2):117-132

In this paper, we propose a speech enhancement method where the front-end decomposition of the input speech is performed by temporally processing using a filterbank. The proposed method incorporates a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) and an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree in such a manner that it matches closely the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied in each subband, separately for the estimation of speech. The I-SOS uses a continuous noise estimation approach and estimate noise power from each subband without the need of explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), perceptual evaluation of speech quality (PESQ) score), and spectrograms with informal listening tests, we show that the proposed speech enhancement method outperforms than the spectral subtractive-type algorithms and improves quality and intelligibility of the enhanced speech. 相似文献

14.

An experimental study of energy dips for speech and music

Shiro Okamura Kensei Aoyama 《Pattern recognition》1983,16(2):163-166

Speech signals generally have many intervals with low energy “energy dips”. For music signals, energy dips are not always remarkable. We studied stochastic features of energy dips for speech and music signals. A certain difference was found between the signals in the length of energy dips. Also, the number of energy dips in a time window and their distribution were investigated. From this distribution, a threshold number of energy dips was estimated which provided a scheme for the discrimination of speech from music. 相似文献

15.

基于小波变换和K-SVD的应急广播语音压缩方法

鄢化彪胡超黄绿娥《计算机应用研究》2022,39(11)

针对应急广播中语音传输效率低的问题,提出了一种基于小波变换和K-奇异值分解（K-SVD）的语音压缩方法,以提升应急广播的信息传输时效性。首先,该方法舍弃语音小波分解得到的高频分量,在小波合成时用随机信号代替;其次,在低频分量的压缩感知过程中,用K-SVD字典学习算法训练的过完备字典对其稀疏表示;最后,采用改进的基于子空间回溯的广义正交匹配追踪算法重构信号。实验结果表明,在压缩效率为50%时,该方法重构应急广播语音的客观语音质量评分（PESQ）达到3.717,比其他对照算法分别提升了3%~47%,说明在保证压缩效率的同时,所提出的方法能提升应急广播语音重构质量,确保应急广播的传输时效性。相似文献

16.

结合节拍语义和MFCC声学特征的音乐流派分类

庄严于凤芹《计算机工程与应用》2015,51(3):197-201

由于音乐节拍的强度、快慢、持续时间等是反映音乐不同流派风格的重要语义特征,而音乐节拍多属于由打击乐器所产生的低频部分,为此利用小波变换对音乐信号进行6层分解来提取低频节拍特征;针对节拍特征差异不明显的音乐流派,提出用描述频域能量包络的MFCC声学特征与节拍特征结合,并用基于音乐流派机理分析的8阶MFCC代替常用的12阶MFCC。对8类音乐流派实验仿真结果表明,基于语义特征和声学特征结合的方法,总体分类准确率可达68.37%,同时特征维数增加对分类时间影响很小。相似文献

17.

Study on processing of wavelet speech denoising in speech recognition system 总被引：1，自引：0，他引：1

Xinmei Zhong Yunzhong Dai Yong Dai Tao Jin 《International Journal of Speech Technology》2018,21(3):563-569

The development of society promotes the continuous progress of science and technology, and speech processing technology gradually occupies an increasingly important position in people’s life and work, which puts forward higher requirements on the speech processing technology, especially in noisy environment. Due to the complexity of the real environment, denoising processing has great practical significance. In order to improve the level of speech denoising and increase the accuracy of the speech recognition system, wavelet denoising technology was used to analyze the de-noising requirements and hard and soft threshold functions in the speech recognition system, and an improved wavelet threshold denoising algorithm was put forward. Firstly, the signals were processed by wavelet decomposition according to primary function; then denoising was performed using the improved function; finally the denoised signals were reconstructed using inverse operation. The denoising effect of the algorithm was verified. The results showed that it was effective in denoising conventional speech signals. Besides, it was applied to the speech recognition system to denoise the noisy speech collected in the real environment, and finally high system self-assessment parameters were obtained. Thus it is concluded that wavelet denoising is effective in the speech denoising of the speech recognition system and can be put into practice. 相似文献

18.

Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Mohamed Anouar Ben Messaoud Aïcha Bouzid 《International Journal of Speech Technology》2016,19(1):65-73

The pitch is a crucial parameter in speech and music signals. However, due to severe noisy conditions, missing harmonics, unsuitable physical vibration, the determination of pitch presents a great challenge when desiring to get a good accuracy. In this paper, we propose a method for pitch estimation of speech and music sounds. Our method is based on the fast Fourier transform (FFT) of the multi-scale product (MP) provided by a feature auditory model of the sound signals. The auditory model simulates the spectral behaviour of the cochlea by a gammachirp filter-bank, and the out/middle ear filtering by a low-pass filter. For the two output channels, the FFT function of the MP is computed over frames. The MP is based on constituting the product of the speech and music wavelet transform coefficients at three scales. The experimental results show that our method estimates the pitch with high accuracy. Besides, our proposed method outperforms several other pitch detection algorithms in clean and noisy environments. 相似文献

19.

Content-based audio classification and segmentation by using support vector machines 总被引：9，自引：0，他引：9

Lie Lu Hong-Jiang Zhang Stan Z. Li 《Multimedia Systems》2003,8(6):482-492

Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification which employs support vector machines (SVMs). Five audio classes are considered in this paper: silence, music, background sound, pure speech, and non- pure speech which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of SVM on different audio type-pairs classification with testing unit of different- length and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM). We also evaluated the effectiveness of some new proposed features. Experiments on a database composed of about 4- hour audio data show that the proposed classifier is very efficient on audio classification and segmentation. It also shows the accuracy of the SVM-based method is much better than the method based on KNN and GMM. 相似文献

20.

An expert system for speaker identification using adaptive wavelet sure entropy

Derya Avci 《Expert systems with applications》2009,36(3):6295-6300

In this study, an expert speaker identification system is presented for speaker identification using Turkish speech signals. Here, a discrete wavelet adaptive network based fuzzy inference system (DWANFIS) model is used for this aim. This model consists of two layers: discrete wavelet and adaptive network based fuzzy inference system. The discrete wavelet layer is used for adaptive feature extraction in the time–frequency domain and is composed of discrete wavelet decomposition and discrete wavelet entropy. The performance of the used system is evaluated by using repeated speech signals. These test results show the effectiveness of the developed intelligent system presented in this paper. The rate of correct classification is about 90.55% for the sample speakers. 相似文献