Similar literature
 20 similar articles found; search took 15 ms
1.
The speech signal is decomposed through adapted local trigonometric transforms. The decomposed signal is classified by M uniform sub-bands for each subinterval. The energy of each sub-band is used as a speech feature. This feature is applied to vector quantisation and the hidden Markov model. The new speech feature shows a slightly better recognition rate than the cepstrum for speaker independent speech recognition. The new speech feature also shows a lower standard deviation between speakers than does the cepstrum.
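The sub-band energy step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the transform coefficients of one subinterval are already available as a 1-D array, and the function name `subband_energies` is hypothetical. The adapted local trigonometric transform itself is not shown.

```python
import numpy as np

def subband_energies(coeffs, m):
    """Split one subinterval's transform coefficients into m (nearly)
    uniform sub-bands and return the energy of each sub-band.

    Assumption: `coeffs` comes from some earlier transform stage that
    is outside the scope of this sketch.
    """
    bands = np.array_split(np.asarray(coeffs, dtype=float), m)
    # Energy of a sub-band = sum of squared coefficients in that band.
    return np.array([np.sum(band ** 2) for band in bands])
```

The resulting length-M vector would then play the role of the feature vector passed on to vector quantisation.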

2.
An effective and robust speech feature extraction method is presented. Based on the time-frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristics of an individual speaker, the linear predictive cepstral coefficients of the approximation channel and entropy value of the detail channel for each decomposition process are calculated. In addition, an adaptive thresholding technique for each lower resolution is also applied to remove the influence of noise interference. Experimental results show that using this mechanism not only effectively reduces the influence of noise interference but also improves the recognition performance. Finally, the proposed method is evaluated on the MAT telephone speech database for text-independent speaker identification using the group vector quantisation identifier. Some popular existing methods are also evaluated for comparison, and the results show that the proposed feature extraction algorithm is more effective and robust than the other existing methods. In addition, the performance of the proposed method is very satisfactory even in a low SNR environment corrupted by Gaussian white noise.

3.
Bootstrap and aggregating VQ classifier for speaker recognition   (total citations: 1; self-citations: 0; citations by others: 1)
A bootstrap and aggregating (bagging) vector quantisation (VQ) classifier is proposed for speaker recognition. This method obtains multiple training data sets by resampling the original training data set, and then integrates the corresponding multiple classifiers into a single classifier. Experiments involving a closed-set, text-independent speaker identification system are carried out using the TIMIT database. The proposed bagging VQ classifier shows considerably improved performance over the conventional VQ classifier.
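A rough sketch of the bagging idea described above, under stated assumptions: plain k-means stands in for the codebook training algorithm, and the classifiers are aggregated by averaging their distortions, which is one possible integration rule and not necessarily the one used in the paper. All function names are hypothetical.

```python
import numpy as np

def train_codebook(frames, k, iters=10, seed=0):
    """Train a k-codeword codebook with plain k-means (assumed stand-in)."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()

def train_bagged(frames, k, n_bags=5, seed=0):
    """Train one codebook per bootstrap resample of the training frames."""
    rng = np.random.default_rng(seed)
    books = []
    for b in range(n_bags):
        idx = rng.integers(0, len(frames), size=len(frames))  # with replacement
        books.append(train_codebook(frames[idx], k, seed=seed + b))
    return books

def identify(test_frames, speaker_models):
    """Pick the speaker whose bagged codebooks give the lowest mean distortion."""
    scores = {
        name: np.mean([distortion(test_frames, cb) for cb in books])
        for name, books in speaker_models.items()
    }
    return min(scores, key=scores.get)
```

Here `speaker_models` maps a speaker name to the list returned by `train_bagged`; in a closed-set system every test utterance is assigned to one of the enrolled speakers.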

4.
Although the continuous hidden Markov model (CHMM) technique seems to be the most flexible and complete tool for speech modelling, it is not always used for the implementation of speech recognition systems because of several problems related to training and computational complexity. Thus, other simpler types of HMMs, such as discrete (DHMM) or semicontinuous (SCHMM) models, are commonly utilised with very acceptable results. Also, the superiority of continuous models over these types of HMMs is not clear. The authors' group has previously introduced the multiple vector quantisation (MVQ) technique, the main feature of which is the use of one separate VQ codebook for each recognition unit. The MVQ technique applied to DHMM models generates a new HMM modelling (basic MVQ models) that allows incorporation into the recognition dynamics of the input sequence information wasted by the discrete models in the VQ process. The authors propose a new variant of HMM models that arises from the idea of applying MVQ to SCHMM models. These are SCMVQ-HMM (semicontinuous multiple vector quantisation HMM) models that use one VQ codebook per recognition unit and several quantisation candidates for each input vector. It is shown that SCMVQ modelling is formally the closest one to CHMM, although requiring even less computation than SCHMMs. After studying several implementation issues of the MVQ technique, such as which type of probability density function should be used, the authors show the superiority of SCMVQ models over other types of HMM models such as DHMMs, SCHMMs or the basic MVQs.

5.
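Fuzzy kernel speaker recognition based on discriminative weighting   (total citations: 2; self-citations: 1; citations by others: 1)
Lin Lin (林琳), Wang Shuxun (王树勋), Chen Jian (陈建). Acta Electronica Sinica (《电子学报》), 2008, 36(7): 1446-1450
 For the case where little speech data is available for training and recognition, this paper proposes a new speaker recognition algorithm. Through kernel mapping, fuzzy vector quantisation is performed on the speaker's speech features in a high-dimensional feature space. To increase the discriminability between speakers, a weight assignment method for codeword vectors in the high-dimensional feature space is proposed: codeword vectors with stronger discriminative power are assigned larger weights, and the resulting weights are stored together with the speaker's codebook to form the speaker database. For recognition, a fuzzy kernel weighted nearest-neighbour classifier is proposed that matches speakers in the high-dimensional feature space. Experiments show that the algorithm achieves good recognition results when the training speech is shorter than 8 s and the recognition speech is 1 s.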

6.
The authors evaluate continuous density hidden Markov models (CDHMM), dynamic time warping (DTW) and distortion-based vector quantisation (VQ) for speaker recognition, emphasising the performance of each model structure across incremental amounts of training data. Text-independent (TI) experiments are performed with VQ and CDHMMs, and text-dependent (TD) experiments are performed with DTW, VQ and CDHMMs. For TI speaker recognition, VQ performs better than an equivalent CDHMM with one training version, but is outperformed by CDHMM when trained with ten training versions. For TD experiments, DTW outperforms VQ and CDHMMs for sparse amounts of training data, but with more data the performance of each model is indistinguishable. The performance of the TD procedures is consistently superior to TI, which is attributed to subdividing the speaker recognition problem into smaller speaker-word problems. It is also shown that there is a large variation in performance across the different digits, and it is concluded that digit zero is the best digit for speaker discrimination.

7.
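Based on the statistical distribution of cepstral coefficient vectors in the feature space, this paper proposes a new equal-variance weighted cepstral distortion measure. The weighting function of this measure captures the fine structure of the distribution of speech cepstral vectors in the feature space and thereby effectively discriminates between the characteristics of different speakers. Experiments show that, compared with the conventional Euclidean distance and the inverse-variance weighted distance, the proposed distortion measure significantly improves the correct recognition rate of speaker recognition based on vector quantisation.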
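For orientation, the two baseline measures that the abstract compares against (Euclidean distance and inverse-variance weighted distance) can be sketched as below. The abstract does not give the formula of the proposed equal-variance weighting, so it is deliberately not reproduced here. The function names are hypothetical, and `variances` is assumed to hold the per-coefficient variance estimated from training cepstra.

```python
import numpy as np

def squared_euclidean_distortion(c1, c2):
    """Unweighted squared Euclidean distance between two cepstral vectors."""
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    return float(np.sum((c1 - c2) ** 2))

def inverse_variance_distortion(c1, c2, variances):
    """Squared differences weighted by the reciprocal of each
    coefficient's variance (assumed to be estimated beforehand)."""
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    return float(np.sum((c1 - c2) ** 2 / np.asarray(variances, dtype=float)))
```

The weighting matters because higher-order cepstral coefficients typically have smaller variance, so an unweighted distance is dominated by the low-order terms.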

8.
An efficient compression technique employing adaptive vector quantisation of multiple non-orthogonal transform domain representations of still images is developed. For each sub-image, the encoder selects a code from the domain that yields the best representation. The performance improvement of the proposed technique relative to existing single domain vector quantisation coding methods, for the same compression ratio, is obtained at the expense of increased computational complexity.

9.
An optimised feature map finite-state vector quantisation (referred to as optimised FMFSVQ) is presented for image coding. Based on the block-based gradient descent search algorithm used for motion estimation in video coding, the optimised FMFSVQ system finds a neighbourhood-based optimal codevector for each input vector by extending the associated state codebook stage by stage, thus rendering each state quantiser a variable rate vector quantisation. The optimised FMFSVQ system can be interpreted as a cascade of a finite-state vector quantiser and classified vector quantisers. Furthermore, an adaptive optimised FMFSVQ is obtained. Experiments demonstrate the superior rate-distortion performance of the adaptive optimised FMFSVQ compared with the original adaptive FMFSVQ and the memoryless vector quantisation.

10.
Speaker normalization for Chinese vowel recognition in cochlear implants   (total citations: 1; self-citations: 0; citations by others: 1)
Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.

11.
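This paper proposes a speaker recognition method that combines fuzzy C-means clustering with vector quantisation. The algorithm takes the 12th-order LPC (linear predictive coding) cepstral coefficients extracted from the speech signal as the 12 attributes of each sample to be classified. Vector quantisation is first used to obtain, for each speaker, a codebook that characterises that speaker's feature parameters, and this codebook serves as the cluster centres for the fuzzy clustering algorithm. The feature vectors to be recognised are then clustered around these codebook centres to perform recognition. The algorithm uses few feature parameters and is computationally simple, yet achieves a higher recognition rate than vector quantisation alone.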

12.
Neural networks for vector quantization of speech and images   (total citations: 6; self-citations: 0; citations by others: 6)
The use of neural networks for vector quantization (VQ) is described. The authors show how a collection of neural units can be used efficiently for VQ encoding, with the units performing the bulk of the computation in parallel, and describe two unsupervised neural network learning algorithms for training the vector quantizer. A powerful feature of the new training algorithms is that the VQ codewords are determined in an adaptive manner, compared to the popular LBG training algorithm, which requires that all the training data be processed in a batch mode. The neural network approach allows for the possibility of training the vector quantizer online, thus adapting to the changing statistics of the input data. The authors compare the neural network VQ algorithms to the LBG algorithm for encoding a large database of speech signals and for encoding images.
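To illustrate what online, adaptive codeword training looks like in contrast to batch LBG, here is a minimal winner-take-all competitive learning step. It is a generic sketch and is not claimed to be either of the two algorithms in the paper; the function name and the learning rate `lr` are assumptions.

```python
import numpy as np

def online_vq_update(codebook, x, lr=0.05):
    """One online training step: find the nearest codeword (the 'winning'
    unit) and move it a fraction `lr` of the way towards the input x.

    `codebook` is a float array of shape (k, d) and is updated in place.
    Returns the index of the winning codeword, which is also the VQ code.
    """
    winner = int(((codebook - x) ** 2).sum(axis=1).argmin())
    codebook[winner] += lr * (x - codebook[winner])
    return winner
```

Because each input vector updates the codebook immediately, the quantizer can track slowly changing input statistics without reprocessing the whole training set.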

13.
A new video compression algorithm based on a temporal blocking structure, rather than the more conventional spatial blocking structure, is described. This blocking structure forms the basis of an adaptive vector quantisation (VQ) algorithm, the performance of which is then compared with a similar adaptive VQ scheme based on a spatial blocking structure.

14.
Improved robust VQ-based watermarking   (total citations: 2; self-citations: 0; citations by others: 2)
Charalampidis D. Electronics Letters, 2005, 41(23): 1272-1273
A robust watermarking method based on vector quantisation (VQ) is proposed as an improvement to existing VQ watermarking techniques. Experimental results illustrate that the proposed method exhibits superior performance compared to existing techniques for a variety of attacks.

15.
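This paper uses a hybrid feature set consisting of the LPC cepstral coefficients, delta cepstral coefficients, pitch period and delta pitch period of the speech signal as the features for identifying speakers, and applies VQ to implement automatic speaker recognition. Recognition experiments were carried out on a speech corpus of 10 speakers and 1800 Chinese digit and word utterances. The average recognition rate reached 92% for monosyllabic utterances, 96.67% for disyllabic utterances and 97.67% for four-syllable utterances.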

16.
The generalization of gain adaptation to vector quantization (VQ) is explored in this paper and a comprehensive examination of alternative techniques is presented. We introduce a class of adaptive vector quantizers that can dynamically adjust the "gain" or amplitude scale of code vectors according to the input signal level. The encoder uses a gain estimator to determine a suitable normalization of each input vector prior to VQ encoding. The normalized vectors have reduced dynamic range and can then be more efficiently coded. At the receiver, the VQ decoder output is multiplied by the estimated gain. Both forward and backward adaptation are considered and several different gain estimators are compared and evaluated. Gain-adaptive VQ can be used alone for "vector PCM" coding (i.e., direct waveform VQ) or as a building block in other vector coding schemes. The design algorithm for generating the appropriate gain-normalized VQ codebook is introduced. When applied to speech coding, gain-adaptive VQ achieves significant performance improvement over fixed VQ with a negligible increase in complexity.
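The encode/decode path described above (estimate gain, normalize, quantize, rescale at the receiver) can be sketched for the forward-adaptive case as follows. The RMS estimator is only one of several possible gain estimators, the gain is passed to the decoder unquantized for simplicity, and all function names are hypothetical.

```python
import numpy as np

def estimate_gain(x, eps=1e-12):
    """Assumed forward gain estimator: RMS value of the input vector.
    `eps` avoids division by zero for an all-zero input."""
    return float(np.sqrt(np.mean(x ** 2)) + eps)

def encode(x, codebook):
    """Normalize x by its estimated gain, then pick the nearest
    codeword in the gain-normalized codebook (shape (k, d))."""
    gain = estimate_gain(x)
    normalized = x / gain
    index = int(((codebook - normalized) ** 2).sum(axis=1).argmin())
    return index, gain

def decode(index, gain, codebook):
    """Receiver side: scale the selected codeword by the gain."""
    return gain * codebook[index]
```

In a backward-adaptive variant the gain would instead be estimated from previously decoded output, so no side information would need to be transmitted.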

17.
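Zhao Li (赵力), Zou Cairong (邹采荣), Wu Zhenyang (吴镇扬). Acta Electronica Sinica (《电子学报》), 2002, 30(7): 967-969
This paper proposes a new speech recognition method that combines the advantages of VQ, HMM and an unsupervised speaker adaptation algorithm. An FVQ/HMM is built by replacing the output probability of the conventional HMM in each state with the vector quantisation error, and an unsupervised adaptation algorithm based on fuzzy vector quantisation is used to modify the codewords of each FVQ/HMM state, thereby adapting the codebook to an unknown speaker. In speaker-independent Chinese digit (isolated and connected digit) recognition experiments, the new combined method is compared with a CHMM-based adaptation and recognition method. The results show that its adaptation and recognition performance is better than that of the CHMM-based method.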

18.
Koh J.-S., Kim J.-K. Electronics Letters, 1988, 24(17): 1082-1083
A simple algorithm is described for reducing the coding complexity of vector quantisation (VQ) by exploiting the feature of the vector currently being coded. A proposed VQ of vector dimension 16 can reduce the complexity from 256 codeword searches to 16-32 with a slight performance degradation of about 0.1-0.9 dB.

19.
A novel approach called 'VQ-agglomeration' capable of performing fast and autonomous clustering is presented. The approach involves a vector quantisation (VQ) process followed by an agglomeration algorithm that treats codewords as initial prototypes. Each codeword is associated with a gravisphere that has a well defined attraction radius. The agglomeration algorithm requires that each codeword be moved directly to the centroid of its neighbouring codewords. The movements of codewords in the feature space are synchronous, and will converge quickly to certain sets of concentric circles for which the centroids identify the resulting clusters. Unlike other techniques, such as the k-means and the fuzzy C-means, the proposed approach is free of the initial prototype problem and it does not need pre-specification of the number of clusters. Properties of the agglomeration algorithm are characterised and its convergence is proved.

20.
Lu Z.-M., Burkhardt H. Electronics Letters, 2005, 41(17): 956-957
A new kind of feature for colour image retrieval based on DCT-domain vector quantisation (VQ) index histograms (DCTVQIH) is proposed. For each colour image in the database, 12 histograms (four for each colour component) are calculated from 12 DCT-VQ index sequences, respectively. The retrieval simulation results show that, compared with the traditional spatial-domain colour-histogram-based features, the proposed features can substantially improve the recall and precision performance.
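The histogram feature described above can be sketched as follows, assuming the 12 DCT-VQ index sequences of an image have already been computed. The abstract does not state which similarity measure is used for retrieval, so the L1 distance here is an assumption, and all function names are hypothetical.

```python
import numpy as np

def index_histogram(indices, codebook_size):
    """Normalized histogram of VQ indices for one index sequence
    (a 1-D array of non-negative integers below codebook_size)."""
    counts = np.bincount(indices, minlength=codebook_size).astype(float)
    return counts / counts.sum()

def image_feature(index_sequences, codebook_size):
    """Concatenate the histograms of all index sequences of one image
    (12 sequences in the scheme described above) into one feature vector."""
    return np.concatenate(
        [index_histogram(seq, codebook_size) for seq in index_sequences]
    )

def l1_distance(f1, f2):
    """Assumed dissimilarity between two image features."""
    return float(np.abs(f1 - f2).sum())
```

Retrieval would then rank database images by increasing distance between their feature vector and that of the query image.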
