
1.

Support vector machines for speaker and language recognition

《Computer Speech and Language》2006,20(2-3):210-229

Support vector machines (SVMs) have proven to be a powerful technique for pattern classification. SVMs map inputs into a high-dimensional space and then separate classes with a hyperplane. A critical aspect of using SVMs successfully is the design of the inner product, the kernel, induced by the high-dimensional mapping. We consider the application of SVMs to speaker and language recognition. A key part of our approach is the use of a kernel that compares sequences of feature vectors and produces a measure of similarity. Our sequence kernel is based upon generalized linear discriminants. We show that this strategy has several important properties. First, the kernel uses an explicit expansion into SVM feature space; this property makes it possible to collapse all support vectors into a single model vector and have low computational complexity. Second, the SVM builds upon a simpler mean-squared error classifier to produce a more accurate system. Finally, the system is competitive with and complementary to other approaches, such as Gaussian mixture models (GMMs). We give results for the 2003 NIST speaker and language evaluations of the system and also show fusion with the traditional GMM approach.
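The point about collapsing all support vectors into one model vector can be illustrated with a small sketch. It assumes a degree-2 monomial expansion and a plain inner product between averaged expansions; the normalization used in the published kernel is omitted, and all function names here are hypothetical.

```python
from itertools import combinations_with_replacement


def expand(x, degree=2):
    """Map a feature vector to all monomials up to the given degree."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms


def average_expansion(sequence, degree=2):
    """Average the expanded vectors over a whole sequence of frames."""
    expanded = [expand(frame, degree) for frame in sequence]
    n = len(expanded)
    return [sum(col) / n for col in zip(*expanded)]


def sequence_kernel(seq_a, seq_b, degree=2):
    """Inner product of two averaged expansions (normalization omitted)."""
    a = average_expansion(seq_a, degree)
    b = average_expansion(seq_b, degree)
    return sum(u * v for u, v in zip(a, b))


def collapse(support_seqs, weights, degree=2):
    """Fold the weighted support sequences into a single model vector."""
    vecs = [average_expansion(s, degree) for s in support_seqs]
    return [sum(w * v[k] for w, v in zip(weights, vecs))
            for k in range(len(vecs[0]))]


def score(model, sequence, degree=2):
    """Score a test sequence with one inner product against the model."""
    expanded = average_expansion(sequence, degree)
    return sum(m * b for m, b in zip(model, expanded))
```

Because the expansion is explicit, `score(collapse(...), seq)` gives the same value as summing the weighted kernel evaluations against every support sequence, at the cost of a single inner product per test sequence.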

2.

Applying SVMs and weight-based factor analysis to unsupervised adaptation for speaker verification

Mitchell McLaren Driss Matrouf Robbie Vogt Jean-Francois Bonastre 《Computer Speech and Language》2011,25(2):327-340

This paper presents an extended study on the implementation of support vector machine (SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMMs) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated and found to significantly aid performance.

3.

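Application of the NAP sequence kernel in speaker recognition (NAP序列核函数在话者识别中的应用) Total citations: 1, self-citations: 1, citations by others: 0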


Xing Yujuan, Li Ming 《Computer Engineering (计算机工程)》2010,36(8):194-196

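To address variable-length feature vectors and cross-channel interference in speaker recognition systems, a supervector-based nuisance attribute projection (NAP) kernel is proposed. This kernel is a new type of sequence kernel: it lets a support vector machine classify whole speech sequences and removes from the kernel space the channel-subspace information that is irrelevant to speaker recognition. Simulation results show that the kernel effectively improves the classification performance of the support vector machine and the recognition accuracy of the speaker recognition system.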
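As a rough illustration of removing a channel subspace before the kernel is evaluated, the sketch below applies the projection P = I - U Uᵀ to supervectors. It assumes the channel directions in U are already known and orthonormal; how they are estimated is not covered, and the function names are hypothetical.

```python
import math


def normalize(u):
    """Scale a direction to unit length."""
    norm = math.sqrt(sum(c * c for c in u))
    return [c / norm for c in u]


def nap_project(x, directions):
    """Apply P = I - U U^T, assuming the directions are orthonormal."""
    out = list(x)
    for u in directions:
        coeff = sum(a * b for a, b in zip(u, x))
        out = [o - coeff * c for o, c in zip(out, u)]
    return out


def nap_kernel(x, y, directions):
    """Linear kernel between two supervectors after projection."""
    px = nap_project(x, directions)
    py = nap_project(y, directions)
    return sum(a * b for a, b in zip(px, py))
```

Two supervectors that differ only along a removed direction map to the same point, so the kernel no longer sees that difference.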

4.

Enhancement of a text-independent speaker verification system by using feature combination and parallel structure classifiers

Abdalmalak Kerlos Atia Gallardo-Antolín Ascensión 《Neural computing & applications》2018,29(3):637-651

Speaker verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients, their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients, Perceptual Linear Predictive, and Relative Spectral Transform Perceptual Linear Predictive. In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional support vector machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions: linear kernel and Gaussian radial basis function kernel with a logistic regression classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers either with clean speech or in the presence of noise. Finally, to enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.
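The parallel structure with voting rules can be sketched as follows. The abstract does not say which voting rules were tried, so the three rules below (majority, unanimous, any) and the function name are assumptions chosen for illustration; each input is one classifier's accept/reject decision.

```python
def combine(decisions, rule="majority"):
    """Fuse accept/reject decisions from classifiers run in parallel."""
    votes = sum(1 for d in decisions if d)
    if rule == "majority":
        # Accept only if more than half of the classifiers accept.
        return votes * 2 > len(decisions)
    if rule == "unanimous":
        # Accept only if every classifier accepts.
        return votes == len(decisions)
    if rule == "any":
        # Accept if at least one classifier accepts.
        return votes >= 1
    raise ValueError("unknown voting rule: " + rule)
```

For example, `combine([True, True, False])` accepts under the majority rule but rejects under the unanimous rule.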


5.

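Research on a KL-divergence-based support vector machine method and its application (基于KL散度的支持向量机方法及应用研究) Total citations: 1, self-citations: 0, citations by others: 1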

Qu Wei, Liu Heping, Zhang Haijun 《Information and Control (信息与控制)》2005,34(5):627-630

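For speaker speech features extracted by ICA, a KL kernel that uses the Kullback-Leibler (KL) divergence as its distance measure is derived and used to design a support vector machine, yielding a high-resolution ICA/SVM speaker verification system. Simulation experiments on speaker verification show that training the SVM on ICA feature basis-function coefficients gives a larger classification margin and fewer support vectors than training directly on speech data. The equal error rate of the ICA/SVM system with the KL kernel is also lower than that of other conventional SVM methods, which shows that the KL-divergence-based SVM method performs efficiently in classification and decision making.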
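One common way to turn a KL divergence into a kernel is to exponentiate the negated symmetric divergence. The abstract does not give the exact form of its KL kernel, so the sketch below is an assumption: it works on discrete probability vectors, and the `scale` parameter and function names are hypothetical.

```python
import math


def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrized divergence D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)


def kl_kernel(p, q, scale=1.0):
    """Similarity that decays exponentially with the symmetric KL divergence."""
    return math.exp(-scale * symmetric_kl(p, q))
```

Identical distributions give a kernel value of 1, and the value falls toward 0 as the distributions move apart.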

6.

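Speaker verification based on PCA and kernel Fisher discriminant (基于PCA和核Fisher判别的说话人确认) Total citations: 1, self-citations: 0, citations by others: 1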

Xing Yujuan, Li Ming, Zhang Yafen 《Computer Engineering and Design (计算机工程与设计)》2008,29(15)

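To address the poor real-time performance of kernel Fisher discriminant techniques in speaker verification, a speaker verification method based on PCA and kernel Fisher discriminant is proposed. PCA reduces the dimensionality of the feature vectors and removes redundancy, which lowers the cost of the subsequent computation and speeds up verification. Kernel-based Fisher discriminant analysis then verifies the speaker, improving the real-time performance of the system as a whole. Experiments confirm the effectiveness of the method.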

7.

Face recognition using Gabor-based direct linear discriminant analysis and support vector machine

《Computers & Electrical Engineering》2013,39(3):727-745

This paper presents a novel and uniform framework for face recognition. This framework is based on a combination of Gabor wavelets, direct linear discriminant analysis (DLDA) and support vector machine (SVM). First, feature vectors are extracted from raw face images using Gabor wavelets. These Gabor-based features are robust against local distortions caused by the variance of illumination, expression and pose. Next, the extracted feature vectors are projected to a low-dimensional subspace using the DLDA technique. The Gabor-based DLDA feature vectors are then fed to an SVM classifier. A new kernel function for SVM called hyperhemispherically normalized polynomial (HNP) is also proposed in this paper, and its validity in improving classification accuracy is theoretically proved and experimentally tested for face recognition. The proposed algorithm was evaluated using the FERET database. Experimental results show that the proposed face recognition system outperforms other related approaches in terms of recognition rate.

8.

Speaker recognition based on a new sequence-kernel support vector machine (基于新序列核支持向量机的说话人识别)

Li Jie, Wang Chengru 《Journal of Data Acquisition and Processing (数据采集与处理)》2009,24(Z1)

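To better apply discriminative classification methods to speaker verification systems, constructing sequence-kernel support vector machines has become a research focus and trend in speaker recognition. Building on a study of the reproducing kernel Hilbert space framework, this paper constructs a new sequence kernel to measure the similarity between speech sequences. It further optimizes the sequence-kernel system by combining it with recently proposed compensation techniques (LFA, NAP, CSP) that address inter-session variability (ISV) in the support vector machine (SVM) feature subspace. In experiments on the 2004 evaluation data set of the U.S. National Institute of Standards and Technology (NIST), the new sequence-kernel system achieves a higher recognition rate than the conventional Gaussian mixture model (GMM) and the SVM based on the generalized linear discriminant sequence (GLDS) kernel.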

9.

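A speaker verification system based on support vector machines (基于支撑向量机的说话人确认系统) Total citations: 2, self-citations: 1, citations by others: 1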

He Xin, Liu Chongqing, Li Jiegu 《Computer Engineering and Applications (计算机工程与应用)》2000,36(12):70-71,91

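The support vector machine (SVM) is a new statistical learning method. Unlike classical learning methods, which follow the principle of empirical risk minimization, the SVM learns by minimizing structural risk, which gives it better overall performance. This paper proposes a text-independent speaker verification system based on SVMs. Experiments show that, compared with the classical methods based on vector quantization (VQ) and Gaussian mixture models (GMM), the SVM-based method has higher discriminative power and better overall performance.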
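The contrast between empirical risk and structural risk can be made concrete with the standard soft-margin SVM objective, which adds a margin term to the training loss. This is a sketch of that textbook objective for a linear classifier, not of the system in the paper; the function names and the trade-off parameter `c` are illustrative.

```python
def hinge_losses(w, b, samples):
    """Hinge loss of a linear classifier on each (features, label) pair."""
    losses = []
    for x, y in samples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        losses.append(max(0.0, 1.0 - margin))
    return losses


def empirical_risk(w, b, samples):
    """Average training loss only; model complexity is ignored."""
    losses = hinge_losses(w, b, samples)
    return sum(losses) / len(losses)


def svm_objective(w, b, samples, c=1.0):
    """Soft-margin objective: margin term plus weighted training loss."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + c * sum(hinge_losses(w, b, samples))
```

Two weight vectors can both reach zero empirical risk on the same data while the objective still prefers the one with the smaller norm, that is, the larger margin.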

10.

Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)

Aronowitz H. Burshtein D. 《IEEE transactions on audio, speech, and language processing》2007,15(7):2033-2043

Techniques for efficient speaker recognition are presented. These techniques are based on approximating Gaussian mixture modeling (GMM) likelihood scoring using approximated cross entropy (ACE). Gaussian mixture modeling is used for representing both training and test sessions and is shown to perform speaker recognition and retrieval extremely efficiently without any notable degradation in accuracy compared to classic GMM-based recognition. In addition, a GMM compression algorithm is presented. This algorithm decreases considerably the storage needed for speaker retrieval.

11.

Improving the performance of speaker and language identification tasks using unique characteristics of a class

B. Bharathi C. Arun Kumar T. Nagarajan 《International Journal of Speech Technology》2013,16(1):115-124

In classification tasks, the error rate is proportional to the commonality among classes. In the conventional GMM-based modeling technique, since the model parameters of a class are estimated without considering other classes in the system, features that are common across various classes may also be captured, along with unique features. This paper proposes to use unique characteristics of a class at the feature-level and at the phoneme-level, separately, to improve the classification accuracy. At the feature-level, the performance of a classifier has been analyzed by capturing the unique features while modeling, and removing common feature vectors during classification. Experiments were conducted on a speaker identification task, using speech data of 40 female speakers from the NTIMIT corpus, and on a language identification task, using speech data of two languages (English and French) from the OGI_MLTS corpus. At the phoneme-level, the performance of a classifier has been analyzed by identifying a subset of phonemes, which are unique to a speaker with respect to his/her closely resembling speaker, in the acoustic sense, on a speaker identification task. In both cases (feature-level and phoneme-level), considerable improvement in classification accuracy is observed over conventional GMM-based classifiers in the above-mentioned tasks. Among the three experimental setups, the speaker identification task using unique phonemes shows as much as a 9.56% performance improvement over the conventional GMM-based classifier.

12.

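Speaker verification based on GMM statistical parameters and SVM (基于GMM统计特性参数和SVM的话者确认) Total citations: 1, self-citations: 0, citations by others: 1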

Huang Wei, Dai Beiqian 《Journal of Data Acquisition and Processing (数据采集与处理)》2004,19(4):365-370

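To handle the large amount of training data in text-independent speaker verification, this paper proposes a text-independent speaker verification system based on GMM statistical parameters and support vector machines. The speaker's GMM statistical parameters serve as the feature parameters used to train the target speaker's SVM model. This extracts speaker-specific information effectively and solves the problem of training an SVM on large data sets. It also combines the robustness of statistical models with the discriminative power of discriminative models, improving both the verification performance and the robustness of the system. Experiments on a Microsoft microphone speech database and the NIST'01 cellular telephone speech database demonstrate the effectiveness of the method.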

13.

Discriminative speaker recognition using large margin GMM

Reda Jourani Khalid Daoudi Régine André-Obrecht Driss Aboutajdine 《Neural computing & applications》2013,22(7-8):1329-1336

Most state-of-the-art speaker recognition systems are based on discriminative learning approaches. On the other hand, generative Gaussian mixture models (GMM) have been widely used in speaker recognition during the last decades. In an earlier work, we proposed an algorithm for discriminative training of GMM with diagonal covariances under a large margin criterion. In this paper, we propose an improvement of this algorithm, which has the major advantage of being computationally highly efficient, thus well suited to handle large-scale databases. We also develop a new strategy to detect and handle the outliers that occur in the training data. To evaluate the performance of our new algorithm, we carry out full NIST speaker identification and verification tasks using NIST-SRE'2006 data, in a Symmetrical Factor Analysis compensation scheme. The results show that our system significantly outperforms the traditional discriminative support vector machine (SVM) based system using GMM supervectors, in the two speaker recognition tasks.

14.

Text-independent speaker recognition based on the Hurst parameter and the multidimensional fractional Brownian motion model

Sant'Ana R. Coelho R. Alcaim A. 《IEEE transactions on audio, speech, and language processing》2006,14(3):931-940

In this paper, a text-independent automatic speaker recognition (ASkR) system, the SR_Hurst, is proposed, which employs a new speech feature and a new classifier. The statistical feature pH is a vector of Hurst (H) parameters obtained by applying a wavelet-based multidimensional estimator (M_dim_wavelets) to the windowed short-time segments of speech. The proposed classifier for the speaker identification and verification tasks is based on the multidimensional fBm (fractional Brownian motion) model, denoted by M_dim_fBm. For a given sequence of input speech features, the speaker model is obtained from the sequence of vectors of H parameters, means, and variances of these features. The performance of the SR_Hurst was compared to those achieved with the Gaussian mixture models (GMMs), autoregressive vector (AR), and Bhattacharyya distance (dB) classifiers. The speech database, recorded from fixed and cellular phone channels, was uttered by 75 different speakers. The results have shown the superior performance of the M_dim_fBm classifier and that the pH feature aggregates new information on the speaker identity. In addition, the proposed classifier employs a much simpler modeling structure as compared to the GMM.

15.

An improved pyramid match kernel algorithm for high-dimensional features (高维金字塔匹配核改进算法)


Zhang Jun, Zhao Guangzhou, Gu Hong 《Journal of Image and Graphics (中国图象图形学报)》2011,16(9):1650-1655

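As the feature dimensionality increases, the expected error of the original pyramid match kernel (PMK) rises linearly, so its performance may degrade substantially. An improved pyramid matching algorithm is proposed. It repeatedly bisects the feature dimensions to produce a series of feature subspaces, computes a weighted sum of the pyramid match kernels evaluated on the features within each subspace, and finally applies kernel optimization to obtain a positive semidefinite kernel matrix that kernel-based learning algorithms (such as support vector machines) can use. Experiments on two data sets (Caltech-101 and ETH-80) show that, whereas other comparable improved algorithms require several hundred times more computation time, DP-PMK reaches the same accuracy with only 46 times more computation time.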
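For orientation, the baseline pyramid match kernel that the improved algorithm builds on can be sketched for one-dimensional features. This is the original PMK idea (histogram intersections at doubling bin widths, with new matches weighted by the inverse bin width), not the dimension-partitioned DP-PMK variant; the function names are hypothetical and features are assumed to be non-negative numbers.

```python
from collections import Counter


def histogram(points, width):
    """Count how many points fall into each bin of the given width."""
    return Counter(int(p // width) for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of points matched within the same bin."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def pyramid_match(xs, ys, levels):
    """Weighted count of new matches found at each pyramid level."""
    score = 0.0
    previous = 0
    for i in range(levels + 1):
        width = 2 ** i
        current = intersection(histogram(xs, width), histogram(ys, width))
        # Matches first found at a coarser level count for less.
        score += (current - previous) / width
        previous = current
    return score
```

With `xs = [0, 5]` and `ys = [1, 5]`, one pair matches at the finest level and the other only after the bins double in width, so the second match contributes half as much.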

16.

Environment adaptation for robust speaker verification by cascading maximum likelihood linear regression and reinforced learning

《Computer Speech and Language》2007,21(2):231-246

In speaker verification over public telephone networks, utterances can be obtained from different types of handsets. Different handsets may introduce different degrees of distortion to the speech signals. This paper attempts to combine a handset selector with (1) handset-specific transformations, (2) reinforced learning, and (3) stochastic feature transformation to reduce the effect caused by the acoustic distortion. Specifically, during training, the clean speaker models and background models are firstly transformed by MLLR-based handset-specific transformations using a small amount of distorted speech data. Then reinforced learning is applied to adapt the transformed models to handset-dependent speaker models and handset-dependent background models using stochastically transformed speaker patterns. During a verification session, a GMM-based handset classifier is used to identify the most likely handset used by the claimant; then the corresponding handset-dependent speaker and background model pairs are used for verification. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on the combination of MLLR, reinforced learning and feature transformation outperforms CMS, Hnorm, Tnorm, and speaker model synthesis.

17.

Texture classification using the support vector machines Total citations: 12, self-citations: 0, citations by others: 12

Shutao James T. Hailong Yaonan 《Pattern recognition》2003,36(12):2883-2893

In recent years, support vector machines (SVMs) have demonstrated excellent performance in a variety of pattern recognition problems. In this paper, we apply SVMs to texture classification, using translation-invariant features generated from the discrete wavelet frame transform. To alleviate the problem of selecting the right kernel parameter in the SVM, we use a fusion scheme based on multiple SVMs, each with a different setting of the kernel parameter. Compared to the traditional Bayes classifier and the learning vector quantization algorithm, SVMs, and, in particular, the fused output from multiple SVMs, produce more accurate classification results on the Brodatz texture album.

18.

Local fuzzy PCA based GMM with dimension reduction on speaker identification

Ki Yong Lee 《Pattern recognition letters》2004,25(16):3846-1817

To reduce the high dimensionality required for training of feature vectors in speaker identification, we propose an efficient GMM based on local PCA with fuzzy clustering. The proposed method first partitions the data space into several disjoint clusters by fuzzy clustering, and then performs PCA using the fuzzy covariance matrix on each cluster. Finally, the GMM for each speaker is obtained from the transformed feature vectors with reduced dimension in each cluster. Compared to the conventional GMM with a diagonal covariance matrix, the proposed method is faster and needs less storage while maintaining the same performance.

19.

Support vector learning for fuzzy rule-based classification systems Total citations: 11, self-citations: 0, citations by others: 11

Yixin Chen Wang J.Z. 《Fuzzy Systems, IEEE Transactions on》2003,11(6):716-728

To design a fuzzy rule-based classification system (fuzzy classifier) with good generalization ability in a high dimensional feature space has been an active research topic for a long time. As a powerful machine learning approach for pattern recognition problems, the support vector machine (SVM) is known to have good generalization ability. More importantly, an SVM can work very well on a high- (or even infinite) dimensional feature space. This paper investigates the connection between fuzzy classifiers and kernel machines, establishes a link between fuzzy rules and kernels, and proposes a learning algorithm for fuzzy classifiers. We first show that a fuzzy classifier implicitly defines a translation invariant kernel under the assumption that all membership functions associated with the same input variable are generated from location transformation of a reference function. Fuzzy inference on the IF-part of a fuzzy rule can be viewed as evaluating the kernel function. The kernel function is then proven to be a Mercer kernel if the reference functions meet a certain spectral requirement. The corresponding fuzzy classifier is named positive definite fuzzy classifier (PDFC). A PDFC can be built from the given training samples based on a support vector learning approach with the IF-part fuzzy rules given by the support vectors. Since the learning process minimizes an upper bound on the expected risk (expected prediction error) instead of the empirical risk (training error), the resulting PDFC usually has good generalization. Moreover, because of the sparsity properties of the SVMs, the number of fuzzy rules is irrelevant to the dimension of input space. In this sense, we avoid the "curse of dimensionality." Finally, PDFCs with different reference functions are constructed using the support vector learning approach. The performance of the PDFCs is illustrated by extensive experimental results. Comparisons with other methods are also provided.
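The link between the IF-part of a fuzzy rule and a kernel evaluation can be sketched as a product of shifted reference functions, one per input variable. The Gaussian reference function, the per-variable widths, and all names below are assumptions made for illustration; the abstract only requires that membership functions for a variable be location-shifted copies of one reference function.

```python
import math


def gaussian_reference(u, width=1.0):
    """Reference membership function centred at zero (an assumed choice)."""
    return math.exp(-((u / width) ** 2))


def fuzzy_kernel(x, z, widths):
    """Product over input variables of memberships shifted to the point z."""
    k = 1.0
    for xk, zk, wk in zip(x, z, widths):
        k *= gaussian_reference(xk - zk, wk)
    return k


def pdfc_decision(x, support_vectors, coeffs, bias, widths):
    """Each support vector acts as one rule; the sign of the sum is the class."""
    total = bias
    for sv, c in zip(support_vectors, coeffs):
        total += c * fuzzy_kernel(x, sv, widths)
    return 1 if total >= 0 else -1
```

The kernel depends only on the difference `x - z`, which is the translation invariance described above, and the number of rules equals the number of support vectors rather than growing with the input dimension.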

20.

Design of a real time automatic speech recognition system using Modified One Against All SVM classifier

J. Manikandan B. Venkataramani 《Microprocessors and Microsystems》2011,35(6):568-578

In this paper, a Texas Instruments TMS320C6713 DSP based real-time speech recognition system using a Modified One Against All Support Vector Machine (SVM) classifier is proposed. The major contributions of this paper are the study and evaluation of the performance of the classifier using three feature extraction techniques and a proposal for minimizing the computation time for the classifier. From this study, it is found that recognition accuracies of 93.33%, 98.67% and 96.67% are achieved for the classifier using Mel Frequency Cepstral Coefficients (MFCC) features, zerocrossing (ZC) features and zerocrossing with peak amplitude (ZCPA) features, respectively. To reduce the computation time required for the systems, two techniques are proposed: one using an optimum threshold technique for the SVM classifier and another using linear assembly. The ZC based system requires the least computation time, and the above techniques reduce the execution time by a factor of 6.56 and 5.95 respectively. For the purpose of comparison, the speech recognition system is also implemented using an Altera Cyclone II FPGA with a Nios II soft processor and custom instructions. Of the two approaches, the DSP approach requires 87.40% fewer clock cycles. Custom design of the recognition system on the FPGA without using the soft-core processor would have resulted in less computational complexity. The proposed classifier is also found to reduce the number of support vectors by a factor of 1.12–3.73 when applied to speaker identification and isolated letter recognition problems. The techniques proposed here can be adapted for various other SVM based pattern recognition systems.
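As background for the classifier named above, the standard one-against-all scheme trains one binary SVM per class and picks the class with the largest decision value. The sketch below shows only that standard scheme with linear models; the paper's modification and its optimum threshold technique are not described in the abstract and are not reproduced here. The model layout and names are hypothetical.

```python
def decision_value(model, x):
    """Linear SVM output w . x + b for one binary (class vs. rest) model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_against_all(models, x):
    """Return the label whose binary model gives the largest decision value."""
    return max(models, key=lambda label: decision_value(models[label], x))
```

Here `models` maps each class label to a `(weights, bias)` pair, so classifying one input costs one decision-value evaluation per class.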