Similar documents
1.
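Text-independent speaker recognition based on an improved VQ algorithm   Cited by: 5 (self-citations: 2, citations by others: 3)
Speaker recognition based on vector quantization (VQ) is widely used in text-independent speaker recognition because of its computational simplicity. Drawing on the characteristics of training speech in speaker recognition and on a fast search algorithm, this paper improves the VQ codebook generation algorithm and proposes a text-independent speaker recognition method based on the improved algorithm. Experimental results show that the method speeds up codebook generation, reduces its computational cost, improves codebook quality, and raises the recognition rate.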
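
As a rough illustration of the baseline idea in the abstract above, the following sketch trains one VQ codebook per speaker and identifies a test utterance by the lowest average quantization distortion. It uses plain k-means rather than the improved codebook algorithm the paper proposes, whose details are not given here. The function names, the codebook size, and the synthetic 12-dimensional frames are all assumptions made for illustration.

```python
import numpy as np

def train_codebook(features, size=4, iters=20, seed=0):
    """Plain k-means codebook (not the improved algorithm from the abstract)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training vectors.
    codebook = features[rng.choice(len(features), size, replace=False)].copy()
    for _ in range(iters):
        # Squared distance of every frame to every codeword.
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for k in range(size):
            members = features[nearest == k]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(features, codebooks[name]))

# Synthetic stand-ins for per-speaker feature frames (e.g. MFCC vectors).
rng = np.random.default_rng(1)
train = {"spk_a": rng.normal(0.0, 1.0, (200, 12)),
         "spk_b": rng.normal(3.0, 1.0, (200, 12))}
codebooks = {name: train_codebook(x) for name, x in train.items()}
test_frames = rng.normal(3.0, 1.0, (50, 12))
result = identify(test_frames, codebooks)
```

With real speech, the frames would come from a feature extractor, and the codebook size would be tuned on held-out data.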

2.
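Conventional speaker recognition systems identify speakers from short-time spectral information, which performs well under certain conditions. However, some speaker characteristics are hidden in longer speech segments, so adding long-term information may further improve performance. In this paper, phoneme duration information is added to the conventional model to improve the speaker identification rate. Spectral information is obtained by short-time analysis, whereas extracting phoneme durations is a long-term analysis that requires more speech data. The effectiveness of phoneme duration information for speaker identification is investigated on a large amount of speech data, and two methods are proposed to address the problems caused by small amounts of data. Experimental results show that, when the speaker models are built appropriately, phoneme duration information improves the identification rate even with little speech data.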

3.
In this paper, a new text-independent speaker verification method, GSMSV, is proposed based on likelihood score normalization. In this method, a global speaker model is established to represent the universal features of speech and to normalize the likelihood score. Statistical analysis demonstrates that this normalization removes factors common to all speech and makes the differences between speakers more prominent. As a result, the equal error rate decreases significantly, the verification procedure is accelerated, and the system's adaptability to speaking rate is improved.

4.

The speaker recognition revolution has led to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous, as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum, and log-spectrum are used for feature extraction from the speech signals. These features are processed with a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize the speakers efficiently in a text-independent manner when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, and it increases to 98.7% when using spectrum or log-spectrum. However, the system has difficulty recognizing speakers across different recording environments. Hence, speech enhancement techniques such as spectral subtraction and wavelet denoising are used to improve the recognition performance to some extent. The proposed approach shows superiority when compared to the algorithm of R. Togneri and D. Pullella (2011).


5.
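陈迪 (Chen Di), 龚卫国 (Gong Weiguo), 杨利平 (Yang Liping), 《计算机应用》 (Journal of Computer Applications), 2007, 27(5): 1217-1219
A variable-window-length MFCC extraction method based on the pitch period is proposed to improve speaker recognition. The basic idea is to split the original speech into the part that fits within an integer multiple of the current pitch period and the remainder, keeping the former and discarding the latter, so as to reduce the spectral distortion between training and test speech. Text-independent speaker verification experiments show that the method effectively improves the verification rate and improves stability on short utterances.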
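
The windowing step described in the abstract above might look roughly like the sketch below: a frame is cut to the longest prefix that spans a whole number of pitch periods, and the rest is discarded. How the paper estimates the pitch period and chooses the frames is not stated here, so the sampling rate, the pitch value, and the helper name `trim_to_pitch_multiple` are assumptions.

```python
import numpy as np

def trim_to_pitch_multiple(frame, pitch_period):
    """Keep the longest prefix whose length is a whole number of pitch periods."""
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    keep = (len(frame) // pitch_period) * pitch_period
    return frame[:keep]

sample_rate = 8000                      # assumed sampling rate in Hz
f0 = 125.0                              # assumed pitch of the current frame in Hz
period = int(round(sample_rate / f0))   # 64 samples per pitch period
frame = np.sin(2 * np.pi * f0 * np.arange(200) / sample_rate)
trimmed = trim_to_pitch_multiple(frame, period)   # 192 samples = 3 periods
```

The trimmed frame would then go through the usual MFCC pipeline; because it covers whole pitch periods, the analysis window no longer cuts a period in the middle.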

6.
In this paper, an online text-independent speaker verification system developed at IIT Guwahati under multi-variability conditions for remote person authentication is described. The system is deployed on a voice server accessible via the telephone network using an interactive voice response (IVR) system, in which both enrollment and testing can be done online. The speaker verification system uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Gaussian Mixture Model-Universal Background Model (GMM-UBM) for modeling. The performance of the system under multi-variability conditions is evaluated using online enrollments and tests from the subjects. The evaluation helps in understanding the impact of several well-known issues in speaker verification, such as the effect of environmental noise, the duration of test speech, and the robustness of the system against playback of recorded speech, in an online scenario. These issues need to be addressed when developing and deploying speaker verification systems in real-life applications.

7.
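This paper proposes a new text-independent speaker recognition algorithm. The algorithm uses a model of spectral variation in speech signals that can handle cross-speaker variability. It was tested on speech of two different qualities, "clean" and "telephone" quality, with good experimental results.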

8.
Speaker verification techniques neglect the short-time variation in the feature space even though it contains speaker-related attributes. We propose a simple method to capture and characterize this spectral variation through the eigenstructure of the sample covariance matrix. This covariance is computed using a sliding window over spectral features. The newly formulated feature vectors representing local spectral variations are used with classical and state-of-the-art speaker recognition systems. Results on multiple speaker recognition evaluation corpora reveal that eigenvectors weighted with their normalized singular values are useful in representing local covariance information. We have also shown that local variability features can be extracted using mel frequency cepstral coefficients (MFCCs) as well as three recently developed features: frequency domain linear prediction (FDLP), mean Hilbert envelope coefficients (MHECs), and power-normalized cepstral coefficients (PNCCs). Since the information conveyed in the proposed feature is complementary to the standard short-term features, we apply different fusion techniques. We observe considerable relative improvements in speaker verification accuracy in combined mode on text-independent (NIST SRE) and text-dependent (RSR2015) speech corpora. We have obtained up to 12.28% relative improvement in speaker recognition accuracy on text-independent corpora. In experiments on text-dependent corpora, we have achieved up to 40% relative reduction in EER. To sum up, combining local covariance information with the traditional cepstral features holds promise as an additional speaker cue in both text-independent and text-dependent recognition.
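
A minimal sketch of the local covariance feature described in the abstract above: for each sliding window of frames, the sample covariance is eigendecomposed and the leading eigenvectors are scaled by their normalized eigenvalues. For a symmetric positive semidefinite covariance matrix the eigenvalues coincide with the singular values, which is why `numpy.linalg.eigh` is used here. The window length, hop, number of retained eigenvectors, and the function name are assumptions, not values from the paper.

```python
import numpy as np

def local_covariance_features(frames, window=20, hop=10, top=2):
    """For each sliding window, return the top eigenvectors of the sample
    covariance, each scaled by its normalized eigenvalue."""
    out = []
    for start in range(0, len(frames) - window + 1, hop):
        block = frames[start:start + window]
        cov = np.cov(block, rowvar=False)          # (dim, dim) sample covariance
        vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:top]       # indices of the largest ones
        weights = vals[order] / vals.sum()         # normalized eigenvalues
        out.append((vecs[:, order] * weights).T.reshape(-1))
    return np.array(out)

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 6))   # synthetic stand-in for 100 frames of 6-dim features
feats = local_covariance_features(frames)   # one 12-dim vector per window
```

Eigenvectors are only defined up to sign, so a practical system would need a sign convention before feeding these vectors to a back end; that step is omitted here.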

9.
In this paper, the problem of identifying in-set versus out-of-set speakers using extremely limited enrollment data is addressed. The recognition objective is to form a binary decision regarding an input speaker as being a legitimate member of a set of enrolled speakers or not. Here, the emphasis is on low enrollment (about 5 sec of speech for each enrolled speaker) and test data durations (2-8 sec), in a text-independent scenario. In order to overcome the limited enrollment, data from speakers that are acoustically close to a given in-set speaker are used to form an informative prior (base model) for speaker adaptation. Score normalization for in-set systems is addressed, and the difficulty of using conventional score normalization schemes for in-set speaker recognition is highlighted. Distribution scaling based score normalization techniques are developed specifically for the in-set/out-of-set problem and compared against existing score normalization schemes used in open-set speaker recognition. Experiments are performed using the following three separate corpora: (1) noise-free TIMIT; (2) noisy in-vehicle CU-move; and (3) the NIST-SRE-2006 database. Experimental results show a consistent increase in system performance for the proposed techniques.

10.
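Using linear prediction coefficients as features, the initial k-means clustering of the training samples is refined by the iterative algorithm of the Gaussian mixture model, yielding a representation of the constituent units of speech. Based on pattern matching of these units, a text-independent speaker verification method, called the mean method, and a text-independent speaker identification method are proposed. Experimental results show that the methods perform well even on short utterances.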

11.
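A speaker verification system based on Gaussian mixture models   Cited by: 5 (self-citations: 1, citations by others: 4)
杨澄宇 (Yang Chengyu), 赵文 (Zhao Wen), 杨鉴 (Yang Jian), 《计算机应用》 (Journal of Computer Applications), 2001, 21(4): 7-8, 11
Because the low and higher frequency bands of the speech spectrum carry more speaker-specific information, this paper proposes an improved LPC cepstrum algorithm for text-independent speaker recognition. The algorithm weights the frequency bands of the speech spectrum to emphasize speaker-specific information, making speakers easier to distinguish.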

12.
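An eigenchannel space concatenation method for joint factor analysis   Cited by: 1 (self-citations: 1, citations by others: 0)
何亮 (He Liang), 史永哲 (Shi Yongzhe), 刘加 (Liu Jia), 《自动化学报》 (Acta Automatica Sinica), 2011, 37(7): 849-856
To make joint factor analysis applicable to text-independent speaker recognition under multiple channel conditions, an orthogonal concatenation method for eigenchannel spaces is proposed. Under multi-channel conditions, the eigenchannel space can be estimated by pooling the data or by simple concatenation, but the former suffers from space masking, and the latter resolves masking while introducing space overlap. This paper first proves that the core operation in speaker modeling and testing is an oblique projection. On that basis, the overlap is removed by orthogonalizing the spaces to be concatenated. Experiments on the NIST SRE 2008 core evaluation data show that the proposed algorithm outperforms both the pooled-data method and simple concatenation.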

13.
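A text-independent speaker recognition system based on mel cepstral features and Gaussian mixture models is studied. To improve the recognition rate in noisy environments, two ways of improving the system's noise robustness are investigated: using speech recognition to convert the text-independent system into a text-dependent speaker recognition method, and performing speaker recognition on selected frames that are more robust. The effect of these methods on recognition performance is analyzed, and experiments confirm that they improve the recognition rate in noisy environments.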

14.
This correspondence introduces a new text-independent speaker verification method, which is derived from the basic idea in pattern recognition that the discriminating ability of a classifier can be improved by removing the information common to all classes. By looking for the speech characteristics shared by a group of speakers, a global speaker model can be established. Subtracting the score obtained from this model normalizes the conventional likelihood score, resulting in a more compact score distribution and lower equal error rates. Several experiments are carried out to demonstrate the effectiveness of the proposed method.
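
The subtraction described in the abstract above can be sketched as a log-likelihood ratio between a speaker model and a global model trained on pooled data. To keep the example short, each model is a single diagonal-covariance Gaussian, which is a simplification chosen here and not the model used in the paper. The synthetic data, the decision threshold of 0, and all function names are likewise assumptions.

```python
import numpy as np

def fit_diag_gaussian(x):
    """Fit one diagonal-covariance Gaussian (a stand-in for a full speaker model)."""
    return x.mean(axis=0), x.var(axis=0) + 1e-6

def avg_log_likelihood(x, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var).sum(axis=1)
    return ll.mean()

def normalized_score(x, speaker_model, global_model):
    """Speaker score minus global-model score (a log-likelihood ratio)."""
    return avg_log_likelihood(x, speaker_model) - avg_log_likelihood(x, global_model)

rng = np.random.default_rng(0)
# Synthetic frames: three speakers sharing a common offset plus a speaker shift.
speakers = {i: rng.normal(5.0 + i, 1.0, (300, 8)) for i in range(3)}
global_model = fit_diag_gaussian(np.vstack(list(speakers.values())))
target_model = fit_diag_gaussian(speakers[0])

genuine = normalized_score(rng.normal(5.0, 1.0, (100, 8)), target_model, global_model)
impostor = normalized_score(rng.normal(7.0, 1.0, (100, 8)), target_model, global_model)
accept_genuine = genuine > 0.0    # threshold 0 is an arbitrary choice here
accept_impostor = impostor > 0.0
```

Because the global model absorbs what all speakers have in common, the normalized score depends mainly on how much better the claimed speaker's model explains the utterance than a generic model does.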

15.
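In text-independent speaker identification, to improve robustness on telephone speech, score normalization techniques commonly used in speaker verification are applied to speaker identification: the scores of the test utterance against the different speaker models are each normalized, and the closest speaker model is selected as the system output, which effectively improves performance. Speaker identification experiments on the NIST'03 1spk database demonstrate the effectiveness of score normalization for speaker identification.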
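
The abstract above does not say which normalization is used, so the sketch below assumes zero normalization (Z-norm), one common choice in speaker verification: each model's raw score is standardized with the mean and standard deviation of that model's scores on impostor utterances, and the model with the highest normalized score is selected. All score values are made up for illustration.

```python
import numpy as np

def znorm(raw_scores, impostor_scores):
    """Z-norm: standardise each model's score with that model's impostor statistics."""
    mu = impostor_scores.mean(axis=1)
    sigma = impostor_scores.std(axis=1) + 1e-9
    return (raw_scores - mu) / sigma

# Hypothetical scores of one test utterance against three speaker models.
raw = np.array([-41.0, -40.0, -46.0])
# Hypothetical scores of four impostor utterances against each model (rows = models).
impostors = np.array([[-46.0, -45.0, -47.0, -46.0],
                      [-41.0, -40.5, -39.5, -41.0],
                      [-50.0, -49.0, -51.0, -50.0]])
best_raw = int(np.argmax(raw))                      # model 1 wins on raw scores
best_norm = int(np.argmax(znorm(raw, impostors)))   # model 0 wins after Z-norm
```

In this made-up case, model 1 scores highly on every utterance, so its raw score is misleading; after normalization, model 0 stands out because the test utterance scores far above that model's impostor range.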

16.
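余巍 (Yu Wei), 李辉 (Li Hui), 《计算机工程》 (Computer Engineering), 2011, 37(23): 162-164
Speaker verification systems built on the Gaussian mixture model (GMM)-universal background model (UBM) structure cannot fully capture speaker-specific characteristics. To address this, a clustering method is combined with the sorted Gaussian mixture model: the Gaussian components are ordered by their sort values, and the UBM is trained accordingly. Experiments on the NIST 06 8side-1side database show that the method reduces the computation needed for UBM training while largely preserving recognition performance.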

17.
We present a new modeling approach for speaker recognition that uses the maximum-likelihood linear regression (MLLR) adaptation transforms employed by a speech recognition system as features for support vector machine (SVM) speaker models. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification without data fragmentation. We discuss the basics of the MLLR-SVM approach, and show how it can be enhanced by combining transforms relative to multiple reference models, with excellent results on recent English NIST evaluation sets. We then show how the approach can be applied even if no full word-level recognition system is available, which allows its use on non-English data even without matching speech recognizers. Finally, we examine how two recently proposed algorithms for intersession variability compensation perform in conjunction with MLLR-SVM.

18.
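Speaker verification based on model distance and support vector machines   Cited by: 1 (self-citations: 0, citations by others: 1)
For speaker verification with support vector machines (SVMs), the distances and angles among the background model, the speaker model, and the test-utterance model are proposed as SVM feature vectors. Concatenating this group of feature vectors with the parameters of the generalized linear discriminant sequence kernel yields a higher recognition rate than the baseline Gaussian mixture model algorithm. On the 2004 NIST evaluation database, the equal error rate of the system using the proposed algorithm is 16% lower than that of the baseline GMM-UBM system, which represents some progress in speaker recognition.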

19.
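The effect of speech coding on text-independent speaker verification is studied for several low-bit-rate channel environments. Two methods are proposed to improve system robustness, one for the case where training and test speech are matched and one for the mismatched case. For the matched case, the effect of speech coding on LPCC parameters is analyzed and an LPCC weighting based on coding distortion is proposed. For the mismatched case, a speech coding detector based on Gaussian mixture models (GMMs) identifies the coding type of the test speech and selects the corresponding speaker verification model. Experimental results show that both methods improve the robustness of the speaker verification system under multi-channel conditions.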

20.
In speaker verification over public telephone networks, utterances can be obtained from different types of handsets. Different handsets may introduce different degrees of distortion to the speech signals. This paper attempts to combine a handset selector with (1) handset-specific transformations, (2) reinforced learning, and (3) stochastic feature transformation to reduce the effect of the acoustic distortion. Specifically, during training, the clean speaker models and background models are first transformed by MLLR-based handset-specific transformations using a small amount of distorted speech data. Then reinforced learning is applied to adapt the transformed models to handset-dependent speaker models and handset-dependent background models using stochastically transformed speaker patterns. During a verification session, a GMM-based handset classifier is used to identify the most likely handset used by the claimant; then the corresponding handset-dependent speaker and background model pairs are used for verification. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on the combination of MLLR, reinforced learning, and feature transformation outperforms CMS, Hnorm, Tnorm, and speaker model synthesis.
