1.
2.
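This paper introduces a new approach to speech recognition in which the recognition system is built from artificial neural networks (ANNs), and analyzes how it differs from, and improves on, traditional recognition methods. A speaker-independent Mandarin digit speech recognizer is built with a BP network; computer simulation experiments show that its recognition performance is clearly better than that of an HMM recognizer under the same conditions, demonstrating that ANN-based speech recognition is an attractive and promising new approach.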
3.
4.
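ANN-based Mandarin digit speech recognition. Cited by: 1 (self-citations: 0, citations by others: 1)
This paper introduces a new approach to speech recognition in which the recognition system is built from artificial neural networks, and analyzes how it differs from, and improves on, traditional recognition methods. A speaker-independent Mandarin digit speech recognizer is built with a BP network; computer simulation experiments show that its recognition performance is clearly better than that of an HMM recognizer under the same conditions, demonstrating that ANN-based speech recognition is an attractive and promising new approach.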
5.
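This paper presents a speaker-independent, small-vocabulary, isolated-word speech recognition system. It uses a speech endpoint detection method based on the hidden Markov model (HMM) and a self-learning speech recognition algorithm based on VQIHMM, and its hardware is designed around a high-speed TMS320C54X DSP chip to achieve real-time speech recognition.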
6.
7.
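DPSK demodulation based on adaptive filtering and its performance. Cited by: 5 (self-citations: 0, citations by others: 5)
This paper studies a method for demodulating differential phase-shift keying (DPSK) signals based on an adaptive filtering (ADF) algorithm. Using the common least mean square (LMS) adaptive algorithm, it examines adaptive demodulation (ADEM) of DPSK signals and its performance. Computer simulation results show that adaptive DPSK demodulation outperforms conventional coherent DPSK demodulation and is easy to implement with digital signal processing techniques.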
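As a rough illustration of the LMS update that the abstract above builds on, the following is a minimal sketch of a generic LMS adaptive FIR filter. It is not the paper's DPSK demodulator: the function name `lms_filter`, the step size `mu`, the tap count, and the toy two-tap channel `true_h` are all assumptions chosen for the example.

```python
import random


def lms_filter(x, d, num_taps=4, mu=0.05):
    """Run a plain LMS adaptive FIR filter (illustrative sketch).

    x: input samples, d: desired samples of the same length.
    Returns (weights, errors).
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps inputs, newest first, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))  # filter output
        e = d[n] - y                              # estimation error
        # LMS update: w <- w + mu * e * u
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


# Toy system identification with a hypothetical 2-tap channel (assumed values).
random.seed(0)
true_h = [0.8, -0.3]
x = [random.choice([-1.0, 1.0]) for _ in range(2000)]
d = [true_h[0] * x[n] + (true_h[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
w, errors = lms_filter(x, d, num_taps=2, mu=0.05)
```

In this noise-free toy setting the weights `w` move toward `true_h`; a real demodulator would add the differential decoding and decision stages, which are outside this sketch.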
8.
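This paper presents a high-performance speaker-independent continuous speech recognition system for Mandarin digit strings. Its acoustic model is based on Mel cepstral coefficients and continuous HMMs; recognition uses a multi-candidate frame-synchronous search algorithm, and training uses the MCE algorithm to improve the system's discriminative power. Experiments show recognition rates of 94.8% (variable-length digit strings) and 96.8% (fixed-length digit strings). To make the system more practical, the paper also studies a MAP-based speaker adaptation algorithm and a confidence-based rejection algorithm. After adaptation, the error rate falls by more than 40% in relative terms; when 5% of correct utterances are rejected, the recognition rate rises to 96.9% (variable-length digit strings) and 98.7% (fixed-length digit strings).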
9.
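This article describes a speech signal processing system built around the ADSP21020 and gives a method for extracting speech signals implemented in ADSP21020 assembly language. An adaptive echo cancellation algorithm for communication lines is also simulated on this system.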
10.
11.
The author presents a study of large-vocabulary continuous Mandarin speech recognition based on a segmental probability model (SPM) approach. The SPM was found to be very suitable for recognition of isolated Mandarin syllables, especially considering the monosyllabic structure of the Chinese language. To extend the application of the model to continuous Mandarin speech recognition, a concatenated syllable matching (CSM) algorithm is first introduced in place of the conventional Viterbi search algorithm. Also, to utilise the available training material efficiently, a training procedure is proposed to re-estimate the SPM parameters using the maximum a posteriori (MAP) algorithm. A few special techniques integrating acoustic and linguistic knowledge are further developed to improve the performance step by step. Preliminary experimental results show that the final achievable rate is as high as 91.62%, which indicates an 18.48% error rate reduction, with recognition more than three times faster than the well-studied subsyllable-based CHMM.
12.
Lee L.-M. Chen J.-K. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(6):397-402
The authors deal with the problem of automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term to the power spectrum of the original clean speech. The cepstral coefficients of the noisy speech are then derived from this model. The reference cepstral vectors trained from clean speech are adapted to their appropriate noisy version to best fit the testing speech cepstral vector. The LPC coefficients, the LPC-derived cepstral coefficients, and the distance between test and reference are all regarded as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient-based algorithm is proposed to find the optimal noise ratio as well as the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on the Levinson-Durbin recursion is proposed to simultaneously calculate the LPC coefficients and their derivatives with respect to the noise ratio. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 males and 50 females) isolated Mandarin digit recognition demonstrate remarkable performance improvements over the noncompensated method in noisy environments. The results are also compared to the projection-based approach, and experiments show that the proposed method is superior to the projection approach in severely noisy environments.
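For readers unfamiliar with the recursion named in the abstract above, here is a minimal sketch of the standard Levinson-Durbin recursion for LPC coefficients. It covers only the textbook recursion, not the paper's extension that also tracks derivatives with respect to the noise ratio; the function name and the sign convention (`a[0] == 1`, predictor equal to minus the weighted sum of past samples) are assumptions made for this example.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err), where a[0] == 1.0, the prediction of x[n] is
    -sum(a[k] * x[n - k] for k in 1..order), and err is the final
    prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i] from the order-(i-1) solution.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Autocorrelation of an idealized first-order process, r[k] = 0.5 ** k.
r = [1.0, 0.5, 0.25]
a, err = levinson_durbin(r, 2)
```

For this input the second-order coefficient vanishes, so `a` is approximately `[1.0, -0.5, 0.0]` and `err` is `0.75`, as expected for a first-order process.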
13.
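The eigenphone speaker adaptation algorithm performs well when adaptation data are plentiful but overfits severely when they are scarce. To overcome this problem, this paper proposes a speaker adaptation algorithm based on an eigenphone speaker subspace. First, the basic principle of eigenphone speaker adaptation in a speech recognition system based on the hidden Markov model-Gaussian mixture model (HMM-GMM) is presented. Next, a speaker subspace is introduced to model the correlation between the eigenphone matrices of different speakers; a new eigenphone speaker subspace adaptation algorithm is then obtained by estimating speaker-dependent coordinate vectors. Finally, the eigenphone speaker subspace adaptation algorithm is compared with the traditional speaker subspace adaptation algorithm. Mandarin continuous speech recognition experiments on a Microsoft corpus show that, compared with the eigenphone speaker adaptation algorithm, the proposed algorithm greatly improves performance when adaptation data are extremely scarce and largely overcomes overfitting. Compared with the eigenvoice adaptation algorithm, it achieves lower space complexity at a small cost in performance, making it more practical.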
14.
15.
The Oregon Graduate Institute Multi-language Telephone Speech Corpus (OGI-TS) was designed specifically for language identification research. It currently consists of spontaneous and fixed-vocabulary utterances in 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. These utterances were produced by 90 native speakers in each language over real telephone lines. Language identification is related to speaker-independent speech recognition and speaker identification in several interesting ways. It is therefore not surprising that many of the recent developments in language identification can be related to developments in those two fields. We review some of the more important recent approaches to language identification against the background of successes in speaker and speech recognition. In particular, we demonstrate how approaches to language identification based on acoustic modeling and language modeling, respectively, are similar to algorithms used in speaker-independent continuous speech recognition. Thereafter, prosodic and duration-based information sources are studied. We then review an approach to language identification that draws heavily on speaker identification. Finally, the performance of some representative algorithms is reported.
16.
Minimizing morphological variances of the vocal tract across speakers is a challenge for articulatory analysis and modeling. In order to reduce morphological differences in speech organs among speakers and retain speakers’ speech dynamics, our study proposes a method of normalizing the vocal-tract shapes of Mandarin and Japanese speakers by using a Thin-Plate Spline (TPS) method. We apply the properties of TPS in a two-dimensional space in order to normalize vocal-tract shapes. Furthermore, we also use DNN (Deep Neural Networks) based speech recognition for our evaluations. We obtained our template for normalization by measuring three speakers’ palates and tongue shapes. Our results show a reduction in variances among subjects. The similar vowel structure of pre/post-normalization data indicates that our framework retains speaker-specific characteristics. Our results for the articulatory recognition of isolated phonemes show an improvement of 25%. Moreover, our phone error rate for continuous speech was reduced by 5.84%.
17.
Wu C.-H. Chen Y.-J. Yan G.-L. 《Vision, Image and Signal Processing, IEE Proceedings -》2000,147(1):55-61
Mandarin speech is known for its tonal characteristic, and prosodic information plays an important role in Mandarin speech recognition. Driven by this property, phonetic and prosodic information are integrated and used for Mandarin telephone speech keyword spotting. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 132 subsyllable models, two general acoustic filler models and one background/silence model are separately trained and used as the basic recognition units. For utterance verification, 12 anti-subsyllable models, 175 context-dependent prosodic models and five anti-prosodic models are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 3088 conversational speech utterances from 33 speakers (20 males and 13 females) and a vocabulary of 2583 faculty names, at 8.5% false rejection, the proposed verification method results in an 18.3% false alarm rate. Furthermore, this method is able to correctly reject 90.9% of non-keywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information can benefit the verification performance.
18.
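This paper proposes an eigenphone speaker adaptation method based on a low-rank constraint. The original eigenphone speaker adaptation method works well when adaptation data are sufficient, but when they are insufficient it overfits severely, so that the adapted system may perform worse than the unadapted one. First, a simplified eigenphone matrix estimation algorithm is derived for hidden Markov model-Gaussian mixture model speech recognition systems with diagonal covariance matrices. Then, a low-rank constraint is imposed on the eigenphone matrix, using the nuclear norm of the matrix as a convex approximation of its rank; adjusting the weight of the nuclear norm effectively controls the complexity of the adapted model. Finally, an accelerated proximal gradient algorithm is given to solve the optimization problem with the nuclear-norm regularization term introduced by the new algorithm. Speaker adaptation experiments on Mandarin continuous speech recognition show that the low-rank constraint clearly improves eigenphone speaker adaptation: with 5 to 50 s of adaptation data, it consistently outperforms maximum likelihood linear regression followed by maximum a posteriori (MLLR+MAP) adaptation.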
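The nuclear-norm regularization and accelerated proximal gradient steps mentioned in the abstract above can be written in a generic form. This is a sketch of the standard technique only: the symbols $V$ (the matrix being estimated), $f$ (a smooth data-fit term), $\lambda$ (the nuclear-norm weight), and $t$ (the step size) are assumptions, and the paper's actual objective is not given in the abstract.

```latex
% Generic nuclear-norm regularized problem (f is an assumed smooth data-fit term)
\min_{V} \; f(V) + \lambda \lVert V \rVert_{*},
\qquad \lVert V \rVert_{*} = \sum_{i} \sigma_i(V)

% Proximal step: singular value thresholding of Z = U \Sigma W^{\top}
\operatorname{prox}_{t\lambda \lVert \cdot \rVert_{*}}(Z)
  = U \, \max(\Sigma - t\lambda I,\, 0) \, W^{\top}

% Accelerated proximal gradient iteration, with \theta_1 = 1 and Y_1 = V_0
V_{k} = \operatorname{prox}_{t\lambda \lVert \cdot \rVert_{*}}
        \bigl( Y_{k} - t \, \nabla f(Y_{k}) \bigr), \qquad
\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_{k}^{2}}}{2}, \qquad
Y_{k+1} = V_{k} + \frac{\theta_{k} - 1}{\theta_{k+1}} \, (V_{k} - V_{k-1})
```

Increasing $\lambda$ zeroes out more singular values in the thresholding step, which is how the weight on the nuclear norm limits the rank, and hence the complexity, of the estimate.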
19.
A text-independent speaker recognition system based on the Karhunen-Loeve transform, derived from the split-and-merge algorithm, is proposed. The split-and-merge algorithm is applied to speaker data compression in the time domain. A set of experiments is conducted which gives a 91% recognition rate for 100 Mandarin speakers.