Found 19 similar documents (search time: 877 ms)
1.
Because Chinese is a syllable-based language whose articulation follows a "date-pit" (weak-strong-weak) pattern, a model for describing dynamic Chinese visemes is proposed that models lip motion within syllables and between syllables separately. Within a syllable, lip motion is described by sub-motion models based on initials and finals: lip feature parameters are first extracted for the articulation of initials and finals, mouth shapes are clustered by these parameters to obtain a simplified syllable viseme model, and the mouth-shape similarity between the lip sub-motions and the syllable's articulation process is then computed. Between syllables, a weight function with graded vowel influence simulates coarticulation: the influence of each vowel on the mouth shape of its following consonant is analyzed, and the weight function then controls the actually articulated mouth shape. Experimental results show that, compared with monophone or triphone representations of dynamic Chinese visemes, the method improves animation efficiency and makes Chinese lip animation more reasonable and natural.
2.
Chinese monosyllable recognition using one-third syllables as matching units [I]
Based on an analysis of the characteristics of Chinese speech and relevant digital signal processing theory, this paper proposes a method for recognizing Chinese monosyllables using one-third syllables as matching units. Following the structure of the Chinese syllable, each monosyllable is split into three matching units that are recognized separately; the results are then concatenated to reconstruct the monosyllable. The method lies between phoneme-level and syllable-level recognition. Experiments on a small inventory (104 syllables) show that it largely retains the low computation and storage costs of phoneme-level recognition while preserving the higher recognition rate of syllable-level recognition, making it an approach worth exploring. This paper mainly presents the principle of the method and experimental results for the final (rhyme) recognition part.
3.
This paper first introduces a method for testing the intelligibility of bone-conducted Chinese syllables, and then reports measurements under a range of conditions: different pickup positions, vibration pickups of different quality, different bandwidths, and different noise levels. It also analyzes the mishearings of initials and finals and the relationship between syllable intelligibility and auditory perception, and identifies the characteristics of bone-conducted speech and the maximum intelligibility attainable under optimal conditions.
4.
Initial/final segmentation of Chinese speech based on a fuzzy rough neural network
A method for segmenting initials and finals in continuous Chinese speech is proposed. Using extended initials and finals as recognition units and an overlapping-phone segmentation strategy for Chinese syllables, a fuzzy rough neural network performs the segmentation automatically. Laboratory experiments demonstrate that the method segments syllables effectively and soundly.
5.
Combining acoustic models trained on different units in continuous Chinese speech recognition
This paper studies the combination of acoustic models trained on different acoustic units. In continuous Chinese speech recognition, popular units include context-dependent initial/final units and phoneme units. Experiments show that some Chinese syllables are recognized more accurately under initial/final models and others under phoneme models. A method for combining the two kinds of acoustic model is proposed: both models are used simultaneously during recognition, while the model responsible for low recognition rates on a given syllable is avoided. Experiments show that with this method the syllable error rate falls by 9.60% and 6.10% relative to the phoneme and initial/final models, respectively.
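A rough sketch of the selection idea in this abstract: the syllable names, the accuracy figures, and the use of development-set accuracy as the per-syllable criterion are illustrative assumptions, not the paper's actual decision rule.

```python
# Hypothetical sketch: prefer whichever acoustic model (initial/final vs.
# phoneme) is historically more reliable for a given syllable.
# All syllables and accuracy numbers below are invented for illustration.

# Per-syllable accuracy estimated on a development set (invented numbers).
dev_accuracy = {
    "initial_final": {"ba": 0.95, "shi": 0.80, "zhong": 0.92},
    "phoneme":       {"ba": 0.90, "shi": 0.88, "zhong": 0.85},
}

def pick_model(syllable):
    """Return the model with higher development-set accuracy for this syllable."""
    return max(dev_accuracy, key=lambda m: dev_accuracy[m].get(syllable, 0.0))

def combined_hypothesis(syllable, hyp_if, hyp_ph):
    """Keep the hypothesis produced by the per-syllable preferred model."""
    return hyp_if if pick_model(syllable) == "initial_final" else hyp_ph

print(pick_model("shi"))  # phoneme
```

In a real decoder the choice would be made inside the search rather than after it, but the per-syllable preference table captures the paper's observation that neither unit inventory dominates for all syllables.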
6.
7.
After reviewing the strengths and weaknesses of current speech synthesis approaches, the authors argue that parametric, syllable-based synthesis is particularly well suited to Chinese. They design a Chinese speech synthesis system model that synthesizes an unlimited vocabulary from a finite syllable inventory, verify its feasibility experimentally, and indicate ways to further improve the naturalness of the synthesized speech.
8.
9.
Based on spectral-envelope parameters, this paper systematically analyzes and compares common methods for frequency-domain segmentation of isolated syllables. On that basis it proposes two new methods: a spectral-compression segmentation method based on the discrete Karhunen-Loeve transform (KLT), and a clustering segmentation method whose criterion is minimum within-segment scatter and maximum between-segment scatter. Experiments show that both methods considerably improve the segmentation of isolated Chinese syllables, providing new tools for phoneme segmentation and feature extraction in Chinese speech.
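The KLT compression step mentioned above amounts to projecting spectral-envelope frames onto the leading eigenvectors of their covariance matrix. A minimal sketch, with synthetic frame data and arbitrary dimensions standing in for real spectral envelopes:

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): the discrete
# Karhunen-Loeve transform of a set of spectral-envelope frames is the
# eigenbasis of their covariance matrix; projecting onto the leading
# eigenvectors compresses the spectra before segmentation.

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 32))  # 200 synthetic "spectral envelope" frames

def klt_compress(X, k):
    """Project the rows of X onto the k largest-variance KLT (eigen) axes."""
    Xc = X - X.mean(axis=0)                   # center the frames
    cov = np.cov(Xc, rowvar=False)            # 32 x 32 covariance matrix
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    basis = vecs[:, ::-1][:, :k]              # k largest-variance directions
    return Xc @ basis

compressed = klt_compress(frames, k=4)
print(compressed.shape)  # (200, 4)
```

Segmentation would then operate on the 4-dimensional trajectories rather than the raw 32-bin envelopes.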
10.
Through an analysis of the characteristics of Chinese speech and of the DFT spectra of the various phoneme classes, in particular the spectral differences between unvoiced and voiced sounds, this paper identifies two dynamic features that are comparatively well suited to syllable segmentation of continuous speech. It also proposes a quantitative description of how the minimum regions of the dynamic-feature curves are distributed: the valley-function method. On this basis a concrete segmentation algorithm is given. Experiments verify that the method segments continuous Chinese speech into syllables effectively. Finally, the method is applied to spectrogram analysis, achieving for the first time automatic syllable-level segmentation of dynamic spectrograms of continuous speech.
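A minimal sketch of the valley idea: hypothesize syllable boundaries at local minima of a feature curve. The paper's dynamic features and valley function are more elaborate; the synthetic energy curve here merely stands in for them.

```python
import numpy as np

# Simplified sketch of valley-based syllable boundary detection: boundaries
# are hypothesized at local minima of a frame-level feature curve (here a
# synthetic energy contour with two "syllable" bumps).

def find_valleys(energy):
    """Indices i where the curve strictly falls into i and does not fall after it."""
    return [i for i in range(1, len(energy) - 1)
            if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]]

# Two Hanning-window bumps imitate two syllables separated by a dip.
energy = np.concatenate([np.hanning(40), np.hanning(40)])
print(find_valleys(energy))  # [39]
```

A real system would add thresholds on valley depth and minimum syllable duration to reject spurious dips.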
11.
Sinusoidal-model feature analysis and auditory identification of Chinese speech
To study the acoustic features of Chinese speech, the sinusoidal model of the speech signal is applied to feature extraction and analysis. Applying a peak-matching algorithm to the model parameters yields a sinusoidal-model spectrogram that directly displays the details and evolution of the fundamental frequency and the formants, providing a visualization tool for speech signal analysis. On this basis, the first two formants of Chinese single-final syllables are analyzed; with controlled use of only a few principal…
12.
Pietro Laface, Signal Processing, 1980, 2(2):113-129
This paper describes the implementation of a Speech Understanding System component which tracks the formants of pseudo-syllabic nuclei containing voiced consonants. The nuclei are isolated from continuous speech after a precategorical classification in which feature extraction is carried out by modules organized in a hierarchy of levels. FFT and LPC spectra are the input to the formant tracking system. It works under the control of rules specifying the possible formant evolutions given previously hypothesized phonetic features, and it produces fuzzy graphs rather than the usual formant patterns because formants are not always evident in the spectrogram.
13.
A linear predictive coding (LPC) model based on time-dependent poles, which has yielded promising results when applied to synthetic data, is applied to real speech data. The data are processed pitch-synchronously using a simple procedure to identify regions of the data that best fit the model. The maximum-likelihood technique, which has been found to be robust in the presence of noise, is used to estimate the parameters. Resulting formant estimates for several diphthongs are presented. The algorithm tracks the formants well, both in stable regions and in regions of transition. This ability to track formant variation within analysis intervals is a definite advantage over traditional LPC. Results from speech data involving final stop consonants are presented. Rapid changes, particularly in the first and second formants, in the region immediately prior to the stop are detected. Such abrupt transitions are often not detected by traditional time-invariant methods.
14.
V. N. Sorokin, I. V. Geras'kin, Journal of Communications Technology and Electronics, 2013, 58(12):1292-1301
Two methods for estimating the vocal-tract length equivalent to the homogeneous acoustic tube length are investigated. One method calculates the tract length from the difference between the frequencies of adjacent local spectral maxima above 4 kHz. In the other method, the vocal-tract length is calculated from the average frequency of the second formant as determined by the frequencies of the first three formants. In addition, variants of the analysis are discussed both irrespective of the context and with allowance for known vowels. The probability that the speaker gender is correctly recognized via the two methods is about 13%, and its value is almost independent of knowledge of the context. The probabilities that male and female voices are correctly recognized from the spacing of the higher formants are, respectively, 31% and 25.5% regardless of the context, and 37% and 31% with allowance for it. The probabilities of correct recognition of male and female voices reach 27% and 21.5%, respectively, if context-independent recognition is performed from the average frequency of the second formant, and 43% and 35.5% after context-dependent recognition with the known vowel type.
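The first method's relation between spectral-maximum spacing and tube length follows from uniform-tube acoustics: a tube closed at one end resonates at odd multiples of c/(4L), so adjacent maxima are spaced dF = c/(2L) apart. A small sketch; the speed-of-sound value and the example spacing are assumptions.

```python
# For a uniform acoustic tube closed at one end, resonances fall at odd
# multiples of c/(4L), so adjacent resonances are spaced dF = c/(2L) apart.
# This inverts that relation, as in the paper's first method. The spacing
# value below is invented for illustration.

SPEED_OF_SOUND = 350.0  # m/s in warm, humid vocal-tract air (assumption)

def tract_length_from_spacing(delta_f_hz):
    """Equivalent uniform-tube length L = c / (2 * dF), in metres."""
    return SPEED_OF_SOUND / (2.0 * delta_f_hz)

# A 1000 Hz spacing between adjacent maxima above 4 kHz gives L = 17.5 cm.
print(tract_length_from_spacing(1000.0))  # 0.175
```

Since adult male tracts are longer than adult female tracts, the estimated L separates the genders, which is what the reported recognition probabilities measure.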
15.
Vijayalakshmi P, Reddy MR, O'Shaughnessy D, IEEE Transactions on Bio-Medical Engineering, 2007, 54(4):621-629
In this paper, we describe a group delay-based signal processing technique for the analysis and detection of hypernasal speech. Our preliminary acoustic analysis of nasalized vowels shows that, even though additional resonances are introduced at various frequency locations, the introduction of a new resonance in the low-frequency region (around 250 Hz) is found to be consistent. This observation is further confirmed by a perceptual analysis carried out on vowel sounds that are modified by introducing different nasal resonances, and by an acoustic analysis of hypernasal speech. Based on this, subsequent experiments focus only on the low-frequency region. The additive property of the group delay function can be exploited to resolve two closely spaced formants. However, when the formants are very close and have considerably wider bandwidths, as in hypernasal speech, the group delay function also fails to resolve them. To overcome this, we suggest a band-limited approach to estimating the formant locations. Using the band-limited group delay spectrum, we define a new acoustic measure for the detection of hypernasality. Experiments are carried out on the phonemes /a/, /i/, and /u/ uttered by 33 hypernasal speakers and 30 normal speakers. Using the group delay-based acoustic measure, the performance on a hypernasality detection task is found to be 100% for /a/, 88.78% for /i/, and 86.66% for /u/. The effectiveness of this acoustic measure is further cross-verified on speech data collected in an entirely different recording environment.
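The group delay function at the core of this approach can be computed from two DFTs using a standard identity: tau = (Xr*Yr + Xi*Yi)/|X|^2, where Y is the DFT of n*x[n]. This sketch shows only that computation, not the paper's band limiting or acoustic measure.

```python
import numpy as np

# Standard DFT-based computation of the group delay function tau(w) of x[n]:
# with X = DFT(x) and Y = DFT(n * x[n]),
#   tau = (X.real * Y.real + X.imag * Y.imag) / |X|^2.
# Its additivity over cascaded resonators helps resolve close formants.

def group_delay(x, nfft=1024):
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    denom = np.maximum(np.abs(X) ** 2, 1e-12)  # guard against division by zero
    return (X.real * Y.real + X.imag * Y.imag) / denom

# Sanity check: a pure delay x[n] = delta[n - 5] has group delay 5 everywhere.
x = np.zeros(64)
x[5] = 1.0
print(np.allclose(group_delay(x), 5.0))  # True
```

On real speech, peaks of this function over a restricted low-frequency band would feed the paper's hypernasality measure.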
16.
An algorithm is introduced which labels formants from the peaks of pole-focused spectra. A clustering procedure is first used to produce line segments of possible formants. These can be considered as anchor traces for the later processing. Rule-based labelling is then applied to provide final formant trace estimates. Experimental results show that the proposed algorithm offers improved formant labelling accuracy.
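One common way to obtain the formant candidates that such labelling rules operate on is from the pole angles of an all-pole model. A sketch under an assumed sampling rate and a single synthetic resonance; this is not the paper's pole-focusing procedure.

```python
import numpy as np

# Illustrative sketch: candidate formant frequencies from the complex poles
# of an all-pole polynomial A(z), via f = angle(pole) * fs / (2*pi).
# Sampling rate and the synthetic resonance below are assumptions.

FS = 8000.0  # sampling rate in Hz (assumption)

def formant_candidates(lpc_coeffs, fs=FS):
    """Frequencies (Hz) of upper-half-plane roots of A(z) = 1 + a1*z^-1 + ..."""
    roots = np.roots(lpc_coeffs)
    roots = roots[roots.imag > 0]              # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return sorted(freqs)

# Build A(z) from one known resonance at 500 Hz with pole radius 0.97:
f0, r = 500.0, 0.97
theta = 2 * np.pi * f0 / FS
a = [1.0, -2 * r * np.cos(theta), r * r]       # (1 - p*z^-1)(1 - conj(p)*z^-1)
print([round(f) for f in formant_candidates(a)])  # [500]
```

Clustering such per-frame candidates into line segments and then applying labelling rules is the structure the abstract describes.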
17.
18.
An approach to finding readable patterns corresponding to Arabic numerals is presented. A two-dimensional CRT indicator for depicting the approximate locations of the first and second formants of the spoken numerals is described.
19.
This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. Initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can learn speaker representations more efficiently. Investigations of the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.
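The syllable-level prosodic feature extraction described above might be sketched as follows; the frame shift, the exact feature definitions, and the synthetic contours are assumptions, and the VAE modelling stage is not shown.

```python
import numpy as np

# Hypothetical sketch of the feature-extraction step: given frame-level F0
# and energy contours plus syllable boundaries (from VOPs / energy valleys),
# summarize duration, energy, and F0 dynamics per syllable.

FRAME_SHIFT = 0.01  # seconds per frame (assumption)

def syllable_prosody(f0, energy, boundaries):
    """Per-syllable [duration_s, mean_energy, f0_slope_per_frame] vectors."""
    feats = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        seg_f0, seg_en = f0[start:end], energy[start:end]
        slope = np.polyfit(np.arange(len(seg_f0)), seg_f0, 1)[0]  # F0 trend
        feats.append([(end - start) * FRAME_SHIFT,
                      float(seg_en.mean()),
                      float(slope)])
    return feats

# Synthetic contours: a rising-F0 syllable followed by a falling-F0 syllable.
f0 = np.concatenate([np.linspace(120, 140, 20), np.linspace(200, 180, 30)])
energy = np.ones(50)
feats = syllable_prosody(f0, energy, [0, 20, 50])
print([round(v, 3) for v in feats[0]])  # [0.2, 1.0, 1.053]
```

Each syllable's small feature vector, rather than raw frames, would then be fed to the speaker-dependent VAE.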