
1.

Formant labelling using pole-focused spectra

Huang X.D. Jack M.A. Duncan G. 《Electronics letters》1987,23(20):1047-1048

An algorithm is introduced which labels formants from the peaks of pole-focused spectra. A clustering procedure is first used to produce line segments of possible formants. These can be considered as anchor traces for the later processing. Rule-based labelling is then applied to provide final formant trace estimates. Experimental results show that the proposed algorithm offers improved formant labelling accuracy.

2.

A time-varying analysis method for rapid transitions in speech

Nathan K.S. Lee Y.-T. Silverman H.F. 《Signal Processing, IEEE Transactions on》1991,39(4):815-824

A linear predictive coding (LPC) model based on time-dependent poles, which has yielded promising results when applied to synthetic data, is applied to real speech data. The data are processed pitch-synchronously using a simple procedure to identify regions of the data that best fit the model. The maximum-likelihood technique, which has been found to be robust in the presence of noise, is used to estimate the parameters. Resulting formant estimates for several diphthongs are presented. The algorithm tracks the formants well, both in stable regions and in regions of transition. This ability to track formant variation within analysis intervals is a definite advantage over traditional LPC. Results from speech data involving final stop consonants are presented. Rapid changes, particularly in the first and second formants, in the region immediately prior to the stop are detected. Such abrupt transitions are often not detected by traditional time-invariant methods.

3.

Speech synthesis using AM/FM sinusoids and band-pass noise

Nezih C. Geçkinli Tülay Güngen Hilmi Güngen Mehmet Etişkol 《Signal processing》1985,8(3):339-361

In the speech synthesis model presented in this paper, voiced speech is synthesized as the sum of two sinusoidally modulated FM sinusoids corresponding to the first and second formants. Each FM signal is generated such that its amplitude is equal to the formant amplitude, its carrier frequency to the formant frequency or its linear combination, its modulation frequency to the pitch, and its modulation index to one fifth of the carrier-to-modulation frequency ratio. Unvoiced speech is generated by shifting the center frequency of a low-pass noise with a bandwidth of 1 kHz to the frequency where the energy of the unvoiced speech is concentrated. The drawbacks of this scheme are that the pitch and the formant frequencies of the FM signals may deviate by up to 40% and 9%, respectively, and spurious formants may occur. A hardware implementation can be accomplished by driving linear analog circuitry, which can simply be integrated on a single chip, from a digital computer which supplies voltages every T = 5 ms corresponding to seven parameter values. Examples of the signals and spectrograms of synthesized speech obtained by both synthesis by analysis and synthesis by rule are given, along with a set of rules for text-to-speech synthesis of Turkish. It is observed that the speech synthesized by analysis loses the speaker's identity but is highly intelligible, while understanding the speech synthesized by rule requires a training period.
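As a minimal sketch of the FM rule stated in the abstract (carrier at the formant frequency, modulator at the pitch, modulation index equal to one fifth of the carrier-to-modulation frequency ratio), the following Python generates one FM sinusoid per formant and sums two of them. The function names, sample rate, and example formant and pitch values are assumptions for illustration; the sinusoidal amplitude modulation and the noise branch of the paper's model are not reproduced here.

```python
import numpy as np


def modulation_index(formant_hz, pitch_hz):
    """Modulation index = (carrier / modulation frequency) / 5, per the abstract."""
    return formant_hz / pitch_hz / 5.0


def fm_formant(amplitude, formant_hz, pitch_hz, duration_s=0.02, fs=16000):
    """One FM sinusoid: carrier at the formant frequency, modulator at the pitch."""
    t = np.arange(int(round(duration_s * fs))) / fs
    beta = modulation_index(formant_hz, pitch_hz)
    phase = 2.0 * np.pi * formant_hz * t + beta * np.sin(2.0 * np.pi * pitch_hz * t)
    return amplitude * np.cos(phase)


# Illustrative (assumed) values: F1 = 700 Hz, F2 = 1200 Hz, pitch = 100 Hz.
voiced = fm_formant(1.0, 700.0, 100.0) + fm_formant(0.5, 1200.0, 100.0)
```

Because the modulator runs at the pitch, the spectrum of each FM signal consists of lines spaced at the pitch frequency around the carrier, which is what lets a single FM sinusoid stand in for a formant.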

4.

A cepstral method for analysis of acoustic transmission characteristics of respiratory system

Jingping Xu Jingzhi Cheng Yanjun Wu 《IEEE transactions on bio-medical engineering》1998,45(5):660-664

The generation and transmission process of transmitted sound signals (TSS) is analyzed and a mathematical model of TSS is established. The power cepstral characteristics of TSS are studied based on the mathematical model, and a new method for analyzing the acoustic transmission of the respiratory system using a homomorphic processing technique is proposed. The experimental results show that the normal respiratory system has only one formant, while the abnormal respiratory system presenting lung consolidation has two formants, and the second formant plays an important role in that system. This new method is simple and effective.

5.

Vocal-tract length estimation

V. N. Sorokin I. V. Geras’kin 《Journal of Communications Technology and Electronics》2013,58(12):1292-1301

Two methods for estimating the vocal-tract length equivalent to the homogeneous acoustic tube length are investigated. One method is based on calculating the tract length from the difference between the frequencies of the adjacent local spectral maxima that exceed 4 kHz. In the other method, the vocal-tract length is calculated according to the average frequency of the second formant determined by the frequencies of the first three formants. In addition, various variants of analysis are discussed irrespective of the context and with allowance for known vowels. The probability that the speaker gender is correctly recognized via two methods is about 13%, and its value is almost independent of the knowledge of the context. The probabilities that male and female voices are correctly recognized according to the difference of higher formants are, respectively, 31 and 25.5% regardless of the context and 37 and 31% with allowance for it. The probabilities of correct recognition of male and female voices reach 27 and 21.5%, respectively, if context-independent recognition is performed from the average frequency of the second formant and 43 and 35.5% after context-dependent recognition with the known vowel type.
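For a uniform tube closed at the glottis and open at the lips, the resonances fall at F_n = (2n - 1)c / (4L), so adjacent resonances are spaced c / (2L) apart. A minimal sketch of both length calculations under that textbook assumption follows. The speed-of-sound value and the function names are assumptions, and the paper's own estimators (for example, how the average second-formant frequency is derived from the first three formants) may differ.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract


def length_from_spacing(f_lower_hz, f_upper_hz, c=SPEED_OF_SOUND_CM_S):
    """Tube length (cm) from two adjacent resonances: spacing = c / (2 L)."""
    spacing = f_upper_hz - f_lower_hz
    if spacing <= 0:
        raise ValueError("f_upper_hz must exceed f_lower_hz")
    return c / (2.0 * spacing)


def length_from_formant(f_hz, n, c=SPEED_OF_SOUND_CM_S):
    """Tube length (cm) from the n-th resonance: F_n = (2 n - 1) c / (4 L)."""
    if f_hz <= 0 or n < 1:
        raise ValueError("need f_hz > 0 and n >= 1")
    return (2 * n - 1) * c / (4.0 * f_hz)


# Two adjacent maxima above 4 kHz that are 1000 Hz apart, and a second
# resonance at 1500 Hz, both correspond to the same uniform tube.
by_spacing = length_from_spacing(4500.0, 5500.0)
by_formant = length_from_formant(1500.0, n=2)
```

Both calls describe the same 17.5 cm tube, which shows how the spacing-based and formant-based estimates agree when the tract really is uniform.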

6.

A new algorithm for extracting formants from speech signals

何峰 陈晓清 李国锁 林嘉宇 《信号处理》2007,23(4):618-621

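This paper proposes a new method for extracting the formants of speech signals. The frequency of the largest local maximum of the LPC magnitude spectrum is located and taken as the angle of the complex-conjugate root pair of one resonator in the vocal-tract model. The magnitude of this root pair is then obtained by combining the first and third derivatives of the phase-frequency response of the LPC coefficients, which determines the resonator and hence its formant. Next, the LPC polynomial is divided by the polynomial of this resonator to give new LPC coefficients, and the preceding steps are repeated. In this way the two formants with the largest amplitudes in the LPC spectrum can be extracted reasonably well.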
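The peak-search and polynomial-division steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the root radius is passed in as a known value rather than estimated from the first and third derivatives of the LPC phase response as the paper does.

```python
import numpy as np


def resonator_poly(freq_hz, radius, fs):
    """Second-order factor 1 - 2 r cos(theta) z^-1 + r^2 z^-2 of a conjugate root pair."""
    theta = 2.0 * np.pi * freq_hz / fs
    return np.array([1.0, -2.0 * radius * np.cos(theta), radius ** 2])


def strongest_peak_hz(lpc, fs, n_points=4096):
    """Frequency of the largest maximum of the LPC magnitude spectrum 1 / |A(e^jw)|."""
    w = np.linspace(0.0, np.pi, n_points, endpoint=False)
    k = np.arange(len(lpc))
    a_of_w = np.exp(-1j * np.outer(w, k)) @ lpc  # A(e^jw) on the frequency grid
    return w[np.argmin(np.abs(a_of_w))] * fs / (2.0 * np.pi)


def deflate(lpc, freq_hz, radius, fs):
    """Divide the LPC polynomial by one resonator; returns (quotient, remainder)."""
    return np.polydiv(lpc, resonator_poly(freq_hz, radius, fs))


# Synthetic LPC polynomial with resonances at 500 Hz (r = 0.98) and 1500 Hz (r = 0.90).
fs = 8000
lpc = np.convolve(resonator_poly(500.0, 0.98, fs), resonator_poly(1500.0, 0.90, fs))
first_peak = strongest_peak_hz(lpc, fs)
# The radius 0.98 is assumed known here (stand-in for the phase-derivative estimate).
reduced_lpc, remainder = deflate(lpc, 500.0, 0.98, fs)
second_peak = strongest_peak_hz(reduced_lpc, fs)
```

After the division, the strongest peak of the reduced polynomial belongs to the second resonator, which is why repeating the search yields the next formant.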

7.

An algorithm for formant tracking

Tülay Güngen Nezih C. Geçkinli 《Signal processing》1984,6(4):293-300

A formant tracking algorithm is presented that first forms strings of spectral peaks, considering only the relative positions of the peaks, and then assigns these strings to the formants according to the relative positions and lengths of the strings. An example is also given.

8.

A speech-driven facial animation method based on formant analysis

潘晋 杨卫英 《电声技术》2009,33(5):62-65

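The key problems in speech-driven facial animation are the fast, efficient automatic synthesis of lip shapes from speech and the synchronization of speech with lip movement. A speech-driven facial animation method based on formant analysis is proposed. The speech signal is windowed, split into frames, and transformed with the DFT; the first and second formants are then analyzed in the spectrum of each short-time audio segment, the results are mapped to a control sequence, and the control sequence is post-processed, for example by removing outliers. Dynamic basic mouth shapes are defined for a 3D face model, and the control sequence is fed to the model at timed intervals to drive the facial animation. Experimental results show that the method is simple and fast, effectively synchronizes speech and lip shape, and produces coherent, natural animation. It can be widely used for dubbing virtual characters and shortens the production cycle of virtual characters.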
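The front end named in this abstract (windowing, framing, and a DFT per frame) can be sketched as below. The frame length, hop size, Hamming window, and function name are assumptions, and the abstract does not say how the first and second formants are then picked from each spectrum, so that step is left out.

```python
import numpy as np


def frame_spectra(signal, frame_len=256, hop=128):
    """Split a signal into Hamming-windowed frames and return their DFT magnitudes."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)


# Test tone: 1000 Hz sampled at 8000 Hz, which falls exactly on DFT bin 32.
fs = 8000
tone = np.sin(2.0 * np.pi * 1000.0 * np.arange(1024) / fs)
spectra = frame_spectra(tone)
peak_hz = np.argmax(spectra, axis=1) * fs / 256
```

Each row of `spectra` is the short-time magnitude spectrum of one frame, the representation on which a per-frame formant analysis would operate.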

9.

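Feature analysis and auditory identification of Mandarin speech using a sinusoidal model (total citations: 1, self-citations: 0, citations by others: 1)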

张毅楠 肖熙 《电声技术》2011,35(8):38-41

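To study the acoustic features of Mandarin speech, the sinusoidal model of speech signals is applied to speech feature extraction and analysis. Applying a peak-matching algorithm to the model parameters yields a spectrogram based on the sinusoidal model. This spectrogram directly shows the details of the fundamental frequency and the formants in the speech signal and how they vary, providing a visual tool for speech signal analysis. On this basis, the first two formants of Mandarin syllables with a single-vowel final are analyzed, with the use restricted to a small number of main...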

10.

A Low-Rate Digital Formant Vocoder

Chong Un 《Communications, IEEE Transactions on》1978,26(3):344-355

A complete algorithm of a 1200-bits/s digital formant vocoder system is described. This vocoder algorithm draws heavily on the results of recent research in linear predictive coding. The transmitting parameters are frequencies and amplitudes of the first three formants, the pitch period, voiced/unvoiced decision, and the gain. Formant bandwidths are estimated at the synthesizer by using the amplitude information. The synthesizer structure is in the parallel form. The synthetic speech quality at 1200 bits/s is reasonably good; most of the speech is intelligible and speaker-recognizable.

11.

Hierarchical approach to formant detection and tracking through instantaneous frequency estimation

Ghaemmaghami S. Deriche M. Boashash B. 《Electronics letters》1997,33(1):17-18

Formant frequencies, represented by major peaks in the spectrum of speech signals, convey important information about speech. The authors propose a method for detecting the formants of voiced speech through 'instantaneous frequency' (IF) estimation using a recursive least squares (RLS) algorithm. The accuracy of the technique is assessed by comparing it with conventional formant detection techniques. This method is also analysed from the viewpoint of phonetic conformity using 'temporal decomposition'.

12.

Measuring and modeling vocal source-tract interaction

Childers D.G. Chun-Fan Wong 《IEEE transactions on bio-medical engineering》1994,41(7):663-671

The quality of synthetic speech is affected by two factors: intelligibility and naturalness. At present, synthesized speech may be highly intelligible, but often sounds unnatural. Speech intelligibility depends on the synthesizer's ability to reproduce the formants, the formant bandwidths, and formant transitions, whereas speech naturalness is thought to depend on the excitation waveform characteristics for voiced and unvoiced sounds. Voiced sounds may be generated by a quasiperiodic train of glottal pulses of specified shape exciting the vocal tract filter. It is generally assumed that the glottal source and the vocal tract filter are linearly separable and do not interact. However, this assumption is often not valid, since it has been observed that appreciable source-tract interaction can occur in natural speech. Previous experiments in speech synthesis have demonstrated that the naturalness of synthetic speech does improve when source-tract interaction is simulated in the synthesis process. The purpose of this paper is two-fold: (1) to present an algorithm for automatically measuring source-tract interaction for voiced speech, and (2) to present a simple speech production model that incorporates source-tract interaction into the glottal source model. This glottal source model controls: (1) the skewness of the glottal pulse, and (2) the amount of the first formant ripple superimposed on the glottal pulse. A major application of the results of this paper is the modeling of vocal disorders.

13.

Dynamic formant tracking of noisy speech using temporal analysis on outputs from a nonlinear cochlear model

Deng L. Kheirallah I. 《IEEE transactions on bio-medical engineering》1993,40(5):456-467

The authors take a modeling approach to studying representation of formant frequencies of spoken speech and speech in noise in the temporal responses of the peripheral auditory system. On the basis of the properties of the representation, they have devised and evaluated a cross-channel correlation algorithm and an interpeak interval analysis for automatic formant extraction of speech which is strongly dynamic in acoustic characteristics and is embedded in noise. The basilar membrane model used in this study contains laterally coupled damping elements, which are made monotonically dependent on the spatial distribution of the short-term power in the outputs of the model. Efficient digital implementation and the related salient numerical properties of the model are described. Simulation results from the model in response to speech and speech in noise illustrate temporal response patterns that are tonotopically organized in relation to speech formant parameters with little influence by the noise level. By utilizing such relations, the devised cross-channel correlation algorithm is shown to be capable of accurately tracking formant movements in spoken syllables and sentences.

14.

Controlling formants in voice conversion using spectral shifting

彭柏 许刚 《电声技术》2007,31(1):39-43

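Based on a study of spectral shifting and an analysis of the properties and variation of speech formants, an algorithm that adjusts formants by spectral shifting is proposed; it can effectively control formant trajectories to synthesize a vocal-tract model. The implementation flow of voice conversion is discussed, and the synthesized source model is applied to conversion between male and female voices. Experimental results and analysis show that the method allows flexible control of the formants and gives the converted speech a higher degree of blending.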

15.

A study of vowel formant trajectories of the first syllable in Mandarin disyllables

周忠诚 王孟杰 于水源 《电声技术》2007,31(3):8-10,13

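In Chinese, coarticulation depends mainly on the vowel at the end of the preceding syllable and the consonant at the start of the following syllable. This work examines, in Mandarin disyllables, how combining the vowel final of the first syllable with different initials of the second syllable affects the vowel formant trajectories of the first syllable. The vowel finals used are the three vowels at the vertices of the vowel triangle, and the regularities of the trajectory changes are summarized.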

16.

A pitch and formant extraction algorithm based on the speech spectrum

王坤赤 蒋华 《信息技术》2007,(10):20-22

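Extraction of the pitch frequency and formant frequencies is widely used in speech coding, speech synthesis, and speech recognition. Through an in-depth analysis of the time-domain and frequency-domain properties of speech signals, an effective pitch and formant extraction algorithm is designed around the characteristics of the speech magnitude spectrum. Parameter extraction is tested on real speech signals, and the experimental results show that the algorithm accurately extracts the pitch and formant frequencies of speech from different speakers and under different recording conditions.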

17.

Acoustical properties of speech as indicators of depression and suicidal risk

France DJ Shiavi RG Silverman S Silverman M Wilkes DM 《IEEE transactions on bio-medical engineering》2000,47(7):829-837

Acoustic properties of speech have previously been identified as possible cues to depression, and there is evidence that certain vocal parameters may be used further to objectively discriminate between depressed and suicidal speech. Studies were performed to analyze and compare the speech acoustics of separate male and female samples comprised of normal individuals and individuals carrying diagnoses of depression and high-risk, near-term suicidality. The female sample consisted of ten control subjects, 17 dysthymic patients, and 21 major depressed patients. The male sample contained 24 control subjects, 21 major depressed patients, and 22 high-risk suicidal patients. Acoustic analyses of voice fundamental frequency (Fo), amplitude modulation (AM), formants, and power distribution were performed on speech samples extracted from audio recordings collected from the sample members. Multivariate feature and discriminant analyses were performed on feature vectors representing the members of the control and disordered classes. Features derived from the formant and power spectral density measurements were found to be the best discriminators of class membership in both the male and female studies. AM features emerged as strong class discriminators of the male classes. Features describing Fo were generally ineffective discriminators in both studies. The results support theories that identify psychomotor disturbances as central elements in depression and suicidality.

18.

A multichannel loudness compensation algorithm based on formant extraction

赵毅 尹雪飞 陈克安 《信号处理》2012,28(3):352-360

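Formants are an important feature of speech signals and matter greatly for improving speech recognition by hearing-impaired patients. However, the loudness compensation algorithms commonly used in digital hearing aids today (multichannel loudness compensation and wide dynamic range compression) both damage the formant structure to some extent, which hinders patients' understanding of speech. Combining formant detection, this paper proposes a multichannel loudness compensation algorithm based on formant extraction: building on conventional multichannel loudness compensation, the filter bank is redesigned and a formant extraction module is added to protect the formants. Simulation results show that the algorithm achieves satisfactory compensation for four common types of impaired ears, and that it preserves the integrity of the formant structure better than multichannel loudness compensation and wide dynamic range compression.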

19.

New method for extracting speech formants using LPC phase spectrum (total citations: 1, self-citations: 0, citations by others: 1)

Cai Jinhai Jiang Gangji Zhang Lihe 《Electronics letters》1993,29(24):2081-2082

A new method for formant extraction using the LPC phase spectrum is proposed, which is especially effective in finding merged peaks. The bandwidth of a formant is easily calculated from the magnitude of the third derivative of the LPC phase spectrum.

20.

Formant coding of speech using dynamic programming

Dupree B.C. 《Electronics letters》1984,20(7):279-280

An algorithm is proposed which will obtain, from an input speech signal, formant parameter data to control a parallel formant speech synthesiser. By allowing some delay and employing variable-frame-rate techniques, the parameter data can be obtained at a low frame rate (typically 20 frames per second) suitable for transmission or storage.