Similar literature
 Found 20 similar documents (search time: 687 ms)
1.
This paper describes the implementation of a Speech Understanding System component which tracks the formants of pseudo-syllabic nuclei containing voiced consonants. The nuclei are isolated from continuous speech after a precategorical classification in which feature extraction is carried out by modules organized in a hierarchy of levels. FFT and LPC spectra are the input to the formant tracking system. It works under the control of rules specifying the possible formant evolutions given previously hypothesized phonetic features and produces fuzzy graphs rather than usual formant patterns because formants are not always evident in the spectrogram pattern.

2.
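Feature analysis and auditory discrimination of Mandarin speech using a sinusoidal model   Cited by: 1 (self-citations: 0, citations by others: 1)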
张毅楠  肖熙 《电声技术》2011,35(8):38-41
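To study the acoustic characteristics of Mandarin speech, a sinusoidal model of the speech signal is applied to feature extraction and analysis. Applying a peak-matching algorithm to the model parameters yields a spectrogram based on the sinusoidal model. This spectrogram directly shows the details of the pitch frequency and the formants in the speech signal and how they change over time, providing a visual tool for speech signal analysis. On this basis, the first two formants of Mandarin single-vowel syllables are analyzed; while restricting use to a small number of main...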

3.
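This paper proposes a new method for extracting formants from speech signals. The frequency of the largest local maximum of the LPC magnitude spectrum is located and taken as the angle of the complex-conjugate root pair that corresponds to one resonator of the vocal tract model. The magnitude of this conjugate root pair is then obtained by combining the first and third derivatives of the phase-frequency response of the LPC coefficients, which determines the resonator and therefore its formant. Next, the LPC polynomial is divided by the polynomial of that resonator to obtain new LPC coefficients, and the preceding steps are repeated. In this way the two formants with the largest amplitudes in the LPC spectrum can be extracted well.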
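The deflation step in this abstract (dividing the LPC polynomial by the polynomial of one resonator) can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the pole radius is taken as already known rather than estimated from the phase derivatives as the abstract describes, and the pole-to-formant conversion uses the standard relations F = θ·fs/(2π) and B = −ln(r)·fs/π.

```python
import math

def resonator_poly(radius, theta):
    """Coefficients of 1 - 2 r cos(theta) z^-1 + r^2 z^-2 (ascending powers of z^-1)."""
    return [1.0, -2.0 * radius * math.cos(theta), radius * radius]

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def deflate(lpc, radius, theta):
    """Divide the LPC polynomial by one resonator; return (quotient, remainder)."""
    divisor = resonator_poly(radius, theta)
    rem = list(lpc)
    quotient = []
    for i in range(len(lpc) - len(divisor) + 1):
        q = rem[i] / divisor[0]
        quotient.append(q)
        for j, d in enumerate(divisor):
            rem[i + j] -= q * d
    return quotient, rem[len(quotient):]

def formant_from_pole(radius, theta, sample_rate):
    """Convert a pole (radius, angle in radians) to (frequency_hz, bandwidth_hz)."""
    frequency = theta * sample_rate / (2.0 * math.pi)
    bandwidth = -math.log(radius) * sample_rate / math.pi
    return frequency, bandwidth
```

After one resonator is removed, the quotient serves as the new, lower-order LPC polynomial on which the peak search is repeated.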

4.
In the speech synthesis model presented in this paper, voiced speech is synthesized as the sum of two sinusoidally modulated FM sinusoids corresponding to the first and second formants. Each FM signal is generated such that its amplitude is equal to the formant amplitude, its carrier frequency to the formant frequency or its linear combination, its modulation frequency to the pitch, and its modulation index to one fifth of the carrier to modulation frequency ratio. Unvoiced speech is generated by shifting the center frequency of a low-pass noise with a bandwidth of 1 kHz to the frequency where the energy of the unvoiced speech is concentrated. The drawbacks of this scheme are that the pitch and the formant frequencies of the FM signals may deviate up to 40% and 9%, respectively, and spurious formants may occur. A hardware implementation can be accomplished by driving linear analog circuitry, which can simply be integrated on a single chip, by a digital computer which supplies voltages every T = 5 ms corresponding to seven parameter values. Examples of the signals and spectrograms of synthesized speech obtained by both synthesis by analysis and synthesis by rule are given along with a set of rules for text-to-speech synthesis of Turkish. It is observed that the speech synthesized by analysis loses the speaker's identity but is highly intelligible, while understanding the speech synthesized by rules requires a training period.
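The parameter mapping in this abstract can be illustrated with a short sketch. It assumes the plain FM form A·cos(2π·fc·t + I·sin(2π·f0·t)), uses the formant frequency itself as the carrier (the abstract also allows a linear combination), and leaves out the outer sinusoidal modulation and the unvoiced branch. All function names are hypothetical.

```python
import math

def modulation_index(carrier_hz, pitch_hz):
    """One fifth of the carrier-to-modulation frequency ratio, as stated in the abstract."""
    return (carrier_hz / pitch_hz) / 5.0

def fm_formant(amplitude, formant_hz, pitch_hz, sample_rate, num_samples):
    """One FM sinusoid: carrier at the formant frequency, modulated at the pitch."""
    index = modulation_index(formant_hz, pitch_hz)
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        phase = (2.0 * math.pi * formant_hz * t
                 + index * math.sin(2.0 * math.pi * pitch_hz * t))
        samples.append(amplitude * math.cos(phase))
    return samples

def voiced_frame(formants, pitch_hz, sample_rate, num_samples):
    """Sum the FM sinusoids for a list of (amplitude, frequency_hz) formant pairs."""
    total = [0.0] * num_samples
    for amplitude, formant_hz in formants:
        part = fm_formant(amplitude, formant_hz, pitch_hz, sample_rate, num_samples)
        total = [x + y for x, y in zip(total, part)]
    return total
```

For example, `voiced_frame([(1.0, 500.0), (0.5, 1500.0)], 100.0, 8000.0, 40)` produces one 5 ms frame at an assumed 8 kHz sampling rate, matching the T = 5 ms parameter update interval mentioned in the abstract.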

5.
潘晋  杨卫英 《电声技术》2009,33(5):62-65
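Fast, efficient automatic lip-shape synthesis driven by speech, together with better synchronization between speech and lip motion, is the central problem in speech-driven facial animation. A speech-driven facial animation method based on formant analysis is proposed. The speech signal is windowed, split into frames, and transformed with the DFT; the first and second formants are then analyzed from the spectrum of each short-time audio frame. The analysis results are mapped to a set of control sequences, which are post-processed, for example by removing outliers. Dynamic basic mouth shapes are defined for a 3D face model, and the control sequences are fed into the model on a timer to drive the facial animation. Experimental results show that the method is simple and fast, effectively synchronizes speech and lip shape, and produces coherent, natural animation. It can be widely used for dubbing virtual characters and shortens their production cycle.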
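The front end of this pipeline (windowing a frame, taking its DFT, and reading off two spectral peaks) can be sketched as follows. The abstract does not say how the formants are picked from the spectrum, so the peak rule here is a stand-in assumption: on real speech the raw DFT peaks are mostly pitch harmonics, and a smoothed envelope would be needed first. The Hamming window and all function names are likewise assumptions.

```python
import cmath
import math

def frame_spectrum(frame):
    """Magnitude spectrum (bins 0..N/2) of one Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        mags.append(abs(acc))
    return mags

def two_strongest_peaks(mags, sample_rate, frame_len):
    """Frequencies (Hz, ascending) of the two largest local maxima of the spectrum."""
    peaks = [(mags[k], k) for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(reverse=True)
    bins = sorted(k for _, k in peaks[:2])
    return [k * sample_rate / frame_len for k in bins]
```

Calling these two functions frame by frame gives a sequence of (F1, F2) estimates that a later stage could map to mouth-shape control values and clean of outliers.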

6.
The quality of synthetic speech is affected by two factors: intelligibility and naturalness. At present, synthesized speech may be highly intelligible, but often sounds unnatural. Speech intelligibility depends on the synthesizer's ability to reproduce the formants, the formant bandwidths, and formant transitions, whereas speech naturalness is thought to depend on the excitation waveform characteristics for voiced and unvoiced sounds. Voiced sounds may be generated by a quasiperiodic train of glottal pulses of specified shape exciting the vocal tract filter. It is generally assumed that the glottal source and the vocal tract filter are linearly separable and do not interact. However, this assumption is often not valid, since it has been observed that appreciable source-tract interaction can occur in natural speech. Previous experiments in speech synthesis have demonstrated that the naturalness of synthetic speech does improve when source-tract interaction is simulated in the synthesis process. The purpose of this paper is two-fold: (1) to present an algorithm for automatically measuring source-tract interaction for voiced speech, and (2) to present a simple speech production model that incorporates source-tract interaction into the glottal source model. This glottal source model controls: (1) the skewness of the glottal pulse, and (2) the amount of the first formant ripple superimposed on the glottal pulse. A major application of the results of this paper is the modeling of vocal disorders.

7.
Huang X.D., Jack M.A., Duncan G. 《Electronics Letters》 1987, 23(20): 1047-1048
An algorithm is introduced which labels formants from the peaks of pole-focused spectra. A clustering procedure is first used to produce line segments of possible formants. These can be considered as anchor traces for the later processing. Rule-based labelling is then applied to provide final formant trace estimates. Experimental results show that the proposed algorithm offers improved formant labelling accuracy.

8.
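Analysis of speech formant extraction by the LPC method   Cited by: 3 (self-citations: 1, citations by others: 2)
A study of formant extraction from speech signals by the LPC (linear predictive coding) method shows that the phase-frequency response can extract speech formants just as the log magnitude-frequency response can. Compared with the second derivative of the log magnitude-frequency response, the third derivative of the phase-frequency response has higher frequency resolution, resolves merged formants more effectively, and yields more accurate formant parameters.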

9.
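Two improved speech recognition methods are introduced: the one-sided autocorrelation LPC coefficient method and the linear prediction error method. Compared with the conventional linear predictive coding (LPC) method, both are more robust to noise, that is, they still achieve a high recognition rate in strongly noisy environments. The three methods are each applied to endpoint detection and speech recognition, and experimental data demonstrate the marked noise robustness of the two improved methods.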

10.
Formant frequencies, represented by major peaks in the spectrum of speech signals, convey important information about speech. The authors propose a method for detecting the formants of voiced speech through 'instantaneous frequency' (IF) estimation using a recursive least squares (RLS) algorithm. The accuracy of the technique is assessed by comparing it with conventional formant detection techniques. This method is also analysed from the viewpoint of phonetic conformity using 'temporal decomposition'.

11.
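To address the slow processing, computational complexity, and operational difficulty of traditional helium speech processing techniques, a machine-learning method for helium speech recognition is proposed. By using a deep network to learn high-dimensional information and extract multiple features, the method not only resolves the overfitting problem but also achieves a low word error rate (WER) and fast convergence. First, databases of isolated helium-speech words and continuous helium speech are built and the helium speech data are preprocessed; the extracted speech features mainly comprise formant features, pitch period features, and FBank (filter bank) features. The features are then fed into an acoustic model composed of a deep convolutional neural network (DCNN) and connectionist temporal classification (CTC), which models the mapping from speech to pinyin, and finally a Transformer language model produces the Chinese character output. Compared with a recognition model that extracts only FBank features, the isolated-word helium speech recognition model that extracts formant, pitch period, and FBank features reduces the WER by 7.91%, and the continuous helium speech recognition model reduces it by 14.95%. The best WER is 1.53% for the isolated-word model and 36.89% for the continuous model. The results show that the proposed method can recognize helium speech effectively.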
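The WER figures quoted in this abstract follow the usual definition: the minimum number of substitutions, deletions, and insertions needed to turn the recognized token sequence into the reference, divided by the reference length. A minimal sketch of that metric is given below; the function name is hypothetical, and the abstract does not state whether its tokens are words, characters, or pinyin syllables.

```python
def word_error_rate(reference, hypothesis):
    """Token-level edit distance between two sequences, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, 1):
        cur = [i]
        for j, hyp_tok in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,                          # deletion
                           cur[j - 1] + 1,                       # insertion
                           prev[j - 1] + (ref_tok != hyp_tok)))  # substitution or match
        prev = cur
    return prev[-1] / len(reference)
```

With a four-token reference and a hypothesis that substitutes one token and drops another, the function returns 0.5.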

12.
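A speaker conversion method based on a sinusoids-plus-noise model is proposed, with emphasis on converting the speaker by modifying acoustic parameters within phoneme segments. By modifying the pitch frequency and the formant structure, the speech synthesized by this method effectively imitates the characteristics of the target speaker. Listening tests show that the converted speech reaches 78.8% similarity to the target speaker's speech. A comparison experiment with the classical LPC method confirms the advantage of this method in synthesized speech quality.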

13.
A complete algorithm of a 1200-bits/s digital formant vocoder system is described. This vocoder algorithm draws heavily on the results of recent research in linear predictive coding. The transmitting parameters are frequencies and amplitudes of the first three formants, the pitch period, voiced/unvoiced decision, and the gain. Formant bandwidths are estimated at the synthesizer by using the amplitude information. The synthesizer structure is in the parallel form. The synthetic speech quality at 1200 bits/s is reasonably good; most of the speech is intelligible and speaker-recognizable.

14.
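A 1.8 kbps vocoder combining the MBE (multi-band excitation) model and the LPC (linear predictive coding) model is proposed. In this vocoder, LPC feature parameters represent the spectrum of each speech frame, the LPC residual is used for pitch extraction and multi-band voiced/unvoiced decisions, and speech is synthesized with the MBE model, with unvoiced components mixed into the synthesis of the high-frequency voiced bands. On the fixed-point Motorola DSP56002 EVM, speech can be encoded and decoded in real time at 1.8 kbps with small memory and computation requirements. The synthesized speech quality exceeds that of LPC-10e.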

15.
An information theory approach to the theory and practice of linear predictive coded (LPC) speech compression systems is developed. It is shown that a traditional LPC system can be viewed as a minimum distortion or nearest-neighbor system where the distortion measure is a minimum discrimination information between a speech process model and an observed frame of actual speech. This distortion measure is used in an algorithm for computer-aided design of block source codes subject to a fidelity criterion to obtain a 750-bits/s speech compression system that resembles an LPC system but has a much lower rate, a larger memory requirement, and requires no on-line LPC analysis. Quantitative and informal subjective comparisons are made between our system and LPC systems.

16.
The generation and transmission process of transmitted sound signals (TSS) is analyzed and a mathematical model of TSS is established here. The power cepstral characteristics of TSS are studied based on the mathematical model, and a new method for analyzing the acoustic transmission of the respiratory system using a homomorphic processing technique is proposed. The experimental results show that the normal respiratory system has only one formant, while the abnormal respiratory system presenting lung consolidation has two formants, and the second formant plays an important role in that system. This new method is simple and effective.

17.
Software and hardware have been developed to create a powerful, inexpensive, compact digital signal processing system which extracts a low-bit-rate linear predictive coding (LPC) speech system model in real time. The model parameters derived include accurate spectral envelope, formant, pitch, and amplitude information. The system is based on the Texas Instruments TMS320 family, and the most compact realization requires only three chips (TMS320E17, A/D-D/A, op-amp), consuming a total of less than 0.5 W. The processor is part of a programmable cochlear implant system under development by a multiuniversity Canadian team, but also has other applications in aids to the hearing handicapped.

18.
The authors take a modeling approach to studying the representation of formant frequencies of spoken speech and speech in noise in the temporal responses of the peripheral auditory system. On the basis of the properties of the representation, they have devised and evaluated a cross-channel correlation algorithm and an interpeak interval analysis for automatic formant extraction of speech which is strongly dynamic in acoustic characteristics and is embedded in noise. The basilar membrane model used in this study contains laterally coupled damping elements, which are made monotonically dependent on the spatial distribution of the short-term power in the outputs of the model. Efficient digital implementation and the related salient numerical properties of the model are described. Simulation results from the model in response to speech and speech in noise illustrate temporal response patterns that are tonotopically organized in relation to speech formant parameters with little influence by the noise level. By utilizing such relations, the devised cross-channel correlation algorithm is shown to be capable of accurately tracking formant movements in spoken syllables and sentences.

19.
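Research on and real-time implementation of a low-rate multi-band excitation speech compression coding algorithm   Cited by: 3 (self-citations: 0, citations by others: 3)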
崔慧娟  唐昆  郑海生  江灏 《电子学报》1998,26(10):129-132
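Taking the multi-band excitation (MBE) vocoder as the model, this paper applies several techniques to lower the coding rate and improve speech quality. A dynamic programming algorithm is used to smooth the pitch period, which removes the tonal noise common in vocoders. The MBE algorithm spends a large number of bits quantizing the spectral envelope; here the spectrum of an LPC all-pole model is used to approximate the MBE spectral envelope, and formant enhancement is applied to compensate for the model error. The spectral amplitude parameters are quantized with split vector quantization (SPVQ) and multi-stage vector quantization (MSVQ), so that at rates such as 2.4 kbps, 1.2 kbps, and 800 bps...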

20.
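Based on adaptive forward/backward quantization of the LPC, this paper proposes a variable-rate mixed excitation linear prediction (MELP) speech coding algorithm. The algorithm performs linear prediction on either the current speech frame (forward LPC) or a previously synthesized speech frame (backward LPC). When backward LPC is used, only the time-sequence code needs to be transmitted, which reduces the average number of bits spent coding the LPC coefficients. Computer simulation shows that the speech quality of this algorithm is comparable to that of the standard MELP algorithm, while the transmission bandwidth for the LPC is significantly reduced, which clearly lowers the average MELP coding rate.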
