Similar Literature
 Found 20 similar documents (search time: 312 ms)
1.
This paper proposes a flexible method for pitch contour modification using the instants of significant excitation of the vocal tract system during the production of speech. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to random excitations such as the onset of a burst in the case of nonvoiced speech. Instants of significant excitation are computed from the Linear Prediction (LP) residual of speech signals by using the property of the average group delay of minimum-phase signals. The pitch contour is modified by manipulating the LP residual using knowledge of the instants of significant excitation. The modified residual is used to excite the time-varying filter, whose parameters are derived from the original speech signal. The perceptual quality of the synthesized speech is good, with no significant distortion. The proposed method is evaluated using waveforms, spectrograms and listening tests. Listening tests are performed on a voice conversion application, where the source speaker's pitch contour is modified by the proposed method according to the target speaker's pitch contour. For the same voice conversion application, the performance of the proposed method is compared with the Linear Prediction Pitch Synchronous Overlap and Add (LP-PSOLA) method using listening tests.

2.
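Speech enhancement algorithm based on a multivariate Laplace speech model   Cited by: 1 (self-citations: 0, citations by others: 1)
Conventional short-time spectral estimation algorithms for speech enhancement usually assume that the speech spectral components are mutually independent and ignore the correlation between them. To address this, the paper proposes a new short-time spectral estimation algorithm based on a multivariate Laplace distribution model. First, the discrete cosine transform (DCT) coefficients of speech are assumed to follow a multivariate Laplace distribution, which makes it possible to exploit the correlation between spectral components. On this basis, the Gaussian scale mixture representation of multivariate random vectors is used to derive an analytical expression for the minimum mean-square error (MMSE) estimate of the speech DCT coefficient vector. The speech presence probability under this distribution model is then derived and used to correct the MMSE estimator. Experimental results show that the algorithm outperforms conventional speech enhancement methods in suppressing background noise and reducing speech distortion.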

3.
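An EMD-based power spectrum analysis method is proposed. The speech signal is first decomposed by empirical mode decomposition (EMD) into several intrinsic mode function (IMF) components, and the power spectrum of each IMF component that carries the main information is then estimated with a modern parametric-model method. Analysis of speech data recorded under different emotional states shows that EMD can be applied effectively to power spectrum analysis of non-stationary speech signals and reveals the intrinsic characteristics of the speech signal in finer detail.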

4.
郭海燕  杨震  朱卫平 《电子学报》2012,40(4):762-768
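 Building on the construction of a new sparse basis for speech signals, the quasi-KLT basis, this paper proposes a new single-channel mixed speech separation method based on sparse decomposition. Starting from the construction of ideal quasi-KLT bases, the paper first proposes and proves theoretically that, given the ideal quasi-KLT basis of each source speech signal, an ℓ0-norm optimization algorithm can separate single-channel mixed speech perfectly. Because the ideal quasi-KLT bases of the source signals cannot be obtained exactly during single-channel separation, the paper proposes to first construct, using the orthogonal matching pursuit algorithm with the mixed speech signal as the known condition, an orthogonal-matching-pursuit template-matched quasi-KLT basis for each source speech signal, and then to separate the single-channel mixed speech with the ℓ0-norm optimization algorithm. Simulation experiments confirm the correctness of the proposed theory and the effectiveness of separating single-channel mixed speech with the orthogonal-matching-pursuit template-matched quasi-KLT bases.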

5.
An adaptive speech streaming method is proposed to improve the perceived speech quality of a software-based multipoint control unit (SW-based MCU) over IP networks. First, the proposed method predicts whether the speech packet to be transmitted will be lost. To this end, it learns the pattern of packet losses in the IP network and then predicts the loss of the packet to be transmitted over that network. The proposed method also classifies each speech frame as silence, unvoiced, speech onset, or voiced. Based on the results of packet loss prediction and speech classification, the proposed method determines the proper amount and bitrate of redundant speech data (RSD) that are sent with primary speech data (PSD) in order to help the speech decoder restore the speech signals of lost packets. Specifically, when a packet is predicted to be lost, the amount and bitrate of the RSD must be increased through a reduction in the bitrate of the PSD. The effectiveness of the proposed method for learning the packet loss pattern and assigning a different speech coding rate is then demonstrated using a support vector machine and adaptive multirate-narrowband (AMR-NB) coding, respectively. The results show that, compared with conventional methods that restore lost speech signals, the proposed method remarkably improves the perceived speech quality of an SW-based MCU under various packet loss conditions in an IP network.

6.
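Speech bandwidth extension is a technique that reconstructs the high-frequency band of speech from the correlation between its low- and high-frequency bands in order to improve speech quality. The Gaussian mixture model (GMM) method is widely used in speech bandwidth extension. However, because it assumes that the high and low bands of speech follow Gaussian distributions and captures only the linear relationship between them, the synthesized high-band speech is distorted. This paper therefore proposes a method based on restricted Boltzmann machines: two Gaussian-Bernoulli restricted Boltzmann machines extract the higher-order statistical properties contained in the low and high bands of speech, and a feed-forward neural network maps the low-band higher-order statistical parameters to the high-band ones. By extracting these higher-order statistics, the method can capture the actual relationship between the high and low bands in greater depth, model the spectral envelope distribution more accurately, and synthesize higher-quality speech. Objective and subjective tests show that the method outperforms the conventional GMM method.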

7.
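The paper focuses on the frame-independent long-term prediction used in the iLBC codec. Frame-independent long-term prediction is a way to exploit pitch-lag correlation in coded speech without suffering the multi-frame speech degradation associated with transmission loss. The paper then presents the mean opinion scores (MOS) of the iLBC, G.729A and G.723.1 codecs, uses example signals to illustrate the differences between codecs based on frame-independent long-term prediction and CELP codecs, and finally uses speech reconstruction examples to show the difference in speech quality between the two.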

8.
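A keyword spotting method for continuous speech based on a point process model (PPM) is proposed. The method first uses temporal pattern (TRAP) features and a multilayer perceptron (MLP) to compute the frame-level posterior probability of each phoneme. On this basis, speech is treated as a set of mutually independent events (phonemes), and a point process model of the events is built with a Poisson process. Finally, keywords are detected by computing a likelihood ratio. Experimental results show that, for speech sampled at 8 kHz, the average keyword recall and precision exceed 69.5% and 82%, respectively.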

9.
A nonlinear-feature-based method for recognizing variant speech under stress   Cited by: 2 (self-citations: 1, citations by others: 1)
王玉伟  张磊  韩纪庆 《信号处理》2002,18(5):484-486
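Considering the nonlinear nature of how variant speech is produced, this paper proposes a method for recognizing variant speech under stress based on cepstral features derived from the Teager energy operator (TEO). The speech signal is first split into 21 signals in different frequency bands, the TEO energy of each band is then computed, and finally a logarithm and a discrete cosine transform are applied. In small-vocabulary, speaker-dependent recognition experiments on speech collected in a flight simulator, the TEO-based cepstral feature method, which relies on nonlinear analysis, effectively improved the recognition of variant speech and raised the recognition rate by 11.3% over the conventional MFCC-based method.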
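
The abstract above does not spell out the operator itself, so the following is only a minimal sketch of the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], applied to one band signal. The 21-band filter bank, the logarithm and the DCT are not shown; the function name `teager_energy` and the test tone are assumptions made for illustration.

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (the two endpoints are skipped).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test tone: amplitude 0.5, digital frequency 0.3 rad/sample.
amplitude, omega = 0.5, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(64)]
energy = teager_energy(tone)

# For a pure sinusoid the operator is constant and equals (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is the property that makes the operator attractive as a nonlinear energy measure.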

10.
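To address the low accuracy of local binary pattern (LBP) based spoofed speech detection on synthetic speech, a spoofed speech detection method based on center-symmetric local binary patterns is proposed. The method obtains the spectrogram of the speech signal with the short-time Fourier transform, extracts texture features from the spectrogram using center-symmetric local binary patterns, and trains a random forest classifier on these texture features to distinguish genuine from spoofed speech. The method takes into account both the values and the positional relationships of the spectrogram pixels, captures more complete texture information, and reduces the feature dimension to 16, which lowers the computational cost. Experimental results on the ASVspoof 2019 dataset show that, compared with the conventional LBP-based spoofed speech detection method, the proposed method reduces the tandem detection cost function (t-DCF) for synthetic spoofed speech by 16.98% and increases detection speed by 89.73%.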
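
As a rough illustration of why the feature has 16 dimensions, the sketch below computes a center-symmetric LBP histogram over a 3x3 neighbourhood: the four pairs of opposite neighbours are compared, giving a 4-bit code and therefore 16 histogram bins. The 3x3 neighbourhood, the zero threshold and the function name `cs_lbp_histogram` are assumptions; the abstract does not state the paper's exact settings, and the STFT and random forest stages are omitted.

```python
def cs_lbp_histogram(image, threshold=0.0):
    """16-bin histogram of center-symmetric LBP codes over a 2-D list of numbers."""
    # The four center-symmetric pixel pairs of a 3x3 neighbourhood, as (row, col) offsets.
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, 1), (0, -1))]
    hist = [0] * 16
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = 0
            for bit, ((dr1, dc1), (dr2, dc2)) in enumerate(pairs):
                # Each pair contributes one bit: set when the difference exceeds the threshold.
                if image[r + dr1][c + dc1] - image[r + dr2][c + dc2] > threshold:
                    code |= 1 << bit
            hist[code] += 1
    return hist

# Toy "spectrogram" patch (hypothetical values): one interior pixel.
patch = [[9, 8, 7],
         [4, 5, 6],
         [1, 2, 3]]
patch_hist = cs_lbp_histogram(patch)  # the single interior pixel gets code 15
```

A classic 8-neighbour LBP would need 256 bins for the same neighbourhood, which is where the reduction in feature size comes from.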

11.
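A new method for discriminating speech from laughter based on spectral stability features   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper proposes a new method for discriminating speech from laughter that uses spectral stability as the feature parameter. Analysis of the spectral stability parameters of speech and laughter shows that the former is markedly smaller than the latter, which indicates that spectral stability can serve as a feature for discriminating speech from laughter. The performance of spectral stability, Mel-frequency cepstral coefficients, perceptual linear prediction and pitch frequency as features for discriminating speech from laughter is compared under the same experimental conditions. The results show that, in the speaker-dependent and speaker-independent cases, spectral stability achieves correct discrimination rates of 90.74% and 73.63%, respectively, and discriminates better than the other features.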

12.
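Speaker-dependent medium-vocabulary continuous speech recognition based on HMM/VQ   Cited by: 2 (self-citations: 2, citations by others: 0)
This paper discusses a continuous speech recognition method based on hidden Markov models (HMM) and vector quantization (VQ). In this method, one HMM is built for each word, and the state-transition network formed by combining multiple models is searched for the best state-transition path, so that continuous speech is recognized without prior word segmentation. Using finite-state grammar constraints and other measures that improve recognition performance, the demonstration system can recognize 18 English sentence patterns and 150 words for a specific speaker. In a test with 312 utterances (2710 words in total), the recognition delay was 62% of the utterance duration, the average speaking rate was 2.32 words per second, and the word recognition accuracy was 97.3%.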
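
The abstract does not name the search algorithm used to find the best state-transition path. The sketch below assumes the usual choice, the Viterbi algorithm, on a discrete-observation HMM whose observations are VQ codebook indices. The two-state left-to-right model and all probabilities are invented for illustration and are not taken from the paper.

```python
import math

def viterbi(observations, start_p, trans_p, emit_p):
    """Most likely state path for a discrete-observation HMM (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    score = [log(start_p[s]) + log(emit_p[s][observations[0]]) for s in range(n_states)]
    backptr = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states), key=lambda p: prev[p] + log(trans_p[p][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][obs]))
        backptr.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical left-to-right model with two states and a 2-entry VQ codebook.
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]
best_path = viterbi([0, 0, 1, 1], start, trans, emit)  # -> [0, 0, 1, 1]
```

In a connected-word recognizer the same search runs over the network formed by chaining the word models, so word boundaries fall out of the best path instead of being fixed in advance.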

13.
In mobile communications such as automobile telephone systems, rapid fading causes instantaneous interruptions that markedly degrade speech quality. A pitch-synchronized interpolation method is proposed to improve this quality. The method exploits the fact that most speech signal regions are periodic with a pitch period. When an instantaneous interruption is detected at the receiver, the portion of the speech signal that contains the interruption noise is deleted and replaced by repeating the speech signal received one pitch interval before the interruption. The method can be applied to receivers used in both analog and digital transmission systems. A receiver using this method has been constructed, and it was shown to obtain a 10-dB carrier-to-noise ratio (CNR) gain.
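
The replacement step described above can be sketched in a few lines, assuming the interruption boundaries and the pitch period (in samples) are already known. Detection of the interruption and pitch estimation are outside this sketch, and the function name `conceal_interruption` is hypothetical.

```python
def conceal_interruption(signal, start, end, pitch_period):
    """Replace signal[start:end] by repeating the waveform one pitch period earlier."""
    if pitch_period <= 0 or start < pitch_period:
        raise ValueError("need at least one clean pitch period before the gap")
    repaired = list(signal)
    for n in range(start, end):
        # Reading from `repaired` lets gaps longer than one period keep repeating.
        repaired[n] = repaired[n - pitch_period]
    return repaired

# Hypothetical periodic signal (period 4) with samples 6..11 destroyed.
clean = [0, 1, 2, 3] * 4
damaged = clean[:6] + [99] * 6 + clean[12:]
restored = conceal_interruption(damaged, start=6, end=12, pitch_period=4)
```

On a perfectly periodic signal the repair is exact; on real speech it is only an approximation, since the pitch period and waveform shape drift over time.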

14.
张志华  王炳锡  彭煊 《电声技术》2005,(5):52-54,69
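A new voice activity detection method is presented in which, building on the SNR algorithm, linear discriminant analysis (LDA) is applied to reduce the dimensionality of the speech feature parameters. In high-noise environments the method improves the robustness of the system. The new method is also compared with algorithms based on signal-to-noise ratio (SNR) and on noise/speech statistics (N&S STAT); experiments show that it improves detection efficiency.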

15.
The predicted wordlength assignment system (PWA) is a digital speech interpolation method which avoids speech clipping and "freeze-out" distortion. Inactive sources are excluded by a speech detector. The active speech signals are coded with variable wordlengths (3-8 bits) at a sampling rate of 8 kHz. In an overload case, all active sources are still served, but at reduced wordlength. The required wordlength is calculated using only the signal history, which is also available at the receiver. Therefore, no auxiliary information about the individual wordlength is transmitted. A system with up to 128 telephone conversation speech sources has been studied using computer simulation. The signal-to-noise ratio (SNR) is employed to describe speech quality. With an input of 128 sources (40 percent activity) and a transmission rate per source of 21 kbit/s, an SNR of 34 dB can be achieved. Above a bit rate of 16 kbit/s, distortions are not audible. As a first step towards implementation, a specially designed fast microprocessor has been used to simulate the most important PWA system functions, such as speech detection, linear prediction, and the coding algorithm.

16.
In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) with a temporal structure normalization (TSN) filter. Because a DAE has a deep structure and nonlinear processing steps, it is flexible enough to model a highly nonlinear mapping between the input and output spaces. We train a DAE to map reverberant and noisy speech features to the underlying clean speech features in the cepstral domain. After applying the DAE in the cepstral domain to suppress reverberation, we apply a post-processing step based on the TSN filter, which reduces noise and reverberation effects by normalizing the modulation spectra to the reference spectra of clean speech. The proposed method was evaluated using speech in simulated and real reverberant environments. By combining the cepstral-domain DAE and TSN, the average word error rate (WER) was reduced from 25.2% for the baseline system to 21.2% in simulated environments, and from 47.5% to 41.3% in real environments.

17.
陈浩  鲍长春  夏丙寅 《信号处理》2014,30(7):813-821
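To reduce the large amount of residual noise left by the dual-microphone method based on the phase-difference filter (PBF), this paper builds on the PBF method and proposes a dual-microphone noise reduction method based on Gaussian mixture models. The method first uses Gaussian mixture models (GMM) to model two cases: target speech present (λ1) and target speech absent (λ0). Then, in the real-time enhancement stage, the target speech presence probability (TSPP) of each frame is computed with a Bayesian classifier, and the gain function of the PBF is modified according to a noise-suppression maximization criterion to obtain an improved phase-difference filter (IPBF). Finally, the TSPP is combined with the gain function of the IPBF to obtain a masking filter for dual-microphone noise reduction. Experimental results show that the proposed algorithm effectively suppresses residual noise, especially in periods when the target speech is absent.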
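
The Bayesian classification step can be illustrated with a small sketch. It assumes a scalar per-frame feature and one-dimensional GMMs for the two hypotheses λ1 (speech present) and λ0 (speech absent); the paper's actual feature, model sizes and priors are not given in the abstract, so every name and number below is hypothetical.

```python
import math

def gmm_pdf(x, components):
    """Likelihood of scalar x under a 1-D GMM given as (weight, mean, variance) triples."""
    return sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
               for w, m, v in components)

def speech_presence_probability(x, gmm_present, gmm_absent, prior_present=0.5):
    """Posterior P(speech present | x) from Bayes' rule with two class-conditional GMMs."""
    num = prior_present * gmm_pdf(x, gmm_present)
    den = num + (1.0 - prior_present) * gmm_pdf(x, gmm_absent)
    return num / den

# Hypothetical single-component models centred at +1 (present) and -1 (absent).
lambda1 = [(1.0, 1.0, 1.0)]
lambda0 = [(1.0, -1.0, 1.0)]
p_mid = speech_presence_probability(0.0, lambda1, lambda0)   # equally likely: 0.5
p_high = speech_presence_probability(2.0, lambda1, lambda0)  # strongly "present"
```

A posterior of this kind lies between 0 and 1 per frame, which is what allows it to be combined with a filter gain as a soft mask.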

18.
In stress speech recognition, a recognition model capable of processing multi-stress speech needs to be designed with both accuracy and add-ability in mind. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multi-stress speech, we propose a multiplexing topology that combines multiple stress speech models. Since each stress affects speech in a different way, having a speech recognition model that is specifically trained to recognize words affected by that stress helps improve the recognition rate. However, since each stress speech model gives its own independent recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, MFCC extraction is applied to the input speech. The result is fed into an HMM that is segmented into N parts. Each segment provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on these tentative recognized words from the segments of all stress speech models, the final recognized word is decided using a coarse-to-fine concept carried out by a majority vote, a segment-weighted difference square score and a next best score, in that order. Besides neutral speech, the proposed method was verified using three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7% recognition rate, compared with 94.2% for the training-based decision method.
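
Only the first, coarse stage of the decision module is concrete enough in the abstract to sketch: a majority vote over the tentative words produced by the segments of all stress models. The finer stages (segment-weighted difference square score, next best score) are not defined there, so the hypothetical function below just reports whether the vote has a unique winner, which is when a finer stage would presumably not be needed.

```python
from collections import Counter

def majority_vote(tentative_words):
    """Return (word, is_unique_winner) for a non-empty list of tentative recognized words."""
    counts = Counter(tentative_words).most_common()
    top_word, top_count = counts[0]
    # The winner is unique when no other word reaches the same count.
    unique = len(counts) == 1 or counts[1][1] < top_count
    return top_word, unique

# Hypothetical tentative outputs from the segments of several stress models.
votes = ["yes", "yes", "no", "yes", "go", "no"]
decision = majority_vote(votes)  # -> ("yes", True)
```

When the flag is False the vote is tied, and the decision would have to fall through to the finer scoring stages named in the abstract.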

19.
The authors propose a channel compensation method for the hidden Markov model (HMM) parameters in automatic speech recognition. The proposed approach adapts the existing reference models to a new channel environment using a small amount of adaptation data. HMM parameters are adapted by incorporating the corresponding phone-dependent channel compensation (PDCC) vectors, which improves speech recognition performance. Two extended PDCC techniques are presented. One is based on the refinement of PDCC using vector quantisation. The other is based on the interpolation of compensation vectors. Both techniques are evaluated in experiments on telephone speech recognition and speaker adaptation. The experimental results show that the performance can be significantly improved.

20.
胡丹  曾庆宁  龙超  黄桂敏 《电视技术》2015,39(24):43-46
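To address the low recognition rate in large-vocabulary continuous speech recognition, this paper proposes cascading a speech enhancement stage at the front end of the recognition system. The enhancement stage combines spectral subtraction and the log minimum mean-square error algorithm (logmmse) with the minima controlled recursive averaging algorithm (imcra), which is used for noise estimation. The recognition system extracts Mel-frequency cepstral coefficient (MFCC) features and uses hidden Markov models (HMM) for training and recognition. Experimental results show that the proposed method raises the word recognition rate by as much as 38.9% and the sentence accuracy by 21.8%. The method is feasible and effective for large-vocabulary continuous speech recognition.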
