
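Voice Activity Detection Algorithm Based on the Mean of Long- and Short-Time Energy
Cite this article: YOU Datao, HAN Jiqing, DENG Shiwen. Voice activity detection algorithm based on the mean of long- and short-time energy[J]. Intelligent Computer and Applications, 2011(2): 35-39.
Authors: YOU Datao, HAN Jiqing, DENG Shiwen
Affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] College of Mathematics, Harbin Normal University, Harbin 150001
Funding: National Basic Research Program of China (973 Program), grant 2007CB311100.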
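Abstract: To effectively suppress the severe interference that non-stationary background noise causes in speech processing systems, this paper proposes a voice activity detection (VAD) algorithm based on the mean of long- and short-time energy. The algorithm rests on two reasonable assumptions. First, sparse decomposition over a set of speech underlying components retains as much of the speech information in noisy speech as possible, while also removing, to some extent, the interference of non-speech noise. Second, when the sparsely decomposed speech is reconstructed, the speech segments of the reconstructed signal have higher time-domain energy than the non-speech segments. Building on these two assumptions, the algorithm uses the time-domain energy of the reconstructed signal as the audio feature. Centered on the current frame, the mean short-time energy of a fixed number of neighboring frames serves as the current frame's score, and the mean long-time energy of the current frame and a fixed number of preceding frames serves as the decision threshold. Each frame is then classified by comparing its short-time energy mean with its long-time energy mean. Experimental results show that the algorithm effectively distinguishes speech from non-speech segments at low signal-to-noise ratios, under both stationary and non-stationary noise, and that it outperforms the likelihood ratio test algorithm based on a single Gaussian distribution.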

Keywords: speech underlying components; sparse decomposition; energy mean; voice activity detection
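The score/threshold rule summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no window sizes, so `half_short` (frames on each side of the current frame in the short-time window) and `long_len` (length of the long-time window) are hypothetical parameters; windows are simply clipped at the signal edges; and `frame_energies` computes plain sum-of-squares energy on whatever signal it is given (the paper applies it to the sparsely reconstructed signal). Here the long-time window includes the current frame and up to `long_len - 1` preceding frames, which is one reading of the abstract.

```python
from typing import List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Time-domain energy (sum of squared samples) of each full frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def energy_mean_vad(energies: Sequence[float],
                    half_short: int = 2,   # hypothetical, not given in the paper
                    long_len: int = 50     # hypothetical, not given in the paper
                    ) -> List[bool]:
    """Label each frame as speech (True) or non-speech (False).

    score     = mean energy of frames t - half_short .. t + half_short (clipped)
    threshold = mean energy of frame t and up to long_len - 1 preceding frames
    """
    n = len(energies)
    labels = []
    for t in range(n):
        # Short-time window centered on the current frame.
        lo = max(0, t - half_short)
        hi = min(n, t + half_short + 1)
        score = sum(energies[lo:hi]) / (hi - lo)
        # Long-time window ending at (and including) the current frame.
        start = max(0, t - long_len + 1)
        threshold = sum(energies[start:t + 1]) / (t + 1 - start)
        # Speech only if the short-time mean strictly exceeds the long-time mean.
        labels.append(score > threshold)
    return labels
```

Because the threshold in this sketch is a running mean, a speech run longer than `long_len` frames gradually raises its own threshold, and the centered short-time window can mark a frame or two just before an onset as speech. How strongly either effect shows up depends on the hypothetical window sizes.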

The Voice Activity Detection Algorithm based on the Mean of Long and Short Time Frame Energy
YOU Datao,HAN Jiqing,DENG Shiwen.The Voice Activity Detection Algorithm based on the Mean of Long and Short Time Frame Energy[J].INTELLIGENT COMPUTER AND APPLICATIONS,2011(2):35-39.
Authors: YOU Datao, HAN Jiqing, DENG Shiwen
Affiliations: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 2. College of Mathematics, Harbin Normal University, Harbin 150001, China
Abstract: Aiming to remove the effect of non-stationary noise on speech processing systems, this paper proposes a voice activity detection (VAD) algorithm that mainly uses the mean of long- and short-time frame energy. The proposed VAD algorithm is based on two hypotheses. The first is that, when the signal is decomposed by sparse representation over a set of speech underlying structures, the speech energy is effectively preserved while the non-speech energy is partly removed. The second is that, when the time-domain signal is reconstructed from the sparse coefficients generated by this sparse representation, the energy of the speech part of the reconstructed signal is larger than that of the non-speech part. In the proposed VAD algorithm, the energy of the reconstructed signal is used as the audio feature. Taking the current frame as the center, the mean short-time energy of a certain number of adjacent frames is used as the score of the current frame, and the mean long-time energy of a fixed number of frames before the current frame is used as the detection threshold. The current frame is labeled as speech when its score is larger than the detection threshold, and as non-speech otherwise. The experimental results show that the proposed VAD algorithm distinguishes speech from non-speech effectively and outperforms the Gaussian-distribution-based likelihood ratio test (LRT Gaussian) algorithm.
Keywords: speech underlying structure; sparse representation; energy mean; voice activity detection
This article is indexed in VIP (Weipu) and other databases.