1.

Preprocessing and Segmentation of the Speech Signal in the Frequency Domain for Speech Recognition

A. S. Kolokolov 《Automation and Remote Control》2003,64(6):985-994

Preprocessing of the speech signal before recognition of phonemes was considered. Methods of processing the spectrum and segmenting the speech signal for stable speech recognition in the presence of frequency distortions were proposed. They are based on a procedure of linear filtering of the logarithmic spectrum envelope.

2.

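A New Audio Mixing Algorithm for Network Teleconferencing

韩钰 普杰信 《计算机应用》2010,30(2):564-566

Audio mixing plays a pivotal role in network teleconferencing systems, and handling the overflow and noise that mixing introduces is its central problem. After analyzing the shortcomings of existing mixing algorithms, the paper proposes an adaptive spectral subtraction method. Provided that no overflow occurs, the speech signal is Fourier transformed and the noise power spectrum is subtracted from the power spectrum of the noisy speech, giving a cleaner speech spectrum with less noise. An inverse Fourier transform is then applied and the result is adapted to the mixing weights. The mixed audio is clearer and smoother, the noise problem in mixing is avoided, and the quality of the mix improves, with sound closer to that of an in-person meeting. The method can be applied in multimedia teleconferencing systems.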
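
The spectral subtraction step named in this abstract can be illustrated with a minimal sketch. It is not the authors' adaptive algorithm: the function names, the `floor` parameter, and the assumption that a per-bin noise power estimate is already available are all hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (no external libraries)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse of dft(); returns the real part of each time sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def spectral_subtract(frame, noise_power, floor=0.0):
    """Subtract a noise power spectrum from one frame, keeping the noisy phase.

    noise_power[k] is an assumed, precomputed noise power estimate for bin k;
    how it is estimated or adapted is outside this sketch.
    """
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_power):
        power = abs(bin_value) ** 2
        # Clamp so the remaining power never drops below floor * power.
        clean_power = max(power - noise, floor * power)
        gain = math.sqrt(clean_power / power) if power > 0 else 0.0
        cleaned.append(bin_value * gain)
    return idft(cleaned)
```

With a zero noise estimate the frame passes through unchanged, and with a noise estimate far above the signal power every bin is suppressed. Weighting the cleaned frames by per-speaker mixing weights would follow as a separate step.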

3.

Bandwidth extension of telephone speech using magnitude spectrum data hiding

Prasad Nizampatnam Kishore Kumar Tappeta 《International Journal of Speech Technology》2017,20(1):151-162

Public telephone systems transmit speech across a limited frequency range, about 300–3400 Hz, called narrowband (NB) which results in a significant reduction of quality and intelligibility of speech. This paper proposes a fully backward compatible novel method for bandwidth extension of NB speech. The method uses magnitude spectrum data hiding technique to provide a perceptually better wideband speech signal. Code excited linear prediction parameters are extracted from the down sampled frequency shifted version of the high frequency components of speech signal existing above NB, which are spread by using pseudo-noise codes, and are embedded in the low amplitude high-frequency regions of the magnitude spectrum of NB speech signal. The embedded information is extracted at the receiving end to reconstruct the wideband speech signal. Theoretical and simulation analyses show that the proposed method is robust to quantization and channel noises. The comparison category rating listening and log spectral distortion tests clearly show that the reconstructed wideband signal gives a much better performance in terms of speech quality when compared to some of the existing speech bandwidth extension methods employing data hiding.

4.

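Monaural Voiced Speech Separation Based on Computational Auditory Scene Analysis

张丽娜 张二华 江军亮 《计算机工程与科学》2019,41(7):1266-1272

To address voiced speech separation in monaural speech separation, a method for accurately estimating the pitch period is proposed. First, using cues such as the short-time stationarity of speech and the continuity of the pitch period, the cepstral peaks of the speech signal are assembled into a pitch-period spectrogram, from which the pitch-period track is extracted automatically. Then the spectrum of each harmonic is picked out using the property that harmonic frequencies are integer multiples of the pitch frequency. Finally, the voiced speech is reconstructed by an inverse Fourier transform. Experimental results show that the method extracts the pitch-period track accurately and separates voiced speech effectively.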
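
As a rough illustration of the cepstral-peak cue described in this abstract, the sketch below estimates the pitch period of a single frame and lists the DFT bins of its harmonics. It assumes a plain real cepstrum and a caller-supplied lag search range; the function names are hypothetical, and the paper's track extraction across frames is not reproduced.

```python
import cmath
import math

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum (naive DFT)."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # A small constant avoids log(0) in empty bins.
    log_mag = [math.log(abs(v) + 1e-10) for v in spectrum]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
            for q in range(n)]

def pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) of the largest cepstral value within [min_lag, max_lag]."""
    cep = real_cepstrum(frame)
    return max(range(min_lag, max_lag + 1), key=lambda q: cep[q])

def harmonic_bins(period, n_fft, count):
    """DFT bin indices nearest the first `count` harmonics of the pitch."""
    fundamental_bin = n_fft / period
    return [round(h * fundamental_bin) for h in range(1, count + 1)]
```

For a synthetic pulse train with one pulse every 50 samples, searching lags 32 to 80 recovers a period of 50, and the harmonics of a 400-point DFT fall on bins 8, 16, 24, and so on. Keeping only those bins before an inverse transform is one simple way to approximate the harmonic-picking step.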

5.

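Pitch Estimation and Voicing Classification via MFCC-Based Spectrum Reconstruction

张少华 秦会斌 《测控技术》2019,38(11):86-89

Pitch estimation and voicing classification help retrieve target speech quickly. They are among the most important and most difficult research directions in speech retrieval and matter greatly to speech recognition. A new method for pitch estimation and voicing classification is proposed. The spectrum is reconstructed from Mel-frequency cepstral coefficients (MFCC), and the reconstructed spectrum is compressed and filtered in the logarithmic domain. Pitch is estimated by modeling the joint density of pitch frequency and filtered frequency with a Gaussian mixture model (GMM); the relative error on the TIMIT database is 6.62%. The GMM-based model can also perform voicing classification, and tests show a classification accuracy above 99%, providing a new model for pitch estimation and voicing classification.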

6.

A method for speech signal processing based on band filtering of the logarithmic spectrum

A. S. Kolokolov 《Automation and Remote Control》2014,75(3):496-502

We propose a method for speech signal preprocessing based on band filtering of the logarithmic amplitude spectrum with a filter with odd impulse characteristic. With such filtering, we can detect local nonuniformities in the spectrum of a speech signal caused by abrupt inclinations of the vocal tract frequency characteristic, which represent useful features for speech recognition. We show examples of using the proposed approach on natural speech signals.
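
To make the idea of an odd impulse characteristic concrete, the sketch below correlates a log amplitude spectrum with a small antisymmetric kernel along the frequency axis. The kernel values, the edge handling, and the function names are illustrative assumptions, not the filter used in the paper.

```python
import math

def log_spectrum(magnitudes, eps=1e-12):
    """Natural log of an amplitude spectrum; eps guards against log(0)."""
    return [math.log(m + eps) for m in magnitudes]

def filter_odd(log_spec, kernel=(-1.0, -1.0, 0.0, 1.0, 1.0)):
    """Correlate log_spec with an odd (antisymmetric) kernel over frequency.

    The kernel sums to zero, so a constant offset in the log spectrum
    (a flat gain in the linear spectrum) produces no response.
    Edges are handled by repeating the first and last values.
    """
    half = len(kernel) // 2
    n = len(log_spec)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)
            acc += weight * log_spec[idx]
        out.append(acc)
    return out
```

On a spectrum that steps from one level to another, the response is largest at the step and zero in the flat regions, and scaling the whole spectrum by a constant leaves the response unchanged. That behavior matches the abstract's description of picking up abrupt slopes in the spectrum.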

7.

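Speech Enhancement Using Subband Spectral Subtraction

蔡宇 郝程鹏 侯朝焕 《计算机应用》2014,34(2):567-571

To suppress environmental noise in speech signals, a speech enhancement method based on subband spectral subtraction is proposed. A filter bank first splits the time-domain signal into several frequency bands (subbands), and an improved spectral subtraction technique is then applied independently in each subband. Because most real-world background noise is not uniformly distributed over frequency, estimating the noise and subtracting spectra within separate bands is better targeted and more accurate. Experiments on real speech show that the proposed method suppresses noise while largely preserving the structure of the speech, so the enhanced speech is more comfortable to listen to and more intelligible.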

8.

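A Piecewise Linear Prediction Algorithm for Formant Frequency Estimation of Speech Signals

陈宁 万茂文 《计算机工程与应用》2009,45(28):156-159

Formant frequencies of speech are estimated with a piecewise linear prediction algorithm. A multichannel filter bank divides the speech into frequency bands, a suitable inverse filter is chosen to approximate the short-time spectrum of each band, and the formant frequencies are estimated from that inverse filter. Experimental results show that, compared with traditional methods, the approach improves the resolution and accuracy of formant frequency estimation and is less affected by noise.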

9.

Voice activity detection algorithm using nonlinear spectral weights, hangover and hangbefore criteria

Damjan Vlaj Zdravko Kačič Marko Kos 《Computers & Electrical Engineering》2012

This paper introduces a nonlinear function into the frequency spectrum that improves the detection of vowels, diphthongs, and semivowels within the speech signal. The lower efficiency of consonant detection was solved by implementing the hangover and hangbefore criteria. This paper presents a procedure for faster definition of those optimal constants used by hangover and hangbefore criteria. A nonlinearly changed frequency spectrum is used in the proposed GMM (Gaussian Mixture Model) based VAD (Voice Activity Detection) algorithm. Comparative tests between the proposed VAD algorithm and seven other VAD algorithms were made on the Aurora 2 database. The experiments were based on frame error detection and on speech recognition performance for two types of acoustic training modes (multi-condition and clean only). The lowest average percentage of frame errors was obtained by the proposed VAD algorithm, which also achieved positive improvement in the speech recognition performance for both types of acoustic training modes.

10.

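Speech Enhancement Algorithm Based on an Improved Deep Belief Network

余华 唐於烽 赵力 《数据采集与处理》2018,33(5):793-800

A speech enhancement algorithm based on a deep belief network is studied and improved in two ways. Because the training set contains few noise types and the noise characteristics are not rich enough, the noise spectrum is perturbed in the frequency domain to enrich its spectral characteristics. Because signals at different frequency bins affect the system error differently, weighting coefficients are constructed from the absolute threshold of hearing. Finally, LOG-MMSE, one of the better traditional speech enhancement algorithms in noisy environments, is compared with the improved deep-belief-network algorithm. The results show that the deep-belief-network algorithm performs well, and in particular its improvement in the quality of the enhanced speech exceeds that of LOG-MMSE.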

11.

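An Improved MFCC Parameter Extraction Method

王彪 《计算机与数字工程》2012,40(4):19-21

To raise the speech recognition rate, an improved MFCC parameter extraction method is proposed. It uses the high resolution of the wavelet packet transform and its ability to weight the high-frequency part of speech to derive a new feature parameter on top of the traditional MFCC parameters. The new parameter divides the frequency range of the speech signal more finely, reduces spectral distortion more stably, and lowers the noise in the signal to some extent. A Gaussian mixture model (GMM) is then used for speaker speech recognition, and experiments show that the new feature parameter achieves a good recognition rate.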

12.

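Wiener Filtering Speech Enhancement Algorithm Based on an Improved Energy-to-Entropy Ratio

王帅 蒲宝明 李相泽 张笑东 姚恺丰 《计算机系统应用》2017,26(11):124-131

To improve speech enhancement and algorithm robustness at low signal-to-noise ratios, a new speech enhancement algorithm is proposed that combines Wiener filtering with a voice endpoint detection algorithm based on frequency-domain features. The endpoint detector uses an energy-to-entropy ratio built from the spectral entropy of wavelet packet ERB subbands and an improved frequency-domain energy. The spectral entropy of the wavelet packet ERB subbands accounts for the auditory masking model of the human ear and for the difference between the frequency distributions of speech and noise; the frequency-domain energy exploits the difference in energy between speech frames and non-speech frames. The Wiener filtering algorithm collects speech data in real time, uses the new parameter to distinguish non-speech segments from speech segments, and smoothly updates the noise spectrum during non-speech segments. Experimental results show that the endpoint detection algorithm distinguishes speech from non-speech segments well, which improves speech enhancement at low signal-to-noise ratios while preserving the robustness and real-time performance of the algorithm. In a comparison with two other algorithms, it achieves better speech enhancement.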

13.

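Speech Mask Estimation Based on the Time-Frequency Correlation of Speech Presence

战鸽 黄兆琼 应冬文 潘接林 颜永红 《软件学报》2016,27(S2):64-68

In the two-dimensional time-frequency grid, the presence or absence of speech at neighboring points is correlated, and a traditional Markov chain cannot adaptively model this two-dimensional time-frequency correlation. Based on the correlation of speech signals in the time-frequency domain, a method is proposed that estimates the speech mask with a two-dimensional correlation model. The method divides the log power spectrum of the noisy speech signal in the time-frequency domain into speech and non-speech classes. It describes the temporal correlation of the speech signal with state transition probabilities and forward factors along time, and the spectral correlation with state transition probabilities and neighborhood factors along frequency. Through global statistical optimization, the model combines the temporal and spectral correlations. A sequential update scheme is given that updates the model frame by frame and estimates the speech presence probability. Given the current log power spectrum and model parameters, the speech state matrix obtained by maximizing the posterior probability serves as the optimal estimate of the speech mask. The method is compared with several existing online speech mask estimation methods, and the experimental results show its superiority.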

14.

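Multitaper Spectrum Estimation Speech Enhancement Algorithm under Wavelet Packet Decomposition  Total citations: 1; self-citations: 0; citations by others: 1

查诚 杨平 潘平 《计算机工程》2012,38(5):291-292

Traditional spectral subtraction is a single-resolution algorithm based on the short-time Fourier transform and has a large variance. To address this, a speech enhancement algorithm based on multitaper spectrum estimation under wavelet packet decomposition is proposed. The noisy speech is decomposed into different frequency bands by wavelet packets, multitaper spectral subtraction is performed in each band, and wavelet packet reconstruction is applied band by band to obtain the denoised speech signal. Simulation results show that the algorithm raises the signal-to-noise ratio of noisy speech and lowers speech distortion.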

15.

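Wideband Speech Coding Algorithm Based on Adaptive Weighted Spectral Interpolation

凌震华 戴礼荣 王仁华 双志伟 周斌 《数据采集与处理》2005,20(1):28-33

A wideband speech coding algorithm based on adaptive weighted spectral interpolation (STRAIGHT) is proposed. The input speech signal first undergoes STRAIGHT analysis to obtain accurate fundamental-frequency and spectral parameters, and efficient coding compression is then achieved through time-domain decimation and frequency-domain modeling. Unlike the fixed frame length of traditional coding algorithms, the time-domain decimation uses an adaptive variable frame length, so that coding storage is allocated more sensibly according to how the speech actually changes. Subjective listening results show that, for speech sampled at 16 kHz, the algorithm at 6 kbps achieves sound quality comparable to AMR-WB (G.722.2) at 8.85 kbps. The algorithm also offers strong control over the duration, fundamental frequency, and spectral parameters of the reconstructed speech.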

16.

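A Speech Synthesis and Analysis Method Based on the FD-PSOLA Algorithm  Total citations: 3; self-citations: 0; citations by others: 3

郑新春 柴佩琪 《微型电脑应用》2001,17(7):26-29

A method for modifying the prosodic features of Chinese based on the FD-PSOLA algorithm is presented. When modifying the short-time signal in the frequency domain, homomorphic filtering separates the spectral envelope from the excitation-source spectrum, and the excitation-source spectrum is compressed or stretched by modifying the frequency-axis coordinates. Experimental results show that FD-PSOLA is better suited than TD-PSOLA to speech synthesis and analysis that requires a wide range of frequency adjustment.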

17.

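An Improved Mel Filter for Speaker Recognition  Total citations: 1; self-citations: 0; citations by others: 1

项要杰 杨俊安 李晋徽 陆俊 《计算机工程》2013,(11):214-217,222

Mel-frequency cepstral coefficients (MFCC) emphasize the low-frequency information of the speech signal, describe its spectral distribution insufficiently, and cannot effectively distinguish speaker-specific information. To address this, the paper analyzes how much speaker-specific information each frequency band of the speech signal carries and, drawing on the different characteristics of the Mel filter and the inverted Mel filter in the high and low bands, proposes an improved Mel filter suited to speaker recognition. Experimental results show that the new features extracted with the improved Mel filter give better recognition than traditional Mel cepstral coefficients and inverted Mel cepstral coefficients (IMFCC), with essentially no added time cost for training and recognition in the speaker recognition system.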
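
The contrast between Mel and inverted Mel filters that this abstract relies on can be seen from their center frequencies. The sketch below uses one common form of the mel formula (constants 2595 and 700) and builds the inverted scale by mirroring the centers about the middle of the band. Both choices and all function names are assumptions for illustration; the paper's actual improved filter is not reproduced here.

```python
import math

def hz_to_mel(f):
    """One common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel()."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_max):
    """Filter centers equally spaced on the mel scale between 0 and f_max."""
    step = hz_to_mel(f_max) / (n_filters + 1)
    return [mel_to_hz(step * i) for i in range(1, n_filters + 1)]

def inverted_mel_centers(n_filters, f_max):
    """Mel centers mirrored about f_max / 2: finest spacing at high frequencies."""
    return [f_max - c for c in reversed(mel_centers(n_filters, f_max))]
```

With ten filters up to 4000 Hz, the mel centers are packed closely at low frequencies and spread out toward 4000 Hz, while the inverted centers show the opposite pattern. A filter bank that draws on both scales can therefore give fine resolution in both the low and the high band.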

18.

Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

《IEEE transactions on audio, speech, and language processing》2008,16(8):1512-1527

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is in general desirable when the signal is acquired through distant microphones in such applications as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced as a model that represents the dynamic short time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on TVGSM, and derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each of which is manifested by a short time speech spectrum, defined by a corresponding autocorrelation (AC) vector. The dereverberation algorithm based on this model involves a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are very important for speech dereverberation, and that the prior knowledge represented by the codebook allows us to further improve the dereverberated speech quality. We also confirm that the quality of reverberant speech signals can be greatly improved in terms of the spectral shape and energy time-pattern distortions from simply a short speech signal using a speaker-independent codebook.

19.

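A Noise-Robust Speech Recognition Method Based on MVDR and CCBC  Total citations: 1; self-citations: 0; citations by others: 1

龙潜 孔凡让 刘永斌 刘维来 刘志刚 《数据采集与处理》2006,21(3):297-301

A method suited to noise-robust speech recognition is proposed. Its feature extraction is based on the minimum variance distortionless response (MVDR) spectrum estimation method, and the feature is frequency-warped to improve its perceptual resolution. Finally, the feature is adapted with the spectral-transformation compensation method based on canonical correlation analysis (Canonical correlation based on compensation, CCBC), which improves the robustness of the system. In comparison experiments against a system using traditional Mel cepstral coefficient (MFCC) features under exhibition-hall noise, crowd noise, and car noise, the recognition rate of the speech recognition system using the proposed method improves markedly.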

20.

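Speech Enhancement Based on Bark-Domain Noise Estimation and the Masking Effect  Total citations: 4; self-citations: 3; citations by others: 1

赵欢 熊敏 侯卫国 《计算机工程》2009,35(12):261-263

Because noise estimation and speech enhancement degrade in non-stationary environments, a fast adaptive noise spectrum estimation algorithm in the Bark domain is proposed. Based on an auditory model, it transforms the noisy signal to the Bark domain and performs speech enhancement there based on the masking of the human ear. Simulation experiments show that the algorithm makes full use of the correlation between frequency bands within the Bark bands, tracks rapidly changing background noise, improves speech enhancement performance, and reduces computation and complexity.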
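
A sketch of what transforming a spectrum to the Bark domain can look like is given below. It uses Traunmüller's approximation of the Bark scale and simply sums power-spectrum bins into integer Bark bands. The choice of approximation, the band-grouping rule, and the function names are assumptions, and the paper's noise tracking and masking steps are not shown.

```python
def hz_to_bark(f):
    """Traunmueller's approximation of the Bark scale (f in Hz)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_band_power(power_spectrum, sample_rate):
    """Sum one-sided power-spectrum bins (0 Hz to Nyquist) into integer Bark bands.

    Assumes at least two bins, evenly spaced from 0 Hz to sample_rate / 2.
    """
    n_bins = len(power_spectrum)
    bands = {}
    for k, power in enumerate(power_spectrum):
        freq = k * (sample_rate / 2.0) / (n_bins - 1)
        # The formula is slightly negative near 0 Hz, so clamp to band 0.
        band = max(0, int(hz_to_bark(freq)))
        bands[band] = bands.get(band, 0.0) + power
    return [bands.get(b, 0.0) for b in range(max(bands) + 1)]
```

For a 129-bin spectrum at an 8 kHz sampling rate this yields 18 bands, with few bins in the lowest band and many more in the upper ones. Estimating noise per band rather than per bin shrinks the number of values to track, which is consistent with the reduced computation the abstract reports.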