A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network


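Cite this article:	HAN Wei, ZHANG Xiong-Wei, MIN Gang, ZHANG Qi-Ye. A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network[J]. Acta Automatica Sinica, 2017, 43(2): 248-258.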

Authors:	HAN Wei, ZHANG Xiong-Wei, MIN Gang, ZHANG Qi-Ye

Affiliation:	1. PLA University of Science and Technology, Nanjing 210007

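Funding:	Supported by the National Natural Science Foundation of China (61471394, 61402519) and the Natural Science Foundation of Jiangsu Province (BK20140071, BK20140074)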

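Abstract:	This paper applies psychoacoustic masking properties to single-channel speech enhancement based on deep neural networks (DNN) and proposes a DNN structure with perceptual masking properties. First, the proposed DNN is trained on noisy speech magnitude spectrum features and produces separate estimates of the clean speech magnitude spectrum and the noise magnitude spectrum. Second, the estimated clean speech magnitude spectrum is used to compute the noise masking threshold. Then, the noise masking threshold and the estimated noise magnitude spectrum are combined to compute a perceptual gain function. Finally, the perceptual gain function is used to estimate the enhanced speech magnitude spectrum from the noisy speech magnitude spectrum. Simulation experiments on the TIMIT database with 20 noise types at different signal-to-noise ratios show that, whether or not the noise type appears in the training set, the proposed perceptual masking DNN removes noise effectively while keeping speech distortion small, and that it clearly outperforms common DNN enhancement methods and the NMF (nonnegative matrix factorization) enhancement method.
Keywords:	Speech enhancement; deep neural network; perceptual gain function; masking threshold
Received:	2015-10-31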
A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network

HAN Wei, ZHANG Xiong-Wei, MIN Gang, ZHANG Qi-Ye. A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network[J]. Acta Automatica Sinica, 2017, 43(2): 248-258.

Authors:	HAN Wei, ZHANG Xiong-Wei, MIN Gang, ZHANG Qi-Ye

Affiliation:	1. PLA University of Science and Technology, Nanjing 210007; 2. Xi'an Communications Institute, Xi'an 710106; 3. Unit 96637 of PLA, Beijing 102101

Abstract:	A new deep neural network (DNN) that incorporates the perceptual masking properties of psychoacoustic models is proposed for single-channel speech enhancement. First, the proposed DNN is trained to estimate both the clean speech magnitude spectrum and the noise magnitude spectrum from the noisy magnitude spectrum. Second, the estimated clean speech magnitude spectrum is used to calculate the noise masking threshold. Then, the noise masking threshold and the estimated noise magnitude spectrum are combined to calculate a perceptual gain function. Finally, the enhanced speech magnitude spectrum is estimated from the noisy speech magnitude spectrum using the perceptual gain function. Experimental results on TIMIT with 20 noise types at various SNR (signal-to-noise ratio) levels demonstrate that the proposed perceptual masking DNN effectively removes noise while keeping speech distortion small, and that it outperforms common DNN methods and the NMF (nonnegative matrix factorization) method, regardless of whether the noise conditions are included in the training set.

Keywords:	Speech enhancement; deep neural network; perceptual gain function; masking threshold
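
The four steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the masking-threshold model or the form of the gain function, so `masking_threshold` uses a crude hypothetical stand-in (speech power lowered by a fixed offset), and `perceptual_gain` uses a simple threshold-limited rule that attenuates a frequency bin only as far as needed to bring the estimated noise power down to the threshold. The speech and noise magnitude estimates, which the paper obtains from the DNN, are passed in directly as plain lists.

```python
import math


def masking_threshold(speech_mag, offset_db=10.0):
    """Hypothetical stand-in for a psychoacoustic model.

    Returns a per-bin threshold in the power domain, taken as the
    estimated speech power lowered by `offset_db` decibels. A real
    model would also account for spreading across critical bands.
    """
    scale = 10.0 ** (-offset_db / 10.0)
    return [scale * s * s for s in speech_mag]


def perceptual_gain(noise_mag, threshold, eps=1e-12):
    """Assumed gain rule (not taken from the paper).

    If the estimated noise power is already below the masking
    threshold, the bin is left untouched (gain 1). Otherwise the gain
    is chosen so that the residual noise power equals the threshold.
    """
    gains = []
    for n, t in zip(noise_mag, threshold):
        noise_power = n * n
        if noise_power <= t:
            gains.append(1.0)
        else:
            gains.append(math.sqrt(t / (noise_power + eps)))
    return gains


def enhance(noisy_mag, speech_est, noise_est):
    """Apply the gain to the noisy magnitude spectrum of one frame."""
    thr = masking_threshold(speech_est)
    gains = perceptual_gain(noise_est, thr)
    return [g * y for g, y in zip(gains, noisy_mag)]


# One frame with two frequency bins: speech dominates the first bin,
# noise dominates the second.
speech_est = [1.0, 0.1]
noise_est = [0.1, 1.0]
noisy_mag = [1.1, 1.1]
enhanced = enhance(noisy_mag, speech_est, noise_est)
```

In this toy frame the speech-dominated bin passes through unchanged, because its noise is assumed to be masked, while the noise-dominated bin is strongly attenuated. This mirrors the trade-off the abstract describes between removing noise and limiting speech distortion.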

