
Single-Channel Speech Enhancement Using a Deep Fully Convolutional Encoder-Decoder Network (利用深度全卷积编解码网络的单通道语音增强)

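Cite this article:	Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng. Single-channel speech enhancement using a deep fully convolutional encoder-decoder network [J]. 信号处理 (Signal Processing), 2019, 35(4): 631-640.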

Authors:	Shi Wenhua (时文华), Zhang Xiongwei (张雄伟), Zou Xia (邹霞), Sun Meng (孙蒙)

Affiliation:	Army Engineering University (陆军工程大学)

Funding:	National Natural Science Foundation of China (61471394); Jiangsu Province Outstanding Youth Fund (BK20180080)

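Abstract:	To address the failure of conventional neural networks to fully exploit correlations in the time-frequency domain, a single-channel speech enhancement method using a deep fully convolutional encoder-decoder neural network is proposed. At the encoder, convolutional layers extract features from the time-frequency representation of the noisy speech stage by stage, obtaining a high-level feature representation of the target speech while suppressing background noise layer by layer. The decoder is structurally symmetric to the encoder: it applies deconvolution and upsampling to the high-level feature representation produced by the encoder and recovers the target speech layer by layer. Skip connections effectively mitigate the vanishing-gradient problem that arises when training very deep networks. This paper introduces skip connections between corresponding encoder and decoder layers, passing encoder feature-map information to the corresponding decoder layer, which helps recover the fine details of the target speech. The effects on enhancement performance of two skip-connection modes (feature fusion and feature concatenation) and two training loss functions (L1 and L2) are studied, and experiments verify the effectiveness of the proposed method.
Keywords:	speech enhancement; skip connection; encoder-decoder; convolutional neural network
Received:	2019-01-24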
Single Channel Speech Enhancement Based on Deep Fully Convolutional Encoder-Decoder Neural Network

Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng. Single Channel Speech Enhancement Based on Deep Fully Convolutional Encoder-Decoder Neural Network [J]. Signal Processing, 2019, 35(4): 631-640.

Authors:	Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng

Affiliation:	Army Engineering University; Air Force Aviation University

Abstract:	Because conventional deep neural networks do not fully exploit the time-frequency correlation of speech, a single-channel speech enhancement method based on a deep fully convolutional encoder-decoder neural network is proposed. At the encoder, convolution and pooling operations extract features from the time-frequency representation of the noisy speech step by step, yielding a high-level feature representation of the target speech while suppressing the background noise. The decoder is structurally symmetric to the encoder and reconstructs the target speech features from the encoder's high-level representation through deconvolution and upsampling. Skip connections are employed to address the vanishing-gradient problem in very deep neural networks. In this paper, skip connections deliver low-level feature maps, which carry the detail information of speech, from the encoder to the corresponding feature maps in the decoder, helping the decoder recover the detailed features of the target speech. The network is trained with both L₁ and L₂ losses, and two forms of skip connection, feature fusion and feature concatenation, are evaluated experimentally. The results demonstrate the effectiveness of the proposed method.

Keywords:	speech enhancement; skip connection; encoder-decoder; convolutional network
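
The abstract compares two skip-connection forms and two training losses without giving their formulas. The sketch below illustrates the distinction on toy data; it is not the authors' implementation. It assumes that "feature fusion" means element-wise addition of encoder and decoder feature maps, and that "feature concatenation" stacks them along the channel axis. The function names, the list-of-channels data layout, and the sample values are all hypothetical.

```python
from typing import List

# Hypothetical layout: a feature map is a list of channels, and each channel
# is a flat list of time-frequency bins. Values below are toy data.
FeatureMap = List[List[float]]


def skip_fusion(encoder_map: FeatureMap, decoder_map: FeatureMap) -> FeatureMap:
    """Feature fusion, ASSUMED here to be element-wise addition."""
    if len(encoder_map) != len(decoder_map):
        raise ValueError("fusion needs the same number of channels")
    return [
        [e + d for e, d in zip(enc_ch, dec_ch)]
        for enc_ch, dec_ch in zip(encoder_map, decoder_map)
    ]


def skip_concat(encoder_map: FeatureMap, decoder_map: FeatureMap) -> FeatureMap:
    """Feature concatenation: decoder channels followed by encoder channels."""
    return [list(ch) for ch in decoder_map] + [list(ch) for ch in encoder_map]


def l1_loss(estimate: FeatureMap, target: FeatureMap) -> float:
    """Mean absolute error over all bins (L1 training loss)."""
    diffs = [
        abs(e - t)
        for est_ch, tgt_ch in zip(estimate, target)
        for e, t in zip(est_ch, tgt_ch)
    ]
    return sum(diffs) / len(diffs)


def l2_loss(estimate: FeatureMap, target: FeatureMap) -> float:
    """Mean squared error over all bins (L2 training loss)."""
    diffs = [
        (e - t) ** 2
        for est_ch, tgt_ch in zip(estimate, target)
        for e, t in zip(est_ch, tgt_ch)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    encoder = [[1.0, 2.0], [3.0, 4.0]]   # 2 channels x 2 bins
    decoder = [[0.5, 0.5], [1.0, 1.0]]
    target = [[1.0, 2.0], [4.0, 7.0]]

    fused = skip_fusion(encoder, decoder)      # channel count stays at 2
    stacked = skip_concat(encoder, decoder)    # channel count doubles to 4
    print(len(fused), len(stacked))
    print(l1_loss(fused, target), l2_loss(fused, target))
```

Under these assumptions, fusion keeps the channel count fixed, whereas concatenation doubles it and leaves the mixing to the next convolution layer. The single bin with an error of 2.0 dominates the L2 value far more than the L1 value, which is the usual motivation for comparing the two losses.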
This article is indexed in VIP (维普) and other databases.