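Single-Channel Speech Enhancement Using a Deep Fully Convolutional Encoder-Decoder Network
Cite this article:Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng. Single-channel speech enhancement using a deep fully convolutional encoder-decoder network [J]. Signal Processing (信号处理), 2019, 35(4): 631-640.
Authors:Shi Wenhua  Zhang Xiongwei  Zou Xia  Sun Meng
Affiliation:Army Engineering University
Funding:National Natural Science Foundation of China (61471394); Jiangsu Province Outstanding Youth Fund (BK20180080)
Abstract:Conventional neural networks do not fully exploit correlations in the time-frequency domain. To address this, a single-channel speech enhancement method based on a deep fully convolutional encoder-decoder neural network is proposed. In the encoder, convolutional layers extract features from the time-frequency representation of the noisy speech stage by stage, yielding a high-level feature representation of the target speech while suppressing background noise layer by layer. The decoder is structurally symmetric to the encoder: it applies deconvolution and upsampling to the high-level representation produced by the encoder and recovers the target speech layer by layer. Skip connections are effective against the vanishing-gradient problem that arises when training very deep networks. This paper introduces skip connections between corresponding encoder and decoder layers, passing encoder feature-map information to the matching decoder layer, which helps recover the fine details of the target speech. The effects on enhancement performance of two skip-connection schemes (feature fusion and feature concatenation) and of two training loss functions (L1 and L2) are studied, and experiments verify the effectiveness of the proposed method.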
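The two skip-connection schemes named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that "feature fusion" means element-wise addition of an encoder feature map and the corresponding decoder feature map, and that "feature concatenation" means stacking the two along the channel axis. The names `FeatureMap`, `fuse`, and `concatenate` are hypothetical, and plain nested lists stand in for tensors.

```python
from typing import List

# Toy layout (an assumption): feature_map[channel][bin]
FeatureMap = List[List[float]]


def fuse(encoder_map: FeatureMap, decoder_map: FeatureMap) -> FeatureMap:
    """Feature fusion, assumed here to be element-wise addition.

    Both maps must have the same shape; the channel count is unchanged.
    """
    if len(encoder_map) != len(decoder_map):
        raise ValueError("fusion needs equal channel counts")
    fused = []
    for enc_ch, dec_ch in zip(encoder_map, decoder_map):
        if len(enc_ch) != len(dec_ch):
            raise ValueError("fusion needs equal channel lengths")
        fused.append([e + d for e, d in zip(enc_ch, dec_ch)])
    return fused


def concatenate(encoder_map: FeatureMap, decoder_map: FeatureMap) -> FeatureMap:
    """Feature concatenation: stack decoder and encoder channels.

    The channel count becomes the sum of both inputs, so the next
    decoder layer would need correspondingly more input channels.
    """
    return [list(ch) for ch in decoder_map] + [list(ch) for ch in encoder_map]


# Made-up values for two 2-channel maps with 3 bins each.
enc = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
dec = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]

fused = fuse(enc, dec)
stacked = concatenate(enc, dec)
print(len(fused), len(stacked))  # channel counts after each scheme
```

Under these assumptions, fusion keeps the decoder layer width fixed, while concatenation preserves both sets of features separately at the cost of a wider input to the following layer.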

Keywords:speech enhancement  skip connection  encoder-decoder  convolutional neural network
Received:2019-01-24

Single Channel Speech Enhancement Based on Deep Fully Convolutional Encoder-Decoder Neural Network
Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng. Single Channel Speech Enhancement Based on Deep Fully Convolutional Encoder-Decoder Neural Network [J]. Signal Processing, 2019, 35(4): 631-640.
Authors:Shi Wenhua  Zhang Xiongwei  Zou Xia  Sun Meng
Affiliation:Army Engineering University; Air Force Aviation University
Abstract:Conventional deep neural networks do not make full use of the time-frequency correlations of speech. To address this, a single-channel speech enhancement method based on a deep fully convolutional encoder-decoder neural network is proposed. In the encoder, features are extracted step by step from the time-frequency representation of the noisy speech through the convolution and pooling operations of the convolutional layers, yielding a high-level feature representation of the target speech while suppressing the background noise. The decoder is structurally symmetric to the encoder and reconstructs the target speech features from that high-level representation through deconvolution and upsampling operations. Skip connections are employed to alleviate the vanishing-gradient problem in very deep neural networks. In this paper, skip connections deliver low-level encoder feature maps, which carry the detail information of speech, to the corresponding feature maps in the decoder, helping the decoder recover the detailed features of the target speech. The network is trained with both L1 and L2 losses, and two forms of skip connection, feature fusion and feature concatenation, are evaluated experimentally. The results demonstrate the effectiveness of the proposed method.
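As a rough illustration of the two training losses compared in the abstract, the sketch below computes an L1 and an L2 loss between an enhanced feature vector and its clean target. It assumes the mean absolute error and mean squared error forms; the abstract does not state the exact normalisation the authors use, and the names `l1_loss` and `l2_loss` and the sample values are hypothetical.

```python
from typing import Sequence


def l1_loss(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between estimate and target (assumed L1 form)."""
    if len(estimate) != len(target):
        raise ValueError("inputs must have the same length")
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def l2_loss(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between estimate and target (assumed L2 form)."""
    if len(estimate) != len(target):
        raise ValueError("inputs must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


# Made-up spectral values: one small error and one large error.
clean = [0.0, 1.0, 2.0, 3.0]
enhanced = [0.5, 1.0, 2.0, 5.0]

l1 = l1_loss(enhanced, clean)
l2 = l2_loss(enhanced, clean)
print(l1, l2)  # → 0.625 1.0625
```

In this toy case the single large error (2.0) contributes 4.0 of the 4.25 total squared error but only 2.0 of the 2.5 total absolute error, which is the usual reason the two losses can lead to different trained behaviour.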
Keywords:speech enhancement  skip connection  encoder-decoder  convolutional network
This article is indexed in VIP (维普) and other databases.