Found 20 similar documents (search time: 281 ms)
1.
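To make full use of noisy-speech features and improve the performance of speech enhancement networks, this paper exploits the correlation of noisy speech along both the time and frequency dimensions and combines the local feature extraction ability of convolutional neural networks with the long-term dependency modeling ability of gated recurrent units to design a convolutional gated recurrent network for speech enhancement. The network replaces the fully connected structure in the feature computation of the gated recurrent unit with a convolutional structure, which better preserves the time-frequency structure of the noisy-speech features. Experimental results show that, compared with other speech enhancement networks, the proposed network has a clear advantage in preserving speech components and suppressing noise components, and the enhanced speech has better quality and intelligibility.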
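The idea of replacing the fully connected transforms inside a gated recurrent unit with convolutions can be sketched as follows. This is a minimal single-channel illustration, not the authors' network: the kernel names in `params`, the kernel size of 3, and the toy input sizes are all assumptions, and `np.convolve` applies a true (kernel-flipped) 1-D convolution along the frequency axis.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_same(signal, kernel):
    # 1-D convolution along the frequency axis; output length equals input length.
    return np.convolve(signal, kernel, mode="same")

def conv_gru_step(x, h, params):
    """One time step of a single-channel convolutional GRU.

    x: feature frame of noisy speech, shape (n_freq,)
    h: previous hidden state, shape (n_freq,)
    params: dict of hypothetical 1-D kernels and scalar biases
    """
    # Update gate and reset gate: convolutions replace the usual matrix products.
    z = sigmoid(conv_same(x, params["wz_x"]) + conv_same(h, params["wz_h"]) + params["bz"])
    r = sigmoid(conv_same(x, params["wr_x"]) + conv_same(h, params["wr_h"]) + params["br"])
    # Candidate state computed from the input and the reset-gated previous state.
    h_cand = np.tanh(conv_same(x, params["wh_x"]) + conv_same(r * h, params["wh_h"]) + params["bh"])
    return (1.0 - z) * h + z * h_cand

rng = np.random.default_rng(0)
kernel_names = ("wz_x", "wz_h", "wr_x", "wr_h", "wh_x", "wh_h")
params = {name: rng.normal(scale=0.5, size=3) for name in kernel_names}
params.update({"bz": 0.0, "br": 0.0, "bh": 0.0})

frames = rng.normal(size=(5, 16))  # toy data: 5 time frames, 16 frequency bins
h = np.zeros(16)
for frame in frames:
    h = conv_gru_step(frame, h, params)
```

Because each hidden unit only sees its neighbouring frequency bins, the hidden state keeps the same frequency layout as the input, which is the property the abstract attributes to the convolutional structure.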
2.
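To address the limitation of applying attention only in the time-frequency dimension in sound event detection, and the insufficient feature extraction caused by using a single type of convolutional layer, this paper proposes a convolutional recurrent neural network (CRNN) model based on multi-scale attention feature fusion to improve sound event detection performance. First, a multi-scale attention module is proposed that attends to both local time-frequency units and global channel features, improving the model's feature selection ability. Second, a multi-scale feature fusion method is proposed that fuses multi-scale attention features rich in contextual information, improving the model's feature representation ability. Finally, a bidirectional gated recurrent network layer models temporal dependencies, and a fully connected layer classifies sound events frame by frame. In addition, a data balancing technique is used to further improve the model's generalization. Experimental results on a subset of AudioSet show that, compared with the CRNN, the proposed model reduces the error rate (ER) on the evaluation set by 11% and raises the F1 score (F1) by 8.3%, effectively improving sound event detection performance.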
3.
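For the large-scale weakly labeled sound event detection dataset provided by Task 4 of the DCASE 2017 challenge, a multi-class sound event detection system is built on Mel filterbank features (Fbank), a convolutional neural network (CNN), and a recurrent neural network (RNN). Part of the backpropagation derivation is analyzed for two existing, commonly used pooling layers, attention and linear softmax, and the linear softmax pooling layer is then improved to give a "power-function softmax with a learnable exponent" pooling layer. Experimental results show that, compared with the model that won first place in the DCASE challenge, the detection system using the "power-function softmax with a learnable exponent" pooling layer raises the F1 score of segment-level sound event prediction from 0.556 to 0.652, raises the F1 score of frame-level prediction from 0.518 to 0.583, and lowers the frame-level error rate (ER) from 0.730 to 0.667.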
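The pooling step that turns frame-level probabilities into one clip-level probability can be illustrated with a short sketch. Linear softmax pooling weights each frame by its own probability. The power variant below, which weights each frame by its probability raised to an exponent `n`, is an assumed generalization chosen for illustration; the abstract does not give the exact formula, and here `n` is a plain argument rather than a learned parameter.

```python
def linear_softmax_pool(frame_probs):
    """Clip-level probability: sum(p^2) / sum(p)."""
    total = sum(frame_probs)
    return sum(p * p for p in frame_probs) / total if total > 0 else 0.0

def power_softmax_pool(frame_probs, n):
    """Assumed power form: sum(p^(n+1)) / sum(p^n).

    n = 0 gives mean pooling, n = 1 gives linear softmax,
    and large n approaches max pooling.
    """
    weights = [p ** n for p in frame_probs]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, frame_probs)) / total if total > 0 else 0.0

frame_probs = [0.1, 0.2, 0.9]  # toy frame-level probabilities for one event class
clip_linear = linear_softmax_pool(frame_probs)
clip_mean = power_softmax_pool(frame_probs, 0)
clip_near_max = power_softmax_pool(frame_probs, 50)
```

Under this assumed form, a single exponent moves the pooling smoothly between averaging all frames and trusting only the most confident frame.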
4.
To recognize emotions more accurately from multi-modal bio-signal features, a novel method is proposed that extracts and fuses the signals with a stacked auto-encoder and LSTM recurrent neural networks. The stacked auto-encoder neural network is used to compress and fuse the features, and the deep LSTM recurrent neural network is employed to classify the emotion states. The results show that the fused multi-modal features provide more useful information than single-modal features, and that the deep LSTM recurrent neural network classifies emotions more accurately than other methods. The highest accuracy rate is 0.7926.
5.
6.
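Traditional neuron classification methods based on geometric morphology depend on extracting and selecting spatial structural features of neurons, which loses a large amount of information useful for classification. By applying an adaptive projection algorithm to transform three-dimensional neurons, a deep-learning-based method for classifying neuron geometric morphology is proposed that does not require extracting geometric features of neurons. The method reconstructs the original neuron data as three-dimensional voxels, converts them into two-dimensional neuron image data through an adaptive projection process, and builds a deep learning model based on a dual convolutional gated recurrent neural network to classify the neurons. The method is applied to three neuron classification datasets; compared with feature-extraction-based neuron classification methods, experimental results show that it achieves higher classification accuracy and good adaptability.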
7.
Jiuchao Feng Chi K. Tse Francis C. M. Lau 《International Journal of Communication Systems》2004,17(3):217-232
A number of schemes have been proposed for communication using chaos over the past years. Regardless of the exact modulation method used, the transmitted signal must go through a physical channel which undesirably introduces distortion to the signal and adds noise to it. The problem is particularly serious when coherent‐based demodulation is used because the necessary process of chaos synchronization is difficult to implement in practice. This paper addresses the channel distortion problem and proposes a technique for channel equalization in chaos‐based communication systems. The proposed equalization is realized by a modified recurrent neural network (RNN) incorporating a specific training (equalizing) algorithm. Computer simulations are used to demonstrate the performance of the proposed equalizer in chaos‐based communication systems. The Hénon map and Chua's circuit are used to generate chaotic signals. It is shown that the proposed RNN‐based equalizer outperforms conventional equalizers as well as those based on feedforward neural networks for noisy, distorted linear and non‐linear channels. Copyright © 2004 John Wiley & Sons, Ltd.
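For readers unfamiliar with the Hénon map mentioned above, the following sketch generates a chaotic sequence from it. The abstract does not state the map parameters or initial conditions; `a = 1.4`, `b = 0.3` and a start at the origin are the classical textbook choice and are assumptions here.

```python
def henon_series(n_samples, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Iterate x[k+1] = 1 - a*x[k]^2 + y[k], y[k+1] = b*x[k] and return the x values."""
    xs = []
    x, y = x0, y0
    for _ in range(n_samples):
        # Both updates use the previous x, so they are assigned together.
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs

chaotic_signal = henon_series(1000)
```

A sequence like `chaotic_signal` would play the role of the transmitted chaotic waveform before it passes through a distorting, noisy channel.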
8.
9.
Typical methods for overlapping sound event detection (SED) do not fully consider the joint spectral and temporal transition characteristics of the audio signal. They are generally based on training models using either separate data from each event class or mixed signals containing simultaneous sound events. This paper introduces a new approach for SED in real-life audio using Nonnegative Matrix Factor 2-D Deconvolution and RUSBoost techniques. The idea is to capture the two-dimensional joint spectral and temporal information from the time-frequency representation while possibly separating the sound mixture into several sources. In addition, the RUSBoost technique is utilized to address the class imbalance problem of the training data. The proposed approach is evaluated using the TUT Sound Event 2016 and 2017 datasets. The results show that the proposed method outperforms the baseline methods. For the TUT Sound Event 2016 dataset, the proposed method reduces the total error rate by 5% while increasing the F1 score by 13.8%. For the TUT Sound Event 2017 dataset, the proposed method reduces the total error rate by 3% while increasing the F1 score by 8.1%.
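The total error rate and F1 score quoted above are commonly computed per segment for polyphonic sound event detection. The sketch below follows that common segment-based definition (substitutions, deletions, and insertions counted per segment); whether the paper uses exactly this variant is an assumption, and the event labels are made-up toy data.

```python
def segment_metrics(reference, prediction):
    """Segment-based error rate and F1.

    reference, prediction: lists of sets of active event labels, one set per segment.
    Assumes the reference contains at least one active event.
    """
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, pred in zip(reference, prediction):
        seg_fp = len(pred - ref)   # predicted but not active
        seg_fn = len(ref - pred)   # active but missed
        tp += len(ref & pred)
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fp, seg_fn)          # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp)      # remaining misses
        ins += max(0, seg_fp - seg_fn)       # remaining false alarms
        n_ref += len(ref)
    error_rate = (subs + dels + ins) / n_ref
    f1 = 2 * tp / (2 * tp + fp + fn)
    return error_rate, f1

reference = [{"speech"}, {"speech", "car"}, set(), {"dog"}]
prediction = [{"speech"}, {"speech"}, {"car"}, {"cat"}]
error_rate, f1 = segment_metrics(reference, prediction)
```

Note that the error rate can exceed 1 when a system produces many insertions, which is why it is reported alongside F1 rather than instead of it.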
10.
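This paper addresses the detection of low signal-to-noise-ratio (SNR) sound events in various background sounds and proposes mixing background sounds with sound events to form noisy samples for training the classifier. In the preprocessing stage, a voting method based on empirical mode decomposition and the intrinsic mode functions of levels 2 to 6 is used to predict the endpoints of the background sound and the sound event and to estimate the SNR. Features of the sound data are then extracted with a sub-band energy distribution method. Finally, the background sound is mixed with every sample in the sound event library at the estimated SNR, and the resulting mixed-sound features are used to train multiple random forests for detecting low-SNR sound events. Experiments confirm that the proposed method can detect low-SNR sound events in a variety of acoustic scenes and maintains an average detection rate of 67.1% at an SNR of -5 dB.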
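Mixing a sound event with a background sound at a given SNR can be sketched as below. The function name `mix_at_snr` and the toy waveforms are hypothetical; the sketch simply scales the background so that the event-to-background power ratio equals the requested SNR, which is one standard way to do it and not necessarily the paper's exact procedure.

```python
import math

def mix_at_snr(event, background, snr_db):
    """Return (mixture, gain) where the background is scaled to reach snr_db.

    SNR is defined as 10*log10(event power / scaled background power).
    Assumes both signals are non-empty and the background is not silent.
    """
    n = min(len(event), len(background))
    event, background = event[:n], background[:n]
    p_event = sum(s * s for s in event) / n
    p_back = sum(s * s for s in background) / n
    gain = math.sqrt(p_event / (p_back * 10 ** (snr_db / 10)))
    mixture = [e + gain * b for e, b in zip(event, background)]
    return mixture, gain

event = [1.0, -1.0, 1.0, -1.0]          # toy sound event samples
background = [0.5, 0.5, -0.5, -0.5]     # toy background samples
mixture, gain = mix_at_snr(event, background, -5.0)
```

At -5 dB the scaled background carries more power than the event, which is the low-SNR condition the abstract reports results for.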
11.
12.
Kwang Myung Jeon Su Yeon Park Chan Jun Chun Nam In Park Hong Kook Kim 《ETRI Journal》2017,39(3):398-405
In this paper, an artificial stereo extension method that creates stereophonic sound from a mono sound source is proposed. The proposed method first trains deep neural networks (DNNs) that model the nonlinear relationship between the dominant and residual signals of the stereo channel. In the training stage, the band‐wise log spectral magnitude and unwrapped phase of both the dominant and residual signals are utilized to model the nonlinearities of each sub‐band through deep architecture. From that point, stereo extension is conducted by estimating the residual signal that corresponds to the input mono channel signal with the trained DNN model in a sub‐band domain. The performance of the proposed method was evaluated using a log spectral distortion (LSD) measure and multiple stimuli with a hidden reference and anchor (MUSHRA) test. The results showed that the proposed method provided a lower LSD and higher MUSHRA score than conventional methods that use hidden Markov models and DNN with full‐band processing.
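The log spectral distortion (LSD) measure used for evaluation can be sketched as follows. This uses a common definition (root-mean-square difference of log power spectra per frame, averaged over frames); the abstract does not specify the exact variant, and the small `eps` floor and the toy spectra are assumptions.

```python
import math

def log_spectral_distortion(ref_power, est_power, eps=1e-12):
    """Mean over frames of the RMS difference between log power spectra, in dB.

    ref_power, est_power: lists of frames, each a list of power values per frequency bin.
    """
    frame_values = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        squared = [
            (10 * math.log10(r + eps) - 10 * math.log10(e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        frame_values.append(math.sqrt(sum(squared) / len(squared)))
    return sum(frame_values) / len(frame_values)

reference_spectra = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0]]           # toy power spectra
scaled_spectra = [[10 * p for p in frame] for frame in reference_spectra]
lsd_identical = log_spectral_distortion(reference_spectra, reference_spectra)
lsd_scaled = log_spectral_distortion(reference_spectra, scaled_spectra)
```

A tenfold power error in every bin gives an LSD of about 10 dB, so lower values indicate an estimate closer to the reference spectrum.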
13.
The inter‐channel level difference (ICLD) is a cue parameter to estimate spectral information in binaural cue coding, which has recently been in the spotlight as a multichannel audio signal compression technique. Even though the ICLD is an essential parameter, it is generally distorted by quantization. In this paper, a new modified ICLD representation method to minimize the quantization distortion is proposed by adopting a flexible determination of the reference channel and the unidirectional quantization scheme. Our experimental results confirm that the proposed method improves the multichannel audio output quality even at a reduced bit‐rate.
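The ICLD itself is a level ratio in decibels between a channel and a reference channel within a sub-band. The sketch below computes it and then picks the loudest channel as the reference, so that every ICLD is non-positive and only one side of the value range needs quantizing. That reading of "flexible reference channel" and "unidirectional quantization" is an assumption for illustration, not a description confirmed by the abstract.

```python
import math

def icld_db(channel_powers, ref_index):
    """ICLD of each channel relative to the reference channel, in dB."""
    ref = channel_powers[ref_index]
    return [10 * math.log10(p / ref) for p in channel_powers]

def icld_with_loudest_reference(channel_powers):
    """Choose the channel with the largest sub-band power as reference (assumed strategy)."""
    ref_index = max(range(len(channel_powers)), key=channel_powers.__getitem__)
    return ref_index, icld_db(channel_powers, ref_index)

sub_band_powers = [1.0, 4.0, 0.5]  # toy sub-band powers for three channels
ref_index, icld_values = icld_with_loudest_reference(sub_band_powers)
```

With a fixed reference channel the ICLD can be positive or negative; with the loudest channel as reference the values fall on one side of zero, at the cost of signalling which channel was chosen.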
14.
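Existing deep neural network speech enhancement methods overlook the importance of learning the phase spectrum, which leads to unsatisfactory enhanced speech quality. To address this problem, this paper proposes a speech enhancement method based on a convolutional recurrent network and non-local modules. An encoder-decoder network is designed in which the time-domain representation of the speech signal is fed to the encoder for deep feature extraction, so that both the magnitude and phase information of the speech signal are fully used. Non-local modules are added to the convolutional layers of the encoder and decoder to extract the key features of the speech sequence while suppressing useless ones, and a gated recurrent unit network is introduced to capture temporal correlations within the speech sequence. Experimental results on the ST-CMDS Chinese speech dataset show that, compared with the unprocessed noisy speech, the enhanced speech generated by the proposed method improves speech quality and intelligibility by 61% and 7.93% on average, respectively.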
15.
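To extract both spatial and temporal features from speech signals in language identification and thereby improve multilingual identification accuracy, a multilingual identification model based on a hybrid convolutional recurrent neural network (CRNN) is proposed. The model first extracts acoustic features from the speech signal; the features are then fed into a convolutional neural network (CNN) to extract low-dimensional spatial features; a spatial pyramid pooling layer (SPP layer) then regularizes the spatial features into a fixed-length one-dimensional feature; finally, this feature is fed into a recurrent neural network (RNN) to identify the language. To verify the robustness of the model, experiments are carried out on three datasets. The results show that, compared with a conventional CNN and RNN, the hybrid CRNN improves language identification accuracy on all datasets, most markedly on 5 s utterances in the 8-language dataset, where accuracy improves by 5.3% and 6.1%, respectively.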
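The role of spatial pyramid pooling, producing a fixed-length vector from inputs of varying size, can be shown with a simplified one-dimensional sketch. The pyramid levels `(1, 2, 4)`, max pooling within each bin, and the 1-D input are assumptions for illustration; the model in the abstract pools CNN feature maps.

```python
import math

def spatial_pyramid_pool_1d(features, levels=(1, 2, 4)):
    """Max-pool a non-empty sequence into sum(levels) values, whatever its length."""
    length = len(features)
    pooled = []
    for n_bins in levels:
        for i in range(n_bins):
            # Bin boundaries: floor for the start, ceiling for the end, never empty.
            start = (i * length) // n_bins
            end = max(start + 1, math.ceil((i + 1) * length / n_bins))
            pooled.append(max(features[start:end]))
    return pooled

short_utterance = list(range(10))   # toy feature sequence of length 10
long_utterance = list(range(23))    # toy feature sequence of length 23
pooled_short = spatial_pyramid_pool_1d(short_utterance)
pooled_long = spatial_pyramid_pool_1d(long_utterance)
```

Both outputs have seven values even though the inputs differ in length, which is what lets utterances of different durations feed a network that expects a fixed input size.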
16.
17.
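Traffic encryption poses new challenges for traffic classification. To classify encrypted traffic quickly and accurately, an encrypted traffic classification method based on a convolutional attention gated recurrent network is proposed. The method combines a convolutional neural network with gated recurrent units. To suit the characteristics of traffic data, the pooling layer of the convolutional neural network is modified to extract the features of individual packets, and an attention mechanism finds the key features of each packet and assigns them high weights. Gated recurrent units then extract the time-series features between packets at the flow level, so that the packet level and the flow level together capture both the global and local characteristics of the traffic. Experiments show that, compared with existing methods, the method improves classification accuracy, real-time performance, and training efficiency.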
18.
19.
Ying Tan 《The Journal of VLSI Signal Processing》2002,32(1-2):45-54
In this paper, we propose a new approach for signal detection in wireless digital communications based on the neural network with transient chaos and time-varying gain (NNTCTG), and give a concrete model of the signal detector after appropriate transformations and mappings. It is well known that the problem of maximum likelihood signal detection can be described as a complex optimization problem that has so many local optima that conventional Hopfield-type neural networks fail to solve it. To avoid the serious local optima problem of Hopfield-type neural networks, the NNTCTG uses the time-varying parameters of the recurrent neural network to control the evolving behavior of the network so that the network undergoes a transition from chaotic behavior to gradient convergence. It has richer and more flexible dynamics than conventional neural networks with only point attractors, so it can be expected to have a strong ability to search for globally optimal or near-optimal solutions. After going through a transient inverse-bifurcation process, the NNTCTG can approach the global optimum of the problem or its neighborhood. Simulation experiments have been performed to show the effectiveness and validity of the proposed neural-network-based method for signal detection in digital communications.
20.
This paper proposes an approach to improve the performance of no-reference video quality assessment for sports videos with dynamic motion scenes using an efficient spatiotemporal model. In the proposed method, we divide the video sequences into video blocks and apply a 3D shearlet transform that can efficiently extract primary spatiotemporal features to capture dynamic natural motion scene statistics from the incoming video blocks. The concatenation of a deep residual bidirectional gated recurrent neural network and logistic regression is used to learn the spatiotemporal correlation more robustly and predict the perceptual quality score. In addition, conditional video block-wise constraints are incorporated into the objective function to improve quality estimation performance for the entire video. The experimental results show that the proposed method extracts spatiotemporal motion information more effectively and predicts the video quality with higher accuracy than the conventional no-reference video quality assessment methods.