Similar Literature
Found 20 similar documents (search time: 281 ms)
1.
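To make full use of noisy-speech features and improve the performance of speech enhancement networks, this paper exploits the correlation of noisy speech along both the time and frequency dimensions and combines the local feature extraction ability of convolutional neural networks with the long-term dependency modeling ability of gated recurrent units to design a convolutional gated recurrent network for speech enhancement. The network replaces the fully connected structure in the feature computation of the gated recurrent unit with a convolutional structure, which better preserves the time-frequency structure of the noisy-speech features. Experimental results show that, compared with other speech enhancement networks, the proposed network has clear advantages in preserving speech components and suppressing noise components, and the enhanced speech has better quality and intelligibility.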
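A gated recurrent unit whose fully connected products are replaced by convolutions can be written as in the sketch below. This is the commonly used ConvGRU formulation and is an assumption here, since the abstract does not give the paper's exact equations; the symbols $W$, $U$, $b$, $x_t$ and $h_t$ are illustrative. $*$ denotes convolution over the time-frequency plane, $\odot$ element-wise multiplication, and $\sigma$ the logistic sigmoid.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z * x_t + U_z * h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r * x_t + U_r * h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h * x_t + U_h * (r_t \odot h_{t-1}) + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Because every gate is computed by a convolution, the hidden state $h_t$ keeps the same two-dimensional layout as the input feature map, which is one plausible reading of how the time-frequency structure is preserved.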

2.
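Liu Yaling, Guo Min, Ma Miao. 《光电子·激光》 (Journal of Optoelectronics·Laser), 2021, 32(12): 1271-1277
To address the limitation of applying attention only in the time-frequency dimension in sound event detection, and the insufficient feature extraction caused by using a single type of convolutional layer, this paper proposes a convolutional recurrent neural network (CRNN) model based on multi-scale attention feature fusion to improve sound event detection performance. First, a multi-scale attention module is proposed that attends to both local time-frequency units and global channel features, improving the model's feature selection ability. Second, a multi-scale feature fusion method is proposed that fuses multi-scale attention features rich in contextual information, improving the model's feature representation ability. Finally, a bidirectional gated recurrent network layer models temporal dependencies, and a fully connected layer classifies sound events frame by frame. In addition, a data balancing technique is used to further improve the model's generalization. Experimental results on a subset of AudioSet show that, compared with the CRNN, the proposed model reduces the error rate (ER) on the evaluation set by 11% and raises the F1-score (F1) by 8.3%, effectively improving sound event detection performance.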

3.
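Liu Miao, Wang Jing, Dong Guiguan, Yi Weiming. 《信号处理》 (Journal of Signal Processing), 2021, 37(10): 1907-1913
For the large-scale weakly labeled sound event detection dataset provided by Task 4 of the DCASE 2017 challenge, a multi-class sound event detection system is built on Mel filter bank features (Fbank), a convolutional neural network (CNN), and a recurrent neural network (RNN). Part of the backpropagation derivation is analyzed for two commonly used existing pooling layers, attention and linear softmax, and the linear softmax pooling layer is then improved into a "power-function softmax with a learnable exponent" pooling layer. Experimental results show that, compared with the model that won first place in the DCASE challenge, the detection system using the proposed pooling layer raises the segment-level F1 score from 0.556 to 0.652 and the frame-level F1 score from 0.518 to 0.583, and lowers the frame-level error rate (ER) from 0.730 to 0.667.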
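The pooling step described above can be illustrated with a small NumPy sketch. Linear softmax pooling weights each frame-level probability by itself. The power-function form below, sum(p^(n+1)) / sum(p^n), is an assumption about how the exponent enters, since the abstract does not give the exact formula. The names `linear_softmax_pool` and `power_softmax_pool` are hypothetical, and the exponent `n` is a fixed number here rather than a trained parameter.

```python
import numpy as np

def linear_softmax_pool(frame_probs):
    """Linear softmax pooling: each frame is weighted by its own probability."""
    p = np.asarray(frame_probs, dtype=float)
    return float(np.sum(p * p) / np.sum(p))

def power_softmax_pool(frame_probs, n):
    """Assumed power-function pooling: sum(p^(n+1)) / sum(p^n).

    n = 0 gives mean pooling, n = 1 gives linear softmax pooling,
    and large n approaches max pooling.
    """
    p = np.asarray(frame_probs, dtype=float)
    return float(np.sum(p ** (n + 1)) / np.sum(p ** n))

# Frame-level probabilities of one event class in one clip (made-up values).
frame_probs = [0.1, 0.2, 0.9, 0.8]
mean_pool = power_softmax_pool(frame_probs, 0.0)
linear_pool = power_softmax_pool(frame_probs, 1.0)
sharp_pool = power_softmax_pool(frame_probs, 20.0)
```

In a trainable system `n` would be a parameter updated by backpropagation, which lets the network choose a point between mean-like and max-like pooling. Both functions divide by a sum of probabilities, so an all-zero input would need separate handling.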

4.
To recognize emotion more accurately from multi-modal bio-signal features, a novel method is proposed that extracts and fuses the signals using a stacked auto-encoder and LSTM recurrent neural networks. The stacked auto-encoder neural network is used to compress and fuse the features. The deep LSTM recurrent neural network is employed to classify the emotion states. The results show that the fused multi-modal features provide more useful information than single-modal features. The deep LSTM recurrent neural network achieves more accurate emotion classification than other methods. The highest accuracy rate is 0.7926.

5.
6.
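Traditional neuron classification methods based on geometric morphology depend on extracting and selecting spatial structural features of neurons, which loses a large amount of useful classification information. An adaptive projection algorithm is applied to transform three-dimensional neurons without extracting their geometric features, and a deep-learning-based method for classifying neuron morphology is proposed. The method reconstructs the raw neuron data as three-dimensional voxels, forms two-dimensional neuron image data through adaptive projection, and builds a deep learning model based on a dual convolutional gated recurrent neural network to classify the neurons. Applied to three neuron classification datasets and compared with feature-extraction-based methods, the proposed method achieves higher classification accuracy and good adaptability.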

7.
A number of schemes have been proposed for communication using chaos over the past years. Regardless of the exact modulation method used, the transmitted signal must go through a physical channel which undesirably introduces distortion to the signal and adds noise to it. The problem is particularly serious when coherent‐based demodulation is used because the necessary process of chaos synchronization is difficult to implement in practice. This paper addresses the channel distortion problem and proposes a technique for channel equalization in chaos‐based communication systems. The proposed equalization is realized by a modified recurrent neural network (RNN) incorporating a specific training (equalizing) algorithm. Computer simulations are used to demonstrate the performance of the proposed equalizer in chaos‐based communication systems. The Hénon map and Chua's circuit are used to generate chaotic signals. It is shown that the proposed RNN‐based equalizer outperforms conventional equalizers as well as those based on feedforward neural networks for noisy, distorted linear and non‐linear channels. Copyright © 2004 John Wiley & Sons, Ltd.
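For readers unfamiliar with the Hénon map mentioned above, the following sketch generates a chaotic sequence from its standard recurrence, x[n+1] = 1 - a*x[n]^2 + y[n] and y[n+1] = b*x[n]. The parameter values a = 1.4 and b = 0.3 are the classical choice; they, the initial condition, and the function name `henon_sequence` are assumptions for illustration and are not taken from the paper.

```python
def henon_sequence(n_samples, a=1.4, b=0.3, x0=0.1, y0=0.0):
    """Return n_samples values of the x-coordinate of the Henon map."""
    xs = []
    x, y = x0, y0
    for _ in range(n_samples):
        # Both coordinates are updated simultaneously from the previous state.
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs

# A chaotic carrier of 1000 samples (illustrative length).
signal = henon_sequence(1000)
```

A sequence like `signal` would play the role of the transmitted chaotic waveform before it passes through the distorting channel.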

8.
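Sea clutter is the electromagnetic scattering echo from the sea surface collected by radar. Affected by ocean environment factors (wind speed, wind direction, wave height, wave direction, etc.) and radar parameters, its amplitude fluctuates randomly over time. Improving the accuracy of sea clutter amplitude prediction helps increase target detection accuracy. Considering the non-Gaussian, nonlinear nature of sea clutter, this paper proposes a sea clutter amplitude prediction method based on a gated recurrent neural network. Prediction analysis on measured sea clutter data from the IPIX radar and a P-band radar shows that the proposed method achieves higher prediction accuracy than existing traditional methods.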

9.
Typical methods for overlapping sound event detection (SED) do not fully consider the joint spectral and temporal transition characteristics of the audio signal. They are generally based on training models using either separate data from each event class or mixed signals containing simultaneous sound events. This paper introduces a new approach for SED in real-life audio using Nonnegative Matrix Factor 2-D Deconvolution and RUSBoost techniques. The idea is to capture the two-dimensional joint spectral and temporal information from the time-frequency representation while possibly separating the sound mixture into several sources. In addition, the RUSBoost technique is utilized to address the class imbalance problem of the training data. The proposed approach is evaluated using the TUT Sound Event 2016 and 2017 datasets. The results show that the proposed method outperforms the baseline methods. For the TUT Sound Event 2016 dataset, the proposed method reduces the total error rate by 5% while increasing the F1 score by 13.8%. For the TUT Sound Event 2017 dataset, the proposed method reduces the total error rate by 3% while increasing the F1 score by 8.1%.

10.
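Li Ying, Yin Jiali. 《电子学报》 (Acta Electronica Sinica), 2018, 46(11): 2705-2713
To detect low signal-to-noise ratio (SNR) sound events in various background sounds, this paper proposes mixing background sounds with sound events to form noisy samples for training the classifier. In the preprocessing stage, a voting method based on empirical mode decomposition and the level 2 to 6 intrinsic mode functions is used to predict the endpoints of background sounds and sound events and to estimate the SNR. A sub-band energy distribution method is then used to extract features from the sound data. Finally, the background sound is mixed with all samples in the sound event library at the estimated SNR, and the resulting mixed-sound features are used to train multiple random forests for detecting low-SNR sound events. Experiments confirm that the proposed method can detect low-SNR sound events in various acoustic scenes and maintains an average detection rate of 67.1% at an SNR of -5 dB.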

11.
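The convolutional recurrent neural network (CRNN), which cascades a convolutional neural network (CNN) structure with a recurrent neural network (RNN) structure, and its variants are the current mainstream sound event detection models. However, a CRNN sound event detection model trained end to end cannot functionally constrain the roles of the CNN and RNN structures. To address this problem, this paper proposes a CRNN sound event detection method with an audio tagging consistency constraint (ATCC-CRNN). This method, in the CRN...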

12.
In this paper, an artificial stereo extension method that creates stereophonic sound from a mono sound source is proposed. The proposed method first trains deep neural networks (DNNs) that model the nonlinear relationship between the dominant and residual signals of the stereo channel. In the training stage, the band‐wise log spectral magnitude and unwrapped phase of both the dominant and residual signals are utilized to model the nonlinearities of each sub‐band through deep architecture. From that point, stereo extension is conducted by estimating the residual signal that corresponds to the input mono channel signal with the trained DNN model in a sub‐band domain. The performance of the proposed method was evaluated using a log spectral distortion (LSD) measure and multiple stimuli with a hidden reference and anchor (MUSHRA) test. The results showed that the proposed method provided a lower LSD and higher MUSHRA score than conventional methods that use hidden Markov models and DNN with full‐band processing.

13.
The inter‐channel level difference (ICLD) is a cue parameter used to estimate spectral information in binaural cue coding, which has recently been in the spotlight as a multichannel audio signal compression technique. Even though the ICLD is an essential parameter, it is generally distorted by quantization. In this paper, a new modified ICLD representation method that minimizes the quantization distortion is proposed by adopting a flexible determination of the reference channel and a unidirectional quantization scheme. Our experimental results confirm that the proposed method improves the multichannel audio output quality even at a reduced bit‐rate.

14.
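Existing deep neural network speech enhancement methods overlook the importance of learning the phase spectrum, which leads to unsatisfactory enhanced speech quality. To address this problem, this paper proposes a speech enhancement method based on a convolutional recurrent network and non-local modules. An encoder-decoder network is designed that takes the time-domain representation of the speech signal as the encoder input for deep feature extraction, thereby making full use of both the magnitude and phase information of the speech signal. Non-local modules are added to the convolutional layers of the encoder and decoder to extract key features of the speech sequence while suppressing useless ones, and a gated recurrent unit network is introduced to capture temporal correlation within the speech sequence. Experimental results on the ST-CMDS Chinese speech dataset show that, compared with unprocessed noisy speech, the enhanced speech generated by the proposed method improves quality and intelligibility by 61% and 7.93% on average, respectively.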

15.
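To extract both spatial and temporal features from speech signals in language identification and thereby improve multilingual identification accuracy, a multilingual identification model based on a hybrid convolutional recurrent neural network (CRNN) is proposed. The model first extracts acoustic features from the speech signal; the features are then fed into a convolutional neural network (CNN) to extract low-dimensional spatial features; a spatial pyramid pooling (SPP) layer then regularizes the spatial features into a fixed-length one-dimensional feature; finally, this feature is fed into a recurrent neural network (RNN) to identify the language. To verify the robustness of the model, experiments were conducted on three datasets. The results show that, compared with a conventional CNN and RNN, the hybrid CRNN improves language identification accuracy on all datasets, most notably on 5 s utterances in the 8-language dataset, where accuracy improves by 5.3% and 6.1%, respectively.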

16.
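An Overview of Reservoir Computing   Total citations: 2 (self-citations: 0, citations by others: 2)
Peng Yu, Wang Jianmin, Peng Xiyuan. 《电子学报》 (Acta Electronica Sinica), 2011, 39(10): 2387-2396
To address the difficulty of training traditional recurrent neural networks, a new training approach for recurrent neural networks, reservoir computing, has been proposed. Its core idea is to train only part of the network's connection weights, while the remaining weights stay fixed once generated; training generally only requires solving a linear regression problem. Broadly speaking, a reservoir can be used as a temporally dependent kernel function, which greatly extends its range of application, so that it is no longer merely a recurrent neural network training algorithm...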
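The core idea above (fixed random recurrent weights, with only the output weights fitted by linear regression) can be sketched as a small echo state network in NumPy. This is a minimal illustration under assumed settings, not the survey's own algorithm: the reservoir size, the spectral radius of 0.9, the washout length, the ridge term, and the sine-wave task are all hypothetical choices, and `run_reservoir` is an illustrative name.

```python
import numpy as np

def run_reservoir(inputs, w_in, w_res):
    """Drive a fixed random reservoir with a scalar input series and collect its states."""
    state = np.zeros(w_res.shape[0])
    states = []
    for u in inputs:
        state = np.tanh(w_in * u + w_res @ state)
        states.append(state)
    return np.array(states)

rng = np.random.default_rng(0)
n_units = 50

# Input and recurrent weights are generated once and never trained.
w_in = rng.uniform(-0.5, 0.5, size=n_units)
w_res = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))  # rescale to spectral radius 0.9

# Toy task: predict the next sample of a sine wave.
series = np.sin(0.2 * np.arange(400))
inputs, targets = series[:-1], series[1:]

states = run_reservoir(inputs, w_in, w_res)
washout = 50  # discard the initial transient
X, y = states[washout:], targets[washout:]

# Only the readout is trained, by ridge-regularized linear regression.
ridge = 1e-6
w_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_units), X.T @ y)

mse = float(np.mean((X @ w_out - y) ** 2))
```

Because the only trained quantity is `w_out`, training reduces to one linear solve, which is the property the abstract contrasts with the difficulty of training a full recurrent network.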

17.
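Zhang Yanhui, Lü Na, Liu Pengfei, Chen Zhuo. 《信号处理》 (Journal of Signal Processing), 2021, 37(7): 1180-1188
Traffic encryption poses new challenges for traffic classification. To classify encrypted traffic quickly and accurately, an encrypted traffic classification method based on a convolutional attention gated recurrent network is proposed. The method combines a convolutional neural network with gated recurrent units. Tailored to the characteristics of traffic data, the pooling layer of the convolutional neural network is modified to extract features of individual packets, and an attention mechanism identifies the key features of each packet and assigns them high weights. Gated recurrent units then extract time-series features across packets at the flow level, so that the packet-level and flow-level views together capture the global and local characteristics of the traffic. Experiments show that, compared with existing methods, the proposed method improves classification accuracy, real-time performance, and training efficiency.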

18.
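The class-D power amplifier offers excellent transmission efficiency but, as a switching amplifier, its output signal exhibits considerable nonlinear distortion. Behavioral modeling of a class-D power amplifier must account for both its nonlinearity and its memory effects. This paper introduces the wavelet transform into the encoder-decoder neural network model and proposes a wavelet encoder-decoder neural network model. Both the gated recurrent unit based encoder-decoder model and the wavelet encoder-decoder model are used for behavioral modeling of class-D power amplifiers. Experimental results show that the proposed behavioral models are more accurate than the traditional Volterra-Laguerre model in both the time and frequency domains.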

19.
In this paper, we propose a new approach for signal detection in wireless digital communications based on the neural network with transient chaos and time-varying gain (NNTCTG), and give a concrete model of the signal detector after appropriate transformations and mappings. It is well known that maximum likelihood signal detection can be described as a complex optimization problem with so many local optima that conventional Hopfield-type neural networks fail to solve it. To avoid the serious local-optima problem of Hopfield-type neural networks, the NNTCTG uses the time-varying parameters of the recurrent neural network to control the evolution of the network so that it undergoes a transition from chaotic behavior to gradient convergence. It has richer and more flexible dynamics than conventional neural networks, which have only point attractors, so it can be expected to have a strong ability to search for globally optimal or near-optimal solutions. After going through a transient inverse-bifurcation process, the NNTCTG can approach the global optimum of the problem or its neighborhood. Simulation experiments have been performed to show the effectiveness and validity of the proposed neural-network-based method for signal detection in digital communications.

20.
This paper proposes an approach to improve the performance of no-reference video quality assessment for sports videos with dynamic motion scenes using an efficient spatiotemporal model. In the proposed method, we divide the video sequences into video blocks and apply a 3D shearlet transform that can efficiently extract primary spatiotemporal features to capture dynamic natural motion scene statistics from the incoming video blocks. The concatenation of a deep residual bidirectional gated recurrent neural network and logistic regression is used to learn the spatiotemporal correlation more robustly and predict the perceptual quality score. In addition, conditional video block-wise constraints are incorporated into the objective function to improve quality estimation performance for the entire video. The experimental results show that the proposed method extracts spatiotemporal motion information more effectively and predicts the video quality with higher accuracy than the conventional no-reference video quality assessment methods.
