
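A speech emotion recognition method using mixed distributed attention mechanism and hybrid neural network
Cite this article: CHEN Qiao-hong, YU Ze-yuan, JIA Yu-bo. A speech emotion recognition method using mixed distributed attention mechanism and hybrid neural network[J]. Computer Engineering & Science, 2022, 44(12): 2246-2254.
Authors: CHEN Qiao-hong  YU Ze-yuan  JIA Yu-bo
Affiliation: (School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)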
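Abstract: Existing speech emotion recognition methods are affected by many irrelevant features and have poor accuracy. To address this, a speech emotion recognition method based on a mixed distributed attention mechanism and a hybrid neural network is proposed. The method works in two channels: a convolutional neural network (CNN) extracts the spatial features of the speech signal, and a bidirectional long short-term memory network (BiLSTM) extracts its temporal features. The outputs of the two networks are then used together as the input matrices of a multi-head attention mechanism. Because the existing multi-head attention mechanism suffers from a low-rank distribution problem, the attention computation is modified: the low-rank distribution is superimposed with the similarity between the two networks' output features to form a mixed distribution, the result is normalized, and the results of all subspaces are concatenated. Finally, a fully connected layer produces the classification output. Experimental results show that the proposed method achieves higher accuracy than other existing methods, which verifies its effectiveness.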
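The abstract describes the fusion step only in words and gives no formula for the mixed distributed attention. The plain-Python sketch below is one hypothetical reading, not the authors' implementation. It assumes that the CNN output supplies the queries and the BiLSTM output supplies the keys and values; it treats each head's scaled dot-product softmax (whose score matrix has rank at most the head dimension) as the "low-rank distribution"; and it mixes that with a softmax over cosine similarities between the two networks' raw outputs using an assumed weight `mix`. Random projections stand in for trained weights, and the names `mixed_attention`, `classify`, `mix` and the mean-pooling step before the fully connected layer are illustrative assumptions.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cosine_matrix(a: Matrix, b: Matrix) -> Matrix:
    """Cosine similarity between every row of a and every row of b."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [[sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b] for u in a]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def mixed_attention(cnn_out: Matrix, lstm_out: Matrix, num_heads: int = 2,
                    mix: float = 0.5, seed: int = 0) -> Matrix:
    """HYPOTHETICAL mixed-distribution multi-head attention.

    cnn_out:  (T_q, d_model) CNN-channel features, used as queries (assumption)
    lstm_out: (T_k, d_model) BiLSTM-channel features, used as keys/values (assumption)
    mix:      assumed weight of the low-rank distribution in the mixture
    """
    d_model = len(cnn_out[0])
    d_head = d_model // num_heads
    rng = random.Random(seed)
    # Similarity between the two networks' raw outputs, shared by every head.
    p_sim = softmax_rows(cosine_matrix(cnn_out, lstm_out))
    heads = []
    for _ in range(num_heads):
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        q, k, v = matmul(cnn_out, w_q), matmul(lstm_out, w_k), matmul(lstm_out, w_v)
        # Scaled dot-product scores: rank <= d_head, the "low-rank" part.
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, transpose(k))]
        p_low = softmax_rows(scores)
        # Convex mixture of two row-stochastic matrices: every row still sums to 1.
        weights = [[mix * x + (1 - mix) * y for x, y in zip(r_low, r_sim)]
                   for r_low, r_sim in zip(p_low, p_sim)]
        heads.append(matmul(weights, v))
    # Concatenate the per-head (subspace) outputs along the feature axis.
    return [sum((head[t] for head in heads), []) for t in range(len(cnn_out))]


def classify(fused: Matrix, num_classes: int, seed: int = 1) -> list[float]:
    """Mean-pool over time, then one fully connected layer + softmax."""
    rng = random.Random(seed)
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    w = random_matrix(len(pooled), num_classes, rng)
    return softmax_rows(matmul([pooled], w))[0]


if __name__ == "__main__":
    rng = random.Random(42)
    d_model = 8
    cnn_out = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(5)]
    lstm_out = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(7)]
    fused = mixed_attention(cnn_out, lstm_out, num_heads=2, mix=0.5)
    probs = classify(fused, num_classes=4)
    print(len(fused), len(fused[0]), len(probs))  # 5 8 4
```

Mixing two already-normalized distributions keeps each row of attention weights summing to 1, so no separate renormalization is needed in this reading; a variant that adds raw score matrices and then applies a single softmax would be an equally plausible interpretation of the abstract.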

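Keywords: speech emotion recognition; Mel-frequency cepstral coefficient; bidirectional long short-term memory network; convolutional neural network; multi-head attention mechanism
Received: 2021-03-19
Revised: 2021-06-25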
