
End-to-End Speech Recognition Based on ResNet-BLSTM
Cite this article: HU Zhangfang, XU Xuan, FU Yaqin, XIA Zhiguang, MA Sudong. End-to-End Speech Recognition Based on ResNet-BLSTM[J]. Computer Engineering and Applications, 2020, 56(18): 124-130.
Authors: HU Zhangfang, XU Xuan, FU Yaqin, XIA Zhiguang, MA Sudong
Affiliation: 1. School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China 2. School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: End-to-end speech recognition models based on deep learning typically take fixed-length speech frames as input, which loses time-domain information and part of the high-frequency information and leads to a low recognition rate and poor robustness. To address this problem, the paper proposes a model that combines a residual network (ResNet) with a bidirectional long short-term memory network (BLSTM). The model takes the spectrogram as input, uses parallel convolutional layers within the residual network to extract features at different scales, fuses those features, and finally classifies with connectionist temporal classification (CTC), yielding an end-to-end speech recognition model. Experimental results show that, compared with a traditional end-to-end model, the proposed model reduces the word error rate (WER) on the Aishell-1 speech set by 2.52% and is more robust.

Keywords: residual network (ResNet); bidirectional long short-term memory (BLSTM); parallel convolutional layer; connectionist temporal classification
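
The abstract names two pieces that are easy to illustrate in isolation: connectionist temporal classification at the output, and an error rate as the evaluation metric. The following is a minimal Python sketch of both, not the authors' implementation. It assumes greedy (best-path) decoding, a blank label at index 0, and an error rate defined as edit distance divided by reference length; the function names `ctc_greedy_collapse`, `edit_distance`, and `error_rate` are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank label


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse a per-frame best-label path: merge repeats, then drop blanks."""
    output: List[int] = []
    previous: Optional[int] = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    row = list(range(len(hypothesis) + 1))  # distances against an empty reference
    for i, ref_token in enumerate(reference, start=1):
        diagonal, row[0] = row[0], i
        for j, hyp_token in enumerate(hypothesis, start=1):
            cost = 0 if ref_token == hyp_token else 1
            diagonal, row[j] = row[j], min(
                row[j] + 1,       # deletion of a reference token
                row[j - 1] + 1,   # insertion of a hypothesis token
                diagonal + cost,  # match or substitution
            )
    return row[-1]


def error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """Edit distance normalised by reference length (tokens may be words or characters)."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    path = [0, 1, 1, 0, 1, 2, 2, 0]   # hypothetical per-frame argmax labels
    print(ctc_greedy_collapse(path))  # blank between the two 1s keeps both
    print(error_rate(list("abcde"), list("abde")))
```

The blank label is what allows a repeated token to survive collapsing: two identical labels separated by a blank are emitted twice, while adjacent identical labels merge into one. Whether the tokens passed to `error_rate` are words or characters is a choice left open here.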
This article is indexed in Wanfang Data (万方数据) and other databases.