Speech synthesis using simplified LSTM
Citation: CHEN Zhousi, HU Wenxin. Speech synthesis using simplified LSTM[J]. Computer Engineering and Applications, 2018, 54(3): 131-135.
Authors: CHEN Zhousi, HU Wenxin
Affiliation: Computer Center, East China Normal University, Shanghai 200062, China

Abstract: Conventional parametric speech synthesis based on hidden Markov models (HMMs) gains little in prediction quality as training data grows. A Long Short-Term Memory (LSTM) network learns long-range features within a sequence: its output at each step depends on both the current input and its internal state, and with large-scale parallel numerical computation it yields more accurate duration prediction and smoother spectral models. Its computation, however, can still be reduced. This paper first analyzes the structure of the bidirectional LSTM, then removes the forget gate and the output gate, and finally models the mapping from phoneme-level text features to cepstral parameters. Comparative experiments on a Mandarin corpus show that the simplified bidirectional LSTM halves both training and prediction time, while Mel cepstral distortion drops from the HMM's 3.4661 to 1.9459.

Keywords: parametric speech synthesis  neural network  Long Short-Term Memory (LSTM)
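The simplification described in the abstract removes the forget gate and the output gate from the standard LSTM cell, leaving only the input gate: the cell state accumulates gated candidate updates, and the hidden state is simply tanh of the cell state. A minimal single-direction NumPy sketch of one such recurrent step (not the authors' implementation; the weight shapes, naming, and toy dimensions are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_lstm_step(x, h_prev, c_prev, params):
    """One step of an LSTM cell with the forget and output gates removed.

    Only the input gate i and the candidate update g remain:
        c_t = c_{t-1} + i_t * g_t   (no forget gate scaling c_{t-1})
        h_t = tanh(c_t)             (no output gate)
    """
    Wi, Ui, bi, Wg, Ug, bg = params
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)  # input gate
    g = np.tanh(Wg @ x + Ug @ h_prev + bg)  # candidate cell update
    c = c_prev + i * g                      # state accumulates gated candidates
    h = np.tanh(c)                          # hidden state, no output gating
    return h, c

# Toy usage: 4-dimensional input, 3-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = (rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid)), np.zeros(n_hid),
          rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid)), np.zeros(n_hid))
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):
    h, c = simplified_lstm_step(rng.standard_normal(n_in), h, c, params)
print(h.shape)  # prints (3,)
```

Compared with a full LSTM, this cell drops two of the four gate computations, which is consistent with the roughly halved training and prediction cost reported in the abstract; the paper applies the idea in a bidirectional configuration, which would run one such cell forward and another backward over the sequence.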