Speech synthesis using simplified LSTM
Citation: CHEN Zhousi, HU Wenxin. Speech synthesis using simplified LSTM[J]. Computer Engineering and Applications, 2018, 54(3): 131-135.
Authors: CHEN Zhousi, HU Wenxin
Affiliation: Computer Center, East China Normal University, Shanghai 200062, China

Abstract: Conventional parametric speech synthesis based on hidden Markov models (HMMs) gains little in prediction quality as training data grows. A Long Short-Term Memory (LSTM) network learns long-range features within a sequence: its output at each step depends on both the current input and its internal state, and with large-scale parallel numerical computation it yields more accurate duration prediction and smoother spectral models. Its computation, however, can still be reduced. This paper first analyzes the structure of the bidirectional LSTM, then removes the forget gate and the output gate, and finally models the mapping from phoneme-level text features to cepstral parameters. Comparative experiments on a Mandarin corpus show that the simplified bidirectional LSTM halves both training and prediction time, while Mel cepstral distortion drops from the HMM's 3.4661 to 1.9459.

Keywords: parametric speech synthesis  neural network  Long Short-Term Memory (LSTM)
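The simplification described in the abstract removes the forget gate and the output gate from the standard LSTM cell, leaving only the input gate: the cell state accumulates gated candidate updates, and the hidden state is simply tanh of the cell state. A minimal single-direction NumPy sketch of one such recurrent step (not the authors' implementation; the weight shapes, naming, and toy dimensions are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simplified_lstm_step(x, h_prev, c_prev, params):
    """One step of an LSTM cell with the forget and output gates removed.

    Only the input gate i and the candidate update g remain:
        c_t = c_{t-1} + i_t * g_t   (no forget gate scaling c_{t-1})
        h_t = tanh(c_t)             (no output gate)
    """
    Wi, Ui, bi, Wg, Ug, bg = params
    i = sigmoid(Wi @ x + Ui @ h_prev + bi)  # input gate
    g = np.tanh(Wg @ x + Ug @ h_prev + bg)  # candidate cell update
    c = c_prev + i * g                      # state accumulates gated candidates
    h = np.tanh(c)                          # hidden state, no output gating
    return h, c

# Toy usage: 4-dimensional input, 3-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = (rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid)), np.zeros(n_hid),
          rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid)), np.zeros(n_hid))
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):
    h, c = simplified_lstm_step(rng.standard_normal(n_in), h, c, params)
print(h.shape)  # prints (3,)
```

Compared with a full LSTM, this cell drops two of the four gate computations, which is consistent with the roughly halved training and prediction cost reported in the abstract; the paper applies the idea in a bidirectional configuration, which would run one such cell forward and another backward over the sequence.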