A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data


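Cite this article:	Shao Yanqiu, Sui Zhifang, Han Jiqing, Wang Zhiwei. A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data[J]. Journal of Computer Research and Development, 2007, 44(9): 1624-1631.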

Authors:	Shao Yanqiu, Sui Zhifang, Han Jiqing, Wang Zhiwei

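Affiliations:	1. Institute of Computational Linguistics, Peking University, Beijing 100871; 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001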

Funding:	National Natural Science Foundation of China; National Basic Research Program of China (973 Program)

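Abstract:	Building a good emotional prosody model is an important step in synthesizing emotional speech. A practical problem that research on emotional speech must face is that the amount of emotional data is usually far smaller than the amount of neutral data. This paper combines a small-scale database containing speech in three emotions (happiness, anger, and sadness) with a larger-scale neutral speech database to study emotional prosody modeling. The prosodic parameters that affect emotion are analyzed, and an emotional prosody model based on an artificial neural network is built. To address the over-fitting caused by the shortage of emotional data relative to neutral data, three methods are proposed: the mixed-corpus method, the least-squares fusion method, and the multistage (cascaded) network method. Each of these methods amplifies the role of the emotional corpus to a different degree, and each improves emotion prediction. The multistage network method in particular takes the output of the neutral model as one input of the cascaded network, which is equivalent to enlarging the feature space of the emotional model and further strengthens the role of each of its input features. It gives the best results of the three methods in generating every prosodic parameter for all three emotions.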
Keywords:	emotional speech synthesis; prosody model; data sparsity; data fusion; over-fitting
Revision received:	2006-03-14
A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data

Shao Yanqiu, Sui Zhifang, Han Jiqing, Wang Zhiwei. A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data[J]. Journal of Computer Research and Development, 2007, 44(9): 1624-1631.

Authors:	Shao Yanqiu, Sui Zhifang, Han Jiqing, Wang Zhiwei

Affiliation:	1. Institute of Computational Linguistics, Peking University, Beijing 100871; 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

Abstract:	Building an emotional prosody model is very important for emotional speech synthesis. However, a serious problem in this research is that the quantity of emotional data is much smaller than that of neutral data. A corpus covering three emotions, i.e. happiness, anger and sadness, is built in this paper. The parameters that affect emotional prosody are analyzed, and an emotional prosody model based on a neural network is built. Because the emotional corpus is too small, over-fitting caused by data sparsity occurs when the prosody model is trained. To use large-scale neutral data to improve the quality of the emotional prosody model, three methods are proposed: the mixed-corpus method, data fusion based on the least-squares algorithm, and the multistage network. All of these methods amplify the impact of the emotional corpus, so the prediction results for the emotional parameters all improve to some extent. The multistage network in particular uses the result of the neutral model as one input of the network, which is equivalent to enlarging the feature space and strengthens the function of the emotional input features. The results show that the multistage network is the best of the three methods.

Keywords:	emotional speech synthesis; prosody model; data sparsity; data fusion; over-fitting
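
The multistage network described in the abstract feeds the neutral model's output into the emotional model as an extra input. The sketch below illustrates only that data flow. It assumes NumPy and uses least-squares regressors as stand-ins for the paper's neural networks. The function names, the quadratic features in the neutral stage, and the synthetic data in the usage note are hypothetical and are not taken from the paper.

```python
import numpy as np


def _quadratic_design(X):
    """Design matrix [X, X**2, 1] for the stand-in neutral model (assumption)."""
    return np.hstack([X, X ** 2, np.ones((X.shape[0], 1))])


def _linear_design(X):
    """Design matrix [X, 1] for the stand-in emotional model (assumption)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_cascade(X_neutral, y_neutral, X_emotional, y_emotional):
    """Fit a two-stage model; returns (w_neutral, w_emotional)."""
    # Stage 1: neutral model trained on the large neutral corpus.
    w_neutral, *_ = np.linalg.lstsq(
        _quadratic_design(X_neutral), y_neutral, rcond=None
    )
    # Stage 2: the emotional model sees the original features plus the
    # neutral model's prediction as one additional input column.
    neutral_out = (_quadratic_design(X_emotional) @ w_neutral)[:, None]
    stage2_inputs = np.hstack([X_emotional, neutral_out])
    w_emotional, *_ = np.linalg.lstsq(
        _linear_design(stage2_inputs), y_emotional, rcond=None
    )
    return w_neutral, w_emotional


def predict_cascade(w_neutral, w_emotional, X):
    """Predict an emotional prosody parameter for feature rows X."""
    neutral_out = (_quadratic_design(X) @ w_neutral)[:, None]
    return _linear_design(np.hstack([X, neutral_out])) @ w_emotional
```

As a usage example on synthetic data: if the neutral target is `x**2` over 200 samples and the emotional target is `1.5 * x**2 + 0.2` over only 8 samples, the second stage has just a scale and an offset left to learn, because the neutral stage already supplies the curve shape. This is the sense in which the neutral output enlarges the emotional model's feature space.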
This article is indexed in databases including CNKI, VIP, and Wanfang Data.
