
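A Study on Emotional Prosody Modeling That Combines Small-Scale Emotional Data with Large-Scale Neutral Data
Cite this article: Shao Yanqiu, Sui Zhifang, Han Jiqing, Wang Zhiwei. A Study on Emotional Prosody Modeling That Combines Small-Scale Emotional Data with Large-Scale Neutral Data [J]. Journal of Computer Research and Development, 2007, 44(9): 1624-1631.
Authors: Shao Yanqiu  Sui Zhifang  Han Jiqing  Wang Zhiwei
Affiliation: 1. Institute of Computational Linguistics, Peking University, Beijing 100871
2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Funding: National Natural Science Foundation of China; National Key Basic Research Program of China (973 Program)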
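Abstract: Building a good emotional prosody model is a key step in synthesizing emotional speech. A practical problem that emotional speech research has to face is that far less emotional data than neutral data is usually available. This paper combines a small database of emotional speech covering three emotions (happiness, anger, and sadness) with a larger neutral speech database to study emotional prosody modeling. The prosodic parameters that affect emotion are analyzed, and an emotional prosody model based on an artificial neural network is built. To address the over-fitting caused by the shortage of emotional data relative to neutral data, three remedies are proposed: the mixed-corpus method, the least-squares fusion method, and the cascaded (multistage) network method. Each of them amplifies the role of the emotional corpus to a different degree, and each improves emotion prediction. The cascaded network method in particular feeds the output of the neutral model into the cascaded network as one of its inputs, which effectively enlarges the feature space of the emotional model and strengthens the contribution of each of its input features. It gives the best results in generating every prosodic parameter for all three emotions.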

Keywords: emotional speech synthesis  prosody model  data sparsity  data fusion  over-fitting
Revised: 2006-03-14

A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data
Shao Yanqiu, Sui Zhifang, Han Jiqing, Wang Zhiwei. A Study on the Emotional Prosody Model Building Based on Small-Scale Emotional Data and Large-Scale Neutral Data [J]. Journal of Computer Research and Development, 2007, 44(9): 1624-1631.
Authors: Shao Yanqiu  Sui Zhifang  Han Jiqing  Wang Zhiwei
Affiliation: 1. Institute of Computational Linguistics, Peking University, Beijing 100871; 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Abstract: Building an emotional prosody model is an important step in emotional speech synthesis. A serious practical problem, however, is that far less emotional data than neutral data is available. This paper builds a corpus covering three emotions: happiness, anger, and sadness. The parameters that affect emotional prosody are analyzed, and an emotional prosody model based on a neural network is built. Because the emotional corpus is small, training the prosody model suffers from over-fitting caused by data sparsity. To use large-scale neutral data to improve the emotional prosody model, three methods are proposed: a mixed-corpus method, data fusion based on the least-squares algorithm, and a multistage network. All three amplify the impact of the emotional corpus, so the prediction of emotional parameters improves to some extent in each case. The multistage network, which takes the output of the neutral model as one of its inputs, effectively enlarges the feature space and strengthens the role of the emotional input features. The results show that it is the best of the three methods.
Keywords: emotional speech synthesis  prosody model  data sparsity  data fusion  over-fitting
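
The abstract names the methods but gives no formulas, so the following is only a minimal sketch of how two of them could look, not the authors' implementation. It assumes that least-squares fusion means fitting linear blend weights over the neutral-model and emotional-model predictions, and that the multistage network simply receives the neutral model's output as one extra input column. All names (`fit_fusion_weights`, `fuse`, `cascade_inputs`) and the toy data are hypothetical.

```python
import numpy as np


def fit_fusion_weights(neutral_pred, emotional_pred, target):
    """Fit least-squares blend weights (w_neutral, w_emotional, bias).

    Assumption: fusion is a linear combination of the two model outputs.
    """
    design = np.column_stack([neutral_pred, emotional_pred, np.ones_like(target)])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def fuse(neutral_pred, emotional_pred, weights):
    """Blend the two predictions with previously fitted weights."""
    design = np.column_stack(
        [neutral_pred, emotional_pred, np.ones_like(neutral_pred)]
    )
    return design @ weights


def cascade_inputs(features, neutral_pred):
    """Append the neutral model's output as one extra input feature."""
    return np.column_stack([features, neutral_pred])


def mse(pred, target):
    """Mean squared error between a prediction and the target."""
    return float(np.mean((pred - target) ** 2))


# Toy data only: 40 syllables, 3 context features, one prosodic target.
rng = np.random.default_rng(0)
features = rng.normal(size=(40, 3))
target = features @ np.array([1.0, -0.5, 0.3]) + 2.0

# Stand-ins for real model outputs: the neutral model is stable but biased,
# the emotional model is unbiased but noisy (it saw little data).
neutral_pred = target - 1.5 + rng.normal(scale=0.1, size=40)
emotional_pred = target + rng.normal(scale=0.8, size=40)

weights = fit_fusion_weights(neutral_pred, emotional_pred, target)
fused = fuse(neutral_pred, emotional_pred, weights)
cascaded = cascade_inputs(features, neutral_pred)

print("neutral MSE  :", mse(neutral_pred, target))
print("emotional MSE:", mse(emotional_pred, target))
print("fused MSE    :", mse(fused, target))
print("cascaded input shape:", cascaded.shape)
```

On the data used for fitting, the fused error cannot exceed that of either input model, because each model alone corresponds to one feasible choice of weights. Whether the gain carries over to unseen data would have to be checked on a held-out set. The `cascaded` matrix would then be the input to the emotional network, whose training is not shown here.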
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.