A Speech Driven Face Animation System Based on Machine Learning
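Cite this article: CHEN Yi-Qiang, GAO Wen, WANG Zhao-Qi, JIANG Da-Long. A Speech Driven Face Animation System Based on Machine Learning[J]. Journal of Software, 2003, 14(2): 215-221
Authors: CHEN Yi-Qiang, GAO Wen, WANG Zhao-Qi, JIANG Da-Long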
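Affiliations: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150001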
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60103007 and the National High-Tech Research and Development Plan of China under Grant No. 2001AA114160.
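Abstract: Synchronizing speech with lip movements and facial expressions is one of the hard problems in face animation. This paper combines clustering and machine learning to learn the synchronization relationship between the speech signal and lip movements and facial expressions, and applies the result to a speech-driven face animation system based on the MPEG-4 standard. Using a large synchronized audio-visual database, unsupervised clustering discovers basic patterns that effectively characterize facial motion, and a neural network is trained to map prosody-bearing speech features directly to these basic facial motion patterns. This avoids the limited robustness of speech recognition, and the learned output can drive the face mesh directly. Finally, two methods for evaluating a speech-driven face animation system are presented, one quantitative and one qualitative. Experimental results show that machine-learning-based speech-driven face animation effectively solves the problem of audio-video synchronization and makes the animation more realistic. Moreover, because the MPEG-4-based learning results are independent of the face model, they can drive many different face models, including real video, 2D cartoon characters, and 3D virtual faces.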

Keywords: machine learning; facial animation; speech driven
Article ID: 1000-9825/2003/14(02)0215
Received: 2001-06-04
Revised: 2001-08-01
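The abstract says unsupervised clustering finds the basic facial-motion patterns but does not name the algorithm. As a minimal sketch, assuming plain k-means (Lloyd's algorithm) as a stand-in and treating each video frame as a short vector of MPEG-4 face animation parameters (FAPs), the clustering step could look like this. The two-dimensional frame vectors and the function names `kmeans` and `farthest_first_init` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first frame, then repeatedly
    add the frame that is farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means over FAP frame vectors; returns (centroids, labels).
    k-means is an assumed stand-in: the abstract does not name the algorithm."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each frame goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its member frames.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical toy FAP frames: (mouth opening, lip stretch) per video frame.
frames = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),   # mouth nearly closed
          (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]     # mouth wide open
patterns, labels = kmeans(frames, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each centroid then plays the role of one basic facial-motion pattern, and each frame's cluster index becomes a training target for the speech-to-face mapping. Farthest-first seeding is used here only to keep the sketch deterministic; a random or k-means++ seeding would work as well.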

A Speech Driven Face Animation System Based on Machine Learning
CHEN Yi-Qiang,GAO Wen,WANG Zhao-Qi and JIANG Da-Long. A Speech Driven Face Animation System Based on Machine Learning[J]. Journal of Software, 2003, 14(2): 215-221
Authors: CHEN Yi-Qiang, GAO Wen, WANG Zhao-Qi, JIANG Da-Long
Abstract: Lip synchronization is the key issue in a speech-driven face animation system. In this paper, clustering and machine learning methods are combined to estimate face animation parameters from audio sequences, and the learning results are applied to an MPEG-4-based speech-driven face animation system. Based on a large recorded audio-visual database, an unsupervised clustering algorithm is proposed to obtain basic face animation parameter patterns that describe the characteristics of facial motion. An Artificial Neural Network (ANN) is trained to map the cepstral coefficients of an individual's natural speech directly to face animation parameter patterns. This avoids the potential limitations of speech recognition, and the output can drive the articulation of the synthetic face directly. Two evaluation approaches are also proposed: a quantitative one and a qualitative one. The performance of the system shows that the proposed learning algorithm is suitable and greatly improves the realism of face animation during speech. Moreover, the MPEG-4-based learning results are suitable for driving many different kinds of animation, ranging from video-realistic image warps to 3D cartoon characters.
Keywords: machine learning; facial animation; speech driven
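The abstract states that an ANN maps cepstral speech features directly to face animation parameter patterns, but gives no architecture. The following is a sketch under stated assumptions: a hypothetical `TinyMLP` with one tanh hidden layer and a softmax output, trained by plain SGD on cross-entropy, classifies a speech feature vector into a pattern index, and that index is looked up in a hypothetical table `PATTERN_FAPS` of per-pattern FAP vectors to drive the face. The two-dimensional "cepstral" features and the pattern table are toy values, not data from the paper.

```python
import math
import random


class TinyMLP:
    """One tanh hidden layer + softmax output, trained by plain SGD on
    cross-entropy. The architecture is an assumption, not the paper's."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)  # seeded so training is reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for numerical stability
        e = [math.exp(zk - m) for zk in z]
        total = sum(e)
        return h, [ek / total for ek in e]

    def predict(self, x):
        """Return the index of the most probable facial-motion pattern."""
        _, p = self._forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def train(self, xs, ys, epochs=200, lr=0.1):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, p = self._forward(x)
                # Gradient of cross-entropy w.r.t. the logits: p - one_hot(y).
                dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
                # Backpropagate through w2 and tanh (derivative 1 - h^2)
                # before w2 is modified.
                dh = [(1.0 - h[j] ** 2) * sum(dz[k] * self.w2[k][j] for k in range(len(dz)))
                      for j in range(len(h))]
                for k, g in enumerate(dz):
                    for j in range(len(h)):
                        self.w2[k][j] -= lr * g * h[j]
                    self.b2[k] -= lr * g
                for j, g in enumerate(dh):
                    for i in range(len(x)):
                        self.w1[j][i] -= lr * g * x[i]
                    self.b1[j] -= lr * g


# Hypothetical table of basic patterns: one FAP vector per pattern index.
PATTERN_FAPS = [[0.05, 0.05],   # pattern 0: mouth nearly closed
                [5.0, 5.0]]     # pattern 1: mouth wide open

# Toy 2-D "cepstral" features per audio frame and their pattern labels.
train_x = [[-1.0, -1.0], [-1.2, -0.8], [-0.8, -1.1],
           [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]
train_y = [0, 0, 0, 1, 1, 1]

net = TinyMLP(n_in=2, n_hidden=4, n_out=2)
net.train(train_x, train_y)
fap = PATTERN_FAPS[net.predict([0.95, 1.05])]  # FAP vector sent to the face model
```

Because the network outputs a pattern index rather than recognized phonemes, no speech-recognition step sits between the audio and the face, which is the motivation the abstract gives. A real system would feed many more cepstral (and, per the paper, prosodic) coefficients per frame and distinguish many more patterns.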
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.