Similar Documents
18 similar documents found.
1.
In Chinese, coarticulation depends mainly on the vowel at the end of the preceding syllable and the consonant at the start of the following syllable. This study examines, in Mandarin disyllabic words, how combining the vowel final of the first syllable with different initials of the second syllable affects the formant trajectories of the first syllable's vowel. The vowel finals used are the three corner vowels of the vowel triangle, and regularities in the trajectory variation are summarized.
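Formant trajectories of the kind examined here are commonly estimated frame by frame with LPC analysis. The sketch below is a minimal illustration of that general technique, not the study's own procedure; the sample rate, window length, and LPC order are assumptions.

```python
# Minimal LPC-based formant-trajectory sketch (illustrative; not the paper's code).
# Assumptions: 16 kHz mono signal, 25 ms Hamming windows, LPC order 12.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    """Autocorrelation-method LPC coefficients for one windowed frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))           # A(z) = 1 - sum(a_k z^-k)

def formants(frame, sr=16000, order=12):
    """First few formant frequencies (Hz) from the LPC polynomial roots."""
    roots = np.roots(lpc(frame * np.hamming(len(frame)), order))
    roots = roots[np.imag(roots) > 0]            # keep one of each conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[freqs > 90][:3]                 # F1..F3, dropping near-DC roots

def trajectory(signal, sr=16000, win=400, hop=160):
    """Formant trajectory over a vowel: one (F1, F2, ...) tuple per 10 ms hop."""
    return [formants(signal[i:i + win], sr)
            for i in range(0, len(signal) - win, hop)]
```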

2.
Based on the composition of Hanyu Pinyin and the lip-movement characteristics of its articulation, Chinese compound finals are classified into categories, and a set of dynamic viseme models suited to the articulatory characteristics of each category is proposed. DirectX 9.0 is then used to transform and render the graphics, making the 3D face's mouth-shape transitions richer and more natural. The result is more flexible and lively than traditional 2D facial animation and can be widely applied to character speech animation in 3D games, dubbing for virtual presenters, and similar uses.

3.
Research on Media Conversion from Text to Mouth Shapes   Cited: 1 (self-citations: 0, by others: 1)
杨丹宁, 郭峰. 《电子学报》, 1996, 24(1): 122-125
Based on the articulatory characteristics of Chinese, this paper proposes a method for generating the frontal mouth-shape images of arbitrary syllables from basic-phoneme mouth-shape parameters and a pre-built mouth model, thereby accomplishing text-to-mouth-shape media conversion. We first establish a motion model for deformable objects, then use it to construct a dynamic articulatory model of the lips, and finally apply texture mapping to synthesize frontal articulating mouth images from the basic-phoneme mouth-shape parameters and the dynamic articulation model.

4.
This paper briefly analyzes the mechanism of matching speech to mouth shapes. It first summarizes rules that speech-driven mouth movement obeys, and then, combining these with the articulatory characteristics of Chinese, examines the matching mechanism at two levels: geometric shape matching and temporal matching. At the geometric level, it describes how Chinese speech sounds correspond to mouth shapes and further analyzes how the mouth shapes of initials and finals are formed; at the temporal level, it briefly analyzes how initial and final mouth shapes are aligned in time.

5.
An effectiveness evaluation method based on work breakdown structure and dynamic AHP (Analytic Hierarchy Process) weights is proposed. A work-task breakdown model is built from the system's operating principles, signal flow, and task requirements, and an activity breakdown model is built from the temporal and logical order of the subtasks. Subtask effectiveness is evaluated with the ADC (Availability-Dependability-Capability) model, and subtask weights are computed with AHP, recalculated dynamically at every evaluation instant according to each subtask's execution time interval. The resulting effectiveness-versus-time curve captures the dynamic evolution of system effectiveness well. A simulation example of the evaluation is given.
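As a rough sketch of the AHP weighting step, subtask weights can be taken from the principal eigenvector of a pairwise comparison matrix and then re-normalized at each evaluation instant over the subtasks whose execution intervals contain that instant. The comparison matrix and intervals below are invented placeholders, not values from the paper.

```python
# Sketch of AHP weights with time-dependent renormalization (illustrative values).
import numpy as np

def ahp_weights(pairwise):
    """Principal-eigenvector weights from an AHP pairwise comparison matrix."""
    vals, vecs = np.linalg.eig(pairwise)
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / w.sum()                      # sum-normalize (also fixes the sign)

def dynamic_weights(base_w, intervals, t):
    """Zero the weight of subtasks inactive at time t, renormalize the rest."""
    active = np.array([t0 <= t <= t1 for (t0, t1) in intervals], dtype=float)
    w = base_w * active
    return w / w.sum() if w.sum() > 0 else w

# Hypothetical example: three subtasks, Saaty-style 1-9 judgments.
A = np.array([[1,   3,   5],
              [1/3, 1,   2],
              [1/5, 1/2, 1]], dtype=float)
w0 = ahp_weights(A)
print(dynamic_weights(w0, [(0, 10), (5, 20), (15, 30)], t=7.0))
```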

6.
徐向华, 朱杰, 郭强. 《信号处理》, 2004, 20(5): 497-500
Addressing the monosyllabic structure of Chinese speech and coarticulation across syllables, this paper proposes a hierarchical clustering method for triphone models. Compared with conventional decision-tree-based state clustering, the method clusters sparse triphone models, making fuller use of the training data and reducing the influence of sparse triphones on state clustering, which improves the robustness of the acoustic models. Experiments show that a large-vocabulary continuous speech recognizer using this hierarchical clustering not only greatly reduces the number of models and parameters but also raises recognition accuracy, cutting the error rate by 4.93% relative to a conventional decision-tree state-clustering system.
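The idea of handling sparse triphones by grouping them with broader classes can be illustrated with a toy back-off rule; this is not the paper's algorithm, and the count threshold and the diphone/monophone fallback order are assumptions.

```python
# Toy back-off for sparse triphones before clustering (illustrative only).
from collections import Counter

def backoff_label(triphone, counts, min_count=5):
    """Map a rare 'l-c+r' triphone to a broader class so its states share data."""
    left, rest = triphone.split("-")
    center, right = rest.split("+")
    if counts[triphone] >= min_count:
        return triphone                    # enough data: keep the full context
    if counts[f"{left}-{center}"] >= min_count:
        return f"{left}-{center}"          # fall back to left-context diphone
    return center                          # last resort: context-free monophone

counts = Counter({"b-a+n": 2, "b-a": 7, "a": 100})
print(backoff_label("b-a+n", counts))      # -> 'b-a'
```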

7.
李蓓蓓. 《电视技术》, 2011, 35(21): 135-137
Wind noise is adopted as the basic model for simulating the dynamic motion of sea waves. Wave height is controlled by two functions, one for each direction of motion in the sea plane; the height of a wave cross-section at a given instant is expressed as a wind-noise function, and a master function combines the two directional functions' influence on the height. A genetic algorithm with human participation is used to optimize the simulated result. The simulation builds time slices as animation frames, producing a satisfying dynamic 3D sea-wave effect.
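A height field of this general shape, noise along the two plane directions combined by a master function over time, might be sketched as follows; the value-noise generator and the blending coefficients are assumptions standing in for the paper's wind-noise model.

```python
# Sketch: wave height h(x, y, t) from two directional noise functions (illustrative).
import numpy as np

rng = np.random.default_rng(0)
TABLE = rng.uniform(-1.0, 1.0, 256)        # fixed random samples for value noise

def noise1d(u):
    """Smooth 1-D value noise: cosine interpolation of a random lattice."""
    i = np.floor(u).astype(int) % 256
    f = u - np.floor(u)
    s = (1 - np.cos(np.pi * f)) / 2        # smoothstep-like blend
    return TABLE[i] * (1 - s) + TABLE[(i + 1) % 256] * s

def height(x, y, t, amp=1.0, speed=0.8):
    """Master function: combine the two directional noises into a wave height."""
    hx = noise1d(0.15 * x + speed * t)     # wave component along x
    hy = noise1d(0.15 * y + 0.6 * speed * t)
    return amp * (0.7 * hx + 0.3 * hy)

# One animation frame (time slice) over a 64x64 grid:
xs, ys = np.meshgrid(np.arange(64.0), np.arange(64.0))
frame = height(xs, ys, t=0.5)
```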

8.
In 3D character facial animation, making the animation look real and natural requires that the character's mouth shapes (lip shapes) be synchronized with the words being spoken, that is, lip sync. At present very few software tools can match mouth shapes to speech; the better-known ones are Mimic, the lip-sync tool for Poser, and Voice-O-Matic, a lip-sync plug-in for 3ds Max, but these tools are complicated to configure and target English; their support for Chinese is poor or absent.

9.
赵晖, 唐朝京. 《电子与信息学报》, 2009, 31(12): 3010-3014
To synthesize realistic video sequences, this paper proposes a visual speech synthesis method based on Chinese video triphones. From the articulatory regularities of Chinese and the correspondence between phonemes and visemes, the concept of the video triphone is introduced. On this basis, hidden Markov model (HMM) training and synthesis models are built; training uses joint audio-video features augmented with dynamic features. During synthesis, video-triphone HMMs are concatenated into a sentence HMM, feature parameters are generated from it, and the visual speech is synthesized. Subjective and objective evaluations both indicate that the synthesized video is realistic and rated highly.
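The dynamic features mentioned are typically delta coefficients computed by the standard regression formula; below is a minimal sketch, with the window half-width k an assumption.

```python
# Sketch: first-order delta (dynamic) features over a feature matrix (T x D).
import numpy as np

def delta(features, k=2):
    """Regression deltas: d_t = sum_n n*(x_{t+n} - x_{t-n}) / (2*sum_n n^2)."""
    pad = np.pad(features, ((k, k), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, k + 1))
    return sum(n * (pad[k + n:len(pad) - k + n] - pad[k - n:len(pad) - k - n])
               for n in range(1, k + 1)) / denom

x = np.random.randn(100, 39)               # e.g., 100 frames of joint features
x_dyn = np.hstack([x, delta(x)])           # static + dynamic, as in HMM training
```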

10.
Mouth-shape animation is a basic element of character animation, and the common approach of keying on vowels and stressed syllables consumes a great deal of time and effort. Drawing on production experience, the author summarizes a technique of "one drawing held for three frames, two frames when speech is fast," which greatly improves working efficiency.
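As a hypothetical illustration of the "three frames per drawing, two when fast" rule, mouth keyframes could be placed like this:

```python
# Sketch: mouth keyframes 'on threes' (every 3 frames), 'on twos' when speech is fast.
def mouth_keyframes(n_frames, fast=False):
    """Return the frame numbers that get a new mouth drawing (illustrative rule)."""
    step = 2 if fast else 3
    return list(range(0, n_frames, step))

print(mouth_keyframes(24))                 # normal speech: 8 drawings per 24 frames
print(mouth_keyframes(24, fast=True))      # fast speech: 12 drawings
```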

11.
This paper describes a new and efficient method for facial expression generation on cloned synthetic head models. The system uses abstract facial muscles called action units (AUs) based on both anatomical muscles and the facial action coding system. The facial expression generation method has real-time performance, is less computationally expensive than physically based models, and has greater anatomical correspondence than rational free-form deformation or spline-based techniques. Automatic cloning of a real human head is done by adapting a generic facial and head mesh to Cyberware laser scanned data. The conformation of the generic head to the individual data and the fitting of texture onto it are based on a fully automatic feature extraction procedure. Individual facial animation parameters are also automatically estimated during the conformation process. The entire animation system is hierarchical; emotions and visemes (the visual mouth shapes that occur during speech) are defined in terms of the AUs, and higher-level gestures are defined in terms of AUs, emotions, and visemes as well as the temporal relationships between them. The main emphasis of the paper is on the abstract muscle model, along with limited discussion on the automatic cloning process and higher-level animation control aspects.
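A linear blend of AU displacement fields, in the spirit of the abstract-muscle model described above, can be sketched as follows; the mesh size, AU offsets, and weights are placeholders, not the paper's data.

```python
# Sketch: expression as a weighted sum of action-unit (AU) offsets (illustrative).
import numpy as np

def apply_aus(neutral, au_deltas, weights):
    """neutral: (V,3) vertices; au_deltas: (K,V,3) AU offsets; weights: (K,)."""
    return neutral + np.tensordot(weights, au_deltas, axes=1)

V, K = 500, 4                               # toy mesh: 500 vertices, 4 AUs
neutral = np.zeros((V, 3))
au_deltas = np.random.randn(K, V, 3) * 0.01
smile = apply_aus(neutral, au_deltas, np.array([0.8, 0.3, 0.0, 0.0]))
```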

12.
Speech-driven facial animation combines techniques from different disciplines such as image analysis, computer graphics, and speech analysis. Active shape models (ASM) used in image analysis are excellent tools for characterizing lip contour shapes and approximating their motion in image sequences. By controlling the coefficients for an ASM, such a model can also be used for animation. We design a mapping of the articulatory parameters used in phonetics into ASM coefficients that control nonrigid lip motion. The mapping is designed to minimize the approximation error when articulatory parameters measured on training lip contours are taken as input to synthesize the training lip movements. Since articulatory parameters can also be estimated from speech, the proposed technique can form an important component of a speech-driven facial animation system.
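In its simplest linear form, a mapping from articulatory parameters to ASM coefficients that minimizes approximation error over training contours is an ordinary least-squares fit. The sketch below uses invented dimensions and random stand-in data, not the authors' design.

```python
# Sketch: linear least-squares map from articulatory parameters to ASM coefficients.
import numpy as np

def fit_mapping(artic, asm_coeffs):
    """Solve min_W ||artic @ W - asm_coeffs||_F over the training set."""
    W, *_ = np.linalg.lstsq(artic, asm_coeffs, rcond=None)
    return W

# Hypothetical training data: 200 frames, 4 articulatory params, 8 ASM modes.
artic = np.random.randn(200, 4)
asm_coeffs = np.random.randn(200, 8)
W = fit_mapping(artic, asm_coeffs)
predicted = artic @ W                      # ASM coefficients driving lip motion
```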

13.
Lifelike talking faces for interactive services   Cited: 1 (self-citations: 0, by others: 1)
Lifelike talking faces for interactive services are an exciting new modality for man-machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating lifelike talking heads, illustrating the two main approaches: model-based animations and sample-based animations. The traditional model-based approach uses three-dimensional wire-frame models, which can be animated from high-level parameters such as muscle actions, lip postures, and facial expressions. The sample-based approach, on the other hand, concatenates segments of recorded videos, instead of trying to model the dynamics of the animations in detail. Recent advances in image analysis enable the creation of large databases of mouth and eye images, suited for sample-based animations. The sample-based approach tends to generate more natural-looking animations at the expense of a larger size and less flexibility than the model-based animations. Besides lip articulation, a talking head must show appropriate head movements, in order to appear natural. We illustrate how such "visual prosody" is analyzed and added to the animations. Finally, we present four applications where the use of face animation in interactive services results in engaging user interfaces and an increased level of trust between user and machine. Using an RTP-based protocol, face animation can be driven with only 800 bits/s in addition to the rate for transmitting audio.

14.
潘晋, 杨卫英. 《电声技术》, 2009, 33(5): 62-65
The key problems in speech-driven facial animation are fast, efficient automatic lip-shape synthesis driven by speech and good synchronization between speech and lip motion. A speech-driven facial animation method based on formant analysis is proposed. The speech signal is windowed into frames and transformed with the DFT; the first and second formants of each short-time spectrum are analyzed, the analysis results are mapped to a control sequence, and the sequence is post-processed, for example to remove outliers. Dynamic basic mouth shapes are defined on a 3D face model, and the control sequence is fed into the model on a timer to drive the animation. Experiments show the method is simple and fast, achieves effective synchronization of speech and lip shapes with smooth, natural animation, and can be widely used for dubbing all kinds of virtual characters, shortening their production cycle.
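The pipeline described (framing, DFT, formant analysis, mapping to a control sequence, outlier removal) might look roughly like the sketch below; the crude peak picking and the linear openness mapping are simplifications, not the paper's method.

```python
# Sketch: spectral peaks per frame -> mouth-openness control, median-filtered.
import numpy as np
from scipy.signal import medfilt

def frame_spectrum(frame, sr):
    win = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(win))
    freqs = np.fft.rfftfreq(len(win), 1.0 / sr)
    return freqs, mag

def mouth_control(signal, sr=16000, win=512, hop=256):
    """Crude F1 proxy (strongest peak in 200-1000 Hz) mapped to [0, 1] openness."""
    controls = []
    for i in range(0, len(signal) - win, hop):
        freqs, mag = frame_spectrum(signal[i:i + win], sr)
        band = (freqs >= 200) & (freqs <= 1000)
        f1 = freqs[band][np.argmax(mag[band])]
        controls.append((f1 - 200) / 800.0)          # higher F1 ~ wider opening
    return medfilt(np.array(controls), kernel_size=5)  # remove outlier spikes
```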

15.
We describe the components of the system used for real-time facial communication using a cloned head. We begin with describing the automatic face cloning using two orthogonal photographs of a person. The steps in this process are the face model matching and texture generation. After an introduction to the MPEG-4 parameters that we are using, we proceed with the explanation of the facial feature tracking using a video camera. The technique requires an initialization step and is further divided into mouth and eye tracking. These steps are explained in detail. We then explain the speech processing techniques used for real-time phoneme extraction and the subsequent speech animation module. We conclude with the results and comments on the integration of the modules towards a complete system.

16.
倪奎, 董兰芳. 《电子技术》, 2009, 36(12): 64-67
Facial animation is widely used in the game industry, teleconferencing, agents and avatars, and many other fields, and has attracted much research in recent years; animating organs such as the mouth and eyes has remained a major difficulty. This paper proposes a method that blends sample images of the mouth and eyes into a face image and generates facial animation from a single neutral face photograph. Splines are generated from feature points and interpolated in polar coordinates to realize the spatial mapping; backward mapping and interpolation are then used to resample the image and obtain the blended result. Experiments show that the blended images look fairly natural, that the method can animate the mouth, eyeballs, and other organs, and that it meets the real-time requirements of facial animation generation.
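The backward-mapping-and-resampling step can be sketched with scipy's map_coordinates; the sinusoidal warp below is a placeholder for the spline-derived spatial mapping.

```python
# Sketch: backward mapping + interpolation for image resampling (illustrative warp).
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(image, inverse_map):
    """For each output pixel, sample the source at inverse_map(row, col)."""
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing="ij")
    src_r, src_c = inverse_map(rows, cols)
    return map_coordinates(image, [src_r, src_c], order=1, mode="nearest")

# Placeholder inverse map: a small sinusoidal displacement of the source rows.
def wavy(rows, cols):
    return rows + 2.0 * np.sin(cols / 15.0), cols

out = backward_warp(np.random.rand(128, 128), wavy)
```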

17.
Realistic speech animation based on observed 3D face dynamics   Cited: 1 (self-citations: 0, by others: 1)
An efficient system for realistic speech animation is proposed. The system supports all steps of the animation pipeline, from the capture or design of 3D head models up to the synthesis and editing of the performance. This pipeline is fully 3D, which yields high flexibility in the use of the animated character. Real detailed 3D face dynamics, observed at video frame rate for thousands of points on the face of speaking actors, underpin the realism of the facial deformations. These are given a compact and intuitive representation via independent component analysis (ICA). Performances amount to trajectories through this 'viseme space'. When asked to animate a face, the system replicates the 'visemes' that it has learned, and adds the necessary coarticulation effects. Realism has been improved through comparisons with motion-captured ground truth. Faces for which no 3D dynamics have been observed can be animated nonetheless. Their visemes are adapted automatically to their physiognomy by localising the face in a 'face space'.
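The ICA 'viseme space' can be sketched with scikit-learn's FastICA: flatten each captured frame's 3D points into a vector and learn a small number of independent components. All dimensions below are invented.

```python
# Sketch: a low-dimensional 'viseme space' via ICA over 3D face-point frames.
import numpy as np
from sklearn.decomposition import FastICA

T, P = 500, 300                            # hypothetical: 500 frames, 300 points
frames = np.random.randn(T, P * 3)         # each row: one frame's xyz coordinates

ica = FastICA(n_components=10, random_state=0)
trajectory = ica.fit_transform(frames)     # (T, 10): path through viseme space
reconstructed = ica.inverse_transform(trajectory)  # back to point coordinates
```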

18.
张承云, 蔡阳生. 《电声技术》, 2007, 31(11): 20-23
Subjective evaluation experiments are used to analyze the influence of frequency-response range on speech intelligibility. Comparing the mean intelligibility scores measured for different frequency ranges shows that signal content above 4 kHz still has a considerable effect on intelligibility, with the 4-6 kHz band mattering most. As for the lower limit, a cutoff no lower than 300 Hz appears to have little effect on intelligibility. The results are analyzed in light of the spectral characteristics of Chinese initials and finals.
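A band-limited listening condition like those compared here can be prepared with a zero-phase Butterworth band-pass; the filter order and cutoffs below are assumptions for illustration.

```python
# Sketch: band-limiting test stimuli to study intelligibility vs. bandwidth.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_limit(signal, sr, lo=300.0, hi=6000.0, order=6):
    """Zero-phase Butterworth band-pass, e.g. a 300 Hz - 6 kHz test condition."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)

sr = 44100
speech = np.random.randn(sr * 2)             # placeholder for a recorded sentence
narrow = band_limit(speech, sr, 300, 4000)   # cuts the influential 4-6 kHz band
wide = band_limit(speech, sr, 300, 6000)     # keeps it
```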
