Critical Review of Human Face Reenactment Methods
Cite this article: Liu Jin, Chen Peng, Wang Xi, Fu Xiaomeng, Dai Jiao, Han Jizhong. Critical review of human face reenactment methods[J]. Journal of Image and Graphics, 2022, 27(9): 2629-2651
Authors: Liu Jin  Chen Peng  Wang Xi  Fu Xiaomeng  Dai Jiao  Han Jizhong
Affiliation: Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
Foundation items: Science and Technology Innovation 2030 - "New Generation Artificial Intelligence" Major Project (2020AAA0140000); National Natural Science Foundation of China (61702502)
Abstract: With the development of image generation research in computer vision, face reenactment has attracted wide attention. This technique aims to synthesize new talking-head images or videos that preserve the identity of a source face image while taking the mouth shape, expression, pose, and other cues from driving information. Face reenactment has a broad range of applications, such as virtual anchor generation, online teaching, game avatar customization, lip synchronization for dubbed videos, and video-conference compression. Although the technique has a short history, a large body of research has emerged. However, few surveys at home or abroad focus specifically on face reenactment; it is usually covered only as one form of DeepFake content in surveys on DeepFake detection. In view of this, this paper reviews and summarizes the development of the field. Starting from face reenactment models, we describe the open problems, the classification of models, and the representation of driving facial features; we list and introduce the datasets commonly used to train face reenactment models and the metrics used to evaluate them; we organize, analyze, and compare recent work; and we conclude with the evolution trends, current challenges, future directions, potential harms, and countermeasures.

Keywords: artificial intelligence (AI)  computer vision  deep learning  generative adversarial network (GAN)  DeepFake  face reenactment
Received: 2022-01-20
Revised: 2022-05-25

Critical review of human face reenactment methods
Abstract: AI-generated image and video content has been increasing dramatically, and face reenactment has developed as one branch of facial image and video generation. Given source face information and driving motion information, face reenactment aims to generate a reenacted face image or video that follows the expression, mouth shape, eye gaze, and pose of the driving motion while preserving the identity of the source face. Face reenactment methods can generate a wide variety of face videos under few constraints, and the task has become a research focus in the field of face generation. However, almost no review has been written specifically on face reenactment. In view of this, we carry out a critical review of the development of face reenactment beyond DeepFake detection contexts. Our review covers nine perspectives: 1) the universal pipeline of face reenactment models; 2) facial information representation; 3) key challenges and barriers; 4) the classification of related methods; 5) introduction of the various face reenactment methods; 6) evaluation metrics; 7) commonly used datasets; 8) practical applications; and 9) conclusions and future prospects.

Identity and background information are extracted from the source face, while motion features are extracted from the driving information; the two are combined to generate the reenacted face. In general, latent codes, 3D morphable face model (3DMM) coefficients, facial landmarks, and facial action units all serve as motion features. Several challenges recur in related research. The identity mismatch problem refers to a model's inability to preserve the identity of the source face. Temporal or background inconsistency means that generated face videos suffer from cross-frame jitter or from obvious artifacts between the facial contour and the background. Identity constraints originate from model design and training: some models can reenact only the specific persons seen in their training data.

Regarding the categorization of face reenactment methods, image-driven and cross-modality-driven methods are distinguished according to the modality of the driving information. By the representation of driving information, image-driven methods can be divided into four categories: facial landmark based, 3DMM based, motion field prediction based, and feature decoupling based. The landmark-based and 3DMM-based methods can be further subdivided by identity restriction, i.e., whether the model can reenact subjects unseen during training. For each category, we illustrate the representative methods, the corresponding model flowcharts, and follow-up improvements in detail.

Among cross-modality-driven methods, text-driven and audio-driven methods are introduced. These are ill-posed problems because the facial motion corresponding to a piece of audio or text may have multiple valid solutions; for instance, different facial poses or motions of the same identity can produce essentially the same audio. Despite this difficulty, cross-modality face reenactment has attracted increasing attention, and we introduce it comprehensively as well. Text-driven methods have developed through three progressive stages in terms of driving content: methods requiring extra audio, restricted text-driven methods, and arbitrary text-driven methods.
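To make the general pipeline described above concrete, the sketch below (PyTorch) is a minimal illustration, not the implementation of any specific method in the survey; all module names, layer sizes, and the choice of 68 facial landmarks as the driving representation are assumptions. It shows how an identity encoder for the source face and a motion encoder for driving landmarks feed a generator that synthesizes the reenacted face:

```python
# Minimal, hypothetical sketch of the generic face reenactment pipeline:
# identity features from the source image + motion features from the
# driving signal (here, 2D facial landmarks) -> reenacted face.
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    """Extracts identity/appearance features from the source face image."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, src_img):
        return self.net(src_img)

class MotionEncoder(nn.Module):
    """Encodes the driving motion, e.g. 68 (x, y) facial landmarks."""
    def __init__(self, n_landmarks=68, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_landmarks * 2, 256), nn.ReLU(),
            nn.Linear(256, dim),
        )
    def forward(self, landmarks):
        return self.net(landmarks.flatten(1))

class Generator(nn.Module):
    """Decodes the concatenated identity + motion codes into a face image."""
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(dim * 2, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, id_code, motion_code):
        x = self.fc(torch.cat([id_code, motion_code], dim=1))
        return self.net(x.view(-1, 128, 8, 8))

# Identity comes from the source; expression/pose come from the driver.
src = torch.randn(1, 3, 64, 64)        # source face image (toy size)
drv_landmarks = torch.randn(1, 68, 2)  # driving facial landmarks
id_enc, mo_enc, gen = IdentityEncoder(), MotionEncoder(), Generator()
reenacted = gen(id_enc(src), mo_enc(drv_landmarks))
print(reenacted.shape)  # torch.Size([1, 3, 64, 64])
```

In real systems the encoders are far deeper, the motion representation may instead be 3DMM coefficients, action units, a predicted motion field, or disentangled latent codes, and training typically combines adversarial, reconstruction, and identity-preservation losses.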
Audio-driven methods can be further divided into two categories depending on whether additional driving information is required; such additional information, e.g., eye-blinking labels or head-pose videos, offers auxiliary cues during generation.

Moreover, comparative experiments are conducted to evaluate the performance of the various methods, considering both image quality and facial motion accuracy. Peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), cumulative probability of blur detection (CPBD), Fréchet inception distance (FID), and other traditional image generation metrics are adopted. To judge facial motion accuracy, landmark difference, action unit detection analysis, and pose difference are utilized; in most cases the landmarks, the presence of action units, and the Euler angles are all predicted by corresponding pre-trained models. For audio-driven methods, the degree of lip synchronization is also estimated with the aid of a pretrained evaluation model. Apart from these objective evaluations, subjective metrics such as user studies are applied as well.

Furthermore, the datasets commonly used in face reenactment are illustrated. Each contains face images or videos with various expressions, view angles, and illumination conditions, or the corresponding speech audio. The videos are usually collected from interviews, news broadcasts, or actor recordings. To reflect different levels of difficulty, the datasets cover indoor and outdoor scenarios: indoor scenarios typically feature white or grey walls, while outdoor scenarios feature actual moving scenes or news studios.

In the conclusion, the practical applications and potential threats are critically illustrated. Face reenactment can contribute to the entertainment industry, e.g., movie dubbing, video production, game character avatars, and old photo colorization; it can also be utilized in video-conference compression, online customer service, virtual uploaders, and 3D digital humans. However, we warn that face reenactment misused by lawbreakers for slander, spreading false information, or creating harmful DeepFake media content can damage social stability and cause panic on social media. It is therefore important to consider the ethical issues of face reenactment.

Finally, the development status of each category and the corresponding future directions are presented. Overall, model optimization and generation robustness are the two main concerns. Optimization focuses on alleviating data dependence, feature disentanglement, real-time inference, and improved evaluation metrics. Robustness refers to generating high-quality reenacted faces under conditions such as face occlusion, outdoor scenes, large head poses, or complicated illumination. In a word, our critical review covers the universal pipeline of face reenactment models, the main challenges, the classification and detailed explanation of each category of methods, the evaluation metrics and commonly used datasets, and an analysis of current research and its prospects, facilitating an introduction to and guidance for face reenactment research.
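As a minimal illustration of the objective metrics named above, the following sketch (NumPy; the toy data and the landmark_distance helper are illustrative assumptions, not part of the survey) computes PSNR between a generated frame and its ground truth, plus a mean landmark distance as a simple proxy for motion-transfer accuracy. In practice SSIM, CPBD, and FID come from dedicated libraries and pretrained networks (e.g., skimage.metrics.structural_similarity for SSIM, an Inception network for FID), and the landmarks come from a pretrained detector:

```python
# Hedged sketch of two evaluation measures for face reenactment.
import numpy as np

def psnr(gt: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to ground truth."""
    mse = np.mean((gt.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def landmark_distance(gt_lms: np.ndarray, gen_lms: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding facial landmarks,
    a common proxy for facial motion accuracy (hypothetical helper;
    landmarks are predicted by a pretrained detector in practice)."""
    return float(np.mean(np.linalg.norm(gt_lms - gen_lms, axis=-1)))

# Toy usage with random data standing in for real frames and landmarks.
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
noise = np.random.randint(-10, 10, gt.shape)
gen = np.clip(gt.astype(int) + noise, 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(gt, gen):.2f} dB")
print(f"Landmark distance: "
      f"{landmark_distance(np.random.rand(68, 2), np.random.rand(68, 2)):.4f}")
```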