Similar Literature
17 similar documents found
1.
To let a computer generate animated characters with lifelike expressions quickly and conveniently, a facial animation generation method based on deep learning and expression AU parameters is proposed. The method defines 24 facial action unit parameters (expression AU parameters) for describing facial expressions, and uses a convolutional neural network and the FEAFA dataset to build and train the corresponding parameter-regression network model. To generate facial animation from video, frames are first captured from a monocular camera and faces are detected with a supervised descent method; expression AU parameter values are then accurately regressed from the resulting facial expression images and treated as 3D facial expression blendshape coefficients. Combined with the virtual character's 24 corresponding basic 3D expression shapes and its neutral expression shape, the virtual character is driven by an expression blendshape model to produce facial animation in natural environments. The method eliminates the 3D reconstruction step of traditional approaches and accounts for the mutual influence among action unit parameters, making the expressions of the generated animation more natural and nuanced. Moreover, expression coefficients regressed from face images are more accurate than those regressed from feature points.
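A minimal sketch of the blendshape step this abstract describes — a neutral mesh plus AU-weighted offsets — assuming 24 AU coefficients in [0, 1]; all array names are illustrative:

```python
import numpy as np

def blend_expression(neutral, au_shapes, au_weights):
    """Drive a virtual character by blending AU expression shapes.

    neutral    : (V, 3) array, vertices of the neutral face mesh
    au_shapes  : (24, V, 3) array, one basic 3D expression shape per AU
    au_weights : (24,) array, regressed expression AU parameters in [0, 1]
    """
    # Linear blendshape model: start from the neutral face and add
    # each AU's displacement scaled by its regressed coefficient.
    deltas = au_shapes - neutral[None, :, :]          # per-AU offsets
    return neutral + np.tensordot(au_weights, deltas, axes=1)

# Toy usage with a 100-vertex mesh and random AU coefficients.
neutral = np.zeros((100, 3))
au_shapes = np.random.randn(24, 100, 3) * 0.01
weights = np.clip(np.random.rand(24), 0.0, 1.0)
animated = blend_expression(neutral, au_shapes, weights)  # (100, 3)
```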

2.
Using vectors that carry facial expression information as input conditions to guide the generation of highly realistic face images is an important research topic, but the commonly used eight expression-category labels are rather coarse. To better capture the rich micro-expression information across the face, taking the individual facial muscle groups as action units (AUs), a facial expression generative adversarial network based on the Facial Action Coding System (FACS) is proposed. An attention mechanism is fused into the encoder-decoder generation module so that the network focuses more on local regions and makes targeted generation changes, and an objective function is used that is based on the discriminator module's reconstruction error, classification error, and an attention smoothness loss. Experimental results on the widely used BP4D face dataset show that the method attends more effectively to the region corresponding to each action unit and can control expression generation with a single AU label, with continuous AU label values controlling expression intensity. Compared with other methods, the generated expression images preserve details more clearly and are more realistic.
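A hedged PyTorch sketch of the composite generator objective the abstract lists — reconstruction error, AU classification error, and attention smoothness; the loss weights and the total-variation form of the smoothness term are assumptions, not values from the paper:

```python
import torch
import torch.nn.functional as F

def generator_loss(x, x_rec, au_pred, au_target, attn,
                   lambda_rec=10.0, lambda_cls=1.0, lambda_tv=1e-4):
    """Composite objective: reconstruction + AU classification + smoothness.

    x, x_rec  : (N, 3, H, W) input and cycle-reconstructed images
    au_pred   : (N, num_aus) AU logits from the discriminator module
    au_target : (N, num_aus) desired AU activations
    attn      : (N, 1, H, W) attention mask from the generator
    (lambda_* weights are illustrative placeholders)
    """
    rec = F.l1_loss(x_rec, x)                                     # reconstruction
    cls = F.binary_cross_entropy_with_logits(au_pred, au_target)  # AU labels
    # Total-variation penalty keeps the attention map spatially smooth.
    tv = (attn[:, :, 1:, :] - attn[:, :, :-1, :]).abs().mean() + \
         (attn[:, :, :, 1:] - attn[:, :, :, :-1]).abs().mean()
    return lambda_rec * rec + lambda_cls * cls + lambda_tv * tv
```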

3.
To address the problem that expression intensity varies across a video sequence and long short-term memory (LSTM) networks struggle to extract its features effectively, a video expression recognition method based on facial action units and temporal attention is proposed. First, a temporal attention module is introduced on top of convolutional LSTM (ConvLSTM) to model the video sequence temporally, preserving rich facial image feature information while reducing dimensionality. Second, facial image partition rules based on facial action units are proposed to resolve the difficulty of delimiting the active regions of facial expressions. Finally, a label-correction module is embedded in the model to handle sample uncertainty in datasets collected under natural conditions. Experimental results on the MMI, Oulu-CASIA, and AFEW datasets show that the proposed model has fewer parameters than published mainstream models, achieves an average recognition accuracy of 87.22% on MMI — higher than current mainstream methods — and outperforms representative methods overall.
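A minimal sketch of the temporal-attention idea on top of ConvLSTM features: score each frame, softmax over time, and pool, so informative high-intensity frames dominate the video-level representation. The scoring head and shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Pool a (N, T, C, H, W) ConvLSTM output into (N, C, H, W)
    by softmax-weighting the frames along the time axis."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)  # one scalar score per frame

    def forward(self, feats):                      # feats: (N, T, C, H, W)
        pooled = feats.mean(dim=(3, 4))            # (N, T, C) global average
        alpha = torch.softmax(self.score(pooled), dim=1)   # (N, T, 1)
        alpha = alpha.unsqueeze(-1).unsqueeze(-1)  # (N, T, 1, 1, 1)
        return (alpha * feats).sum(dim=1)          # (N, C, H, W)

feats = torch.randn(2, 16, 64, 7, 7)       # toy 16-frame sequence
video_repr = TemporalAttention(64)(feats)  # (2, 64, 7, 7)
```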

4.
李冠彬, 张锐斐, 朱鑫, 林倞. 《软件学报》 (Journal of Software), 2023, 34(6): 2922-2941
Facial action unit analysis aims to identify the state of every facial action unit in a face image and can be applied in scenarios such as lie detection, autonomous driving, and intelligent healthcare. In recent years, with the spread of deep learning in computer vision, facial action unit analysis has become a research hotspot. It can be divided into two distinct tasks, facial action unit detection and facial action unit intensity estimation, yet existing mainstream algorithms typically address only one of them. More importantly, these methods usually focus on designing ever more complex feature-extraction models while ignoring the semantic correlations among facial action units. Strong mutual relationships often exist among action units, and effectively exploiting this semantic knowledge for learning and inference is key to the task. Therefore, by analyzing the co-occurrence and mutual exclusion of facial action units across different facial behaviors, a knowledge graph of AU relationships is constructed, and on this basis a semantic relationship embedded representation learning (SRERL) algorithm is proposed. On the public facial action unit detection datasets (BP4D, DISFA) and intensity estimation datasets (FERA2015, DISFA), SRERL outperforms the existing state-of-the-art algorithms. Furthermore, it also achieves state-of-the-art performance in generalization tests on the BP4D+ dataset and occlusion tests on the BP4D dataset.
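A hedged sketch of mining the co-occurrence and mutual-exclusion relations the abstract mentions into a signed AU graph from binary labels; the pointwise-mutual-information criterion and thresholds are illustrative assumptions, not the SRERL construction itself:

```python
import numpy as np

def au_relation_graph(labels, pos_th=0.5, neg_th=-0.5):
    """labels: (N, K) binary AU occurrence matrix over N face images.
    Returns a signed (K, K) adjacency: +1 co-occurring, -1 mutually
    exclusive, 0 unrelated, judged by pointwise mutual information."""
    eps = 1e-8
    p = labels.mean(axis=0)                        # P(AU_i)
    joint = (labels.T @ labels) / len(labels)      # P(AU_i, AU_j)
    pmi = np.log((joint + eps) / (np.outer(p, p) + eps))
    adj = np.zeros_like(pmi)
    adj[pmi > pos_th] = 1.0      # strong co-occurrence edge
    adj[pmi < neg_th] = -1.0     # mutual-exclusion edge
    np.fill_diagonal(adj, 0.0)
    return adj

labels = (np.random.rand(1000, 12) > 0.7).astype(float)  # toy AU labels
print(au_relation_graph(labels))
```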

5.
Facial action unit detection aims to have computers automatically detect the action unit targets of interest in a given face image or video. After more than two decades of research — and especially with the establishment of more and more facial action unit databases and the rise of deep learning in recent years — facial action unit detection has developed rapidly. This survey first explains the basic concepts of facial action units, introduces the commonly used facial action unit detection databases, and summarizes traditional detection pipelines, including preprocessing, feature extraction, and classifier learning. It then systematically reviews and analyzes several key research directions, including region learning, action unit relation learning, and weakly supervised learning. Finally, it discusses the shortcomings of current research and potential future directions.

6.
This paper proposes a case-based interactive genetic algorithm for facial action unit recognition, incorporating the user's comparison ability into the search process to quickly retrieve case images that match the image to be recognized, thereby achieving semi-automatic action unit recognition. The method requires no image feature extraction and can therefore recognize action units in spontaneous facial images or image sequences captured under uncontrolled imaging conditions, offering good robustness and practicality. In experiments with 16 simple images collected under controlled imaging conditions, the average recognition rate for individual AUs reached 77.5% and the average similarity for AU combinations was 82.8%. With 10 complex images collected under uncontrolled imaging conditions with interference, the average recognition rate for individual AUs was 82.8% and the average similarity for AU combinations was 93.1%. Compared with the eigenface algorithm, both the average recognition rate and the similarity improve considerably.
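A hedged sketch of the interactive GA loop: chromosomes encode AU combinations as bit strings, and a hidden target combination stands in for the user's pairwise comparisons that supply the fitness signal in the real system:

```python
import random

def interactive_ga(n_aus=16, pop_size=12, generations=30, mut_rate=0.05):
    """Sketch of case retrieval by interactive GA. In the real system
    fitness comes from the user comparing retrieved case images with
    the query face; here a hidden target AU combination replaces it."""
    target = [random.randint(0, 1) for _ in range(n_aus)]
    fitness = lambda g: sum(a == b for a, b in zip(g, target))

    pop = [[random.randint(0, 1) for _ in range(n_aus)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # user-ranked survivors
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_aus)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mut_rate else g
                     for g in child]               # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(interactive_ga())   # converges toward the hidden AU combination
```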

7.
The Facial Action Coding System defines the concept of facial action units (AUs) for facial expression information, but AU intensity estimation performs poorly because the distinctions between intensity levels are small and facial expressions vary greatly across individuals. To address this, exploiting the strong correlation between AU activation and facial regions, a new feature-extraction algorithm based on region and feature fusion is proposed, together with an AU intensity computation method: after a binary classification into high and low AU intensity, the final AU intensity is determined by ordinal regression. The algorithm exploits the strong separability of strong and weak AUs, considers the correlations among different AU intensities, and leverages the respective strengths of classification and regression for AU intensity estimation. Experimental results on the DISFA and FERA2015 datasets show that the algorithm is robust and that its AU intensity estimates outperform methods such as CNN and VGG16.
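A hedged sketch of the two-stage scheme the abstract outlines — a binary strong/weak classifier followed by ordinal regression built from cumulative binary classifiers; the combination rule that clamps the ordinal estimate to the binary stage's range is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class TwoStageAUIntensity:
    """Binary high/low AU classifier, then ordinal regression realized
    as cumulative 'intensity > k' classifiers on a 0-5 scale."""
    def __init__(self, levels=6):
        self.levels = levels
        self.binary = LogisticRegression(max_iter=1000)
        self.cums = [LogisticRegression(max_iter=1000)
                     for _ in range(levels - 1)]

    def fit(self, X, y):                           # y: intensities 0..5
        self.binary.fit(X, (y >= 3).astype(int))   # strong vs. weak AU
        for k, clf in enumerate(self.cums):
            clf.fit(X, (y > k).astype(int))        # cumulative targets
        return self

    def predict(self, X):
        strong = self.binary.predict(X).astype(bool)
        votes = np.stack([clf.predict(X) for clf in self.cums], axis=1)
        level = votes.sum(axis=1)                  # ordinal estimate
        # Clamp into the range chosen by the binary stage (an
        # illustrative combination rule, not the paper's exact one).
        return np.where(strong, np.clip(level, 3, 5), np.clip(level, 0, 2))

X = np.random.randn(300, 16)
y = np.random.randint(0, 6, 300)
print(TwoStageAUIntensity().fit(X, y).predict(X[:5]))
```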

8.
As facial expression recognition migrates from controlled laboratory settings to challenging real-world environments, and with the rapid development of deep learning, deep neural networks can learn discriminative features and are increasingly applied to automatic facial expression recognition. Current deep facial expression recognition systems aim to resolve two problems: 1) overfitting caused by insufficient training data, and 2) interference from expression-irrelevant factors in real-world settings (e.g., illumination, head pose, and identity). This survey first summarizes the past decade of deep facial expression recognition research and the development of related facial expression databases. It then divides current deep-learning-based methods into two categories — static and dynamic facial expression recognition — and reviews each in turn. For state-of-the-art deep expression recognition algorithms, it compares performance on common expression databases and analyzes the strengths and weaknesses of each class of algorithms in detail. Finally, it summarizes future directions, opportunities, and challenges: since expressions are in essence dynamic activities of facial muscle movement, deep expression recognition networks based on dynamic sequences often achieve better recognition than static ones; moreover, combining other expression models such as facial action unit models, as well as other multimedia modalities such as audio and human physiological signals, can extend expression recognition to scenarios of greater practical value.

9.
Facial expression analysis is a technique by which computers attempt to understand human emotion by analyzing facial information; it has become a hot topic in computer vision. Its challenges include difficult data annotation, poor inter-annotator label consistency, and large head poses and occlusion in natural environments. To advance the field, this survey covers the related tasks, progress, challenges, and future trends of facial expression analysis. First, it briefly describes several common tasks, basic algorithmic frameworks, and databases. Second, it reviews facial expression recognition methods, including traditional hand-crafted feature methods and deep learning methods. It then summarizes the open problems and challenges, and finally discusses future trends. The survey draws the following conclusions: 1) given the small scale of reliably labeled facial expression databases, transfer learning from face recognition models and semi-supervised learning with unlabeled data are two important strategies; 2) affected by ambiguous expressions, low-quality images, and annotator subjectivity, the labels of in-the-wild facial expression data carry some uncertainty, and suppressing these factors lets deep networks learn genuine expression features; 3) for occlusion and large poses, fusing local patches is an effective strategy, and another strategy worth considering is to first learn an occlusion- and pose-robust model on a large-scale face recognition database and then transfer it to facial expression recognition; 4) since deep-learning-based expression recognition methods are sensitive to many hyperparameters, current methods are not readily comparable, and different methods should be evaluated against different simple baselines. Although expression analysis in uncontrolled natural environments has developed rapidly, the above problems and challenges remain open. Facial expression analysis is a practical task; beyond accuracy, future work should also consider runtime and storage consumption, and could infer expression categories from high-accuracy facial action unit detection results obtained in uncontrolled environments.

10.
Visual understanding — object detection, semantic and instance segmentation, action recognition, and so on — is widely applied and plays a crucial role in fields such as human-computer interaction and autonomous driving. In recent years, fully supervised deep visual understanding networks have achieved remarkable performance gains. However, data annotation for tasks such as object detection, semantic and instance segmentation, and video action recognition consumes substantial human effort and time, which has become a key factor limiting their wide application. Weakly supervised learning, an effective way to reduce annotation cost, offers a feasible remedy and has therefore attracted considerable attention. Centered on weakly supervised visual learning, this survey reviews domestic and international research progress on object detection, semantic and instance segmentation, and action recognition, and discusses development directions and application prospects. After briefly reviewing general weakly supervised learning models such as multiple instance learning (MIL) and the expectation-maximization (EM) algorithm, it summarizes object detection and localization from the perspectives of multiple instance learning and class attention map mechanisms, with emphasis on self-training and supervision-form conversion methods. For semantic segmentation, it analyzes progress under weak supervision of different granularities — bounding-box annotations, image-level category annotations, scribble or point annotations — and mainly reviews weakly supervised instance segmentation based on image-level category and bounding-box annotations. For video action recognition, it reviews models and algorithms under weak supervision forms such as movie scripts, action sequences, video-level category labels, and single-frame labels, and discusses the practical feasibility of these forms. On this basis, it further discusses the challenges and trends facing weakly supervised visual learning, aiming to provide a reference for related research.

11.
Surface defect detection plays a crucial role in the production process to ensure product quality. With the development of Industry 4.0 and smart manufacturing, traditional manual defect detection is no longer satisfactory, and deep-learning-based technologies are gradually being applied to surface defect detection tasks. However, the application of deep-learning-based defect detection methods on actual production lines is often constrained by insufficient data, expensive annotations, and limited computing resources; detection methods are expected to require fewer annotations and smaller computational consumption. In this paper, we propose the Self-Supervised Efficient Defect Detector (SEDD), a high-efficiency defect detector based on a self-supervised learning strategy and image segmentation. The self-supervised learning strategy with homographic enhancement ensures that annotated defective samples are no longer needed in our pipeline, while competitive performance can still be achieved. Based on this strategy, a new surface defect simulation dataset generation method is proposed to solve the problem of insufficient training data. Also, a lightweight structure with an attention module is designed to reduce the computation cost without sacrificing accuracy. Furthermore, a multi-task auxiliary strategy is employed to reduce segmentation errors at edges. The proposed model has been evaluated on three typical datasets and achieves competitive performance compared with the other tested methods, with 98.40% AUC and 74.84% AP on average. Experimental results show that our network has the smallest computational consumption and the highest running speed among the networks tested.
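A hedged sketch of the homographic enhancement used for self-supervision, assuming it amounts to warping a sample with a random homography to create training views without defect annotations; parameter names and the perturbation scheme are illustrative:

```python
import cv2
import numpy as np

def homographic_pair(img, max_shift=0.1):
    """Warp an unannotated image with a random homography so the
    network can be trained on (original, warped) views without
    defect labels."""
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Randomly perturb each corner by up to max_shift of the image size.
    jitter = (np.random.rand(4, 2) - 0.5) * 2 * max_shift * np.float32([w, h])
    dst = src + jitter.astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(img, H, (w, h))
    return warped, H

img = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)  # stand-in sample
warped, H = homographic_pair(img)
```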

12.
A system that could automatically analyze facial actions in real time has applications in a wide range of fields. However, developing such a system is always challenging due to the richness, ambiguity, and dynamic nature of facial actions. Although a number of research groups attempt to recognize facial action units (AUs) by improving either facial feature extraction or AU classification techniques, these methods often recognize AUs or certain AU combinations individually and statically, ignoring the semantic relationships among AUs and their dynamics. Hence, these approaches cannot always recognize AUs reliably, robustly, and consistently. In this paper, we propose a novel approach that systematically accounts for the relationships among AUs and their temporal evolution for AU recognition. Specifically, we use a dynamic Bayesian network (DBN) to model the relationships among different AUs. The DBN provides a coherent and unified hierarchical probabilistic framework to represent probabilistic relationships among various AUs and to account for the temporal changes in facial action development. Within our system, robust computer vision techniques are used to obtain AU measurements, which are then applied as evidence to the DBN for inferring the various AUs. The experiments show that integrating AU relationships and AU dynamics with AU measurements yields significant improvement in AU recognition, especially for spontaneous facial expressions and under more realistic environments, including illumination variation, face pose variation, and occlusion.
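A toy illustration of the DBN idea on two related AUs: a recursive Bayesian filter that combines a temporal transition model, an inter-AU co-occurrence prior, and noisy per-frame measurements. All probabilities are invented for illustration and greatly simplify the paper's network:

```python
import numpy as np

# Track the joint state of two related AUs (e.g., AU6 and AU12 in a
# smile) over time, fusing transition, co-occurrence, and measurement.
states = [(a6, a12) for a6 in (0, 1) for a12 in (0, 1)]   # joint AU states

def transition(prev, cur, stay=0.8):
    """Each AU tends to keep its previous state between frames."""
    p = 1.0
    for a, b in zip(prev, cur):
        p *= stay if a == b else 1.0 - stay
    return p

def cooccurrence(cur):
    """Soft prior: the two AUs tend to fire together."""
    return 2.0 if cur[0] == cur[1] else 1.0

def likelihood(meas, cur, acc=0.7):
    """Noisy per-frame detector output, correct with probability acc."""
    p = 1.0
    for m, a in zip(meas, cur):
        p *= acc if m == a else 1.0 - acc
    return p

belief = np.full(len(states), 1.0 / len(states))   # uniform prior
for meas in [(0, 0), (1, 0), (1, 1), (1, 1)]:      # toy measurement stream
    new = np.zeros_like(belief)
    for j, cur in enumerate(states):
        pred = sum(belief[i] * transition(prev, cur)
                   for i, prev in enumerate(states))
        new[j] = pred * cooccurrence(cur) * likelihood(meas, cur)
    belief = new / new.sum()                        # posterior over states
    print(dict(zip(states, belief.round(3))))
```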

13.
Chen Jing, Wang Chenhui, Wang Kejun, Liu Meichen. Applied Intelligence, 2022, 52(6): 6354-6375

Facial action unit (AU) detection has been applied in a wide range of fields and has attracted great attention over the past decades. Most existing methods employ predefined regions of interest with the same number and extent for all samples. However, we find that the flexibility of predefined regions of interest is limited, since different AUs do not necessarily occur simultaneously and their extents change as intensity changes. In addition, many AU detection works design feature-extraction modules and classifiers independently for each AU, which incurs high computation cost and ignores the dependencies among AUs. In view of the limited flexibility of predefined regions of interest, we propose difference saliency maps that do not depend on facial landmarks. They are spatial pixel-wise attention maps, where each element represents the importance of the corresponding pixel in the entire image, so all regions of interest can be irregular. To address the high computation cost, we combine group convolution with skip connections to propose a lightweight network better suited to AU detection: all AUs share features and there is only one classifier, so the computation cost and the number of parameters are greatly reduced. In particular, the difference saliency maps and the global feature maps are combined to obtain regionally enhanced features. To maximize the enhancement effect, the down-sampled difference saliency maps are added to multiple blocks of the lightweight network, and the enhanced global features are sent directly to the classifier for AU detection. By changing the number of neurons in the classifier, our framework easily adapts to different datasets. Extensive experimental results show that the proposed framework clearly outperforms classic deep learning methods when evaluated on the DISFA+ and CK+ datasets. With the difference saliency maps added, the detection results surpass state-of-the-art AU detection methods. Further experiments demonstrate that our network is more efficient in parameter count, computational complexity, and inference time.
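A hedged sketch of the lightweight design: a grouped-convolution block with a skip connection that injects a down-sampled difference saliency map as spatial enhancement. Computing the saliency map as deviation from a mean face is an assumption for illustration, not the paper's exact definition:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupConvBlock(nn.Module):
    """Lightweight block: grouped 3x3 conv + skip connection, with a
    down-sampled difference saliency map used as spatial attention."""
    def __init__(self, channels, groups=4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x, saliency):             # saliency: (N, 1, h, w)
        s = F.interpolate(saliency, size=x.shape[2:], mode="bilinear",
                          align_corners=False)
        out = F.relu(self.bn(self.conv(x)))
        return x + out * (1.0 + s)              # skip connection + enhancement

# Illustrative difference saliency map: pixel-wise deviation of the
# face from a dataset mean face, normalized to [0, 1].
face = torch.rand(2, 3, 112, 112)
mean_face = torch.rand(1, 3, 112, 112)
sal = (face - mean_face).abs().mean(dim=1, keepdim=True)
sal = sal / sal.amax(dim=(2, 3), keepdim=True)

block = GroupConvBlock(32)
feats = torch.rand(2, 32, 28, 28)
out = block(feats, sal)                          # (2, 32, 28, 28)
```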

14.
The Facial Action Coding System (FACS) is the de facto standard in the analysis of facial expressions. FACS describes expressions in terms of the configuration and strength of atomic units called action units (AUs); it defines 44 AUs, and each AU intensity is defined on a nonlinear scale of five grades. There has been significant progress in the literature on the detection of AUs; however, the companion problem of estimating AU strengths has received far less attention. In this work we propose a novel AU intensity estimation scheme applied to 2D luminance and/or 3D surface geometry images. Our scheme is based on regression of selected image features. These features are either non-specific, that is, inherited from the AU detection algorithm, or specific in that they are selected for the sole purpose of intensity estimation. For thoroughness, various types of local 3D shape indicators have been considered, such as mean curvature, Gaussian curvature, shape index, and curvedness, as well as their fusion. Feature selection from the initial plethora of Gabor moments is carried out via a regression that optimizes the AU intensity predictions. Our AU intensity estimator is person-independent, and when tested on 25 AUs that appear singly or in various combinations it performs significantly better than the state-of-the-art method, which is based on the margins of SVMs designed for AU detection. When evaluated comparatively, the 2D and 3D modalities show relative merits for upper-face and lower-face AUs, respectively, and fusing the 2D and 3D intensity estimates yields an overall improvement.
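A hedged sketch of the regression stage: an L1-regularized regressor over a large Gabor-moment feature pool both selects informative features and predicts the intensity grade. The choice of Lasso and the synthetic data are assumptions, not the paper's exact selection procedure:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2000))           # stand-in Gabor feature pool
w_true = np.zeros(2000)
w_true[:20] = 1.0                          # only a few features matter
y = np.clip(X @ w_true * 0.2 + 2.5, 0, 5)  # synthetic intensity targets

# L1 regression simultaneously selects features and fits intensities.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.05))
model.fit(X[:400], y[:400])
pred = np.clip(model.predict(X[400:]), 0, 5)   # keep within the FACS scale
kept = np.sum(model.named_steps["lasso"].coef_ != 0)
print(f"selected {kept} of 2000 features")
```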

15.
Objective: Facial landmark detection and facial expression recognition are closely related tasks. Existing work combining the two couples the tasks directly, ignoring their intrinsic connection. To address this, a multi-task deep framework is proposed that uses landmark features to recognize facial expressions. Method: A deep network referencing the Inception architecture is designed to detect landmarks and recognize expressions simultaneously; supervised by both tasks, the network attends more to information near the landmarks, so features around the facial organs obtain larger responses. To further reduce the influence of noise from other facial regions on expression recognition, a position attention map is generated from the detected landmarks, further increasing the weight of features around the facial organs and decreasing feature responses at the face boundary. Complex expressions deform parts of the face and make landmark detection harder; to mitigate this, an intermediate supervision layer is introduced that adds an expression recognition task with a small weight to the first-stage landmark detection network, on the one hand improving landmark detection for samples with complex expressions, and on the other making the network extract more expression-related features. Results: Compared with classical methods on three public datasets — CK+ (Cohn-Kanade dataset), Oulu (Oulu-CASIA NIR&VIS facial expression database), and MMI (MMI facial expression database) — the proposed method achieves the highest recognition accuracy on CK+ and improves over the previously best accuracies on Oulu and MMI by 0.14% and 0.54%, respectively. Conclusion: The results confirm the effectiveness of incorporating landmark information: the multi-task convolutional neural network achieves higher expression recognition accuracy than a single-task conventional CNN, and introducing the attention model further improves recognition in the multi-task network.
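A minimal sketch of a landmark-derived position attention map — the maximum of per-landmark Gaussians — so features near the facial organs are up-weighted; the Gaussian construction is an assumption, not the paper's exact formulation:

```python
import numpy as np

def position_attention_map(landmarks, h, w, sigma=8.0):
    """Build a position attention map from detected landmarks as the
    maximum of per-landmark Gaussians, so features near the eyes,
    brows, nose, and mouth get larger weights than the face boundary."""
    ys, xs = np.mgrid[0:h, 0:w]
    attn = np.zeros((h, w))
    for (x, y) in landmarks:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        attn = np.maximum(attn, g)
    return attn                        # in [0, 1], peaking at each landmark

landmarks = [(30, 40), (80, 40), (55, 65), (40, 90), (70, 90)]  # toy points
attn = position_attention_map(landmarks, 112, 112)
# Re-weight a feature map: responses near facial organs are amplified.
feat = np.random.rand(112, 112)
weighted = feat * (1.0 + attn)
```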

16.
Objective: Transmission-line fittings are numerous in type and varied in use, and are closely tied to the safety of conductors and towers. Assessing the operating state of fittings and diagnosing faults requires precise localization and recognition of fitting targets; however, as the data collected by UAV inspection grows, manually annotating all of it becomes increasingly impractical. To make effective use of unannotated data, a transmission-line fitting detection model based on a self-supervised E-Swin Transformer (efficient shifted windows Transformer) is proposed, which fully exploits unannotated data to improve detection accuracy. Method: First, to reduce the computation of self-attention and improve efficiency, the Swin Transformer's self-attention computation is optimized, yielding an efficient backbone, E-Swin. Then, to strengthen feature extraction with unannotated fitting data, a lightweight self-supervised method is designed for E-Swin and used for pretraining. Finally, to improve detection and localization accuracy, a detection head with an extra branch is adopted and combined with the pretrained backbone to build the detection model, which is fine-tuned with a small amount of annotated data to obtain the final detection results. Results: On a transmission-line fitting dataset, the model achieves an average detection precision (AP50) of 88.6% across targets, roughly 10% higher than conventional detection models. Conclusion: By improving the backbone's self-attention computation and adopting self-supervised learning, the model extracts features efficiently and makes effective use of unannotated data; the resulting fitting detection model offers a new approach to the data-utilization problem in transmission-line fitting detection.
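A minimal sketch of the windowed self-attention that Swin-style backbones build on, restricting attention to non-overlapping windows to cut the cost from O((HW)^2) to O(HW·win^2); this illustrates the baseline being optimized, not E-Swin's specific modification:

```python
import torch

def window_attention(x, win=7):
    """Compute self-attention inside non-overlapping win x win windows
    of a (N, H, W, C) feature map, returning the same shape."""
    N, H, W, C = x.shape
    x = x.view(N, H // win, win, W // win, win, C)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, C)  # windows
    attn = torch.softmax(x @ x.transpose(1, 2) / C ** 0.5, dim=-1)
    out = attn @ x                                             # per-window
    out = out.view(N, H // win, W // win, win, win, C)
    out = out.permute(0, 1, 3, 2, 4, 5).reshape(N, H, W, C)
    return out

x = torch.randn(1, 28, 28, 64)
y = window_attention(x)   # same shape; attention restricted to windows
```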

17.
In hospital intensive care units, it has recently been shown that enormous improvements in patient outcomes can be gained from medical staff periodically monitoring patient pain levels. However, due to the burden and stress the staff are already under, this type of monitoring has been difficult to sustain, so an automatic solution could be an ideal remedy. Using an automatic facial expression system for this is an achievable pursuit, as pain can be described via a number of facial action units (AUs). To facilitate this work, the "University of Northern British Columbia-McMaster Shoulder Pain Expression Archive Database" was collected, containing video of participants' faces (who were suffering from shoulder pain) while they performed a series of range-of-motion tests. Each frame of this data was AU-coded by certified FACS coders, and self-report and observer measures were taken at the sequence level as well. To promote and facilitate research into pain and augment current datasets, we have made publicly available a portion of this database, which includes 200 sequences across 25 subjects, containing more than 48,000 coded frames of spontaneous facial expressions with 66-point AAM-tracked facial feature landmarks. In addition to describing the data distribution, we give baseline pain and AU detection results on a frame-by-frame basis at the binary level (i.e., AU vs. no-AU and pain vs. no-pain) using our AAM/SVM system. A further contribution is classifying pain intensities at the sequence level using facial expressions and 3D head pose changes.
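A hedged sketch of the frame-level binary baseline: a linear SVM over flattened 66-point tracked landmark coordinates for pain vs. no-pain. The real AAM/SVM baseline also uses appearance features, and the data below is synthetic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

n_frames, n_points = 2000, 66
X = np.random.randn(n_frames, n_points * 2)      # (x, y) per AAM landmark
y = np.random.randint(0, 2, n_frames)            # 1 = pain frame

# Frame-by-frame binary classification, as in the baseline results.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
clf.fit(X[:1500], y[:1500])
print("frame-level accuracy:", clf.score(X[1500:], y[1500:]))
```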
