Similar Articles
20 similar articles found (search time: 421 ms)
1.
Objective: Human pose estimation aims to identify and localize human body joints in images of different scenes and to refine joint localization accuracy. To address the low estimation accuracy caused by diverse clothing styles, background interference and varied poses of clothed people, this paper takes fashion street-snap images as an example and proposes a two-branch network for human pose estimation in clothed scenes. Method: Human detection is first applied to the input image to obtain the clothed body region, which is fed into a pose representation branch and a clothing-part segmentation branch. The pose representation branch outputs joint score maps by adding multi-scale losses and feature fusion on top of a stacked hourglass network, mitigating the interference of diverse clothing styles and complex backgrounds on joint feature extraction, and defines a pose-category loss function based on pose clustering to handle the varied viewpoints of clothed poses. The clothing-part segmentation branch fuses the shallow and deep features of a residual network to obtain clothing-part score maps. The segmentation result is then used to constrain joint localization and resolve the occlusion of joints by clothing. Finally, pose refinement yields the final pose estimate. Results: The method was validated on a purpose-built clothed-image dataset. Experiments show that the pose representation branch effectively improves joint localization accuracy and that the clothing-part segmentation branch avoids joint mislocalization in clothed scenes. After refinement with the segmentation results, pose estimation accuracy rises to 92.5%. Conclusion: The proposed method effectively improves human pose estimation accuracy in clothed scenes and meets the needs of practical applications such as virtual try-on.
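The fusion of the two branches amounts to letting the segmentation result gate the joint score maps before the peak is read off. The following NumPy sketch illustrates that idea only; it is not the authors' implementation, and the array shapes, the soft-weighting scheme and the dilation radius are assumptions.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def refine_joints(heatmaps, part_mask, dilate_iter=5):
    """Pick joint locations from score maps, down-weighting pixels that fall
    outside a (dilated) clothed-body segmentation mask.

    heatmaps : (K, H, W) array of per-joint score maps (illustrative shape).
    part_mask: (H, W) array, nonzero where a clothing/body part was segmented.
    """
    # Dilate the mask a little so joints near garment boundaries survive.
    support = binary_dilation(part_mask > 0, iterations=dilate_iter)
    joints = []
    for hm in heatmaps:
        scores = hm * (0.5 + 0.5 * support)   # soft constraint, not a hard crop
        y, x = np.unravel_index(np.argmax(scores), scores.shape)
        joints.append((x, y, scores[y, x]))
    return np.array(joints)                    # (K, 3): x, y, confidence
```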

2.
Human skeleton-based action recognition using spatio-temporally weighted pose motion features (cited 1 time: 0 self-citations, 1 by others)
Human action recognition has been a research hotspot for decades because of its wide range of applications in computer vision. In recent years, the popularity of depth sensors and the emergence of real-time skeleton estimation algorithms based on depth images have drawn increasing attention to skeleton-based action recognition. Most existing work represents an action sequence by extracting the spatial information of the joints within each frame and the temporal information of the joints across frames, but does not consider that different joints and poses contribute differently to deciding the action class. This paper therefore proposes an action recognition method based on spatio-temporally weighted pose motion features: a bilinear classifier is computed iteratively to obtain the weights of the joints and static poses with respect to each action class, identifying the joints and poses that carry the most information. For better temporal analysis of the action features, dynamic time warping and the Fourier temporal pyramid are introduced for temporal modeling, and a support vector machine performs the final classification. Experiments on several datasets show that the method is highly competitive with, and sometimes superior to, other approaches.
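To make the temporal-modeling step concrete, a plain dynamic time warping distance between two joint-feature sequences can be computed as below. This is textbook DTW rather than the paper's exact pipeline, and the per-frame Euclidean distance is an assumption.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two pose-feature sequences.
    seq_a: (Ta, D) array, seq_b: (Tb, D) array; D = flattened joint features."""
    Ta, Tb = len(seq_a), len(seq_b)
    cost = np.full((Ta + 1, Tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[Ta, Tb]
```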

3.
We present a novel representation and rendering method for free-viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body using a subdivision into textured billboards along the skeleton structure. Billboards are clustered into fans such that each skeleton bone contains one billboard per source camera. We call this representation articulated billboards. In the paper we describe a semi-automatic, data-driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization and compensation for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction to eliminate visible discontinuities between adjacent billboard textures. Our articulated billboards not only minimize the ghosting artifacts known from conventional billboard rendering, but also alleviate restrictions on the setup and sensitivities to errors of more complex 3D representations and multi-view reconstruction techniques. Our results demonstrate the flexibility and the robustness of our approach with high-quality free-viewpoint video generated from broadcast footage of challenging, uncontrolled environments.

4.
We present a method to reconstruct human motion poses from uncalibrated monocular video sequences based on morphing appearance model matching. Pose estimation is performed by integrating human joint tracking with pose reconstruction in depth-first order. First, the Euler angles of the joints are estimated by inverse kinematics under human skeleton constraints. Then, the 3D coordinates of the pixels belonging to the body segments are determined by forward kinematics, and these pixels are projected onto the image plane under a perspective projection assumption to obtain the region of the morphing appearance model in the image. Finally, the human motion pose is reconstructed by histogram matching. Experimental results show that this method obtains favorable reconstruction results on a number of complex human motion sequences.
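As a rough illustration of the forward-kinematics and projection steps, the sketch below chains joint rotations along a simple kinematic chain and projects the resulting 3D points with a pinhole camera; the Euler-angle convention, bone offsets and intrinsics are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rot_zyx(euler):
    """Rotation matrix from Z-Y-X Euler angles (radians); convention assumed."""
    a, b, c = euler
    Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(c), -np.sin(c)], [0, np.sin(c), np.cos(c)]])
    return Rz @ Ry @ Rx

def forward_kinematics(root, eulers, offsets):
    """Accumulate rotations along a chain (e.g. shoulder -> elbow -> wrist)."""
    p = np.asarray(root, dtype=float)
    joints, R = [p.copy()], np.eye(3)
    for euler, offset in zip(eulers, offsets):
        R = R @ rot_zyx(euler)            # compose parent-to-child rotation
        p = p + R @ np.asarray(offset)    # place child joint in 3D
        joints.append(p.copy())
    return np.array(joints)

def project(points, f=800.0, cx=320.0, cy=240.0):
    """Pinhole (perspective) projection of 3D points onto the image plane."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)
```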

5.
Tracking human body poses in monocular video has many important applications. The problem is challenging in realistic scenes due to background clutter, variation in human appearance and self-occlusion. The complexity of pose tracking is further increased when there are multiple people whose bodies may inter-occlude. We propose a three-stage approach with a multi-level state representation that enables hierarchical estimation of 3D body poses. Our method addresses various issues including automatic initialization, data association, and self- and inter-occlusion. In the first stage, humans are tracked as foreground blobs and their positions and sizes are coarsely estimated. In the second stage, parts such as the face, shoulders and limbs are detected using various cues and the results are combined by a grid-based belief propagation algorithm to infer 2D joint positions. The derived belief maps are used as proposal functions in the third stage to infer the 3D pose using data-driven Markov chain Monte Carlo. Experimental results on several realistic indoor video sequences show that the method is able to track multiple persons during complex movements, including sitting and turning, with self- and inter-occlusion.

6.
Li Tao, Chen Shu. Computer Simulation (《计算机仿真》), 2012, 29(1): 202-205
This paper studies human pose tracking in video. Monocular video lacks depth information, which makes it difficult for monocular human motion tracking to recover 3D poses. To address this, an algorithm is proposed that exploits the scale invariance of SIFT features to track the 3D motion of the human upper body. During tracking, initial pairs of matched SIFT features are computed first, mismatched points are then removed by repeated iteration to reduce error, and finally the pose of the chest joint is obtained by solving a system of equations formed from two matched SIFT features; the poses of the remaining joints are recovered in turn by a depth-first traversal of the human skeleton model. Experimental results show that the system can track the 3D motion of the human upper body fairly accurately.
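A minimal sketch of SIFT matching with mismatch removal, using OpenCV; the ratio test plus a RANSAC fundamental-matrix check stand in for the iterative rejection described above and are assumptions (SIFT availability also depends on the OpenCV build).

```python
import cv2
import numpy as np

def match_sift(img1, img2, ratio=0.75):
    """Match SIFT keypoints between two frames and drop likely mismatches."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Lowe's ratio test on the two nearest neighbours.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # Geometric consistency: keep matches that fit a RANSAC fundamental matrix.
    F, inlier = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    keep = inlier.ravel().astype(bool)
    return pts1[keep], pts2[keep]
```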

7.
In this paper, we present a novel approach to recover a 3D human pose in real time from a single depth image using principal direction analysis (PDA). Human body parts are first recognized from a human depth silhouette via trained random forests (RFs). PDA is applied to each recognized body part, which is represented as a set of points in 3D, to estimate its principal direction. Finally, a 3D human pose is recovered by mapping the principal direction to each body part of a 3D synthetic human model. We perform both quantitative and qualitative evaluations of our proposed 3D human pose recovery methodology. We show that our proposed approach has a low average reconstruction error of 7.07 degrees for four key joint angles and performs more reliably on a sequence of unconstrained poses than conventional methods. In addition, our methodology runs at a speed of 20 FPS on a standard PC, indicating that our system is suitable for real-time applications. Our 3D pose recovery methodology is applicable to applications ranging from human-computer interaction to human activity recognition.
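The principal direction of a body part can be taken as the dominant eigenvector of the covariance of its 3D points; the NumPy sketch below shows only this step, not the random-forest part labelling or the mapping onto the synthetic model.

```python
import numpy as np

def principal_direction(points):
    """Principal direction of a body part given as an (N, 3) point cloud.

    Returns the unit eigenvector of the covariance matrix with the largest
    eigenvalue, i.e. the axis along which the part's points spread the most.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    direction = eigvecs[:, -1]                 # dominant axis
    return direction / np.linalg.norm(direction)
```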

8.
To overcome the high production cost of traditional character animation and the dependence of motion capture on dedicated equipment, a 3D human animation method based on monocular video motion tracking is proposed. The system framework is first presented; a scaled orthographic projection model and a human skeleton model are then used to recover the 3D coordinates of the joints, and the joint rotation Euler angles are computed by inverse kinematics. Finally, the human body is modeled with the H-Anim standard, and the virtual human is driven by the joint Euler angles to produce 3D animation. Experimental results show that the system can accurately track and reconstruct human motion in 3D and can be applied to character animation production.
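Under a scaled orthographic camera, the relative depth of a bone's endpoints can be recovered from the known bone length, which is one common way to lift 2D joints to 3D; the sketch below is a generic illustration with assumed variable names, not the paper's code.

```python
import numpy as np

def lift_bone(p2d_parent, p2d_child, bone_length, scale):
    """Relative depth of a child joint under scaled orthographic projection.

    Image displacement (dx, dy) = scale * 3D displacement (X, Y), so
    dZ = sqrt(L^2 - (dx^2 + dy^2) / scale^2); the sign is ambiguous and must be
    resolved by other constraints (e.g. joint limits or temporal coherence).
    """
    dx, dy = np.asarray(p2d_child, float) - np.asarray(p2d_parent, float)
    planar_sq = (dx * dx + dy * dy) / (scale * scale)
    dz_sq = max(bone_length ** 2 - planar_sq, 0.0)   # clamp numerical noise
    return np.sqrt(dz_sq)
```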

9.
Recovering 3D human body configurations using shape contexts (cited 3 times: 0 self-citations, 3 by others)
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process would succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently - tracking just becomes repeated recognition. We present results on a variety of data sets.
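For reference, a shape context descriptor is a log-polar histogram of the other contour points' positions relative to a reference point; the bare-bones version below uses assumed bin counts and normalization and is not the exact descriptor or matcher from the paper.

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12):
    """Log-polar histogram of contour points relative to points[index]."""
    ref = points[index]
    rel = np.delete(points, index, axis=0) - ref
    r = np.linalg.norm(rel, axis=1)
    theta = np.arctan2(rel[:, 1], rel[:, 0])            # angles in [-pi, pi]
    r = r / (r.mean() + 1e-9)                           # scale normalization
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)                  # accumulate counts
    return hist / max(hist.sum(), 1.0)
```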

10.
Markerless 3D human motion tracking for monocular video (cited 2 times: 2 self-citations, 0 by others)
In markerless human motion tracking, the lack of distinctive features on the tracked target and the complexity of the background cause the tracked poses to deviate considerably from the ground truth, making long-sequence tracking infeasible. To address this, a 3D human motion tracking algorithm for monocular video based on deformable appearance template matching is proposed, in which the human appearance model consists of a 3D skeleton model and 2D cardboard models. First, the joint rotation Euler angles are computed by inverse kinematics under skeleton proportion constraints; then forward kinematics determines the 3D coordinates of the pixels of the cardboard models, and these pixels are projected onto the 2D image according to the camera imaging model to obtain the deformable appearance template; finally, histogram matching yields the tracking result. Experiments show that the algorithm achieves fairly good tracking results on some complex long human motion sequences and can be applied to fields such as human-computer interaction and animation production.
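The histogram-matching step can be scored, for example, with the Bhattacharyya coefficient between normalized color histograms of the projected template region and the image region; the sketch below assumes that choice of coefficient and an RGB binning, neither of which is specified in the abstract.

```python
import numpy as np

def color_histogram(pixels, bins=16):
    """Normalized joint RGB histogram of an (N, 3) array of uint8 pixels."""
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist / max(hist.sum(), 1.0)

def bhattacharyya(h1, h2):
    """Similarity in [0, 1]; 1 means identical histograms."""
    return float(np.sum(np.sqrt(h1 * h2)))
```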

11.
This paper presents an efficient image-based approach to navigate a scene based on only three wide-baseline uncalibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images, an accurate trifocal plane is extracted from the trifocal tensor of these three images. Next, based on a small number of feature marks using a friendly GUI, the correct dense disparity maps are obtained by using our trinocular-stereo algorithm. Employing the barycentric warping scheme with the computed disparity, we can generate an arbitrary novel view within a triangle spanned by three camera centers. Furthermore, after self-calibration of the cameras, 3D objects can be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. Three applications of the tri-view morphing algorithm are demonstrated. The first one is 4D video synthesis, which can be used to fill in the gap between a few sparsely located video cameras to synthetically generate a video from a virtual moving camera. This synthetic camera can be used to view the dynamic scene from a novel view instead of the original static camera views. The second application is multiple view morphing, where we can seamlessly fly through the scene over a 2D space constructed by more than three cameras. The last one is dynamic scene synthesis using three still images, where several rigid objects may move in any orientation or direction. After segmenting three reference frames into several layers, the novel views in the dynamic scene can be generated by applying our algorithm. Finally, the experiments are presented to illustrate that a series of photo-realistic virtual views can be generated to fly through a virtual environment covered by several static cameras.
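The novel viewpoint is parameterized by its barycentric coordinates inside the triangle of the three camera centers (after projection into the trifocal plane), and those coordinates act as blending weights; computing them is sketched below as generic geometry, not the paper's warping code.

```python
import numpy as np

def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of 2D point p w.r.t. triangle (a, b, c).

    The three weights sum to 1 and can be used to blend the warped images
    from the three source cameras when synthesizing the in-between view.
    """
    m = np.array([[a[0], b[0], c[0]],
                  [a[1], b[1], c[1]],
                  [1.0,  1.0,  1.0]])
    return np.linalg.solve(m, np.array([p[0], p[1], 1.0]))   # (w_a, w_b, w_c)
```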

12.
Estimating the 3D pose of a human in motion is an active research direction in the field of computer vision. However, the performance of such algorithms is affected by the complexity of 3D spatial information, self-occlusion of the human body, mapping uncertainty and other problems. In this paper, we propose a 3D human joint localization method based on a multi-stage regression depth network and a 2D-to-3D point mapping algorithm. First, we use a single RGB image as input and introduce heatmaps and multi-stage regression to progressively refine the coordinates of the human joint points. We then feed the 2D joint points into the mapping network to compute the 3D joint coordinates, completing the 3D human pose estimation task. The MPJPE of the algorithm on the Human3.6M dataset is 40.7. The evaluation on this dataset shows that our method has clear advantages.
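MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints; the sketch below assumes the common root-aligned protocol, which the abstract does not spell out.

```python
import numpy as np

def mpjpe(pred, gt, root=0):
    """Mean per-joint position error between (N, J, 3) predicted and
    ground-truth joint arrays, after translating both so the root joint
    (index `root`) sits at the origin in every frame."""
    pred = pred - pred[:, root:root + 1, :]
    gt = gt - gt[:, root:root + 1, :]
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```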

13.
A model-based approach for estimating human 3D poses in static images (cited 2 times: 0 self-citations, 2 by others)
Estimating human body poses in static images is important for many image understanding applications including semantic content extraction and image database query and retrieval. This problem is challenging due to the presence of clutter in the image, ambiguities in image observation, unknown human image boundary, and the high-dimensional state space due to the complex articulated structure of the human body. Human pose estimation can be made more robust by integrating the detection of body components, such as the face and limbs, with the highly constrained structure of the articulated body. In this paper, a data-driven approach based on Markov chain Monte Carlo (DD-MCMC) is used, where component detection results generate state proposals for 3D pose estimation. To translate these observations into pose hypotheses, we introduce the use of "proposal maps," an efficient way of consolidating the evidence and generating 3D pose candidates during the MCMC search. Experimental results on a set of test images show that the method is able to estimate the human pose in static images of real scenes.

14.
Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from an image to a 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images or 2D joint location heatmaps that relies on an overcomplete autoencoder to learn a high-dimensional latent pose representation and accounts for joint dependencies. We further propose an efficient Long Short-Term Memory network to enforce temporal consistency on 3D pose predictions. We demonstrate that our approach achieves state-of-the-art performance both in terms of structure preservation and prediction accuracy on standard 3D human pose estimation benchmarks.

15.
Detection and tracking of humans in video streams is important for many applications. We present an approach to automatically detect and track multiple, possibly partially occluded humans in a walking or standing pose from a single camera, which may be stationary or moving. A human body is represented as an assembly of body parts. Part detectors are learned by boosting a number of weak classifiers which are based on edgelet features. Responses of part detectors are combined to form a joint likelihood model that includes an analysis of possible occlusions. The combined detection responses and the part detection responses provide the observations used for tracking. Trajectory initialization and termination are both automatic and rely on the confidences computed from the detection responses. An object is tracked by data association and meanshift methods. Our system can track humans with both inter-object and scene occlusions with static or non-static backgrounds. Evaluation results on a number of images and videos and comparisons with some previous methods are given.

16.
Objective: Errors in 2D pose estimation are the main source of error in 3D human pose estimation; the key to improving 3D pose estimation is to map from 2D poses, corrupted by error or noise, to the optimal and most plausible 3D pose. This paper proposes a 3D pose estimation method that combines sparse representation with a deep model, integrating geometric priors of the 3D pose space with temporal information to improve 3D pose estimation accuracy. Method: A 3D deformable shape model fused with sparse representation is used to obtain a reliable 3D initial estimate for each single frame. A multi-channel long short-term memory (MLSTM) denoising encoder/decoder is constructed; the single-frame 3D initial estimates are fed into it as a time series, the encoder/decoder learns the temporal dependencies of the pose between adjacent frames, and a temporal smoothness constraint is applied to obtain the final refined 3D pose. Results: Comparative experiments were conducted on the Human3.6M dataset. For two kinds of input, the 2D coordinates provided by the dataset and 2D coordinates estimated by a convolutional neural network, the average reconstruction error over video sequences after refinement by the MLSTM denoising encoder/decoder dropped by 12.6% and 13%, respectively, compared with single-frame estimation; compared with an existing video-based sparse-model method, it dropped by 6.4% and 9.1%, respectively. For the estimated 2D coordinates, the average reconstruction error dropped by 12.8% compared with an existing deep-model method. Conclusion: The proposed combination of a temporal MLSTM denoising encoder/decoder with a sparse model effectively exploits 3D pose priors and the temporal and spatial dependencies of continuously varying poses across video frames, and improves the accuracy of monocular-video 3D pose estimation to a certain extent.
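A single-channel stand-in for such a denoising sequence refiner can be written in a few lines of PyTorch; this is only schematic, the layer sizes are arbitrary, and it does not reproduce the multi-channel MLSTM architecture described above.

```python
import torch
import torch.nn as nn

class PoseSeqDenoiser(nn.Module):
    """Encode a noisy sequence of 3D pose vectors and decode a refined one."""
    def __init__(self, pose_dim=17 * 3, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, noisy_seq):                 # (B, T, pose_dim)
        h, _ = self.encoder(noisy_seq)
        h, _ = self.decoder(h)
        return self.out(h)                        # refined (B, T, pose_dim)

def temporal_smoothness(refined):
    """Penalty on frame-to-frame jitter (a temporal smoothing constraint)."""
    return ((refined[:, 1:] - refined[:, :-1]) ** 2).mean()
```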

17.
This paper presents an action recognition framework based on manifold learning for recognizing human actions in depth image sequences. Human joint positions are estimated from the depth information acquired by a Kinect sensor, and relative joint position differences are used as the body feature representation. In the training stage, Laplacian Eigenmaps (LE) manifold learning reduces the dimensionality of the high-dimensional training set, yielding a motion model in a low-dimensional latent space. In the recognition stage, the test sequence is mapped into the low-dimensional manifold space by nearest-neighbor interpolation and then matched; during matching, a modified Hausdorff distance measures the agreement and similarity between the test sequence and the training motion set in the low-dimensional space. Experiments on data captured with a Kinect sensor achieve good results; tests on the MSR Action3D dataset show that, when more training samples are available, the recognition performance surpasses previous methods. The results indicate that the proposed method is well suited to human action recognition from depth image sequences.
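One widely used modified Hausdorff distance (Dubuisson and Jain) replaces the maximum of the directed point-to-set distances by their mean; whether this is the exact variant used in the paper is not stated, so the sketch below is only an illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def modified_hausdorff(A, B):
    """Modified Hausdorff distance between two point sets A (m, d) and
    B (n, d), e.g. sequences embedded in the low-dimensional manifold space."""
    d = cdist(A, B)                       # pairwise Euclidean distances
    d_ab = d.min(axis=1).mean()           # mean distance from A to B
    d_ba = d.min(axis=0).mean()           # mean distance from B to A
    return max(d_ab, d_ba)
```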

18.
Networked 3D virtual environments allow multiple users to interact over the Internet by means of avatars and to get some feeling of virtual telepresence. However, avatar control may be tedious. Motion capture systems based on 3D sensors have reached the consumer market, but webcams remain more widespread and cheaper. This work aims at animating a user's avatar by real-time motion capture using a personal computer and a plain webcam. In a classical model-based approach, we register a 3D articulated upper-body model onto video sequences and propose a number of heuristics to accelerate particle filtering while robustly tracking user motion. Describing the body pose by the wrists' 3D positions rather than by joint angles allows efficient handling of depth ambiguities for probabilistic tracking. We demonstrate experimentally the robustness of our 3D body tracking by real-time monocular vision, even in the case of partial occlusions and motion in the depth direction.
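The tracking loop of a generic particle filter over pose parameters looks roughly as follows; the Gaussian motion model, the likelihood callback and the pose dimensionality are placeholders, not the heuristics proposed in the paper.

```python
import numpy as np

def particle_filter_step(particles, observe, motion_sigma=0.05):
    """One predict-weight-resample cycle of a particle filter.

    particles: (N, D) pose hypotheses (e.g. joint angles or wrist positions).
    observe:   callable mapping a pose hypothesis to a likelihood > 0.
    """
    n = len(particles)
    # Predict: diffuse particles with a simple Gaussian motion model.
    particles = particles + np.random.normal(0.0, motion_sigma, particles.shape)
    # Weight: score each hypothesis against the current image observation.
    weights = np.array([observe(p) for p in particles])
    weights = weights / (weights.sum() + 1e-12)
    # Resample: draw particles proportionally to their weights.
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```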

19.
In recent years, the convergence of computer vision and computer graphics has put forth a new field of research that focuses on the reconstruction of real-world scenes from video streams. To make immersive 3D video a reality, the whole pipeline spanning from scene acquisition over 3D video reconstruction to real-time rendering needs to be researched. In this paper, we describe the latest advancements of our system to record, reconstruct and render free-viewpoint videos of human actors. We apply a silhouette-based, non-intrusive motion capture algorithm that makes use of a 3D human body model to estimate the actor's parameters of motion from multi-view video streams. A renderer plays back the acquired motion sequence in real-time from any arbitrary perspective. Photo-realistic physical appearance of the moving actor is obtained by generating time-varying multi-view textures from video. This work shows how the motion capture sub-system can be enhanced by incorporating texture information from the input video streams into the tracking process. 3D motion fields reconstructed from optical flow are used in combination with silhouette matching to estimate pose parameters. We demonstrate that a high visual quality can be achieved with the proposed approach and validate the enhancements contributed by the motion field step.

20.
Multiple human pose estimation is an important yet challenging problem. In an operating room (OR) environment, the 3D body poses of surgeons and medical staff can provide important clues for surgical workflow analysis. For that purpose, we propose an algorithm for localizing and recovering the body poses of multiple humans in an OR environment under a multi-camera setup. Our model builds on 3D Pictorial Structures and 2D body part localization across all camera views, using convolutional neural networks (ConvNets). To evaluate our algorithm, we introduce a dataset captured in a real OR environment. Our dataset is unique, challenging and publicly available with annotated ground truths. Our proposed algorithm yields promising pose estimation results on this dataset.
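Once a joint has been localized in two or more calibrated views, its 3D position can be triangulated with the standard direct linear transform; this generic step, not the 3D Pictorial Structures inference itself, is sketched below.

```python
import numpy as np

def triangulate(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint from several camera views.

    proj_mats: list of 3x4 projection matrices, one per camera.
    points_2d: list of (x, y) detections of the same joint in those views.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        rows.append(x * P[2] - P[0])      # each view contributes two equations
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                            # null vector = homogeneous 3D point
    return X[:3] / X[3]                   # dehomogenize
```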
