Similar Documents
A total of 20 similar documents were found (search time: 656 ms).
1.
Two novel systems for computing dense three-dimensional (3-D) scene flow and structure from multiview image sequences are described in this paper. We do not assume rigidity of the scene motion, thus allowing for nonrigid motion in the scene. The first system, the integrated model-based system (IMS), assumes that each small local image region is undergoing 3-D affine motion. Nonlinear motion-model fitting based on both optical flow constraints and stereo constraints is then carried out on each local region in order to simultaneously estimate 3-D motion correspondences and structure. The second system, the extended gradient-based system (EGS), is a natural extension of two-dimensional (2-D) optical flow computation. In this method, a new hierarchical rule-based stereo matching algorithm is first developed to estimate the initial disparity map. The different constraints available under a multiview camera setup are further investigated and utilized in the proposed motion estimation. We use image segmentation information to preserve motion and depth discontinuities. Within the EGS framework, we present two different formulations for 3-D scene flow and structure computation. One formulation assumes that the initial disparity map is accurate, while the other does not. Experimental results on both synthetic and real imagery demonstrate the effectiveness of our 3-D motion and structure recovery schemes. An empirical comparison between IMS and EGS is also reported.
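As a rough illustration of the kind of constraints combined in IMS, the sketch below pairs the standard optical flow (brightness constancy) constraint with a per-region 3-D affine motion model; the abstract does not give the authors' exact parameterization, so treat this only as indicative.

```latex
% Brightness constancy in each view c of the multiview setup:
I_x^{(c)} u^{(c)} + I_y^{(c)} v^{(c)} + I_t^{(c)} = 0
% Local 3-D affine motion of every scene point X inside a small region:
X' = A\,X + T, \qquad A \in \mathbb{R}^{3\times 3},\; T \in \mathbb{R}^{3}
% The induced image motion (u^{(c)}, v^{(c)}) is a function of (A, T) and the
% depth Z, which the stereo (disparity) constraints couple across the views,
% so (A, T) and Z can be fit jointly per region by nonlinear least squares.
```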

2.
Analyzing and capturing articulated hand motion in image sequences
Capturing the human hand motion from video involves the estimation of the rigid global hand pose as well as the nonrigid finger articulation. The complexity induced by the high degrees of freedom of the articulated hand challenges many visual tracking techniques. For example, the particle filtering technique is plagued by the demanding requirement of a huge number of particles and the phenomenon of particle degeneracy. This paper presents a novel approach to tracking the articulated hand in video by learning and integrating natural hand motion priors. To cope with the finger articulation, this paper proposes a powerful sequential Monte Carlo tracking algorithm based on importance sampling techniques, where the importance function is based on an initial manifold model of the articulation configuration space learned from motion-captured data. In addition, this paper presents a divide-and-conquer strategy that decouples the hand poses and finger articulations and integrates them in an iterative framework to reduce the complexity of the problem. Our experiments show that this approach is effective and efficient for tracking the articulated hand. This approach can be extended to track other articulated targets.
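The core sequential Monte Carlo idea, drawing particles from an importance function other than the plain dynamics prior and reweighting accordingly, can be sketched in a few lines. The Gaussian proposal centred on the observation below is only an illustrative stand-in for the learned manifold model of hand configurations described in the abstract; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_track(observations, n_particles=500):
    """Minimal 1-D sequential Monte Carlo tracker whose proposal (importance
    function) differs from the plain dynamics prior.  The Gaussian proposal
    centred on the observation is an illustrative stand-in for a proposal
    driven by a learned model of the configuration space."""
    x = rng.normal(0.0, 1.0, n_particles)            # initial particles
    w = np.full(n_particles, 1.0 / n_particles)      # uniform weights
    estimates = []
    for z in observations:
        # Proposal: sample new states from the importance function q(x_t | z_t).
        x_new = rng.normal(z, 0.5, n_particles)
        # Weight update: w *= p(z_t | x_t) * p(x_t | x_{t-1}) / q(x_t | z_t).
        lik   = np.exp(-0.5 * ((z - x_new) / 0.3) ** 2)
        prior = np.exp(-0.5 * ((x_new - x) / 1.0) ** 2)
        prop  = np.exp(-0.5 * ((x_new - z) / 0.5) ** 2)
        w = w * lik * prior / np.maximum(prop, 1e-12)
        w /= w.sum()
        estimates.append(float(np.sum(w * x_new)))   # posterior-mean estimate
        # Resample to counter particle degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x, w = x_new[idx], np.full(n_particles, 1.0 / n_particles)
    return estimates

print(smc_track([0.1, 0.4, 0.9, 1.3]))
```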

3.
In this paper, we propose a novel method to achieve both dense 3D reconstruction of the scene and estimation of the camera intrinsic parameters by using coplanarities and other constraints (e.g., orthogonalities or parallelisms) derived from relations between planes in the scene and the reflected curves of line lasers captured by a single camera. In our study, we categorize coplanarities in the scene into two types: implicit coplanarities, which can be observed as reflected curves of line lasers, and explicit coplanarities, which are observed as, for example, the walls of a building. By using both types of coplanarities, we can construct simultaneous equations and solve them up to four degrees of freedom. To upgrade the solution to Euclidean space and estimate the camera intrinsic parameters, we can use metric constraints such as orthogonalities of the planes. Such metric constraints are given by, for example, observing the corners of rectangular boxes in the scene, or by using a special laser-projection device composed of two line lasers whose laser planes are configured to be perpendicular.

4.

This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and, in parallel, detects and tracks object bounding boxes by utilizing an existing object detection algorithm. The two algorithms share their results, i.e., features, motion, and bounding boxes, to handle both static and dynamic objects in the scene. We validate the scene depth accuracy of sparse features quantitatively on KITTI against its ground-truth depth maps made from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms which learn depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained with monocular images only. We also confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.
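The abstract quantifies performance "in terms of the error and the accuracy" without listing the metrics; the snippet below computes the error/accuracy measures commonly reported on KITTI (absolute relative error, RMSE, and the delta < 1.25 ratio). Assuming these are the intended measures is our reading, not a statement of the paper's exact protocol.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular-depth error/accuracy metrics as commonly used on
    KITTI: absolute relative error, RMSE, and the delta < 1.25 accuracy.
    Only pixels with valid ground truth (gt > 0) are evaluated."""
    valid = gt > 0
    p, g = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)
    return abs_rel, rmse, delta1

gt = np.array([[10.0, 20.0], [0.0, 5.0]])     # 0 marks missing LiDAR depth
pred = np.array([[11.0, 19.0], [3.0, 5.5]])
print(depth_metrics(pred, gt))
```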

5.
The estimation of dense velocity fields from image sequences is basically an ill-posed problem, primarily because the data only partially constrain the solution. It is rendered especially difficult by the presence of motion boundaries and occlusion regions, which are not taken into account by standard regularization approaches. In this paper, the authors present a multimodal approach to the problem of motion estimation in which the computation of visual motion is based on several complementary constraints. It is shown that multiple constraints can provide more accurate flow estimation in a wide range of circumstances. The theoretical framework relies on Bayesian estimation associated with global statistical models, namely Markov random fields. The constraints introduced here aim to address the following issues: optical flow estimation while preserving motion boundaries, processing of occlusion regions, and fusion between gradient-based and feature-based motion constraint equations. Deterministic relaxation algorithms are used to merge information and to provide a solution to the maximum a posteriori estimation of the unknown dense motion field. The algorithm is well suited to a multiresolution implementation, which brings an appreciable speed-up as well as a significant improvement of the estimation when large displacements are present in the scene. Experiments on synthetic and real-world image sequences are reported.
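A generic form of the maximum a posteriori objective that goes with such a Markov random field model is sketched below; the actual paper uses several complementary data terms and explicit occlusion/boundary handling, so this is only a schematic illustration of the negative log-posterior, not the authors' exact energy.

```latex
\hat{w} \;=\; \arg\max_{w}\, p(w \mid I_1, I_2)
        \;=\; \arg\min_{w}\;
        \sum_{p} \rho_d\!\bigl(I_2(p + w(p)) - I_1(p)\bigr)
        \;+\; \lambda \sum_{(p,q)\in\mathcal{N}} \rho_s\!\bigl(\lVert w(p) - w(q)\rVert\bigr)
```

Here the first (data) term collects the gradient- and feature-based motion constraints, the second (MRF smoothness) term is relaxed across motion boundaries and occlusion regions, and deterministic relaxation minimizes the resulting energy.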

6.
In this paper we describe an algorithm to recover the scene structure, the trajectories of the moving objects and the camera motion simultaneously given a monocular image sequence. The number of the moving objects is automatically detected without prior motion segmentation. Assuming that the objects are moving linearly with constant speeds, we propose a unified geometrical representation of the static scene and the moving objects. This representation enables the embedding of the motion constraints into the scene structure, which leads to a factorization-based algorithm. We also discuss solutions to the degenerate cases which can be automatically detected by the algorithm. Extension of the algorithm to weak perspective projections is presented as well. Experimental results on synthetic and real images show that the algorithm is reliable under noise.
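For context, the static building block behind such factorization methods is the classic rank-3 affine factorization of the tracked-point measurement matrix; the unified representation in the paper augments this to absorb linearly moving points (which raises the rank), something not reproduced here. A minimal sketch of the static case, with invented synthetic data:

```python
import numpy as np

def affine_factorization(W):
    """Classic rank-3 factorization of a measurement matrix W (2F x P) of
    tracked image points under affine projection (Tomasi-Kanade style):
    returns per-frame motion rows M (2F x 3), structure S (3 x P), and the
    per-row translations t."""
    t = W.mean(axis=1, keepdims=True)        # centring removes the translation
    Wc = W - t
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # enforce rank 3
    S = np.sqrt(s[:3])[:, None] * Vt[:3, :]
    return M, S, t

# Synthetic check: 3 frames, 10 random static 3-D points, random affine cameras.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 10))
W = np.vstack([rng.normal(size=(2, 3)) @ X + rng.normal(size=(2, 1))
               for _ in range(3)])
M, S, t = affine_factorization(W)
print(np.allclose(M @ S + t, W, atol=1e-8))   # reprojection matches
```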

7.
We present a novel representation and rendering method for free-viewpoint video of human characters based on multiple input video streams. The basic idea is to approximate the articulated 3D shape of the human body using a subdivision into textured billboards along the skeleton structure. Billboards are clustered into fans such that each skeleton bone contains one billboard per source camera. We call this representation articulated billboards. In the paper we describe a semi-automatic, data-driven algorithm to construct and render this representation, which robustly handles even challenging acquisition scenarios characterized by sparse camera positioning, inaccurate camera calibration, low video resolution, or occlusions in the scene. First, for each input view, a 2D pose estimation based on image silhouettes, motion capture data, and temporal video coherence is used to create a segmentation mask for each body part. Then, from the 2D poses and the segmentation, the actual articulated billboard model is constructed by a 3D joint optimization and compensation for camera calibration errors. The rendering method includes a novel way of blending the textural contributions of each billboard and features an adaptive seam correction to eliminate visible discontinuities between adjacent billboard textures. Our articulated billboards not only minimize the ghosting artifacts known from conventional billboard rendering, but also alleviate restrictions to the setup and sensitivities to errors of more complex 3D representations and multiview reconstruction techniques. Our results demonstrate the flexibility and the robustness of our approach with high-quality free-viewpoint video generated from broadcast footage of challenging, uncontrolled environments.

8.
SLAM (simultaneous localization and mapping) has become a major topic in computer vision in recent years, and feature-based SLAM, with its good stability and high computational efficiency, has become the mainstream approach. Current feature-based SLAM relies mainly on point features. However, point-feature visual odometry is sensitive to data quality, easily loses tracking when the camera moves too fast, and produces feature maps that contain no scene-structure information. To address these shortcomings, an optimization algorithm based on combined point and line features is proposed. Instead of the traditional six-parameter representation based on line-segment endpoints, the algorithm represents a spatial line with four parameters and jointly optimizes the camera pose in a graph using both point and line features. Experiments on public datasets and self-collected fisheye images show that, compared with methods using point features only, the proposed method effectively alleviates tracking loss caused by fast camera motion, increases trajectory length, improves pose-estimation accuracy, and produces sparse feature maps that better reflect the structure of the scene.
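The abstract does not say which four-parameter line representation is used; a common minimal choice in point-line SLAM is the orthonormal representation (an SO(3) frame plus an SO(2) scale factor), sketched below purely as an illustrative assumption rather than the authors' formulation.

```python
import numpy as np

def plucker_to_orthonormal(n, d):
    """Convert Plucker coordinates of a 3-D line (moment n = P x d, direction d)
    to the 4-DoF orthonormal representation (U in SO(3), W in SO(2))."""
    n = np.asarray(n, float)
    d = np.asarray(d, float)
    c = np.cross(n, d)
    U = np.column_stack([n / np.linalg.norm(n),
                         d / np.linalg.norm(d),
                         c / np.linalg.norm(c)])
    s = np.hypot(np.linalg.norm(n), np.linalg.norm(d))
    W = np.array([[np.linalg.norm(n) / s, -np.linalg.norm(d) / s],
                  [np.linalg.norm(d) / s,  np.linalg.norm(n) / s]])
    return U, W

def orthonormal_to_plucker(U, W):
    """Recover Plucker coordinates (up to a common scale) from (U, W)."""
    return W[0, 0] * U[:, 0], W[1, 0] * U[:, 1]

# Round-trip check on a line through point P with direction d.
P = np.array([1.0, 0.0, 2.0])
d = np.array([0.0, 1.0, 0.0])
n = np.cross(P, d)                          # Plucker moment
U, W = plucker_to_orthonormal(n, d)
n2, d2 = orthonormal_to_plucker(U, W)
scale = np.linalg.norm(n) / np.linalg.norm(n2)
print(np.allclose(n2 * scale, n), np.allclose(d2 * scale, d))   # True True
```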

9.
The classic approach to structure from motion entails a clear separation between motion estimation and structure estimation and between two-dimensional (2D) and three-dimensional (3D) information. For the recovery of the rigid transformation between different views only 2D image measurements are used. To have enough information available, most existing techniques are based on the intermediate computation of optical flow which, however, poses a problem at the locations of depth discontinuities. If we knew where depth discontinuities were, we could (using a multitude of approaches based on smoothness constraints) accurately estimate flow values for image patches corresponding to smooth scene patches; but to know the discontinuities requires solving the structure from motion problem first. This paper introduces a novel approach to structure from motion which addresses the processes of smoothing, 3D motion and structure estimation in a synergistic manner. It provides an algorithm for estimating the transformation between two views obtained by either a calibrated or uncalibrated camera. The results of the estimation are then utilized to perform a reconstruction of the scene from a short sequence of images. The technique is based on constraints on image derivatives which involve the 3D motion and shape of the scene, leading to a geometric and statistical estimation problem. The interaction between 3D motion and shape allows us to estimate the 3D motion while at the same time segmenting the scene. If we use a wrong 3D motion estimate to compute depth, we obtain a distorted version of the depth function. The distortion, however, is such that the worse the motion estimate, the more likely we are to obtain depth estimates that vary locally more than the correct ones. Since local variability of depth is due either to the existence of a discontinuity or to a wrong 3D motion estimate, being able to differentiate between these two cases provides the correct motion, which yields the least varying estimated depth as well as the image locations of scene discontinuities. We analyze the new constraints, show their relationship to the minimization of the epipolar constraint, and present experimental results using real image sequences that indicate the robustness of the method.

10.
A fundamental task in reconstructing non-rigid articulated motion from sequences of unstructured feature points is to solve the problem of feature correspondence and motion estimation. This problem is challenging in high-dimensional configuration spaces. In this paper, we propose a general model-based dynamic point matching algorithm to reconstruct freeform non-rigid articulated movements from data presented solely as sparse feature points. The algorithm integrates key-frame-based self-initialising hierarchical segmental matching with inter-frame tracking to achieve computational effectiveness and robustness in the presence of data noise. A dynamic scheme of motion verification, dynamic key-frame-shift identification and backward parent-segment correction, incorporating the temporal coherency embedded in inter-frames, is employed to enhance the segment-based spatial matching. Such a spatial-temporal approach ultimately reduces the ambiguity of identification inherent in a single frame. Performance evaluation is provided by a series of empirical analyses using synthetic data. Testing on motion capture data for a common articulated motion, namely human motion, gave feature-point identification and matching without the need for manual intervention, in buffered real time. These results demonstrate that the proposed algorithm is a candidate for feature-based real-time reconstruction tasks involving self-resuming tracking for articulated motion.

11.
Existing algorithms for camera calibration and metric reconstruction are not appropriate for image sets containing geometrically transformed images for which we cannot apply the camera constraints such as square or zero-skewed pixels. In this paper, we propose a framework to use scene constraints in the form of camera constraints. Our approach is based on image warping using images of parallelograms. We show that the warped image using parallelograms constrains the camera both intrinsically and extrinsically. Image warping converts the calibration problems of transformed images into the calibration problem with highly constrained cameras. In addition, it is possible to determine affine projection matrices from the images without explicit projective reconstruction. We introduce camera motion constraints of the warped image and a new parameterization of an infinite homography using the warping matrix. Combining the calibration and the affine reconstruction results in the fully metric reconstruction of scenes with geometrically transformed images. The feasibility of the proposed algorithm is tested with synthetic and real data. Finally, examples of metric reconstructions are shown from the geometrically transformed images obtained from the Internet.

12.
We propose a model-based tracking method for articulated objects in monocular video sequences under varying illumination conditions. The tracking method uses estimates of optical flows constructed by projecting model textures into the camera images and comparing the projected textures with the recorded information. An articulated body is modelled in terms of 3D primitives, each possessing a specified texture on its surface. An important step in model-based tracking of 3D objects is the estimation of the pose of the object during the tracking process. The optimal pose is estimated by minimizing errors between the computed optical flow and the projected 2D velocities of the model textures. This estimation uses a least-squares method with kinematic constraints for the articulated object and a perspective camera model. We test our framework with an articulated robot and show results.
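The pose update described above amounts to a (damped) least-squares solve relating the model's projected 2-D texture velocities to the measured optical flow. The toy sketch below shows that step in isolation; the clipping used as a stand-in for kinematic constraints is an assumption, since the abstract does not detail how those are enforced.

```python
import numpy as np

def pose_step(J, r, damping=1e-3):
    """One damped least-squares update for a pose/joint-parameter vector.
    J is the (2N x DoF) Jacobian mapping parameter velocities to projected 2-D
    velocities of N model texture points; r is the stacked residual between
    measured optical flow and the predicted 2-D velocities."""
    H = J.T @ J + damping * np.eye(J.shape[1])
    return np.linalg.solve(H, J.T @ r)

# Toy example: 2 DoF, 4 tracked texture points (8 flow components).
rng = np.random.default_rng(0)
J = rng.normal(size=(8, 2))
true_delta = np.array([0.05, -0.02])
r = J @ true_delta                      # flow explained by the true update
est = pose_step(J, r)
est = np.clip(est, -0.1, 0.1)           # crude stand-in for joint limits
print(np.round(est, 4))                 # close to [0.05, -0.02]
```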

13.
This paper presents a novel approach to the classification of planar surfaces in unorganized point clouds. A feature-based planar surface detection method is proposed which classifies point cloud data into planar and non-planar points by learning a classification model from an example set of planes. The algorithm performs segmentation of the scene by applying a graph partitioning approach with an improved representation of association among graph nodes. The planarity estimation of the points in a scene segment is then achieved by classifying input points as planar points that satisfy the planarity constraint imposed by the learned model. The resultant planes have potential application in solving the simultaneous localization and mapping problem for navigation of an unmanned air vehicle. The proposed method is validated on real and synthetic scenes. The real data consist of five datasets recorded by capturing three-dimensional (3D) point clouds as an RGBD camera is moved through five different indoor scenes. A set of synthetic 3D scenes is constructed containing planar and non-planar structures. The synthetic data are contaminated with Gaussian and random structural noise. The results of the empirical evaluation on both the real and the simulated data suggest that the method provides a generalized solution for plane detection even in the presence of noise and non-planar objects in the scene. Furthermore, a comparative study has been performed between multiple plane extraction methods.
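As a bare-bones illustration of the planarity check at the end of such a pipeline, the sketch below fits a plane to a candidate segment by SVD and thresholds the residual; the paper instead learns the decision from example planes, so the fixed threshold here is only an assumed stand-in for that learned model.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a set of 3-D points via SVD.
    Returns the unit normal n, offset d (n.x + d = 0) and RMS residual."""
    c = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - c)
    n = Vt[-1]                              # direction of smallest variance
    d = -n @ c
    rms = np.sqrt(np.mean((points @ n + d) ** 2))
    return n, d, rms

def is_planar(points, tol=0.01):
    """Simple planarity test: RMS point-to-plane distance below a threshold."""
    return fit_plane(points)[2] < tol

rng = np.random.default_rng(2)
plane_pts = np.column_stack([rng.uniform(-1, 1, 200),
                             rng.uniform(-1, 1, 200),
                             np.zeros(200)]) + rng.normal(0, 0.002, (200, 3))
blob_pts = rng.normal(0, 0.3, (200, 3))
print(is_planar(plane_pts), is_planar(blob_pts))   # True False
```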

14.
《Real》1996,2(5):285-296
Image stabilization can be used as a front-end system for many tasks that require dynamic image analysis, such as navigation and tracking of independently moving objects from a moving platform. We present a fast and robust electronic digital image stabilization system that can handle large image displacements, based on a two-dimensional feature-based multi-resolution motion estimation technique. The method tracks a small set of features and estimates the movement of the camera between consecutive frames. Stabilization is achieved by composing all the motion from a reference frame and warping the current frame back to the reference. The system has been implemented on parallel pipeline image processing hardware (a Datacube MaxVideo 200) connected to a SUN SPARCstation 20/612 via a VME bus adaptor. Experimental results using video sequences taken from a camera mounted on a vehicle moving on rough terrain show the robustness of the system while running at approximately 20 frames/s.
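A modern software-only version of the same pipeline (track a few corners, estimate the inter-frame motion, chain it back to the reference, warp) can be written with OpenCV in a few lines. The similarity motion model and the OpenCV routines below are stand-ins for the paper's 2-D multi-resolution estimator and its Datacube hardware implementation, not a reproduction of it.

```python
import cv2
import numpy as np

def stabilize(frames):
    """Feature-based stabilization sketch: track corners between consecutive
    frames, estimate a 4-DoF similarity motion, chain it back to the first
    (reference) frame, and warp each frame to the reference."""
    ref = frames[0]
    acc = np.eye(3)                                  # accumulated motion to ref
    out = [ref]
    prev_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = status.ravel() == 1
        M, _ = cv2.estimateAffinePartial2D(nxt[good], pts[good],
                                           method=cv2.RANSAC)
        acc = acc @ np.vstack([M, [0, 0, 1]])        # chain current -> reference
        h, w = ref.shape[:2]
        out.append(cv2.warpAffine(frame, acc[:2], (w, h)))
        prev_gray = gray
    return out
```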

15.
In this paper, we describe a reconstruction method for multiple motion scenes, which are scenes containing multiple moving objects, from uncalibrated views. Assuming that the objects are moving with constant velocities, the method recovers the scene structure, the trajectories of the moving objects, the camera motion, and the camera intrinsic parameters (except skews) simultaneously. We focus on the case where the cameras have unknown and varying focal lengths while the other intrinsic parameters are known. The number of the moving objects is automatically detected without prior motion segmentation. The method is based on a unified geometrical representation of the static scene and the moving objects. It first performs a projective reconstruction using a bilinear factorization algorithm and, then, converts the projective solution to a Euclidean one by enforcing metric constraints. Experimental results on synthetic and real images are presented.

16.
We present a system that estimates the motion of a stereo head, or a single moving camera, based on video input. The system operates in real time with low delay, and the motion estimates are used for navigational purposes. The front end of the system is a feature tracker. Point features are matched between pairs of frames and linked into image trajectories at video rate. Robust estimates of the camera motion are then produced from the feature tracks using a geometric hypothesize-and-test architecture. This generates motion estimates from visual input alone. No prior knowledge of the scene or the motion is necessary. The visual estimates can also be used in conjunction with information from other sources, such as a global positioning system, inertia sensors, wheel encoders, etc. The pose estimation method has been applied successfully to video from aerial, automotive, and handheld platforms. We focus on results obtained with a stereo head mounted on an autonomous ground vehicle. We give examples of camera trajectories estimated in real time purely from images over previously unseen distances (600 m) and periods of time.
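The hypothesize-and-test pose estimation named above is, in spirit, RANSAC over minimal-sample relative-pose hypotheses scored by inlier support. The sketch below uses ORB matching and OpenCV's essential-matrix RANSAC as stand-ins for the paper's own feature tracker and scoring scheme; it recovers only the relative rotation and the translation direction between two frames.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Two-view relative pose via robust hypothesize-and-test estimation:
    ORB feature matching, RANSAC on the essential matrix, then cheirality
    check to recover R and the unit-norm translation direction t."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Hypothesize-and-test: minimal-sample hypotheses scored by inlier support.
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t
```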

17.
Multi-frame estimation of planar motion
Traditional plane alignment techniques are typically performed between pairs of frames. We present a method for extending existing two-frame planar motion estimation techniques into a simultaneous multi-frame estimation, by exploiting multi-frame subspace constraints of planar surfaces. The paper has three main contributions: 1) we show that when the camera calibration does not change, the collection of all parametric image motions of a planar surface in the scene across multiple frames is embedded in a low dimensional linear subspace; 2) we show that the relative image motion of multiple planar surfaces across multiple frames is embedded in a yet lower dimensional linear subspace, even with varying camera calibration; and 3) we show how these multi-frame constraints can be incorporated into simultaneous multi-frame estimation of planar motion, without explicitly recovering any 3D information, or camera calibration. The resulting multi-frame estimation process is more constrained than the individual two-frame estimations, leading to more accurate alignment, even when applied to small image regions.

18.
The problem of determining the camera motion from apparent contours or silhouettes of a priori unknown curved 3D surfaces is considered. In a sequence of images, it is shown how to use the generalized epipolar constraint on apparent contours. One such constraint is obtained for each epipolar tangency point in each image pair. An accurate algorithm for computing the motion is presented based on a maximum likelihood estimate. It is shown how to generate initial estimates of the camera motion using only the tracked contours. It is also shown that in theory the motion can be calculated from the deformation of a single contour. The algorithm has been tested on several real image sequences, for both Euclidean and projective reconstruction. The resulting motion estimate is compared to motion estimates calculated independently using standard feature-based methods. The motion estimate is also used to classify the silhouettes as curves or apparent contours. The statistical evaluation shows that the technique gives accurate and stable results.

19.
The image motion of a planar surface between two camera views is captured by a homography (a 2D projective transformation). The homography depends on the intrinsic and extrinsic camera parameters, as well as on the 3D plane parameters. While camera parameters vary across different views, the plane geometry remains the same. Based on this fact, we derive linear subspace constraints on the relative homographies of multiple (⩾ 2) planes across multiple views. The paper has three main contributions: 1) We show that the collection of all relative homographies (homologies) of a pair of planes across multiple views spans a 4-dimensional linear subspace. 2) We show how this constraint can be extended to the case of multiple planes across multiple views. 3) We show that, for some restricted cases of camera motion, linear subspace constraints apply also to the set of homographies of a single plane across multiple views. All the results derived are true for uncalibrated cameras. The possible utility of these multiview constraints for improving homography estimation and for detecting nonrigid motions is also discussed.
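Contribution (1) is easy to verify numerically: with a calibrated camera (K = I) and simulated plane and motion parameters, the relative homographies of two planes across many views, stacked as 9-vectors, have numerical rank 4. The geometry below is invented purely for this check and is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def rot(ax, ay, az):
    """Rotation matrix from three Euler angles (Rz @ Ry @ Rx)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Two static planes (unit normal n, distance d) and 20 views of a calibrated
# camera (K = I) relative to a reference view; all values are made up.
n1, d1 = np.array([0.0, 0.0, 1.0]), 5.0
n2 = np.array([0.3, 0.1, 1.0]); n2 /= np.linalg.norm(n2); d2 = 7.0

homologies = []
for _ in range(20):
    R = rot(*rng.uniform(-0.2, 0.2, 3))
    t = rng.uniform(-1.0, 1.0, 3)
    H1 = R + np.outer(t, n1) / d1           # plane-induced homographies
    H2 = R + np.outer(t, n2) / d2
    homologies.append((np.linalg.inv(H1) @ H2).ravel())   # relative homography

s = np.linalg.svd(np.array(homologies), compute_uv=False)
print(np.sum(s > 1e-8 * s[0]))              # numerical rank: prints 4
```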

20.
In this paper, we describe how geometrically correct and visually realistic shadows may be computed for objects composited into a single view of a target scene. Compared to traditional single-view compositing methods, which either do not deal with shadow effects or create the shadows for the composited objects manually, our approach efficiently utilizes the geometric and photometric constraints extracted from a single target image to synthesize shadows for the inserted objects that are consistent with the overall target scene. In particular, we explore (i) the constraints provided by imaged scene structure, e.g. vanishing points of orthogonal directions, for camera calibration and thus explicit determination of the locations of the camera and the light source; (ii) the relatively weaker geometric constraint, the planar homology, that models the imaged shadow relations when explicit camera calibration is not possible; and (iii) the photometric constraints that are required to match the color characteristics of the synthesized shadows with those of the original scene. For each constraint, we demonstrate working examples followed by our observations. To show the accuracy and the applications of the proposed method, we present results for a variety of target scenes, including footage from commercial Hollywood movies and 3D video games.
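For reference, a planar homology, the "relatively weaker geometric constraint" mentioned in (ii), has the standard textbook parameterization below, with vertex v, axis a, and characteristic cross-ratio mu; for imaged shadows, v is the image of the light source and a the image of the intersection line of the two planes. The abstract does not spell out the authors' exact form, so this is only the generic version.

```latex
H \;=\; I \;+\; (\mu - 1)\,\frac{v\,a^{\mathsf{T}}}{v^{\mathsf{T}} a}
% H fixes every point on the axis a, fixes the vertex v, and has
% eigenvalues {mu, 1, 1}; it maps imaged object points to imaged shadow points.
```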
