Similar Literature
20 similar documents found (search time: 25 ms)
1.
Pose estimation for planar structures   (Total citations: 1; self-citations: 0; cited by others: 1)
We address the registration problem for interactive AR applications. Such applications require a real-time registration process. Although the registration problem has received a lot of attention in the computer vision community, it is far from solved. Ideally, an AR system should work in all environments without the need to prepare the scene ahead of time, and users should be able to walk anywhere they want. In the past, several AR systems have achieved accurate and fast tracking and registration by placing dots over objects and tracking the dots with a camera. Registration can also be achieved by identifying scene features whose real-world coordinates have been carefully measured. However, such methods restrict the system's flexibility. Hence, we need registration methods that work in unprepared environments and reduce the need to know the geometry of the objects in the scene. We propose an efficient solution to real-time camera tracking for scenes that contain planar structures, and the method accommodates many types of scene. We show that our system is reliable and suitable for real-time applications, and we present results demonstrating real-time camera tracking on indoor and outdoor scenes.
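The abstract gives no implementation details; as a minimal illustration of recovering camera pose from a tracked planar structure, the following Python sketch uses OpenCV's homography estimation and decomposition. The intrinsic matrix K, the RANSAC threshold and all names are assumptions, not values from the paper.

```python
import cv2
import numpy as np

# Hypothetical intrinsics for a 640x480 camera; fx, fy, cx, cy are assumed.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pose_from_plane(pts_ref, pts_cur):
    """Candidate camera motions from matches on a planar structure.

    pts_ref, pts_cur: Nx2 float32 arrays of matched image points (N >= 4).
    """
    H, _ = cv2.findHomography(pts_ref, pts_cur, cv2.RANSAC, 3.0)
    # A homography between two views of one plane factors into a rotation,
    # a translation (up to scale) and the plane normal; OpenCV returns all
    # algebraically valid decompositions, which visibility tests then prune.
    _, Rs, ts, normals = cv2.decomposeHomographyMat(H, K)
    return list(zip(Rs, ts, normals))
```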

2.
3.
In this work, we introduce the 'mobility-tree' construct for high-level functional representation of complex 3D indoor scenes. In recent years, digital indoor scenes have become increasingly popular, consisting of detailed geometry and complex functionalities. These scenes often contain objects that reoccur in various poses and interrelate with each other. We analyse the reoccurrence of objects in the scene and automatically detect their functional mobilities. 'Mobility' analysis denotes the motion capabilities (i.e. degrees of freedom) of an object and its subparts, which typically relate to their indoor functionalities. We compute an object's mobility by analysing its spatial arrangement, repetitions and relations with other objects, and store it in a 'mobility-tree'. Repetitive motions in the scenes are grouped into 'mobility-groups', for which we develop a set of sophisticated controllers facilitating semantic high-level editing operations. We show applications of our mobility analysis to interactive scene manipulation and reorganization, and present results for a variety of indoor scenes.
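As a toy illustration of the data structure the abstract names (not the authors' implementation), a mobility-tree can be sketched as a recursive node type; the object names and degree-of-freedom labels below are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class MobilityNode:
    """One object or subpart together with its detected motion capability."""
    name: str
    dof: str                       # e.g. 'static', 'hinge', 'slide'
    children: list = field(default_factory=list)

    def add(self, child: "MobilityNode") -> "MobilityNode":
        self.children.append(child)
        return child

# Hypothetical mobility-tree for a cabinet: the body is static, the door
# rotates about a hinge and the drawer translates along a slide.
cabinet = MobilityNode("cabinet", "static")
cabinet.add(MobilityNode("door", "hinge"))
cabinet.add(MobilityNode("drawer", "slide"))
```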

4.
Pan Meng, Zhang Huanrong, Wu Jiahao, Jin Zhi. Multimedia Tools and Applications, 2022, 81(25): 35899-35913

As one of the most crucial tasks of scene perception, Monocular Depth Estimation (MDE) has developed considerably in recent years. Current MDE research focuses on the precision and speed of the estimation but pays less attention to generalization across scenes. For instance, MDE networks trained on outdoor scenes achieve impressive performance on outdoor scenes but poor performance on indoor scenes, and vice versa. To tackle this problem, we propose a self-distillation MDE framework that improves generalization across different scenes. Specifically, we design a student encoder that extracts features from two datasets, one of indoor and one of outdoor scenes. We then introduce a dissimilarity loss that pulls apart the encoded features of different scenes in the feature space. Finally, a decoder estimates the final depth from the encoded features. In this way, our self-distillation MDE framework can learn depth estimation on two different datasets. To the best of our knowledge, we are the first to tackle the generalization problem across datasets of different scenes in the MDE field. Experiments demonstrate that our method reduces the degradation an MDE network suffers when faced with datasets that have complex data distributions. Note that evaluating a single network on two datasets is more challenging than evaluating two separate networks, one per dataset.
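The abstract does not publish the loss itself; a minimal PyTorch sketch of one plausible dissimilarity loss (negative feature separation via cosine similarity, which is an assumption) looks like this:

```python
import torch
import torch.nn.functional as F

def dissimilarity_loss(feat_indoor: torch.Tensor,
                       feat_outdoor: torch.Tensor) -> torch.Tensor:
    """Pull apart encoded features of the two scene types.

    feat_indoor, feat_outdoor: (B, D) pooled encoder features. Minimizing
    the mean cosine similarity drives the two feature distributions apart;
    the paper's exact formulation may differ.
    """
    sim = F.cosine_similarity(feat_indoor, feat_outdoor, dim=1)
    return sim.mean()
```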


5.
This paper presents an interactive system for quickly designing and previewing colored snapshots of indoor scenes. Unlike high-quality 3D indoor scene rendering, which often takes several minutes to render a moderately complicated scene under a specific color theme even with high-performance computing devices, our system aims at improving the effectiveness of color theme design for indoor scenes and employs an image colorization approach to efficiently obtain high-resolution snapshots with editable colors. Given several pre-rendered, multi-layer, gray images of the same indoor scene snapshot, our system colorizes and merges them into a single colored snapshot. The system also assists users in assigning colors to certain objects/components and infers more harmonious colors for the unassigned objects based on pre-collected priors to guide the colorization. The quickly generated snapshots provide previews of interior design schemes under different color themes, making it easy to settle on a personalized design. To demonstrate the usability and effectiveness of this system, we present a series of experimental results on indoor scenes of different types, and compare our method with a state-of-the-art method for indoor scene material and color suggestion as well as with offline/online rendering software packages.
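As a rough illustration of colorizing and merging pre-rendered gray layers (a simplification of the system described; multiplicative tinting and alpha compositing are assumptions):

```python
import numpy as np

def colorize_layer(gray: np.ndarray, rgb) -> np.ndarray:
    """Tint one gray rendering layer with an assigned object color.

    gray: (H, W) floats in [0, 1]; rgb: color triple in [0, 1].
    """
    return gray[..., None] * np.asarray(rgb, dtype=float)[None, None, :]

def merge_layers(layers, alphas) -> np.ndarray:
    """Back-to-front alpha compositing of colorized layers into one snapshot."""
    out = np.zeros_like(layers[0])
    for layer, alpha in zip(layers, alphas):   # alpha: (H, W) coverage mask
        out = out * (1.0 - alpha[..., None]) + layer * alpha[..., None]
    return out
```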

6.
3D video billboard clouds reconstruct and represent a dynamic three-dimensional scene using displacement-mapped billboards. They consist of geometric proxy planes augmented with detailed displacement maps, and combine the generality of geometry-based 3D video with the regularization properties of image-based 3D video. 3D video billboards are an image-based representation placed in the disparity space of the acquisition cameras and thus provide a regular sampling of the scene with a uniform error model. We propose a general geometry filtering framework which generates time-coherent models and removes reconstruction and quantization noise as well as calibration errors. This replaces the complex and time-consuming sub-pixel matching process in stereo reconstruction with a bilateral filter. Rendering is performed using a GPU-accelerated algorithm which generates consistent view-dependent geometry and textures for each individual frame. In addition, we present a semi-automatic approach for modeling dynamic three-dimensional scenes with a set of multiple 3D video billboard clouds.
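The abstract's key simplification, replacing sub-pixel stereo matching with a bilateral filter on the displacement maps, can be sketched with OpenCV; the filter parameters below are assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def smooth_displacement_map(depth: np.ndarray,
                            d: int = 9,
                            sigma_color: float = 25.0,
                            sigma_space: float = 7.0) -> np.ndarray:
    """Edge-preserving smoothing of a per-billboard displacement map.

    Smooths reconstruction and quantization noise while keeping depth
    discontinuities, in the spirit of the paper's geometry filtering.
    """
    return cv2.bilateralFilter(depth.astype(np.float32), d,
                               sigma_color, sigma_space)
```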

7.
Mobile robotics has achieved notable progress; however, to increase the complexity of the tasks that mobile robots can perform in natural environments, we need to provide them with a greater semantic understanding of their surroundings. In particular, identifying indoor scenes, such as an Office or a Kitchen, is a highly valuable perceptual ability for an indoor mobile robot, and in this paper we propose a new technique to achieve this goal. As a distinguishing feature, we use common objects, such as Doors or furniture, as a key intermediate representation for recognizing indoor scenes. We frame our method as a generative probabilistic hierarchical model, in which object category classifiers associate low-level visual features with objects, and contextual relations associate objects with scenes. The inherent semantic interpretation of common objects allows us to use rich sources of online data to populate the probabilistic terms of our model. In contrast to alternative computer-vision-based methods, we boost performance by exploiting the embedded and dynamic nature of a mobile robot. In particular, we increase detection accuracy and efficiency by using a 3D range sensor that allows us to implement a focus-of-attention mechanism based on geometric and structural information. Furthermore, we use concepts from information theory to propose an adaptive scheme that limits computational load by selectively guiding the search for informative objects. The operation of this scheme is facilitated by the dynamic nature of a mobile robot, which is constantly changing its field of view. We test our approach using real data captured by a mobile robot navigating office and home environments. Our results indicate that the proposed approach outperforms several state-of-the-art techniques for scene recognition.
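A minimal sketch of the object-to-scene inference layer, reduced to a naive-Bayes posterior; the independence assumption and all probability values are illustrative, while the paper's hierarchical model is richer and is populated from online data.

```python
import numpy as np

scenes = ["office", "kitchen"]
objects = ["desk", "monitor", "fridge", "door"]
# Hypothetical contextual priors P(object present | scene).
p_obj_given_scene = np.array([
    [0.80, 0.70, 0.05, 0.90],   # office
    [0.10, 0.05, 0.80, 0.90],   # kitchen
])

def scene_posterior(detected, prior=None):
    """Posterior over scene categories given detected objects."""
    prior = np.ones(len(scenes)) / len(scenes) if prior is None else prior
    logp = np.log(prior)
    for obj in detected:
        logp += np.log(p_obj_given_scene[:, objects.index(obj)])
    post = np.exp(logp - logp.max())        # stabilized normalization
    return post / post.sum()

print(dict(zip(scenes, scene_posterior(["desk", "monitor"]))))
```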

8.
9.
Algorithm frameworks based on feature point matching are mature and widely used in simultaneous localization and mapping (SLAM). However, in complex and changeable indoor environments, feature-point-matching-based SLAM currently has two major problems: decreased accuracy of pose estimation due to the interference that dynamic objects cause to the SLAM system, and tracking loss caused by the lack of feature points in weakly textured scenes. To address these problems, we present a robust, real-time RGB-D SLAM algorithm based on ORB-SLAM3. To handle interference from indoor moving objects, we add the improved lightweight object detection network YOLOv4-tiny to detect dynamic regions, and the dynamic features in those regions are then eliminated in the tracking stage. For weakly textured indoor scenes, the system extracts surface features alongside point features and fuses both to track the camera pose. Experiments on the public TUM RGB-D datasets show that, compared with ORB-SLAM3 in highly dynamic scenes, the root mean square error (RMSE) of the absolute trajectory error of the proposed algorithm improves by an average of 94.08%, and the camera pose is tracked without loss over time. The algorithm takes an average of 34 ms to track each frame using only a CPU, which is real-time and practical. Compared with other similar algorithms, it exhibits excellent real-time performance and accuracy. We also used a Kinect camera to evaluate our algorithm in complex indoor environments, where it likewise showed high robustness and real-time performance. In summary, our algorithm can not only cope with the interference caused by dynamic objects but also run stably in open, weakly textured indoor scenes.
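The dynamic-feature elimination step can be illustrated with a simple geometric test: drop any keypoint that falls inside a detector box flagged as dynamic. A hard box test is a simplification of the paper's handling; the detector and the filtering to movable classes are assumed to come from the YOLOv4-tiny stage.

```python
def filter_dynamic_keypoints(keypoints, dynamic_boxes):
    """Keep only keypoints outside detected dynamic regions.

    keypoints: objects with a .pt = (x, y) attribute (e.g. cv2.KeyPoint).
    dynamic_boxes: list of (x1, y1, x2, y2) boxes for movable objects.
    """
    static = []
    for kp in keypoints:
        x, y = kp.pt
        in_dynamic = any(x1 <= x <= x2 and y1 <= y <= y2
                         for (x1, y1, x2, y2) in dynamic_boxes)
        if not in_dynamic:
            static.append(kp)
    return static
```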

10.
In this paper, a novel approach for creating 3D models of building scenes is presented. The proposed method is fully automated and fast, and it accurately reconstructs both outdoor images of buildings and indoor scenes with perspective cues in real time, using only one image. It combines the extracted line segments to identify the vanishing points of the image, the orientation and the different planes depicted in the image, and it determines whether the image depicts an indoor or an outdoor scene. In addition, the proposed method efficiently eliminates the perspective distortion and produces an accurate 3D model of the scene without any intervention from the user. The main innovation of the method is that it uses only one image for the 3D reconstruction, whereas other state-of-the-art methods rely on processing multiple images. A website and a database of 100 images were created to demonstrate the efficiency of the proposed method in terms of reconstruction time, automation and 3D model accuracy, and anyone can use it to easily produce user-generated 3D content: http://3d-test.iti.gr:8080/3d-test/3D_recon/
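The vanishing-point step can be sketched as a least-squares intersection of roughly parallel line segments in homogeneous coordinates; a real system would first cluster segments by orientation and add RANSAC, which this sketch omits.

```python
import numpy as np

def vanishing_point(segments):
    """Least-squares vanishing point of a set of 2D line segments.

    segments: list of ((x1, y1), (x2, y2)). Each segment yields a line via
    the cross product of its endpoints in homogeneous coordinates; the
    vanishing point is the right singular vector with smallest singular
    value, i.e. the point closest to all the lines.
    """
    lines = []
    for (x1, y1), (x2, y2) in segments:
        line = np.cross([x1, y1, 1.0], [x2, y2, 1.0])
        lines.append(line / np.linalg.norm(line[:2]))
    _, _, vt = np.linalg.svd(np.asarray(lines))
    vp = vt[-1]
    return vp[:2] / vp[2]     # assumes a finite vanishing point (vp[2] != 0)
```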

11.

In this paper, we propose two approaches to analysing crowd scenes. The first is a motion-unit and meta-tracking based approach (the MUDAM approach). In this approach, the scene is divided into a number of dynamic divisions with coherent motion dynamics called motion units (MUs). By analysing the relationships between these MUs using a proposed continuation likelihood, the scene entrance and exit gates are retrieved. A meta-tracking procedure is then applied, and the scene's dominant motion pathways are retrieved. To overcome the limitations of the MUDAM approach and to detect some of the anomalies that may occur in these scenes, we propose a second, LSTM-based approach. Here the scene is divided into a number of static, overlapping spatial regions named super regions (SRs), which cover the whole scene. A Long Short-Term Memory (LSTM) network defines a predictive model for each of the scene's SRs. Each LSTM predictive model is trained on its SR's tracklets so that it captures the whole motion dynamics of that SR. Using a priori known scene entrance segments, the proposed LSTM predictive models are applied and the scene's dominant motion pathways are retrieved. An anomaly metric is formulated for use with the LSTM predictive models to detect scene anomalies. Prototypes of our proposed approaches were developed and evaluated on the challenging New York Grand Central station scene, in addition to four other crowded scenes. Four types of anomalies that may occur in crowded scenes were defined in this context, and our proposed LSTM-based approach was used to detect them; experiments on anomaly detection were also run on a number of datasets. Overall, the proposed approaches outperformed state-of-the-art methods in retrieving the scene gates and common pathways, as well as in detecting motion anomalies.
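A minimal PyTorch sketch of one per-super-region predictive model: an LSTM that predicts the next tracklet position, with prediction error serving as the anomaly score. The two-layer architecture, hidden size and Euclidean error metric are assumptions; the abstract only states that one LSTM model is trained per SR on its tracklets.

```python
import torch
import torch.nn as nn

class SRPredictor(nn.Module):
    """Predict the next (x, y) position from a short tracklet history."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, tracklet: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(tracklet)        # tracklet: (B, T, 2)
        return self.head(out[:, -1])        # next position: (B, 2)

def anomaly_score(model, tracklet, observed_next):
    """Large prediction error flags a track as anomalous for its SR."""
    with torch.no_grad():
        pred = model(tracklet)
    return torch.linalg.norm(pred - observed_next, dim=1)
```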


12.
We present an optimized pruning algorithm that allows considerable geometry reduction in large botanical scenes while maintaining high and coherent rendering quality. We improve upon previous techniques by applying model-specific geometry reduction functions and optimized scaling functions. For this we introduce Precision and Recall (PR) as a measure of rendering quality, and we show how PR scores can be used to predict better scaling values. We conducted a user study in which subjects adjusted the scaling value; the results show that the predicted scaling matches the preferred one. Finally, we extend the originally purely stochastic geometry prioritization for pruning to account for view-optimized geometry selection, which allows global scene information, such as occlusion, to be taken into consideration. We demonstrate our method by rendering scenes with thousands of complex tree models in real time.
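One plausible instantiation of the PR quality measure, comparing the screen coverage of a pruned rendering against the full-geometry rendering (pixel-coverage masks are an assumed matching criterion, not necessarily the paper's):

```python
import numpy as np

def pr_score(pruned_mask: np.ndarray, full_mask: np.ndarray):
    """Precision and Recall of a pruned rendering versus the reference.

    Both arguments are boolean coverage masks of equal shape: precision
    penalizes pixels the pruned model adds, recall penalizes pixels it loses.
    """
    tp = np.logical_and(pruned_mask, full_mask).sum()
    precision = tp / max(int(pruned_mask.sum()), 1)
    recall = tp / max(int(full_mask.sum()), 1)
    return precision, recall
```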

13.
With the development of computer vision technologies, 3D reconstruction has become a research hotspot. At present, 3D reconstruction relies heavily on expensive equipment and has poor real-time performance. In this paper, we aim to solve the problem of 3D reconstruction of indoor scenes with a large vertical span, and we propose a novel approach that requires only a Kinect. First, the method uses a Kinect sensor to capture color and depth images of an indoor scene. Second, a combination of the scale-invariant feature transform (SIFT) and the random sample consensus (RANSAC) algorithm determines the transformation matrix of adjacent frames, which serves as the initial value for iterative closest point (ICP). Third, we use ICP to establish the relative coordinate relation between pairwise frames, which form the initial point cloud data. Finally, we obtain the 3D reconstruction of the indoor scene by top-down image registration of the point cloud data. This approach not only mitigates the sensor's perspective restriction and reconstructs indoor scenes with a large vertical span, but also yields a fast reconstruction algorithm for the large amounts of point cloud data involved. The experimental results show that the proposed algorithm achieves better accuracy, better reconstruction quality and less running time for point cloud registration. In addition, the proposed method has great potential for application to 3D simultaneous localization and mapping.
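The SIFT-plus-RANSAC initialization can be sketched with OpenCV as follows. Using a RANSAC homography purely as an inlier filter is a simplification; the ratio-test threshold and RANSAC parameters are assumptions, and the surviving correspondences, lifted to 3D with the depth image, would seed the ICP refinement.

```python
import cv2
import numpy as np

def initial_correspondences(img1, img2):
    """SIFT matching with ratio test and RANSAC-based inlier filtering."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in (p for p in matches if len(p) == 2)
            if m.distance < 0.75 * n.distance]       # Lowe's ratio test
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    inliers = mask.ravel().astype(bool)
    return src[inliers], dst[inliers]
```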

14.
We present an approach that exploits the coupling between human actions and scene geometry to use human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve single-view 3D scene understanding approaches. The proposed method is validated on monocular time-lapse sequences from YouTube and on still images of indoor scenes gathered from the Internet. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.

15.
Detecting elements such as planes in 3D is essential for describing objects in applications such as robotics and augmented reality. While plane estimation is well studied, table-top scenes exhibit a large number of planes, and existing methods often lock onto a dominant plane or estimate only homographies of individual planes rather than 3D object structure. In this paper we introduce the Minimum Description Length (MDL) principle to the problem of incrementally detecting multiple planar patches in a scene using tracked interest points in image sequences. Planar patches are reconstructed and stored in a keyframe-based graph structure. When different motions occur, separate object hypotheses are modelled from currently visible patches and patches seen in previous frames. We evaluate our approach on a standard data set published by the Visual Geometry Group at the University of Oxford [24] and on our own data set containing table-top scenes. Results indicate that our approach significantly improves over state-of-the-art algorithms.
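A toy version of an MDL acceptance test: a plane hypothesis is kept only if encoding its inlier points through the plane (model bits plus residual bits) is cheaper than encoding them raw. All costs below are illustrative assumptions, not the paper's coding scheme.

```python
import numpy as np

def mdl_gain(inlier_mask: np.ndarray, residuals: np.ndarray,
             sigma: float = 0.01, plane_bits: float = 30.0,
             raw_bits_per_point: float = 3.0) -> float:
    """Positive return value means the plane hypothesis pays for itself."""
    n_in = int(inlier_mask.sum())
    raw_cost = raw_bits_per_point * n_in
    residual_cost = 0.5 * np.sum((residuals[inlier_mask] / sigma) ** 2)
    return raw_cost - (plane_bits + residual_cost)
```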

16.
Natural image understanding is a very active and promising research domain, both in psychology for modelling visual perception and in computer science for image retrieval. In this study, we investigate the efficiency of orientation distributions computed over the whole image in scale space. The global distribution of the local dominant orientations (LDO) proves to be a powerful feature for discriminating between four semantic categories of real-world scenes (indoor scenes, urban scenes, open landscapes, closed landscapes). The selected optimal scale is compatible with psychological experiments, and classification performance shows that a global representation of dominant orientations is a simple, efficient and compact method for scene recognition.
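A single-scale sketch of the LDO feature follows (the study computes it in scale space at a selected optimal scale, which is omitted here; the bin count and magnitude threshold are assumptions):

```python
import numpy as np
from scipy import ndimage

def ldo_histogram(gray: np.ndarray, bins: int = 36,
                  mag_thresh: float = 0.1) -> np.ndarray:
    """Global histogram of local dominant orientations of a gray image.

    Orientations come from Sobel gradients, folded into [0, pi); only
    pixels with sufficient gradient magnitude vote, weighted by magnitude.
    """
    gx = ndimage.sobel(gray, axis=1, output=float)
    gy = ndimage.sobel(gray, axis=0, output=float)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    voters = mag > mag_thresh * mag.max()
    hist, _ = np.histogram(ang[voters], bins=bins, range=(0.0, np.pi),
                           weights=mag[voters])
    return hist / hist.sum()
```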

17.
This paper presents a novel approach for the classification of planar surfaces in unorganized point clouds. A feature-based planar surface detection method is proposed which classifies point cloud data into planar and non-planar points by learning a classification model from an example set of planes. The algorithm segments the scene by applying a graph partitioning approach with an improved representation of association among graph nodes. The planarity of the points in a scene segment is then estimated by classifying input points as planar points that satisfy the planarity constraint imposed by the learned model. The resulting planes have potential application in solving the simultaneous localization and mapping problem for navigation of an unmanned aerial vehicle. The proposed method is validated on real and synthetic scenes. The real data consist of five datasets recorded by capturing three-dimensional (3D) point clouds while an RGB-D camera is moved through five different indoor scenes. A set of synthetic 3D scenes is constructed containing planar and non-planar structures, and the synthetic data are contaminated with Gaussian and random structural noise. The results of the empirical evaluation on both the real and the simulated data suggest that the method provides a generalized solution for plane detection even in the presence of noise and non-planar objects in the scene. Furthermore, a comparative study has been performed between multiple plane extraction methods.
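A natural planarity feature for such a classifier is the residual of a least-squares plane fit per segment; the sketch below shows one such feature (the paper's actual feature set may differ).

```python
import numpy as np

def plane_fit_residual(points: np.ndarray):
    """Fit a plane to an (N, 3) segment; return its normal and the RMS
    point-to-plane distance. A small residual indicates a planar segment."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                       # direction of least variance
    dists = (points - centroid) @ normal
    return normal, float(np.sqrt(np.mean(dists ** 2)))
```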

18.
Ren Bo, Wu Jia-Cheng, Lv Ya-Lei, Cheng Ming-Ming, Lu Shao-Ping. Journal of Computer Science and Technology, 2019, 34(3): 581-593

The Iterative Closest Point (ICP) scheme has been widely used for the registration of surfaces and point clouds. However, when working on depth image sequences that contain large geometric planes with few (or even no) details, existing ICP algorithms are prone to tangential drift and erroneous rotational estimates due to input device errors. In this paper, we propose a novel ICP algorithm that overcomes these drawbacks and provides significantly more stable registration estimates for simultaneous localization and mapping (SLAM) tasks on RGB-D camera inputs. In our approach, the tangential drift and the rotational estimation error are reduced by: 1) augmenting the conventional Euclidean distance term with local geometry information, and 2) introducing a new camera stabilization term that prevents improper camera movement in the calculation. Our approach is simple, fast, effective, and readily integrable with previous ICP algorithms. We test the new method on the TUM RGB-D SLAM dataset with state-of-the-art real-time 3D dense reconstruction platforms, i.e., ElasticFusion and Kintinuous. Experiments show that our new strategy outperforms all previous ones on various RGB-D data sequences under different combinations of registration systems and solutions.
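The local-geometry idea can be illustrated by decomposing each ICP residual into normal and tangential components and down-weighting the tangential part, so that a plane constrains motion along its normal without inventing tangential constraints. This is a simplified stand-in for the paper's distance term; the weight is assumed.

```python
import numpy as np

def geometry_weighted_error(src: np.ndarray, dst: np.ndarray,
                            dst_normals: np.ndarray,
                            w_tangent: float = 0.1) -> np.ndarray:
    """Per-correspondence ICP error emphasizing the point-to-plane part.

    src, dst: (N, 3) matched points; dst_normals: (N, 3) unit normals.
    """
    diff = src - dst
    d_normal = np.sum(diff * dst_normals, axis=1)           # along the normal
    d_tangent = np.linalg.norm(
        diff - d_normal[:, None] * dst_normals, axis=1)     # within the plane
    return d_normal ** 2 + w_tangent * d_tangent ** 2
```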


19.
Guo Lin, Fan Guoliang. Multimedia Tools and Applications, 2022, 81(25): 35751-35773

We propose a new method that utilizes pixel-level labeling information for instance-level object detection in indoor scenes from RGB-D data. Semantic labeling and instance segmentation are two different paradigms for indoor scene understanding that are usually accomplished separately and independently. We are interested in integrating the two tasks in a synergistic way in order to take advantage of their complementary nature for comprehensive understanding. Our work can capitalize on any deep learning network used for semantic labeling by treating the intermediate layer as the category-wise local detection output, from which instance segmentation is optimized by jointly considering both the spatial fitness and the relational context encoded by three graphical models: the vertical placement model (VPM), the horizontal placement model (HPM) and the non-placement model (NPM). VPM, HPM and NPM represent three common but distinct indoor object placement configurations: vertical, horizontal and hanging relationships, respectively. Experimental results on two standard RGB-D datasets show that our method can significantly improve small-object segmentation, with promising overall performance that is competitive with state-of-the-art methods.


20.
We present a method for reshuffle-based 3D interior scene synthesis guided by scene structures. Given several 3D scenes, we represent each as a structure graph associated with a relationship set. Considering both object similarity and relation similarity, we then establish a furniture-object-based matching between scene pairs via graph matching. Such a matching allows us to merge the structure graphs into a unified structure, the Augmented Graph (AG). Guided by the AG, we perform scene synthesis by reshuffling objects through three simple operations: replacing, growing and transferring. A synthesis compatibility measure that considers the environment of the furniture objects is also introduced to filter out poor-quality results. We show that our method is able to generate high-quality scene variations and outperforms the state of the art.
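The furniture-object matching step can be reduced, for illustration, to a bipartite assignment over a combined affinity of object similarity and relation similarity; the linear combination and its weight are assumptions, and the paper's graph matching is more general.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_objects(obj_sim: np.ndarray, rel_sim: np.ndarray,
                  alpha: float = 0.5):
    """Match objects between two scene structure graphs.

    obj_sim, rel_sim: (N, M) pairwise similarities of objects and of their
    relation sets. Returns index pairs maximizing the combined affinity.
    """
    affinity = alpha * obj_sim + (1.0 - alpha) * rel_sim
    rows, cols = linear_sum_assignment(-affinity)   # Hungarian, maximizing
    return list(zip(rows.tolist(), cols.tolist()))
```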
