Similar Documents
20 similar documents found (search time: 140 ms)
1.
An Attention-aware and Semantic-aware Semantic Segmentation Algorithm for RGB-D Indoor Images   Cited by 1 (0 self-citations, 1 other)
In recent years, fully convolutional networks have substantially improved the accuracy of semantic segmentation. However, owing to the complexity of indoor environments, indoor scene semantic segmentation remains a challenging problem. With the advent of depth sensors, researchers have begun to exploit depth information to improve segmentation quality. Most previous work fuses RGB and depth features with simple equally weighted concatenation or summation, failing to exploit the complementary information between the two modalities. This paper proposes ASNet (Attention-aware and Semantic-aware Network), which effectively fuses multi-level RGB and depth features through an attention-aware multi-modal fusion module and a semantic-aware multi-modal fusion module. In the attention-aware module, a cross-modal attention mechanism lets RGB and depth features guide and refine each other using their complementary information, extracting feature representations rich in spatial position information. The semantic-aware module models semantic dependencies between the modalities by aggregating semantically related RGB and depth feature channels, yielding more precise semantic representations. Both modules are integrated into a two-branch encoder-decoder network with skip connections, and the network is trained with deep supervision applied at multiple decoder layers. Experiments on public datasets show that the proposed method outperforms existing RGB-D semantic segmentation algorithms, improving mean accuracy and mean intersection-over-union over recent methods by 1.9% and 1.2%, respectively.
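The cross-modal attention mechanism described above can be sketched in a few lines of PyTorch. This is an illustrative sketch only, not the authors' ASNet code: the module name, the squeeze-excitation-style channel gating, and the summation fusion are all assumptions.

import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    # Each branch predicts channel-attention weights from the *other* modality,
    # so complementary information guides the re-weighting of each feature map.
    def __init__(self, channels, reduction=4):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid())
        self.gate_from_rgb = gate()    # produces weights for the depth branch
        self.gate_from_depth = gate()  # produces weights for the RGB branch

    def forward(self, rgb_feat, depth_feat):
        rgb_refined = rgb_feat * self.gate_from_depth(depth_feat)
        depth_refined = depth_feat * self.gate_from_rgb(rgb_feat)
        return rgb_refined + depth_refined  # fused multi-modal feature map

# fuse = CrossModalAttentionFusion(256)
# out = fuse(torch.randn(1, 256, 60, 80), torch.randn(1, 256, 60, 80))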

2.
Sharp edges are important shape features and their extraction has been extensively studied both on point clouds and surfaces. We consider the problem of extracting sharp edges from a sparse set of colour-and-depth (RGB-D) images. The noise-ridden depth measurements are challenging for existing feature extraction methods that work solely in the geometric domain (e.g. points or meshes). By utilizing both colour and depth information, we propose a novel feature extraction method that produces much cleaner and more coherent feature lines. We make two technical contributions. First, we show that intensity edges can augment the depth map to improve normal estimation and feature localization from a single RGB-D image. Second, we design a novel algorithm for consolidating feature points obtained from multiple RGB-D images. By utilizing the normals and ridge/valley types associated with the feature points, our algorithm is effective in suppressing noise without smearing nearby features.

3.
In recent years, real-time 3D scanning technology has developed significantly and is now able to capture large environments with considerable accuracy. Unfortunately, the reconstructed geometry still suffers from incompleteness, due to occlusions and lack of view coverage, resulting in unsatisfactory reconstructions. In order to overcome these fundamental physical limitations, we present a novel reconstruction approach based on retrieving objects from a 3D shape database while scanning an environment in real-time. With this approach, we are able to replace scanned RGB-D data with complete, hand-modeled objects from shape databases. We align and scale retrieved models to the input data to obtain a high-quality virtual representation of the real-world environment that is quite faithful to the original geometry. In contrast to previous methods, we are able to retrieve objects in cluttered and noisy scenes even when the database contains only similar models, but no exact matches. In addition, we put a strong focus on object retrieval in an interactive scanning context: our algorithm runs directly on 3D scanning data structures, and is able to query databases of thousands of models in an online fashion during scanning.

4.
In traditional illustration, the choice of appropriate styles and rendering techniques is guided by the intention of the artist. For illustrative volume visualizations it is difficult to specify the mapping between the 3D data and the visual representation in a way that preserves the intention of the user. The semantic layers concept establishes this mapping with a linguistic formulation of rules that directly map data features to rendering styles. With semantic layers, fuzzy logic is used to evaluate the user-defined illustration rules in a preprocessing step. In this paper we introduce interaction-dependent rules that are evaluated for each frame and are therefore computationally more expensive. Enabling interaction-dependent rules, however, allows the use of a new class of semantics, resulting in more expressive interactive illustrations. We show that the fuzzy logic can be evaluated on graphics hardware, enabling the efficient use of interaction-dependent semantics. Further, we introduce the flat rendering mode and discuss how different rendering parameters are influenced by the rule base. Our approach provides high-quality illustrative volume renderings at interactive frame rates, guided by the specification of illustration rules.
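The linguistic rule evaluation at the heart of semantic layers reduces to fuzzy membership functions combined with fuzzy connectives. The sketch below (plain Python; the membership functions, thresholds and the example rule are invented for illustration, and the paper evaluates its rules on graphics hardware) shows one such rule mapping data features to a rendering-style parameter.

# Piecewise-linear membership function for the linguistic term "high".
def high(x, lo=0.4, hi=0.8):
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

def low(x, lo=0.2, hi=0.6):
    return 1.0 - high(x, lo, hi)

# Rule: "if density is high AND gradient magnitude is low, render opaquely."
# Fuzzy AND is taken as min; the rule's truth value drives the style parameter.
def rule_opacity(density, gradient_mag):
    return min(high(density), low(gradient_mag))

print(rule_opacity(0.9, 0.1))  # 1.0  -> fully opaque
print(rule_opacity(0.5, 0.5))  # 0.25 -> partially true, semi-transparent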

5.
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects in the scene. Our algorithm is built on top of a volumetric depth fusion framework and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. The VSF stores for each grid cell the score of the corresponding view, which measures how much that view reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on the VSF, we select the next best view (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs by maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.
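The entropy-based view scoring behind the VSF can be sketched as follows (Python with NumPy). The data layout and the pure semantic-entropy score are simplifying assumptions: the paper's score also accounts for geometric reconstruction uncertainty and is parameterized over 2D location and azimuth.

import numpy as np

def view_score(voxel_label_probs, visible_ids):
    # Shannon entropy of the label distribution of every voxel the view sees;
    # high total entropy means the view can reduce much labeling uncertainty.
    p = voxel_label_probs[visible_ids]          # (n_visible, n_classes)
    entropy = -np.sum(p * np.log(p + 1e-12), axis=1)
    return entropy.sum()

def next_best_view(candidate_views, voxel_label_probs, visibility):
    # visibility: dict mapping a candidate view to ids of voxels it observes
    scores = [view_score(voxel_label_probs, visibility[v]) for v in candidate_views]
    return candidate_views[int(np.argmax(scores))]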

6.
A Tree-structured SVM Method for Image Semantic Classification   Cited by 1 (0 self-citations, 1 other)
印勇, 吕轶超. 《计算机工程与应用》, 2012, 48(12): 186-189, 201
To narrow the "semantic gap" between low-level visual features and high-level semantics, this paper proposes a method that maps an image's low-level visual features to high-level semantics with a tree-structured support vector machine. A binary tree of SVMs is constructed, using distance in the SVM kernel space as the classification measure at each tree node. The binary-tree structure greatly reduces semantic classification time, while separating the most distant semantic classes first keeps classification accuracy high. Experiments show that the method substantially shortens classification and retrieval time while maintaining accuracy.
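A minimal sketch of a binary-tree SVM in Python with scikit-learn. The grouping rule is a simplification: the paper measures distances in the SVM kernel space, while plain Euclidean distance between class centroids is used here for brevity.

import numpy as np
from sklearn.svm import SVC

def build_tree_svm(X, y, classes):
    # Recursively split the class set so the most distant semantic groups
    # separate first, keeping per-node decisions easy and the tree shallow.
    classes = list(classes)
    if len(classes) == 1:
        return {"leaf": classes[0]}
    cent = np.stack([X[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(cent[:, None] - cent[None], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)  # farthest centroid pair
    left = [c for k, c in enumerate(classes) if d[k, i] <= d[k, j]]
    right = [c for c in classes if c not in left]
    mask_l = np.isin(y, left)
    clf = SVC(kernel="rbf").fit(X, mask_l.astype(int))  # binary SVM at this node
    return {"svm": clf,
            "left": build_tree_svm(X[mask_l], y[mask_l], left),
            "right": build_tree_svm(X[~mask_l], y[~mask_l], right)}

def predict_tree(node, x):
    # Walk from the root to a leaf, one binary SVM decision per node.
    while "leaf" not in node:
        node = node["left"] if node["svm"].predict(x[None])[0] == 1 else node["right"]
    return node["leaf"]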

7.
The wide availability of affordable RGB-D sensors is changing the landscape of indoor scene analysis. Years of research on simultaneous localization and mapping (SLAM) have made it possible to merge multiple RGB-D images into a single point cloud and provide a 3D model of a complete indoor scene. However, these reconstructed models contain only geometric information, without semantic knowledge. Advances in robot autonomy, and the ability to carry out more complex tasks in unstructured environments, can be greatly enhanced by endowing environment models with semantic knowledge. Towards this goal, we propose a novel approach to generate 3D semantic maps for an indoor scene. Our approach first creates a 3D reconstructed map from an RGB-D image sequence, then jointly infers the semantic object category and structural class of each point in the global map. Twelve object categories (e.g. walls, tables, chairs) and four structural classes (ground, structure, furniture and props) are labeled in the global map, so that both object and structure information are fully captured. To obtain semantic information, we compute a semantic segmentation for each RGB-D image and merge the labeling results with a Dense Conditional Random Field. Unlike previous techniques, we use temporal information and higher-order cliques to enforce label consistency across the per-image labeling results. Our experiments demonstrate that temporal information and higher-order cliques are significant for the semantic mapping procedure and improve the precision of the semantic mapping results.
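The basic label-fusion step that such semantic mapping builds on can be sketched as follows (Python with NumPy). Note this is only the naive per-point accumulation of per-frame posteriors; the paper's contribution is to smooth the merged labeling with a Dense Conditional Random Field that adds temporal and higher-order clique terms, which is omitted here.

import numpy as np

def fuse_frame_labels(global_log_probs, point_ids, frame_probs):
    # global_log_probs: (n_points, n_classes) running log-posteriors of the map
    # point_ids:        (n_pixels,) global map point hit by each labeled pixel
    # frame_probs:      (n_pixels, n_classes) per-pixel class probabilities
    # Accumulate each frame's evidence in log space onto the global map points.
    np.add.at(global_log_probs, point_ids, np.log(frame_probs + 1e-12))
    return global_log_probs

def map_labels(global_log_probs):
    # Per-point semantic category after fusing all frames.
    return np.argmax(global_log_probs, axis=1)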

8.
This paper describes a novel real-time end-to-end system for facial expression transformation, without the need for any driving source. Its core idea is to directly generate desired, photo-realistic facial expressions on top of an input monocular RGB video. Specifically, an unpaired learning framework is developed to learn the mapping between any two facial expressions in the facial blendshape space. The system then automatically transforms the source expression in an input video clip to a specified target expression through the combination of automated 3D face construction, the learned bi-directional expression mapping, and automated lip correction. It can be applied to new users without additional training. Its effectiveness is demonstrated through many experiments on faces from live and online video, with different identities, ages, speeches and expressions.

9.
We present a novel image-based technique for modeling complex unfoliaged trees. Existing tree modeling tools either require capturing a large number of views for dense 3D reconstruction or rely on user input and botanic rules to synthesize natural-looking tree geometry. In this paper, we focus on faithfully recovering real, rather than merely realistic-looking, tree geometry from a sparse set of images. Our solution directly integrates 2D/3D tree topology as a shape prior into the modeling process. For each input view, we first estimate a 2D skeleton graph from its matte image and then find a 2D skeleton tree from the graph by imposing tree topology. We develop a simple but effective technique for computing the optimal 3D skeleton tree most consistent with the 2D skeletons. For each edge in the 3D skeleton tree, we further apply volumetric reconstruction to recover its corresponding curved branch. Finally, we use piecewise cylinders to approximate each branch from the volumetric results. We demonstrate our framework on a variety of trees to illustrate the robustness and usefulness of our technique.

10.
Action recognition is an important research topic in the video understanding area of computer vision. Accurately extracting and recognizing human action features from video can provide important information for fields such as healthcare and security, making it a highly promising direction. From a data-driven perspective, this paper comprehensively surveys the development of action recognition and systematically describes representative methods and models. Action recognition data fall into four modalities: RGB, depth, skeleton, and fused modalities. We first introduce the main pipeline of action recognition and the public datasets for the different data modalities; then, organized by modality, we review action recognition methods based on traditional hand-crafted features and on deep learning for the RGB, depth and skeleton modalities, as well as multi-modal fusion methods combining RGB with depth and fusing other modalities. Traditional hand-crafted methods include approaches based on spatio-temporal volumes and spatio-temporal interest points (RGB modality), on motion change and appearance (depth modality), and on skeleton features (skeleton modality); deep-learning methods mainly involve convolutional networks, graph convolutional networks and hybrid networks, with emphasis on their improvements, characteristics and innovations. We compare and analyze the different action recognition techniques on datasets grouped by modality, both within and across categories, and draw conclusions on the strengths, weaknesses and applicable scenarios of each modality, the differences between hand-crafted and deep-learning methods, and the advantages of fusing multiple modalities. Finally, we summarize the current problems and challenges facing action recognition and, from the perspective of data modalities, propose feasible future research directions and priorities.

11.
3D gaze tracking from a single RGB camera is very challenging due to the lack of information for determining an accurate gaze target from a monocular RGB sequence. The eyes tend to occupy only a small portion of the video, and even small errors in estimated eye orientations can lead to very large errors in the triangulated gaze target. We overcome these difficulties with a novel lightweight eyeball calibration scheme that determines the user-specific visual axis, eyeball size and position in the head. Unlike previous calibration techniques, we do not need ground-truth positions of the gaze points. In the online stage, gaze is tracked by a new gaze fitting algorithm and refined by a 3D gaze regression method to correct bias errors. Our regression is pre-trained on several individuals and works well for novel users. After the lightweight one-time user calibration, our method operates in real time. Experiments show that our technique achieves state-of-the-art accuracy in gaze angle estimation, and we demonstrate applications of 3D gaze target tracking and gaze retargeting to an animated 3D character.

12.
Full 3D Plant Reconstruction via Intrusive Acquisition   Cited by 1 (0 self-citations, 1 other)
Digitally capturing vegetation using off-the-shelf scanners is a challenging problem. Plants typically exhibit large self-occlusions and thin structures which cannot be properly scanned. Furthermore, plants are essentially dynamic, deforming over time, which adds further difficulties to the scanning process. In this paper, we present a novel technique for the acquisition and modelling of plants and foliage. At the core of our method is an intrusive acquisition approach, which disassembles the plant into disjoint parts that can be accurately scanned and reconstructed offline. We use the reconstructed part meshes as 3D proxies for the reconstruction of the complete plant and devise a global-to-local non-rigid registration technique that preserves specific plant characteristics. Our method is tested on plants of various styles, appearances and characteristics. Results show successful reconstructions with high accuracy with respect to the acquired data.

13.
We propose a novel framework to generate a global texture atlas for a deforming geometry. Our approach is distinguished from prior art in two respects. First, instead of generating a texture map for each timestamp to color a dynamic scene, our framework reconstructs a global texture atlas that can be consistently mapped to a deforming object. Second, our approach is based on a single RGB-D camera, without the need for a multi-camera setup surrounding a scene. In our framework, the input is a 3D template model with an RGB-D image sequence, and geometric warping fields are found using a state-of-the-art non-rigid registration method [GXW*15] to align the template mesh to noisy and incomplete input depth images. With these warping fields, our multi-scale approach for texture coordinate optimization generates a sharp and clear texture atlas that is consistent with multiple color observations over time. Our approach is accelerated by graphics hardware and provides a handy configuration for capturing a dynamic geometry along with a clean texture atlas. We demonstrate our approach in practical scenarios, particularly human performance capture. We also show that our approach is resilient to misalignment caused by imperfect estimation of warping fields and inaccurate camera parameters.

14.
We describe how the pipeline for 3D online reconstruction using commodity depth and image scanning hardware can be made scalable for large spatial extents and high scanning resolutions. Our modified pipeline requires less than 10% of the memory that is required by previous approaches at similar speed and resolution. To achieve this, we avoid storing a 3D distance field and weight map during online scene reconstruction. Instead, surface samples are binned into a high-resolution binary voxel grid. This grid is used in combination with caching and deferred processing of depth images to reconstruct the scene geometry. For pose estimation, GPU ray-casting is performed on the binary voxel grid. A one-to-one comparison to level-set ray-casting in a distance volume indicates slightly lower pose accuracy. To enable unlimited spatial extents and store acquired samples at the appropriate level of detail, we combine a hash map with a hierarchical tree representation.
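The core memory-saving idea, binning surface samples into a sparse binary voxel grid instead of storing a distance field and weight map, can be sketched as follows (Python; the voxel size and the hash-set representation are assumptions for illustration).

import numpy as np

VOXEL_SIZE = 0.005  # 5 mm cells; an assumed resolution

def bin_samples(points, occupied=None):
    # Quantize 3D surface samples to integer cell coordinates and record each
    # occupied cell in a hash set, so the spatial extent is effectively
    # unbounded and each cell carries one bit of state instead of a distance
    # value plus a weight.
    if occupied is None:
        occupied = set()
    cells = np.floor(points / VOXEL_SIZE).astype(np.int64)
    occupied.update(map(tuple, cells))
    return occupied

occ = bin_samples(np.random.rand(10000, 3))
print(len(occ), "occupied cells")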

15.
16.
We present a novel methodology that utilizes four-dimensional (4D) space deformation to simulate a magnification lens on versatile volume datasets and textured solid models. Compared with other magnification methods (e.g. geometric optics, mesh editing), 4D differential geometry theory and its practice are far more flexible and powerful for preserving shape features (i.e. minimizing angle distortion), and adapt more easily to versatile solid models. The primary advantage of 4D space lies in the following fact: we can easily magnify the volume of regions of interest (ROIs) along the additional dimension while keeping the remaining region unchanged. To achieve this goal, we first embed a 3D volumetric input into 4D space and magnify the ROIs in the fourth dimension. We then flatten the 4D shape back into 3D space to accommodate typical applications in the real 3D world. To enforce distortion minimization, in both steps we devise high-dimensional geometry techniques, based on rigorous 4D geometry theory, for mapping back and forth between 3D and 4D and amending the distortion. Our system preserves not only the focus region but also the context region and the global shape. We demonstrate the effectiveness, robustness and efficacy of our framework with a variety of models ranging from tetrahedral meshes to volume datasets.

17.
In plant phenotyping, there is a demand for high-throughput, non-destructive systems that can accurately analyse various plant traits by measuring features such as plant volume, leaf area, and stem length. Existing vision-based systems either focus on speed using 2D imaging, which is consequently inaccurate, or on accuracy using time-consuming 3D methods. In this paper, we present a computer-vision system for seedling phenotyping that combines the best of both approaches by utilizing a fast three-dimensional (3D) reconstruction method. We developed image processing methods for the identification and segmentation of plant organs (stem and leaf) from the 3D plant model. Various measurements of plant features such as plant volume, leaf area, and stem length are estimated from these plant segments. We evaluate the accuracy of our system by comparing its measurements with ground-truth measurements obtained destructively by hand. The results indicate that the proposed system is very promising.

18.
This paper proposes an image emotional-semantic recognition model based on region-of-interest (ROI) feature extraction. It focuses on how the ROI is obtained, how the weights of the ROI and non-ROI regions are determined, the conversion algorithm from RGB to HSV colour space, the computation of the weighted colour histogram, and the final emotion clustering method. Simulation results show that this model maps low-level features to high-level emotional semantics with substantially higher accuracy than image emotional-semantic recognition models based on traditional colour feature extraction.
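The weighted colour histogram step translates directly into code. Below is a sketch in Python; the ROI/non-ROI weights and the HSV bin counts are assumptions, not values from the paper.

import colorsys
import numpy as np

def weighted_hsv_histogram(rgb_img, roi_mask, w_roi=0.8, w_bg=0.2, bins=(8, 3, 3)):
    # rgb_img: (h, w, 3) uint8 image; roi_mask: (h, w) boolean ROI mask.
    flat = rgb_img.reshape(-1, 3) / 255.0
    # Per-pixel RGB -> HSV conversion (slow but dependency-free).
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in flat])
    # ROI pixels contribute more to the histogram than background pixels.
    weights = np.where(roi_mask.reshape(-1), w_roi, w_bg)
    hist, _ = np.histogramdd(hsv, bins=bins, range=((0, 1),) * 3, weights=weights)
    return hist / hist.sum()  # normalized feature vector for emotion clustering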

19.
The computer graphics and vision communities have dedicated long-standing efforts to building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Over the past years, rapid progress has been made, leading to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB-D camera. The range of applications is vast and steadily growing as these technologies improve in speed, accuracy, and ease of use. Motivated by this rapid progress, this state-of-the-art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance-based animation to real-time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three-dimensional model of the human face using optimization-based reconstruction algorithms. We provide an in-depth overview of the underlying concepts of real-world image formation, and we discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors that are used to better constrain the under-constrained monocular reconstruction problem, and discuss the optimization techniques employed to recover dense, photo-geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, and image and video editing.

20.
The concrete aging problem has gained more attention in recent years as more bridges and tunnels in the United States lack proper maintenance. Though the Federal Highway Administration requires these public concrete structures to be inspected regularly, on-site manual inspection by human operators is time-consuming and labor-intensive. Conventional approaches to concrete inspection, which use RGB image-based thresholding methods, cannot determine metric information or accurate location information for the assessed defects. To address this challenge, we propose a deep neural network (DNN) based concrete inspection system using a quadrotor flying robot (referred to as CityFlyer) mounted with an RGB-D camera. The inspection system introduces several novel modules. Firstly, a visual-inertial fusion approach is introduced to perform camera and robot positioning and 3D metric structure reconstruction. The reconstructed map is used to retrieve the location and metric information of the defects. Secondly, we introduce a DNN model, namely AdaNet, to detect concrete spalling and cracking, with the capability of maintaining robustness under various distances between the camera and the concrete surface. In order to train the model, we craft a new dataset, i.e., the concrete structure spalling and cracking (CSSC) dataset, which is released publicly to the research community. Finally, we introduce a 3D semantic mapping method using the annotated framework to reconstruct the concrete structure for visualization. We performed comparative studies and demonstrated that AdaNet achieves 8.41% higher detection accuracy than ResNets and VGGs. Moreover, we conducted five field tests, of which three are manual hand-held tests and two are drone-based field tests. These results indicate that our system is capable of performing metric field inspection and can serve as an effective tool for civil engineers.
