Similar Documents
1.
We present a novel approach to optimally retarget videos for varied displays with differing aspect ratios by preserving salient scene content discovered via eye tracking. Our algorithm performs editing with cut, pan and zoom operations by optimizing the path of a cropping window within the original video while seeking to (i) preserve salient regions, and (ii) adhere to the principles of cinematography. Our approach is (a) content agnostic, as the same methodology is employed to re-edit a wide-angle video recording or a close-up movie sequence captured with a static or moving camera, and (b) independent of video length, and can in principle re-edit an entire movie in one shot. Our algorithm consists of two steps. The first step employs gaze transition cues to detect time stamps where new cuts are to be introduced in the original video via dynamic programming. A subsequent step optimizes the cropping window path (to create pan and zoom effects), while accounting for the original and new cuts. The cropping window path is designed to include maximum gaze information, and is composed of piecewise constant, linear and parabolic segments. It is obtained via L1-regularized convex optimization, which ensures a smooth viewing experience. We test our approach on a wide variety of videos and demonstrate significant improvement over the state of the art, both in terms of computational complexity and qualitative aspects. A study performed with 16 users confirms that our approach results in a superior viewing experience as compared to gaze-driven re-editing [JSSH15] and letterboxing methods, especially for wide-angle static camera recordings.
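The convex formulation described above can be illustrated with a few lines of cvxpy. The following is a minimal sketch under assumed inputs (per-frame gaze x-coordinates and hypothetical weights), not the authors' implementation: penalizing the first, second, and third differences of the window centre with L1 norms favours piecewise constant, linear, and parabolic segments, respectively.

```python
# Illustrative sketch only; weights and inputs are hypothetical.
import numpy as np
import cvxpy as cp

def optimize_window_path(gaze_x, w1=10.0, w2=50.0, w3=100.0):
    # gaze_x: per-frame horizontal gaze coordinates (assumed input)
    n = len(gaze_x)
    x = cp.Variable(n)                               # cropping-window centre per frame
    data_term = cp.sum_squares(x - gaze_x)           # keep gaze near the window centre
    smooth = (w1 * cp.norm(cp.diff(x, k=1), 1)       # piecewise constant (static shots)
              + w2 * cp.norm(cp.diff(x, k=2), 1)     # piecewise linear (pans)
              + w3 * cp.norm(cp.diff(x, k=3), 1))    # piecewise parabolic (ease-in/out)
    cp.Problem(cp.Minimize(data_term + smooth)).solve()
    return x.value

t = np.linspace(0, 1, 120)
noisy_gaze = 100 * np.sin(2 * np.pi * t) + 5 * np.random.randn(120)
path = optimize_window_path(noisy_gaze)
```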

2.
In this paper, we propose a unified neural network for panoptic segmentation, a task aiming to achieve more fine-grained segmentation. Following existing methods that combine semantic and instance segmentation, our method relies on a triple-branch neural network to tackle the unified task. In the first stage, we adopt a ResNet50 with a feature pyramid network (FPN) as a shared backbone to extract features. Each branch then leverages the shared feature maps and serves as the stuff, things, or mask branch. Lastly, the outputs are fused following a well-designed strategy. Extensive experimental results on the MS-COCO dataset demonstrate that our approach achieves a Panoptic Quality (PQ) score competitive with the state of the art.
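As a rough illustration of the shared-backbone, triple-branch layout (not the authors' network, which uses a ResNet50+FPN backbone and task-specific heads), a toy PyTorch sketch might look like this:

```python
# Toy stand-in for the triple-branch design; channel counts and heads are placeholders.
import torch
import torch.nn as nn

class TripleBranchPanoptic(nn.Module):
    def __init__(self, num_stuff=53, num_things=80):
        super().__init__()
        self.backbone = nn.Sequential(                       # stand-in for ResNet50+FPN
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU())
        self.stuff_branch = nn.Conv2d(256, num_stuff, 1)     # semantic (stuff) logits
        self.things_branch = nn.Conv2d(256, num_things, 1)   # instance class logits
        self.mask_branch = nn.Conv2d(256, 1, 1)              # instance mask logits

    def forward(self, images):
        feats = self.backbone(images)                        # shared feature maps
        return {"stuff": self.stuff_branch(feats),
                "things": self.things_branch(feats),
                "mask": self.mask_branch(feats)}             # fused downstream

outputs = TripleBranchPanoptic()(torch.randn(1, 3, 128, 128))
```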

3.
In this paper, we propose PCPNet, a deep-learning based approach for estimating local 3D shape properties in point clouds. In contrast to the majority of prior techniques that concentrate on global or mid-level attributes, e.g., for shape classification or semantic labeling, we suggest a patch-based learning method, in which a series of local patches at multiple scales around each point is encoded in a structured manner. Our approach is especially well adapted for estimating local shape properties such as normals (both unoriented and oriented) and curvature from raw point clouds in the presence of strong noise and multi-scale features. Our main contributions include both a novel multi-scale variant of the recently proposed PointNet architecture with emphasis on local shape information, and a series of novel applications in which we demonstrate how learning from training data arising from well-structured triangle meshes, and applying the trained model to noisy point clouds, can produce superior results compared to specialized state-of-the-art techniques. Finally, we demonstrate the utility of our approach in the context of shape reconstruction, by showing how it can be used to extract normal orientation information from point clouds.
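The patch-based idea can be sketched with a minimal PointNet-style module (a simplified stand-in, not the published PCPNet architecture): a shared per-point MLP, max-pooling over the patch for order invariance, and a small regression head producing one unoriented normal per patch.

```python
# Minimal PointNet-style sketch; layer sizes are illustrative, not PCPNet's.
import torch
import torch.nn as nn

class PatchNormalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, patches):                  # patches: (B, k, 3), centred at the query point
        feats = self.point_mlp(patches)          # per-point features
        global_feat = feats.max(dim=1).values    # order-invariant patch descriptor
        n = self.head(global_feat)               # raw normal prediction
        return n / n.norm(dim=1, keepdim=True)   # unit-length unoriented normal

patches = torch.randn(8, 256, 3)                 # 8 patches of 256 neighbours each
normals = PatchNormalNet()(patches)              # (8, 3)
```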

4.
While color plays a fundamental role in film design and production, existing solutions for film analysis in the digital humanities address perceptual and spatial color information only tangentially. We introduce VIAN, a visual film annotation system centered on the semantic aspects of film color analysis. The tool enables expert-assessed labeling, curation, visualization and classification of color features based on their perceived context and aesthetic quality. It is the first of its kind to incorporate foreground-background information made possible by modern deep-learning segmentation methods. The proposed tool seamlessly integrates a multimedia data management system, so that films can undergo a full color-oriented analysis pipeline.

5.
We propose a novel framework to generate a global texture atlas for a deforming geometry. Our approach differs from prior art in two aspects. First, instead of generating a texture map for each timestamp to color a dynamic scene, our framework reconstructs a global texture atlas that can be consistently mapped to a deforming object. Second, our approach is based on a single RGB-D camera, without the need for a multiple-camera setup surrounding the scene. In our framework, the input is a 3D template model with an RGB-D image sequence, and geometric warping fields are found using a state-of-the-art non-rigid registration method [GXW*15] to align the template mesh to noisy and incomplete input depth images. With these warping fields, our multi-scale approach for texture coordinate optimization generates a sharp and clear texture atlas that is consistent with multiple color observations over time. Our approach is accelerated by graphics hardware and provides a handy configuration to capture a dynamic geometry along with a clean texture atlas. We demonstrate our approach with practical scenarios, particularly human performance capture. We also show that our approach is resilient to misalignment issues caused by imperfect estimation of warping fields and inaccurate camera parameters.

6.
We present a novel method to reconstruct a fluid's 3D density and motion based on just a single sequence of images. This is rendered possible by using powerful physical priors for this strongly under-determined problem. More specifically, we propose a novel strategy to infer density updates strongly coupled to previous and current estimates of the flow motion. Additionally, we employ an accurate discretization and depth-based regularizers to compute stable solutions. Using only one view for the reconstruction reduces the complexity of the capturing setup drastically and could even allow for online video databases or smart-phone videos as inputs. The reconstructed 3D velocity can then be flexibly utilized, e.g., for re-simulation, domain modification or guiding purposes. We demonstrate the capabilities of our method with a series of synthetic test cases and the reconstruction of real smoke plumes captured with a Raspberry Pi camera.
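A schematic sketch of the coupling between density updates and the current flow estimate, with semi-Lagrangian advection standing in for the paper's discretization and a trivial blend standing in for the actual data term, could look like this:

```python
# Schematic 2D toy example; not the paper's formulation.
import numpy as np
from scipy.ndimage import map_coordinates

def advect(density, vel_x, vel_y, dt=1.0):
    h, w = density.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # trace each cell backwards along the velocity field (semi-Lagrangian step)
    src = np.stack([yy - dt * vel_y, xx - dt * vel_x])
    return map_coordinates(density, src, order=1, mode="nearest")

def density_update(density, vel_x, vel_y, observed, blend=0.3):
    predicted = advect(density, vel_x, vel_y)     # transport by the current motion estimate
    return (1.0 - blend) * predicted + blend * observed

density = np.zeros((64, 64)); density[28:36, 28:36] = 1.0
vel_x = np.full((64, 64), 0.5); vel_y = np.zeros((64, 64))
observed = np.roll(density, 1, axis=1)            # toy "observation" of shifted smoke
density = density_update(density, vel_x, vel_y, observed)
```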

7.
Physically based rendering is a well-understood technique to produce realistic-looking images. However, different algorithms exist for efficiency reasons, which work well in certain cases but fail or produce rendering artefacts in others. Few tools allow a user to gain insight into the algorithmic processes. In this work, we present such a tool, which combines techniques from information visualization and visual analytics with physically based rendering. It consists of an interactive parallel coordinates plot, with a built-in sampling-based data reduction technique to visualize the attributes associated with each light sample. Two-dimensional (2D) and three-dimensional (3D) heat maps depict any desired property of the rendering process. An interactively rendered 3D view of the scene displays animated light paths based on the user's selection to gain further insight into the rendering process. The provided interactivity enables the user to guide the rendering process for more efficiency. To show its usefulness, we present several applications based on our tool. This includes differential light transport visualization to optimize light setup in a scene, finding the causes of and resolving rendering artefacts, such as fireflies, as well as a path length contribution histogram to evaluate the efficiency of different Monte Carlo estimators.

8.
Spectral Monte-Carlo methods are currently the most powerful techniques for simulating light transport with wavelength-dependent phenomena (e.g., dispersion, colored particle scattering, or diffraction gratings). Compared to trichromatic rendering, sampling the spectral domain requires significantly more samples for noise-free images. Inspired by gradient-domain rendering, which estimates image gradients, we propose spectral gradient sampling to estimate the gradients of the spectral distribution inside a pixel. These gradients can be sampled with a significantly lower variance by carefully correlating the path samples of a pixel in the spectral domain, and we introduce a mapping function that shifts paths with wavelength-dependent interactions. We compute the result of each pixel by integrating the estimated gradients over the spectral domain using a one-dimensional screened Poisson reconstruction. Our method improves convergence and reduces chromatic noise from spectral sampling, as demonstrated by our implementation within a conventional path tracer.
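The one-dimensional screened Poisson step can be written down directly. The following numpy sketch (toy data, hypothetical weight alpha) reconstructs a spectral distribution from noisy per-bin values and lower-variance gradient estimates:

```python
# Toy illustration of a 1D screened Poisson reconstruction:
# minimize ||f - c||^2 + alpha * ||D f - g||^2 over the spectral bins f.
import numpy as np

def screened_poisson_1d(c, g, alpha=4.0):
    n = len(c)
    D = np.diff(np.eye(n), axis=0)                 # (n-1, n) finite-difference operator
    A = np.vstack([np.eye(n), np.sqrt(alpha) * D])
    b = np.concatenate([c, np.sqrt(alpha) * g])
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f                                       # reconstructed spectral distribution

c = np.array([0.2, 0.8, 0.3, 0.9, 0.4])            # noisy per-bin estimates (toy data)
g = np.array([0.5, -0.4, 0.5, -0.4])               # gradient estimates between bins
spectrum = screened_poisson_1d(c, g)
```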

9.
We present a new outlier removal technique for gradient-domain path tracing (G-PT), which computes image gradients as well as colors. Our approach rejects gradient outliers whose estimated errors are much higher than those of the other gradients, improving reconstruction quality for G-PT. We formulate the outlier removal problem as a least trimmed squares optimization, which employs only a subset of gradients so that the final image can be reconstructed without including the gradient outliers. In addition, we design this outlier removal process so that the chosen subset of gradients maintains connectivity between pixels through gradients, preventing pixels from becoming isolated. Lastly, the optimal number of inlier gradients is estimated to minimize our reconstruction error. We demonstrate that our reconstruction, by robustly rejecting gradient outliers, produces visually and numerically improved results compared to the previous screened Poisson reconstruction that uses all the gradients.
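The least-trimmed-squares principle underlying the method can be illustrated on a generic linear fit (this is not the paper's gradient-domain formulation): repeatedly refit using only the h samples with the smallest residuals, so gross outliers are excluded.

```python
# Toy least-trimmed-squares loop on a linear model; data are synthetic.
import numpy as np

def least_trimmed_squares(A, b, h, iters=20):
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # initial ordinary least-squares fit
    for _ in range(iters):
        residuals = np.abs(A @ x - b)
        inliers = np.argsort(residuals)[:h]           # keep the h best-fitting samples
        x = np.linalg.lstsq(A[inliers], b[inliers], rcond=None)[0]
    return x, inliers

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
b = A @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=100)
b[:10] += 5.0                                         # inject gross outliers
x, inliers = least_trimmed_squares(A, b, h=80)
```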

10.
In this work, we present a method to vectorize raster images of line art. Inverting the rasterization procedure is inherently ill-conditioned, as there exist many possible vector images that could yield the same raster image. However, not all of these vector images are equally useful to the user, especially if performing further edits is desired. We therefore define the problem of computing an instance segmentation of the most likely set of paths that could have created the raster image. Once the segmentation is computed, we use existing vectorization approaches to vectorize each path, and then combine all paths into the final output vector image. To determine which set of paths is most likely, we train a pair of neural networks to provide semantic clues that help resolve ambiguities at intersection and overlap regions. These predictions are made considering the full context of the image, and are then globally combined by solving a Markov Random Field (MRF). We demonstrate the flexibility of our method by generating results on character datasets, a synthetic random line dataset, and a dataset composed of human drawn sketches. For all cases, our system accurately recovers paths that adhere to the semantics of the drawings.

11.
Robust and efficient rendering of complex lighting effects, such as caustics, remains a challenging task. While algorithms like vertex connection and merging can render such effects robustly, their significant overhead over a simple path tracer is not always justified and, as we show in this paper, also not necessary. In current rendering solutions, caustics often require the user to enable a specialized algorithm, usually a photon mapper, and hand-tune its parameters. But even with carefully chosen parameters, photon mapping may still trace many photons that the path tracer could sample well enough, or, even worse, that are not visible at all. Our goal is robust, yet lightweight, caustics rendering. To that end, we propose a technique to identify and focus computation on the photon paths that offer significant variance reduction over samples from a path tracer. We apply this technique in a rendering solution combining path tracing and photon mapping. The photon emission is automatically guided towards regions where the photons are useful, i.e., provide substantial variance reduction for the currently rendered image. Our method achieves better photon densities with fewer light paths (and thus photons) than emission guiding approaches based on visual importance. In addition, we automatically determine an appropriate number of photons for a given scene, and the algorithm gracefully degenerates to pure path tracing for scenes that do not benefit from photon mapping.

12.
A mandatory component for many point set algorithms is the availability of consistently oriented vertex normals (e.g. for surface reconstruction, feature detection, visualization). Previous orientation methods on meshes or raw point clouds do not consider a global context, are often based on unrealistic assumptions, or have extremely long computation times, making them unusable on real-world data. We present a novel massively parallelized method to compute globally consistent oriented point normals for raw and unsorted point clouds. Built on the idea of graph-based energy optimization, we create a complete kNN graph over the entire point cloud. A new weighted similarity criterion encodes the graph energy. To orient normals in a globally consistent way, we perform a highly parallel greedy edge collapse, which merges similar parts of the graph and orients them consistently. We compare our method to current state-of-the-art approaches and achieve speedups of up to two orders of magnitude. The achieved quality of normal orientation is on par with or better than existing solutions, especially for real-world noisy 3D scanned data.
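For context, a simple sequential version of consistent orientation, propagating a sign over a spanning tree of the kNN graph, looks as follows; the paper replaces this classical scheme with a massively parallel greedy edge collapse on a weighted graph.

```python
# Classical sequential orientation propagation; a simplified baseline, not the paper's method.
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

def orient_normals(points, normals, k=8):
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k + 1)
    rows = np.repeat(np.arange(len(points)), k)
    cols = nbrs[:, 1:].ravel()
    # edge weight = 1 - |n_i . n_j|: cheap to cross where normals are well aligned
    w = 1.0 - np.abs(np.einsum('ij,ij->i', normals[rows], normals[cols])) + 1e-6
    graph = coo_matrix((w, (rows, cols)), shape=(len(points), len(points)))
    mst = minimum_spanning_tree(graph)
    order, parents = breadth_first_order(mst + mst.T, i_start=0, directed=False)
    for v in order[1:]:                       # propagate a consistent sign outward
        if np.dot(normals[parents[v]], normals[v]) < 0:
            normals[v] = -normals[v]
    return normals
```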

13.
Acquired 3D point clouds make possible quick modeling of virtual scenes from the real world. With modern 3D capture pipelines, each point sample often comes with additional attributes such as a normal vector and a color response. Although rendering and processing such data has been extensively studied, little attention has been devoted to using the light transport hidden in the recorded per-sample color response to relight virtual objects in visual effects (VFX) look-dev or augmented reality (AR) scenarios. Typically, a standard relighting environment exploits global environment maps together with a collection of local light probes to reflect the light mood of the real scene onto the virtual object. We propose instead a unified spatial approximation of the radiance and visibility relationships present in the scene, in the form of a colored point cloud. To do so, our method relies on two core components: High Dynamic Range (HDR) expansion and real-time Point-Based Global Illumination (PBGI). First, since an acquired color point cloud typically comes in Low Dynamic Range (LDR) format, we boost it using a single HDR photo exemplar of the captured scene that may cover only part of it. We perform this expansion efficiently by first expanding the dynamic range of a set of renderings of the point cloud and then projecting these renderings onto the original cloud. At this stage, we propagate the expansion to regions not covered by the renderings, or covered with low-quality dynamic range, by solving a Poisson system. Then, at rendering time, we use the resulting HDR point cloud to relight virtual objects, providing a diffuse model of the indirect illumination propagated by the environment. To do so, we design a PBGI algorithm that exploits the GPU's geometry shader stage as well as a new mipmapping operator, tailored for G-buffers, to achieve real-time performance. As a result, our method can effectively relight virtual objects exhibiting diffuse and glossy physically based materials in real time. Furthermore, it accounts for the spatial embedding of the object within the 3D environment. We evaluate our approach on manufactured scenes to assess the error introduced at every step relative to a perfect ground truth. We also report experiments with real captured data, covering a range of capture technologies, from active scanning to multiview stereo reconstruction.

14.
Objective: The density, complexity, and mutual occlusion of objects in indoor point cloud scenes lead to incomplete and noisy data, which severely limits indoor scene reconstruction and prevents accurate results. To better recover a complete scene from unordered point clouds, we propose an indoor scene reconstruction method based on semantic segmentation. Method: The raw data are first downsampled with a voxel filter, 3D scale-invariant feature transform (3D SIFT) feature points of the scene are computed, and the downsampled result is fused with these feature points to obtain an optimized downsampling of the scene. Planar features are then extracted from the fused, downsampled scene using random sample consensus (RANSAC) and fed into a PointNet network for training, ensuring that coplanar points share the same local features; this yields per-point confidence scores for each category in the dataset. On this basis, a projection-based region-growing refinement is proposed that aggregates points belonging to the same object in the semantic segmentation result, producing a finer segmentation. The segmented objects are classified as interior or exterior environment elements, and the scene is reconstructed using model matching for the former and plane fitting for the latter. Results: Experiments on the S3DIS (Stanford large-scale 3D indoor space dataset) dataset show that the proposed fused sampling algorithm improves the efficiency and quality of the subsequent steps to varying degrees; after sampling, the plane extraction algorithm runs in only 15% of the time required before sampling, and the semantic segmentation method improves overall accuracy (OA) and mean intersection over union (mIoU) by 2.3% and 4.2%, respectively, over the PointNet baseline. Conclusion: The proposed method improves computational efficiency while preserving key points, noticeably improves segmentation accuracy, and yields high-quality reconstruction results.
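Two of the early stages described above, voxel-filter downsampling and RANSAC plane extraction, can be sketched with Open3D. The file name, voxel size, and thresholds below are placeholders; the fusion with 3D SIFT keypoints, the PointNet training, and the region-growing refinement are not reproduced here.

```python
# Minimal Open3D sketch of voxel downsampling and iterative RANSAC plane extraction.
import open3d as o3d

pcd = o3d.io.read_point_cloud("indoor_scene.ply")         # hypothetical input file
down = pcd.voxel_down_sample(voxel_size=0.03)             # voxel-filter downsampling

planes = []
rest = down
for _ in range(5):                                         # peel off the 5 largest planes
    model, inliers = rest.segment_plane(distance_threshold=0.02,
                                        ransac_n=3,
                                        num_iterations=1000)
    planes.append(rest.select_by_index(inliers))
    rest = rest.select_by_index(inliers, invert=True)
```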

15.
Selecting informative and visually appealing views for 3D indoor scenes is beneficial for the housing, decoration, and entertainment industries. A set of views that exhibits the comfort, aesthetics, and functionality of a particular scene can attract customers and facilitate business transactions. However, selecting views for an indoor scene is challenging because the system has to consider not only the need to reveal as much information as possible, but also object arrangements, occlusions, and characteristics. Since there can be many principles utilized to guide the view selection, and various principles to follow under different circumstances, we achieve the goal by imitating popular photos on the Internet. Specifically, we select the view that optimizes the contour similarity of corresponding objects to the photo. Because the selected view can be inadequate if object arrangements in the 3D scene and the photo differ, our system imitates many popular photos and selects a certain number of views. After that, it clusters the selected views and determines the view/cluster centers by weighted averaging to finally exhibit the scene. Experimental results demonstrate that the views selected by our method are visually appealing.

16.
Monte-Carlo path tracing techniques can generate stunning visualizations of medical volumetric data. In a clinical context, such renderings have turned out to be valuable for communication, education, and diagnosis. Because a large number of computationally expensive lighting samples is required to converge to a smooth result, progressive rendering is the only option for interactive settings: low-sampled, noisy images are shown while the user explores the data, and as soon as the camera is at rest the view is progressively refined. During interaction, the visual quality is low, which strongly impedes the user's experience. Even worse, when a data set is explored in virtual reality, the camera is never at rest, leading to constantly low image quality and strong flickering. In this work we present an approach to bring volumetric Monte-Carlo path tracing to the interactive domain by reusing samples over time. To this end, we transfer the idea of temporal antialiasing from surface rendering to volume rendering. We show how to reproject volumetric ray samples even though they cannot be pinned to a particular 3D position, present an improved weighting scheme that makes longer history trails possible, and define an error accumulation method that downweights less appropriate older samples. Furthermore, we exploit reprojection information to adaptively determine the number of newly generated path tracing samples for each individual pixel. Our approach is designed for static, medical data with both volumetric and surface-like structures. It achieves good-quality volumetric Monte-Carlo renderings with little noise, and is also usable in a VR context.
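The core accumulation step can be sketched schematically: reprojected history is blended with the current noisy frame, and a per-pixel confidence weight shortens the effective history where reprojection is unreliable. This is an illustrative numpy sketch, not the paper's weighting scheme.

```python
# Schematic temporal accumulation with a per-pixel history length; toy data.
import numpy as np

def temporal_accumulate(history, history_len, current, reproj_confidence, max_history=32):
    # shrink the usable history where the reprojection is deemed unreliable
    eff_len = np.minimum(history_len * reproj_confidence + 1.0, max_history)
    alpha = 1.0 / eff_len                           # blend weight of the new samples
    result = (1.0 - alpha) * history + alpha * current
    return result, eff_len

h, w = 4, 4
history = np.zeros((h, w)); history_len = np.zeros((h, w))
for _ in range(16):                                 # progressive refinement over frames
    noisy_frame = 0.5 + 0.2 * np.random.randn(h, w)
    confidence = np.ones((h, w))                    # 1 = perfect reprojection (toy case)
    history, history_len = temporal_accumulate(history, history_len, noisy_frame, confidence)
```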

17.
In this work, we introduce multi-column graph convolutional networks (MGCNs), a deep generative model for 3D mesh surfaces that effectively learns a non-linear facial representation. We perform spectral decomposition of meshes and apply convolutions directly in the frequency domain. Our network architecture involves multiple columns of graph convolutional networks (GCNs), namely a large GCN (L-GCN), a medium GCN (M-GCN) and a small GCN (S-GCN), with different filter sizes to extract features at different scales. L-GCN is better suited for extracting large-scale features, whereas S-GCN is effective for extracting subtle and fine-grained features, and M-GCN captures information in between. Therefore, to obtain a high-quality representation, we propose a selective fusion method that adaptively integrates these three kinds of information. Spatially non-local relationships are also exploited through a self-attention mechanism to further improve the representation ability in the latent vector space. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction. Moreover, with the help of variational inference, our model has excellent generative ability.
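The basic operation behind spectral mesh convolutions can be sketched as filtering in the graph-Laplacian eigenbasis; the example below uses a fixed low-pass filter on a toy graph, whereas the paper learns filters of several sizes in parallel columns.

```python
# Toy spectral filtering of vertex signals in the graph-Laplacian eigenbasis.
import numpy as np

def spectral_filter(vertex_signal, adjacency, cutoff=0.5):
    deg = np.diag(adjacency.sum(axis=1))
    laplacian = deg - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # mesh "frequencies" and basis
    coeffs = eigvecs.T @ vertex_signal             # forward graph Fourier transform
    gain = np.exp(-cutoff * eigvals)[:, None]      # example low-pass spectral filter
    return eigvecs @ (gain * coeffs)               # back to the vertex domain

A = np.array([[0, 1, 1, 0],                        # toy 4-vertex mesh graph
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
signal = np.random.randn(4, 3)                     # e.g. per-vertex coordinates
smoothed = spectral_filter(signal, A)
```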

18.
We introduce Flexible Live-Wire, a generalization of the Live-Wire interactive segmentation technique with floating anchors. In our approach, the user input for Live-Wire is no longer limited to the setting of pixel-level anchor nodes, but can use more general anchor sets. These sets can be of any dimension, size, or connectedness. The generality of the approach allows the design of a number of user interactions while providing the same functionality as the traditional Live-Wire. In particular, we experiment with this new flexibility by designing four novel Live-Wire interactions based on specific primitives: paint, pinch, probable, and pick anchors. These interactions are only a subset of the possibilities enabled by our generalization. Moreover, we discuss the computational aspects of this approach and provide practical solutions to alleviate any additional overhead. Finally, we illustrate our approach and new interactions through several example segmentations.
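The key generalization, letting the whole anchor set act as the source of the shortest-path computation rather than a single pixel, can be sketched with a multi-source Dijkstra on a grid; the cost image and anchor stroke below are toy placeholders.

```python
# Multi-source Dijkstra on a 4-connected grid as a stand-in for a Live-Wire cost graph.
import heapq
import numpy as np

def livewire_from_anchor_set(cost, anchor_pixels):
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for (y, x) in anchor_pixels:                   # every anchor pixel is a source
        dist[y, x] = 0.0
        heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and d + cost[ny, nx] < dist[ny, nx]:
                dist[ny, nx] = d + cost[ny, nx]
                heapq.heappush(heap, (dist[ny, nx], ny, nx))
    return dist                                    # shortest-path cost to the anchor set

cost = np.random.rand(64, 64) + 0.01               # toy edge-cost image
anchor = [(10, x) for x in range(20, 40)]          # a "paint"-style anchor stroke
dist_map = livewire_from_anchor_set(cost, anchor)
```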

19.
Since indoor scenes change frequently in daily life, for example when furniture is rearranged, their 3D reconstructions should be flexible and easy to update. We present an automatic 3D scene update algorithm for indoor scenes that captures scene variation with RGBD cameras. We assume an initial scene has been reconstructed in advance, manually or in some other semi-automatic way, before the change, and automatically update the reconstruction according to newly captured RGBD images of the real scene after the update. The method starts with an automatic segmentation process without manual interaction, which benefits from accurate training labels derived from the initial 3D scene. After segmentation, objects captured by the RGBD camera are extracted to form a local updated scene. We formulate an optimization problem that compares this local scene to the initial scene to locate moved objects. The moved objects are then integrated with the static objects in the initial scene to generate a new 3D scene. We demonstrate the efficiency and robustness of our approach by updating the 3D scenes of several real-world environments.

20.
As the number of models for 3D indoor scenes increases rapidly, methods for generating lighting layouts have also become increasingly important. This paper presents a novel method that creates optimal placements and intensities for a set of lights in indoor scenes. Our method is characterized by objective functions for the optimization that are designed around the lighting guidelines used in the interior design field. Specifically, to apply major elements of the lighting guidelines, we identify three criteria, namely structure, function, and aesthetics, that are suitable for virtual spaces and quantify them through a set of objective terms: pairwise relation, hierarchy, circulation, illuminance, and collision. Given an indoor scene with properly arranged furniture as input, our method combines procedural and optimization-based approaches to generate lighting layouts appropriate to the geometric and functional characteristics of the input scene. The effectiveness of our method is demonstrated with an ablation study of the cost terms for the optimization and a user study for perceptual evaluation.
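A toy version of such a weighted multi-term objective is sketched below; the individual terms (illuminance target, collision with furniture, pairwise spacing) are deliberately simplified stand-ins rather than the paper's cost functions.

```python
# Toy multi-term lighting cost; all terms and weights are illustrative placeholders.
import numpy as np

def lighting_cost(positions, intensities, furniture, target_level=300.0,
                  weights=(1.0, 10.0, 0.5)):
    w_illum, w_coll, w_pair = weights
    # illuminance term: total intensity should approach a target level
    illum = (intensities.sum() - target_level) ** 2
    # collision term: penalize lights placed inside furniture bounding circles
    d = np.linalg.norm(positions[:, None, :] - furniture[None, :, :], axis=2)
    collision = np.sum(np.maximum(0.5 - d, 0.0) ** 2)
    # pairwise term: discourage lights from clustering too closely
    pd = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    pairwise = np.sum(np.maximum(1.0 - pd[np.triu_indices(len(positions), 1)], 0.0) ** 2)
    return w_illum * illum + w_coll * collision + w_pair * pairwise

lights = np.random.rand(4, 2) * 5.0                # candidate light positions (metres)
intensity = np.full(4, 80.0)
furniture = np.array([[1.0, 1.0], [3.0, 4.0]])     # toy furniture centres
cost = lighting_cost(lights, intensity, furniture)
```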
