Similar Literature
20 similar documents found
1.
We present a deep learning based technique that enables novel-view videos of human performances to be synthesized from sparse multi-view captures. While performance capture from a sparse set of videos has received significant attention, relatively little progress has been made on non-rigid subjects such as human bodies. The rich articulation of the human body makes it challenging to synthesize and interpolate the model well. To address this problem, we propose a novel deep learning based framework that directly predicts novel-view videos of human performances without explicit 3D reconstruction. Our method consists of two steps: novel-view prediction and detail enhancement. We first learn a novel deep generative query network for view prediction, synthesizing novel-view performances from a sparse set of just five or fewer camera videos. Then, we use a new generative adversarial network to enhance fine-scale details of the first step's results. This opens up the possibility of high-quality, low-cost video-based performance synthesis, which is gaining popularity in VR and AR applications. We demonstrate a variety of promising results, where our method synthesizes more robust and accurate performances than existing state-of-the-art approaches when only sparse views are available.

2.
Image style transfer can automatically render an image in different styles. Most existing work stylizes either the whole image or a single region of it, which limits practical applications. We introduce the semantic information of the content image into the style transfer process and propose a method for applying different styles to different regions of an image. The content image is semantically segmented and then fed into a VGG loss network, thereby restricting the regions to be stylized. A separate Gram matrix is computed on each region, and during back-propagation the gradients are confined to the corresponding semantic regions, yielding per-region style features. A regularization loss is added to the loss function to weaken the mutual influence between regions. Experiments on the Microsoft COCO 2017 dataset show that the method stylizes multiple regions of an image differently while keeping the transitions between regions natural.
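A minimal PyTorch sketch of the per-region Gram-matrix loss described above, assuming one style feature map per semantic region; function and variable names are illustrative assumptions, not the paper's code:

```python
import torch

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map, normalized by element count."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)

def region_style_loss(content_feat, style_feats, region_masks):
    """Per-region style loss: a separate Gram matrix inside each semantic region.

    content_feat: (C, H, W) VGG feature map of the image being optimized.
    style_feats:  list of (C, H, W) style feature maps, one per region (assumed).
    region_masks: list of (H, W) binary masks; multiplying by the mask before
    the Gram matrix confines the gradient (and hence the stylization) to
    that semantic region.
    """
    loss = content_feat.new_zeros(())
    for s_feat, mask in zip(style_feats, region_masks):
        m = mask.unsqueeze(0)                       # (1, H, W), broadcast over C
        loss = loss + torch.mean((gram_matrix(content_feat * m)
                                  - gram_matrix(s_feat * m)) ** 2)
    return loss
```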

3.
Image style transfer is an image processing technique that renders the semantic content of an image in different styles. With the rise of deep learning, image style transfer has advanced further and produced a series of breakthrough results, and its impressive stylization ability has attracted wide attention from both academia and industry, giving it significant research value. To advance research on deep-learning-based image style transfer, this paper reviews and discusses the main methods and representative works. We first review non-parametric image style transfer, then describe in detail the basic principles and main methods of current deep-learning-based image style transfer, analyze its application prospects in related fields, and finally summarize open problems and future research directions.

4.
In this work, we introduce multi-column graph convolutional networks (MGCNs), a deep generative model for 3D mesh surfaces that effectively learns a non-linear facial representation. We perform spectral decomposition of meshes and apply convolutions directly in the frequency domain. Our network architecture involves multiple columns of graph convolutional networks (GCNs), namely a large GCN (L-GCN), a medium GCN (M-GCN) and a small GCN (S-GCN), with different filter sizes to extract features at different scales. L-GCN is better suited to extracting large-scale features, whereas S-GCN is effective for extracting subtle and fine-grained features, and M-GCN captures information in between. To obtain a high-quality representation, we propose a selective fusion method that adaptively integrates these three kinds of information. Spatially non-local relationships are also exploited through a self-attention mechanism to further improve the representation ability in the latent vector space. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction. Moreover, with the help of variational inference, our model has excellent generative ability.
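The selective fusion step might look like the following PyTorch sketch, which gates the three column outputs with learned softmax weights; the gating network's layout is an assumption, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    """Adaptively fuses features from three GCN columns (L/M/S scales).

    A small gating network predicts per-channel softmax weights over the three
    columns, so the fused representation can emphasize coarse or fine-scale
    information per dimension. Illustrative stand-in; sizes are assumptions.
    """
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 3 * dim))
        self.dim = dim

    def forward(self, f_large, f_medium, f_small):               # each (B, dim)
        stacked = torch.stack([f_large, f_medium, f_small], dim=1)  # (B, 3, dim)
        w = self.gate(stacked.flatten(1)).view(-1, 3, self.dim)
        w = torch.softmax(w, dim=1)          # weights sum to 1 across columns
        return (w * stacked).sum(dim=1)      # fused representation, (B, dim)
```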

5.
To address the imbalance between computational cost and accuracy in semantic segmentation models that use attention mechanisms, we propose a lightweight attention-enhancement algorithm for semantic segmentation. First, motivated by the shapes of objects in driving scenes, we design a strip-wise, dimension-separated attention mechanism: strip pooling replaces conventional square convolutions and, combined with dimensionality reduction, extracts long-range semantic dependencies along each dimension, cutting the model's computation. We then fuse attention in the channel and spatial domains into a lightweight multi-dimensional attention fusion module that can be stacked or taken apart, extracting feature information comprehensively and further improving accuracy. Finally, the module is inserted into an encoder-decoder network with a ResNet-101 backbone to guide the fusion of high- and low-level semantics, correct edge information in the feature maps, and supplement prediction details. Experiments show that the module is robust and generalizes well: compared with attention mechanisms of the same type, it cuts roughly 90% of the parameters and 80% of the computation while still delivering a steady improvement in segmentation accuracy.
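A hedged PyTorch sketch of the strip-pooling idea: pooling along each axis separately captures long-range context far more cheaply than square convolutions, and the result gates the input. The exact block layout and the dimension-reduction details of the paper may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripAttention(nn.Module):
    """Strip-pooling attention: pools along H and W separately to capture
    long-range context in each direction at low cost, then gates the input.
    A sketch of the idea only; not the paper's exact module."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        col = F.adaptive_avg_pool2d(x, (h, 1))   # vertical strip:   (B, C, H, 1)
        row = F.adaptive_avg_pool2d(x, (1, w))   # horizontal strip: (B, C, 1, W)
        col = self.conv_h(col).expand(-1, -1, h, w)
        row = self.conv_w(row).expand(-1, -1, h, w)
        attn = torch.sigmoid(self.fuse(col + row))
        return x * attn                          # gated features
```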

6.
Deep-learning-based dehazing methods have made great progress, but problems such as incomplete haze removal and color distortion remain. To address them, this paper proposes a single-image dehazing network that fuses content features with style features. The network consists of three sub-networks for feature extraction, feature fusion, and image restoration. The feature extraction network contains a content-feature module and a style-feature module, which learn the image content and the image style respectively, so that dehazing preserves the color characteristics of the original image. In the feature fusion sub-network, an attention mechanism applies channel weighting to the feature maps produced by the content-feature module so the network focuses on the image's principal features, and the weighted content feature maps are fused with the style feature maps through convolution. Finally, the image restoration module applies a nonlinear mapping to the fused feature maps to obtain the dehazed image. Compared with existing methods, the proposed network produces good dehazing results on both synthetic and real images while effectively avoiding color distortion after dehazing.

7.
To address feature loss and the effective fusion of two modalities in semantic segmentation of complex indoor scenes, we propose a lightweight semantic segmentation network with attention, built on an encoder-decoder architecture. Two residual networks serve as backbones to extract features from the RGB and depth images respectively, and a polarized self-attention mechanism is introduced in the encoder. A dual-modality fusion module is then designed to fuse RGB and depth features effectively at different stages, and a parallel aggregation pyramid pooling module captures dependencies between regions. Finally, three decoders of different sizes fuse the multi-scale feature maps via skip connections during decoding, so the segmentation results contain more fine texture. The proposed network is trained and tested on the NYUDv2 dataset and compared with several state-of-the-art RGB-D semantic segmentation networks; the experiments show that it achieves good segmentation performance.

8.
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects in the scene. Our algorithm is built on top of a volumetric depth fusion framework and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. VSF stores for each grid cell the score of the corresponding view, which measures how much it reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on the VSF, we select the next best view (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs by maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.
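In NumPy, the NBV selection over a discrete viewing score field reduces to an argmax, and the path objective to a sum of per-view scores; this toy sketch omits the joint path/trajectory optimization the paper performs:

```python
import numpy as np

def next_best_view(vsf):
    """Return (x, y, azimuth) indices of the highest-scoring view in a discrete
    viewing score field parameterized over 2D location and azimuth bins."""
    return np.unravel_index(np.argmax(vsf), vsf.shape)

def path_score(vsf, path):
    """Integral viewing score (information gain) accumulated along a candidate
    path of (x, y, azimuth) index triples between two NBVs."""
    return sum(vsf[p] for p in path)

# Toy example: a 20x20 grid of locations with 8 azimuth bins.
vsf = np.random.rand(20, 20, 8)
print(next_best_view(vsf))
```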

9.
Sketching is a simple and natural way for humans to express and communicate. For this reason, it is gaining popularity in human-computer interaction with the emergence of multitouch tablets and styluses. In recent years, sketch-based interactive methods have been widely used in many retrieval systems; in particular, a variety of sketch-based 3D model retrieval works have been presented. However, almost all of these works focus on directly matching sketches with the projection views of 3D models, and they suffer from the large differences between sketch drawings and the views of 3D models, leading to unsatisfying retrieval results. Therefore, in this paper, during the matching procedure of the retrieval, we propose to match the query sketch against sketches of each 3D model collected from historical users instead of against projection views. Since the sketches of the current user and those of historical users can still differ substantially, we also aim to handle users' personalized deviations and differences. To this end, we leverage recommendation algorithms to estimate the similarity in drawing-style characteristics between the current user and historical users. Experimental results on the Large Scale Sketch Track Benchmark (SHREC14LSSTB) demonstrate that our method outperforms several state-of-the-art methods.

10.
Generative adversarial networks have developed rapidly in recent years, and combining semantic region segmentation with generative models offers a new direction for image synthesis research. In current work, semantic information serves as the condition guiding generation: editing and controlling the input semantic segmentation mask can produce an image in a desired style. This paper proposes an image generation framework with per-region style constraints, using a conditional GAN to control the style of each region adaptively. Specifically, the semantic segmentation map of an image is obtained first, and a style encoder extracts the style information of each semantic region. On the generation side, the style information and semantic mask are affine-transformed into two sets of modulation parameters for each residual block of the generator. The semantic feature map fed into the generator is then weighted by each residual block's modulation parameters and, through convolution and upsampling, progressively generates the target stylized content, effectively combining semantic and style information. To address the difficulty existing models have in precisely controlling the style of each semantic region, a new style-constraint loss is designed that constrains style variation at the semantic level and reduces the interference between the style codes of different regions. In addition, without affecting performance, weight quantization compresses the generator's parameter storage to 15.6% of its original size, substantially reducing the model's storage footprint. Experiments show that the generation quality of the proposed model improves markedly over existing methods in both subjective perception and objective metrics, with an FID score about 3.8% better than the current best model.
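A PyTorch sketch of the region-wise affine modulation described above: per-region style codes are mapped to scale/shift parameters and spread over pixels by the semantic masks. Layer sizes and names are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class RegionModulation(nn.Module):
    """Turns per-region style codes into affine modulation (scale, shift) for a
    generator block, weighted by the semantic masks so that each region
    receives its own style. Illustrative sketch of the mechanism only."""
    def __init__(self, style_dim, channels):
        super().__init__()
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, feat, masks, style_codes):
        # feat: (B, C, H, W); masks: (B, R, H, W) one-hot floats;
        # style_codes: (B, R, style_dim), one code per semantic region.
        gamma = self.to_gamma(style_codes)                 # (B, R, C)
        beta = self.to_beta(style_codes)                   # (B, R, C)
        # Spread per-region parameters over pixels via the masks.
        g = torch.einsum('brhw,brc->bchw', masks, gamma)
        b = torch.einsum('brhw,brc->bchw', masks, beta)
        return feat * (1 + g) + b                          # modulated features
```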

11.
In this paper, we propose a unified neural network for panoptic segmentation, a task aiming at more fine-grained segmentation. Following existing methods that combine semantic and instance segmentation, our method relies on a triple-branch neural network to tackle the unified task. In the first stage, we adopt a ResNet50 with a feature pyramid network (FPN) as a shared backbone to extract features. Each branch then leverages the shared feature maps and serves as the stuff, things, or mask branch. Lastly, the outputs are fused following a well-designed strategy. Extensive experimental results on the MS-COCO dataset demonstrate that our approach achieves a Panoptic Quality (PQ) score competitive with the state of the art.

12.
于明, 李学博, 郭迎春. 《控制与决策》, 2022, 37(7): 1721-1728
Domain-generalized person re-identification trains on source datasets and tests on target datasets, and therefore has broader practical applicability. Existing domain generalization models often focus on handling illumination and color variation while underusing detail information, which lowers recognition accuracy. To solve this, we propose a domain-generalized person re-identification model that incorporates attention mechanisms. The model first extracts multi-scale features covering different receptive fields through a bottleneck-layer design of stacked convolutional layers, and a feature-fusion attention module dynamically fuses the multi-scale features with learned weights. A multi-level attention module then mines the semantic information of the detail features. Finally, the detail features, rich in semantic information, are fed to a discriminator for re-identification. In addition, a style normalization module is designed to reduce the impact of brightness variation across datasets on the model's generalization ability. Comparison and ablation experiments on the Market-1501 and DukeMTMC-reID datasets demonstrate the effectiveness of the proposed method.

13.
As the deformation behaviors of hair strands vary greatly depending on the hairstyle, the computational cost and accuracy of hair movement simulations can be significantly improved by applying simulation methods specific to a certain style. This paper makes two contributions with regard to the simulation of various hairstyles. First, we propose a novel method to reconstruct simulatable hair strands from hair meshes created by artists. Manually created hair meshes consist of numerous mesh patches, and the strand reconstruction process is challenged by the absence of connectivity information among the patches of the same strand and the omission of hidden parts of strands due to the manual creation process. To this end, we develop a two-stage spectral clustering method for estimating the degree of connectivity among patches and a strand-growing method that preserves hairstyles. Next, we develop a hairstyle classification method for style-specific simulations. In particular, we propose a set of features for efficient classification and show that classifiers trained with the proposed features have higher accuracy than those trained with naive features. Our method applies efficient simulation methods according to the hairstyle without specific user input, and thus is favorable for real-time simulation.
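The patch-grouping stage could be prototyped with off-the-shelf spectral clustering, as in this scikit-learn sketch; the paper's actual method is a two-stage scheme with a style-preserving strand-growing step, so this only illustrates the core idea:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_patches(affinity, n_strands):
    """Group hair mesh patches into strands from a patch-to-patch affinity
    matrix (e.g., built from proximity and direction continuity between patch
    ends -- an assumed affinity, not the paper's). One clustering stage only."""
    sc = SpectralClustering(n_clusters=n_strands, affinity='precomputed')
    return sc.fit_predict(affinity)   # cluster label per patch

# Toy example: 100 patches with a symmetric random affinity in [0, 1].
a = np.random.rand(100, 100)
affinity = (a + a.T) / 2
labels = cluster_patches(affinity, n_strands=10)
```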

14.
Despite the recent impressive development of deep neural networks, using deep learning based methods to generate large-scale Chinese fonts is still a rather challenging task due to the huge number of intricate Chinese glyphs; e.g., the official standard Chinese charset GB18030-2000 consists of 27,533 Chinese characters. Until now, most existing models for this task have adopted Convolutional Neural Networks (CNNs) to generate bitmap images of Chinese characters, owing to CNN based models' remarkable success in various applications. However, CNN based models focus more on image-level features while usually ignoring stroke-order information when writing characters. Instead, we treat Chinese characters as sequences of points (i.e., writing trajectories) and propose to handle this task via an effective Recurrent Neural Network (RNN) model with a monotonic attention mechanism, which can learn from as few as hundreds of training samples and then synthesize glyphs for the remaining thousands of characters in the same style. Experimental results show that our proposed FontRNN can be used for synthesizing large-scale Chinese fonts as well as generating realistic Chinese handwriting efficiently.
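A minimal PyTorch sketch of the underlying idea of modeling characters as writing trajectories; the real FontRNN adds a monotonic attention mechanism over a reference sequence, which is omitted here:

```python
import torch
import torch.nn as nn

class TrajectoryRNN(nn.Module):
    """Treats a character as a sequence of pen points (dx, dy, pen_state) and
    models it with an LSTM, in the spirit of the FontRNN idea above; a toy
    stand-in, not the paper's architecture."""
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)   # predict the next (dx, dy, pen_state)

    def forward(self, points):             # points: (B, T, 3)
        out, _ = self.rnn(points)
        return self.head(out)              # next-point predictions, (B, T, 3)
```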

15.
We propose a novel method that automatically analyzes stroke-related artistic styles of paintings. A set of adaptive interfaces are also developed to connect the style analysis with existing painterly rendering systems, so that the specific artistic style of a template painting can be effectively transferred to the input photo with minimal effort. Different from conventional texture-synthesis based rendering techniques that focus mainly on texture features, this work extracts, analyzes and simulates high-level style features expressed by artists' brush stroke techniques. Through experiments, user studies and comparisons with ground truth, we demonstrate that the proposed style-oriented painting framework can significantly reduce tedious parameter adjustment, and it allows amateur users to efficiently create desired artistic styles simply by specifying a template painting.

16.
Paper pop-ups are interesting three-dimensional books that fascinate people of all ages. The design and construction of these pop-up books, however, are done manually and require a lot of time and effort. This has led to computer-assisted or automated tools for designing paper pop-ups. This paper proposes an approach for automatically converting a 3D model into a multi-style paper pop-up. Previous automated approaches have only focused on single-style pop-ups, where each is made of a single type of pop-up mechanism. In our work, we combine multiple styles in a pop-up, which is more representative of actual artists' creations. Our method abstracts a 3D model using suitable primitive shapes that both facilitate the formation of the considered pop-up mechanisms and closely approximate the input model. Each shape is then abstracted using a set of 2D patches that combine to form a valid pop-up. We define geometric conditions that ensure the validity of the combined pop-up structures. In addition, our method employs an image-based approach for producing the patches to preserve the textures, finer details and important contours of the input model. Finally, our system produces a printable design layout and decides an assembly order for the construction instructions. The feasibility of our results is verified by constructing the actual paper pop-ups from the designs generated by our system.

17.
Video super-resolution (VSR) aims to use information from multiple adjacent frames to generate a high-resolution version of a reference frame. Much existing VSR work concentrates on aligning adjacent frames effectively so their information can be fused better, while the fusion step itself, though equally important, has received little study. To address this, we propose a VSR model with a grouped feedback fusion mechanism (GFFMVSR). Specifically, after adjacent frames are aligned, the aligned video sequence passes through a first temporal attention module; the sequence is then split into several groups, and each group is initially fused by an intra-group fusion module. The fusion results of the different groups pass through a second temporal attention module and then enter a feedback fusion module group by group, which uses a feedback mechanism to fuse information across groups before the result is reconstructed into the output. Experiments verify that the model has strong information-fusion ability and outperforms existing models in both objective metrics and subjective visual quality.

18.
Eulerian smoke simulations are sensitive to the initial parameters and grid resolutions. Due to the numerical dissipation on different levels of the grid and the nonlinearity of the governing equations, simulations run at different resolutions produce different results. This makes it challenging for artists to preview the final animation based on low-resolution simulations. In this paper, we propose a learning-based flow correction method for fast previewing based on low-resolution smoke simulations. The main components of our approach are a deep convolutional neural network, a grid-layer feature vector and a special loss function. We provide a novel matching model to represent the relationship between low-resolution and high-resolution smoke simulations, and correct the overall shape of a low-resolution simulation to closely follow the shape of a down-sampled high-resolution version. We introduce the grid-layer concept to effectively represent the 3D fluid shape, which also reduces the input and output dimensions. We design a special loss function for the fluid divergence-free constraint in the neural network training process. We demonstrate the efficacy and generality of our approach by simulating a diversity of animations deviating from the original training set. In addition, we have integrated our approach into an existing fluid simulation framework to showcase its wide applicability.
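The divergence-free constraint can be expressed as a soft penalty on the predicted velocity field; the following PyTorch sketch uses central finite differences on interior cells, though the paper's exact discretization is not specified in the abstract:

```python
import torch

def divergence_loss(vel):
    """Penalizes non-zero divergence of a predicted 3D velocity field so the
    corrected flow stays (approximately) incompressible.

    vel: (B, 3, D, H, W) with channels (u, v, w) along the (x, y, z) = (W, H, D)
    axes (an assumed layout). Central differences over interior cells; grid
    spacing is taken as 1."""
    du_dx = vel[:, 0, 1:-1, 1:-1, 2:] - vel[:, 0, 1:-1, 1:-1, :-2]
    dv_dy = vel[:, 1, 1:-1, 2:, 1:-1] - vel[:, 1, 1:-1, :-2, 1:-1]
    dw_dz = vel[:, 2, 2:, 1:-1, 1:-1] - vel[:, 2, :-2, 1:-1, 1:-1]
    div = (du_dx + dv_dy + dw_dz) / 2.0
    return torch.mean(div ** 2)
```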

19.
Color scribbling is a unique form of illustration where artists use compact, overlapping, and monochromatic scribbles at microscopic scale to create astonishing colorful images at macroscopic scale. The creation process is skill-demanding and time-consuming: it typically involves delicately drawing monochromatic scribbles layer by layer to depict true-color subjects using a limited color palette. In this work, we present a novel computational framework for the automatic generation of color scribble images from arbitrary raster images. The core contribution of our work lies in a novel color dithering model tailor-made for synthesizing a smooth color appearance using multiple layers of overlapping monochromatic strokes. Specifically, our system reconstructs the appearance of the input image by (i) generating layers of monochromatic scribbles based on a limited color palette derived from the input image, and (ii) optimizing the drawing sequence among layers to minimize both the visual color dissimilarity between the dithered image and the original image and the color banding artifacts. We demonstrate the effectiveness and robustness of our algorithm with various convincing results synthesized from a variety of input images with different stroke patterns. The experimental study further shows that our approach faithfully captures the scribble style and the color presentation at microscopic and macroscopic scales, respectively, which is otherwise difficult for state-of-the-art methods.
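Step (i)'s limited palette could be derived by clustering the input image's pixels, as in this scikit-learn sketch; k-means here is an illustrative stand-in, not necessarily the paper's palette-extraction method:

```python
import numpy as np
from sklearn.cluster import KMeans

def derive_palette(image, n_colors=6):
    """Derive a limited scribble palette from the input image by clustering
    its pixels in RGB space. image: (H, W, 3) uint8 array."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    return km.cluster_centers_          # (n_colors, 3) palette entries

# Toy example: a random 64x64 RGB image.
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
palette = derive_palette(img)
```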

20.
Power saving is a prevailing concern in desktop computers and, especially, in battery-powered devices such as mobile phones. This is generating a growing demand for power-aware graphics applications that can extend battery life while preserving good quality. In this paper, we address this issue by presenting a real-time power-efficient rendering framework, able to dynamically select the rendering configuration with the best quality within a given power budget. Different from the current state of the art, our method requires neither precomputation over the whole camera-view space nor Pareto curves to explore the vast power-error space; as such, it can also handle dynamic scenes. Our algorithm is based on two key components: a novel power prediction model and a runtime quality error estimation mechanism. These components allow us to search for the optimal rendering configuration at runtime while remaining transparent to the user. We demonstrate the performance of our framework on two different platforms: a desktop computer and a mobile device. In both cases, we produce results close to the maximum quality while achieving significant power savings.
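The runtime selection loop implied by the abstract is simple to sketch in Python; `predict_power` and `predict_error` below stand in for the paper's power prediction model and quality error estimator, and the toy configurations are assumptions:

```python
def select_config(configs, predict_power, predict_error, budget):
    """Pick the rendering configuration with the lowest predicted quality error
    whose predicted power draw fits the budget -- a sketch of the runtime
    selection described above, not the authors' implementation."""
    feasible = [c for c in configs if predict_power(c) <= budget]
    if not feasible:                       # fall back to the cheapest config
        return min(configs, key=predict_power)
    return min(feasible, key=predict_error)

# Toy example: configs as (resolution_scale, shading_level) pairs.
configs = [(r, s) for r in (0.5, 0.75, 1.0) for s in (1, 2, 3)]
power = lambda c: 2.0 * c[0] + 1.5 * c[1]          # watts (toy model)
error = lambda c: (1.0 - c[0]) + (3 - c[1]) * 0.1  # lower is better (toy model)
best = select_config(configs, power, error, budget=5.0)
```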
