Similar Literature
 Found 20 similar documents (search time: 587 ms)
1.
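A CUDA-Based Parallel Accelerated Rendering Algorithm   Total citations: 1 (self: 1, others: 0)
GPUs can process massive amounts of data quickly and efficiently, and have therefore become a research focus in graphics and image data processing in recent years. Existing GPU rendering suffers from low resource utilization and excessive bandwidth consumption when handling scenes that contain many identical or similar models. To address this, a CUDA-based accelerated rendering method is proposed on top of the existing GPU rendering architecture. The method first builds a model of the existing GPU rendering mode and uses it to identify that mode's shortcomings, which motivates the use of constant memory. It then analyzes the characteristics of constant memory and its effect on rendering, and introduces a control method based on constant memory to accelerate rendering, with the whole rendering process controlled by the rendering algorithm. Experimental results show that the method handles the above problems well and achieves accelerated rendering.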

2.
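Rendering Optimization of 3D Model Data via Triangle Strip Derivation   Total citations: 1 (self: 1, others: 0)
The GPU's vertex cache hit rate has a major impact on 3D rendering performance, and the way data is organized in a 3D model is an important factor in that hit rate. A new mesh optimization method is therefore proposed. Starting from the basic situations that cause vertex cache misses, the method builds a Seed Strip in the model mesh and selects vertices that share multiple edges with the Seed Strip to grow a Derived Strip. By repeatedly building Seed Strips and growing Derived Strips, it obtains an optimized triangle sequence for the mesh that effectively raises the GPU vertex cache hit rate and thus rendering performance. The work also leaves room for extension toward solving the overdraw problem in 3D rendering.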
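The effect of triangle order on the vertex cache can be illustrated with a small simulation. The sketch below is not the paper's Seed Strip / Derived Strip method; it only assumes a FIFO cache of 4 entries (a hypothetical size) and compares a strip-like ordering of a row of quads with a scattered ordering of the same triangles, reporting cache misses per triangle.

```python
from collections import deque

def misses_per_triangle(indices, cache_size=4):
    """Simulate a FIFO vertex cache and return misses per triangle."""
    cache = deque(maxlen=cache_size)  # oldest entry is evicted automatically
    misses = 0
    for v in indices:
        if v not in cache:
            misses += 1
            cache.append(v)
    return misses / (len(indices) // 3)

def quad_triangles(i):
    """Two triangles of quad i in a 2 x N vertex grid (illustrative layout)."""
    a, b, c, d = 2 * i, 2 * i + 1, 2 * i + 2, 2 * i + 3
    return [a, b, c, c, b, d]

def index_list(quad_order):
    """Concatenate the triangle indices of the quads in the given order."""
    return [v for q in quad_order for v in quad_triangles(q)]

# Same 16 triangles, two orderings.
strip = index_list(range(8))                        # neighbours are adjacent
scattered = index_list([0, 4, 1, 5, 2, 6, 3, 7])    # neighbours are far apart

strip_rate = misses_per_triangle(strip)
scattered_rate = misses_per_triangle(scattered)
```

In this toy setup the strip-like order fetches every vertex once, while the scattered order evicts vertices before they can be reused, so its miss rate per triangle is higher.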

3.
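Real-Time Grass Waving on the GPU Based on the LBM Model   Total citations: 1 (self: 0, others: 1)
Demand for realistic rendering of complex outdoor scenes keeps growing, and more and more physical models that convey realism are being used in real-time rendering. The LBM (Lattice Boltzmann Model) can simulate complex fluid motion while conserving mass and momentum. Advances in graphics hardware (GPUs) allow the LBM to be implemented on the GPU, which improves the algorithm's efficiency and makes it usable for real-time rendering. To render complex outdoor scenes realistically, a physically based algorithm for real-time rendering of waving grass is proposed. The LBM is used to simulate the wind field, and the grass is modeled with a simple, practical method, achieving real-time rendering of large-scale waving grass with fairly realistic results.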

4.
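Research on Remote Visualization Based on a Distributed Rendering Architecture   Total citations: 1 (self: 0, others: 1)
Growth in Internet bandwidth has given rise to remote visualization, which offers better sharing, mobility, and convenience. For remote visualization of large-scale data, a Sort-Last distributed rendering architecture is proposed, together with GPU-based compositing and antialiasing algorithms. The architecture runs on the server side of remote visualization and has a three-layer structure of rendering nodes, compositing nodes, and task nodes, giving it good scalability. On this basis, a remote visualization system, Waterman, is implemented; it provides Internet-based high-precision terrain rendering and visualization of sewage diffusion from ocean outfalls. Detailed design methods and technical details are given, including a raycasting-based terrain rendering algorithm, a sea-surface rendering technique based on a land mask, and a hybrid client implementation based on images and mesh models. Finally, the performance of the architecture and system is tested and analyzed. The proposed method is practical, robust, and scalable, and can serve as a useful reference for the design of similar systems.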
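In a Sort-Last architecture, each rendering node produces a full-resolution partial image and the partial images are merged afterwards. The abstract does not describe Waterman's compositing algorithm, so the following is only a minimal sketch of the generic depth-compare merge, assuming each node returns a grid of hypothetical `(color, depth)` pairs in which a smaller depth is closer to the viewer.

```python
INF = float("inf")  # depth of a pixel that a node did not cover

def composite(partials):
    """Merge per-node images by keeping the nearest sample at every pixel."""
    out = [list(row) for row in partials[0]]  # copy so inputs stay untouched
    for image in partials[1:]:
        for y, row in enumerate(image):
            for x, (color, depth) in enumerate(row):
                if depth < out[y][x][1]:
                    out[y][x] = (color, depth)
    return out

# Two rendering nodes, each with a 1 x 3 partial image.
node_a = [[("red", 0.2), ("red", 0.9), (None, INF)]]
node_b = [[("blue", 0.5), ("blue", 0.4), ("blue", 0.7)]]

final = composite([node_a, node_b])
```

Because the depth comparison is order-independent for opaque pixels, the nodes can render disjoint parts of the data without sorting geometry first, which is the property that gives this class of architecture its scalability.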

5.
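As the graphics processing unit (GPU) has evolved from being used only for graphics and image rendering into the general-purpose GPU (GPGPU), a parallel computing platform, its computing power has grown steadily. Building on a study of GPGPU architecture, this paper investigates GPGPU thread scheduling for parallel computing in depth, explains the principles of GPU thread scheduling, and reveals the shortcomings of the SIMT scheduling mode. The relationship between system power consumption and system operating frequency is derived through formulas.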

6.
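GPUs with a unified rendering architecture provide rich compute and storage resources for graphics processing, and also place higher demands on software optimization. To support effective performance design and optimization, a quantitative graphics processing performance model is proposed for GPUs implemented with the unified rendering architecture. Based on an in-depth study of the architecture and operating principles of such GPUs, the model analyzes the factors that affect graphics processing: graphics command generation, host interface data transfer, graphics command parsing, graphics pipeline data throughput, and the processing capability of the unified shader array. Simulation-based verification shows that, in developing a GPU with independent intellectual property rights, using this method to set the performance targets of each part and to evaluate the graphics processing performance of the unified-shader GPU yields an error of less than 7.5% compared with measured results.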

7.
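Fast Terrain Texture Rendering Based on GPU Programming   Total citations: 1 (self: 1, others: 0)
Based on an analysis of the characteristics of GPU parallel computing, a fast terrain texture rendering method based on GPU programming is proposed and implemented; its core is to use GPU programming to decompress terrain texture images quickly. Unlike the traditional rendering pipeline, the method first transfers the compressed texture images to the graphics card and then uses GPU programming to hardware-accelerate their decompression, thereby addressing a series of problems including massive texture data storage, transfer bandwidth, and decompression speed. Experimental results show that the method has a clear advantage in rendering speed for virtual scenes, and that this advantage becomes more pronounced as the resolution of the terrain texture images increases.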

8.
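A Calligraphy Writing Simulation System Based on a Particle System   Total citations: 1 (self: 0, others: 1)
Ink-wash style rendering is a research branch of non-photorealistic rendering. A particle system implemented jointly on the CPU and GPU is combined with texture blending to simulate calligraphic handwriting. User interaction is through the mouse: mouse buttons, movement direction, and movement speed control the path of the ink drops, realizing the simulation of calligraphy writing. The system renders in real time, and users can configure the scene and system as needed and adjust the viewing angle and zoom in real time. Experiments show that the computation speed meets the real-time requirements of an interactive system.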

9.
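Liu Ming (刘明), Xu Fei (徐飞), Liu Yu (刘玉). Microcomputer Information (《微计算机信息》), 2008, 24(15): 293-295
By making effective use of the computing power and programmability of the graphics processing unit (GPU), this paper offloads a large amount of computation from the CPU and achieves natural, realistic, and efficient real-time rendering of large-scale waving grass blades. A GPU vertex program computes the motion of the grass blades, and a GPU fragment program computes static shadows. The technique is implemented with OpenGL and Cg, and achieves natural, realistic rendering results with high rendering efficiency.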

10.
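Jiang Baoji (蒋宝吉), Tang Di (唐棣). Computer Science (《计算机科学》), 2013, 40(10): 289-291, 308
One of the biggest problems in stylized line drawing of 3D models[1] is computing visibility. Current algorithms for computing line visibility on complex models are either too slow for interactive rendering or produce incoherent animation. An algorithm is therefore proposed that computes line visibility quickly and with high quality on the GPU; using an advanced programmable graphics pipeline, it supports line visibility and a wide range of stylization options. Experimental results show that the algorithm is fast enough to meet the real-time requirements of an interactive system.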

11.
We present a hybrid ray tracing system, where the work is divided between the CPU cores and the GPU in an integrated chip, and communication occurs via shared memory. Rays are organized in large packets that can be distributed among the two units as needed. Testing visibility between rays and the scene is mostly performed using an optimized kernel on the GPU, but the CPU can help as necessary. The CPU cores typically handle most or all shading, which makes it easy to support complex appearances. For efficiency, the CPU cores shade whole batches of rays by sorting them on material and shading each material using a vectorized kernel. In addition, we introduce a method to support light paths with arbitrary recursion, such as multiple recursive Whitted-style ray tracing and adaptive sampling where the result of a ray is examined before sending the next, while still batching up rays for the benefit of GPU-accelerated traversal and vectorized shading. This allows our system to achieve high rendering performance while maintaining the flexibility to accommodate different rendering algorithms.
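The idea of shading whole batches of rays grouped by material can be sketched as follows. This is not the authors' implementation: the `(ray_id, material)` hit records and the per-material shader callables are hypothetical, and a single Python call per material merely stands in for a vectorized kernel.

```python
from collections import defaultdict

def shade_in_batches(hits, shaders):
    """Group ray hits by material, then shade each group with one call."""
    groups = defaultdict(list)
    for ray_id, material in hits:
        groups[material].append(ray_id)

    results = {}
    for material, ray_ids in groups.items():
        colors = shaders[material](ray_ids)  # one batched call per material
        results.update(zip(ray_ids, colors))
    return results

calls = []  # records the batch handed to each shader, for inspection

def glass(ray_ids):
    calls.append(("glass", list(ray_ids)))
    return ["G"] * len(ray_ids)

def metal(ray_ids):
    calls.append(("metal", list(ray_ids)))
    return ["M"] * len(ray_ids)

hits = [(0, "glass"), (1, "metal"), (2, "glass")]
shaded = shade_in_batches(hits, {"glass": glass, "metal": metal})
```

Grouping first means each material's code runs once over a contiguous batch instead of being interleaved ray by ray, which is what makes vectorized shading practical.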

12.
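To simulate the phenomenon of propagated sensation along meridians (循经感传) in a virtual human meridian system, a GPU-based texture blending technique is proposed. Three texture maps of different colors are sampled, and the final texture is composed in the GPU pixel shader according to a blending formula, achieving a dynamic simulation of propagated sensation along the meridians. Results from the implemented program show that, under different simulation parameters, the system reproduces the process realistically in real time. Multi-texture blending is therefore a reasonable and effective solution for simulating the propagated sensation.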

13.
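The traditional GPU rendering method is forward rendering. In current real-time rendering, a newer pipeline called deferred shading uses the hardware's multiple render target capability to process the pipeline in two stages, completing all visibility tests before per-pixel lighting is computed and reducing the pixel fill rate to a minimum. However, deferred shading requires a large amount of video memory and cannot render translucent objects efficiently. Feasible solutions are proposed for these limitations in applications.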

14.
Several fast global illumination algorithms rely on the Virtual Point Lights framework. This framework separates illumination into two steps: first, propagate radiance in the scene and store it in virtual lights, then gather illumination from these virtual lights. To accelerate the second step, virtual lights and receiving points are grouped hierarchically, for example using Multi-Dimensional Lightcuts. Computing visibility between clusters of virtual lights and receiving points is a bottleneck. Separately, matrix completion algorithms reconstruct completely a low-rank matrix from an incomplete set of sampled elements. In this paper, we use adaptive matrix completion to approximate visibility information after an initial clustering step. We reconstruct visibility information using as little as 10 % to 20 % samples for most scenes, and combine it with shading information computed separately, in parallel on the GPU. Overall, our method computes global illumination 3 or more times faster than previous state-of-the-art methods.

15.
Existing GPU antialiasing techniques, such as MSAA or MLAA, focus on reducing aliasing artifacts along silhouette boundaries or edges in image space. However, they neglect aliasing from shading in case of high-frequency geometric detail. This may lead to a shading aliasing artifact that resembles Bailey's Bead Phenomenon: the degradation of continuous specular highlights to a string of pearls. These types of artifacts are particularly striking for high-quality surfaces. So far, the only way of removing aliasing from shading is by globally supersampling the entire image with a large number of samples. However, globally supersampling the image is slow and significantly increases bandwidth consumption. We propose three adaptive approaches that locally supersample triangles only where necessary on the GPU. Thereby, we efficiently remove artifacts from shading while aliasing along silhouettes is reduced by efficient hardware MSAA.

16.
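In current virtual desktop deployments, the conflict between end users' growing demand for 3D graphics processing capability and the GPU processing capability of virtual machines is becoming increasingly prominent. To address this, typical approaches to GPU virtualization are studied. Based on an analysis of these virtualization techniques, an improved virtualization scheme based on the device-exclusive method and the API remoting method is introduced. The hypervisor creates virtual machines in two modes: one parent virtual machine (GVM) and multiple child virtual machines (DVMs). The GVM has exclusive access to the physical GPU, while the DVMs do not interact with the physical GPU directly. The two kinds of virtual machines share GPU memory and a command channel: GPU calls issued in a DVM are passed to the GVM, which quickly invokes the physical GPU and returns the results to the shared memory space, from which they are presented to the user. Finally, the improved GPU virtualization method is compared with typical virtualization methods, its advantages and disadvantages are summarized, and future research priorities are outlined.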

17.
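By analyzing how GPU Shading Language programming can be applied in MJPEG playback to help improve embedded system performance, this paper argues that the GPU's programmable pipeline should be used judiciously, so that it not only renders 3D images but also plays a greater role in assisting the CPU with general-purpose computing.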

18.
The Lit-Sphere model proposed by Sloan et al. (Proceedings of Graphics Interface 2001, pp. 143–150, 2001) is a method for emulating expressive artistic shading styles for 3D scenes. Assuming that artistic shading styles are described by view space normals, this model produces a variety of stylized shading scenes beyond traditional 3D lighting control. However, it is limited to the static lighting case: the shading effect is only dependent on the camera view. In addition, it cannot support small-scale brush stroke styles. In this paper, we propose a scheme to extend the Lit-Sphere model based on light space normals rather than view space normals. Owing to the light space representation, our shading model addresses the issues of the original Lit-Sphere approach, and allows artists to use a light source to obtain dynamic diffuse and specular shading. Then the shading appearance can be refined using stylization effects including highlight shape control, sub-lighting effects, and brush stroke styles. Our algorithms are easy to implement on GPU, so that our system allows interactive shading design.

19.
Traditional approaches for rendering segmented volumetric data sets usually deliver unsatisfactory results, such as insufficient frame rate, low image quality, and intermixing artifacts. In this paper, we introduce a novel “color encoding” technique, based on graphics processing unit (GPU) accelerated raycasting and post-color attenuated classification, to address this problem. The result is an algorithm that can generate artifact-free dynamic volumetric images in real time. Next, we present a pre-integrated volume shading algorithm to reduce graphics memory requirements and computational cost when compared to traditional shading methods. We also present a normal-adjustment technique to improve image quality at clipped planes. Furthermore, we propose a new algorithm for color and depth texture indexing that permits virtual solid objects, such as surgical tools, to be manipulated within the dynamically rendered volumetric cardiac images in real time. Finally, all these techniques are combined within an environment that permits real-time visualization, enhancement, and manipulation of dynamic cardiac data sets.

20.
GPUs are widely used in modern high-performance computing systems. To reduce the burden on GPU programmers, operating systems and GPU hardware provide strong support for shared virtual memory, which enables the GPU and CPU to share the same virtual address space. Unfortunately, the current SIMT execution model of the GPU brings great challenges for virtual-to-physical address translation on the GPU side, mainly due to the huge number of virtual addresses that are generated simultaneously and the poor locality of these virtual addresses. Thus, the excessive TLB accesses increase the miss ratio of the TLB. As an attractive solution, the Page Walk Cache (PWC) has received wide attention for its capability of reducing the memory accesses caused by TLB misses. However, the current PWC mechanism suffers from heavy redundancies, which significantly limits its efficiency. In this paper, we first investigate the factors leading to this issue by evaluating the performance of PWC with typical GPU benchmarks. We find that the repeated L4 and L3 indices of virtual addresses increase the redundancies in PWC, and the low locality of L2 indices causes the low hit ratio in PWC. Based on these observations, we propose a new PWC structure, namely Compressed Page Walk Cache (CPWC), to resolve the redundancy burden in current PWC. Our CPWC can be organized in either direct-mapped mode or set-associative mode. Experimental results show that CPWC increases by 3 times over TPC in the number of page table entries, increases by 38.3% over PWC in L2 index hit ratio and reduces by 26.9% in the memory accesses of page tables. The average number of memory accesses caused by each TLB miss is reduced to 1.13. Overall, the average IPC can improve by 25.3%.
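The observation about repeated L4 and L3 indices can be made concrete by splitting a virtual address into its page-table indices. The abstract does not state the address layout, so the sketch below assumes the common four-level scheme with 48-bit virtual addresses, 4 KiB pages, and 9 index bits per level; the two sample addresses are arbitrary.

```python
def page_table_indices(vaddr):
    """Split a 48-bit virtual address into four 9-bit indices and an offset."""
    return {
        "L4": (vaddr >> 39) & 0x1FF,
        "L3": (vaddr >> 30) & 0x1FF,
        "L2": (vaddr >> 21) & 0x1FF,
        "L1": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }

# Two addresses 10 MiB apart (5 steps of 2 MiB, the span of one L2 entry).
addr_a = 0x7F1234567000
addr_b = addr_a + (5 << 21)

idx_a = page_table_indices(addr_a)
idx_b = page_table_indices(addr_b)

# Upper-level indices repeat; only the L2 index changes.
shared_upper = (idx_a["L4"], idx_a["L3"]) == (idx_b["L4"], idx_b["L3"])
l2_delta = idx_b["L2"] - idx_a["L2"]
```

A cache that stores the full L4 and L3 indices with every entry would hold the same upper bits many times over for addresses like these, which is the kind of redundancy the abstract says CPWC is designed to remove.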
