1.

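Liu Xiuling, Wang Dongyu, Chen Dong, Liu Jing, Wang Hongrui 《Computer Engineering and Design》2012,33(5):1847-1851

To ensure real-time performance and accuracy of collision detection in large-scale complex scenes, a fast collision detection method is proposed that combines graphics space with an improved image-space approach and is accelerated by the GPU. An AABB bounding-box test quickly culls non-intersecting objects and identifies potentially colliding objects. The traditional image-space collision detection algorithm is improved by designing a collision detection algorithm based on projection onto a specified plane, stencil testing, and depth testing. On this basis, the parallel computing capability of the GPU is used to accelerate the whole detection process, which effectively reduces collision detection time. Application in a virtual driving system verifies the real-time performance and accuracy of the method for collision detection in large-scale complex scenes.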
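The AABB culling step mentioned in this abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' implementation: the box layout `(min_xyz, max_xyz)` and the function names `aabb_overlap` and `potential_pairs` are assumptions made here for illustration, and the all-pairs loop stands in for whatever broad-phase structure the paper actually uses.

```python
from itertools import combinations


def aabb_overlap(a, b):
    """Return True if two axis-aligned boxes, each (min_xyz, max_xyz), overlap."""
    (amin, amax), (bmin, bmax) = a, b
    # Boxes overlap only if their intervals overlap on all three axes.
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))


def potential_pairs(boxes):
    """Broad phase: keep only the index pairs whose boxes overlap."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if aabb_overlap(boxes[i], boxes[j])
    ]


# Hypothetical scene: boxes 0 and 1 overlap, box 2 is far away.
boxes = [
    ((0, 0, 0), (1, 1, 1)),
    ((0.5, 0.5, 0.5), (2, 2, 2)),
    ((5, 5, 5), (6, 6, 6)),
]
print(potential_pairs(boxes))
```

Only the surviving pairs would be handed to a more expensive exact test, such as the image-space stage the abstract describes.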

2.

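Stream-based real-time collision detection algorithm (total citations: 21; self-citations: 0; citations by others: 21)

Fan Zhaowei, Wan Huagen, Gao Shuming 《Journal of Software》2004,15(10):1505-1514

Real-time collision detection is an indispensable problem in computer graphics applications, and real-time collision detection between complex objects has still not been solved satisfactorily. The emergence of high-performance programmable graphics hardware is changing the traditional view that general-purpose computation can be done only by the CPU. This paper explores the use of programmable graphics hardware to solve real-time collision detection between complex objects. The collision detection computation between two arbitrary objects is mapped onto graphics hardware to exploit its parallel architecture, and collision results are produced quickly by the real-time rendering process. To this end, the algorithm first converts the collision detection problem into an intersection problem between a set of line segments and triangles, so that the algorithm can be migrated to programmable graphics hardware. Based on an analysis of the algorithm's complexity, two effective optimization techniques are given to improve its efficiency. Experimental results show that, compared with existing image-space collision detection algorithms, the algorithm has clear advantages in efficiency, accuracy, and practicality.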
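The reduction to segment-triangle intersection can be sketched on the CPU as follows. This is only an illustration of the underlying geometric test, using the well-known Möller-Trumbore formulation restricted to a finite segment; the abstract does not say which intersection test the paper uses, and the name `segment_hits_triangle` is hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def segment_hits_triangle(p, q, v0, v1, v2, eps=1e-9):
    """Return True if segment p->q crosses triangle (v0, v1, v2)."""
    d = sub(q, p)                      # segment direction, t in [0, 1]
    e1, e2 = sub(v1, v0), sub(v2, v0)  # triangle edges from v0
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:                 # segment parallel to the plane: treated as a miss
        return False
    inv = 1.0 / det
    s = sub(p, v0)
    u = dot(s, h) * inv                # first barycentric coordinate
    if u < 0 or u > 1:
        return False
    qv = cross(s, e1)
    v = dot(d, qv) * inv               # second barycentric coordinate
    if v < 0 or u + v > 1:
        return False
    t = dot(e2, qv) * inv              # position along the segment
    return 0 <= t <= 1


tri = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(segment_hits_triangle((0.2, 0.2, -1), (0.2, 0.2, 1), *tri))
```

In a GPU version, each segment-triangle pair would be an independent work item, which is what makes the problem a natural fit for a streaming architecture.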

3.

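Design of an interaction model between fuzzy objects based on particle systems

Wang Zhao, Li Guangyao 《Journal of Computer Applications》2010,30(Z1)

Considering the principles of inter-particle collision and the real-time computation requirements of collision detection, two model ideas for interaction between particles of fuzzy objects are proposed. By introducing a pixel-projection linked list, an approximate optimization algorithm is designed in which active particles detect collisions and passive particles maintain the linked list. Without losing perceptual realism, the algorithm effectively reduces the time complexity of the system, which becomes linear in the number of passive particles. A prototype system for the interaction of spray water droplets and flame particles that meets animation frame-rate requirements is implemented, and the idea is then extended to other particle-system-based models of interaction between particles of fuzzy objects.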

4.

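Applied research on general-purpose computing techniques for graphics hardware (total citations: 2; self-citations: 0; citations by others: 2)

Zhang Yang, Zhu Changqian, He Taijun 《Journal of Computer Applications》2005,25(9):2192-2195

In research on graphics-hardware acceleration of general-purpose computing, the computation model under the OpenGL framework is summarized. The performance of this computation structure is tested experimentally, and several methods for improving computational performance are analyzed. On this basis, a GPU-based parallel method for computing the two-dimensional discrete cosine transform (DCT) is introduced. In a single rendering pass on the GPU, the method performs the DCT on 8×8 pixel blocks for one to four color channels of an image simultaneously. Experiments show that, on the experimental hardware, the GPU-accelerated parallel DCT is hundreds of times faster than a CPU implementation of the same algorithm.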
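For readers unfamiliar with the 8×8 block transform referred to here, the following is a minimal pure-Python sketch of the separable two-dimensional DCT-II on one block, computed as C·X·Cᵀ with an orthonormal basis matrix C. It is a CPU reference under stated assumptions, not the paper's single-pass GPU method; the orthonormal scaling and the helper names (`dct_matrix`, `dct2_block`) are choices made here for illustration.

```python
import math

N = 8  # block size used in the abstract


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that the 2D transform is C X C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
        )
    return rows


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2_block(block):
    """2D DCT of one square block: transform columns, then rows."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A flat block carries all of its energy in the DC coefficient.
flat = [[10.0] * N for _ in range(N)]
coeffs = dct2_block(flat)
print(round(coeffs[0][0], 6))
```

Because every 8×8 block (and every color channel) is transformed independently, the blocks can be processed in parallel, which is the property a GPU implementation exploits.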

5.

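Graphics-hardware-accelerated continuous collision detection for deformable objects (total citations: 1; self-citations: 0; citations by others: 1)

Tang Min, Lin Jiang, Tong Ruofeng 《Chinese Journal of Computers》2010,33(10)

A graphics-hardware-accelerated continuous collision detection algorithm for deformable objects is presented that detects, in real time, all inter-object collisions and self-collisions in scenes of complex deformable objects. The algorithm decomposes the collision detection process for deformable objects into streams and maps it onto graphics hardware for parallel execution. It also uses a parallel stream registration algorithm to implement variable-length data structures efficiently on graphics hardware. The algorithm has been implemented in OpenCL on an AMD Radeon HD 5870. Tested on a set of distinctive deformable-object simulation scenes, it achieves a speedup of 9.2 to 11.4 times over an optimized single-threaded implementation on a CPU (Intel Q6600 @ 2.4 GHz).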

6.

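A GPU-based collision detection algorithm

Su Nuo, Ji Guishu, Deng Tuo 《Computer Systems & Applications》2009,18(9):65-68

Real-time collision detection is an indispensable component of computer graphics applications. With the development of high-performance programmable graphics processing units (GPUs), many methods have emerged that use the GPU to solve collision detection between complex objects. A GPU-based collision detection method for parameterized surfaces is proposed. Using parameterized surfaces represented as geometry images, a GPU-optimized bounding volume hierarchy is generated in real time, and an optimized GPU-based hierarchical collision detection algorithm is then implemented on top of this hierarchy. Results show that the method effectively increases collision detection speed: compared with the same hierarchy traversal implemented on the CPU, the GPU-based method increases collision detection speed by about 13% on average.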

7.

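gAC: a high-performance GPU-based AC algorithm

Chen Hu, Peng Jiangfeng, Shi Shaohuai 《Computer Engineering and Applications》2012,48(12):43-48

String matching is one of the most widely studied problems in computer science and has become a core operation in fields such as information retrieval and computational biology. However, limited by CPU computing power and memory access bandwidth, traditional serial string matching algorithms are difficult to speed up further. GPUs offer much higher computing power and memory access bandwidth and have achieved outstanding results in many applications. gAC, a GPU-based parallel AC algorithm, targets the SIMT (Single-Instruction Multiple-Thread) execution and coalesced memory access of the GPU, adopting optimizations such as reducing conditional branches and coalescing accesses to global memory. It reaches a string scanning speed of 51 Gb/s on a C1060 GPU, 28 times faster than the CPU-based serial algorithm.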

8.

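Techniques, state of the art, and challenges of graphics processors for general-purpose computing (total citations: 72; self-citations: 4; citations by others: 72)

Wu Enhua 《Journal of Software》2004,15(10):1493-1504

For many years, computer graphics processors (GPUs) have developed at a pace far exceeding Moore's law. This development has greatly improved the speed and quality of computer graphics processing and has driven rapid growth in application areas related to computer graphics. At the same time, the high speed and parallelism of the GPU rendering pipeline, together with the programmability developed in recent years, provide a good platform for general-purpose computation beyond graphics, which has made GPU-based general-purpose computing a research hotspot over the past two or three years. Starting with the history of GPUs and the basic structure of modern GPUs, this paper explains the technical principles of using GPUs for general-purpose computing, along with its main application areas and latest developments. It describes in detail the applications and techniques of GPUs in fluid simulation and algebraic computation, database applications, spectral analysis, and other areas, including research work on fluid simulation. Software tools for GPU applications and their latest developments are also introduced in some detail. Finally, the prospects of GPUs for general-purpose computing are discussed, and the challenges facing the field are analyzed from both the hardware and the software perspective.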

9.

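Modern parallel optimization algorithms based on the GPU (total citations: 2; self-citations: 2; citations by others: 0)

Zhang Qingke, Yang Bo, Wang Lin, Zhu Fuxiang 《Computer Science》2012,39(4):304-311

To address the high time complexity that modern optimization algorithms face when solving relatively complex problems, a GPU-based parallel processing approach is introduced. The paper first explains, at a high level, the parallel programming model based on the Compute Unified Device Architecture (CUDA), and then presents CUDA-based parallel implementations on the GPU of five typical modern optimization algorithms (simulated annealing, tabu search, genetic algorithms, particle swarm optimization, and artificial neural networks). By comparing the statistical results of experimental cases tested in different environments, it points out the advantages of GPU-based single-instruction multiple-thread parallel optimization strategies and their future trends.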

10.

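Fast N-body algorithm under the parallel time-space processing model

Wang Wei, Zeng Xuhong, Wang Fuhuan, Fu Lili, Zeng Guosun 《Journal of Frontiers of Computer Science and Technology》2011,5(11):1006-1013

Recent developments in graphics processing units (GPUs) make it possible to provide high-performance general-purpose computing at low cost. The GPU programming models CUDA (compute unified device architecture) and OpenCL (open computing language) give programmers ample C-like application programming interfaces (APIs), which makes it easier to exploit the parallel computing power of the GPU. Using graphics hardware for accelerated computing, existing GPU implementations of N-body simulation are analyzed through a new GPU processing model, the parallel time-space model. A new algorithm for fast simulation of the N-body problem on the GPU is then proposed and implemented on an AMD Radeon HD 5850. Experimental results show a speedup of about 400 times over a CPU implementation and of 2 to 5 times over existing GPU implementations.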

11.

Fast Isosurface Rendering on a GPU by Cell Rasterization

B. Liu G. J. Clapworthy F. Dong 《Computer Graphics Forum》2009,28(8):2151-2164

This paper presents a fast, high‐quality, GPU‐based isosurface rendering pipeline for implicit surfaces defined by a regular volumetric grid. GPUs are designed primarily for use with polygonal primitives, rather than volume primitives, but here we directly treat each volume cell as a single rendering primitive by designing a vertex program and fragment program on a commodity GPU. Compared with previous raycasting methods, ours has a more effective memory footprint (cache locality) and better coherence between multiple parallel SIMD processors. Furthermore, we extend and speed up our approach by introducing a new view‐dependent sorting algorithm to take advantage of the early‐z‐culling feature of the GPU to gain significant performance speed‐up. As another advantage, this sorting algorithm makes multiple transparent isosurfaces rendering available almost for free. Finally, we demonstrate the effectiveness and quality of our techniques in several real‐time rendering scenarios and include analysis and comparisons with previous work.

12.

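Design and implementation of a GPU-based parallel minimum spanning tree algorithm

Guo Shaozhong, Wang Wei, Wang Lei 《Application Research of Computers》2011,28(5):1682-1684

To address the low efficiency of current parallel Prim minimum spanning tree algorithms, and based on an analysis of existing parallel Prim algorithms, a compressed adjacency list graph representation suited to the GPU architecture is proposed, a GPU-based min-reduction data-parallel primitive is developed, and a parallel minimum spanning tree algorithm based on the idea of Prim's algorithm is designed and implemented on NVIDIA GPUs. The algorithm uses the primitive to shorten the search time of the key step and thereby achieves higher efficiency. Experiments show that the algorithm has a clear performance advantage over traditional CPU implementations and over algorithms that do not use the primitive.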
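How a min-reduction can drive the vertex-selection step of Prim's algorithm is sketched below in sequential Python. Several things here are assumptions rather than details from the paper: the compressed adjacency list is modeled as CSR-style `offsets`/`targets`/`weights` arrays, `min_reduce` imitates the halving pattern of a parallel reduction on the CPU, the graph is assumed connected, and all names are hypothetical.

```python
import math


def min_reduce(pairs):
    """Tree-style min reduction over (key, vertex) pairs, halving the list each pass."""
    items = list(pairs)
    while len(items) > 1:
        half = (len(items) + 1) // 2
        # Each "thread" i combines element i with element i + half, if it exists.
        items = [
            min(items[i], items[i + half]) if i + half < len(items) else items[i]
            for i in range(half)
        ]
    return items[0]


def prim_mst_weight(offsets, targets, weights):
    """Total MST weight of a connected undirected graph stored in CSR-style arrays."""
    n = len(offsets) - 1
    key = [math.inf] * n      # cheapest known edge linking each vertex to the tree
    in_tree = [False] * n
    key[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Key step: find the closest vertex outside the tree via the reduction.
        k, u = min_reduce((key[v], v) for v in range(n) if not in_tree[v])
        in_tree[u] = True
        total += k
        # Relax the edges of u; its neighbors are stored contiguously.
        for e in range(offsets[u], offsets[u + 1]):
            v = targets[e]
            if not in_tree[v] and weights[e] < key[v]:
                key[v] = weights[e]
    return total


# Hypothetical graph: edges 0-1 (1), 0-2 (4), 1-2 (2), 1-3 (6), 2-3 (3).
offsets = [0, 2, 5, 8, 10]
targets = [1, 2, 0, 2, 3, 0, 1, 3, 1, 2]
weights = [1, 4, 1, 2, 6, 4, 2, 3, 6, 3]
print(prim_mst_weight(offsets, targets, weights))
```

On a GPU, each pass of the reduction would run in parallel, so selecting the minimum takes a logarithmic number of steps instead of a linear scan.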

13.

CUDA‐quicksort: an improved GPU‐based implementation of quicksort

Emanuele Manca Andrea Manconi Alessandro Orro Giuliano Armano Luciano Milanesi 《Concurrency and Computation》2016,28(1):21-43

Sorting is a very important task in computer science and becomes a critical operation for programs making heavy use of sorting algorithms. General‐purpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize some sorting algorithms. Two GPU‐based implementations of the quicksort were presented in literature: the GPU‐quicksort, a compute‐unified device architecture (CUDA) iterative implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA‐quicksort, an iterative GPU‐based implementation of the sorting algorithm. CUDA‐quicksort has been designed starting from GPU‐quicksort. Unlike GPU‐quicksort, it uses atomic primitives to perform inter‐block communications while ensuring an optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA‐quicksort is up to four times faster than GPU‐quicksort and up to three times faster than CDP‐quicksort. An in‐depth analysis of the performance between CUDA‐quicksort and GPU‐quicksort shows that the main improvement is related to the optimized GPU memory access rather than to the use of atomic primitives. Moreover, in order to assess the advantages of using the CUDA dynamic parallelism, we implemented a recursive version of the CUDA‐quicksort. Experimental results show that CUDA‐quicksort is faster than the CDP‐quicksort provided by NVIDIA, with better performance achieved using the iterative implementation. Copyright © 2015 John Wiley & Sons, Ltd.

14.

Interactive visualization of constructive solid geometry scenes on graphic processors

D. Ulyanov D. Bogolepov V. Turlapov 《Programming and Computer Software》2017,43(4):258-267

A ray-tracing algorithm for interactive visualization of very large and structurally complicated scenes presented in the constructive solid geometry (CSG) form is suggested. The algorithm is capable of visualizing such scenes in real time by using a graphic processor. As primitives, classical shapes and objects represented in an analytical form (in particular, second-order surfaces and implicit functions) are used. Unlike other similar algorithms, our algorithm produces the final image in a single pass and has no constraints on the maximum number of primitives and on the CSG tree depth. The key feature of the algorithm is a method for optimizing CSG models, which converts the input tree to an equivalent spatially coherent and well-balanced form (a completely balanced equivalent tree may not exist). The performance of visualization after applying the optimization technique is shown to depend on only the computational resource of the GPU (in contrast to multi-pass algorithms whose performance is restricted by memory capacity). It has been shown experimentally that our algorithm is capable of rendering CSG models consisting of more than a million CSG primitives with the tree depth up to 24.

15.

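Research and implementation of a fast GPU-based image rotation algorithm (total citations: 2; self-citations: 0; citations by others: 2)

Liu Yaolin, Qiu Feiyue, Wang Liping 《Computer Engineering & Science》2008,30(6):48-50

This paper proposes a GPU (graphics processing unit) based method for image rotation and real-time rendering. First, it outlines an algorithm in which the rotation transform is performed by the GPU and points out its problems and limitations. It then introduces the GPU-based image rotation algorithm and, using DirectX 9.0 as the software development kit, implements image rotation and real-time display on the VC++ 6.0 platform. Finally, the experimental results of the two methods are compared and analyzed. The algorithm takes full advantage of the speed of the GPU and of the CPU resources it frees up, ensuring both the computation speed and the quality of image rotation.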

16.

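GPU-accelerated neural network BP algorithm (total citations: 3; self-citations: 3; citations by others: 0)

Tian Xuhong, Jiang Minjie 《Application Research of Computers》2009,26(5):1679-1681

In recent years, the rapidly expanding programmability of graphics processing units (GPUs), together with the high speed and parallelism of the rendering pipeline, has quickly made general-purpose computing on GPUs (GPGPU) a research hotspot. To address the low efficiency of the BP algorithm for large-scale neural networks, a GPU-accelerated neural network BP algorithm is proposed. The forward computation and backward learning of the BP network are converted into GPU texture rendering passes, so that the powerful floating-point capability and highly parallel computation of the GPU can be used to solve the BP algorithm. Experimental results show that the method significantly improves running efficiency while keeping the accuracy of the results unchanged.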

17.

Hardware-assisted visibility sorting for unstructured volume rendering (total citations: 2; self-citations: 0; citations by others: 2)

Callahan SP Ikits M Comba JL Silva CT 《IEEE transactions on visualization and computer graphics》2005,11(3):285-295

Harvesting the power of modern graphics hardware to solve the complex problem of real-time rendering of large unstructured meshes is a major research goal in the volume visualization community. While, for regular grids, texture-based techniques are well-suited for current GPUs, the steps necessary for rendering unstructured meshes are not so easily mapped to current hardware. We propose a novel volume rendering technique that simplifies the CPU-based processing and shifts much of the sorting burden to the GPU, where it can be performed more efficiently. Our hardware-assisted visibility sorting algorithm is a hybrid technique that operates in both object-space and image-space. In object-space, the algorithm performs a partial sort of the 3D primitives in preparation for rasterization. The goal of the partial sort is to create a list of primitives that generate fragments in nearly sorted order. In image-space, the fragment stream is incrementally sorted using a fixed-depth sorting network. In our algorithm, the object-space work is performed by the CPU and the fragment-level sorting is done completely on the GPU. A prototype implementation of the algorithm demonstrates that the fragment-level sorting achieves rendering rates of between one and six million tetrahedral cells per second on an ATI Radeon 9800.

18.

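Screen-space adaptive terrain tessellation rendering

Zhang Bingqiang, Zhang Limin, Ai Zuliang, Zhang Jianting 《Journal of Image and Graphics》2012,17(11):1431-1438

To use GPU hardware-accelerated tessellation in large-scale realistic terrain rendering, and based on an analysis of the principles of terrain tessellation, a screen-space adaptive terrain tessellation rendering algorithm is proposed that adaptively subdivides the triangles of the terrain model inside the GPU. The algorithm organizes terrain data hierarchically as tiles and patches and performs adaptive terrain LOD (level of detail) simplification on the CPU and the GPU at tile and patch granularity, respectively. It proposes a model for computing tessellation factors in the Hull Shader based on patch edges, which ensures seamless joins when patches are subdivided. It describes the displacement mapping process in the Domain Shader, which applies the height texture to the subdivided vertices, and it uses a two-level view-frustum culling mechanism to reduce redundant rendering data. Experimental results show that the algorithm has good screen-space adaptivity and rendering performance and can render terrain models with high-resolution geometric detail from a coarse input mesh.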

19.

An efficient GPU-based parallel tabu search algorithm for hardware/software co-design

Neng HOU Fazhi HE Yi ZHOU Yilin CHEN 《Frontiers of Computer Science》2020,14(5):145316

Hardware/software partitioning is an essential step in hardware/software co-design. For large size problems, it is difficult to consider both solution quality and time. This paper presents an efficient GPU-based parallel tabu search algorithm (GPTS) for HW/SW partitioning. A single GPU kernel of compacting neighborhood is proposed to reduce the amount of GPU global memory accesses theoretically. A kernel fusion strategy is further proposed to reduce the amount of GPU global memory accesses of GPTS. To further minimize the transfer overhead of GPTS between CPU and GPU, an optimized transfer strategy for GPU-based tabu evaluation is proposed, which considers that all the candidates do not satisfy the given constraint. Experiments show that GPTS outperforms state-of-the-art work of tabu search and is competitive with other methods for HW/SW partitioning. The proposed parallelization is significant when considering the ordinary GPU platform.

20.

Compression and rendering of iso-surfaces and point sampled geometry

Jens Krüger Jens Schneider Rüdiger Westermann 《The Visual computer》2006,22(8):517-530

In this paper we present a streaming compression scheme for gigantic point sets including per-point normals. This scheme extends on our previous Duodecim approach [21] in two different ways. First, we show how to use this approach for the compression and rendering of high-resolution iso-surfaces in volumetric data sets. Second, we use deferred shading of point primitives to considerably improve rendering quality. Iso-surface reconstruction is performed in a hexagonal close packing (HCP) grid, into which the initial data set is resampled. Normals are resampled from the initial domain using volumetric gradients. By incremental encoding, only slightly more than 3 bits per surface point and 5 bits per surface normal are required at high fidelity. The compressed data stream can be decoded in the graphics processing unit (GPU). Decoded point positions are saved in graphics memory, and they are then used on the GPU again to render point primitives. In this way high quality gigantic data sets can directly be rendered from their compressed representation in local GPU memory at interactive frame rates (see Fig. 1).