Similar Literature
1.
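Current ray tracing algorithms accelerated by GPUs or multi-core CPUs are hardware-dependent. Developing a real-time ray tracing algorithm with cross-platform performance is both challenging and of considerable practical value. To this end, a cross-platform, OpenCL-based algorithm for real-time ray tracing of dynamic scenes is proposed. First, by exploiting the parallel processing capability of general-purpose GPUs, the three stages of ray tracing (KD-Tree construction, scene traversal, and rendering) are all designed to run on the GPU, while the CPU only schedules these stages; this makes full use of the GPU's computing power and effectively reduces data transfer overhead. Through algorithms for parallel partitioning, parallel SAH, compact data management, and interval-based leaf-node storage, a KD-Tree for the dynamic scene is built efficiently and with high quality on the GPU, and the high-quality KD-Tree in turn speeds up scene traversal. The algorithm builds the KD-Tree in a breadth-first, massively parallel fashion, which makes it more general: it runs on both NVIDIA GPUs (CUDA GPUs) and AMD GPUs. Experimental results show that the algorithm achieves real-time ray tracing of medium-sized dynamic scenes on both NVIDIA and AMD GPUs.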
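The abstract above mentions SAH (surface area heuristic) as the criterion for choosing KD-Tree splits. As a rough illustration only, the following sketch evaluates the textbook SAH cost of one candidate split plane. It is not the paper's parallel SAH; the function names and the cost constants `c_trav` and `c_isect` are assumptions chosen for the example.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(lo, hi, axis, split, n_left, n_right, c_trav=1.0, c_isect=1.5):
    """Expected cost of splitting the box [lo, hi] at `split` along `axis`.

    c_trav and c_isect are illustrative constants (assumed, not from the paper):
    the cost of one traversal step and of one ray-primitive intersection test.
    """
    left_hi = list(hi)
    left_hi[axis] = split          # left child ends at the split plane
    right_lo = list(lo)
    right_lo[axis] = split         # right child starts at the split plane
    area = surface_area(lo, hi)
    # Probability that a ray hitting the parent also hits each child.
    p_left = surface_area(lo, left_hi) / area
    p_right = surface_area(right_lo, hi) / area
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)
```

A builder would evaluate `sah_cost` for many candidate planes per node and keep the cheapest one; a GPU implementation would evaluate the candidates in parallel.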

2.
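Traditional marine radar echo simulation intersects the scanning sector with the electronic chart, which yields a huge number of echo polygons, so echo simulation of large scenes cannot run in real time and lacks realism. To address these problems, an echo simulation method using DEM image enhancement is proposed. The algorithm builds a heterogeneous system combining CPU and GPU: first, aggregation and connection steps on the CPU reduce the polygon count and generate the echoes in real time; then the GPU overlays port DEM elevation data to enhance the echo image, with piecewise processing along latitude to eliminate the effect of meridional parts. The algorithm converts the echo image from scanned pixels to sector-shaped echo bands, and the simulation process conforms to the real principle of radar echo formation. In large simulation scenes, GPU processing improves the parallel efficiency of echo image enhancement, with a speedup of up to 240 times relative to the CPU.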

3.
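As a typical application of high-performance scientific computing, GPU-accelerated molecular dynamics (MD) simulation has been a hot topic in high-performance computing for computational chemistry since 2007. This paper reviews the features of and research progress on different MD packages that support GPU acceleration, and analyzes in detail the single-GPU and multi-GPU performance of three representative packages: Amber, GROMACS, and ACEMD. The results show that, with the same number of GPU cards, a single node outperforms multiple nodes, and that a desktop workstation equipped with several GPU cards is a comparatively cost-effective setup for MD simulation. The paper also examines the accuracy of single- and double-precision GPU-accelerated MD results by comparing them with CPU results, and finds the GPU results generally reliable. Finally, the paper summarizes the current state of research on GPU-accelerated MD simulation and discusses future developments.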

4.
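High-performance implementation of radar signal processing algorithms is a key technology in radar systems. Traditionally, acceleration relies mainly on dedicated devices such as DSPs and FPGAs, which suffer from long development cycles, difficult debugging, and high cost. As general-purpose devices, GPUs are particularly well suited to large-scale data such as radar signals. So far, most work on GPU-accelerated radar signal processing has concentrated on applications such as SAR imaging, and comparatively little addresses pulse-Doppler radar. To meet the demanding throughput and real-time requirements of radar echo data, several optimizations are proposed: fine-grained parallelization based on grid-stride loops, coarse-grained parallelization based on multiple CUDA streams, and data preprocessing based on parallel scan. The algorithm's real-time performance and accuracy are evaluated through performance tests and error analysis. On the hardware platform used, it achieves a speedup of more than 300 times over a conventional CPU implementation and outperforms other existing CUDA-accelerated pulse-Doppler radar signal processing algorithms.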

5.
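Molecular dynamics (MD) simulation is a widely used molecular simulation method and an important route to simulating biological systems. Because of its computational intensity, the spatial and temporal scales that MD can currently reach still fall short of real physical processes. In recent years GPUs, as accelerators for CPUs, have opened new possibilities for increasing MD computing capacity. The main difficulty in GPU programming lies in decomposing the computation and mapping it onto the GPU, organizing threads and memory sensibly, and carefully balancing data transfer against instruction throughput to obtain the GPU's maximum performance. Electrostatic interactions are long-range and pervade biological phenomena, so simulating them accurately is an important part of MD. The Particle-Mesh-Ewald (PME) method is one of the recognized algorithms for treating electrostatics accurately. Building on GMD, a GPU-accelerated MD program previously developed in the authors' laboratory, this paper describes a strategy for implementing PME on the GPU with NVIDIA CUDA. The three components of the electrostatic interaction (the real-space part, the Fourier-space part, and the energy correction term) each use a different task organization strategy to improve overall performance. Tests on dhfr, the de facto standard benchmark, show that GMD with PME runs 3.93 times as fast as Gromacs 4.5.3 on a single CPU core, 1.5 times as fast as on 8 CPU cores, and 1.87 times as fast as the OpenMM 2.0-accelerated GPU version of Gromacs 4.5.3.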

6.
Research on Optimizing the Integral Image Algorithm Based on OpenCL   Total citations: 1, self-citations: 0, citations by others: 1
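The integral image algorithm is widely used in fast feature detection, so accelerating it on GPUs is of real practical importance. However, because GPU hardware architectures are complex and differ from one another, optimizing the integral image algorithm on GPUs and then achieving performance portability across GPU platforms is very difficult. Based on an analysis of the underlying hardware architectures of different GPU platforms, the effect of different optimization methods on performance on different GPU hardware platforms is examined from several angles, including off-chip memory bandwidth utilization, compute resource utilization, and data locality. On this basis an OpenCL-based integral image algorithm is implemented. Experimental results show that the optimized algorithm achieves speedups of 11.26 and 12.38 times on AMD and NVIDIA GPUs respectively, and that the optimized GPU kernel outperforms the corresponding functions in the NVIDIA NPP library by 55.01% and 65.17% respectively. This confirms the effectiveness and performance portability of the proposed optimization methods.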
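For readers unfamiliar with the integral image (summed-area table) that this abstract optimizes, here is a minimal sequential sketch in plain Python. It shows only the data structure and the constant-time box query, not the OpenCL optimizations studied in the paper; the function names and the padded (h+1) x (w+1) layout are choices made for this example.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            # Sum above (previous table row) plus the current row's prefix sum.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the half-open rectangle [x0, x1) x [y0, y1)."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Once the table is built, any rectangular sum costs four lookups, which is why the structure is popular in feature detection. A GPU version typically splits the work into row-wise and column-wise prefix-sum passes.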

7.
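Indexing, a mature technique for accelerating database queries, has always been constrained by CPU memory bandwidth and architectural development, and so has not achieved a qualitative leap in performance. Using GPU-enabled indexing to assist database query execution is therefore a natural step. To address the adaptability of index structures in heterogeneous environments and the limited scalability of existing GPU indexes caused by GPU memory capacity, a CPU-GPU cooperative index algorithm, HPGB+-Tree, is proposed. The algorithm rebuilds the index structure as a hybrid architecture fully adapted to GPU hardware characteristics, overcoming both the CPU memory bandwidth limit and the GPU memory capacity limit. The HPGB+-Tree index not only solves the index heterogeneity problem but also exploits the respective strengths of the two hardware platforms to accelerate index-based operations. The algorithm's performance is evaluated under different data volumes and task sizes; the experimental results show clear advantages in both kernel occupancy and execution speed, and leading overall performance.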

8.
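On a heterogeneous parallel computing platform consisting of a cluster with GPUs, the MPI+CUDA hybrid programming model is used to compute the charge distribution in molecular dynamics simulations based on the ABEEMσπ model. The computational part of the charge distribution solver is ported to the GPU, and the algorithm's high communication overhead and underused resources are addressed with asynchronous concurrency on the heterogeneous platform, which improves solution efficiency. Performance tests show that, compared with the pure MPI parallel algorithm, the optimized GPU-accelerated heterogeneous parallel algorithm has a clear performance advantage when computing charge distributions for large chemical molecule models.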

9.
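In recent years GPUs have mounted a strong challenge to traditional CPU applications in general-purpose computing and are widely used in high-performance computing, especially in network security. To address the shortcomings of traditional hardware acceleration, this paper first introduces the basic hardware architecture of the GPU and its parallel computing principles, then explains the performance differences between CUDA-based GPU programming and general CPU programming for algorithm implementation, and finally analyzes several typical network security algorithms in detail and designs corresponding GPU acceleration experiments to test performance. The experimental results show that, provided the algorithm is well designed, a GPU can improve the computing performance of an application algorithm by a factor of more than one hundred.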

10.
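GPUs offer floating-point performance of several hundred GFlops or even more than a TFlops. Applying them to particle simulation can substantially speed up large-scale particle simulation and reduce computing costs. This paper uses a GPU to accelerate LARED-P, a three-dimensional laser-plasma simulation algorithm. It proposes methods for CPU+GPU task partitioning, task decomposition on the GPU, and decomposition of large compute kernels, and combines the use of registers and texture memory to accelerate the algorithm. In double precision, the ported algorithm running on a single GPU of an NVIDIA Tesla S1070 clocked at 1.44 GHz achieves a 6-fold speedup over a single core of a 2.4 GHz Intel(R) Core(TM)2 Quad CPU Q6600.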

11.
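Current GPU computing power makes it more feasible to run parallel, kD-Tree-partitioned ray tracing of scenes in real time. Graphics processing units (GPUs) render polygons efficiently, and the programmability of their internal units has led to wide use beyond polygon rendering. This paper describes in detail a kD-Tree traversal algorithm implemented with OpenCL, improves the intersection test that dominates the computation, and raises utilization of GPU compute capacity and memory, thereby improving the efficiency of the ray tracing algorithm.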

12.
Current GPU computational power enables the execution of complex, parallel algorithms, such as ray tracing techniques supported by kD-trees for 3D scene rendering in real time. This work describes in detail the study and implementation of eight different kD-tree traversal algorithms using the parallel framework NVIDIA Compute Unified Device Architecture, in order to point out their pros and cons regarding performance, memory consumption, branch divergence, and scalability on multiple GPUs. In addition, the authors propose two new algorithms based on this analysis, aimed at improving performance. Both reach speedups of up to 3× over recent, optimized parallel traversal implementations. As a consequence, interactive frame rates are possible for scenes with a resolution of 1,408 × 768 pixels and 3.6 million primitives.

13.
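Software Design and Implementation of a Wireless Sensor Network for Structural Acceleration Monitoring   Total citations: 4, self-citations: 0, citations by others: 4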
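This paper presents the topology of a wireless sensor network for structural acceleration monitoring and the hardware units of the wireless sensor nodes, determines the network communication protocol based on the network structure, and focuses on the network's software design. Experimental tests were carried out with the resulting wireless acceleration sensor network. The study shows that embedding the software in the network hardware platform makes node networking convenient and provides self-calibration; the wireless acceleration sensor network with integrated software and hardware features network, function, and system integration and is suitable for monitoring civil engineering structures.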

14.
The focus of research in acceleration structures for ray tracing recently shifted from render time to time to image, the sum of build time and render time, and also the memory footprint of acceleration structures now receives more attention. In this paper we revisit the grid acceleration structure in this setting. We present two efficient methods for representing and building a grid. The compact grid method consists of a static data structure for representing a grid with minimal memory requirements, more specifically exactly one index per grid cell and exactly one index per object reference, and an algorithm for building that data structure in linear time. The hashed grid method reduces memory requirements even further, by using perfect hashing based on row displacement compression. We show that these methods are more efficient in both time and space than traditional methods based on linked lists and dynamic arrays. We also present a more robust grid traversal algorithm. We show that, for applications where time to image or memory usage is important, such as interactive ray tracing and rendering large models, the grid acceleration structure is an attractive alternative.
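The "one index per cell, one index per object reference" layout described above can be built with a counting-sort style pass. The sketch below is an illustration under assumptions, not the authors' code: the input `object_cells` (the list of cell indices each object overlaps) is taken as given, the names are hypothetical, and one extra sentinel entry is stored after the last cell so that every cell's range has an end.

```python
def build_compact_grid(num_cells, object_cells):
    """Build (offsets, refs); object_cells[i] lists the cells object i overlaps."""
    # Pass 1: count the object references that fall into each cell.
    offsets = [0] * (num_cells + 1)  # last entry is a sentinel
    for cells in object_cells:
        for c in cells:
            offsets[c] += 1
    # Pass 2: inclusive prefix sum, so offsets[c] is the END of cell c's range.
    for c in range(1, num_cells + 1):
        offsets[c] += offsets[c - 1]
    # Pass 3: fill in reverse order; each decrement moves the end toward the
    # start, so afterwards offsets[c] is the START of cell c's range.
    refs = [0] * offsets[num_cells]
    for obj in reversed(range(len(object_cells))):
        for c in object_cells[obj]:
            offsets[c] -= 1
            refs[offsets[c]] = obj
    return offsets, refs


def cell_objects(offsets, refs, c):
    """Objects referenced by cell c: the slice between consecutive offsets."""
    return refs[offsets[c]:offsets[c + 1]]
```

Every pass touches each cell or each object reference once, so the build is linear in the number of cells plus references, and no linked lists or per-cell dynamic arrays are needed.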

15.
This paper proposes a new group-based acceleration data structure called gkDtree for interactive ray tracing of dynamic scenes. The main idea of the gkDtree is to construct the acceleration structure with a multi-level hierarchy, and to integrate a parallelization approach to achieve a faster update and a more efficient tree traversal. A gkDtree can be viewed as a set of kd-trees, each of which is a local acceleration structure corresponding to a group. For a gkDtree, a scene is divided into several groups based on a scene graph. Only the local acceleration structures of groups that contain dynamic primitives are rebuilt. To achieve higher parallelization, dependencies among groups in different levels are removed before rebuilding occurs in parallel. To enhance the scalability of parallelization, a simple and fast load-balancing scheme is introduced. Furthermore, we apply a variety of accurate SAH (surface area heuristic) algorithms to tree generation for both static and dynamic groups. The experimental results show that a gkDtree has real-time update performance: its updates are up to 166 times faster than those of a kd-tree for our test scenes on a six-core system. The results also show that the tree traversal performance of a gkDtree is competitive with that of a kd-tree.

16.
The trajectory of a shipborne radar target has a certain complexity, randomness, and diversity. Tracking a strong maneuvering target timely, accurately, and effectively is a key technology for a shipborne radar tracking system. Combining a variable structure interacting multiple model with an adaptive grid algorithm, we present a variable structure adaptive grid interacting multiple model maneuvering target tracking method. Tracking experiments are performed using the proposed method for five maneuvering targets, including a uniform motion - uniform acceleration motion target, a uniform acceleration motion - uniform motion target, a serpentine locomotion target, and two variable acceleration motion targets. Experimental results show that the target position, velocity, and acceleration tracking errors for the five typical target trajectories are small. The method has high tracking precision, good stability, and flexible adaptability.

17.
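Design and Fabrication of a Wireless Acceleration Sensor for Structural Monitoring   Total citations: 10, self-citations: 2, citations by others: 8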
Yu Yan, Li Hongwei, Ou Jinping. Chinese Journal of Sensors and Actuators (《传感技术学报》), 2004, 17(3): 463-466, 471
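A modular design method for a wireless acceleration sensor for civil engineering structural monitoring is proposed, and such a sensor is developed. First, the measurement principles and design rules of each unit of the wireless sensor are described in detail, and the units are integrated. Second, the software design flow of the node is given. Finally, the node is tested. The results show that the wireless acceleration sensor has good accuracy, is stable, supports self-calibration, and acquires both digital and analog signals in one device, making it suitable for civil engineering structural monitoring.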

18.
We present a theoretical study of the performance of a novel FBAR-on-diaphragm sensor-head structure for the FBAR-based electro-acoustic resonant micro-accelerometer. This structure overcomes the limited cantilever beam thickness of the FBAR-beam structure and the complex micro-fabrication process of the embedded-FBAR structure. Its elastic diaphragm is made of a silicon dioxide (SiO2)/silicon nitride (Si3N4) bilayer film, which is not only better suited to IC-compatible integration of the Si-based microstructure and the FBAR, but also improves the sensitivity and temperature stability of the BAW accelerometer. The FBAR-on-diaphragm type BAW accelerometer integrates the acceleration sensing structure, i.e., the SiO2/Si3N4 bilayer diaphragm and the Si proof-mass, with the AlN FBAR electro-acoustic transducer. Preliminary performance analysis suggests that the FBAR-on-diaphragm structure is feasible. We obtained the modal frequencies of the FBAR-on-diaphragm structure and the stress distribution of the diaphragm under 0–100 g acceleration loads through finite element modal analysis and static simulation. Applying the calculated maximum stress to the piezoelectric film in the FBAR for qualitative analysis, and combining it with the dependency of the elastic coefficient on stress in the Wurtzite AlN film calculated with the first-principles method, we roughly predicted the maximum elastic coefficient variation in the Wurtzite AlN film under different acceleration loads. With the help of the RF simulation software ADS, we changed the longitudinal wave velocity corresponding to the elastic constants under varying acceleration loads. By comparing the resulting resonant frequencies of the sensor head without and with different acceleration loads, we qualitatively characterized its frequency shift and sensitivity. Further analysis of the simulation results reveals that the first-order modal frequency of the SiO2/Si3N4 circular diaphragm is quite far from the higher ones, which means less cross-modal coupling. It also reveals that under an acceleration load the resonant frequency shifts upward, with a fairly linear acceleration–frequency shift characteristic and a sensitivity of several kHz/g.

19.
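A new control structure for the powertrain of automatic-shift vehicles is proposed. The driver's intent is distilled into a single acceleration command; while tracking this command, throttle opening and transmission gear are controlled jointly according to engine characteristics and actual road conditions, optimizing power performance, comfort, and fuel economy. Simulation shows that the structure and its control strategy can shift gears for optimal fuel economy while following the driver's acceleration command as closely as possible.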

20.
We investigate the use of two-level nested grids as acceleration structure for ray tracing of dynamic scenes. We propose a massively parallel, sort-based construction algorithm and show that the two-level grid is one of the structures that is fastest to construct on modern graphics processors. The structure handles non-uniform primitive distributions more robustly than the uniform grid and its traversal performance is comparable to those of other high quality acceleration structures used for dynamic scenes. We propose a cost model to determine the grid resolution and improve SIMD utilization during ray-triangle intersection by employing a hybrid packetization strategy. The build times and ray traversal acceleration provide overall rendering performance superior to previous approaches for real time rendering of animated scenes on GPUs.
