1.

方晓健徐骥威华彪何险峰葛蔚《计算机与应用化学》2011,28(10)

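Particle simulation is a common method for studying the motion of discrete particles and continuous media, and large-scale particle simulations usually rely on high-performance computing systems. In recent years, thanks to its many-core architecture, the graphics processing unit (GPU) has become an important device for high-performance computing and is widely used to accelerate large-scale particle simulations. This paper discusses a method for online visualization of GPU-accelerated distributed particle simulations. In this method, the GPU is used not only to accelerate the particle simulation itself but also to convert data into images quickly, while parallel rendering is used to visualize the distributed data. With the method described, users can watch phenomena in the particle simulation in real time, through high-resolution images shown on a tiled display wall while the parallel computation is running, and can track and adjust the computation.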

2.

Implementation and Parallel Performance Analysis of the Cell Verlet Algorithm for GPU-Based Molecular Dynamics Simulation

张帅徐顺刘倩金钟《计算机科学》2018,45(10):291-294, 299

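Molecular dynamics simulation is complex in both space and time, so accelerating it in parallel is particularly important. Based on the data-parallel architecture of GPU hardware, this work combines the atom-decomposition and spatial-decomposition parallel strategies of molecular dynamics to implement an optimized Cell Verlet algorithm for short-range force computation, and optimizes and analyzes the performance of GPU implementations of the core basic algorithms of molecular dynamics. The Cell Verlet implementation first uses atom decomposition to map each particle's computation to one GPU thread, and uses spatial decomposition to divide the simulation domain into cells and build a cell index table, so that particles can be located in the simulation space in real time. When computing inter-particle forces, a Hilbert space-filling curve is introduced to preserve the locality between the linear storage of the data and its three-dimensional spatial distribution, so that caching speeds up GPU global memory access. Memory-address alignment and intra-block sharing are also used to optimize the GPU molecular dynamics simulation. Tests and comparative analysis show that the implementation offers advantages such as strong scalability and good speedup.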
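The cell index table mentioned in this abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' GPU implementation: the function names `build_cell_index` and `neighbor_pairs`, the cubic non-periodic box, and the plain Python loops are all assumptions made for illustration. The idea shown is only that, once particles are binned into cells at least as wide as the cutoff, each particle needs to be compared with particles in its own and adjacent cells rather than with every other particle.

```python
import math
from collections import defaultdict
from itertools import product


def build_cell_index(positions, box, cutoff):
    """Bin particles into cubic cells; returns ({cell: [particle ids]}, cells per side).

    Assumes a non-periodic cubic box [0, box)^3 (an illustrative simplification).
    """
    n_cells = max(1, int(box // cutoff))   # cell edge is then >= cutoff
    cell_size = box / n_cells
    cells = defaultdict(list)
    for i, pos in enumerate(positions):
        # Clamp so a coordinate on the upper boundary still lands in the last cell.
        key = tuple(min(int(c / cell_size), n_cells - 1) for c in pos)
        cells[key].append(i)
    return cells, n_cells


def neighbor_pairs(positions, box, cutoff):
    """Return all pairs (i, j), i < j, closer than `cutoff`, using the cell index."""
    cells, _ = build_cell_index(positions, box, cutoff)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only the cell itself and its 26 neighbours can hold particles within cutoff.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            other = cells.get((cx + dx, cy + dy, cz + dz))
            if other is None:
                continue
            for i in members:
                for j in other:
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs
```

On a GPU, the outer loop over particles would typically become one thread per particle, which is the atom-decomposition mapping the abstract describes; the sketch above only shows the spatial-decomposition side.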

3.

Flame Simulation with Parallel CPU and GPU Computation

王栋栋庄雷《计算机应用》2009,29(6):1702-1710

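Flames are simulated as a fluid using the SPH method based on particle interpolation. The GPU accelerates the computation of particle states, while the CPU computes particle neighbor relations in parallel and controls the particle generation rate. Vortex-field computation is added to the SPH model fairly efficiently, which adds detail to the particle motion. During particle rendering, a chromaticity field, directed point diffusion, and color sharpening are used to obtain a reasonably good continuous flame image from the discrete spatial distribution of particles. Because the method is a Lagrangian approach to fluid simulation, the flame is physically realistic; and because the architecture uses the GPU as the main processor with the CPU in a supporting role, the simulation runs in real time.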
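As a rough illustration of what "particle interpolation" means in SPH, the sketch below estimates the density at each particle as a kernel-weighted sum over all particles, rho_i = sum_j m_j W(|r_i - r_j|, h). The abstract does not say which smoothing kernel the authors use; the Poly6 kernel here is a common choice in graphics-oriented SPH and is an assumption, as are the function names and the brute-force O(N^2) loop.

```python
import math


def poly6(r, h):
    """Poly6 smoothing kernel in 3D: 315 / (64 pi h^9) * (h^2 - r^2)^3 for 0 <= r <= h."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, masses, h):
    """SPH density estimate at every particle (brute force, for illustration only)."""
    rho = []
    for p_i in positions:
        total = 0.0
        for p_j, m_j in zip(positions, masses):
            # Each particle j contributes its mass, weighted by distance to particle i.
            total += m_j * poly6(math.dist(p_i, p_j), h)
        rho.append(total)
    return rho
```

In a GPU version the outer loop would normally run as one thread per particle, and the inner loop would be restricted to nearby particles found through a neighbor search.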

4.

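Accelerating Non-Bonded Force Computation in Molecular Dynamics Simulation Using GPUs (total citations: 1; self-citations: 0; citations by others: 1)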

吴强杨灿群葛振陈娟《计算机工程与科学》2009,31(Z1)

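In molecular dynamics (MD) simulation, computing non-bonded forces takes a large amount of time. This paper proposes two double-precision algorithms, based on CUDA and Brook+, that implement non-bonded force computation on two mainstream GPUs from NVIDIA and AMD respectively, using the GPU's computing power to accelerate the whole MD program. The algorithms partition the MD tasks and use domain decomposition to map the non-bonded force computation onto the GPU's compute cores. Two optimizations, shared storage within a thread block and minimizing the data set, are proposed to suit the characteristics of the two GPUs. Performance tests show that, compared with a single core of a 2.6 GHz Intel Xeon CPU, a high-speed particle collision simulation with 432,000 particles improves performance by a factor of 6.5 on a system with an NVIDIA Tesla C1060 and by a factor of 4.8 on a system with an AMD HD4870.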

5.

Development and Engineering Applications of Discrete Element Analysis Software for Environmental Mechanics

季顺迎赵金凤狄少丞孙珊珊《计算机辅助工程》2014,23(1):69-75

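Discrete element analysis software plays a vital role in solving discrete-media problems in environmental mechanics. For the irregular particle elements found in environmental disasters, spherical particles are taken as the basic element and used to construct embedded combined elements, bonded combined elements, extended disk elements, and extended polyhedron elements. On this basis, an analysis package based on the spherical-particle discrete element method is developed (Software of Spherical Particle-based Discrete Element Method, SDEM). The software can simulate the mechanical behavior of granular materials such as broken ice, rock, and ballast, and can visualize how these mechanical processes arise, develop, and evolve; GPU-based parallel computing makes large-scale discrete element computation efficient. Applications of SDEM in geological hazards, engineering sea ice, and railway ballast beds are introduced.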

6.

A Scalable Real-Time Algorithm for Simulating Natural Scenery

肖何饶云波李佳邓利平《计算机工程与科学》2014,36(9):1795-1800

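Traditional real-time particle-system simulation can handle only a single type of natural scenery, is computationally expensive, produces unrealistic images, and relies on complex algorithms. To address these problems, a general, scalable natural-scenery simulation algorithm based on particle systems and graphics processing unit (GPU) acceleration is proposed. In this algorithm, the computation of the particles' physical motion and the rendering stage are moved entirely from the CPU to the GPU, which allows more particles and faster rendering. During rendering, hardware-supported particle sprite technology is used to improve the appearance of the particles, and different textures can be selected, so different kinds of natural scenery can be simulated conveniently. Finally, snowflakes, fountains, fireworks, and waterfalls are simulated on the GPU. The algorithm makes full use of the GPU's multi-channel parallel processing and programmability, improves the real-time performance of natural-scenery simulation, and can be used in virtual reality systems.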

7.

GPU-Based Acceleration of the LARED-P Algorithm

刘来国徐炜遐杨灿群陈娟《计算机工程与科学》2009,31(Z1)

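GPUs deliver floating-point performance of several hundred GFlops or even more than one TFlops. Applying GPUs to particle simulation can effectively speed up large-scale particle simulation and reduce computing cost. This paper uses a GPU to accelerate LARED-P, a three-dimensional laser-plasma simulation algorithm, and proposes methods for CPU+GPU task partitioning, task decomposition on the GPU, and decomposition of large compute kernels, combined with the use of registers and texture memory to accelerate the algorithm. In double precision, the ported algorithm running on a single GPU of an NVIDIA Tesla S1070 at 1.44 GHz achieves a speedup of 6 relative to a single core of a 2.4 GHz Intel(R) Core(TM)2 Quad CPU Q6600.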

8.

An SPH Fluid Simulation Method Based on Hybrid CPU-GPU Acceleration (total citations: 1; self-citations: 0; citations by others: 1)

胡鹏飞袁志勇廖祥云郑奇陈二虎《计算机工程与科学》2014,36(7):1231-1237

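Fluid simulation based on smoothed particle hydrodynamics (SPH) is an important research topic in virtual reality, but SPH fluid simulation requires large computing resources, and real-time fluid simulation is hard to achieve with ordinary computing methods. Fluid simulation usually consists of physics computation, collision detection, and rendering; using the GPU to accelerate in parallel the computation of the particles' physical properties and the collision process makes real-time SPH fluid simulation possible. To meet the realism and real-time requirements of fluid simulation applications, an SPH fluid simulation method based on hybrid CPU-GPU acceleration is proposed, in which the fluid computation is accelerated in parallel on the GPU and the fluid rendering is accelerated on the CPU with OpenMP. Experiments show that, compared with a CPU implementation, the hybrid method significantly reduces the per-frame computation time and completes rendering faster.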

9.

A Fully GPU-Based Implementation Framework for Real-Time SPH Fluid Simulation

郭秋雷唐逸之刘诗秋李桂清《计算机应用与软件》2011,28(11)

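How to simulate large-scale fluids in real time with high realism is an important research topic in computer graphics. Fluid simulation consists of physics computation, collision detection, surface reconstruction, and rendering, and much work has therefore gone into GPU acceleration of the algorithms for each part. A complete GPU-based acceleration framework for SPH fluid simulation is proposed. Building on a smoothed particle hydrodynamics (SPH) solution of the Navier-Stokes equations, GPU-based parallel spatial subdivision (PSS) is used to greatly speed up particle collision detection. In addition, an algorithm that reconstructs fluid surface information with the geometry shader is designed, with a further index-based optimization so that surface reconstruction need not traverse regions that contain no surface. Experiments show that the method can simulate reasonably realistic fluid scenes in real time.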

10.

Optimization of a GPU-Based Molecular Dynamics Algorithm for Solid Crystalline Silicon

李靖祝爱琦韩林侯超峰《计算机工程》2023,(3):288-295

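Molecular dynamics simulation is commonly used to study the thermodynamic properties of crystalline silicon. Because complex many-body potentials act between atoms, molecular simulation usually faces a heavy computational load, which limits the time and space scales that can be computed. Graphics processing units (GPUs) use parallel multithreading for compute-intensive tasks and show great potential in molecular dynamics simulation. Making full use of the GPU hardware architecture to extend the spatiotemporal scale of molecular dynamics simulation of solid covalent crystalline silicon is therefore important for studying the heat-conduction mechanism of crystalline silicon. Based on the molecular dynamics simulation algorithm for solid covalent crystalline silicon, the design and optimization of a fixed-neighbor algorithm for GPU computing platforms is proposed. Data-structure and branch-structure optimizations address the time cost of global memory access and branching in the fixed-neighbor algorithm, reducing memory-access cost and branch conflicts; by changing the way threads are scheduled in parallel, high-performance parallel computing is achieved on the GPU platform, which effectively resolves the computational-load problem. Experiments show that the speedup ratio between the LAMMPS double-precision simulation of solid crystalline silicon and the double-precision fixed-neighbor algorithm is 11.62, and that the speedup ratios between the HOOMD-blue double-precision simulation and the double-precision and single-precision fixed-neighbor algorithms are 9.39 and 12.18, respectively.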

11.

CPU–GPU Parallel Framework for Real‐Time Interactive Cutting of Adaptive Octree‐Based Deformable Objects

Shiyu Jia Weizhong Zhang Xiaokang Yu Zhenkuan Pan 《Computer Graphics Forum》2018,37(1):45-59

A software framework taking advantage of parallel processing capabilities of CPUs and GPUs is designed for the real‐time interactive cutting simulation of deformable objects. Deformable objects are modelled as voxels connected by links. The voxels are embedded in an octree mesh used for deformation. Cutting is performed by disconnecting links swept by the cutting tool and then adaptively refining octree elements near the cutting tool trajectory. A surface mesh used for visual display is reconstructed from disconnected links using the dual contour method. Spatial hashing of the octree mesh and topology‐aware interpolation of distance field are used for collision. Our framework uses a novel GPU implementation for inter‐object collision and object self collision, while tool‐object collision, cutting and deformation are assigned to CPU, using multiple threads whenever possible. A novel method that splits cutting operations into four independent tasks running in parallel is designed. Our framework also performs data transfers between CPU and GPU simultaneously with other tasks to reduce their impact on performances. Simulation tests show that when compared to three‐threaded CPU implementations, our GPU accelerated collision is 53–160% faster; and the overall simulation frame rate is 47–98% faster.

12.

Efficient GPU Data Structures and Methods to Solve Sparse Linear Systems in Dynamics Applications

Daniel Weber Jan Bender Markus Schnoes André Stork Dieter Fellner 《Computer Graphics Forum》2013,32(1):16-26

We present graphics processing unit (GPU) data structures and algorithms to efficiently solve sparse linear systems that are typically required in simulations of multi‐body systems and deformable bodies. Thereby, we introduce an efficient sparse matrix data structure that can handle arbitrary sparsity patterns and outperforms current state‐of‐the‐art implementations for sparse matrix vector multiplication. Moreover, an efficient method to construct global matrices on the GPU is presented where hundreds of thousands of individual element contributions are assembled in a few milliseconds. A finite‐element‐based method for the simulation of deformable solids as well as an impulse‐based method for rigid bodies are introduced in order to demonstrate the advantages of the novel data structures and algorithms. These applications share the characteristic that a major computational effort consists of building and solving systems of linear equations in every time step. Our solving method results in a speed‐up factor of up to 13 in comparison to other GPU methods.

13.

Rigid-Granular Coupled Dynamics Analysis of Soil Impact Problems

孙昊刘铸永《动力学与控制学报》2023,21(9):33-40

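Against the background of a probe landing on planetary soil, this paper studies rigid-granular coupled dynamics modeling and simulation of soil impact problems. Combining the discrete element method with multibody dynamics, a coupled dynamics simulation of a hemispherical shell device dropped onto soil is carried out. Comparison with experimental results and finite element simulation results validates the discrete element method used. The effects of particle-field parameters such as particle size, restitution coefficient, and static friction coefficient on dynamic responses, including the impact acceleration of the object and the particle field, the impact duration, and the vibration waveform, are analyzed. This study will extend the theoretical understanding of rigid-granular coupled dynamics and provide technical support for the design of probe landing systems.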

14.

CPU–GPU hybrid parallel strategy for cosmological simulations

Yueqing Wang Yong Dou Song Guo Yuanwu Lei Dan Zou 《Concurrency and Computation》2014,26(3):748-765

Gadget is a simulation application for N‐body and smoothed particle hydrodynamics problems in cosmology, and it is widely applied in solving series of cosmological problems. N‐body focuses on the motion of the interaction of N particles, and smoothed particle hydrodynamics is a fluid simulation algorithm that studies the movement of fluid through particle simulation. Most scholars focus their attention on accelerating Gadget on multi‐core CPU or graphics processing unit (GPU) platforms. However, these research activities failed to achieve CPU–GPU hybrid computing, which resulted in tremendous waste of CPU computing resources. In this paper, we propose a CPU–GPU hybrid parallel strategy to accelerate Gadget‐2, a massively parallel structure formation code for cosmological simulations. This strategy uses CPU and GPU to process the calculation of short‐range force. To ensure CPU and GPU workload balance, a dynamic task allocation scheme is proposed according to the computational performance difference between the CPU and GPU. Experimental results showed that our CPU–GPU hybrid parallel strategy achieved an overall speedup factor of 18.6 and a partial speedup factor for short‐range force calculation of 28.35 compared with a single‐core CPU implementation for particles in million‐size magnitudes. Moreover, compared with a GPU platform that contained 12 CPU cores and one GPU, our hybrid parallel strategy obtained overall speedup and partial speedup factors of 6% and 20%, respectively. Furthermore, the hybrid strategy scales well: its performance improves as the problem scale increases. However, the strategy has a limitation: the performance gain shrinks as the ratio of the number of CPU cores to the number of GPU cards decreases. Finally, in our hybrid strategy, the CPU coefficient of utilization improved by 17.14% or better. Copyright © 2013 John Wiley & Sons, Ltd.
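The abstract does not give the details of its dynamic task allocation scheme. One simple way such a scheme could work, sketched here purely as an assumption, is to measure how fast each device processed its share in the previous step and split the next batch in proportion to those throughputs. The names `split_work` and `rebalance` are hypothetical and do not come from Gadget‐2 or the paper.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split n_items between CPU and GPU in proportion to their throughputs.

    Rates are in items per second; returns (cpu_items, gpu_items).
    """
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - gpu_items, gpu_items


def rebalance(n_items, cpu_items, cpu_time, gpu_items, gpu_time):
    """Derive the next split from the previous step's measured timings."""
    cpu_rate = cpu_items / cpu_time   # items per second actually achieved on the CPU
    gpu_rate = gpu_items / gpu_time   # items per second actually achieved on the GPU
    return split_work(n_items, cpu_rate, gpu_rate)


# Example: an even first split took 5 s on the CPU and 1 s on the GPU,
# so the next step shifts most of the work to the GPU.
next_cpu, next_gpu = rebalance(1000, 500, 5.0, 500, 1.0)
```

With a split like this both devices should finish at roughly the same time, which is the balance condition a hybrid scheme aims for. A real scheme would also need to handle a device that was given no work in the previous step, which this sketch does not.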

15.

Multi-Granularity Partitioning and Scheduling of Stream Programs on Hybrid GPU/CPU Architectures

陈文斌杨瑞瑞于俊清《计算机工程与科学》2017,39(1):15-26

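Data-flow programming languages simplify programming in their target domains and cleanly separate task computation from data communication, so that applications can be parallelized at both the task level and the data level. To address the abundant data parallelism, task parallelism, and pipeline parallelism present in hybrid GPU/CPU architectures, a task partitioning method and a multi-granularity scheduling strategy for data-flow programs on such architectures are proposed and implemented. They comprise classification of tasks, horizontal fission of GPU-side tasks, and balancing of discrete CPU-side tasks; a software-pipelined schedule is constructed, and OpenCL target code is generated after compiler optimization. Task classification assigns each task to a suitable computing platform according to its computational characteristics and the volume of communication between tasks. Horizontal fission of GPU-side tasks exploits their parallelism to split them evenly across the GPUs, so that high inter-GPU communication overhead does not hurt the program's overall performance. Balancing of discrete CPU-side tasks selects suitable CPU cores and distributes the CPU-side tasks evenly among them, which ensures load balance and raises the utilization of each core. The experiments use multiple NVIDIA Tesla C2050 GPUs and a multi-core CPU as the hybrid platform, with typical multimedia algorithms as test programs; the results demonstrate the effectiveness of the partitioning method and the scheduling strategy.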

16.

Performance analysis of single‐phase, multiphase, and multicomponent lattice‐Boltzmann fluid flow simulations on GPU clusters

J. Myre S. D. C. Walsh D. Lilja M. O. Saar 《Concurrency and Computation》2011,23(4):332-350

The lattice‐Boltzmann method is well suited for implementation in single‐instruction multiple‐data (SIMD) environments provided by general purpose graphics processing units (GPGPUs). This paper discusses the integration of these GPGPU programs with OpenMP to create lattice‐Boltzmann applications for multi‐GPU clusters. In addition to the standard single‐phase single‐component lattice‐Boltzmann method, the performances of more complex multiphase, multicomponent models are also examined. The contributions of various GPU lattice‐Boltzmann parameters to the performance are examined and quantified with a statistical model of the performance using Analysis of Variance (ANOVA). By examining single‐ and multi‐GPU lattice‐Boltzmann simulations with ANOVA, we show that all the lattice‐Boltzmann simulations primarily depend on effects corresponding to simulation geometry and decomposition, and not on the architectural aspects of GPU. Additionally, using ANOVA we confirm that the metrics of Efficiency and Utilization are not suitable for memory‐bandwidth‐dependent codes. Copyright © 2010 John Wiley & Sons, Ltd.

17.

GPU-based Collision Detection for Deformable Parameterized Surfaces (total citations: 1; self-citations: 0; citations by others: 1)

Alexander Greß Michael Guthe Reinhard Klein 《Computer Graphics Forum》2006,25(3):497-506

18.

An Automatic Control Information System for Offshore Oilfields Based on 紫金桥 (Zijinqiao) Software

梁宏宝祁雪峰范广辉《自动化与仪表》2008,23(12)

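The environment on offshore oil production platforms is harsh, and oilfield production is hard to manage. For safe production, and to improve the production time rate and the timeliness and reliability of data acquisition, automation technology is deployed offshore with attention to how conditions there differ from those on land, so as to unify data between the satellite platforms and the central platform. 紫金桥 proposes using a real-time database system as the platform for integrating automation and information systems on offshore production platforms. The system implements real-time data process simulation, abnormality alarms, shutdown handling, and statistical data analysis, providing a new solution for efficient offshore oilfield production and for building digital oilfields.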