Similar Documents
19 similar documents found.
1.
Numerical simulation of the wave equation is generally computationally expensive, and studying parallelization methods tailored to the finite-difference schemes used for the wave equation is an inevitable trend as multi-core PCs become commonplace. Considering the deeply nested loops in wave-equation simulation together with the characteristics of OpenMP, efficient multi-core parallelism on a PC can be achieved by choosing the order in which loop levels are parallelized, reducing serial sections, merging loop bodies, setting the compiler directives precisely, and optimizing thread binding. Multi-core parallelization tailored to the wave equation not only improves single-machine efficiency but is also important for improving the efficiency of the MPI+OpenMP hybrid parallelism commonly used on clusters.
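A minimal sketch of the loop-level techniques named above, applied to one time step of a 2-D acoustic wave-equation finite-difference update. The array names, grid sizes, and the `close` binding policy are illustrative assumptions, not taken from the paper:

```c
/* Sketch (not the paper's code): the two spatial loops are collapsed into
 * one OpenMP loop, scheduled statically, with threads bound close to the
 * master thread's place to reduce cross-socket traffic. */
#include <omp.h>

void wave_step(int nx, int nz, double dt2c2,
               const double *restrict u0,   /* u at t-1 */
               const double *restrict u1,   /* u at t   */
               double *restrict u2)         /* u at t+1 */
{
    #pragma omp parallel for collapse(2) schedule(static) proc_bind(close)
    for (int i = 1; i < nx - 1; i++) {
        for (int j = 1; j < nz - 1; j++) {
            int k = i * nz + j;
            double lap = u1[k - nz] + u1[k + nz] + u1[k - 1] + u1[k + 1]
                       - 4.0 * u1[k];
            /* second-order time update: u2 = 2*u1 - u0 + (c*dt)^2 * laplacian */
            u2[k] = 2.0 * u1[k] - u0[k] + dt2c2 * lap;
        }
    }
}
```

Thread binding can also be controlled at run time through `OMP_PROC_BIND` and `OMP_PLACES` instead of the `proc_bind` clause.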

2.
Application of hybrid parallel techniques to the simulation of laser-driven chemical reactions   (Total citations: 2; self-citations: 0; cited by others: 0)
To improve the efficiency of laser-driven chemical reaction simulation, hybrid parallel techniques and a two-level parallelization scheme are introduced into semiclassical molecular dynamics simulation. A two-level parallel simulation algorithm is designed and implemented on the MPI+OpenMP hybrid model: the upper level uses MPI for atom-decomposition parallelism across nodes, and the lower level uses OpenMP for multi-threaded parallel matrix multiplication within a node. Tests on an SMP cluster show that the parallel efficiency of simulating laser-driven reactions in large molecular systems exceeds 60%. Hybrid parallel techniques therefore effectively improve the efficiency of laser chemical reaction simulation.
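A hedged sketch of the two-level structure described above. The system size, the dense matrix block standing in for the per-step matrix work, and the final reduction are illustrative assumptions:

```c
/* Two-level sketch (not the paper's code): MPI ranks each own a block of
 * atoms (atom decomposition); inside a rank, OpenMP threads share a dense
 * matrix-vector product standing in for the per-step matrix work. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int natoms = 4096;                 /* hypothetical system size */
    int chunk = (natoms + size - 1) / size;  /* atoms per rank */
    int lo = rank * chunk; if (lo > natoms) lo = natoms;
    int hi = lo + chunk;   if (hi > natoms) hi = natoms;
    int nloc = hi - lo;

    /* per-rank dense block: nloc x natoms (placeholder for the real matrices) */
    double *A = calloc((size_t)nloc * natoms, sizeof *A);
    double *x = calloc(natoms, sizeof *x);
    double *y = calloc(nloc ? nloc : 1, sizeof *y);

    /* lower level: OpenMP threads split the rows of the local block */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nloc; i++) {
        double s = 0.0;
        for (int j = 0; j < natoms; j++)
            s += A[(size_t)i * natoms + j] * x[j];
        y[i] = s;
    }

    /* upper level: ranks combine their partial results (placeholder reduction) */
    double local = 0.0, global = 0.0;
    for (int i = 0; i < nloc; i++) local += y[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    free(A); free(x); free(y);
    MPI_Finalize();
    return 0;
}
```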

3.
As power grids grow larger, power-system computations become increasingly complex, and traditional serial methods can no longer meet the simulation requirements of power-system modeling and real-time control. Introducing parallel power-flow computation is an important way to reduce the excessive computing time. This paper introduces the relevant parallel technology, implements a power-flow program based on OpenMP, and runs the power flow of standard test systems in both serial and parallel modes. The results show that the computation time of the parallel program is clearly better than that of the serial one, so OpenMP is a low-cost, high-efficiency solution.
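The abstract does not state which part of the power-flow code was parallelized; the sketch below assumes a Newton-Raphson flow and parallelizes the per-bus power-injection loop, since each bus can be evaluated independently. All names and the dense admittance layout are assumptions:

```c
/* Hedged sketch: OpenMP over the bus-injection loop of a Newton-Raphson
 * power flow. P_i = V_i * sum_k V_k (G_ik cos(th_i-th_k) + B_ik sin(th_i-th_k)),
 * Q_i = V_i * sum_k V_k (G_ik sin(th_i-th_k) - B_ik cos(th_i-th_k)). */
#include <math.h>
#include <omp.h>

void compute_injections(int nbus, const double *V, const double *theta,
                        const double *G, const double *B,  /* nbus x nbus admittance */
                        double *P, double *Q)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nbus; i++) {
        double pi = 0.0, qi = 0.0;
        for (int k = 0; k < nbus; k++) {
            double d = theta[i] - theta[k];
            double g = G[i * nbus + k], b = B[i * nbus + k];
            pi += V[i] * V[k] * (g * cos(d) + b * sin(d));
            qi += V[i] * V[k] * (g * sin(d) - b * cos(d));
        }
        P[i] = pi;
        Q[i] = qi;
    }
}
```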

4.
Three-dimensional electromagnetic particle simulation is an advanced numerical method for studying many microscopic physical phenomena in space. Although the program had been parallelized with hybrid MPI and OpenMP programming, the synchronization imposed by blocking communication and the data transfer of centralized I/O through a network file system degraded its efficiency. A non-blocking communication scheme is introduced: the parts that require communication are computed first, the non-blocking communication proceeds while the remaining computation continues, and all data are received at the end, so that computation and communication overlap and communication waiting time is reduced. On a distributed-memory system, each node simultaneously reads and writes its own data to a separate local file, which greatly reduces parallel I/O time; the improvement becomes more pronounced as the data volume and the number of CPUs grow, thereby improving program performance.
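A sketch of the two ideas described above, with illustrative buffer names (not the original PIC code): post the non-blocking halo exchange first, compute while the messages are in flight, wait before touching the received data; and write one local file per rank instead of a shared NFS file:

```c
#include <mpi.h>
#include <stdio.h>

void exchange_and_compute(double *ghost_lo, double *ghost_hi,
                          double *bnd_lo, double *bnd_hi, int n,
                          int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];
    /* 1) start the non-blocking boundary exchange */
    MPI_Irecv(ghost_lo, n, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(ghost_hi, n, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Isend(bnd_lo,   n, MPI_DOUBLE, left,  1, comm, &req[2]);
    MPI_Isend(bnd_hi,   n, MPI_DOUBLE, right, 0, comm, &req[3]);

    /* 2) ... update interior cells here, overlapping with communication ... */

    /* 3) wait, then update the cells that need the received ghost data */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}

void write_local(int rank, const double *data, int n)
{
    char name[64];
    snprintf(name, sizeof name, "field_rank%04d.bin", rank); /* separate local file per rank */
    FILE *f = fopen(name, "wb");
    if (f) { fwrite(data, sizeof(double), (size_t)n, f); fclose(f); }
}
```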

5.
Based on the divide-and-conquer method for the symmetric tridiagonal eigenproblem, a multi-level hybrid parallel algorithm suited to SMP cluster environments is proposed. Within an SMP node, both coarse-grained and fine-grained OpenMP parallelism are used for the solve. To mitigate the load imbalance of the pure-MPI algorithm, the hybrid parallel algorithm uses dynamic task allocation. Experiments on the DeepComp 6800 show that the hybrid parallel algorithm has good scalability and speedup. Keywords: SMP cluster; MPI+OpenMP; hybrid parallelism; parallel solver
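One way to realize the dynamic task allocation mentioned above within a node is a dynamic OpenMP schedule over subproblems of unequal size; the sketch below is a stand-in with a dummy workload, not the paper's eigensolver:

```c
#include <math.h>
#include <omp.h>

/* placeholder standing in for the per-block tridiagonal eigen-solve */
static double solve_subproblem(int id, int work)
{
    double s = 0.0;
    for (int i = 0; i < work; i++) s += sin((double)(id + i));
    return s;
}

void solve_all(int nsub, const int *work)
{
    /* subproblem sizes differ, so dynamic scheduling balances the load */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int s = 0; s < nsub; s++)
        (void)solve_subproblem(s, work[s]);
}
```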

6.
To address the fact that most parallel Delaunay mesh generation algorithms cannot fully exploit multi-core shared-memory architectures, an OpenMP-based three-dimensional parallel Delaunay mesh generation algorithm is proposed, building on an existing shared-memory two-dimensional parallel algorithm and accounting for the characteristics of the 3D problem. The algorithm partitions the solution domain into cubic cells to split the candidate point set and insert points in parallel. The algorithm is implemented with OpenMP, and several implementation techniques are used to avoid synchronization waits between threads and improve efficiency. Experimental results show that the algorithm and its implementation techniques can rapidly generate large numbers of mesh elements in 3D with high parallel efficiency while maintaining good mesh quality.
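The parallel point insertion itself is not reproduced here; the sketch below only illustrates the cell-partition step in a lock-free way, binning candidate points into cubic cells with per-thread counters (all names and the cubic bounding box are assumptions):

```c
#include <omp.h>
#include <stdlib.h>

typedef struct { double x, y, z; } Point;

/* Count how many candidate points fall in each of ncell^3 cubic cells,
 * using one private histogram per thread so no locks or atomics are
 * needed during classification; count[] must be zeroed by the caller. */
void bin_points(const Point *p, int np, int ncell,
                double lo, double hi, long *count /* ncell^3 entries */)
{
    double inv = ncell / (hi - lo);
    int ncells3 = ncell * ncell * ncell;

    #pragma omp parallel
    {
        long *local = calloc((size_t)ncells3, sizeof *local);
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < np; i++) {
            int cx = (int)((p[i].x - lo) * inv); if (cx >= ncell) cx = ncell - 1;
            int cy = (int)((p[i].y - lo) * inv); if (cy >= ncell) cy = ncell - 1;
            int cz = (int)((p[i].z - lo) * inv); if (cz >= ncell) cz = ncell - 1;
            local[(cx * ncell + cy) * ncell + cz]++;
        }
        /* merge the private histograms one thread at a time */
        #pragma omp critical
        for (int c = 0; c < ncells3; c++) count[c] += local[c];
        free(local);
    }
}
```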

7.
This paper describes the features and usage of OpenMP, analyzes the synthetic aperture radar range-Doppler imaging algorithm, and identifies the parts suitable for OpenMP parallel processing: the Fourier transform and inverse Fourier transform, to which OpenMP is then applied. The original range-Doppler imaging algorithm is redesigned as a program that can execute in parallel, using the two parallel constructs pragma omp for and pragma omp sections and creating multiple threads to shorten execution time. Experiments show that, on a dual-core processor, the parallelized imaging algorithm reduces image generation time to about 67% of the original, effectively improving processing efficiency and fully exploiting the processor's capability.
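A sketch of the two constructs named above. A naive O(n²) DFT is used only as a self-contained stand-in for the real FFT routine; the data layout and the "two independent blocks" example are assumptions:

```c
#include <complex.h>
#include <math.h>
#include <stddef.h>
#include <omp.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Naive O(n^2) DFT standing in for a real FFT library call. */
static void dft_1d(const double complex *in, double complex *out, int n, int inverse)
{
    double sign = inverse ? 1.0 : -1.0;
    for (int k = 0; k < n; k++) {
        double complex s = 0.0;
        for (int j = 0; j < n; j++)
            s += in[j] * cexp(sign * 2.0 * M_PI * I * j * k / n);
        out[k] = inverse ? s / n : s;
    }
}

/* "omp for": each range line of the raw data is transformed independently. */
void range_fft(const double complex *data, double complex *spec,
               int nlines, int nsamp)
{
    #pragma omp parallel for schedule(static)
    for (int l = 0; l < nlines; l++)
        dft_1d(data + (size_t)l * nsamp, spec + (size_t)l * nsamp, nsamp, 0);
}

/* "omp sections": two independent data blocks processed concurrently. */
void two_blocks(const double complex *a, double complex *fa,
                const double complex *b, double complex *fb, int n)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        dft_1d(a, fa, n, 0);
        #pragma omp section
        dft_1d(b, fb, n, 0);
    }
}
```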

8.
When solving flow fields with the immersed boundary-lattice Boltzmann method (IB-LBM), large and dense flow grids are usually needed to obtain accurate results, which makes the simulation time-consuming. To improve simulation efficiency, a parallel optimization method for IB-LBM is presented that exploits the locality of IB-LBM computations and combines the three different task-scheduling policies available in OpenMP. The three scheduling policies are mixed in the parallel optimization to compensate for the load imbalance caused by using a single policy: IB-LBM is decomposed into structural parts, the optimal scheduling policy for each part is measured, and the best combination is chosen from the experimental results; the optimal combination differs for different thread counts. The optimization is evaluated by parallel speedup: with few threads the speedup approaches the ideal value, and with many threads the parallel performance is still greatly improved even though the extra cost of creating and destroying threads affects the optimization. The flow simulation results show that the parallel optimization does not affect the accuracy of IB-LBM for fluid-structure interaction problems.
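A hedged sketch of mixing schedules across structural parts of an IB-LBM step. The loop bodies are placeholders and the schedule-to-part assignment is only an example of the kind of combination the abstract describes:

```c
#include <omp.h>

void iblbm_step(int ncells, int nmarkers,
                double *f, double *feq, double *force)
{
    /* collision: near-uniform cost per cell -> static */
    #pragma omp parallel for schedule(static)
    for (int c = 0; c < ncells; c++)
        f[c] += -(f[c] - feq[c]);            /* placeholder collision */

    /* immersed-boundary work at Lagrangian markers: cost varies -> dynamic */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int m = 0; m < nmarkers; m++)
        force[m] = 1e-6 * m;                 /* placeholder interpolation */

    /* streaming / boundary handling: mildly irregular -> guided */
    #pragma omp parallel for schedule(guided)
    for (int c = 0; c < ncells; c++)
        feq[c] = f[c];                       /* placeholder streaming */
}
```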

9.
The conjugate gradient method is a widely used numerical method, originally proposed for solving systems of linear equations, and is applied extensively in atmospheric dynamics, physical oceanography, and other numerical computations; its complex matrix computations create an enormous workload that becomes the computational bottleneck in operational applications. Using OpenMP shared-memory parallelism, the bulk of the computation is parallelized to achieve OpenMP-based acceleration of the conjugate gradient method, providing a new computational solution for its wide application.
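A minimal sketch of the idea, not the operational code: a plain conjugate gradient solver for a symmetric positive-definite matrix in CSR format, with the matrix-vector product, the dot products, and the vector updates parallelized by OpenMP:

```c
#include <math.h>
#include <omp.h>
#include <stdlib.h>

static void spmv(int n, const int *rowptr, const int *col, const double *val,
                 const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            s += val[k] * x[col[k]];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s) schedule(static)
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* returns the number of iterations performed */
int cg(int n, const int *rowptr, const int *col, const double *val,
       const double *b, double *x, double tol, int maxit)
{
    double *r = malloc(n * sizeof *r), *p = malloc(n * sizeof *p),
           *q = malloc(n * sizeof *q);
    spmv(n, rowptr, col, val, x, q);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) { r[i] = b[i] - q[i]; p[i] = r[i]; }

    double rho = dot(n, r, r), rho0 = rho;
    int it = 0;
    while (it < maxit && rho > tol * tol * rho0) {
        spmv(n, rowptr, col, val, p, q);
        double alpha = rho / dot(n, p, q);
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        double rho_new = dot(n, r, r);
        double beta = rho_new / rho;
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rho = rho_new;
        it++;
    }
    free(r); free(p); free(q);
    return it;
}
```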

10.
Distributed parallel generation of finite element meshes over a local area network   (Total citations: 2; self-citations: 0; cited by others: 0)
In a common PC + Windows + LAN environment, distributed parallel finite element mesh generation over a local area network is implemented using the Winsock API network communication interface. On the server, the meshing region is decomposed into a number of subregions according to the number of workstations, and the subregions together with the mesh control parameters are sent to the workstations over the LAN. Each subregion is meshed into a submesh on a workstation and sent back to the server over the LAN to be merged into the final mesh. Examples show that, given enough compute nodes, the distributed parallel technique can greatly speed up mesh generation, while the fraction of time spent on network communication remains essentially constant.

11.
For the structured-grid part of MuSiC-CCASSIM, a two-dimensional/axisymmetric high-order computational fluid dynamics method for compressible multiphase flows, a parallel domain decomposition method is designed; for the communication of boundary data between processors, parallel algorithms with blocking and non-blocking communication are designed; and to reduce communication overhead, an MPI/OpenMP hybrid parallel optimization algorithm is designed. Tests were run on the Tianhe-2 supercomputer with a fixed grid size of 625×250 per core and up to 8,192 cores. The measurements show that the programs using the MPI/OpenMP hybrid algorithm, the pure-MPI non-blocking communication algorithm, and the pure-MPI blocking communication algorithm reach average parallel efficiencies of 86%, 83%, and 77%, respectively, and all three algorithms scale well.
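An illustrative comparison of the two exchange variants mentioned above for a 1-D decomposition (placeholder buffers; not taken from MuSiC-CCASSIM): a blocking exchange with MPI_Sendrecv, and a non-blocking exchange that allows interior work before the wait:

```c
#include <mpi.h>

void halo_blocking(double *send_l, double *send_r, double *recv_l, double *recv_r,
                   int n, int left, int right, MPI_Comm comm)
{
    MPI_Sendrecv(send_l, n, MPI_DOUBLE, left,  0,
                 recv_r, n, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(send_r, n, MPI_DOUBLE, right, 1,
                 recv_l, n, MPI_DOUBLE, left,  1, comm, MPI_STATUS_IGNORE);
}

void halo_nonblocking(double *send_l, double *send_r, double *recv_l, double *recv_r,
                      int n, int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];
    MPI_Irecv(recv_l, n, MPI_DOUBLE, left,  1, comm, &req[0]);
    MPI_Irecv(recv_r, n, MPI_DOUBLE, right, 0, comm, &req[1]);
    MPI_Isend(send_l, n, MPI_DOUBLE, left,  0, comm, &req[2]);
    MPI_Isend(send_r, n, MPI_DOUBLE, right, 1, comm, &req[3]);
    /* interior work can proceed here before waiting */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}
```

Boundary ranks can simply pass MPI_PROC_NULL as the missing neighbor; both variants then degrade gracefully.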

12.
Numerical grid generation techniques play an important role in the numerical solution of partial differential equations on arbitrarily shaped regions. For coastal ocean modeling in particular, a one-block grid covering the region under study is commonly used. Most bodies of water of interest have complicated coastlines, e.g., the Persian Gulf and the Mediterranean Sea. Since such one-block grids are not boundary conforming, the number of unused grid points can be a relatively large portion of the entire domain space. Other disadvantages of using a one-block grid include large memory requirements and long computer processing time. Multiblock grid generation and dual-level parallel techniques are used to overcome these problems. Message Passing Interface (MPI) is used to parallelize the Multiblock Grid Princeton Ocean Model (MGPOM) such that each grid block is assigned to a unique processor. Since not all grid blocks are of the same size, the workload varies between MPI processes. To alleviate this, OpenMP dynamic threading is used to improve load balance. Performance results from the MGPOM model on a one-block grid, a twenty-block grid, and a forty-two-block grid after a 90-day simulation of the Persian Gulf demonstrate the efficacy of the dual-level parallel code version.
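A hedged sketch of the load-balancing idea: within one MPI process (one grid block), the per-row work varies because land points are skipped, so a dynamic OpenMP schedule evens out the thread workload. The mask and field names are illustrative, not MGPOM's:

```c
#include <omp.h>

void update_block(int ni, int nj, const int *wet /* ni*nj land/sea mask */, double *eta)
{
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < ni; i++)
        for (int j = 0; j < nj; j++)
            if (wet[i * nj + j])
                eta[i * nj + j] += 1e-3;   /* placeholder update of wet points only */
}
```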

13.
A variant of a numerical algorithm for simulating viscous gas-dynamic flows on unstructured hybrid grids and its software implementation for heterogeneous computations is described. The system of Navier–Stokes equations is approximated by a higher-order finite-volume method, with the values of the variables defined at the mass centers of the grid elements. The distributed software implementation of the numerical algorithm is adapted to running on hybrid computer systems of various architectures. Comparative implementations were created using the MPI, OpenMP, CUDA, and OpenCL software models, permitting the use of multicore processors and various types of accelerators, including NVIDIA and AMD graphics processors and Intel Xeon Phi multicore coprocessors. The data exchange between MPI processes and between processors and accelerators is carried out simultaneously with the execution of calculations (both in MPI + OpenMP mode and when using CUDA or OpenCL). The indicators of parallel efficiency and performance on systems with different types of computing devices are studied in detail. In the tests, up to 260 GPUs were successfully used.

14.
刘智翔, 宋安平, 徐磊, 郑汉垣, 张武. 《计算机应用》 2014, 34(11): 3065-3068
To address the drawbacks of the lattice Boltzmann method in the numerical simulation of complex flows, namely the large number of grid cells and slow convergence, a parallel algorithm for generating multi-level Cartesian grids from three-dimensional geometric boundaries is proposed, and a multi-level-grid parallel lattice Boltzmann method (LBM) is built on this grid generation method. By coupling computations between grids of different scales, the method effectively reduces the number of grid cells and accelerates convergence; the test results also show that the parallel algorithm has good scalability.

15.
The quest for scalable, parallel advancing-front grid generation techniques now spans more than two decades. A recent innovation has been the use of a so-called domain-defining grid, which has led to a dramatic increase in robustness and speed. The domain-defining grid (DDG) has the same fine surface triangulation as the final mesh desired, but a much coarser interior mesh. The DDG renders the domain to be gridded uniquely defined and allows for a well-balanced work distribution among the processors during all stages of grid generation and improvement. In this way, most of the shortcomings of previous techniques are overcome. Timings show that the approach is scalable and able to produce large grids of high quality in a modest amount of clock time. These recent advances in parallel grid generation have enabled a completely scalable simulation pipeline (grid generation, solvers, post-processing), opening the way for truly large-scale computations using unstructured, body-fitted grids.

16.
Solving the Helmholtz equation is the core of the dynamical framework of the GRAPES numerical weather prediction system and can be recast as the solution of a large sparse linear system; constrained by hardware resources and data size, its solution efficiency has become the bottleneck limiting improvements in the system's computational performance. The generalized conjugate residual (GCR) method for large sparse linear systems is implemented with three parallel approaches, MPI, MPI+OpenMP, and CUDA, and an incomplete LU (ILU) preconditioner is used to improve the condition number of the coefficient matrix and accelerate convergence of the iteration. In the CPU schemes, MPI handles coarse-grained parallelism and communication between processes, while OpenMP with shared memory provides fine-grained parallelism within a process; in the GPU scheme, the CUDA model applies optimizations in data transfer, memory-access coalescing, and shared memory. The experiments show that reducing the iteration count through preconditioning improves performance markedly; the MPI+OpenMP hybrid version is about 35% faster than the MPI version, and the CUDA version is about 50% faster than the MPI+OpenMP version, giving the best performance.
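A hedged sketch of the CPU-side division of labour described above, shown for one global inner product inside an iterative solve: an OpenMP reduction supplies the fine-grained parallelism inside each MPI process, and MPI_Allreduce combines the partial sums across processes. This is not the GRAPES code:

```c
#include <mpi.h>
#include <omp.h>

double global_dot(const double *a, const double *b, int nlocal, MPI_Comm comm)
{
    double local = 0.0, global = 0.0;

    /* fine-grained parallelism within the process */
    #pragma omp parallel for reduction(+ : local) schedule(static)
    for (int i = 0; i < nlocal; i++)
        local += a[i] * b[i];

    /* coarse-grained combination across processes */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
    return global;
}
```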

17.
A parallel software package designed for the numerical simulation of continuum mechanics problems is presented. To illustrate the capabilities of the package, two problems on the simulation of supersonic gas flows were chosen: around a descent space vehicle and in the vicinity of a micro nozzle. A system of equations of quasi-gas dynamics was used as the mathematical model of the gas dynamics. Since radiation transfer in the gas is taken into account in the first problem, the multigroup approach and the diffusion approximation are employed to describe the radiation processes. The numerical algorithm is based on explicit-in-time finite-volume schemes on irregular, locally refined grids of different types. The parallel implementation of the numerical schemes uses the MPI+OpenMP technology and is optimized for computations on modern clusters with hybrid architecture.

18.
Because of its simple grid generation and high local grid quality, the overset grid method is commonly used for simulating unsteady problems involving relative motion. Overset grids first require an assembly step: the overlap relations between grids are established and the donor cells and the cells to be interpolated are identified, so that flow-field information can be exchanged between grids. This paper studies a distributed overset grid assembly method for large-scale simulations in which the grids are stored across multiple processors. The distributed grid assembly problem is first stated, a distributed assembly technique for three-dimensional structured grids is then described in detail, and the algorithm is finally tested on a standard wing/store configuration; the results show that the algorithm maintains high parallelism while providing accurate flow-field interpolation for unsteady computations.

19.
A hybrid scheme that utilizes MPI for distributed-memory parallelism and OpenMP for shared-memory parallelism is presented. The work is motivated by the desire to achieve exceptionally high Reynolds numbers in pseudospectral computations of fluid turbulence on emerging petascale, high core-count, massively parallel processing systems. The hybrid implementation derives from and augments a well-tested, scalable MPI-parallelized pseudospectral code. The hybrid paradigm leads to a new picture for the domain decomposition of the pseudospectral grids, which is helpful in understanding, among other things, the 3D transpose of the global data that is necessary for the parallel fast Fourier transforms that are the central component of the numerical discretizations. Details of the hybrid implementation are provided, and performance tests illustrate the utility of the method. It is shown that the hybrid scheme achieves good scalability up to ~20,000 compute cores, with a maximum efficiency of 89% and a mean of 79%. Data are presented that help guide the choice of the optimal number of MPI tasks and OpenMP threads in order to maximize code performance on two different platforms.
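A hedged sketch of the global transpose behind a slab-decomposed pseudospectral step, reduced to 2-D for brevity and assuming the grid size N is divisible by the number of ranks (this is an illustration of the general technique, not the paper's code): each rank packs per-destination blocks, exchanges them with MPI_Alltoall, and unpacks them transposed so the second direction becomes local and can be Fourier-transformed in place.

```c
#include <mpi.h>
#include <stdlib.h>

/* local:  nl x N  (nl = N / nprocs rows of the global array)
 * localT: nl x N  (nl columns of the global array, stored as rows)  */
void transpose_slabs(const double *local, double *localT, int N, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);
    int nl = N / nprocs;

    double *sendbuf = malloc((size_t)nl * N * sizeof *sendbuf);
    double *recvbuf = malloc((size_t)nl * N * sizeof *recvbuf);

    /* pack an nl x nl block for every destination rank d */
    for (int d = 0; d < nprocs; d++)
        for (int i = 0; i < nl; i++)
            for (int j = 0; j < nl; j++)
                sendbuf[(size_t)d * nl * nl + (size_t)i * nl + j] =
                    local[(size_t)i * N + d * nl + j];

    MPI_Alltoall(sendbuf, nl * nl, MPI_DOUBLE,
                 recvbuf, nl * nl, MPI_DOUBLE, comm);

    /* unpack: the block from source rank s holds global rows s*nl .. s*nl+nl-1 */
    for (int s = 0; s < nprocs; s++)
        for (int i = 0; i < nl; i++)
            for (int c = 0; c < nl; c++)
                localT[(size_t)c * N + s * nl + i] =
                    recvbuf[(size_t)s * nl * nl + (size_t)i * nl + c];

    free(sendbuf);
    free(recvbuf);
}
```

In the hybrid scheme, loops such as the pack and unpack stages are natural candidates for OpenMP threading within each MPI task.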
