Similar Documents
1.
Application of Hybrid Parallel Techniques in Laser Chemical Reaction Simulation   (cited by 2: 0 self-citations, 2 by others)
To improve the efficiency of laser chemical reaction simulation, hybrid parallel techniques and a two-level parallel design are introduced into semiclassical molecular dynamics simulation. A two-level parallel simulation algorithm for laser chemical reactions is designed and implemented on the MPI+OpenMP hybrid model: the upper level uses MPI for inter-node atom-decomposition parallelism, and the lower level uses OpenMP for multithreaded parallel matrix multiplication within each node. Tests on an SMP cluster show that the parallel efficiency of simulating laser chemical reactions in large molecular systems exceeds 60%. Hybrid parallel techniques therefore effectively improve the efficiency of laser chemical reaction simulation.
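
A minimal sketch of the two-level pattern this abstract describes (not the paper's code; the matrix size, atom count, and array contents are invented for illustration): MPI decomposes the atoms across nodes at the upper level, and OpenMP threads the per-atom matrix multiply at the lower level.

    #include <mpi.h>
    #include <omp.h>

    #define N 256  /* hypothetical matrix dimension per atom */

    /* Lower level: OpenMP-threaded dense matrix multiply C = A * B. */
    static void matmul_omp(const double *A, const double *B, double *C)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double s = 0.0;
                for (int k = 0; k < N; k++)
                    s += A[i * N + k] * B[k * N + j];
                C[i * N + j] = s;
            }
    }

    int main(int argc, char **argv)
    {
        int provided, rank, size, natoms = 1024; /* hypothetical atom count */
        /* FUNNELED is enough here: only the main thread calls MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Upper level: block decomposition of atoms across MPI ranks. */
        int lo = rank * natoms / size, hi = (rank + 1) * natoms / size;

        static double A[N * N], B[N * N], C[N * N];
        for (int a = lo; a < hi; a++)
            matmul_omp(A, B, C);  /* stand-in for the per-atom propagation step */

        MPI_Finalize();
        return 0;
    }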

2.
This paper describes the MPI and OpenMP programming models and, building on them, applies a hybrid MPI/OpenMP scheme (OpenMP shared memory within each node, MPI message passing between nodes) to GRAPES (Global/Regional Assimilation and Prediction System), China's independently developed numerical weather prediction system. The test results show that the hybrid parallel algorithm achieves better parallel efficiency and speedup than the original pure-MPI version.

3.
To address the large data volumes of remote sensing satellite images and the computational complexity of systematic geometric correction, a multilevel parallel algorithm for systematic geometric correction on SMP clusters is proposed. The algorithm uses MPI+OpenMP parallel programming: coarse-grained process-level parallelism between nodes and fine-grained thread-level parallelism within nodes. A redundant-storage data partitioning scheme balances the load across nodes and reduces the complexity of data addressing; a parallel file system distributes the data, avoiding inter-node data movement and enabling parallel reads and writes, while the intra-node parallelism further refines the algorithm's parallel granularity. The algorithm was validated on an SMP cluster with nadir-camera images from the ZY-3 (Ziyuan-3) satellite. The results show that it makes full use of the cluster's computing resources and achieves good parallel performance.
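
The parallel file reads mentioned above can be sketched with MPI-IO, each rank reading only its own slab of rows so no inter-node shuffling is needed afterwards; the file name and image dimensions below are hypothetical, not values from the paper.

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        const int rows = 30000, cols = 30000;  /* hypothetical image size */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank owns a contiguous block of rows; the last rank absorbs
           the remainder. */
        int my_rows = rows / size + (rank == size - 1 ? rows % size : 0);
        unsigned char *slab = malloc((size_t)my_rows * cols);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "scene.raw",   /* hypothetical file */
                      MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
        MPI_Offset off = (MPI_Offset)(rows / size) * rank * cols;
        /* Collective read: every rank pulls its own slab in parallel. */
        MPI_File_read_at_all(fh, off, slab, my_rows * cols,
                             MPI_UNSIGNED_CHAR, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        /* ... geometric correction on the local slab would go here ... */
        free(slab);
        MPI_Finalize();
        return 0;
    }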

4.
张思乾  程果  陈荤  熊伟 《计算机科学》2012,39(1):295-298
As processors shift from high-frequency single cores to chip multiprocessors (CMPs), the parallel processing capability of computers keeps growing. After analyzing the performance bottlenecks of serial GIS algorithms, thread-level parallelism is applied to raster data processing to exploit the strengths of CMPs. For the edge extraction algorithm, mainstream parallel programming models such as MPI and OpenMP are analyzed and compared in depth, and a parallel performance estimation model is proposed. Based on the OpenMP programming model, the effects of thread count, scheduling policy, and block size on parallel performance are analyzed to achieve optimal parallel edge extraction. Experiments show that the performance estimation model accurately predicts parallel performance in a CMP environment, and that the OpenMP-based parallel edge extraction algorithm improves the efficiency of image edge extraction.
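
The thread-count, scheduling-policy, and block-size experiments this abstract describes are easy to reproduce with OpenMP's schedule(runtime) clause, which defers those choices to environment variables; the kernel below is a generic gradient operator standing in for the paper's edge extractor, with an invented raster size.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define W 4096
    #define H 4096   /* hypothetical raster size */

    int main(void)
    {
        unsigned char *in  = calloc((size_t)W * H, 1);
        unsigned char *out = calloc((size_t)W * H, 1);

        double t0 = omp_get_wtime();
        /* schedule(runtime): pick policy and chunk via OMP_SCHEDULE, e.g.
           OMP_SCHEDULE="dynamic,64" OMP_NUM_THREADS=8 ./edge
           so each run tests one configuration without recompiling. */
        #pragma omp parallel for schedule(runtime)
        for (int y = 1; y < H - 1; y++)
            for (int x = 1; x < W - 1; x++) {
                /* simple gradient magnitude as a stand-in edge operator */
                int gx = in[y * W + x + 1] - in[y * W + x - 1];
                int gy = in[(y + 1) * W + x] - in[(y - 1) * W + x];
                int g  = abs(gx) + abs(gy);
                out[y * W + x] = g > 255 ? 255 : (unsigned char)g;
            }
        printf("%d threads: %.3f s\n", omp_get_max_threads(),
               omp_get_wtime() - t0);
        free(in); free(out);
        return 0;
    }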

5.
Building on an analysis of typical polygon rasterization algorithms, this paper studies how to parallelize the serial algorithms and proposes a parallel framework for polygon rasterization. The framework comprises a two-level MPI plus OpenMP parallel mode, a vector-polygon data partitioning method that accounts for load balancing, and a calling interface for basic rasterization operators. The scanline and boundary algebra algorithms were parallelized within this framework, and the approach was validated with large-scale land-use status data. The results show that the method enables rapid parallelization of serial vector-polygon rasterization algorithms; the parallelized algorithms greatly reduce vector-to-raster conversion time and achieve good parallel efficiency.
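
One simple way to realize a load-aware vector-polygon partitioning like the one mentioned above is a greedy least-loaded assignment; here vertex count is a stand-in cost model and all numbers are made up, not the paper's method or data.

    #include <stdio.h>

    #define NPOLY 8
    #define NPROC 3

    int main(void)
    {
        /* hypothetical per-polygon costs (vertex counts) */
        int verts[NPOLY] = {120, 8, 950, 40, 300, 300, 15, 600};
        long load[NPROC] = {0};
        int owner[NPOLY];

        /* Give each polygon to the process with the smallest accumulated
           cost so far. */
        for (int p = 0; p < NPOLY; p++) {
            int best = 0;
            for (int q = 1; q < NPROC; q++)
                if (load[q] < load[best]) best = q;
            owner[p] = best;
            load[best] += verts[p];
        }
        for (int q = 0; q < NPROC; q++)
            printf("process %d: load %ld\n", q, load[q]);
        return 0;
    }

Sorting the polygons by descending cost before assignment tightens the balance further, since large polygons are placed while many options remain.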

7.
Numerical simulation of the wave equation is computationally expensive, so developing parallelization methods suited to the characteristics of finite-difference wave-equation schemes is a natural response to the multicore trend in commodity machines. Combining the deeply nested loops of wave-equation simulation with the characteristics of OpenMP, efficient multicore parallelism can be achieved by choosing the order in which loops are parallelized, reducing serial sections, merging loop bodies, setting directives precisely, and optimizing thread binding. Multicore parallelization tailored to the wave equation not only improves single-machine efficiency but also matters for the MPI+OpenMP hybrid parallel efficiency commonly pursued on clusters.
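
A compact illustration of two of the tactics listed above, loop merging and thread binding, on a 2-D acoustic stencil (grid sizes and coefficients are illustrative): collapse(2) fuses the nested loops into one work-shared iteration space, and proc_bind(close) keeps threads on nearby cores.

    #include <omp.h>

    #define NX 1024
    #define NZ 1024   /* hypothetical 2-D grid */

    static float p0[NX][NZ], p1[NX][NZ], p2[NX][NZ], v2[NX][NZ];

    void step(float dt2)
    {
        /* collapse(2) merges both spatial loops into a single iteration
           space; proc_bind(close) pins threads near each other. */
        #pragma omp parallel for collapse(2) proc_bind(close)
        for (int i = 1; i < NX - 1; i++)
            for (int k = 1; k < NZ - 1; k++)
                p2[i][k] = 2.0f * p1[i][k] - p0[i][k]
                         + dt2 * v2[i][k]
                         * (p1[i+1][k] + p1[i-1][k] + p1[i][k+1]
                            + p1[i][k-1] - 4.0f * p1[i][k]);
    }

    int main(void) { step(0.001f); return 0; }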

8.
The embedded zerotree wavelet (EZW) algorithm is an effective image compression algorithm, but its compression time is long. This paper studies the algorithm and parallelizes it on a multicore cluster, improving its performance. Both MPI and MPI+OpenMP parallel versions were implemented and compared against the serial algorithm. The results show that as the data volume grows, both parallel versions run markedly more efficiently than the serial algorithm, with the MPI+OpenMP version performing best.

9.
This paper describes the characteristics of MPI and OpenMP parallel computing and builds a hybrid programming platform combining the two on Visual Studio 2010. Programs on this platform can use multiple processes and multiple threads within each process simultaneously. A parallel matrix multiplication algorithm based on data partitioning is designed and implemented: the data are split into two parts handled by two compute nodes, and within each node the data are divided further among multiple threads. Comparison with serial, MPI-only, and OpenMP-only matrix multiplication shows that the algorithm effectively exploits the machine's processing power.
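
A hedged sketch of the data-partitioned hybrid multiply this abstract outlines, written for any number of ranks rather than exactly two (matrix order and initialization are invented): MPI_Scatter splits the rows of A among processes, and OpenMP splits each process's rows among threads.

    #include <mpi.h>
    #include <omp.h>
    #include <stdlib.h>

    #define N 512   /* hypothetical matrix order; assume N % nprocs == 0 */

    int main(int argc, char **argv)
    {
        int provided, rank, nprocs;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int rows = N / nprocs;
        double *A = NULL, *C = NULL;
        double *B     = malloc(N * N * sizeof(double));
        double *Apart = malloc(rows * N * sizeof(double));
        double *Cpart = malloc(rows * N * sizeof(double));

        if (rank == 0) {
            A = calloc(N * N, sizeof(double));
            C = malloc(N * N * sizeof(double));
            for (int i = 0; i < N * N; i++) B[i] = 1.0;
        }
        /* Process level: split A by rows, replicate B everywhere. */
        MPI_Scatter(A, rows * N, MPI_DOUBLE, Apart, rows * N, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);
        MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Thread level: OpenMP divides the local rows again. */
        #pragma omp parallel for
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < N; j++) {
                double s = 0.0;
                for (int k = 0; k < N; k++)
                    s += Apart[i * N + k] * B[k * N + j];
                Cpart[i * N + j] = s;
            }

        MPI_Gather(Cpart, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }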

10.
Based on the divide-and-conquer method for the symmetric tridiagonal eigenproblem, a multilevel hybrid parallel algorithm suited to SMP cluster environments is proposed. Within each SMP node, both coarse-grained and fine-grained OpenMP parallelism are used. To mitigate the load imbalance of the pure-MPI algorithm, the hybrid algorithm uses dynamic task allocation. Experiments on the DeepComp 6800 show that the hybrid parallel algorithm has good scalability and speedup.
Keywords: SMP cluster; MPI+OpenMP; hybrid parallelism; parallel solver
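
Dynamic task allocation of the kind credited above with fixing pure-MPI load imbalance is commonly structured as a master-worker loop; this sketch hands out task indices on demand. The task count and tags are illustrative, and the per-task solve is elided.

    #include <mpi.h>

    #define NTASKS 100   /* hypothetical number of sub-problems */
    #define TAG_WORK 1
    #define TAG_STOP 2

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {            /* master: hand out tasks on demand */
            int next = 0, done = 0, dummy;
            MPI_Status st;
            while (done < size - 1) {
                MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (next < NTASKS) {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next++;
                } else {
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                    done++;
                }
            }
        } else {                    /* workers: pull tasks until stopped */
            int task, ask = 0;
            MPI_Status st;
            for (;;) {
                MPI_Send(&ask, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
                MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                /* ... solve sub-problem `task` (OpenMP inside, omitted) ... */
            }
        }
        MPI_Finalize();
        return 0;
    }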

11.
This paper analyzes the parallelism in solving the multigroup Sn particle transport equations on unstructured grids and, matching the characteristics of multicore cluster systems, designs a hybrid MPI/OpenMP program. The spatial mesh is partitioned by domain decomposition, with MPI message passing between compute nodes; whenever an MPI process reaches the energy-group computation, it spawns multiple OpenMP threads, so the energy groups are computed in parallel within each node. Numerical tests show that this hybrid parallel computation of particle transport on unstructured grids matches the hardware structure of multicore clusters well, exhibits good scalability, and scales to 1024 CPU cores.
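
Within one MPI subdomain, the thread-over-energy-groups structure described above reduces to parallelizing the group loop; a bare sketch with invented group and cell counts, where the per-group Sn solve is reduced to a copy:

    #include <omp.h>
    #include <stdlib.h>

    #define NGROUPS 48       /* hypothetical energy groups */
    #define NCELLS  100000   /* hypothetical local (per-process) cell count */

    int main(void)
    {
        double *phi = malloc(sizeof(double) * NGROUPS * NCELLS);
        double *src = calloc(NGROUPS * NCELLS, sizeof(double));

        /* One MPI process's work: OpenMP threads share the group loop. */
        #pragma omp parallel for
        for (int g = 0; g < NGROUPS; g++)
            for (int c = 0; c < NCELLS; c++)
                phi[g * NCELLS + c] = src[g * NCELLS + c]; /* stand-in solve */

        free(phi); free(src);
        return 0;
    }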

12.
For the structured-grid part of MuSiC-CCASSIM, a high-order 2-D/axisymmetric computational fluid dynamics code for compressible multiphase flows, a domain-decomposition parallelization was designed. For the boundary data exchanged between processors, both blocking and non-blocking communication algorithms were implemented, and a hybrid MPI/OpenMP optimization was designed to reduce communication overhead. Tests on the Tianhe-2 supercomputer fixed the grid size per core at 625*250 and used up to 8192 cores. The measurements show average parallel efficiencies of 86%, 83%, and 77% for the hybrid MPI/OpenMP algorithm, the pure-MPI non-blocking algorithm, and the pure-MPI blocking algorithm, respectively; all three algorithms scale well.
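
A non-blocking exchange of the kind measured above can overlap halo communication with interior computation, as sketched below for a 1-D row decomposition; the 625 and 250 come from the abstract, while the array layout and neighbor logic are illustrative. A blocking version would use MPI_Sendrecv in place of the MPI_Isend/MPI_Irecv + MPI_Waitall pattern.

    #include <mpi.h>
    #include <stdlib.h>

    #define NX 625
    #define NY 250   /* per-core block size used in the abstract's tests */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *u = calloc((NX + 2) * NY, sizeof(double)); /* +2 halo rows */
        int up = rank - 1, down = rank + 1;
        if (up < 0)      up   = MPI_PROC_NULL;
        if (down >= size) down = MPI_PROC_NULL;

        /* Post the halo exchange, then overlap it with interior work. */
        MPI_Request req[4];
        MPI_Irecv(u,                 NY, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(u + (NX + 1) * NY, NY, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(u + NY,            NY, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(u + NX * NY,       NY, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[3]);

        /* ... OpenMP-threaded update of interior rows 2..NX-1 here ... */

        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        /* ... then update the two boundary rows that needed the halos ... */

        free(u);
        MPI_Finalize();
        return 0;
    }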

13.
Parallel loop self-scheduling on parallel and distributed systems has been a critical problem, and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self-scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self-scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared-memory multiprocessors to adopt Open Multi-Processing (OpenMP) for parallel programming. In this paper, we propose a performance-based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.
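
The core of such performance-based partitioning can be hedged into a few lines: chunk sizes proportional to per-node performance weights, with OpenMP then splitting each chunk among the node's cores. The weights below are made-up numbers, not measured values from the paper.

    #include <stdio.h>

    #define NODES 3
    #define NITER 10000

    int main(void)
    {
        double w[NODES] = {1.0, 2.0, 4.0};  /* hypothetical node weights */
        double total = 0.0;
        for (int i = 0; i < NODES; i++) total += w[i];

        int start = 0;
        for (int i = 0; i < NODES; i++) {
            int count = (int)(NITER * w[i] / total);
            if (i == NODES - 1) count = NITER - start;  /* absorb rounding */
            printf("node %d: iterations [%d, %d)\n", i, start, start + count);
            start += count;
        }
        return 0;
    }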

14.
To address the complexity and time cost of building cluster parallel systems, this paper proposes building the parallel system with Docker. It introduces the core concepts and basic architecture of Docker, a lightweight virtualization technology, and uses it to build a cluster parallel development environment on Linux. After briefly explaining the ideas of parallel computing and the basic concepts and characteristics of MPI and OpenMP, a hybrid MPI+OpenMP programming model is built for parallel matrix multiplication; its performance is compared with the pure-MPI and pure-OpenMP programming models, and the causes of the differences are analyzed. Using this hybrid model, the parallel efficiency of the Docker-based system is compared with that of a system built on traditional physical machines.

15.
A Multi-Granularity Hybrid Parallel Programming Model for 3-D Meshes on SMP Clusters   (cited by 2: 0 self-citations, 2 by others)
To improve the execution efficiency of large-scale 3-D mesh parallel algorithms, and targeting the two-level distributed/shared memory hierarchy of SMP clusters, different implementations of hybrid programming for SMP clusters are introduced. For parallel shortest-path computation on 3-D mesh models, a multi-granularity hybrid parallel programming model is proposed, a hybrid MPI+OpenMP parallel algorithm for the problem is given, and its performance is compared on an SMP cluster against a coarse-grained MPI (Message Passing Interface) parallel algorithm. The results show that the multi-granularity hybrid model achieves better speedup and efficiency.

16.
Research on Hybrid Programming Models for SMP Clusters   (cited by 12: 0 self-citations, 12 by others)
This paper studies hybrid programming models suited to SMP clusters and divides them into two classes: OpenMP+MPI and Thread+MPI. The study shows that OpenMP+MPI is superior to Thread+MPI. On this basis, it examines in depth the implementation mechanism of OpenMP+MPI, coarse-grained and fine-grained parallelization methods, loop selection, optimization measures, and practical caveats, concluding that fine-grained OpenMP+MPI is a good choice of programming model for SMP clusters.
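
The coarse-grained versus fine-grained distinction analyzed above looks like this in OpenMP (array sizes are arbitrary): fine-grained opens a parallel region per loop, while coarse-grained keeps one region alive across phases to cut fork/join overhead at the price of explicit work-sharing and barriers.

    #include <omp.h>

    #define N 1000000
    static double a[N], b[N];

    /* Fine-grained: parallelize individual loops; simple and, per the
       abstract's conclusion, the better practical choice. */
    void fine(void)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) a[i] = 2.0 * b[i];
        #pragma omp parallel for
        for (int i = 0; i < N; i++) b[i] = a[i] + 1.0;
    }

    /* Coarse-grained: one parallel region spans several phases. */
    void coarse(void)
    {
        #pragma omp parallel
        {
            #pragma omp for
            for (int i = 0; i < N; i++) a[i] = 2.0 * b[i];
            /* implicit barrier at the end of the for construct */
            #pragma omp for
            for (int i = 0; i < N; i++) b[i] = a[i] + 1.0;
        }
    }

    int main(void) { fine(); coarse(); return 0; }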

17.
A hybrid message passing and shared memory parallelization technique is presented for improving the scalability of the adaptive integral method (AIM), an FFT-based algorithm, on clusters of identical multi-core processors. The proposed hybrid MPI/OpenMP parallelization scheme is based on a nested one-dimensional (1-D) slab decomposition of the 3-D auxiliary regular grid and the associated AIM calculations: if there are M processors and T cores per processor, the scheme (i) divides the regular grid into M slabs and MT sub-slabs, (ii) assigns each slab/sub-slab and the associated operations to one of the processors/cores, and (iii) uses MPI for inter-processor data communication and OpenMP for intra-processor data exchange. The MPI/OpenMP parallel AIM is used to accelerate the solution of the combined-field integral equation pertinent to the analysis of time-harmonic electromagnetic scattering from perfectly conducting surfaces. The scalability of the scheme is investigated theoretically and verified on a state-of-the-art multi-core cluster for benchmark scattering problems. Timing and speedup results on up to 1024 quad-core processors show that the hybrid MPI/OpenMP parallelization of AIM exhibits better strong scalability (fixed-problem-size speedup) than pure MPI parallelization when multiple cores are used on each processor.
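
The nested 1-D decomposition is plain index arithmetic: M processor slabs, each cut into T core sub-slabs. A toy rendering with invented NZ, M, and T values:

    #include <stdio.h>

    int main(void)
    {
        const int NZ = 1024, M = 8, T = 4;  /* grid planes, processors, cores */
        for (int m = 0; m < M; m++) {
            int s0 = m * NZ / M, s1 = (m + 1) * NZ / M;      /* MPI slab */
            for (int t = 0; t < T; t++) {
                int u0 = s0 + t * (s1 - s0) / T;             /* OpenMP sub-slab */
                int u1 = s0 + (t + 1) * (s1 - s0) / T;
                printf("proc %d core %d: planes [%d, %d)\n", m, t, u0, u1);
            }
        }
        return 0;
    }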

18.
Important components of molecular modeling applications are estimation and minimization of the internal energy of a molecule. For macromolecules such as proteins and amino acids, energy estimation is performed using empirical equations known as force fields. Over the past several decades, much effort has been directed towards improving the accuracy of these equations, and the resulting increased accuracy has come at the expense of greater computational complexity. For example, the interactions between a protein and surrounding water molecules have been modeled with improved accuracy using the generalized Born solvation model, which increases the computational complexity to O(n³). Fortunately, many force-field calculations are amenable to parallel execution. This paper describes the steps that were required to transform the Born calculation from a serial program into a parallel program suitable for parallel execution in both the OpenMP and MPI environments. Measurements of the parallel performance on a symmetric multiprocessor reveal that the Born calculation scales well for up to 144 processors. In some cases the OpenMP implementation scales better than the MPI implementation, but in other cases the MPI implementation scales better than the OpenMP implementation. However, in all cases the OpenMP implementation performs better than the MPI implementation, and requires less programming effort as well.
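
As a flavor of how such pairwise force-field sums parallelize in OpenMP (a bare Coulomb-style sum with invented coordinates and charges, not the generalized Born expression from the paper): a reduction handles the shared energy accumulator, and a dynamic schedule evens out the triangular loop's skewed load.

    #include <omp.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NATOMS 2000   /* hypothetical atom count */

    int main(void)
    {
        double (*x)[3] = malloc(sizeof(double[NATOMS][3]));
        double *q = malloc(sizeof(double[NATOMS]));
        for (int i = 0; i < NATOMS; i++) {
            q[i] = (i % 2) ? 1.0 : -1.0;                /* toy charges */
            for (int d = 0; d < 3; d++)
                x[i][d] = (double)rand() / RAND_MAX * 50.0;
        }

        double E = 0.0;
        /* Triangular i-loop: early i values carry more pairs, so a
           dynamic schedule balances the threads. */
        #pragma omp parallel for schedule(dynamic, 16) reduction(+:E)
        for (int i = 0; i < NATOMS; i++)
            for (int j = i + 1; j < NATOMS; j++) {
                double dx = x[i][0] - x[j][0];
                double dy = x[i][1] - x[j][1];
                double dz = x[i][2] - x[j][2];
                E += q[i] * q[j] / sqrt(dx*dx + dy*dy + dz*dz);
            }
        printf("E = %f\n", E);
        free(x); free(q);
        return 0;
    }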
