期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

迟利华刘杰《计算机应用》2010,30(Z1)

OpenMP是现代多核机群系统采用的主要并行编程模型之一,在单CPU多核上可以获得良好的加速性能,但在整个机群系统上使用时,需要解决可扩展性差的问题.首先设计了求解非平衡动力学方程的并行算法.基于分布共享的多核机群系统,采用显式数据分布OpenMP并行计算方法,将数据进行分布式划分,分配到每个OpenMP线程,通过数据共享实现数据交换.计算结果表明显式OpenMP并行程序在保持可读性的同时,具有良好的可扩展性,在4核Xeon处理器构成的分布共享机群系统上,非平衡动力学方程组的数值并行计算可以扩展到1024个CPU核,具有明显的并行加速计算效果. 相似文献

2.

OpenMP多核技术在颗粒流体力学方法GHM中的应用

魏朝磊闫民赵方《计算机工程与科学》2017,39(7):1234-1240

为了达到提高颗粒流体动力学方法 GHM计算效率的目标,分析了GHM模型的主要计算模块,抽取其中的可并行计算模块,基于多核计算机的硬件环境,应用OpenMP多线程并行计算模型,对采用数值积分方法求解颗粒运动方程的部分,实现求解过程的并行计算。最后通过多次实验验证程序的正确性及算法性能。实验结果表明,在Windows 7系统4核8线程处理器的计算机上,并行程序的并行加速比最高达到了2.5,说明OpenMP多核并行技术能较显著地提高GHM方法的计算性能。相似文献

3.

OpenMP的多核并行程序设计 总被引：3，自引：0，他引：3

黄猛《电脑编程技巧与维护》2009,(17):35-38

介绍一种多核并行编程标准OpenMP,对循环并行化的指令和使用方法进行详细解释,并给出实例证明使用OpenMP对多核环境下程序效率的提高。相似文献

4.

面向嵌入式多核的OpenMP扩展方法

下载免费PDF全文

王庆季振洲刘涛《计算机科学与探索》2011,5(1):81-86

为多核平台开发一种有效的编程方法已经成为并行软件研究的一个重要目标.在嵌入式多核平台上进行了OpenMP并行程序的有效的实施运行.针对嵌入式具有有限内存资源的特点,提出了通过扩展OpenMP自定义制导语句tiling来提高并行程序在嵌入式多核平台上的运行效率.扩展后的OpenMP并行程序支持循环分片,从而能够充分利用层... 相似文献

5.

应用GPU集群加速计算蛋白质分子场 总被引：3，自引：2，他引：1

张繁王章野姚建吴韬彭群生《计算机辅助设计与图形学学报》2010,22(3)

针对生物化学计算中采用量子化学理论计算蛋白质分子场所带来的巨大计算量的问题,搭建起一个GPU集群系统,用来加速计算基于量子化学的蛋白质分子场.该系统采用消息传递并行编程环境(MPI)连接集群各结点,以开放多线程OpenMP编程标准作为多核CPU编程环境,以CUDA语言作为GPU编程环境,提出并实现了集群系统结点中GPU和多核CPU协同计算的并行加速架构优化设计.在保持较高计算精度的前提下,结合MPI,OpenMP和CUDA混合编程模式,大大提高了系统的计算性能,并对不同体系和规模的蛋白质分子场模拟进行了计算分析.与相应的CPU集群、GPU单机和CPU单机计算方法对比,该GPU集群大幅度地提高了高分辨率复杂蛋白质分子场模拟的计算效率,比CPU集群的平均计算加速比提高了7.5倍. 相似文献

6.

多核计算环境下改进的主从式并行遗传算法

谢克家刘昕王成良杨少晨《微计算机信息》2011,(3)

遗传算法作为通用而有效的全局搜索算法已在图像处理、自动控制等众多领域获得应用,但其计算量大、极耗计算资源,运行效率直接影响到复杂的非线性和多维空间寻优问题的求解效率。在分析OpenMP并行技术特点的基础上,针对主从式并行模型没有充分利用遗传算法内在并行性的问题,提出了一种改进的主从式并行遗传算法,并应用OpenMP编程模型在多核计算环境下实现。利用旅行商问题进行的实验表明,改进的并行遗传算法有更好的计算效率、扩展性,可在求解大规模TSP问题上有更广泛和高效的应用。相似文献

7.

非线性方程的多核并行蒙特卡洛求解方法

王克 ;薛小超 ;朱朋海《现代计算机》2014,(7):38-44

采用多核技术。通过使用Win32API、OpenMP、MPI三种并行模式将求解非线性方程的蒙特卡洛方法的串行方法并行化,得到5种并行化方法。根据数值试验结果,对各种并行方法进行比较,发现Win32API并行模式计算速度最快,最终将结论加以推广。相似文献

8.

基于多核并行计算的无人机CFD数值模拟

下载免费PDF全文

姜悦宁贾宏光厉明《计算机工程与应用》2018,54(7):221-225

随着计算机系统向多处理器多核架构发展，针对航空工程技术高精度大规模求解问题，提出了一种高效、准确的无人机CFD（Computational Fluid Dynamics）数值模拟多核并行计算方法。对典型机翼进行了串行解法和并行解法的数值校验，验证了以N-S（Navier-Stokes）方程为主控方程的数值求解方法的正确性。接着对比了串行计算和并行计算的时间、加速比等性能，最终获得了针对无人机整机的一种高效的并行计算方法，并对某无人机整机纵向气动特性进行了数值计算。采用FLUENT脚本记录功能，实现不同工况按编译顺序自动计算、保存数据和切换，提高了计算效率。从模块角度详细介绍了CFD串行、并行计算的工作原理，为采用并行计算的方法提供了依据。相似文献

9.

面向神威高性能多核处理器的并行编译优化方法

周雍浩徐金龙李斌钱宏聂凯《计算机工程》2022,48(9):130-138

在神威高性能多核服务器上,自动并行化编译系统为识别和申明程序中的并行性,产生的OpenMP程序没有经过充分的优化,其采用简单的fork-join模型,存在大量的并行循环嵌套,导致运行效率低。为提升自动并行化编译系统产生的OpenMP程序的运行效率,提出一种并行域重构优化技术。并行域重构技术通过合并程序中的并行域和扩展嵌套循环中的并行域范围,减少OpenMP程序的并行域数目,降低线程组频繁创建和合并等控制开销,将简单fork-join模型的OpenMP程序转换为性能更为高效的单程序多数据模型的OpenMP程序。实验结果表明,在新一代神威高性能多核服务器SW1621平台上,并行域重构技术在NPB3.3-OMP测试集和SPEC OMP2012测试集上的运行效率分别提高了10.77%和7.94%的,可有效提升自动并行化编译系统OpenMP程序的执行效率。相似文献

10.

基于OpenMP的分子动力学并行算法的性能分析与优化

白明泽程丽豆育升孙世新《计算机应用》2012,32(1):163-166

为提高分子动力学模拟在共享内存式服务器上的计算速度,对基于OpenMP的分子动力学并行算法(Critical方法)进行了性能分析与优化。通过在多核服务器上的测试,以及加速比和并行效率的计算分析了Critical方法的并行性能,进而提出优化的三角形方法。所提方法中每个线程所计算的粒子数固定,且粒子数目呈阶梯状上升,使得各线程能够错时到达临界区。从而使程序在临界区的闲置时间比Critical方法减半,加速比明显提高。相似文献

11.

A parallel CFD rotor code using OpenMP

《Advances in Engineering Software》2001,32(8):665-671

The extended full-potential (FPX) helicopter rotor computational fluid dynamics (CFD) code of Fortran in its reduced two-dimensional version is successfully converted into a parallel version for multiprocessing. The FPX code with an internal grid generator solves the compressible full-potential equation using an approximately factored finite-difference scheme with added numerous physical modeling enhancements, including viscous boundary layers, shock-induced entropy corrections and wake-vortex embedding. The parallel version of the code uses open multi-processing (OpenMP) directives as parallel programming tool in shared-memory (SM) environment. The OpenMP code is portable and scalable, which can run on various computer platforms including UNIX platforms and Windows NT platforms. The performance study of the parallel code on SGI Origin 2000 UNIX platform is made. The results show that reasonable speedups through parallelization are obtained and that OpenMP is easy to use and an efficient parallel programming tool for the present problem. 相似文献

12.

Supporting soft real-time parallel applications on multiprocessors

《Journal of Systems Architecture》2014,60(2):152-164

The prevalence of multicore processors has resulted in the wider applicability of parallel programming models such as OpenMP and MapReduce. A common goal of running parallel applications implemented under such models is to guarantee bounded response times while maximizing system utilization. Unfortunately, little previous work has been done that can provide such performance guarantees. In this paper, this problem is addressed by applying soft real-time scheduling analysis techniques. Analysis and conditions are presented for guaranteeing bounded response times for parallel applications under global EDF multiprocessor scheduling. 相似文献

13.

OpenMP‐oriented applications for distributed shared memory architectures

Ami Marowka Zhenying Liu Barbara Chapman 《Concurrency and Computation》2004,16(4):371-384

The rapid rise of OpenMP as the preferred parallel programming paradigm for small‐to‐medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model‐of‐choice for large scale high‐performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc‐NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy‐inside–copy‐back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automat and implicit approaches. Moreover, the CC approach improves on the OpenMP‐SPMD style of programming that makes the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse‐grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

14.

基于多核微机的微粒群并行算法 总被引：4，自引：1，他引：3

下载免费PDF全文

陈华范宜仁邓少贵李智强《计算机工程与应用》2010,46(13):34-36

提出了一种基于Logistic模型的惯性权重非线性调整策略,采用OpenMP多线程编程,在微机上实现了微粒群算法的多核并行计算。通过对BenchMark测试函数集中的5个函数进行测试,试验结果表明,采用基于Logistic模型的惯性权重非线性调整策略在算法成功率和收敛代数都优于线性调整策略,而基于OpenMP的微粒群多核并行计算使得计算速度得到提高。相似文献

15.

Performance comparison of MPI and OpenMP on shared memory multiprocessors

Graud Krawezik Franck Cappello 《Concurrency and Computation》2006,18(1):29-61

When using a shared memory multiprocessor, the programmer faces the issue of selecting the portable programming model which will provide the best performance. Even if they restricts their choice to the standard programming environments (MPI and OpenMP), they have to select a programming approach among MPI and the variety of OpenMP programming styles. To help the programmer in their decision, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmark (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We have developed the first SPMD OpenMP version of the NAS benchmark and gathered other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI for a large set of experimental conditions. Not surprisingly, the two best OpenMP versions are those requiring the strongest programming effort. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

16.

Design and implementation of parallel approximate inverse classes using OpenMP

Konstantinos M. Giannoutakis George A. Gravvanis 《Concurrency and Computation》2009,21(2):115-131

A new parallel normalized optimized approximate inverse algorithm, based on the concept of antidiagonal wave pattern, for computing classes of explicitly approximate inverses, is introduced for symmetric multiprocessor systems. The parallel normalized explicit approximate inverses are used in conjunction with parallel normalized explicit preconditioned conjugate gradient schemes for the efficient solution of finite element sparse linear systems. The parallel design and implementation issues of the new algorithm are discussed and the parallel performance is presented using OpenMP. Copyright © 2008 John Wiley & Sons, Ltd. 相似文献

17.

基于Docker的MPI和OpenMP混合编程

赵博颖肖鹏张力《计算机与现代化》2018,(5):60

针对当前搭建集群并行系统复杂且耗时等问题,提出基于Docker搭建并行系统。介绍轻量级虚拟化技术Docker的核心概念和基本架构,并基于Docker技术在Linux平台上搭建集群并行开发环境。简要阐述并行计算的思想,叙述MPI和OpenMP并行计算的基本概念和特点,针对矩阵并行乘法的算法建立MPI和OpenMP的混合编程模型,并给出混合编程模型与MPI并行编程模型以及OpenMP并行编程模型的性能对比,分析出现差异的原因。基于该混合编程模型比较Docker与传统物理机两者搭建的并行系统的并行效率。相似文献

18.

High-Scalability Parallelization of a Molecular Modeling Application: Performance and Productivity Comparison Between OpenMP and MPI Implementations

Russell Brown Ilya Sharapov 《International journal of parallel programming》2007,35(5):441-458

Important components of molecular modeling applications are estimation and minimization of the internal energy of a molecule. For macromolecules such as proteins and amino acids, energy estimation is performed using empirical equations known as force fields. Over the past several decades, much effort has been directed towards improving the accuracy of these equations, and the resulting increased accuracy has come at the expense of greater computational complexity. For example, the interactions between a protein and surrounding water molecules have been modeled with improved accuracy using the generalized Born solvation model, which increases the computational complexity to O (n ³). Fortunately, many force-field calculations are amenable to parallel execution. This paper describes the steps that were required to transform the Born calculation from a serial program into a parallel program suitable for parallel execution in both the OpenMP and MPI environments. Measurements of the parallel performance on a symmetric multiprocessor reveal that the Born calculation scales well for up to 144 processors. In some cases the OpenMP implementation scales better than the MPI implementation, but in other cases the MPI implementation scales better than the OpenMP implementation. However, in all cases the OpenMP implementation performs better than the MPI implementation, and requires less programming effort as well. Trademark Legend Sun, Sun Microsystems, SPARC, UltraSPARC, Sun Fire, Sun Performance Library and Sun HPC Cluster Tools are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. 相似文献

19.

多层次并行体绘制算法的研究与应用 总被引：1，自引：0，他引：1

洪振刚罗省贤《计算机工程与科学》2009,31(Z1)

三维数据场的体绘制技术是科学可视化中一个重要的研究方向,本文在研究和总结体绘制的发展历程与关键技术的基础之上,着重研究了体绘制中的光线投射算法,结合多核处理器机群系统,提出并实现了一种基于多层次并行编程模型的并行光线投射体绘制算法,并成功地将该算法应用于三维城市浅层地质模型,取得了良好的可视化效果。分别对MPI环境和多层次并行编程MPI+OpenMP环境下的光线投射算法进行了不同计算规模的性能比较实验。实验和分析表明,多层次并行光线投射体绘制算法加快了体绘制的速度,MPI+OpenMP多层次并行模型性能高于纯MPI编程模型的性能。相似文献