Found 20 similar documents.
1.
2.
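Directive Extensions for Cluster OpenMP Systems (total citations: 1; self-citations: 0; citations by others: 1)
OpenMP has become the programming standard for shared-memory architectures because it is easy to use and supports incremental parallelization. A cluster OpenMP system implements the OpenMP computing environment on a cluster, combining OpenMP's ease of programming with the scalability of clusters. OpenMP programs are written mainly in one of two styles, loop-level or SPMD; the loop-level style is easy to program, whereas the SPMD style is difficult. Achieving high performance on a cluster OpenMP system, however, requires the SPMD style. This paper describes a simple subset of OpenMP directive extensions suited to cluster OpenMP systems (including data-distribution directives and loop-scheduling modes) and implements it on the cluster OpenMP system OpenMP/JIAJIA. Application tests show that programming with these directive extensions retains the programmability of the loop-level style while achieving performance comparable to the SPMD style, making it an effective way to program.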
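To make the loop-level versus SPMD distinction concrete, here is a minimal Python sketch of the index arithmetic that an SPMD-style program writes by hand and that a loop-level directive hides. It is not OpenMP/JIAJIA code and does not show the paper's directive syntax. The names `block_range` and `spmd_sum` are hypothetical, and the block partition is only an assumed, commonly used choice.

```python
def block_range(n, nprocs, rank):
    """Return the half-open iteration range [lo, hi) owned by `rank`.

    Assumption: a plain block partition in which the first n % nprocs
    ranks each receive one extra iteration.
    """
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def spmd_sum(data, nprocs):
    """Simulate SPMD execution sequentially: each rank sums its own block."""
    partials = []
    for rank in range(nprocs):
        lo, hi = block_range(len(data), nprocs, rank)
        partials.append(sum(data[lo:hi]))  # work on the rank's own block only
    return sum(partials)  # final reduction over the per-rank results
```

In the loop-level style the programmer would write only the loop and leave this partitioning to the runtime; the abstract's claim is that directive extensions can recover the locality of the hand-partitioned version without the hand-written bookkeeping.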
3.
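Design and Implementation of a Cluster OpenMP System (total citations: 5; self-citations: 0; citations by others: 5)
OpenMP has become the programming standard for shared-memory architectures because it is easy to use and supports incremental parallelization. Cluster systems are now the mainstream platform for high-performance computing, so research on cluster OpenMP systems is valuable for advancing the development and adoption of parallel applications. Using the software DSM system JIAJIA as the OpenMP runtime system, together with a front-end compiler, OMP2JIA, the authors implemented the OpenMP/JIAJIA computing environment on a cluster. To improve performance, they also extended the OpenMP directives to suit cluster characteristics and optimized the back-end runtime library. Using 11 OpenMP applications, the authors compared the performance of this environment with that of a hardware cc-NUMA system that supports OpenMP (SGI 2100). The results show an average speedup of 4.62 on 7 machines for the authors' cluster OpenMP system and 4.55 for the SGI 2100 system, so the two perform comparably.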
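The reported speedups can be turned into parallel efficiency (speedup divided by processor count) with a short calculation. This sketch assumes the SGI 2100 figure was also measured on 7 processors, which the abstract implies but does not state outright; the function name is hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency: achieved speedup as a fraction of the ideal."""
    return speedup / processors


# Figures quoted in the abstract; the processor count for the SGI 2100
# is an assumption (taken to be the same 7 as for the cluster).
cluster_eff = parallel_efficiency(4.62, 7)
sgi_eff = parallel_efficiency(4.55, 7)

print(f"cluster OpenMP efficiency: {cluster_eff:.2f}")
print(f"SGI 2100 efficiency:       {sgi_eff:.2f}")
```

Under that assumption both systems run at roughly two-thirds of ideal efficiency, which is consistent with the abstract's conclusion that the two are comparable.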
4.
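As cluster systems grow in scale and component count, the combined failure rate of the cluster also keeps rising. Node failures cause jobs running on cluster nodes to fail midway, wasting large amounts of resources and even leaving many jobs unable to complete. Checkpoint systems give nodes good fault tolerance and have therefore become an important part of cluster operating system software. A process's address space is an important part of what a checkpoint system must record, and how efficiently it is stored directly affects the performance of checkpoint operations. This paper proposes two optimized strategies for storing process address spaces in checkpoint systems. The combined checkpoint-file write strategy resolves the sharp performance drop that the concurrent-write mechanism suffers when application memory approaches physical memory. The A-O (Access-Order) process address space storage strategy reorders the storage of the traditional address space, greatly improving checkpoint performance for large-memory applications. In experiments, the A-O strategy reduced the time overhead of the traditional storage strategy to as little as 50% of the original.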
5.
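Research on a Shared-Memory-Based Checkpoint Mechanism for Cluster Services (total citations: 1; self-citations: 0; citations by others: 1)
To address the high system cost and long recovery time of existing cluster service checkpoints based on stable storage, this paper proposes a shared-memory-based checkpoint mechanism for cluster services. It designs a checkpoint information management protocol for a shared-memory-based primary-backup storage mode, ensuring the consistency of cluster service checkpoint information, and a checkpoint group management protocol based on a unidirectional logical ring, ensuring consistent membership views among the checkpoint processes in the logical backup ring. Performance experiments show that the mechanism delivers good read and write performance for checkpoint information, that the group management protocol has low system overhead, and that the mechanism meets the needs of cluster service checkpointing well.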
6.
7.
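As a shared-memory parallel programming standard, OpenMP has become one of the mainstream models for parallel program design thanks to its ease of use and support for incremental parallelization. The OpenMP standard was designed for UMA shared-memory architectures, so its loop-scheduling mechanism considers only load balance and need not consider data distribution. In a cluster OpenMP system, however, data locality is a key factor in performance. To address the unsuitability of the standard's static scheduling strategy for cluster computing, this paper proposes the LBS scheduling algorithm, which fully embodies the owner-computes rule, and implements it through directive extensions on the cluster OpenMP system OpenMP/JIAJIA. Test results show that the LBS algorithm is very effective for cluster OpenMP systems.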
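The owner-computes rule mentioned above can be illustrated with a small Python sketch. It is not the LBS algorithm, whose details the abstract does not give. It only shows the general idea of assigning each iteration to the node that holds the data it touches, and compares that with a cyclic assignment that ignores data placement. The block data distribution and all function names are assumptions made for illustration.

```python
def home_of(index, n, nprocs):
    """Node that owns element `index` under an assumed block distribution."""
    block = -(-n // nprocs)  # ceiling division: elements per node
    return index // block


def owner_computes_schedule(n, nprocs):
    """Assign each loop iteration i to the node that owns element i."""
    schedule = {rank: [] for rank in range(nprocs)}
    for i in range(n):
        schedule[home_of(i, n, nprocs)].append(i)
    return schedule


def cyclic_schedule(n, nprocs):
    """Assign iterations round-robin, ignoring where the data lives."""
    schedule = {rank: [] for rank in range(nprocs)}
    for i in range(n):
        schedule[i % nprocs].append(i)
    return schedule


def remote_accesses(schedule, n, nprocs):
    """Count iterations that run on a node other than the data's home."""
    return sum(
        1
        for rank, iters in schedule.items()
        for i in iters
        if home_of(i, n, nprocs) != rank
    )
```

With 8 elements on 4 nodes, the owner-computes schedule yields no remote accesses in this model, while the cyclic schedule sends 6 of the 8 iterations to a node that does not hold the data. In a software DSM each such access may trigger a page transfer, which is why locality matters more on a cluster than on a UMA machine.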
8.
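Support and Challenges That DSM Architectures Present to Parallel Compilation Systems (total citations: 1; self-citations: 0; citations by others: 1)
Distributed-memory systems have been widely adopted because of their good scalability, and a variety of such systems have emerged. For ease of programming, however, users often want the underlying architecture to be transparent; that is, they want the whole system to have a unified global address space, which led to the emergence of distributed shared memory (DSM) systems. Depending on the degree of hardware support, different DSM systems offer different support to, and impose different requirements on, parallel compilation systems. Drawing on past work, and by comparing and analyzing DSM systems with hardware-supported and software-supported global addressing, this paper identifies the challenges that DSM systems pose for the development of parallel compilers, as a contribution to the future design and implementation of parallel compilation systems and to users' choice of programming style.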
9.
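The Beowulf project's idea of "using COTS technology to meet special computing needs" has made cluster computing an important school of high-performance computing. Targeting the characteristics of the Intel microprocessors in Beowulf-class clusters, this paper discusses BLAS optimization techniques and evaluates performance on a Beowulf-class cluster that uses a software DSM system as its parallel programming environment.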
10.
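PDSM: A Portable Distributed Shared Memory System (total citations: 1; self-citations: 0; citations by others: 1)
1. Introduction. Scientific computing is a rapidly developing discipline. Traditionally, its problems have been solved on supercomputers or workstation clusters. Parallel programs on independent computers run in network parallel computing and distributed programming environments such as PVM, where nodes communicate by message passing. However, because programmers must understand the low-level details of message passing, PVM-based parallel programming is very difficult, and scientists have little time to spare for meticulous program design. DSM (distributed shared memory) reduces this programming complexity by building a shared-memory abstraction layer on top of a workstation cluster.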
11.
12.
13.
Angelos Bilas Dongming Jiang Jaswinder Pal Singh 《Journal of Parallel and Distributed Computing》2003,63(12):1257-1276
Although the shared memory abstraction is gaining ground as a programming abstraction for parallel computing, the main platforms that support it, small-scale symmetric multiprocessors (SMPs) and hardware cache-coherent distributed shared memory systems (DSMs), seem to lie inherently at the extremes of the cost-performance spectrum for parallel systems. In this paper we examine if shared virtual memory (SVM) clusters can bridge this gap by examining how application performance scales on a state-of-the-art shared virtual memory cluster. We find that: (i) The level of application restructuring needed is quite high compared to applications that perform well on a DSM system of the same scale and larger problem sizes are needed for good performance. (ii) However, surprisingly, SVM performs quite well for a fairly wide range of applications, achieving at least half the parallel efficiency of a high-end DSM system at the same scale and often much more.
14.
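Integrating Distributed Shared Memory, Memory Management, and the File System in an Operating System (total citations: 3; self-citations: 0; citations by others: 3)
Distributed shared memory systems are developed to build a logically shared memory model on top of distributed memory, yet traditional distributed shared memory systems have paid too little attention to how virtual spaces for user processes should be constructed on top of that model. Only by combining distributed shared memory technology, memory management, and the file system within the operating system can the capabilities of distributed shared memory be fully exploited. Based on this idea, the paper proposes an operating system model that implements distributed shared memory, analyzes a prototype implementation of the model, and discusses the prototype's strengths and weaknesses. By eliminating the process's logical address space in the operating system so that processes run directly on files, the model not only implements distributed shared memory but also has many advantages over many traditional operating systems and traditional distributed shared memory systems. The operating system thus seamlessly integrates distributed shared memory technology with the operating system's memory management and file system.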
15.
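Techniques and Implementation of Distributed Shared Memory (total citations: 3; self-citations: 0; citations by others: 3)
Distributed shared memory combines the advantages of distributed-memory and shared-memory architectures, offering scalability, generality, and convenience. This paper discusses the problems that arise in implementing DSM systems, and the work done and measures taken in DSM software and hardware.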
16.
Distributed systems that consist of workstations connected by high performance interconnects offer computational power comparable to moderate size parallel machines. Middleware like distributed shared memory (DSM) or distributed shared objects (DSO) attempts to improve the programmability of such hardware by presenting to application programmers interfaces similar to those offered by shared memory machines. This paper presents the portable Indigo data sharing library which provides a small set of primitives with which arbitrary shared abstractions are easily and efficiently implemented across distributed hardware platforms. Sample shared abstractions implemented with Indigo include DSM as well as fragmented objects, where the object state is split across different machines and where interfragment communications may be customized to application-specific consistency needs. The Indigo library's design and implementation are evaluated on two different target platforms: a workstation cluster and an IBM SP2 machine. As part of this evaluation, a novel DSM system and consistency protocol are implemented and evaluated with several high performance applications. Application performance attained with the DSM system is compared to the performance experienced when utilizing the underlying basic message-passing facilities or when employing Indigo to construct customized fragmented objects implementing the application's shared state. Such experimentation results in insights concerning the efficient implementation of DSM systems (e.g. how to deal with false sharing). It also leads to the conclusion that Indigo provides a sufficiently rich set of abstractions for efficient implementation of the next generation of parallel programming models for high performance machines. © 1998 John Wiley & Sons, Ltd.
17.
Abandah G.A. Davidson E.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(2):206-216
In a distributed shared memory (DSM) multiprocessor, the processors cooperate in solving a parallel application by accessing the shared memory. The latency of a memory access depends on several factors, including the distance to the nearest valid data copy, data sharing conditions, and traffic of other processors. To provide a better understanding of DSM performance and to support application tuning and compiler development for DSM systems, this paper extends microbenchmarking techniques to characterize the important aspects of a DSM system. We present an experiment-based methodology for characterizing the memory, communication, scheduling, and synchronization performance, and apply it to the Convex SPP1000. We present carefully designed microbenchmarks to characterize the performance of the local and remote memory, producer-consumer communication involving two or more processors, and the effects on performance when multiple processors contend for utilization of the distributed memory and the interconnection network.
18.
19.
TreadMarks: shared memory computing on networks of workstations (total citations: 2; self-citations: 0; citations by others: 2)
Amza C. Cox A.L. Dwarkadas S. Keleher P. Honghui Lu Rajamony R. Weimin Yu Zwaenepoel W. 《Computer》1996,29(2):18-28
Shared memory facilitates the transition from sequential to parallel processing. Since most data structures can be retained, simply adding synchronization achieves correct, efficient programs for many applications. We discuss our experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system. DSM allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory. We illustrate a DSM system consisting of N networked workstations, each with its own memory. The DSM software provides the abstraction of a globally shared memory, in which each processor can access any data item without the programmer having to worry about where the data is or how to obtain its value.
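The abstraction described here, N nodes with private memories presenting one shared address space, can be modeled in a few lines of Python. This is a toy model for illustration only and does not reflect how TreadMarks is implemented: it has no caching, no consistency protocol, and no real network. The class `ToyDSM`, its page-to-node mapping, and the `remote_ops` counter are all hypothetical.

```python
class ToyDSM:
    """Toy model of a page-based DSM: each node holds some of the pages."""

    def __init__(self, nnodes, page_size=4):
        self.nnodes = nnodes
        self.page_size = page_size
        self.memory = [dict() for _ in range(nnodes)]  # per-node page store
        self.remote_ops = 0  # accesses served by a node other than the caller

    def _locate(self, addr):
        """Map a global address to (home node, page number, offset)."""
        page = addr // self.page_size
        # Assumption: pages are assigned to home nodes round-robin.
        return page % self.nnodes, page, addr % self.page_size

    def write(self, node, addr, value):
        home, page, offset = self._locate(addr)
        if home != node:
            self.remote_ops += 1  # would be a network message in a real DSM
        self.memory[home].setdefault(page, [0] * self.page_size)[offset] = value

    def read(self, node, addr):
        home, page, offset = self._locate(addr)
        if home != node:
            self.remote_ops += 1
        # Pages that were never written read as zeros.
        return self.memory[home].get(page, [0] * self.page_size)[offset]
```

A caller uses only global addresses, for example `dsm.write(0, 5, 42)` followed by `dsm.read(2, 5)`, and never names the node that stores the data; the counter makes visible the remote traffic that the abstraction hides from the programmer.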
20.
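ParGIP (parallel geographical image processing), a prototype system for large-scale parallel geographic image processing built on a shared virtual memory (SVM) PC cluster, adopts a client-server computing model. Through a software distributed shared memory (software DSM) middle layer, it organizes the PC cluster into a parallel computing platform with logically shared memory. Geographic image processing can exploit the large shared memory and parallel processing capability that ParGIP provides to improve performance and shorten processing time, overcoming the memory shortage and insufficient computing power of traditional single-machine serial processing. ParGIP also aggregates the disks distributed across the cluster's nodes to provide the massive storage space and parallel I/O capability that a geographic image library needs. Test results show that ParGIP achieves a parallel I/O bandwidth of 102.6 MB/s on 8 machines and that typical image processing algorithms achieve near-linear speedup.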