期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

王国仁于戈叶峰郑怀远《计算机学报》1999,22(10):1032-1041

提出了一个基于分布式共享虚拟存储器技术的并行Ｈａｓｈ连接算法,然后设计了一个并行连接算法的测试评价基准,并评价和分析了该算法在均匀情况下３个不同负载的性能比较和Ｚｉｐｆ顺斜数据分布情况下两种度策略的算法性能。同时与其它并行连接算法进行性能比较与分析。相似文献

2.

分布并行系统的并行程序设计环境 总被引：1，自引：0，他引：1

温钰洪沈美明《小型微型计算机系统》1995,16(2):40-44

分布式并行计算机系统中，由于没有共享内存以支持处理机间的数据交换，因而需采用ｍｅｓｓａｇｅｐａｓｓｉｎｇ的方式实现并行计算中处理机间的数据通讯，并行程序设计环境作为程序员使用并行计算机系统工具，对于并行处理技术以及并行计算机系统的发展与推广应用都有重要的作用，本文将分布基于ｍｅｓｓａｇｅｐａｓｓｉｎｇ的并行计算机系统中的并行程序设计环境的基本问题，并介绍几种典型的并行程序设计环境。相似文献

3.

DSM体系结构对并行编译系统的支持与挑战 总被引：1，自引：0，他引：1

曾丽芳曾劲松《计算机工程与应用》2003,39(1):72-75

分布存储系统因其可伸缩性好而得到很好的应用,不同的分布存储系统应运而生。而用户为了编程的简单方便,往往要求底层体系结构是透明的,即要求整个系统有统一的全局地址空间,因而促使了分布共享存储(DSM)系统的出现。根据硬件支持程度的不同,不同DSM系统对并行编译系统的支持和要求也不同。根据过去的工作经验,通过对比和分析硬件支持全局编址和软件支持全局编址的两种DSM系统的特点,该文指出分布共享存储系统对并行编译器的研制所提出的挑战,为以后并行编译系统的设计和实现及用户编程方式的选择有一定的促进作用。相似文献

4.

Scalable shared-memory multiprocessor architectures

Thakkar S. Dubois M. Laundrie A.T. Sohi G.S. 《Computer》1990,23(6):71-74

Directory-based and bus-based cache coherence schemes are defined and described. Directory-based schemes can be classified as centralized or distributed. Both categories support local caches to improve processor performance and reduce traffic in the interconnection. Schemes using presence flags, B pointers, and linked lists are discussed. Bus-based systems provide uniform memory access to all processors. This memory organization allows a simpler programming model, making it easier to develop new parallel applications or to move existing applications from a uniprocessor to a parallel system. Two architectural variations of bus-based systems are described: multiple-bus and hierarchical architectures 相似文献

5.

A taxonomy of task-based parallel programming technologies for high-performance computing

Peter Thoman Kiril Dichev Thomas Heller Roman Iakymchuk Xavier Aguilar Khalid Hasanov Philipp Gschwandtner Pierre Lemarinier Stefano Markidis Herbert Jordan Thomas Fahringer Kostas Katrinis Erwin Laure Dimitrios S. Nikolopoulos 《The Journal of supercomputing》2018,74(4):1422-1434

Task-based programming models for shared memory—such as Cilk Plus and OpenMP 3—are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today. 相似文献

6.

Experimenting with a Shared Virtual Memory Environment for Hypercubes

《Journal of Parallel and Distributed Computing》1995,29(2):228-235

This paper describes the design and implementation of a shared virtual memory (SVM) system for the nCUBE 2 machine. The SVM system provides the user a single coherent address space across all nodes. It is implemented at the user level in a C programming environment using high level constructs to support data sharing. Shared variables are treated as objects rather than pages. We have improved upon an existing algorithm for maintaining coherency in the SVM system, thus achieving a reduction in the number of internode messages required in coherency maintenance. Detailed timing analysis is conducted to analyze the feasibility of this shared environment. Experimental results indicate that parallel programs running under an SVM system show linear speedup, suggesting that SVM systems could provide an effective programming environment for the next generation of distributed memory parallel computers. The bottleneck of this implementation is associated with the expensive interrupt handling capability of the nCUBE 2. 相似文献

7.

An interface to a reliable packet delivery service for parallelsystems

Debbage M. Hill M.B. Nicole D.A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(4):400-411

Modern distributed memory parallel computers provide hardware support for the efficient and reliable delivery of interprocessor messages. This facility needs to be accessed by lightweight protocols that do not waste the performance of the underlying hardware; the heavyweight layering techniques traditionally used in distributed systems are wholly inappropriate. A low-level communication interface is therefore presented which exploits modern architectures effectively, while maintaining a good match to existing parallel programming environments. The interface defines mechanisms to access an asynchronous reliable packet delivery service. It permits messaging protocols to be efficiently synthesized by considering the activity at their end-points alone. This arrangement effectively decouples the implementation of protocols from low-level architectural features, and hence aids the portability of parallel programming environments. Furthermore, the interface allows the communication network to be shared by multiple programming paradigms, giving additional flexibility over existing systems 相似文献

8.

Shared virtual memory clusters: bridging the cost-performance gap between SMPs and hardware DSM systems

Angelos Bilas Dongming Jiang Jaswinder Pal Singh 《Journal of Parallel and Distributed Computing》2003,63(12):1257-1276

Although the shared memory abstraction is gaining ground as a programming abstraction for parallel computing, the main platforms that support it, small-scale symmetric multiprocessors (SMPs) and hardware cache-coherent distributed shared memory systems (DSMs), seem to lie inherently at the extremes of the cost-performance spectrum for parallel systems. In this paper we examine if shared virtual memory (SVM) clusters can bridge this gap by examining how application performance scales on a state-of-the-art shared virtual memory cluster. We find that: (i) The level of application restructuring needed is quite high compared to applications that perform well on a DSM system of the same scale and larger problem sizes are needed for good performance. (ii) However, surprisingly, SVM performs quite well for a fairly wide range of applications, achieving at least half the parallel efficiency of a high-end DSM system at the same scale and often much more. 相似文献

9.

SkelCL: a high-level extension of OpenCL for multi-GPU systems

Michel Steuwer Sergei Gorlatch 《The Journal of supercomputing》2014,69(1):25-33

Application development for modern high-performance systems with graphics processing units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. We present SkelCL—a high-level programming approach for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL makes three main enhancements to the OpenCL standard: (1) memory management is simplified using parallel container data types (vectors and matrices); (2) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs; (3) computations are conveniently expressed using parallel algorithmic patterns (skeletons). We demonstrate how SkelCL is used to implement parallel applications, and we report experimental evaluation of our approach in terms of programming effort and performance. 相似文献

10.

数据并行计算：概念,模型与系统 总被引：3，自引：2，他引：1

李晓明《计算机科学》2000,27(6):1-5

一、引言并行计算,或者并行处理,指的是这样一种努力和相关的研究:利用多个具有计算能力的部件来共同完成一个计算工作,以获得比用一个部件来完成要快的效果。这显然是一个很自然的想法。历史地看,几乎是自从有了计算机,就有了并行处理的想法和实践。在80年代后期到90年代初期,以寻求对人类面临的若干重相似文献

11.

Shared Variable Oriented Parallel Precompiler for SPMD Model

下载免费PDF全文

Kang Jichang Zhu Yi''an Hong Yuanlin Ying Bishan 《计算机科学技术学报》1995,10(5):476-480

For the moment,commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compliers,which are just traditional sequential FORTRAN or C compilers expanded with communication statements.Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for SPMD(Single Program Multiple Data) computation model and greatly ease the parallel programming with high communication efficiency.The core function of parallel C precompiler has been successfully verified on a transputer-based parallel computer.Its prominent performance shows that SVOPP is probably a break-through in parallel programming technique. 相似文献

12.

Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems

Michel Steuwer Malte Friese Sebastian Albers Sergei Gorlatch 《International journal of parallel programming》2014,42(4):601-618

Algorithmic skeletons simplify software development: they abstract typical patterns of parallelism and provide their efficient implementations, allowing the application developer to focus on the structure of algorithms, rather than on implementation details. This becomes especially important for modern parallel systems with multiple graphics processing units (GPUs) whose programming is complex and error-prone, because state-of-the-art programming approaches like CUDA and OpenCL lack high-level abstractions. We define a new algorithmic skeleton for allpairs computations which occur in real-world applications, ranging from bioinformatics to physics. We develop the skeleton’s generic parallel implementation for multi-GPU Systems in OpenCL. To enable the automatic use of the fast GPU memory, we identify and implement an optimized version of the allpairs skeleton with a customizing function that follows a certain memory access pattern. We use matrix multiplication as an application study for the allpairs skeleton and its two implementations and demonstrate that the skeleton greatly simplifies programming, saving up to 90 % of lines of code as compared to OpenCL. The performance of our optimized implementation is up to 6.8 times higher as compared with the generic implementation and is competitive to the performance of a manually written optimized OpenCL code. 相似文献

13.

基于分布存储系统的并行编译关键技术

贾明飞董渭清黄泳翔《计算机工程与应用》2003,39(22):103-106,152

分布存储系统的并行编译器需要解决各局部存储器之间数据分布问题和各处理机之间通信优化问题。论文并行编程模型、代码和数据分布、通信优化以及代码生成问题四个方面论述了基于分布存储系统的并行编译关键技术并提出了进一步研究所要解决的问题。相似文献

14.

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory

Torsten Hoefler James Dinan Darius Buntinas Pavan Balaji Brian Barrett Ron Brightwell William Gropp Vivek Kale Rajeev Thakur 《Computing》2013,95(12):1121-1136

Hybrid parallel programming with the message passing interface (MPI) for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40 % to the communication component of a five-point stencil solver. 相似文献

15.

Dual-Object: 面向对象的并行程序设计

袁伟孙永强《软件学报》1998,9(1):47-52

面向对象的并行程序设计提供了类似于共享内存模型对通讯和计算的抽象能力，从而非常适合于大型并行软件系统的开发．但是基于远程对象调用的分布式对象的实现效率一直是面向对象方法在分布式／并行程序设计中得到广泛应用的障碍．本文介绍了并行机ＭＡＮＮＡ上所采用的面向对象的并行程序设计模型——Ｄｕａｌ－Ｏｂｊｅｃｔ模型．该模型通过引入从语义角度出发给出的数据一致特性的描述，在一定程度上解决了实现效率低下的问题．其次，文章通过程序设计实例详细地讨论了基于Ｄｕａｌ－Ｏｂｊｅｃｔ模型的扩展Ｃ＋＋并行程序设计，并给出了部分实际测试结果．相似文献

16.

A graphical development and debugging environment for parallel programs

《Parallel Computing》1997,22(13):1747-1770

To provide high-level graphical support for PVM (Parallel Virtual Machine) based program development, a complex programming environment (GRADE) is being developed. GRADE currently provides tools to construct, execute, debug, monitor and visualize message-passing parallel programs. It offers a high-level graphical programming abstraction mechanism to construct parallel applications by introducing a new graphical language called GRAPNEL. GRADE also provides the programmer with the same graphical user interface during the program design and debugging stages. A distributed debugging engine (DDBG) assists the user in debugging GRAPNEL programs on distributed memory computer architectures. Tape/PVM and PROVE support the performance monitoring and visualization of parallel programs developed in the GRADE environment. 相似文献

17.

Design and Implementation of an Extended Collectives Library for Unified Parallel C

下载免费PDF全文

Carlos Teijeiro Guillermo L.Taboada Juan Tourio Ramón Doallo Jos C.Mourio Damivn A.Mallón Brian Wibecan 《计算机科学技术学报》2013,28(1):72-89

Unified Parallel C(UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space(PGAS) programming model,which provides a shared memory view that simplifies code development while it can take advantage of the scalability of distributed memory architectures.Therefore,UPC allows programmers to write parallel applications on hybrid shared/distributed memory architectures,such as multi-core clusters,in a more productive way,accessing remote memory by means of different high-level language constructs,such as assignments to shared variables or collective primitives.However,the standard UPC collectives library includes a reduced set of eight basic primitives with quite limited functionality.This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library,allowing,for example,the use of a specific source and destination thread or defining the amount of data transferred by each particular thread.This library fulfills the demands made by the UPC developers community and implements portable algorithms,independent of the specific UPC compiler/runtime being used.The use of a representative set of these extended collectives has been evaluated using two applications and four kernels as case studies.The results obtained confirm the suitability of the new library to provide easier programming without trading off performance,thus achieving high productivity in parallel programming to harness the performance of hybrid shared/distributed memory architectures in high performance computing. 相似文献

18.

Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems

Yung-Chang Chiu Tyng-Yeu Liang 《Parallel Computing》2011,37(1):11-25

Distributed shared memory (DSM) allows parallel programs to run on distributed computers by simulating a global virtual shared memory, but data racing bugs may easily occur when the threads of a multi-threaded process concurrently access the physically distributed memory. Earlier tools to help programmers locate data racing bugs in non-DSM parallel programs are not easily applied to DSM systems. This study presents the data race avoidance and replay scheme (DRARS) to assist debugging parallel programs on DSM or multi-core systems. DRARS is a novel tool which controls the consistency protocol of the target program, automatically preventing a large class of data racing bugs when the parallel program is subsequently run, obviating much of the need for manual debugging. For data racing bugs that cannot be avoided automatically, DRARS performs a deterministic replay-type function on DSM systems, faithfully reproducing the behavior of the parallel program during run time. Because one class of data racing bugs has already been eliminated, the remaining manual debugging task is greatly simplified. Unlike previous debugging methods, DRARS does not require that the parallel program be written in a specific style or programming language. Moreover, DRARS can be implemented in most consistency protocols. In this paper, DRARS is realized and verified in real experiments using the eager release consistency protocol on a DSM system with various applications. 相似文献

19.

An adaptive middleware design to support the dynamic interpretation of domain-specific models

《Information and Software Technology》2015

ContextAs the use of Domain-Specific Modeling Languages (DSMLs) continues to gain popularity, we have developed new ways to execute DSML models. The most popular approach is to execute code resulting from a model-to-code transformation. An alternative approach is to directly execute these models using a semantic-rich execution engine – Domain-Specific Virtual Machine (DSVM). The DSVM includes a middleware layer responsible for the delivery of services in a given domain.ObjectiveWe will investigate an approach that performs the dynamic combination of constructs in the middleware layer of DSVMs to support the delivery of domain-specific services. This middleware should provide: (a) a model of execution (MoE) that dynamically integrates decoupled domain-specific knowledge (DSK) for service delivery, (b) runtime adaptability based on context and available resources, and (c) the same level of operational assurance as any DSVM middleware.MethodOur approach will involve (1) defining a framework that supports the dynamic combination of MoE and DSK and (2) demonstrating the applicability of our framework in the DSVM middleware for user-centric communication. We will measure the overhead of our approach and provide a cost-benefit analysis factoring in its runtime adaptability using appropriate experimentation.ResultsOur experiments show that combining the DSK and MoE for a DSVM middleware allow us to realize efficient specialization while maintaining the required operability. We also show that the overhead introduced by adaptation is not necessarily deleterious to overall performance in a domain as it may result in more efficient operation selection.ConclusionThe approach defined for the DSVM middleware allows for greater flexibility in service delivery while reducing the complexity of application development for the user. These benefits are achieved at the expense of increased execution times, however this increase may be negligible depending on the domain. 相似文献

20.

基于Transputer的中粒度多任务管理的研究 总被引：1，自引：0，他引：1

陈勇刘心松《小型微型计算机系统》1995,16(10):6-11

Ｔｒａｎｓｐｕｔｅｒ是一种特别适于并行处理的处理器芯片，但是由于缺乏系统支撑软件使得对其开发显得很不方便。ＭＧＰＯＳ是一个基于Ｔｒａｎｓｐｕｔｅｒ网络的单用户多任务并行处理操作系统，它支持现有Ｔｒａｎｓｐｕｔｅｒ的ＯＣＣＡＭ程序设计模型，同时允许在装载时根据硬件资源情况对任务进行分配，此外所提供的存储管理接口和任务通信接口等功能为应用程序的开发提供了良好的基础。本文着重描述了该操作系统任务管理的设计思想及其关键技术，同时对其所特有的一些性能也进行了说明。该平台的建立为并行处理技术的进一步研究提供了有力的支持。相似文献