
1.

Partitioning and mapping of parallel programs by self-organization

Hans-Ulrich Heiss Marcus Dormanns 《Concurrency and Computation》1996,8(9):685-706

To execute a parallel program on a multicomputer system, the tasks of the program have to be mapped to the particular processors of the parallel machine. The aim of the mapping is twofold: (i) to achieve a balanced load on the processors (partitioning problem) and (ii) to keep communication delays low by placing communicating tasks closely together (mapping). Since both the communication structure of the program and the interconnection structure of the parallel machine can be represented as graphs, the mapping problem can be regarded as a graph embedding problem to minimize communication costs. As a new heuristic approach to this NP-hard problem we apply Kohonen's self-organizing maps to establish a topology-preserving embedding. Experimental results are presented and compared to other approaches to this problem. The most attractive feature of our new method is that it can be extremely well parallelized.
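
The two goals named in this abstract (balanced load and low communication cost) can be made concrete with a small sketch. It is not the authors' self-organizing-map heuristic; it only scores a given task-to-processor mapping. The ring topology and the names `ring_distance`, `comm_cost` and `load_imbalance` are assumptions made for illustration.

```python
from collections import Counter

def ring_distance(a, b, size):
    """Hop count between processors a and b on a ring of `size` nodes (assumed topology)."""
    d = abs(a - b)
    return min(d, size - d)

def comm_cost(task_edges, mapping, size):
    """Sum of traffic * hop distance over all communicating task pairs."""
    return sum(w * ring_distance(mapping[u], mapping[v], size)
               for u, v, w in task_edges)

def load_imbalance(mapping, size):
    """Difference between the most and the least loaded processor."""
    counts = Counter(mapping.values())
    loads = [counts.get(p, 0) for p in range(size)]
    return max(loads) - min(loads)

# A chain of four tasks 0-1-2-3 with unit traffic on every edge.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
good = {0: 0, 1: 1, 2: 2, 3: 3}  # neighbours sit on adjacent processors
bad = {0: 0, 1: 2, 2: 0, 3: 2}   # neighbours sit two hops apart, two processors idle

print(comm_cost(edges, good, 4), load_imbalance(good, 4))
print(comm_cost(edges, bad, 4), load_imbalance(bad, 4))
```

A mapping heuristic of any kind would search for assignments that keep both numbers small; the sketch only shows how a candidate is evaluated.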

2.

Performance evaluation and comparison of parallel conjugate gradient on modern multi-core accelerator and massively parallel systems

Fadi N. Sibai 《International Journal of Parallel, Emergent and Distributed Systems》2014,29(1):38-67

Two parallel computer paradigms available today are multi-core accelerators such as the Sony, Toshiba and IBM Cell or Graphics Processing Units (GPUs), and massively parallel message-passing machines such as the IBM Blue Gene (BG). The solution of systems of linear equations is one of the most central processing unit-intensive steps in engineering and simulation applications and can greatly benefit from the multitude of processing cores and vectorisation on today's parallel computers. We parallelise the conjugate gradient (CG) linear equation solver on the Cell Broadband Engine and the IBM Blue Gene/L machine. We perform a scalability analysis of CG on both machines across 1, 8 and 16 synergistic processing elements and 1–32 cores on BG with heptadiagonal matrices. The results indicate that the multi-core Cell system outperforms the massively parallel BG system by three to four times due to the Cell's higher communication bandwidth and accelerated vector processing capability.
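
For readers unfamiliar with the solver being benchmarked, the following is a minimal serial sketch of the conjugate gradient iteration. It is not the paper's Cell or Blue Gene implementation, and it uses a small tridiagonal matrix as a stand-in for the heptadiagonal matrices mentioned above; the function names are hypothetical. The matrix-vector product and the dot products are the steps a parallel version would distribute.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only as `matvec`."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small enough
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def tridiag_matvec(v):
    """Multiply by the SPD matrix with 2 on the diagonal and -1 on both off-diagonals."""
    n = len(v)
    return [2 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

expected = [1.0, 2.0, 3.0, 4.0]
b = tridiag_matvec(expected)         # build a right-hand side with a known solution
x = conjugate_gradient(tridiag_matvec, b)
print(x)
```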

3.

ParaGraph: Graph editor support for parallel programming environments

Duane A. Bailey Janice E. Cuny Craig P. Loomis 《International journal of parallel programming》1990,19(2):75-110

We report here on a graph editor, ParaGraph, that supports massively parallel programming. It provides a flexible mechanism for the concise specification of families of annotated graphs, addressing the problems of user annotation and scale independent graph manipulation. ParaGraph currently serves as the basis for tools supporting communication abstractions in program specification and debugging. Its foundation in an extended form of aggregate rewriting graph grammars makes its adaptation to other parallel programming environments straightforward. The Parallel Programming Environments Project at the University of Massachusetts is supported by the Office of Naval Research under Contract N000014-84-K-0647 and by the National Science Foundation under Grants DCR-8500332 and CCR-8712410.

4.

CAPLib—a ‘thin layer’ message passing library to support computational mechanics codes on distributed memory parallel systems

《Advances in Engineering Software》2001,32(1):61-83

The Computer Aided Parallelisation Tools (CAPTools) [Ierotheou C, Johnson SP, Cross M, Leggett PF, Computer aided parallelisation tools (CAPTools)—conceptual overview and performance on the parallelisation of structured mesh codes, Parallel Computing, 1996;22:163–195] is a set of interactive tools aimed at providing automatic parallelisation of serial FORTRAN Computational Mechanics (CM) programs. CAPTools analyses the user's serial code and then, through stages of array partitioning, mask and communication calculation, generates parallel SPMD (Single Program Multiple Data) message-passing FORTRAN. The parallel code generated by CAPTools contains calls to a collection of routines that form the CAPTools communications Library (CAPLib). The library provides a portable layer and user friendly abstraction over the underlying parallel environment. CAPLib contains optimised message passing routines for data exchange between parallel processes and other utility routines for parallel execution control, initialisation and debugging. By compiling and linking with different implementations of the library, the user is able to run on many different parallel environments. Even with today's parallel systems the concept of a single version of a parallel application code is more of an aspiration than a reality. However, for CM codes the data partitioning SPMD paradigm requires a relatively small set of message-passing communication calls. This set can be implemented as an intermediate ‘thin layer’ library of message-passing calls that enables the parallel code (especially that generated automatically by a parallelisation tool such as CAPTools) to be as generic as possible. CAPLib is just such a ‘thin layer’ message passing library that supports parallel CM codes, by mapping generic calls onto machine specific libraries (such as CRAY SHMEM) and portable general purpose libraries (such as PVM and MPI).
This paper describes CAPLib together with its three perceived advantages over other routes:

• as a high level abstraction, it is both easy to understand (especially when generated automatically by tools) and to implement by hand, for the CM community (who are not generally parallel computing specialists);
• the one parallel version of the application code is truly generic and portable;
• the parallel application can readily utilise whatever message passing libraries on a given machine yield optimum performance.

5.

User transparency: a fully sequential programming model for efficient data parallel image processing

F. J. Seinstra D. Koelma 《Concurrency and Computation》2004,16(6):611-644

Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high‐performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high‐performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD‐style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well‐known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi‐baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user‐friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd.

6.

PProto: An environment for prototyping parallel programs

Ramón D. Acosta 《Journal of Systems Integration》1991,1(3-4):339-365

This paper describes Parallel Proto (PProto), an integrated environment for constructing prototypes of parallel programs. Using functional and performance modeling of dataflow specifications, PProto assists in analysis of high-level software and hardware architectural tradeoffs. Facilities provided by PProto include a visual language and an editor for describing hierarchical dataflow graphs, a resource modeling tool for creating parallel architectures, mechanisms for mapping software components to hardware components, an interactive simulator for prototype interpretation, and a reuse capability. The simulator contains components for instrumenting, animating, debugging, and displaying results of functional and performance models. The PProto environment is built on top of a substrate for managing user interfaces and database objects to provide consistent views of design objects across system tools.

7.

Communication Benchmarking and Performance Modelling of MPI Programs on Cluster Computers

D. A. Grove P. D. Coddington 《The Journal of supercomputing》2005,34(2):201-217

This paper gives an overview of two related tools that we have developed to provide more accurate measurement and modelling of the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. This allows useful insights to be made into the MPI communication performance of parallel computers, and in particular how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program. The effects of network contention can be accurately modelled by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance, which is difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times rather than sampling from probability distributions can give misleading results, particularly for programs running on a large number of processors.
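
The closing claim, that average times can mislead as the processor count grows, can be illustrated with a toy calculation. This is not PEVPM or MPIBench; the message times are invented, and the model assumes a synchronising step that finishes only when the slowest of several independent messages arrives. An average-based model predicts the mean time regardless of the number of processes, while the distribution-based expectation grows.

```python
from itertools import product

def expected_max(samples, nprocs):
    """Exact mean of the slowest of `nprocs` independent, equally likely draws from `samples`."""
    total = 0.0
    for combo in product(samples, repeat=nprocs):
        total += max(combo)
    return total / len(samples) ** nprocs

# Hypothetical measured message times: mostly fast, occasionally delayed by contention.
times = [1.0, 1.0, 1.0, 5.0]
mean_time = sum(times) / len(times)

print(mean_time)                 # what an average-based model would use
print(expected_max(times, 1))    # one process: same as the mean
print(expected_max(times, 4))    # four processes: the slow outlier dominates
```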

8.

A communication-reduced and computation-balanced framework for fast graph computation

Yongli CHENG Fang WANG Hong JIANG Yu HUA Dan FENG Lingling ZHANG Jun ZHOU 《Frontiers of Computer Science》2018,12(5):887-907

The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graph-processing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high communication costs. These high communication costs mainly stem from the fine-grained message-passing communication model. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSP. We use this model to design and implement a high-performance distributed graph-processing framework called LCC-Graph. This framework eliminates high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also balances the computation workloads among all compute nodes by optimizing graph partitioning, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab.
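
As background for the communication cost discussed in this abstract, here is a single-process sketch of a generic vertex-centric BSP loop (send, barrier, compute) that counts one message per edge traversal. It illustrates the conventional fine-grained model only, not LCC-BSP or any named framework; the function name and the max-propagation task are chosen for illustration.

```python
def bsp_max_propagation(adjacency, values):
    """Spread the largest vertex value through a graph in BSP supersteps."""
    values = dict(values)
    active = set(adjacency)          # every vertex is active in the first superstep
    supersteps = 0
    messages = 0
    while active:
        # Communication phase: each active vertex sends its value to every neighbour.
        inbox = {v: [] for v in adjacency}
        for v in active:
            for nb in adjacency[v]:
                inbox[nb].append(values[v])
                messages += 1
        # Barrier, then computation phase: a vertex stays active only if it improved.
        active = set()
        for v, received in inbox.items():
            if received and max(received) > values[v]:
                values[v] = max(received)
                active.add(v)
        supersteps += 1
    return values, supersteps, messages

# Path graph 0 - 1 - 2 with initial values 3, 1, 2.
graph = {0: [1], 1: [0, 2], 2: [1]}
result, steps, sent = bsp_max_propagation(graph, {0: 3, 1: 1, 2: 2})
print(result, steps, sent)
```

Even on this three-vertex graph, seven individual messages are exchanged over three supersteps, which hints at why per-edge messaging becomes expensive on large graphs.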

9.

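Implementation of a network-based distributed parallel virtual computer

梅皓 (Mei Hao) 沈志宇 (Shen Zhiyu) 《计算机工程与设计》 (Computer Engineering and Design) 2001,22(4):63-68

A network-based distributed parallel virtual computer, DPVM, is constructed. It consists of a virtual machine layer, a communication layer and a basic class layer, and includes three different types of machine: servers, workers and clients. After a brief introduction to the overall structure of DPVM, the key techniques of the system implementation are discussed from several aspects: the server, task communication, program output, and the object-oriented implementation of message-passing parallel semantics.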

10.

High-level abstractions for message-passing parallel programming

Fan Chan Jiannong Cao Yudong Sun 《Parallel Computing》2003,29(11-12):1589

Large-scale scientific and engineering computation problems are usually complex and consequently the development of parallel programs for solving these problems is a difficult task. In this paper, we describe the graph-oriented programming (GOP) model and environment for building and evaluating parallel applications. The GOP model provides higher level abstractions for message-passing parallel programming and the software environment offers tools which can assist programmers in parallelizing, writing, and deploying scientific and engineering computing applications. We discuss the motivations and various issues in developing the model and the software environment, present the design of the system architecture and the components, and describe the evaluation of the environment implemented on top of MPI with a sample parallel scientific application program. With the support of the high-level abstractions provided by the proposed GOP environment, programming of parallel applications on various parallel architectures can be greatly simplified.

11.

Homogeneous and heterogeneous parallel architectures in real-time signal processing and control

M. O. Tokhi M. A. Hossain 《Control Engineering Practice》1995,3(12):1675-1686

This paper presents an investigation into the real-time performance of parallel architectures in signal-processing and control applications. Several algorithms of regular and irregular nature are implemented on a number of architectures. Hardware and software resources, and the capabilities of the architectures and characteristics of the algorithms are considered for suitable matching between the algorithms and the architectures. The partitioning and mapping of the algorithms on the architectures and inter-processor communication techniques are investigated. Finally, a comparison of the results of various implementations is made to establish the merits of the design and development of parallel architectures for real-time signal-processing and control applications.

12.

A generalized scheme for mapping parallel algorithms

Chaudhary V. Aggarwal J.K. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(3):328-346

A generalized mapping strategy that uses a combination of graph theory, mathematical programming, and heuristics is proposed. The authors use the knowledge from the given algorithm and the architecture to guide the mapping. The approach begins with a graphical representation of the parallel algorithm (problem graph) and the parallel computer (host graph). Using these representations, the authors generate a new graphical representation (extended host graph) on which the problem graph is mapped. An accurate characterization of the communication overhead is used in the objective functions to evaluate the optimality of the mapping. An efficient mapping scheme is developed which uses two levels of optimization procedures. The objective functions include minimizing the communication overhead and minimizing the total execution time which includes both computation and communication times. The mapping scheme is tested by simulation and further confirmed by mapping a real world application onto actual distributed environments.

13.

A parallel algorithm for surface-based object reconstruction

Theodore Johnson Panos E. Livadas 《Journal of Mathematical Imaging and Vision》1994,4(4):389-400

This paper presents a parallel algorithm that approximates the surface of an object from a collection of its planar contours. Such a reconstruction has wide applications in such diverse fields as biological research, medical diagnosis and therapy, architecture, automobile and ship design, and solid modeling. The surface reconstruction problem is transformed into the problem of finding a minimum-cost acceptable path on a toroidal grid graph, where each horizontal and each vertical edge have the same orientation. An acceptable path is a closed path that makes a complete horizontal and vertical circuit. We exploit the structure of this graph to develop efficient parallel algorithms for a message-passing computer. Given p processors and an m by n toroidal graph, our algorithm will find the minimum cost acceptable path in O(mn log(m)/p) steps, if p = O(mn/((m + n) log(mn/(m + n)))), which is an optimal speedup. We also show that the algorithm will send O(p²(m + n)) messages. The algorithm has a linear topology, so it is easy to embed the algorithm in common multiprocessor architectures.

14.

Testing Moderately Parallel Environments for an Ocean Modeling Application

Jerry L. Bickham Germana Peggion Benjamin R. Seyfarth 《Journal of scientific computing》1998,13(2):185-200

Due to the high costs of accessing massively parallel and vector environments, as well as the overworking of high-performance computers, there is now a need for a different approach to parallel computing. The feasibility of ocean modeling in a moderately parallel environment is tested using a 2-D (vertically-integrated) ocean circulation model. The parallel algorithm is based on the Glenda message-passing software and follows the master-worker paradigm. It is evaluated on both internal and external communication environments. The numerical experiments show that the internal communication environment is only slightly more efficient than the external communication environment. This is due to a combination of shared memory problems in the internal communication environment and to inefficiencies in the message-passing software. The tests also demonstrate how efficiency depends on the domain sub-divisions. Most importantly, they show that both environments effectively outperform their sequential counterparts, reducing the program elapsed time, and offering quicker access to the model outputs. The parallel version provided a time-saving alternative to the sequential version of the same model on both internal and external communication platforms. This research supports the conclusion that both environments are a viable alternative to single-CPU machines and that moderately parallel environments are feasible computer platforms for ocean modeling applications.
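
The master-worker paradigm with domain sub-division mentioned in this abstract can be sketched as follows. This is a generic illustration using Python threads, not the Glenda software or the ocean model; the row-strip decomposition, the per-row averaging task and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(n_rows, n_workers):
    """Divide row indices 0..n_rows-1 into contiguous, near-equal strips."""
    base, extra = divmod(n_rows, n_workers)
    strips, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        strips.append((start, stop))
        start = stop
    return strips

def worker(grid, strip):
    """Process one sub-domain: here, simply average each row in the strip."""
    start, stop = strip
    return [sum(row) / len(row) for row in grid[start:stop]]

def master(grid, n_workers):
    """Hand one strip to each worker, then gather the results in domain order."""
    strips = split_rows(len(grid), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda s: worker(grid, s), strips))
    return [value for part in parts for value in part]

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
print(split_rows(len(grid), 2))
print(master(grid, 2))
```

How the domain is cut determines how evenly work is spread, which is one way the dependence of efficiency on the sub-divisions can arise.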

15.

Modeling the performance of parallel applications using model selection techniques

D. R. Martínez V. Blanco J. C. Cabaleiro T. F. Pena F. F. Rivera 《Concurrency and Computation》2014,26(2):586-599

Nowadays, parallel architectures are changing so fast that there is a need for scalable and efficient tools to analyze and predict the performance of parallel applications. Analytical models have proved to be a useful approximation for characterizing parallel algorithms, but developing accurate analytical models is a hard issue, and, in general, they provide coarse performance predictions due to their intrinsic lack of accuracy. In this paper, we describe in detail the Tools for Instrumentation and Analysis (TIA) framework, an easy‐to‐use tool that automatically obtains accurate performance models by means of analytical expressions. This framework automates most of its internal tasks, reducing opportunities for human error, and it only requires the user to focus on the metrics and execution parameters that might influence the performance, those that should be considered in the modeling process. Its main advantage over other tools is that TIA uses model selection techniques that allow the automation of the modeling process. As a case study, the use of TIA to obtain analytical models of different implementations of the broadcast collective communication in a cluster of multicores is shown. The results obtained by TIA are evaluated and compared with theoretical approaches based on the LogGP model. Copyright © 2013 John Wiley & Sons, Ltd.

16.

A graphical development and debugging environment for parallel programs

《Parallel Computing》1997,22(13):1747-1770

To provide high-level graphical support for PVM (Parallel Virtual Machine) based program development, a complex programming environment (GRADE) is being developed. GRADE currently provides tools to construct, execute, debug, monitor and visualize message-passing parallel programs. It offers a high-level graphical programming abstraction mechanism to construct parallel applications by introducing a new graphical language called GRAPNEL. GRADE also provides the programmer with the same graphical user interface during the program design and debugging stages. A distributed debugging engine (DDBG) assists the user in debugging GRAPNEL programs on distributed memory computer architectures. Tape/PVM and PROVE support the performance monitoring and visualization of parallel programs developed in the GRADE environment.

17.

Graph grammar‐driven parallel partial differential equation solver

Maciej Paszyński Robert Schaefer 《Concurrency and Computation》2010,22(9):1063-1097

The paper presents an extension of the composite programmable graph grammar (CP‐graph grammar) suitable for modeling the parallel direct solver algorithm utilized by the hp finite element method (hp‐FEM). In the proposed graph grammar model, the computational mesh is represented by a CP‐graph. The presented graph grammar models the solver algorithm by a set of graph grammar productions. The graph grammar model makes it possible to examine the concurrency of the algorithm by analyzing the interdependence between the atomic tasks, tasks and super‐tasks. The atomic tasks correspond to the graph grammar productions, representing basic indivisible parts of the algorithm. The level of atomic tasks models the concurrency for shared memory architectures. On the other hand, the tasks correspond to groups of atomic tasks with predefined inter‐task communication channels. They constitute the grain for the decomposition of the parallel algorithm for the distributed memory architecture. Finally, the super‐tasks correspond to a group of tasks resulting from the execution of a load balancing algorithm. The solver algorithm is tested on a distributed memory Linux cluster for up to 192 processors. Copyright © 2009 John Wiley & Sons, Ltd.

18.

ParTransgrid: A scalable parallel preprocessing tool for unstructured-grid cell-centered computational fluid dynamics applications

Jian Zhang Jie Liu Naichun Zhou Jing Tang Xie He Jianqiang Chen 《Software》2023,53(1):6-26

The development of a basic scalable preprocessing tool is a key step in accelerating the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing tool, called ParTransgrid, is developed to translate a general grid format such as the CFD General Notation System into an efficient distributed mesh data format for large-scale parallel computing. Through ParTransgrid, a flexible face-based parallel unstructured mesh data structure designed in Hierarchical Data Format can be obtained to support various cell-centered unstructured CFD solvers. The parallel preprocessing operations include parallel grid I/O, parallel mesh partition, and parallel mesh migration, which are linked together to resolve the run-time and memory consumption bottlenecks for increasingly large grid size problems. An inverted index search strategy combined with a multi-master-slave communication paradigm is proposed to improve the pairwise face matching efficiency and reduce the communication overhead when constructing the distributed sparse graph in the phase of parallel mesh partition. We also present a simplified owner update rule that speeds up the migration of raw partition boundaries and the building of the shared faces/nodes communication mapping list between new sub-meshes by an order of magnitude. Experimental results reveal that ParTransgrid can be easily scaled to billion-level grid CFD applications, and the preparation time for parallel computing with hundreds of thousands of cores is reduced to a few minutes.

19.

Visual programming support for graph‐oriented parallel/distributed processing

Fan Chan Jiannong Cao Alvin T. S. Chan Kang Zhang 《Software》2005,35(15):1409-1439

GOP is a graph‐oriented programming model which aims at providing high‐level abstractions for configuring and programming cooperative parallel processes. With GOP, the programmer can configure the logical structure of a parallel/distributed program by constructing a logical graph to represent the communication and synchronization between the local programs in a distributed processing environment. This paper describes a visual programming environment, called VisualGOP, for the design, coding, and execution of GOP programs. VisualGOP applies visual techniques to provide the programmer with automated and intelligent assistance throughout the program design and construction process. It provides a graphical interface with support for interactive graph drawing and editing, visual programming functions and automation facilities for program mapping and execution. VisualGOP is a generic programming environment independent of programming languages and platforms. GOP programs constructed under VisualGOP can run in heterogeneous parallel/distributed systems. Copyright © 2005 John Wiley & Sons, Ltd.

20.

An Architecture-Independent Graphical Tool for Automatic Contention-Free Process-to-Processor Mapping

Shen Hong Lor Sam Maheshwari Piyush 《The Journal of supercomputing》2001,18(2):115-139

Mapping of parallel programs onto parallel computers for efficient execution is a fundamental problem of great significance in parallel processing. This paper presents an architecture-independent software tool for contention-free mapping of arbitrary parallel programs onto parallel computers with arbitrary configurations. This mapping tool is based on an efficient heuristic algorithm that runs in time O(n³ + m⁴) in the worst case for mapping n tasks onto m processors, where m n in most practical cases. It is fully implemented and incorporated into a graph editing system to produce a graphical mapping tool which enables its user to monitor and control the mapping process. The user can assist the mapping process or employ the algorithm to map automatically. Our mapping tool has been tested and its performance evaluated extensively. Experimental results show that our tool combines user intuition and mapping heuristics effectively to make it a powerful mapping tool which is practical to use. Our mapping tool can be easily extended for use in the more general case when the link contention-degree is bounded to a fixed system-specified value without increasing its complexity.