Similar literature
20 similar documents found.
1.
Parallel asynchronous iterative algorithms relax synchronization and communication requirements, and can potentially extend Desktop Grids beyond embarrassingly parallel applications to support a broader class of parallel iterative applications. This paper presents the design and implementation of CometG, a decentralized (peer-to-peer) computational infrastructure that extends Desktop Grid environments to support these applications. CometG provides a decentralized and scalable tuple space, efficient communication and coordination support, and application-level abstractions that can be used to implement Desktop Grid applications based on parallel asynchronous iterative algorithms using the master-worker/BOT paradigm. The deployment and evaluations of CometG and a CometG-based application in a wide-area environment using the PlanetLab [7] test bed, as well as a campus network are presented.  相似文献   

2.
Asynchronous iterative algorithms can reduce much of the data dependencies associated with synchronization barriers. The reported study investigates the potentials of asynchronous iterative algorithms by quantifying the critical parallel processing factors. Specifically, a time complexity-based analysis method is used to understand the inherent interdependencies between computing and communication overheads for the parallel asynchronous algorithm. The results show, not only that the computational experiments closely match the analytical results, but also that the use of asynchronous iterative algorithms can be beneficial for a vast number of parallel processing environments. The choice of local stopping criteria that is critically important to the overall system performance is investigated in depth.  相似文献   

3.
AIAC algorithms (Asynchronous Iterations Asynchronous Communications) are a particular class of parallel iterative algorithms. Their asynchronous nature makes them more efficient than their synchronous counterparts in numerous cases as has already been shown in previous works. The first goal of this article is to compare several parallel programming environments in order to see if there is one of them which is best suited to efficiently implement AIAC algorithms. The main criterion for this comparison consists in the performances achieved in a global context of grid computing for two classical scientific problems. Nevertheless, we also take into account two secondary criteria which are the ease of programming and the ease of deployment. The second goal of this study is to extract from this comparison the important features that a parallel programming environment must have in order to be suited for the implementation of AIAC algorithms.  相似文献   

4.
《International Journal of Computer Mathematics》2012,89(3-4):391-410
In this paper the Quadrant Interlocking (QI) matrix splitting is shown to yield parallel iterative methods for the solution of linear equations with improved convergence rates for both synchronous and asynchronous versions of the algorithms.  相似文献   

5.
《Parallel Computing》1997,23(13):1855-1875
Asynchronous parallel computing can result in high message generation rates, thus triggering network congestion. We characterize the communication requirements of a large class of supercomputing applications falling under the category of fixed-point problems amenable to solution by parallel iterative methods. In particular, we concentrate on asynchronous iterative algorithms whose communication/computation ratio is especially high resulting in degraded effective throughput if communication is not managed properly. Second, we show the effects of network contention and asynchrony on application performance in a local-area network environment and investigate methods of solution. Our approach is based on a congestion control algorithm called ‘warp control’ whose adaptive properties are exploited to yield significant performance enhancements when network contention is high. Although tested in a LAN environment for experimental control purposes, our solution follows the end-to-end paradigm and refrains from exploiting special MAC-layer properties to achieve applicability to general WAN environments. Third, we provide a framework wherein efficient congestion control can be facilitated, encompassing methods acting at the application layer as well as the transport/network layer, with emphasis on application-driven control. We conclude with a discussion of our experimental results and special issues arising in high-bandwidth ATM networks.  相似文献   

6.
We discuss a parallel library of efficient algorithms for model reduction of large-scale systems with state-space dimension up to O(10^4). We survey the numerical algorithms underlying the implementation of the chosen model reduction methods. The approach considered here is based on state-space truncation of the system matrices and includes absolute and relative error methods for both stable and unstable systems. In contrast to serial implementations of these methods, we employ Newton-type iterative algorithms for the solution of the major computational tasks. Experimental results report the numerical accuracy and the parallel performance of our approach on a cluster of Intel Pentium II processors.

7.
DGLa: A Distributed Graphics Language
A distributed graphics programming language called DGLa is presented, which facilitates the development of distributed graphics applications. It includes facilities for distributed programming and graphics support. It not only supports synchronous and asynchronous communication but also provides the programmer with multiple control mechanisms for process communication. The graphics support of DGLa is powerful, since both a sequential graphics library and a parallel graphics library are provided. The design considerations and implementation experience are discussed in detail in this paper. Application examples are also given.

8.
This paper presents a set of benchmarks and metrics for performance reporting in explicit state parallel model checking algorithms. The benchmarks are selected for controllability, and the metrics are chosen to measure speedup and communication overhead. The benchmarks and metrics are used to compare two parallel model checking algorithms: partition and random walk. Implementations of the partition algorithm using synchronous and asynchronous communication are used. Metrics are reported for each benchmark and algorithm for up to 128 workstations using a network of dynamically loaded workstations. Empirical results show that load balancing becomes an issue for more than 32 workstations in the partition algorithm and that random walk is a reasonable, low overhead, approach for finding errors in large models. The synchronous implementation is consistently faster than the asynchronous. The benchmarks, metrics and results given here are intended to be a starting point for a larger discussion of performance reporting in parallel explicit state model checking.  相似文献   

9.
Qiu Yulan, Wang Ping 《Chinese Journal of Computers》1991,14(12):915-922
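This paper discusses some basic issues in a distributed operating system that is oriented toward scientific computing and supports problem solving with "asynchronous parallel algorithms". The system adopts a symmetric, quasi-hierarchical structure with no master-slave relationship. New and effective methods are proposed for the spawning and scheduling of concurrent processes, inter-process communication and synchronization, and mutual exclusion on shared resources, yielding a parallel speedup efficiency of up to 0.87.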

10.
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model  相似文献   

11.
Solving large, sparse, linear systems of equations is a fundamental problem in large scale scientific and engineering computation. A model of a general class of asynchronous, iterative solution methods for linear systems is developed. In the model, the system is solved by creating several cooperating tasks that each compute a portion of the solution vector. A data transfer model predicting both the probability that data must be transferred between two tasks and the amount of data to be transferred is presented. This model is used to derive an execution time model for predicting parallel execution time and an optimal number of tasks given the dimension and sparsity of the coefficient matrix and the costs of computation, synchronization, and communication. The suitability of different parallel architectures for solving randomly sparse linear systems is discussed. Based on the complexity of task scheduling, one parallel architecture, based on a broadcast bus, is presented and analyzed.

12.
In relation with the mathematics of financial applications, the present study deals with the solution of the time dependent obstacle problem defined in a three-dimensional domain; this problem arises in the pricing of American options derivatives. In order to solve very quickly large scale algebraic systems derived from the discretization of the obstacle problem, the parallelization of the numerical algorithm is necessary. So, we present parallel synchronous, and more generally asynchronous, iterative algorithms to solve this problem. For the considered problem, arguments implying the convergence of parallel synchronous and asynchronous algorithms are given in a general framework. Finally, computational experiments on GRID’5000, the French national grid, are presented and analyzed. They allow us to compare both synchronous and asynchronous versions with local and distributed clusters and to show the interest of such methods in the context of grid computing.  相似文献   

13.
Guo Xuan, Guo Ping, Zheng Shouqi 《Chinese Journal of Computers》1999,22(6):591-595
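This paper describes the design and implementation of ParaGA, a C++ class library for parallel genetic algorithms built on the PVM parallel environment. ParaGA takes ease of use and flexibility as its main goals and provides a transparent parallel mechanism, so that users without parallel programming experience can easily write parallel genetic algorithm programs, while advanced users can obtain optimized performance through several methods provided by the library. The library adopts a coarse-grained model and supports three migration modes for parallel genetic algorithms as well as the SPMD and Master/Slave programming modes. ParaGA also provides support for load balancing and for making use of ...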

14.
The solution of linear and nonlinear convection–diffusion problems via parallel subdomain methods is considered. MPI implementation of parallel Schwarz alternating methods on distributed memory multiprocessors is discussed. Parallel synchronous and asynchronous iterative schemes of computation are studied. Experimental results obtained from IBM-SP series machines are displayed and analyzed. The benefits of using parallel asynchronous Schwarz alternating methods are clearly shown.  相似文献   

15.
This article presents an algorithm that performs a decentralized detection of the global convergence of parallel asynchronous iterative applications. This algorithm is fault tolerant. It runs a decentralized saving procedure which enables this algorithm, after a node’s crash, to replace the dead node by a new one which will continue the computing task from the last check point. Combined with the advantages of the asynchronous iteration model, this method allows us to compute very large scale problems using highly volatile parallel architectures like Peer-to-Peer and distributed clusters architectures. We also present the implementation of this algorithm in the JaceP2P platform which is dedicated to designing and executing parallel asynchronous iterative applications in volatile environments. Numerous experiments show the robustness and the efficiency of our algorithm.  相似文献   

16.
Most parallel game-tree search approaches use synchronous methods, where the work is concentrated within a specific part of the tree or at a given search depth. This article shows that asynchronous game-tree search algorithms can be as efficient as or better than synchronous methods in determining the minimax value.  APHID, a new asynchronous parallel game-tree search algorithm, is presented. APHID is implemented as a freely available portable library, making the algorithm easy to integrate into a sequential game-tree searching program. APHID has been added to four programs written by different authors. APHID yields better speedups than synchronous search methods for an Othello and a checkers program and comparable speedups on two chess programs.  相似文献   

17.
This article focuses on the effect of both process topology and load balancing on various programming models for SMP clusters and iterative algorithms. More specifically, we consider nested loop algorithms with constant flow dependencies, that can be parallelized on SMP clusters with the aid of the tiling transformation. We investigate three parallel programming models, namely a popular message passing monolithic parallel implementation, as well as two hybrid ones, that employ both message passing and multi-threading. We conclude that the selection of an appropriate mapping topology for the mesh of processes has a significant effect on the overall performance, and provide an algorithm for the specification of such an efficient topology according to the iteration space and data dependencies of the algorithm. We also propose static load balancing techniques for the computation distribution between threads, that diminish the disadvantage of the master thread assuming all inter-process communication due to limitations often imposed by the message passing library. Both improvements are implemented as compile-time optimizations and are further experimentally evaluated. An overall comparison of the above parallel programming styles on SMP clusters based on micro-kernel experimental evaluation is further provided, as well.  相似文献   

18.
Yu Hui, Jiang Xin-Yu, Zhao Jin, Qi Hao, Zhang Yu, Liao Xiao-Fei, Liu Hai-Kun, Mao Fu-Bing, Jin Hai 《Journal of Computer Science and Technology》2022,37(4):797-813

Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.


19.
A unified framework for the construction of various synchronous and asynchronous parallel matrix multisplitting iterative methods, suitable for SIMD and MIMD multiprocessor systems, respectively, is presented, and its convergence theory is established under rather weak conditions. These afford general method models and systematic convergence criteria for studying parallel iterations in the sense of matrix multisplitting. In addition, it is shown in detail how the known parallel matrix multisplitting iterative methods can be classified into this new framework, and what novel ones can be generated by it.

20.
This new version of the HOTB program for calculation of the three and four particle harmonic oscillator transformation brackets provides some enhancements and corrections to the earlier version (Germanas et al., 2010) [1]. In particular, the new version allows calculations of harmonic oscillator transformation brackets to be performed in parallel using the MPI parallel communication standard. Moreover, intermediate calculations are carried out in higher precision using GNU Quadruple Precision and the arbitrary precision library FMLib [2]. A package of Fortran code is presented. The calculation time for large matrices can be significantly reduced using efficient parallel code. The use of higher-precision methods in intermediate calculations increases the stability of the algorithms and extends their validity to larger input values.
