期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Design of scalable Java message-passing communications over InfiniBand

Roberto R. Expósito Guillermo L. Taboada Juan Touri?o Ramón Doallo 《The Journal of supercomputing》2012,61(1):141-165

This paper presents ibvdev a scalable and efficient low-level Java message-passing communication device over InfiniBand. The continuous increase in the number of cores per processor underscores the need for efficient communication support for parallel solutions. Moreover, current system deployments are aggregating a significant number of cores through advanced network technologies, such as InfiniBand, increasing the complexity of communication protocols, especially when dealing with hybrid shared/distributed memory architectures such as clusters. Here, Java represents an attractive choice for the development of communication middleware for these systems, as it provides built-in networking and multithreading support. As the gap between Java and compiled languages performance has been narrowing for the last years, Java is an emerging option for High Performance Computing (HPC). The developed communication middleware ibvdev increases Java applications performance on clusters of multicore processors interconnected via InfiniBand through: (1) providing Java with direct access to InfiniBand using InfiniBand Verbs API, somewhat restricted so far to MPI libraries; (2) implementing an efficient and scalable communication protocol which obtains start-up latencies and bandwidths similar to MPI performance results; and (3) allowing its integration in any Java parallel and distributed application. In fact, it has been successfully integrated in the Java messaging library MPJ Express. The experimental evaluation of this middleware on an InfiniBand cluster of multicore processors has shown significant point-to-point performance benefits, up to 85% start-up latency reduction and twice the bandwidth compared to previous Java middleware on InfiniBand. Additionally, the impact of ibvdev on message-passing collective operations is significant, achieving up to one order of magnitude performance increases compared to previous Java solutions, especially when combined with multithreading. Finally, the efficiency of this middleware, which is even competitive with MPI in terms of performance, increments the scalability of communications intensive Java HPC applications. 相似文献

2.

Competitive neural networks on message-passing parallel computers

Michele Ceccarelli Alfredo Petrosino Roberto Vaccaro 《Concurrency and Computation》1993,5(6):449-470

相似文献

3.

Yama: a scalable generational garbage collector for Java in multiprocessor systems

Muthukumar R.M. Janakiram D. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(2):148-159

The current state-of-the-art generational garbage collector pauses all the program threads when it performs young and old generation garbage collection. As the number of program threads increases, the delay due to garbage collection also increases, thus restricting the scalability of the collector. In order to improve the scalability and reduce the pause time, an on-the-fly generational garbage collector called Yama is proposed for multiprocessor systems. This uses the on-the-fly deferred reference counting in the young generation and the DLG (Doligez Leroy Gonthier) on-the-fly mark and sweep garbage collector in the old generation. We have proposed and experimented with two novel variations of the on-the-fly deferred reference counting called Chitragupt1 and Chitragupt2 in the young generation. Yama does not pause all the application threads simultaneously. An adaptive tenuring policy based on object reference count and survival rate is also proposed. Yama has been implemented in the IBM Jikes RVM (research virtual machine). The above claims are supported with experimental results for standard benchmark programs. The results show that Yama has an extremely low pause time in both the young and the old generation. The pause time reduction results in better response times for the user programs. 相似文献

4.

High-level abstractions for message-passing parallel programming

Fan Chan Jiannong Cao Yudong Sun 《Parallel Computing》2003,29(11-12):1589

Large-scale scientific and engineering computation problems are usually complex and consequently the development of parallel programs for solving these problems is a difficult task. In this paper, we describe the graph-oriented programming (GOP) model and environment for building and evaluating parallel applications. The GOP model provides higher level abstractions for message-passing parallel programming and the software environment offers tools which can ease programmers for parallelizing, writing, and deploying scientific and engineering computing applications. We discuss the motivations and various issues in developing the model and the software environment, present the design of the system architecture and the components, and describe the evaluation of the environment implemented on top of MPI with a sample parallel scientific application program. With the support of the high-level abstractions provided by the proposed GOP environment, programming of parallel applications on various parallel architectures can be greatly simplified. 相似文献

5.

Snap-stabilization in message-passing systems

Sylvie Delaët Stéphane Devismes Mikhail Nesterenko Sébastien Tixeuil 《Journal of Parallel and Distributed Computing》2010

In this paper, we tackle the problem of snap-stabilization in message-passing systems. Snap-stabilization allows designing protocols that withstand transient faults: indeed, any computation that is started after faults cease immediately satisfies the expected specification. 相似文献

6.

Implementation of production systems on message-passing computers

Acharya A. Tambe M. Gupta A. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):477-487

The authors examine the suitability of message-passing computers for parallel implementations of production systems. Two mappings for production systems on these computers, one targeted toward fine-grained message-passing machines and the other targeted toward medium-grained machines, are presented. Simulation results for the medium-grained mapping are presented, and it is shown that it is possible to exploit the available parallelism and to obtain reasonable speedups. The authors perform a detailed analysis of the results and suggest solutions for some of the problems 相似文献

7.

A step-wise-overlapped parallel annealing algorithm on a message-passing multiprocessor system

Youngtak Kim Myunghwan Kim 《Concurrency and Computation》1990,2(2):123-148

相似文献

8.

Hypertool: a programming aid for message-passing systems 总被引：4，自引：0，他引：4

Wu M.-Y. Gajski D.D. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(3):330-343

Programming assistance, automation concepts, and their application to a message-passing system program development tool called Hypertool are discussed. Hypertool performs scheduling and handles the communication primitive insertion automatically, thereby increasing productivity and eliminating synchronization errors. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates the performance estimates and other program quality measures to help programmers improve their algorithms and programs 相似文献

9.

Proof systems for message-passing process algebras 总被引：2，自引：0，他引：2

M. Hennessy H. Lin 《Formal Aspects of Computing》1996,8(4):379-407

We give sound and complete proof systems for a variety of bisimulation based equivalences over a message-passing process algebra. The process algebra is a generalisation of pureCCS where the actions consist of receiving and sending messages or data on communication channels; the standard prefixing operatora.p is replaced by the two operatorsc?x.p andc!e.p and in addition messages can be tested by a conditional construct. The various proof systems are parameterised on auxiliary proof systems for deciding on equalities or more general boolean identities over the expression language for data. The completeness of these proof systems are thus relative to the completeness of the auxiliary proof systems. 相似文献

10.

Optimal tracing and replay for debugging message-passing parallel programs

Robert H. B. Netzer Barton P. Miller 《The Journal of supercomputing》1995,8(4):371-388

A common debugging strategy involves reexecuting a program (on a given input) over and over, each time gaining more information about bugs. Such techniques can fail on message-passing parallel programs. Because of nondeterminacy, different runs on the given input may produce different results. This nonrepeatability is a serious debugging problem, since an execution cannot always be reproduced to track down bugs. This paper presents a technique for tracing and replaying message-passing programs. By tracing the order in which messages are delivered, a reexecution can be forced to deliver messages in their original order, reproducing the original execution. To reduce the overhead of such a scheme, we show that the delivery'order of only messages involved inraces need be traced (and not every message). Our technique makes run-time decisions to detect and trace racing messages and is usuallyoptimal in the sense that the minimal number of racing messages is traced. Experiments indicate that only 1% of the messages are often traced, gaining a reduction of two orders of magnitude over traditional techniques that trace every message. These traces allow an execution to be reproduced any number of times for debugging. Our work is novel in that we adaptively decide what to trace, and trace only those messages that introduce nondeterminacy. With our strategy, large reductions in trace size allow long-running programs to be replayed that were previously unmanageable. In addition, the reduced tracing requirements alleviate tracing bottle-necks, allowing executions to be debugged with substantially lower execution time overhead.This work was supported in part by National Science Foundation grants CCR-8815928 and CCR-9100968, Office of Naval Research grant N00014-89-J-1222, and a grant from Sequent Computer Systems, Inc. 相似文献

11.

Developing a distributed scalable Java component server 总被引：2，自引：0，他引：2

Yike Patrick 《Future Generation Computer Systems》2001,17(8):1051-1057

We present here approaches for a distributed scalable Java component server. The first one uses a resource broker model, whereby the system is composed of one or several entry point servers, a resource broker and a set of participating servers. The resource broker gives the system its dynamic scalability and load balancing capability by notifying participants and providing information to the entry point servers. An experimental version of the server has been developed. Two other approaches based on Jini and JavaSpace are proposed. An experimental version of the latter one is also compared with the resource broker model. 相似文献

12.

A parallel graph partitioning algorithm for a message-passing multiprocessor 总被引：1，自引：0，他引：1

John R. Gilbert Earl Zmijewski 《International journal of parallel programming》1987,16(6):427-449

We develop a parallel algorithm for partitioning the vertices of a graph intop2 sets in such a way that few edges connect vertices in different sets. The algorithm is intended for a message-passing multiprocessor system, such as the hypercube, and is based on the Kernighan-Lin algorithm for finding small edge separators on a single processor.⁽¹⁾ We use this parallel partitioning algorithm to find orderings for factoring large sparse symmetric positive definite matrices. These orderings not only reduce fill, but also result in good processor utilization and low communication overhead during the factorization. We provide a complexity analysis of the algorithm, as well as some numerical results from an Intel hypercube and a hypercube simulator.Publication of this report was partially supported by the National Science Foundation under Grant DCR-8451385 and by AT&T Bell Laboratories through their Ph.D scholarship program. 相似文献

13.

Editorial: communication optimization for scalable parallel system

Ching-Hsien?Hsu Email author Peter?Sloot 《The Journal of supercomputing》2012,60(1):1-3

相似文献

14.

Runtime support for scalable programming in Java

Sang Boem Lim Hanku Lee Bryan Carpenter Geoffrey Fox 《The Journal of supercomputing》2008,43(2):165-182

The paper research is concerned with enabling parallel, high-performance computation—in particular development of scientific software in the network-aware programming language, Java. Traditionally, this kind of computing was done in Fortran. Arguably, Fortran is becoming a marginalized language, with limited economic incentive for vendors to produce modern development environments, optimizing compilers for new hardware, or other kinds of associated software expected of by today’s programmers. Hence, Java looks like a very promising alternative for the future. The paper will discuss in detail a particular environment called HPJava. HPJava is the environment for parallel programming—especially data-parallel scientific programming—in Java. Our HPJava is based around a small set of language extensions designed to support parallel computation with distributed arrays, plus a set of communication libraries. A high-level communication API, Adlib, is developed as an application level communication library suitable for our HPJava. This communication library supports collective operations on distributed arrays. We include Java Object as one of the Adlib communication data types. So we fully support communication of intrinsic Java types, including primitive types, and Java object types. 相似文献

15.

A message-passing class library C++ for portable parallel programming

S. H. Hsieh Assistant Professor E. D. Sotelino 《Engineering with Computers》1997,13(1):20-34

An object-oriented message-passing class library in C++, called PPI++, for portable parallel programming has been developed. PPI++ (parallel portability interface in C++) is designed to serve as a stable (unchanging) interface between the client parallel code and the rapidly evolving distributed computing environments. By taking advantage of encapsulation, inheritance, and polymorphism supported by C++, PPI++ provides a clean and consistent programming interface, which helps improve the clarity and expressiveness of client parallel codes and hides implementation details and complexity from the user to ease parallel programming tasks. In addition, the use of strong type-checking in C++ allows the detection of potential misuses of the library at compile time, and thus promotes code reliability. This paper describes the object-oriented design and implementation of PPI++. Evaluation of PPI++ on important performance issues, such as portability, ease-of-use, extensibility, and efficiency, is also discussed. 相似文献

16.

Unifying stabilization and termination in message-passing systems 总被引：1，自引：0，他引：1

Anish Arora Mikhail Nesterenko 《Distributed Computing》2005,17(3):279-290

The paper dispels the myth that it is impossible for a message-passing program to be both terminating and stabilizing. We consider a rather general notion of termination: a terminating program eventually stops its execution after the environment ceases to provide input. We identify termination-symmetry to be a necessary condition for a problem to admit a solution with such properties. Our results do confirm that a number of well-known problems (e.g., consensus, leader election) do not allow a terminating and stabilizing solution. On the flip side, they show that other problems such as mutual exclusion and reliable-transmission allow such solutions. We present a message-passing solution to the mutual exclusion problem that is both stabilizing and terminating. We also describe an approach of adding termination to a stabilizing program. To illustrate this approach, we add termination to a stabilizing solution for the reliable transmission problem.Published online: 15 November 2004Anish Arora: Supported in part by DARPA contract OSU-RF #F33615-01-C-1901,NSF grant NSF-CCR-9972368, Ohio State University Fellowship,and 2002-2003,2003-2004 grants from Microsoft Research.Mikhail Nesterenko: Supported in part by DARPA contract OSU-RF #F33615-01-C-1901 and byNSF CAREER Award 0347485Some of the results in this paper were presented at the 21st International Conference on Distributed Computing Systems, Mesa, Arizona, April 2001, pp 99-106. Correspondence to: Mikhail Nesterenko 相似文献

17.

A space-efficient parallel sequence comparison algorithm for a message-passing multiprocessor

Xiaoqiu Huang 《International journal of parallel programming》1989,18(3):223-239

We present a parallel algorithm for computing an optimal sequence alignment in efficient space. The algorithm is intended for a message-passing architecture with one-dimensional-array topology. The algorithm computes an optimal alignment of two sequences of lengthsM andN inO((M+N) ²/P) time andO((M+N)/P) space per processor, where the number of processors isP>=max(M, N). Thus, whenP=max(M, N) it achieves linear speedup and requires constant space per processor. Some experimental results on an Intel hypercube are provided.This research was supported by NIH Grant LM05110 from the National Library of Medicine. 相似文献

18.

A parallel and scalable CAST-based clustering algorithm on GPU

Kawuu W. Lin Chun-Hung Lin Chun-Yuan Hsiao 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(3):539-547

The advances in nanometer technology and integrated circuit technology enable the graphics card to attach individual memory and one or more processing units, named GPU, in which most of the graphing instructions can be processed in parallel. Obviously, the computation resource can be used to improve the execution efficiency of not only graphing applications but other time consuming applications like data mining. The Clustering Affinity Search Technique is a famous clustering algorithm, which is widely used in clustering the biological data. In this paper, we will propose an algorithm that can utilize the GPU and the individual memory of graphics card to accelerate the execution. The experimental results show that our proposed algorithm can deliver excellent performance in terms of execution time and is scalable to very large databases. 相似文献

19.

Emulating shared-memory Do-All algorithms in asynchronous message-passing systems

Dariusz R. Kowalski Mariam Momenzadeh Alexander A. Shvartsman 《Journal of Parallel and Distributed Computing》2010

相似文献

20.

A scalable encryption scheme for multi-privileged group communications

Guojun Wang Qiushuang Du Wei Zhou Qin Liu 《The Journal of supercomputing》2013,64(3):1075-1091

In multi-privileged group communications, since users, who can subscribe to different data streams according to their interests, have multiple access privileges, security issues are more difficult to be solved than those in traditional group communications. The common drawback of traditional key management schemes is that they will result in the “one-affect-many” problem, because they use a key graph to manage all the keys in a group, which makes one key being shared by many users. Recently, a key-policy attribute-based encryption (KP-ABE) scheme is proposed to encrypt messages to multiple users efficiently, which has been applied in secure multi-privileged group communications. However, user revocation in KP-ABE is still not resolved when applied to multi-privileged group communications. So, in this paper, by uniquely combining a collusion-resistant broadcast encryption system and a KP-ABE system with a non-monotone access structure, we propose a scalable encryption scheme for multi-privileged group communications (EMGC). Based on the features of different multi-privileged group communication systems, we also propose two constructions for our EMGC scheme. With the two constructions, a system can support a user not only to join/leave a group at will, but also to change his access privilege on demand, and the expenses during rekeying operations are small. Therefore, our scheme, which can accommodate a dynamic group of users, is more applicable to multi-privileged group communications. 相似文献