1.

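Dou Yong, Zhou Xingming 《计算机学报》 (Chinese Journal of Computers) 1995,(7)

The Super-Object model proposes a new method for implementing language-level virtual shared memory on distributed-memory multicomputers, in support of the shared-memory communication paradigm. The model introduces a new concept, the super-object. Unlike other models, it builds on super-objects to provide a new way of locating shared data: a global address identifier of the form (name, offset). Combining the Super-Object model with Fortran 77, we implemented a run-time system and library calls that let programmers write parallel programs in Fortran. The paper closes with the implementation of the system and the performance achieved.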
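
To make the (name, offset) addressing idea concrete, here is a minimal single-process sketch in Python. It is not the authors' Fortran 77 run-time system: the class `SuperObjectStore`, its methods, and the hash-based choice of a home node are all hypothetical, chosen only to show shared data being located by a name and an offset rather than by a raw memory address.

```python
import zlib


class SuperObjectStore:
    """Toy model: shared data is located by (name, offset), not a raw address."""

    def __init__(self, num_nodes):
        # One dict per simulated node; each maps a super-object name to its cells.
        self.nodes = [dict() for _ in range(num_nodes)]

    def home_node(self, name):
        # Assumed placement policy: a stable hash of the name picks the home node.
        return zlib.crc32(name.encode("utf-8")) % len(self.nodes)

    def create(self, name, size):
        # Allocate a zero-filled super-object on its home node.
        self.nodes[self.home_node(name)][name] = [0] * size

    def write(self, name, offset, value):
        self.nodes[self.home_node(name)][name][offset] = value

    def read(self, name, offset):
        return self.nodes[self.home_node(name)][name][offset]


store = SuperObjectStore(num_nodes=4)
store.create("grid", size=8)
store.write("grid", 3, 42)
value = store.read("grid", 3)
```

In a real distributed system the `read` and `write` calls would turn into messages to the home node; here every "node" lives in one process so the lookup path is easy to follow.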

2.

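Dynamic Data Race Detection in Software DSM Systems

Zhang Longbing, Wu Shaogang, Zhang Fuxin 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2004,25(12):2070-2074

Data races are a class of hard-to-debug errors in shared-memory programs. On JIAJIA, a software DSM system that supports the scope consistency memory model, a lockset-based dynamic data race detection algorithm is implemented by using assembly code instrumentation to obtain the set of shared variables that a program reads and writes. With this method, data races were found in the TSP and Barnes programs, and an error in Barnes was corrected on the basis of the races found. Practical experience shows that the method is easy for users to apply and has reached a practical level.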
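
The lockset idea named in this abstract can be sketched in a few lines of Python. This is the generic lockset refinement rule (in the style of the Eraser tool), not JIAJIA's implementation: the class `LocksetDetector` and its method names are hypothetical, and the sketch omits the usual refinements for initialization and read-only sharing, so it will also flag variables that are only ever touched by one thread without a lock.

```python
class LocksetDetector:
    """Simplified lockset check: a variable is suspicious once no single
    lock has been held on every access to it."""

    def __init__(self):
        self.candidates = {}  # variable -> locks held on every access so far
        self.races = set()    # variables whose candidate set became empty

    def access(self, var, held_locks):
        held = set(held_locks)
        if var not in self.candidates:
            # First access: every lock currently held is a candidate.
            self.candidates[var] = held
        else:
            # Later accesses: keep only locks held this time as well.
            self.candidates[var] &= held
        if not self.candidates[var]:
            self.races.add(var)


det = LocksetDetector()
det.access("x", {"L1"})
det.access("x", {"L1", "L2"})  # L1 is still common to all accesses of x
det.access("y", {"L1"})
det.access("y", {"L2"})        # no lock is common to all accesses of y
```

After these calls, `det.races` contains only `"y"`, because `"x"` was consistently protected by `L1`.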

3.

A Hybrid Analysis of an Optimization Approach for Cluster Applications

Ming Zhu Wentong Cai Bu-Sung Lee Xudong Wu 《The Journal of supercomputing》2005,32(3):191-215

Cluster/distributed computing has become a popular, cost-effective alternative to high-performance parallel computers. Many parallel programming languages and related programming models have become widely accepted on clusters. However, high communication overhead is a major shortcoming of running parallel applications in cluster/distributed computing environments. To reduce the communication overhead, and thus the completion time of a parallel application, this paper introduces and evaluates an efficient Key Message (KM) approach to support parallel computing in cluster computing environments. We briefly present the model and algorithm, and then use analytical and simulation methods to evaluate the performance of the algorithm. The analysis shows that as the network background load increases or the computation-to-communication ratio decreases, the KM approach yields a larger improvement in the communication of a parallel application relative to a system that does not use it.

4.

Optimizing OpenMP Programs on Software Distributed Shared Memory Systems

Min Seung-Jai Basumallik Ayon Eigenmann Rudolf 《International journal of parallel programming》2003,31(3):225-249

This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.

5.

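Parallel Programming Based on Distributed Shared Memory

Yuan Daohua 《计算机研究与发展》 (Journal of Computer Research and Development) 1994,31(9):56-60

This paper outlines the general concepts of distributed parallel systems and distributed shared memory, discusses a parallel programming model that uses shared objects and reliable broadcast, and finally presents our improved model.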

6.

DGLa: A Distributed Graphics Language (cited by: 1; self-citations: 0; citations by others: 1)

Pan Zhigeng, Shi Jiaoying, Hu Bingfeng 《计算机科学技术学报》 (Journal of Computer Science and Technology) 1994,9(2):97-106

A distributed graphics programming language called DGLa is presented, which facilitates the development of distributed graphics applications. Facilities for distributed programming and graphics support are included in it. It not only supports synchronous and asynchronous communication but also provides the programmer with multiple control mechanisms for process communication. The graphics support of DGLa is powerful, for both a sequential graphics library and a parallel graphics library are provided. The design considerations and implementation experience are discussed in detail in this paper. Application examples are also given.

7.

Distributed programming languages: design and implementation

Thomas LeBlanc Robert Cook 《Computer Communications》1982,5(5):239-244

Considerable discussion has appeared in recent literature about ‘distributed programming’. One area of discussion concerns the design of programming languages which support distributed algorithms. This paper examines the important issues in programming language design for distributed computing, and presents an example of a language philosophy which supports program development for a distributed environment. It also discusses experiences with the STARMOD distributed programming language.

8.

A VLSI implementation of an architecture for applicative programming

《Future Generation Computer Systems》1988,4(3):245-254

The Applicative Programming System Architecture contains a novel Data Structure Memory (DSM) which supports fast access operations on compact linear data structures. Several problems that arise in implementations of applicative and functional programming languages can be solved efficiently using special data representations on the DSM. Each memory word in the DSM contains a very small local processor, and there is also a tree-structured communications network within the DSM. Therefore the DSM is a massively parallel SIMD machine. This paper describes a VLSI implementation of the DSM architecture and compares its performance with implementations on a conventional sequential computer and the NASA Massively Parallel Processor.

9.

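NC++: An Object-Oriented Distributed Programming Language Based on NOW

Gu Qing, Xie Li, Chen Daoxu, Wu Yinghong, Sun Zhongxiu 《软件学报》 (Journal of Software) 2001,12(2):183-189

This paper presents NC++ (NOW C++), a distributed programming language based on networks of workstations (NOW). It is an extension of the DC++ language. NC++ provides a complete programming environment, including the NC++ precompiler, a graphical programming interface, a multicast communication mechanism, and a testing system. It improves the group management and process communication mechanisms, and proposes a distributed shared memory (DSM) mechanism based on a belief inference network to manage C++ public variables. Practice shows that NC++ preserves the performance of distributed programs while keeping programming convenient.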

10.

Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems

Yung-Chang Chiu Tyng-Yeu Liang 《Parallel Computing》2011,37(1):11-25

Distributed shared memory (DSM) allows parallel programs to run on distributed computers by simulating a global virtual shared memory, but data racing bugs may easily occur when the threads of a multi-threaded process concurrently access the physically distributed memory. Earlier tools to help programmers locate data racing bugs in non-DSM parallel programs are not easily applied to DSM systems. This study presents the data race avoidance and replay scheme (DRARS) to assist debugging parallel programs on DSM or multi-core systems. DRARS is a novel tool which controls the consistency protocol of the target program, automatically preventing a large class of data racing bugs when the parallel program is subsequently run, obviating much of the need for manual debugging. For data racing bugs that cannot be avoided automatically, DRARS performs a deterministic replay-type function on DSM systems, faithfully reproducing the behavior of the parallel program during run time. Because one class of data racing bugs has already been eliminated, the remaining manual debugging task is greatly simplified. Unlike previous debugging methods, DRARS does not require that the parallel program be written in a specific style or programming language. Moreover, DRARS can be implemented in most consistency protocols. In this paper, DRARS is realized and verified in real experiments using the eager release consistency protocol on a DSM system with various applications.

11.

A flexible framework for consistency management

S. Weber P. A. Nixon B. Tangney 《Concurrency and Computation》2002,14(1):33-53

Recent distributed shared memory (DSM) systems provide increasing support for the sharing of objects rather than portions of memory. However, like earlier DSM systems, these distributed shared object (DSO) systems still force developers to use a single protocol, or a small set of given protocols, for the sharing of application objects. This limitation prevents applications from optimizing their communication behaviour and results in unnecessary overhead. A current general trend in software systems development is towards customizable systems; for example, frameworks, reflection, and aspect-oriented programming all aim to give the developer greater flexibility and control over the functionality and performance of their code. This paper describes a novel object-oriented framework that defines a DSM system in terms of a consistency model and an underlying coherency protocol. Different consistency models and coherency protocols can be used within a single application because they can be customized, by the application programmer, on a per-object basis. This allows application-specific semantics to be exploited at a very fine level of granularity, with a resulting improvement in performance. The framework is implemented in Java, and the speed-up obtained by a number of applications that use the framework is reported. Copyright © 2002 John Wiley & Sons, Ltd.

12.

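A Graph-Oriented Approach to Dynamic Configuration and Fault Tolerance of Distributed Software (cited by: 1; self-citations: 0; citations by others: 1)

Song Yi, Liu Yunchao 《计算机应用》 (Journal of Computer Applications) 2003,23(12):37-41

A new method is proposed that supports fault tolerance of component-based distributed software through dynamic configuration. The method adopts the graph-oriented GOP programming model and describes the architecture of the whole distributed software system as a single logical graph; dynamic configuration of the system is carried out by executing a set of predefined operations on the graph. Performing such dynamic configuration when a fault or exception is detected supports the fault tolerance of the system. The paper describes the basic model of the method, the system architecture, and a CORBA-based prototype implementation.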

13.

OpenMP‐oriented applications for distributed shared memory architectures

Ami Marowka Zhenying Liu Barbara Chapman 《Concurrency and Computation》2004,16(4):371-384

The rapid rise of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model-of-choice for large scale high-performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc-NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy-inside–copy-back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automatic and implicit approaches. Moreover, the CC approach improves on the OpenMP-SPMD style of programming that makes the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse-grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd.
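
The shape of the copy-in, compute, copy-back pattern described in this abstract can be illustrated with a small Python sketch. It only mirrors the structure of the idea and is not the paper's OpenMP code: Python threads stand in for OpenMP threads, and the function names `cc_worker` and `run_cc` and the squaring kernel are hypothetical. No locality benefit should be expected from this toy version.

```python
from concurrent.futures import ThreadPoolExecutor


def cc_worker(shared, start, stop):
    # Copy-inside: take a private copy of this worker's block.
    private = shared[start:stop]
    # Compute on the private copy only (placeholder kernel: squaring).
    private = [v * v for v in private]
    # Copy-back: publish the results into the worker's own disjoint range.
    shared[start:stop] = private


def run_cc(shared, num_workers):
    # Split the shared list into contiguous blocks, one per worker.
    block = (len(shared) + num_workers - 1) // num_workers
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(cc_worker, shared, i, min(i + block, len(shared)))
            for i in range(0, len(shared), block)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker


data = list(range(8))
run_cc(data, num_workers=4)
```

Each worker touches the shared list only at the start and the end of its task, which is the property that lets a real system place the private copy close to the thread that uses it.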

14.

Object Clustering for High Performance Parallel Computing

Kim Hyeong-Do Jeong Chang-Sung 《The Journal of supercomputing》2001,19(3):267-283

In this article we present a new parallel programming environment, called distributed object-oriented virtual computing environment (DOVE), for clustered computers based on a distributed object model. In DOVE, a parallel program is built as a collection of concurrent objects, each of which has its own computing power and interacts with the others by remote method invocation. The parallelism is encapsulated within distributed objects, which can be handled the same way as local objects. The main goal of DOVE is to provide users with an easy-to-use transparent parallel programming environment while supporting efficient parallelism encapsulated and distributed among objects. For the experiment and evaluation of DOVE, two parallel application programs have been developed both on DOVE and PVM.

15.

Shared virtual memory clusters: bridging the cost-performance gap between SMPs and hardware DSM systems

Angelos Bilas Dongming Jiang Jaswinder Pal Singh 《Journal of Parallel and Distributed Computing》2003,63(12):1257-1276

Although the shared memory abstraction is gaining ground as a programming abstraction for parallel computing, the main platforms that support it, small-scale symmetric multiprocessors (SMPs) and hardware cache-coherent distributed shared memory systems (DSMs), seem to lie inherently at the extremes of the cost-performance spectrum for parallel systems. In this paper we examine if shared virtual memory (SVM) clusters can bridge this gap by examining how application performance scales on a state-of-the-art shared virtual memory cluster. We find that: (i) The level of application restructuring needed is quite high compared to applications that perform well on a DSM system of the same scale and larger problem sizes are needed for good performance. (ii) However, surprisingly, SVM performs quite well for a fairly wide range of applications, achieving at least half the parallel efficiency of a high-end DSM system at the same scale and often much more.

16.

High Performance Software Coherence for Current and Future Architectures

《Journal of Parallel and Distributed Computing》1995,29(2):179-195

Shared memory provides an attractive and intuitive programming model for large-scale parallel computing, but requires a coherence mechanism to allow caching for performance while ensuring that processors do not use stale data in their computation. Implementation options range from distributed shared memory emulations on networks of workstations to tightly coupled fully cache-coherent distributed shared memory multiprocessors. Previous work indicates that performance varies dramatically from one end of this spectrum to the other. Hardware cache coherence is fast, but also costly and time-consuming to design and implement, while DSM systems provide acceptable performance on only a limited class of applications. We claim that an intermediate hardware option (memory-mapped network interfaces that support a global physical address space, without cache coherence) can provide most of the performance benefits of fully cache-coherent hardware, at a fraction of the cost. To support this claim we present a software coherence protocol that runs on this class of machines, and use simulation to conduct a performance study. We look at both programming and architectural issues in the context of software and hardware coherence protocols. Our results suggest that software coherence on NCC-NUMA machines is a more cost-effective approach to large-scale shared-memory multiprocessing than either pure distributed shared memory or hardware cache coherence.

17.

MILLIPEDE: Easy Parallel Programming in Available Distributed Environments

ROY FRIEDMAN MAXIM GOLDIN AYAL ITZKOVITZ ASSAF SCHUSTER 《Software》1997,27(8):929-965

MILLIPEDE is a project aimed at developing a distributed shared memory environment for parallel programming. A major goal of this project is to support easy-to-grasp parallel programming languages that will also make it straightforward to parallelize existing code. Other targets are forward compatibility and availability of both the user programs (hence the shared memory support and the C-like parallel language PARC) and the system itself (which is thus implemented at user level, using the services exported by the operating system). Locality of memory references, which implies efficiency and speedups, is maintained by MILLIPEDE using page and thread migration, through which dynamic load-balancing and weak memory are implemented. ©1997 by John Wiley & Sons, Ltd.

18.

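Parallel Programming Based on a Distributed/Shared Memory Hierarchy (cited by: 1; self-citations: 0; citations by others: 1)

Li Qingbao, Zhang Ping 《计算机应用》 (Journal of Computer Applications) 2004,24(6):148-150,158

Distributed-memory and shared-memory architectures each have their own characteristics and are strongly complementary; the distributed/shared memory hierarchy combines the two in order to exploit the strengths of both. This paper mainly discusses parallel programming on the distributed/shared memory hierarchy and introduces the hybrid MPI and OpenMP parallel programming model.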

19.

PRAM programming: in theory and in practice

D. S. Lecomber C. J. Siniolakis K. R. Sujithan 《Concurrency and Computation》2000,12(4):211-226

That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the foreseeable future. The current generation of parallel hardware prominently features distributed memory and high-performance interconnection networks, very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed-memory machines, from embarrassingly parallel problems to sorting and numerical analysis. In contrast, it is known that for other classes of problem PRAM-style shared-memory simulation on a distributed-memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines, theoretical and actual, in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256-processor Cray T3D, an 8-node IBM SP/2 and a 4-node shared-memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct-mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming. Copyright © 2000 John Wiley & Sons, Ltd.

20.

The programming model of ASSIST, an environment for parallel and distributed portable applications (cited by: 4; self-citations: 0; citations by others: 4)

Marco Vanneschi 《Parallel Computing》2002,28(12):595-1732

A software development system based upon integrated skeleton technology (ASSIST) is a proposal of a new programming environment oriented to the development of parallel and distributed high-performance applications according to a unified approach. The main goals are: high-level programmability and software productivity for complex multidisciplinary applications, including data-intensive and interactive software; performance portability across different platforms, in particular large-scale platforms and grids; effective reuse of parallel software; efficient evolution of applications through versions that scale according to the underlying technologies.

The purpose of this paper is to show the principles of the proposed approach in terms of the programming model (successive papers will deal with the environment implementation and with performance evaluation). The features and the characteristics of the ASSIST programming model are described according to an operational semantics style and using examples to drive the presentation, to show the expressive power and to discuss the research issues.

According to our previous experience in structured parallel programming, in ASSIST we wish to overcome some limitations of the classical skeletons approach to improve generality and flexibility, expressive power and efficiency for irregular, dynamic and interactive applications, as well as for complex combinations of task and data parallelism. A new paradigm, called “parallel module” (parmod), is defined which, in addition to expressing the semantics of several skeletons as particular cases, is able to express more general parallel and distributed program structures, including both data-flow and nondeterministic reactive computations. ASSIST allows the programmer to design the applications in the form of generic graphs of parallel components. Another distinguishing feature is that ASSIST modules are able to utilize external objects, including shared data structures and abstract objects (e.g. CORBA), with standard interfacing mechanisms. In turn, an ASSIST application can be reused and exported as a component for other applications, possibly expressed in different formalisms.