Similar Literature
A total of 20 similar documents were found (search time: 31 ms).
1.
Recent distributed shared memory (DSM) systems provide increasing support for the sharing of objects rather than portions of memory. However, like earlier DSM systems, these distributed shared object (DSO) systems still force developers to use a single protocol, or a small set of given protocols, for the sharing of application objects. This limitation prevents applications from optimizing their communication behaviour and results in unnecessary overhead. A current general trend in software systems development is towards customizable systems; for example, frameworks, reflection, and aspect-oriented programming all aim to give the developer greater flexibility and control over the functionality and performance of their code. This paper describes a novel object-oriented framework that defines a DSM system in terms of a consistency model and an underlying coherency protocol. Different consistency models and coherency protocols can be used within a single application because they can be customized, by the application programmer, on a per-object basis. This allows application-specific semantics to be exploited at a very fine level of granularity, with a resulting improvement in performance. The framework is implemented in Java, and the speed-up obtained by a number of applications that use the framework is reported. Copyright © 2002 John Wiley & Sons, Ltd.
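A minimal, hypothetical sketch of the per-object customization idea described above, not the paper's actual API: each shared object is bound to its own coherency protocol, so an application can mix, say, a write-update and a write-invalidate policy within one program. All class and method names below are invented.

```java
// Hypothetical sketch: a per-object pluggable coherency protocol for a DSM framework.
interface CoherencyProtocol {
    void onWrite(String objectId, Object newValue);     // propagate or invalidate
    Object onRead(String objectId, Object cachedValue); // validate the local copy
}

// Eager write-update: push every new value to peers immediately (messaging stubbed).
class WriteUpdateProtocol implements CoherencyProtocol {
    public void onWrite(String id, Object v) { System.out.println("broadcast update of " + id); }
    public Object onRead(String id, Object cached) { return cached; } // local copy is always fresh
}

// Write-invalidate: peers drop their copies and re-fetch on demand (messaging stubbed).
class WriteInvalidateProtocol implements CoherencyProtocol {
    public void onWrite(String id, Object v) { System.out.println("broadcast invalidate of " + id); }
    public Object onRead(String id, Object cached) {
        return cached != null ? cached : fetchFromOwner(id);
    }
    private Object fetchFromOwner(String id) { return "remote value of " + id; }
}

// A shared object bound to its own protocol, chosen per object by the application.
class SharedObject {
    private final String id;
    private final CoherencyProtocol protocol;
    private volatile Object value;
    SharedObject(String id, CoherencyProtocol p) { this.id = id; this.protocol = p; }
    void write(Object v) { value = v; protocol.onWrite(id, v); }
    Object read() { return protocol.onRead(id, value); }
}

public class PerObjectProtocolDemo {
    public static void main(String[] args) {
        SharedObject counter = new SharedObject("counter", new WriteUpdateProtocol());
        SharedObject matrix  = new SharedObject("matrix",  new WriteInvalidateProtocol());
        counter.write(42);
        matrix.write(new double[]{1.0, 2.0});
        System.out.println(counter.read());
    }
}
```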

2.
The granularity of shared data is one of the key factors affecting the performance of distributed shared memory (DSM) machines. Given that programs exhibit quite different sharing patterns, providing only one or two fixed granularities cannot result in an efficient use of resources. On the other hand, supporting arbitrary granularity sizes significantly increases not only hardware complexity but software overhead as well. Furthermore, the efficient use of arbitrary granularities puts the burden on users to provide information about program behavior to compilers and/or runtime systems. These kinds of requirements tend to restrict the programmability of the shared memory model. In this paper, we present a new communication scheme, called Adaptive Granularity (AG). Adaptive Granularity makes it possible to transparently integrate bulk transfer into the shared memory model by supporting variable-size granularity and memory replication. It consists of two protocols: one for small data and another for large data. For small data, the standard hardware DSM protocol is used and the granularity is fixed to the size of a cache line. For large array data, the protocol for bulk data is used instead, and the granularity varies depending on the runtime sharing behavior of the applications. Simulation results show that AG improves performance by up to 43% over the hardware implementation of DSM (e.g., DASH, Alewife). Compared with an equivalent architecture that supports fine-grain memory replication at the fixed granularity of a cache line (e.g., Typhoon), AG reduces execution time by up to 35%. This research was supported in part by NSF under grant CCR-9308981, by ARPA under Rome Laboratories contract F30602-91-C-0146, and by the USC Zumberge Fund. Computing resources were provided in part by NSF infrastructure grant CDA-9216321.
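To illustrate the two-protocol structure described above, here is a hedged software sketch (AG itself is a hardware/runtime mechanism; the thresholds and the adaptation step below are invented): small data is shared at a fixed cache-line granularity, while large array data is sent in variable-size bulk blocks whose size grows over time.

```java
// Hypothetical sketch: dispatch between a fixed cache-line path for small data and a
// bulk-transfer path whose block size adapts, as the abstract describes.
public class AdaptiveGranularitySketch {
    static final int CACHE_LINE = 64;        // bytes, fixed granularity for small data
    static final int BULK_THRESHOLD = 4096;  // assumed cutoff for "large array data"

    // Grain size for bulk data; a real system would adapt this from runtime sharing statistics.
    static int bulkGrain = 1024;

    static void share(byte[] data) {
        if (data.length < BULK_THRESHOLD) {
            for (int off = 0; off < data.length; off += CACHE_LINE)
                transfer(off, Math.min(CACHE_LINE, data.length - off));
        } else {
            for (int off = 0; off < data.length; off += bulkGrain)
                transfer(off, Math.min(bulkGrain, data.length - off));
            bulkGrain = Math.min(bulkGrain * 2, 64 * 1024); // crude adaptation step
        }
    }

    static void transfer(int offset, int length) {
        System.out.println("send block at " + offset + " of " + length + " bytes");
    }

    public static void main(String[] args) {
        share(new byte[256]);    // small: cache-line granularity
        share(new byte[16384]);  // large: variable bulk granularity
    }
}
```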

3.
Distributed systems that consist of workstations connected by high performance interconnects offer computational power comparable to moderate size parallel machines. Middleware like distributed shared memory (DSM) or distributed shared objects (DSO) attempts to improve the programmability of such hardware by presenting to application programmers interfaces similar to those offered by shared memory machines. This paper presents the portable Indigo data sharing library which provides a small set of primitives with which arbitrary shared abstractions are easily and efficiently implemented across distributed hardware platforms. Sample shared abstractions implemented with Indigo include DSM as well as fragmented objects, where the object state is split across different machines and where interfragment communications may be customized to application-specific consistency needs. The Indigo library's design and implementation are evaluated on two different target platforms: a workstation cluster and an IBM SP2 machine. As part of this evaluation, a novel DSM system and consistency protocol are implemented and evaluated with several high performance applications. Application performance attained with the DSM system is compared to the performance experienced when utilizing the underlying basic message-passing facilities or when employing Indigo to construct customized fragmented objects implementing the application's shared state. Such experimentation results in insights concerning the efficient implementation of DSM systems (e.g. how to deal with false sharing). It also leads to the conclusion that Indigo provides a sufficiently rich set of abstractions for efficient implementation of the next generation of parallel programming models for high performance machines. © 1998 John Wiley & Sons, Ltd.

4.
Although the shared memory abstraction is gaining ground as a programming abstraction for parallel computing, the main platforms that support it, small-scale symmetric multiprocessors (SMPs) and hardware cache-coherent distributed shared memory systems (DSMs), seem to lie inherently at the extremes of the cost-performance spectrum for parallel systems. In this paper we examine if shared virtual memory (SVM) clusters can bridge this gap by examining how application performance scales on a state-of-the-art shared virtual memory cluster. We find that: (i) The level of application restructuring needed is quite high compared to applications that perform well on a DSM system of the same scale and larger problem sizes are needed for good performance. (ii) However, surprisingly, SVM performs quite well for a fairly wide range of applications, achieving at least half the parallel efficiency of a high-end DSM system at the same scale and often much more.

5.
In a distributed system framework, program interactions can be modelled by using typed objects according to client/server relationships. The operations defined by a given type are the services that may be provided by an object of this type to a client process. When the process and the object are located in different nodes, migrations may represent valid alternatives to remote procedure calls. Migration of the server object causes the memory area storing the internal representation of this object to be copied into the node of the client process. Migration of the client process causes execution of this process to proceed in the node of the server object. This paper proposes migration paradigms with reference to a memory environment implementing the notion of a single address space. The discussion takes a number of salient issues into consideration, including performance, memory configurations for object storage, and the strategies for memory management.

6.
Algorithms implementing distributed shared memory (total citations: 1; self: 0; others: 1)
Stumm, M.; Zhou, S. Computer, 1990, 23(5): 54-64
Four basic algorithms for implementing distributed shared memory are compared. Conceptually, these algorithms extend local virtual address spaces to span multiple hosts connected by a local area network, and some of them can easily be integrated with the hosts' virtual memory systems. The merits of distributed shared memory and the assumptions made with respect to the environment in which the shared memory algorithms are executed are described. The algorithms are then described, and a comparative analysis of their performance in relation to application-level access behavior is presented. It is shown that the correct choice of algorithm is determined largely by the memory access behavior of the applications. Two particularly interesting extensions of the basic algorithms are described, and some limitations of distributed shared memory are noted.
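As a concrete illustration of one of the surveyed approaches, the following toy sketch shows a read-replication style of DSM (multiple read copies, invalidation before a write); the messaging layer is stubbed and all names are illustrative, not taken from the paper.

```java
// Toy read-replication sketch: reads create a local copy, a write first invalidates
// every other copy so the writer becomes the sole holder of the page.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ReadReplicationSketch {
    // pageId -> set of hosts currently holding a read copy
    private final Map<Integer, Set<String>> copySet = new HashMap<>();
    private final Map<Integer, byte[]> store = new HashMap<>();

    public synchronized byte[] read(String host, int pageId) {
        copySet.computeIfAbsent(pageId, k -> new HashSet<>()).add(host); // replicate to reader
        return store.getOrDefault(pageId, new byte[4096]);
    }

    public synchronized void write(String host, int pageId, byte[] page) {
        for (String holder : copySet.getOrDefault(pageId, new HashSet<>()))
            if (!holder.equals(host)) invalidate(holder, pageId);  // drop stale copies first
        copySet.put(pageId, new HashSet<>(Set.of(host)));          // writer is now sole holder
        store.put(pageId, page);
    }

    private void invalidate(String host, int pageId) {
        System.out.println("invalidate page " + pageId + " on " + host);
    }

    public static void main(String[] args) {
        ReadReplicationSketch dsm = new ReadReplicationSketch();
        dsm.read("hostA", 7);
        dsm.read("hostB", 7);
        dsm.write("hostA", 7, new byte[4096]); // invalidates hostB's copy
    }
}
```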

7.
The evaluation of load balancing algorithms for use in general-purpose distributed systems has traditionally been carried out by discrete event simulation. This paper describes a testbed which supports the artificial creation of workload and the measurement of performance for load balancing algorithm implementations based on job migration in a small distributed system. Nine load balancing algorithms are implemented and evaluated under a range of environmental conditions. The experimental results confirm the effectiveness of load balancing and indicate the superiority of the centralised non-adaptive and distributed adaptive approaches. © 1998 John Wiley & Sons, Ltd.
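For illustration only, here is a minimal sketch of the general shape of a centralised load balancing policy; it is not a reimplementation of any of the nine algorithms evaluated here. A central component tracks reported queue lengths and migrates each new job to the least-loaded workstation.

```java
// Minimal centralised load balancer sketch: place each incoming job on the node
// with the shortest reported job queue.
import java.util.HashMap;
import java.util.Map;

public class CentralLoadBalancer {
    private final Map<String, Integer> queueLength = new HashMap<>();

    public void reportLoad(String node, int jobsQueued) { queueLength.put(node, jobsQueued); }

    // Pick the node with the shortest job queue; ties broken arbitrarily.
    public String placeJob() {
        String best = null;
        for (Map.Entry<String, Integer> e : queueLength.entrySet())
            if (best == null || e.getValue() < queueLength.get(best)) best = e.getKey();
        if (best != null) queueLength.merge(best, 1, Integer::sum); // the job is now queued there
        return best;
    }

    public static void main(String[] args) {
        CentralLoadBalancer lb = new CentralLoadBalancer();
        lb.reportLoad("ws1", 3);
        lb.reportLoad("ws2", 0);
        lb.reportLoad("ws3", 5);
        System.out.println("migrate job to " + lb.placeJob()); // ws2
    }
}
```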

8.
This paper presents a comprehensive evaluation testbed for interconnection networks and routing algorithms using real applications. The testbed is flexible enough to implement any network topology and fault-tolerant routing algorithm, and allows the system architect to study the cost versus performance trade-offs for a range of network parameters. We illustrate its use with one fault-tolerant algorithm and analyze the performance of four shared memory applications with different fault conditions. We also show how the testbed can be used to drive future research in fault-tolerant routing algorithms and architectures by proposing and evaluating novel architectural enhancements to the network router, called path selection heuristics (PSH). We propose three such schemes, and the Least Recently Used (LRU) PSH is shown to give the best performance in the presence of faults.
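The following is a hedged, software-level analogue of a least-recently-used path selection heuristic (the paper's LRU PSH lives in the network router hardware): among the non-faulty candidate output paths, the one used longest ago is chosen.

```java
// Software analogue of an LRU path selection heuristic: skip faulty paths and pick
// the remaining path that was used least recently.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LruPathSelection {
    private final Map<String, Long> lastUsed = new HashMap<>();
    private long clock = 0;

    public String choose(List<String> candidatePaths, Set<String> faultyPaths) {
        String pick = null;
        for (String p : candidatePaths) {
            if (faultyPaths.contains(p)) continue;               // skip failed links
            long t = lastUsed.getOrDefault(p, Long.MIN_VALUE);
            if (pick == null || t < lastUsed.getOrDefault(pick, Long.MIN_VALUE)) pick = p;
        }
        if (pick != null) lastUsed.put(pick, ++clock);           // mark as just used
        return pick;
    }

    public static void main(String[] args) {
        LruPathSelection psh = new LruPathSelection();
        List<String> paths = List.of("x+", "y+", "x-");
        System.out.println(psh.choose(paths, Set.of("x-"))); // x+ (never used)
        System.out.println(psh.choose(paths, Set.of("x-"))); // y+
        System.out.println(psh.choose(paths, Set.of("x-"))); // x+ again (least recently used)
    }
}
```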

9.
Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data intensive, and the ease of deployment of algorithms is an important factor in developing advanced applications, we introduce a flexible, distributed, MapReduce-based text mining workflow that performs I/O-bound operations on CPUs with industry-standard tools and then runs compute-bound operations on GPUs which are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes with two NVidia Tesla M2050s attached to each, and we achieve considerable speedups for random projection and self-organizing maps.

10.
With increasing richness in features such as personalization of content, Web applications are becoming increasingly complex and hence compute intensive. Traditional approaches for improving the performance of static content Web sites have been based on the assumption that static content, such as images, is network intensive. However, these methods are not applicable to dynamic content applications, which are more compute intensive than static content. This paper proposes a suite of algorithms which jointly optimize the performance of dynamic content applications by reducing client access times while also minimizing resource utilization. A server migration algorithm allocates servers on demand within a cluster such that client access times are not affected even under sudden overload conditions. Further, a server selection mechanism enables statistical multiplexing of resources across clusters by redirecting requests away from overloaded clusters. We also propose a cluster decision algorithm which decides whether to migrate in additional servers at the local cluster or redirect requests remotely under different workload conditions. Through a combination of analytical modeling, trace-driven simulation over traces from large e-commerce sites, and testbed implementation, we explore the performance savings achieved by the proposed algorithms.
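A hedged sketch of the kind of cluster decision logic described above; the thresholds and cost inputs are invented rather than the paper's model. Under overload, the cluster first tries to migrate in a spare local server and otherwise redirects requests to a less-loaded remote cluster.

```java
// Invented threshold-based cluster decision logic: serve locally, migrate in a spare
// local server, or redirect requests to a remote cluster.
public class ClusterDecisionSketch {
    enum Action { SERVE_LOCALLY, MIGRATE_LOCAL_SERVER, REDIRECT_REMOTE }

    static Action decide(double localUtilization, int spareLocalServers,
                         double remoteUtilization, double redirectLatencyPenaltyMs) {
        final double OVERLOAD = 0.85;          // assumed utilization threshold
        final double MAX_PENALTY_MS = 50.0;    // assumed tolerable redirect penalty
        if (localUtilization < OVERLOAD) return Action.SERVE_LOCALLY;
        if (spareLocalServers > 0) return Action.MIGRATE_LOCAL_SERVER;   // cheapest fix
        if (remoteUtilization < OVERLOAD && redirectLatencyPenaltyMs <= MAX_PENALTY_MS)
            return Action.REDIRECT_REMOTE;     // statistical multiplexing across clusters
        return Action.SERVE_LOCALLY;           // degrade locally rather than overload others
    }

    public static void main(String[] args) {
        System.out.println(decide(0.95, 2, 0.60, 20)); // MIGRATE_LOCAL_SERVER
        System.out.println(decide(0.95, 0, 0.60, 20)); // REDIRECT_REMOTE
        System.out.println(decide(0.50, 0, 0.60, 20)); // SERVE_LOCALLY
    }
}
```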

11.
Practical parallel algorithms, based on classical sequential Union-Find algorithms for computing transitive closures of binary relations, are described and implemented for both shared memory and distributed memory parallel computers. By practical algorithms, we mean algorithms that are efficient for parallel systems with bounded numbers of processors, as opposed to algorithms where the number of processors grows with the problem size. Transitive closures are useful for decomposing many application problems into independent subproblems. The implementations were on an ENCORE Multimax shared memory machine and an NCUBE hypercube. Our implementations indicate that transitive closure computations are intrinsically difficult for distributed memory parallel machines because of the need for global information. By contrast, our results for shared memory machines exhibited excellent speedups. Supported in part by NSF Grant DCR-8619103, ONR contract N000-86-G-0202 and DOE Grant DE-FG02-85ER25001. Supported in part by RADC contract F30602-85-C-0303.
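For reference, here is the classical sequential Union-Find structure these parallel algorithms build on, with union by rank and path compression; computing the transitive closure of a symmetric relation then reduces to grouping elements by their representative.

```java
// Sequential Union-Find with union by rank and path compression.
public class UnionFind {
    private final int[] parent, rank;

    public UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    public int find(int x) {                   // path compression
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    public void union(int a, int b) {          // union by rank
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }

    public static void main(String[] args) {
        // Relation pairs (0,1), (1,2), (3,4): the closure groups {0,1,2} and {3,4}.
        UnionFind uf = new UnionFind(5);
        uf.union(0, 1); uf.union(1, 2); uf.union(3, 4);
        System.out.println(uf.find(0) == uf.find(2)); // true: 0 and 2 related transitively
        System.out.println(uf.find(0) == uf.find(4)); // false: independent subproblem
    }
}
```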

12.
Despite the large number of Byzantine fault-tolerant algorithms for message-passing systems designed through the years, only recently have algorithms appeared for coordinating processes subject to Byzantine failures using shared memory. This paper presents a new computing model in which shared memory objects are protected by fine-grained access policies, and a new shared memory object, the Policy-Enforced Augmented Tuple Space (PEATS). We show the benefits of this model by providing simple and efficient consensus algorithms. These algorithms are much simpler and require fewer shared memory operations and fewer memory bits than previous algorithms based on ACLs and sticky bits. We also prove that PEATS objects are universal, i.e., that they can be used to implement any other shared memory object, and present lock-free and wait-free universal constructions.
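Below is a very loose, single-process sketch of the decide-once idea behind tuple-space consensus; it ignores Byzantine behaviour and fault tolerance entirely, and the policy check is only a stand-in for PEATS's fine-grained access policies, so it should not be read as the paper's construction.

```java
// Loose illustration: the first successful "decision" tuple wins, and every later
// proposer adopts it. The policy check is a placeholder for fine-grained access control.
import java.util.concurrent.ConcurrentHashMap;

public class TupleSpaceConsensusSketch {
    private final ConcurrentHashMap<String, String> space = new ConcurrentHashMap<>();

    // Policy stub: a real policy would restrict which processes may insert or replace
    // which tuples; here every process may try to insert the decision tuple once.
    private boolean policyAllowsInsert(String processId, String key) { return true; }

    public String propose(String processId, String value) {
        if (policyAllowsInsert(processId, "decision")) {
            space.putIfAbsent("decision", value);  // atomic: only the first proposal sticks
        }
        return space.get("decision");              // every process decides the same value
    }

    public static void main(String[] args) {
        TupleSpaceConsensusSketch c = new TupleSpaceConsensusSketch();
        System.out.println(c.propose("p1", "A"));  // A
        System.out.println(c.propose("p2", "B"));  // still A
        System.out.println(c.propose("p3", "C"));  // still A
    }
}
```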

13.
Implementing virtual shared memory in a loosely coupled multi-computer system not only makes the development of distributed applications more convenient, but also provides a basis on which a distributed system service environment can be built. This paper describes the Mach memory management model and a virtual shared memory server based on that model.

14.
Distributed shared memory (DSM) allows parallel programs to run on distributed computers by simulating a global virtual shared memory, but data racing bugs may easily occur when the threads of a multi-threaded process concurrently access the physically distributed memory. Earlier tools to help programmers locate data racing bugs in non-DSM parallel programs are not easily applied to DSM systems. This study presents the data race avoidance and replay scheme (DRARS) to assist debugging parallel programs on DSM or multi-core systems. DRARS is a novel tool which controls the consistency protocol of the target program, automatically preventing a large class of data racing bugs when the parallel program is subsequently run, obviating much of the need for manual debugging. For data racing bugs that cannot be avoided automatically, DRARS performs a deterministic replay-type function on DSM systems, faithfully reproducing the behavior of the parallel program during run time. Because one class of data racing bugs has already been eliminated, the remaining manual debugging task is greatly simplified. Unlike previous debugging methods, DRARS does not require that the parallel program be written in a specific style or programming language. Moreover, DRARS can be implemented in most consistency protocols. In this paper, DRARS is realized and verified in real experiments using the eager release consistency protocol on a DSM system with various applications.
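As a thread-level analogue of the record/replay idea (DRARS itself operates inside the DSM consistency protocol, not at this level), the sketch below logs the order in which threads enter a critical section during a recording run and forces the same order during replay; all names are invented.

```java
// Toy record/replay controller: record the critical-section entry order, then
// force threads to re-enter in exactly that order during replay.
import java.util.ArrayList;
import java.util.List;

public class RecordReplaySketch {
    private final List<String> log = new ArrayList<>();
    private int replayIndex = 0;
    private boolean replaying = false;

    public synchronized void enter(String thread) throws InterruptedException {
        if (replaying) {
            while (!log.get(replayIndex).equals(thread)) wait(); // not this thread's turn yet
            replayIndex++;
        } else {
            log.add(thread);                                     // record the observed order
        }
    }

    public synchronized void exit() { notifyAll(); }

    public synchronized void startReplay() { replaying = true; replayIndex = 0; }

    public static void main(String[] args) throws InterruptedException {
        RecordReplaySketch ctl = new RecordReplaySketch();
        ctl.enter("T1"); ctl.exit();   // recorded order: T1 then T2
        ctl.enter("T2"); ctl.exit();
        ctl.startReplay();
        ctl.enter("T1"); ctl.exit();   // replay accepts the same order
        ctl.enter("T2"); ctl.exit();
        System.out.println("replayed in recorded order");
    }
}
```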

15.
A memory-sharing grid system based on memory services (total citations: 1; self: 0; others: 1)
Chu Rui, Xiao Nong, Lu Xicheng. Chinese Journal of Computers, 2006, 29(7): 1225-1233
Memory-intensive applications place strict demands on the physical memory of their execution environment; when physical memory is insufficient, heavy disk I/O is triggered and system performance degrades. Traditional network memory tries to solve this problem by sharing the physical memory of idle nodes within a cluster, but it is strongly affected by cluster load and the internal network. By combining network memory with service computing and grid computing techniques, this paper proposes a memory-sharing grid system based on memory services, called the memory grid, and analyzes and discusses the key techniques and algorithms for implementing memory services. The memory grid overcomes the shortcomings of network memory and extends the range of applications of grid computing. Simulations driven by the runtime state of real applications show that the memory grid improves performance compared with network memory.
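A hedged sketch of the memory-service idea, with all interfaces and names invented: when the simulated local memory is full, pages are evicted to a remote node's memory service and fetched back on demand, rather than being swapped to disk.

```java
// Hypothetical memory-service pager: evict pages to a remote memory service when the
// local frame budget is exhausted, and fetch them back on access.
import java.util.HashMap;
import java.util.Map;

interface MemoryService {                        // offered by an idle grid node
    void store(long pageId, byte[] page);
    byte[] load(long pageId);
}

class InMemoryService implements MemoryService { // stand-in for a remote node
    private final Map<Long, byte[]> pages = new HashMap<>();
    public void store(long pageId, byte[] page) { pages.put(pageId, page); }
    public byte[] load(long pageId) { return pages.get(pageId); }
}

public class MemoryGridPager {
    private final Map<Long, byte[]> localFrames = new HashMap<>();
    private final int capacity;                  // simulated physical memory limit, in pages
    private final MemoryService remote;

    MemoryGridPager(int capacity, MemoryService remote) { this.capacity = capacity; this.remote = remote; }

    void put(long pageId, byte[] page) {
        if (localFrames.size() >= capacity) {            // evict an arbitrary page to the grid
            Long victim = localFrames.keySet().iterator().next();
            remote.store(victim, localFrames.remove(victim));
        }
        localFrames.put(pageId, page);
    }

    byte[] get(long pageId) {
        byte[] p = localFrames.get(pageId);
        return p != null ? p : remote.load(pageId);      // remote memory instead of disk I/O
    }

    public static void main(String[] args) {
        MemoryGridPager pager = new MemoryGridPager(2, new InMemoryService());
        pager.put(1, new byte[]{1}); pager.put(2, new byte[]{2}); pager.put(3, new byte[]{3});
        System.out.println(pager.get(1) != null);        // true: served locally or by the service
    }
}
```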

16.
In this paper, we present and evaluate a dynamic proxy framework called the chek proxy framework (CPF). CPF is an application-level approach that promotes the use of client machines to host, at runtime, a server-initiated intermediate object called the dynamic application proxy server (DAPS), based on the designed clustering policy. Unlike conventional and current dynamic proxy systems, CPF adopts an incentive scheme in which the selected client machines are rewarded for sharing the central server workload by servicing local/regional client requests. The results show that the CPF approach reduces the processor utilization and memory consumption of the central server by 15.1% and 16.5 MB, respectively, compared with the conventional client/server approach in our prototype implementation. With our simulation, it is further quantified that allocating DAPS to work cooperatively in a hierarchical fashion increases the average client-receiving rate and the network throughput by at least 100% and 35%, respectively, with a server workload reduction of 11.38%, compared with DAPS serving end-clients directly.

17.
Two kinds of parallel computers exist: those with shared memory and those without. The former are difficult to build but easy to program. The latter are easy to build but difficult to program. In this paper we present a hybrid model that combines the best properties of each by simulating a restricted object-based shared memory on machines that do not share physical memory. In this model, objects can be replicated on multiple machines. An operation that does not change an object can then be done locally, without any network traffic. Update operations can be done using the reliable broadcast protocol described in the paper. We have constructed a prototype system, designed and implemented a new programming language for it, and programmed various applications using it. The model, algorithms, language, applications and performance will be discussed.
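A small sketch of the replicated-object model described above, with the reliable broadcast layer stubbed as an in-process list of replicas: reads use the local copy with no network traffic, while updates are applied to every replica in a single order.

```java
// Replicated object sketch: local reads, totally ordered "broadcast" updates.
import java.util.ArrayList;
import java.util.List;

public class ReplicatedCounter {
    private static final List<ReplicatedCounter> replicas = new ArrayList<>();
    private int value;                        // each machine's local copy

    public ReplicatedCounter() { replicas.add(this); }

    public int read() { return value; }       // local operation, no network traffic

    // Update delivered to all replicas in one order (stand-in for reliable broadcast).
    public static synchronized void broadcastAdd(int delta) {
        for (ReplicatedCounter r : replicas) r.value += delta;
    }

    public static void main(String[] args) {
        ReplicatedCounter machineA = new ReplicatedCounter();
        ReplicatedCounter machineB = new ReplicatedCounter();
        broadcastAdd(5);
        System.out.println(machineA.read() + " " + machineB.read()); // 5 5, copies agree
    }
}
```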

18.
Two paradigms for distributed shared memory on loosely-coupled computing systems are compared: the shared data-object model as used in Orca, a programming language specially designed for loosely-coupled computing systems, and the shared virtual memory model. For both paradigms two systems are described, one using only point-to-point messages, the other using broadcasting as well. The two paradigms and their implementations are described briefly. Their performances are compared on four applications: the travelling-salesman problem, alpha-beta search, matrix multiplication and the all-pairs shortest-paths problem. Measurements were obtained on a system consisting of 10 MC68020 processors connected by an Ethernet. For comparison purposes, the applications have also been run on a system with physical shared memory. In addition, the paper gives measurements for the first two applications above when remote procedure call is used as the communication mechanism. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications, with significant speed-ups. The structured shared data-object model achieves the highest speed-ups and is easiest to program and to debug.

19.
A new large-scale general-purpose distributed server architecture (total citations: 2; self: 0; others: 2)
Tang Lei, Jin Lianfu. Computer Engineering and Design, 2004, 25(10): 1784-1786, 1810
Many large-scale applications today, such as e-commerce systems, e-mail systems, and enterprise information systems, are built on large distributed server systems. This paper presents the design and implementation of a new architecture for large distributed server systems; the architecture provides a unified server framework and development model for a variety of large applications. It can handle a large volume of concurrent transactions efficiently, scales easily as the business workload grows, and runs on all current operating systems and hardware platforms.

20.
Any parallel program has abstractions that are shared by the program's multiple processes. Such shared abstractions can considerably affect the performance of parallel programs, on both distributed and shared memory multiprocessors. As a result, their implementation must be efficient, and such efficiency should be achieved without unduly compromising program portability and maintainability. The primary contribution of the DSA library is its representation of shared abstractions as objects that may be internally distributed across different nodes of a parallel machine. Such distributed shared abstractions (DSA) are encapsulated so that their implementations are easily changed while maintaining program portability across parallel architectures. The principal results presented are: a demonstration that the fragmentation of object state across different nodes of a multiprocessor machine can significantly improve program performance; and that such object fragmentation can be achieved without compromising portability by changing object interfaces. These results are demonstrated using implementations of the DSA library on several medium scale multiprocessors, including the BBN Butterfly, Kendall Square Research, and SGI shared memory multiprocessors. The DSA library's evaluation uses synthetic workloads and a parallel implementation of a branch and bound algorithm for solving the traveling salesperson problem (TSP).
