
1.

Reaching agreement on processor-group membership in synchronous distributed systems

Flaviu Cristian 《Distributed Computing》1991,4(4):175-187

Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partition occurrences, we specify the processor-group membership problem and we propose three simple protocols for solving it. The protocols provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays. Flaviu Cristian is a computer scientist at the IBM Almaden Research Center in San Jose, California. He received his PhD from the University of Grenoble, France, in 1979. After carrying out research in operating systems and programming methodology in France and working on the specification, design, and verification of fault-tolerant software in England, he joined IBM in 1982. Since then he has worked in the area of fault-tolerant distributed systems and protocols. He has participated in the design and implementation of a highly available distributed system prototype at the Almaden Research Center, has reviewed and consulted for several fault-tolerant distributed system designs, both in Europe and the American divisions of IBM, and is now a technical leader in the design of a new Air Traffic Control System for the US which must satisfy very stringent availability requirements.

2.

Networking and communication challenges for post-exascale systems

Dhabaleswar Panda Xiao-Yi Lu Hari Subramoni 《Journal of Zhejiang University: Series C (English Edition)》2018,19(10):1230-1235

With the significant advancement in emerging processor, memory, and networking technologies, exascale systems will become available in the next few years (2020–2022). As the exascale systems begin to be deployed and used, there will be a continuous demand to run next-generation applications with finer granularity, finer time-steps, and increased data sizes. Based on historical trends, next-generation applications will require post-exascale systems during 2025–2035. In this study, we focus on the networking and communication challenges for post-exascale systems. Firstly, we present an envisioned architecture for post-exascale systems. Secondly, the challenges are summarized from different perspectives: heterogeneous networking technologies, high-performance communication and synchronization protocols, integrated support with accelerators and field-programmable gate arrays, fault-tolerance and quality-of-service support, energy-aware communication schemes and protocols, software-defined networking, and scalable communication protocols with heterogeneous memory and storage. Thirdly, we present the challenges in designing efficient programming model support for high-performance computing, big data, and deep learning on these systems. Finally, we emphasize the critical need for co-designing runtime with upper layers on these systems to achieve the maximum performance and scalability.

3.

Distributing concurrent Ada programs by source translation

Judy M. Bishop Stephen R. Adams David J. Pritchard 《Software》1987,17(12):859-884

This paper tackles the practical aspects of obtaining a distributed version of an Ada program. It proposes the use of an adapter, which can be a methodology or an automatic translator. The adapter accepts source of a concurrent Ada program, adds communication and control tasks, and produces source for a single distributed Ada program, which can then be compiled and run on a multi-processor computer. The original program can consist of packages and tasks, and both of these can be classed as virtual nodes. The process of adaption does not alter the contents of any package in the original program, so that the method is directly applicable to systems that make use of library and generic packages. The communication between virtual nodes, which would normally reside as one per processor, is via messages on a ring, but the protocols are kept as simple as possible, and the messages are fully checked Ada types, rather than byte strings. The method has been applied to programs of the client-server model, and could be adapted for other rendezvous-based languages such as occam.

4.

Research on the robustness of group key agreement protocols in dynamic peer environments

Zhou Xianwei Zhang Zhenxian Yin Lifang 《Computer Engineering and Applications》2007,43(6):152-155

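This paper summarizes and proposes a general model of a robust, secure group communication system and compares a number of group key agreement protocols. Based on security and efficiency considerations, three of them are selected for study, and the way each handles asynchronous network events and group membership changes is described. On this basis, the differences in their robustness are discussed, and the basic approach to using them to build robust, reliable, and secure group communication systems is explained.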

5.

RMP: a highly scalable randomized membership protocol

Wang Weibo Guo Jinglin Liu Xiyang Chen Ping 《Microcomputer Development》2004,14(8):5-7

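To address the poor scalability of the membership protocols used in multicast communication protocols, a new randomized membership protocol (RMP) is proposed. By responding to group members' join requests at random, RMP builds a connection graph in which each node maintains information about only log N other members, and it can serve as a basis for reliable message dissemination. The RMP algorithm is analyzed mathematically and verified by simulation; the results show that RMP is a highly scalable membership protocol.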

6.

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

《Journal of Parallel and Distributed Computing》2005,65(4):464-478

Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically for protocol processing. This protocol processing convention reduces communication latency and increases effective bandwidth, but also reduces the peak performance since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor; and the Floating policy, where all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols, (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node, (iii) multiprocessor node performance is not as sensitive to interrupt overhead as uniprocessor node because a message arrival is likely to find an idle processor on a multiprocessor node, thereby eliminating interrupts, (iv) the system with the lowest cost-performance will include a dedicated protocol processor when interrupt overheads are much higher than protocol weight, as in light-weight protocols.

7.

Computing global combine operations in the multiport postal model

Bar-Noy A. Bruck J. Ching-Tien Ho Kipnis S. Schieber B. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(8):896-900

Consider a message-passing system of n processors, in which each processor holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n pieces of data and to make the result known to all the n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multiport postal model. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent from k other processors λ − 1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time spent by any processor in sending and receiving messages.
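To make the global combine operation concrete, the following is a minimal sketch of the well-known recursive-doubling exchange for the simplest setting only: one port per processor (k = 1), unit latency (λ = 1), and n a power of two. It is not the optimal multiport algorithm of the paper, whose details the abstract does not give; the function name `recursive_doubling_combine` and the list-based simulation are assumptions made for illustration.

```python
from operator import add


def recursive_doubling_combine(values, op=add):
    """Simulate a global combine with k = 1 port and latency lambda = 1.

    Hypothetical helper for illustration, not the paper's algorithm.
    In each round, processor i exchanges its partial result with
    processor i XOR distance, and both apply the reduction `op`.
    Returns (values held by every processor, number of rounds used).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch requires n to be a power of two")
    current = list(values)
    rounds = 0
    distance = 1
    while distance < n:
        # Exchanges within a round are simultaneous, so read from a snapshot.
        snapshot = list(current)
        for i in range(n):
            partner = i ^ distance
            current[i] = op(snapshot[i], snapshot[partner])
        distance *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    result, rounds = recursive_doubling_combine([1, 2, 3, 4, 5, 6, 7, 8])
    print(result, rounds)
```

With eight processors holding 1 through 8, every processor ends up with the sum 36 after three rounds. The sketch relies on the reduction being associative and commutative, as the abstract requires.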

8.

Portability in MAC protocol and transceiver software implementations for LR‐WPAN platforms

Anthony Schoofs Phillip Stanley‐Marbell 《Software》2011,41(4):339-361

In a variety of emerging networked computing system domains over the years, there have been bursts of activity on new medium access control (MAC) protocols, as new communication transceiver technologies with greater data‐movement performance or lower power dissipation have been introduced. To enable implementations flexible to evolving standards and improving application‐domain insight, such MAC protocols are typically initially implemented in software, and interface between applications or system software, typically executing on an embedded processor or microcontroller, and the evolving radio transceiver hardware. Many challenges exist in implementing MAC protocols across evolving or competing transceiver hardware implementations and processor architectures. Some of these challenges are peculiar to the requirements of MAC protocols, and others are a result of the plethora of system and processor architectures in the embedded systems domain. This article studies the challenges facing software implementations of MAC protocols running on embedded microcontrollers, and interfacing with radio transceiver hardware. Experience with an implementation of the IEEE 802.15.4 MAC across three hardware platforms with different processor, system, and systems software architectures is presented, focusing on implementation approach and interfaces. Pitfalls are pointed out, and guidelines are provided for ensuring that new MAC implementations are easily portable across processor architectures and transceiver hardware. Copyright © 2010 John Wiley & Sons, Ltd.

9.

Optimal Communication Primitives on the Generalized Hypercube Network

Paraskevi Fragopoulou Selim G. Akl Henk Meijer 《Journal of Parallel and Distributed Computing》1996,32(2):173

Efficient interprocessor communication is crucial to increasing the performance of parallel computers. In this paper, a special framework is developed on the generalized hypercube, a network that is currently receiving considerable attention. Using this framework as the basic tool, a number of spanning subgraphs with special properties to fit various communication needs are constructed on the network. The importance of these spanning subgraphs is demonstrated with the development of optimal algorithms for four fundamental communication problems, namely, the one-to-all and all-to-all broadcasting and the one-to-all and all-to-all scattering. Broadcasting is the distribution of the same group of messages from a source processor to all other processors, and scattering is the distribution of distinct groups of messages from a source processor to each other processor. We consider broadcasting and scattering from a single processor of the network (one-to-all broadcasting and scattering) and simultaneously from all processors of the network (all-to-all broadcasting and scattering). For the all-to-all broadcasting and scattering algorithms, a special technique is developed on the generalized hypercube so that messages originating at individual nodes are interleaved in such a manner that no two messages contend for the same edge at any given time. The communication problems are studied under the store-and-forward, all-port communication model. Lower bounds are derived for the above problems under the stated assumptions, in terms of time and number of message transmissions, and optimal algorithms are designed.

10.

Robust and efficient membership management in large-scale dynamic networks

《Future Generation Computer Systems》2017

Epidemic protocols are a bio-inspired communication and computation paradigm for large-scale networked systems based on randomised communication. These protocols rely on a membership service to build decentralised and random overlay topologies. In large-scale, dynamic network environments, node churn and failures may have a detrimental effect on the structure of the overlay topologies with negative impact on the efficiency and the accuracy of applications. Most importantly, there exists the risk of a permanent loss of global connectivity that would prevent the correct convergence of applications. This work investigates to what extent a dynamic network environment may negatively affect the performance of Epidemic membership protocols. A novel Enhanced Expander Membership Protocol (EMP+) based on the expansion properties of graphs is presented. The proposed protocol is evaluated against other membership protocols and the comparative analysis shows that EMP+ can support faster application convergence and is the first membership protocol to provide robustness against global network connectivity problems.

11.

A new solution for the Byzantine agreement problem

Hui-Ching Hsieh 《Journal of Parallel and Distributed Computing》2011,71(10):1261-1277

Reliability is an important research topic in distributed computing systems consisting of a large number of processors. To achieve reliability, the fault-tolerance scheme of the distributed computing system must be revised. This kind of problem is known as a Byzantine agreement (BA) problem. It requires all fault-free processors to agree on a common value, even if some components are corrupt. Consequently, there have been significant studies of this agreement problem in distributed systems. However, the traditional BA protocols focus on running ⌊(n−1)/3⌋+1 rounds of message exchange continuously to make each fault-free processor reach an agreement. In other words, since having a large number of messages results in a large protocol overhead, those protocols are inefficient and unreasonable, especially for some network environments which have large number of processors. In this study, we propose a novel and efficient protocol to reduce the number of messages. Our protocol can collect, compare and replace the received values to find the reliable processors and replace the values sent by the unreliable processors. Subsequently, each processor can agree on a common value through three rounds of message exchange. Furthermore, the proposed protocol can use the minimum number of messages to tolerate the maximum number of faulty components in a distributed system.
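The round count ⌊(n−1)/3⌋+1 quoted in the abstract can be worked out numerically. The sketch below only evaluates that expression and compares it with the three rounds claimed for the proposed protocol. It does not implement either protocol. The function names are hypothetical, and reading ⌊(n−1)/3⌋ as the maximum number of tolerated faulty processors follows the classical n ≥ 3t + 1 bound, which the abstract does not state explicitly.

```python
def max_tolerable_faults(n):
    """Largest t with n >= 3t + 1, i.e. floor((n - 1) / 3).

    Assumes the classical Byzantine agreement bound; hypothetical helper.
    """
    if n < 1:
        raise ValueError("n must be a positive number of processors")
    return (n - 1) // 3


def traditional_rounds(n):
    """Rounds of message exchange quoted for traditional BA protocols."""
    return max_tolerable_faults(n) + 1


if __name__ == "__main__":
    proposed_rounds = 3  # figure stated in the abstract for the new protocol
    for n in (4, 10, 100):
        print(n, max_tolerable_faults(n), traditional_rounds(n), proposed_rounds)
```

For n = 100 the traditional figure is 34 rounds, which illustrates why a constant three-round protocol matters for systems with many processors.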

12.

The Design and Performance Evaluation of the DI-Multicomputer

Lynn Choi Andrew A. Chien 《Journal of Parallel and Distributed Computing》1996,36(2):119

In this paper, we propose a new multicomputer node architecture, the DI-multicomputer, which uses packet routing on a uniform point-to-point interconnect for both local memory access and internode communication. This is achieved by integrating a router into each processor chip and eliminating the memory bus interface. Since communication resources such as pins and wires are allocated dynamically via packet routing, the DI-multicomputer is able to maximize the available communication resources, providing much higher performance for both intranode and internode communication. Multi-packet handling mechanisms are used to implement a high performance memory interface based on packet routing. The DI-multicomputer network interface provides efficient communication for both short and long messages, decoupling the processor from the transmission overhead for long messages while achieving minimum latency for short messages. Trace-driven simulations based on a suite of message passing applications show that the communication mechanisms of the DI-multicomputer can achieve up to four times speedup when compared to existing architectures.

13.

Broadcast protocols for distributed systems

Melliar-Smith P.M. Moser L.E. Agrawala V. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):17-25

An innovative approach is presented to the design of fault-tolerant distributed systems that avoids the several rounds of message exchange required by current protocols for consensus agreement. The approach is based on broadcast communication over a local area network, such as an Ethernet or a token ring, and on two novel protocols, the Trans protocol, which provides efficient reliable broadcast communication, and the Total protocol, which with high probability promptly places a total order on messages and achieves distributed agreement even in the presence of fail-stop, omission, timing, and communication faults. Reliable distributed operations, such as locking, update, and commitment, typically require only a single broadcast message rather than the several tens of messages required by current algorithms.

14.

A distributed load-balancing policy for a multicomputer

Amnon Barak Amnon Shiloh 《Software》1985,15(9):901-913

This paper deals with the organization of a distributed load-balancing policy for a multicomputer system which consists of a cluster of independent computers that are interconnected by a local area communication network. We introduce three algorithms necessary to maintain load balancing in this system: the local load algorithm, used by each processor to monitor its own load; the exchange algorithm, for exchanging load information between the processors; and the process migration algorithm, which uses this information to dynamically migrate processes from overloaded to underloaded processors. The policy that we present is distributed, i.e. each processor uses the same policy. It is both dynamic, responding to load changes without using a priori knowledge of the resources that each process requires, and stable, in that unnecessary overloading of a processor is minimized. We give the essential details of the implementation of the policy and initial results on its performance. Our results confirm the feasibility of building distributed systems that are based on network communication for uniform access, resource sharing and improved reliability, as well as the use of workstations without a secondary storage device.

15.

System effects of interprocessor communication latency in multicomputers

Zhang X. 《Micro, IEEE》1991,11(2)

A series of experiments and analyses on five types of hypercube and grid-topology multicomputers, carried out to evaluate interprocessor communication performance, is described. The effects on the system of communication speed, message routing, interprocessor connectivity, and message-passing software/hardware protocols were studied. The experimental results clearly show the difference in interprocessor communication performance between the first-generation multicomputer systems and the second-generation distributed multiprocessor systems. The traditional store-and-forward technique for interprocessor communication greatly limits the communication speed among the processors. In addition, the processors of the first-generation systems are not very powerful, which is another major reason communication proceeds slowly in these systems. It is seen that the wormhole routing model greatly reduces communication latency and is not sensitive to the distance involved in passing messages.

16.

Efficient algorithms for all-to-all communications in multiport message-passing systems

Bruck J. Ching-Tien Ho Kipnis S. Upfal E. Weathersby D. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(11):1143-1156

We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth.
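As a rough illustration of how the port count k affects the start-up cost, the sketch below evaluates a standard counting bound for concatenation (all-to-all broadcast). A processor that receives at most k messages per round can hold data originating from at most (k + 1)^r processors after r rounds, so at least ⌈log_{k+1} n⌉ rounds are needed. This bound is offered here as an assumption drawn from that counting argument, not as a quotation from the abstract, and the function name is hypothetical.

```python
def min_rounds_for_concatenation(n, k):
    """Smallest r with (k + 1) ** r >= n, i.e. ceil(log_{k+1}(n)).

    Hypothetical helper: a counting lower bound on communication rounds
    for all-to-all broadcast among n processors with k ports each.
    Integer arithmetic avoids floating-point error in the logarithm.
    """
    if n < 1 or k < 1:
        raise ValueError("need n >= 1 processors and k >= 1 ports")
    rounds = 0
    reach = 1  # number of origins whose data one processor can hold so far
    while reach < n:
        reach *= k + 1
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n, k in ((8, 1), (9, 2), (10, 2), (64, 3)):
        print(n, k, min_rounds_for_concatenation(n, k))
```

For example, 64 processors need at least six rounds with one port each but only three rounds with three ports each under this bound.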

17.

On the Effect of Communication Delays in Failure Diagnosis of Decentralized Discrete Event Systems

Rami Debouk Stéphane Lafortune Demosthenis Teneketzis 《Discrete Event Dynamic Systems》2003,13(3):263-289

We study the effect of communication delays on the performance of a coordinated decentralized architecture for failure diagnosis of untimed discrete event systems. The architecture consists of local sites communicating with a coordinator that is responsible for diagnosing the failures occurring in the system. A protocol that realizes the architecture is defined by the diagnostic information generated at the local sites, the communication rules used by the local sites, and the decision rule used by the coordinator to infer the occurrence of failures. Our prior work (Debouk et al., 2000) has addressed the performance of a set of protocols under the assumption that messages are received by the coordinator in the order in which they are sent globally. In this work we relax this assumption. We modify the coordinator's decision rule for two of the protocols analyzed in Debouk et al. (2000) to account for the reception of out-of-order messages. We discover conditions on the system structure under which the modified protocols perform as well as the centralized diagnostic scheme proposed in Sampath et al. (1995).

18.

A bandwidth latency tradeoff for broadcast and reduction

Peter Sanders Jop F Sibeyn 《Information Processing Letters》2003,86(1):33-38

The “fractional tree” algorithm for broadcasting and reduction is introduced. Its communication pattern interpolates between two well-known patterns: sequential pipeline and pipelined binary tree. The speedup over the best of these simple methods can approach two for large systems and messages of intermediate size. For networks which are not very densely connected, the new algorithm seems to be the best known method for the important case in which each processor has only a single (possibly bidirectional) channel into the communication network.

19.

Availability-based noncontiguous processor allocation policies for 2D mesh-connected multicomputers

Ismail Ababneh 《Journal of Systems and Software》2008,81(7):1081-1092

Various contiguous and noncontiguous processor allocation policies have been proposed for mesh-connected multicomputers. Contiguous allocation suffers from high external processor fragmentation because it requires that the processors allocated to a parallel job be contiguous and have the same topology as the multicomputer. The goal of lifting the contiguity condition in noncontiguous allocation is reducing processor fragmentation. However, this can increase the communication overhead because the distances traversed by messages can be longer, and messages from different jobs can interfere with each other by competing for communication resources. The extra communication overhead depends on how the allocation request is partitioned and mapped to free processors. In this paper, we investigate a new class of noncontiguous allocation schemes for two-dimensional mesh-connected multicomputers. These schemes are different from previous ones in that request partitioning is based on the submeshes available for allocation. The available submeshes selected for allocation to a job are such that a high degree of contiguity among their processors is achieved. The proposed policies are compared to previous noncontiguous policies using detailed simulations, where several common communication patterns are considered. The results show that the proposed policies can reduce the communication overhead and improve performance substantially.

20.

Communication styles for parallel systems

Gross T. Hinrichs S. O'Hallaron D.R. Stricker T. Hasegawa A. 《Computer》1994,27(12):34-44

Distributed-memory parallel systems rely on explicit message exchange for communication, but the communication operations they support can differ in many aspects. One key difference is the way messages are generated or consumed. With systolic communication, a message is transmitted as it is generated. For example, the result computed by the multiplier is sent directly to the communication subsystem for transmission to another node. With memory communication, the complete message is generated and stored in memory, and then transmitted to its destination. Since sender and receiver nodes are individually controlled, they can use different communication styles. One example of memory communication is message passing: both the sender and receiver buffer the message in memory. These two communication styles place different demands on processor design. This article illustrates each style's effect on processor resources for some key application kernels. We are targeting the iWarp system because it supports both communication styles. Two parallel-program generators, one for each communication style, automatically map the sample programs.