Similar Literature
1.
Raid, a robust and adaptable distributed database system for transaction processing, is described. Raid is a message-passing system, with server processes on each site. The servers manage concurrent processing, consistent replicated copies during site failures and atomic distributed commitment. A high-level, layered communications package provides a clean, location-independent interface between servers. The latest design of the communications package delivers messages via shared memory in a high-performance configuration in which several servers are linked into a single process. Raid provides the infrastructure to experimentally investigate various methods for supporting reliable distributed transaction processing. Measurements of transaction processing time and server CPU time are presented. Data and conclusions of experiments in three categories are also presented: communications software, consistent replicated copy control during site failures, and concurrent distributed checkpointing. A software tool for the evaluation of transaction processing algorithms in an operating system kernel is proposed.

2.
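Non-blocking data update among multiple replicated servers   Cited by: 2 (self-citations: 0, citations by others: 2)
Zhao Hongbiao, Zhou Lizhu. Journal of Software (《软件学报》), 1998, 9(4): 268-272
In client/server systems, server availability is the key to improving the availability of the whole system, and using multiple replicated servers is the most promising way to achieve it. However, blocking during replicated data updates is a performance bottleneck for the whole system. This paper proposes a non-blocking method for updating replicated data in which the servers commit independently; for servers that cannot complete an update because of a failure, a coordination mechanism brings them to the same final state.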

3.
Probabilistic Quorum Systems   Cited by: 1 (self-citations: 0, citations by others: 1)
We initiate the study of probabilistic quorum systems, a technique for providing consistency of replicated data with high levels of assurance despite the failure of data servers. We show that this technique offers effective load reduction on servers and high availability. We explore probabilistic quorum systems both for services tolerant of benign server failures and for services tolerant of arbitrary (Byzantine) ones. We also prove bounds on the server load that can be achieved with these techniques.
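
To make the idea of a probabilistic quorum concrete, the following is a minimal sketch, not the construction from the paper. It assumes that every quorum is a uniformly random subset of q out of n servers (an assumption of this example; the abstract does not spell out the construction) and computes the probability that two independently chosen quorums fail to intersect. The names `quorum_size` and `disjoint_probability` are hypothetical. For q = ℓ√n, elementary counting bounds this probability by e^(-ℓ²).

```python
import math


def quorum_size(n: int, ell: float) -> int:
    """Quorum size q = ceil(ell * sqrt(n)); ell is an illustrative tuning parameter."""
    return math.ceil(ell * math.sqrt(n))


def disjoint_probability(n: int, q: int) -> float:
    """Probability that two independent, uniformly random q-subsets of n servers are disjoint.

    Fix the first quorum; the second must fall entirely in the remaining
    n - q servers, which happens with probability C(n - q, q) / C(n, q).
    """
    if not 0 < q <= n:
        raise ValueError("need 0 < q <= n")
    if 2 * q > n:
        return 0.0  # two such quorums must overlap (pigeonhole)
    return math.comb(n - q, q) / math.comb(n, q)


if __name__ == "__main__":
    n, ell = 100, 2.0
    q = quorum_size(n, ell)
    p = disjoint_probability(n, q)
    print(f"n={n}, q={q}, P(disjoint)={p:.6f}, bound e^(-ell^2)={math.exp(-ell**2):.6f}")
```

Under this assumption each server belongs to a random quorum with probability q/n, so quorums of size proportional to √n keep the per-server load near 1/√n while making non-intersection unlikely.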

4.
In this paper, we develop a model to study how to effectively download a document from a set of replicated servers. We propose a generalized application-layer anycasting protocol, known as paracasting, to advocate concurrent access of a subset of replicated servers to cooperatively satisfy a client's request. Each participating server satisfies the request in part by transmitting a subset of the requested file to the client. The client can recover the complete file when different parts of the file sent from the participating servers are received. This model allows us to estimate the average time to download a file from the set of homogeneous replicated servers, and the request blocking probability when each server can accept and serve a finite number of concurrent requests. Our results show that the file download time drops when a request is served concurrently by a larger number of homogeneous replicated servers, although the performance improvement quickly saturates when the number of servers increases. If the total number of requests that a server can handle simultaneously is finite, the request blocking probability increases with the number of replicated servers used to serve a request concurrently. Therefore, paracasting is effective when a small number of servers, say, up to four, are used to serve a request concurrently.

5.
This paper presents an effective method of metadata rebalance in exascale distributed file systems. Exponential data growth has led to the need for an adaptive and robust distributed file system whose typical architecture is composed of a large cluster of metadata servers and data servers. Though each metadata server can have an equally divided subset from the entire metadata set at first, there will eventually be a global imbalance in the placement of metadata among metadata servers, and this imbalance worsens over time. To ensure that disproportionate metadata placement will not have a negative effect on the intrinsic performance of a metadata server cluster, it is necessary to recover the balanced performance of the cluster periodically. However, this cannot be easily done because rebalancing seriously hampers the normal operation of a file system. This situation continues to get worse with both an ever-present heavy workload on the file system and frequent failures of server components at exascale. As one of the primary reasons for such degraded performance, file system clients frequently fail to look up metadata from the metadata server cluster during the period of metadata rebalance; thus, metadata operations cannot proceed at their normal speed. We propose a metadata rebalance model that minimizes failures of metadata operations during the metadata rebalance period and validate the proposed model through a cost analysis. The analysis results demonstrate that our model supports the feasibility of online metadata rebalance without obstructing normal operation and increases the chances of maintaining balance in a huge cluster of metadata servers.

6.
Byzantine quorum systems   Cited by: 12 (self-citations: 0, citations by others: 12)
Summary. Quorum systems are well-known tools for ensuring the consistency and availability of replicated data despite the benign failure of data repositories. In this paper we consider the arbitrary (Byzantine) failure of data repositories and present the first study of quorum system requirements and constructions that ensure data availability and consistency despite these failures. We also consider the load associated with our quorum systems, i.e., the minimal access probability of the busiest server. For services subject to arbitrary failures, we demonstrate quorum systems over $n$ servers with a load of $O(1/\sqrt{n})$, thus meeting the lower bound on load for benignly fault-tolerant quorum systems. We explore several variations of our quorum systems and extend our constructions to cope with arbitrary client failures. Received: October 1996 / Accepted: June 1998

7.
Parallel video servers can achieve substantial storage savings and fine-grained load balancing, but they suffer from a system expansion problem. As the number of users continuously increases, the system inevitably needs to expand the number of video servers. However, the expansion of a parallel video server system is not as simple as that of a replicated video server system. Hence, this work develops an efficient expansion algorithm, called the Cyclic Expansion Algorithm (CEA), for parallel video servers. The proposed CEA has several good features. First, the data layout of each video exhibits periodicity. Consequently, the metadata size of each video and the complexity of the CEA are reduced. Second, the number of required data movements during a system expansion is optimized. Third, the total number of required XOR recomputations for updating parity blocks during an expansion is also minimized. Additionally, the new CEA can be applied to a variety of distributed storage systems, such as cloud-based storage systems using striping and parity check techniques.

8.
Recent technology advances have made multimedia on-demand services feasible. One of the challenges is to provide fault-tolerant capability at the system level for a practical video-on-demand system. The main concern in providing fault recovery is to minimize the consumption of system resources on the surviving servers in the event of a server failure. In order to reduce the recovery overhead, we present three schemes for recovering faulty playbacks through channel merging and sharing techniques on the surviving servers. Furthermore, to evenly distribute the recovery load among the surviving servers, we propose a balanced dispatch policy that ensures load balancing both under normal server conditions and in the presence of a server failure.

9.
The author and his colleagues at the National Security Agency designed an application called Flodar (short for Flow Radar) that monitors the flow of network traffic. The techniques and visuals used in Flodar can apply to a variety of applications. While many flow visualizations concentrate on the path of network traffic, this system monitors the status of individual servers within the system. In their particular system, they need to monitor two types of servers: those that send information at semi-regular intervals and those that receive this information and store it temporarily, waiting for users to read or process the information within a certain time. They are not as concerned with the path the data takes to get from the sending server to the storage server. They are more concerned with ensuring the sender transmits regularly and that the information on the storage server is processed before being overwritten. Therefore, monitoring the system's timeliness remains the primary objective for Flodar.

10.
Replicated Server Placement with QoS Constraints   Cited by: 1 (self-citations: 0, citations by others: 1)
The network planning problem of placing replicated servers with QoS constraints is considered. Each server site may consist of multiple server types with varying capacities and each site can be placed in any location among those belonging to a given set. Each client can be served by more than one location as long as the round-trip delay of data requests satisfies predetermined upper bounds. Our main focus is to minimize the cost of using the servers and utilizing the link bandwidth, while serving requests according to their delay constraint. This is an NP-hard problem. A pseudopolynomial and a polynomial algorithm that provide guaranteed approximation factors with respect to the optimal for the problem at hand are presented.

11.
This paper presents a mechanism that facilitates and enhances the use of independently administered remote network servers in the presence of server interface heterogeneity. The mechanism is designed under the client-service model, which extends the client-server model with an abstraction of service to decouple abstract server capabilities from concrete server interface specifics such as server interface binding protocols and interface operation invocation protocols. The mechanism selects servers, accommodates server interface heterogeneity, and handles server access failures as per the abstract server capabilities desired by the client. It could return the identity of the server used for each service access invocation to facilitate billing, refining service specifications, and reporting server-specific errors. This paper also illustrates a C library interface to this mechanism, and describes a language veneer over the C programming language demonstrating how a typed procedural language could be extended by a few language constructs to support the mechanism under the client-service model. In this language, server capabilities are referenced by abstract data type (ADT) objects, and are accessed by invoking the objects' interface operations using a call-by-value-result paradigm.

12.
Current Internet service architectures lack support for salvaging stateful client sessions when the underlying operating system fails due to hangs, crashes, deadlocks, or panics. The backdoors (BD) system is designed to detect such failures and recover service sessions in clusters of Internet servers by extracting lightweight state associated with client service sessions from server memory. The BD architecture combines hardware and software mechanisms to enable accurate monitoring and remote healing actions, even in the presence of failures that render a system unavailable.

13.
Most distributed operating systems are built with a kernel replicated in each machine that supports only basic interprocess communication (IPC) and process control. All other system services, such as memory management, file system, and name service, are distributed in a set of utility servers, which are ordinary processes (except perhaps for some privileges) residing at various machines. Design and implementation of such utility servers in distributed environments are far different from those in a centralized system. This paper presents our experience in building utility servers in Charlotte, a message-based distributed operating system running on a loosely-coupled multicomputer. Utility services in Charlotte are provided by server squads. Each member in a squad covers services to its own community. The squad as a whole co-operatively provides services to the entire system. These servers are designed with the goals of simplicity, efficiency and robustness. They are intended to support a multiprogramming system for the development of distributed algorithms and other distributed applications. We address several major issues in developing a utility server, including the server structure, the management of message buffers, deadlock, and the robustness of server processes. Several utility servers in the Charlotte system are discussed as real examples.

14.
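Design and implementation of a virtual Internet server   Cited by: 11 (self-citations: 0, citations by others: 11)
To address the shortcomings of existing approaches to the performance bottleneck and reliability problems of Internet servers, this paper proposes a solution based on load-balancing scheduling at the IP layer, which combines a group of servers into a scalable, highly available virtual Internet server. Scalability is achieved by transparently adding nodes to and removing nodes from the server cluster; high availability is achieved by detecting failures of nodes or service processes and correctly reconfiguring the system. The architecture, design methods, and implementation techniques of the virtual Internet server are discussed in detail, and the corresponding performance test results are given.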
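
The dispatching behaviour described in this abstract can be sketched as follows. This is a toy model only, not the paper's implementation: the abstract does not name a scheduling policy, so round-robin is an assumption of this example, and the class `VirtualServer` and its method names are hypothetical. It shows nodes being added and removed transparently and failed nodes being skipped until they recover.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    """Toy dispatcher: one virtual service backed by several real server nodes."""

    nodes: list = field(default_factory=list)   # all configured real servers
    healthy: set = field(default_factory=set)   # nodes currently believed to be up
    _next: int = 0                              # round-robin cursor (assumed policy)

    def add_node(self, node: str) -> None:
        """Grow the cluster without changing the virtual service's address."""
        if node not in self.nodes:
            self.nodes.append(node)
        self.healthy.add(node)

    def remove_node(self, node: str) -> None:
        """Shrink the cluster; clients keep using the same virtual service."""
        if node in self.nodes:
            self.nodes.remove(node)
        self.healthy.discard(node)

    def mark_failed(self, node: str) -> None:
        """Called by a (hypothetical) health checker when a node or service dies."""
        self.healthy.discard(node)

    def mark_recovered(self, node: str) -> None:
        """Put a configured node back into rotation after it recovers."""
        if node in self.nodes:
            self.healthy.add(node)

    def dispatch(self):
        """Return the next healthy node in round-robin order, or None if none is up."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        return None


if __name__ == "__main__":
    vs = VirtualServer()
    for name in ("rs1", "rs2", "rs3"):
        vs.add_node(name)
    print([vs.dispatch() for _ in range(3)])  # each node is chosen once
    vs.mark_failed("rs2")
    print([vs.dispatch() for _ in range(3)])  # rs2 is skipped while it is down
```

A real IP-layer scheduler would forward packets rather than return node names, and would track per-connection state; those parts are deliberately left out here.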

15.
Mobile agents create a new paradigm for data exchange and resource sharing in rapidly growing and continually changing computer networks. In a distributed system, failures can occur in any software or hardware component. A mobile agent can get lost when its hosting server crashes during execution, or it can get dropped in a congested network. Therefore, survivability and fault tolerance are vital issues for deploying mobile-agent systems. This fault tolerance approach deploys three kinds of cooperating agents to detect server and agent failures and recover services in mobile-agent systems. An actual agent is a common mobile agent that performs specific computations for its owner. Witness agents monitor the actual agent and detect whether it's lost. A probe recovers the failed actual agent and the witness agents. A peer-to-peer message-passing mechanism stands between each actual agent and its witness agents to perform failure detection and recovery through time-bounded information exchange; a log records the actual agent's actions. When failures occur, the system performs rollback recovery to abort uncommitted actions. Moreover, our method uses checkpointed data to recover the lost actual agent.

16.
Motivated by the trade-off between reliability and utilization level of a stochastic service system, we consider a Markovian multi-server vacation queueing system with c unreliable servers. In such a system, some servers may not be available due to either planned stoppage (vacations) or unplanned service interruptions (server failures). The vacations are controlled by a threshold policy. With this policy, at a service completion instant, if d (≤ c) servers become idle, they take a vacation together and will keep taking vacations until they find at least c − d + 1 customers in the system at a vacation completion instant, and then they return to serve the queue. In addition, all on-duty servers are subject to failures and can be repaired within a random period of time. We formulate a quasi-birth–death (QBD) process, establish the stability condition, and develop a computational algorithm to obtain the stationary performance measures of the system. Numerical examples are presented to show the performance evaluation and optimization of such a system. The insights gained from this model help practitioners make capacity and operating decisions for this type of waiting-line system.
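
The threshold vacation policy in this abstract can be written as two small decision rules. The sketch below is illustrative only and rests on stated assumptions: the return threshold is read as c − d + 1 customers (the expression is garbled in the source page, so this reading is an assumption), server failures and repairs are ignored, and the function names `vacation_starts` and `vacation_ends` are hypothetical.

```python
def vacation_starts(customers: int, c: int, d: int) -> bool:
    """Rule checked at a service completion instant, with all c servers on duty.

    A group of d servers leaves together once d servers are idle,
    i.e. once c - customers reaches d (written as >= to stay defensive).
    """
    if not 1 <= d <= c:
        raise ValueError("need 1 <= d <= c")
    idle = max(c - customers, 0)
    return idle >= d


def vacation_ends(customers: int, c: int, d: int) -> bool:
    """Rule checked at a vacation completion instant.

    The d vacationing servers return only if at least c - d + 1 customers
    are present (assumed threshold); otherwise they take another vacation.
    """
    if not 1 <= d <= c:
        raise ValueError("need 1 <= d <= c")
    return customers >= c - d + 1


if __name__ == "__main__":
    c, d = 5, 2
    print(vacation_starts(customers=3, c=c, d=d))  # two servers idle, so the group leaves
    print(vacation_ends(customers=3, c=c, d=d))    # fewer than c - d + 1 customers present
    print(vacation_ends(customers=4, c=c, d=d))    # threshold c - d + 1 reached
```

With c − d servers still on duty, c − d + 1 customers is the smallest population at which someone has to wait, which is why this reading of the threshold is plausible.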

17.
In this paper, an M/G/n/c multiserver queueing system with basic and standby servers is studied. Customer service is disturbed by server failures, which form a simplest (Poisson) flow. After a failure, the server needs a random time for renewal. It is also assumed that customers have a limited, exponentially distributed waiting time in the system. The system is studied in both stationary and nonstationary modes.

18.
Large-scale distributed applications such as online information retrieval and collaboration over computational elements demand an approach to self-managed computing systems with a minimum of human interference. However, large scales and full distribution often lead to poor system dependability and security, and increase the difficulty in managing and controlling redundancy for fault tolerance. In particular, fault tolerance schemes for mobile agents to survive agent server crash failures in an autonomic environment are complex since developers normally have no control over remote agent servers. Some solutions inject a replica into stable storage upon its arrival at an agent server. But in the event of an agent server crash the replica is unavailable until the agent server recovers. In this paper we present a failure model and an exception handling framework for mobile agent systems. An exception handling scheme is developed for mobile agents to survive agent server crash failures. A replica mobile agent operates at the agent server visited prior to its master's current location. If a master crashes, its replica is available as a replacement. The proposed scheme is examined in comparison with a simple time-out scheme. Experimental evaluation is performed, and performance results show that the scheme leads to some overhead in the round trip time when fault tolerance measures are exercised. However, the scheme offers the advantage that fault tolerance is provided during the mobile agent trip, i.e. in the event of an agent server crash all agent servers are not revisited.

19.
In this paper, we investigate the issue of server selection for parallel download in overlay content-distribution networks. To achieve high performance and resilience to failures, a receiver can make connections with multiple servers simultaneously and receive different portions of the data from the servers in parallel. Prior studies mostly focus on user-centric performance objectives, such as reducing the round-trip time (RTT) or the completion time of an individual download, but tend to ignore the congestion caused by the concurrent connections from different servers or the total network resource usage. The latter performance concerns are important for the service providers who operate content-distribution networks. In this paper, we present a node-selection scheme in a hypercube-like overlay network that generates the optimal server set with respect to the worst-case link stress (WLS) criterion. The algorithm allows scaling to a large system because it is very efficient and does not require network measurement or collection of topology or routing information. It has performance advantages in a number of areas, particularly against the random selection scheme. First, it minimizes the level of congestion at the bottleneck link. This is equivalent to maximizing the achievable throughput. Second, it consumes fewer network resources in terms of the total number of links used and the total bandwidth usage. Third, it often leads to a low average round-trip time to selected servers, hence allowing nearby nodes to exchange more data, an objective sought by many content-distribution systems.

20.
The work presents a new protocol, VELOS, for tolerating partitionings in distributed systems with replicated data. Our primary goals were influenced by efficiency and availability constraints. The proposed protocol achieves optimal availability, according to a well-known metric, while ensuring one-copy serializability. In addition, however, VELOS is designed to reduce the cost involved in achieving high availability. We have developed mechanisms through which transactions, in the absence of failures, can access replicated data objects and observe shorter delays than related protocols, and impose smaller loads on the network and the servers. Furthermore, VELOS offers high availability without relying on system transactions that must execute to restore availability when failures and recoveries occur. Such system transactions typically access all (replicas of all) data objects and thus introduce significant delays to user transactions and consume large quantities of resources such as network bandwidth and CPU cycles. Thus, we offer our protocol as a proof that high availability can be achieved inexpensively.
