Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Mobile agents create a new paradigm for data exchange and resource sharing in rapidly growing and continually changing computer networks. In a distributed system, failures can occur in any software or hardware component. A mobile agent can get lost when its hosting server crashes during execution, or it can be dropped in a congested network. Therefore, survivability and fault tolerance are vital issues for deploying mobile-agent systems. This fault tolerance approach deploys three kinds of cooperating agents to detect server and agent failures and recover services in mobile-agent systems. An actual agent is a common mobile agent that performs specific computations for its owner. Witness agents monitor the actual agent and detect whether it is lost. A probe recovers the failed actual agent and the witness agents. A peer-to-peer message-passing mechanism stands between each actual agent and its witness agents to perform failure detection and recovery through time-bounded information exchange; a log records the actual agent's actions. When failures occur, the system performs rollback recovery to abort uncommitted actions. Moreover, our method uses checkpointed data to recover the lost actual agent.
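
The time-bounded failure detection mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual protocol, so the class name `WitnessAgent`, the `timeout` parameter, and the explicit `now` timestamps are all assumptions made for illustration: a witness suspects the actual agent is lost once no message has arrived within the time bound.

```python
class WitnessAgent:
    """Hypothetical witness that suspects the actual agent after a silent period."""

    def __init__(self, timeout: float, start_time: float = 0.0):
        self.timeout = timeout          # assumed time bound, in seconds
        self.last_heard = start_time    # time of the last message from the actual agent

    def on_message(self, now: float) -> None:
        # Record that the actual agent reported in at time `now`.
        self.last_heard = now

    def suspects_failure(self, now: float) -> bool:
        # The agent is suspected lost once the silence exceeds the time bound.
        return now - self.last_heard > self.timeout


witness = WitnessAgent(timeout=5.0)
witness.on_message(now=3.0)
alive_check = witness.suspects_failure(now=7.0)  # 4.0 s of silence, within the bound
lost_check = witness.suspects_failure(now=9.5)   # 6.5 s of silence, bound exceeded
```

Passing timestamps explicitly keeps the sketch deterministic; a real deployment would read a clock and would still need the recovery step (the probe and checkpointed data) that the abstract describes.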

2.
3.
Large-scale video servers are typically based on disk arrays that comprise multiple nodes and many hard disks. Because of the large number of components, disk arrays are susceptible to disk and node failures that can affect server reliability. Therefore, fault tolerance must already be addressed in the design of the video server. For fault tolerance, we consider parity-based as well as mirroring-based techniques with various distribution granularities of the redundant data. We identify several reliability schemes and compare them in terms of server reliability and per-stream cost. To compute the server reliability, we use continuous-time Markov chains that are evaluated using the SHARPE software package. Our study covers independent disk failures and dependent component failures. We propose a new mirroring scheme, called the Grouped One-to-One scheme, that achieves the highest reliability among all schemes considered. The results of this paper indicate that dividing the server into independent groups achieves the best compromise between server reliability and cost per stream. We further find that the smaller the group size, the better the trade-off between high server reliability and low per-stream cost.
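
As a minimal worked example of the continuous-time Markov chain approach (not the paper's model, which the abstract does not give), consider a single mirrored pair of disks. The assumptions here are that each disk fails independently at rate $\lambda$ and that a failed disk is repaired at rate $\mu$. Let $T_2$ and $T_1$ denote the mean time to data loss starting with two and one working disks, respectively:

```latex
\begin{align}
T_2 &= \frac{1}{2\lambda} + T_1, \\
T_1 &= \frac{1}{\lambda + \mu} + \frac{\mu}{\lambda + \mu}\, T_2, \\
\text{so}\quad T_2 &= \frac{3\lambda + \mu}{2\lambda^2}.
\end{align}
```

The first two lines are the usual first-step equations for the chain (two disks up, one disk up, data lost); substituting the second into the first gives the closed form. Dependent component failures, which the study also covers, would need additional states and are not captured by this sketch.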

4.
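Gu Jiawei, 《微机发展》 (Microcomputer Development), 2007, 17(8): 140-143
To build and deploy large-scale multi-agent systems, one must identify and solve their fundamental problems, one of which is the possibility of partial system failures. Fault tolerance is therefore an unavoidable topic for large-scale multi-agent systems. This paper discusses this class of problems and proposes a fault tolerance method for multi-agent systems. The initial idea is to apply replication to agents, replicating critical agents to avoid system failure. However, because an agent's criticality evolves during execution and the resources available to agents are bounded, the number of replicas of each agent must be adjusted dynamically and automatically to maximize their usefulness and reliability. The paper describes a method and associated mechanisms for evaluating the criticality of an agent, for deciding which strategy to use (e.g., active or passive replication), and for parameterizing it (e.g., the number of replicas).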

5.
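Many P2P network storage systems currently adopt an m/n fault tolerance mechanism to improve availability and reliability. In practice, however, correlated failures among servers give this mechanism a low fault tolerance rate. To address this problem, the paper describes a method for finding a set of server nodes with low failure correlation in a P2P system. By using the server nodes in this set, the m/n fault tolerance mechanism can raise its fault tolerance rate, giving the system high availability and reliability. The method is analyzed experimentally, and the results verify that it is practical and effective.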
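
The effect of correlated failures on an m/n scheme can be seen in a small sketch. It assumes that "m/n" means the data survives when at least m of n storage nodes are reachable, which the abstract does not spell out. The function name, the single shared failure domain, and the probabilities `p` and `q` are illustrative assumptions, not the paper's model.

```python
from math import comb


def m_of_n_availability(m: int, n: int, p: float) -> float:
    """Probability that at least m of n independent nodes are up,
    where each node is up with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


m, n = 2, 4
p = 0.99   # assumed per-node availability while its failure domain is up
q = 0.95   # assumed availability of a shared failure domain (e.g. one site)

# Correlated case: all n nodes sit in one failure domain, so they all
# become unreachable together whenever that domain is down.
correlated = q * m_of_n_availability(m, n, p)

# Independent case: same marginal per-node availability (q * p),
# but node failures do not depend on each other.
independent = m_of_n_availability(m, n, q * p)
```

With the same per-node availability, the correlated placement is bounded above by `q`, while the independent placement is not. This is why selecting nodes with low failure correlation helps.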

6.
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present.

7.
The primary concern of traditional Byzantine fault tolerance is to ensure strong replica consistency by executing incoming requests sequentially according to a total order. Speculative execution at both clients and server replicas has been proposed as a way of reducing the end-to-end latency. In this article, we introduce optimistic Byzantine fault tolerance. Optimistic Byzantine fault tolerance aims to achieve higher throughput and lower end-to-end latency by using a weaker replica consistency model. Instead of ensuring strong safety as in traditional Byzantine fault tolerance, nonfaulty replicas are brought to a consistent state periodically and on-demand in optimistic Byzantine fault tolerance. Not all applications are suitable for optimistic Byzantine fault tolerance. We identify three types of applications, namely, realtime collaborative editing, event stream processing, and services constructed with conflict-free replicated data types, as good candidates for applying optimistic Byzantine fault tolerance. Furthermore, we provide a design guideline on how to achieve eventual consistency and how to recover from conflicts at different replicas. In optimistic Byzantine fault tolerance, a replica executes a request immediately without first establishing a total order of the message, and Byzantine agreement is used only to establish a common state synchronization point and the set of individual states needed to resolve conflicts. The recovery mechanism ensures both replica consistency and the validity of the system by identifying and removing the operations introduced by faulty clients and server replicas.

8.
A mobile agent is an object which can autonomously migrate in a distributed system to perform tasks on behalf of its creator. Security issues in regard to the protection of host resources, as well as the agents themselves, raise significant obstacles in practical applications of the agent paradigm. This article describes the security architecture of Ajanta, a Java-based system for mobile agent programming. This architecture provides mechanisms to protect server resources from malicious agents, to protect agent data from tampering by malicious servers and communication channels during its travel, and to protect name service data and the global namespace. We present here a proxy-based mechanism for secure access to server resources by agents. Using Java's class loader model and thread group mechanism, isolated execution domains are created for agents at a server. An agent can contain three kinds of protected objects: read-only objects whose tampering can be detected, encrypted objects for specific servers, and a secure append-only log of objects. A generic authentication protocol is used for all client-server interactions when protection is required. Using this mechanism, the security model of Ajanta enforces protection of namespaces, and secure execution of control primitives such as agent recall or abort. Ajanta also supports communication between agents using RMI, which can be controlled if required by the servers' security policies. Copyright © 2001 John Wiley & Sons, Ltd.

9.
Advances in data collection techniques and database technologies, such as remote sensing and satellite telemetry, have led to the collection of huge amounts of data distributed among large databases and heterogeneous remote sites. Intelligent and automatic processing of the distributed data and efficiently supporting scientific collaboration between both professional and casual users is a highly demanding task. It is also particularly challenging when the system must cope with active data that is processed on-demand. These requirements have generated an urgent need for more advanced software infrastructure to create, maintain, evolve, and federate these active digital libraries of scientific data. Traditional models of distributed computing are inadequate to support such complex applications. As part of the ongoing Synthetic Aperture Radar Atlas (SARA) Digital Library project, the research presented here proposes a collaborating mobile agent approach to on-demand processing of remote sensing data. The approach, which is based on autonomous data processing and enables different image analysis algorithms to be wrapped as mobile agents, is expected to be an improvement over the static CGI-based interface and inefficient information discovery that are currently used by SARA. We discuss the agent-based infrastructure we have developed. The SARA system allows users to dispatch their compute-intensive jobs as mobile agents. Since the agents can be programmed to satisfy their specific goals, even if they move and lose contact with their creators they can survive intermittent or unreliable network connections. During their lifetime, the agents can also move themselves autonomously from one server to another for load balancing, and to enhance data locality and fault tolerance. The SARA system relies on XML to support agent communications on clusters of servers. 
Although the examples presented are based mainly on the SARA system, the proposed techniques are applicable to other active archives. In particular, we believe the proposed agent design can be used to dynamically configure distributed parallel computing resources and automatically integrate data analysis in remote sensing systems.

10.
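In e-commerce applications, mobile agents are used to search for a particular product on behalf of a customer, and during this process they are vulnerable to attacks by malicious hosts. After analyzing the shortcomings of existing countermeasures against malicious hosts, this paper proposes a non-interactive CEF signature mechanism based on an improved RSA algorithm. The mechanism performs fast encryption and secure signing without interaction, so that the customer's signature cannot be misused. It addresses the problems mobile agents face in malicious environments and ensures the confidentiality and integrity of mobile agents.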

11.
The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQuA uses, when an application's dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA.
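
The idea of a vote whose majority size follows the current replica count can be sketched as follows. This is not AQuA's algorithm, which the abstract does not detail. The class `MajorityVoter`, its method names, and the choice to discard the reply of a removed replica are assumptions made for illustration.

```python
from collections import Counter


class MajorityVoter:
    """Hypothetical voter whose majority threshold tracks the live replica set."""

    def __init__(self, replicas):
        self.replicas = set(replicas)
        self.replies = {}  # replica id -> reply value (must be hashable)

    def majority_size(self) -> int:
        # Strict majority of the replicas currently believed to be alive.
        return len(self.replicas) // 2 + 1

    def add_reply(self, replica_id, value):
        # Ignore replies from replicas that are no longer in the group.
        if replica_id in self.replicas:
            self.replies[replica_id] = value
        return self.result()

    def remove_replica(self, replica_id):
        # A crashed replica leaves the group; its reply no longer counts
        # and the majority size is recomputed for the ongoing vote.
        self.replicas.discard(replica_id)
        self.replies.pop(replica_id, None)
        return self.result()

    def result(self):
        # Return the agreed value once enough identical replies exist, else None.
        if not self.replies:
            return None
        value, count = Counter(self.replies.values()).most_common(1)[0]
        return value if count >= self.majority_size() else None


voter = MajorityVoter(["r1", "r2", "r3", "r4", "r5"])
voter.add_reply("r1", "a")
voter.add_reply("r2", "a")
voter.add_reply("r3", "b")                    # value fault: r3 disagrees
after_one_crash = voter.remove_replica("r4")  # 4 replicas left, majority is 3
after_two_crashes = voter.remove_replica("r5")  # 3 replicas left, majority is 2
```

In this run the vote stays undecided while three matching replies are required and completes with `"a"` once the second crash lowers the majority to two.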

12.
The Web is increasingly used for critical applications and services. We present a client-transparent mechanism, called CoRAL, that provides high reliability and availability for Web service. CoRAL provides fault tolerance even for requests being processed at the time of server failure. The scheme does not require deterministic servers and can thus handle dynamic content. CoRAL actively replicates the TCP connection state while maintaining logs of HTTP requests and replies. In the event of a primary server failure, active client connections fail over to a spare, where their processing continues seamlessly. We describe key aspects of the design and implementation as well as several performance optimizations. Measurements of system overhead, failover performance, and preliminary validation using fault injection are presented.

13.
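Design and Implementation of an LBS Application Platform Based on Mobile Agents   Cited by: 2 (self-citations: 0, citations by others: 2)
Zhong Shiming, Zhang Sheng, Gu Zhili, Zhu Cailian, 《计算机应用》 (Journal of Computer Applications), 2005, 25(10): 2306-2309
The autonomy of mobile agents and their ability to move freely across heterogeneous software and hardware network environments are used to build a distributed LBS (Location Based Services) system. In bandwidth-limited mobile network environments, the system can provide information services effectively and in real time to LBS users on intermittently connected devices. It overcomes the drawbacks of existing LBS systems, which require a stable connection and do not allow the client and server to work asynchronously. The paper discusses in detail the LBS system composition, the mobile agent message and event model, mobile agent design, mobile agent migration, and client design, and finally implements a system prototype.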

14.
Auction mechanisms are nowadays widely used in electronic commerce Web sites for buying and selling items among different users. The increasing importance of auction protocols in the negotiation phase is not limited to online marketplaces. In fact, the wide applicability of auctions as resource-allocation and negotiation mechanisms has also led to a great deal of interest in auctions within the agent community. A challenging issue for agents operating in open Multiagent Systems (such as the emerging semantic Web infrastructure) concerns the specification of declarative communication rules which could be published and shared, allowing agents to dynamically engage in well-known and trusted negotiation protocols. To cope with real-world applications, these rules should also specify fault tolerant patterns of interaction, enabling negotiating agents to interact with each other while tolerating failures, for instance terminating an auction process even if some bidding agents dynamically crash. In this paper, we propose an approach to specify fault tolerant auction protocols in open and dynamic environments by means of communication rules dealing with crash failures of agents. We illustrate these concepts with a case study on the specification of an English Auction protocol that tolerates crashes of bidding agents, and we discuss its properties.

15.
16.
The growth of web-based applications in business and e-commerce is building up demands for high performance web servers for better throughput and lower user-perceived latency. These demands are leading to a widespread substitution of powerful single servers by robust newcomers, cluster web servers, in many enterprise companies. In this respect the load-balancing algorithms play an important role in boosting the performance of cluster servers. Previous load-balancing algorithms, which were designed for handling static content in web services, suffer from significant performance degradation under dynamic and database-driven workloads. We therefore propose an approximation-based load-balancing algorithm with admission control for cluster-based web servers in this study. Since it is difficult to accurately determine the loads of web servers through feedback from distributed agents in web servers, we propose an analytical model of a web server to estimate the web servers' loads. To achieve this, the algorithm classifies requests by their service times and resource demands and tracks the number of outstanding requests of each class at each web server node to dynamically estimate the load of each node. For the error handling of the model, a proportional integral (PI) controller from control theory is used. The estimated available capacity of each web server is then used for load balancing and admission control decisions. The implementation results with a standard benchmark confirm the effectiveness of the proposed scheme, which improves both the mean response time and the throughput of the cluster compared to rival load-balancing algorithms, and also avoids situations in which the cluster is overloaded, even when the request rates are beyond the cluster capacity.
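
The combination of a per-class load estimate and a PI correction term can be sketched as below. The abstract gives neither the analytical model nor the controller gains, so the functions `estimate_load` and `pick_node`, the class `PIController`, and all numeric values are hypothetical placeholders rather than the authors' algorithm.

```python
class PIController:
    """Textbook discrete proportional-integral controller."""

    def __init__(self, kp: float, ki: float):
        self.kp, self.ki = kp, ki
        self.integral = 0.0

    def update(self, error: float, dt: float = 1.0) -> float:
        # Accumulate the error over time and combine both terms.
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def estimate_load(outstanding: dict, service_time: dict) -> float:
    """Assumed model: load is the sum over request classes of
    outstanding requests times the mean service time of that class."""
    return sum(outstanding[c] * service_time[c] for c in outstanding)


def pick_node(available_capacity: dict):
    """Send the request to the node with the most spare capacity;
    reject it (return None) when no node has any capacity left."""
    node = max(available_capacity, key=available_capacity.get)
    return node if available_capacity[node] > 0 else None


service_time = {"static": 0.01, "dynamic": 0.2}   # assumed seconds per request
load = estimate_load({"static": 4, "dynamic": 2}, service_time)

pi = PIController(kp=0.5, ki=0.1)
measured_load = 0.5                                # assumed measurement
corrected = load + pi.update(measured_load - load)

chosen = pick_node({"node1": 1.0 - corrected, "node2": 0.1})
rejected = pick_node({"node1": -0.2, "node2": 0.0})
```

Here the controller nudges the model's estimate toward a measured value, and the corrected estimate feeds both decisions: which node gets the request, and whether to admit it at all.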

17.
Agent communication languages (ACLs) should allow the developer to adopt human-like communication mechanisms in agent programming, facilitating the development of distributed protocols in multi-agent systems (MASs). However, to implement robust protocols, ACLs should provide a way to deal with the failures of agents, as MASs are prone to the same failures that can occur in any distributed software system. In this paper, we address this issue showing how an asynchronous ACL that provides high-level mechanisms to deal with crash failures of agents can be effectively used to specify fault tolerant protocols.

18.
Unlike traditional networks, mobile ad hoc networks are formed by mobile wireless devices without any infrastructure and are therefore frequently partitioned by node mobility or link failures. As a result, it is difficult for an ad hoc network to provide on-line access to a trusted authority server, and applying the traditional Public Key Infrastructure (PKI) security framework to mobile ad hoc networks causes insecurities. This study proposes a scalable and elastic key management scheme integrated into the Cluster Based Secure Routing Protocol (CBSRP) to enhance security and non-repudiation of routing authentication, and introduces an ID-based internal routing authentication scheme to enhance the routing performance within a cluster. Additionally, a method of performing routing authentication between internal and external clusters, as well as inter-cluster routing authentication, is developed. The proposed cluster-based key management scheme distributes trust to an aggregation of cluster heads using a threshold scheme, provides the Certificate Authority (CA) with a fault tolerance mechanism to prevent a single point of compromise or failure, and spares the CA from maintaining large repositories of member certificates, making ad hoc networks robust to malicious behaviors and suitable for numerous mobile devices.

19.
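Research on Replica Management in a Disaster Recovery System Based on Cluster Servers*   Cited by: 4 (self-citations: 0, citations by others: 4)
This paper proposes a replica management scheme for a disaster recovery system based on cluster servers, including algorithms for consistency maintenance and replica selection across multiple replicas and a mathematical model of the number and placement of replicas. Performance tests of the disaster recovery system show that it recovers data quickly and automatically, manages replicas effectively, and maintains a balance between replica reliability and cluster server performance.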

20.
This paper investigates the fault tolerance and energy efficiency of server clusters in order to realize a reliable and energy-aware server cluster system. In information systems, a client issues a request to one server in a server cluster and the server sends a reply to the client. Once the server stops because of a fault, the client does not receive a reply to the request. Even if the request is performed on another server after the fault is detected, some QoS requirements such as response time may not be satisfied. Hence, each request has to be performed redundantly on multiple servers to tolerate server faults. In our previous studies, we discussed the redundant power consumption laxity-based (RPCLB) algorithm, in which multiple servers are selected to perform a request process redundantly and energy-efficiently. Since each application process is performed redundantly on more than one server, a larger amount of electric power is consumed. In this paper, we propose a novel, improved RPCLB (IRPCLB) algorithm to reduce the power consumption of servers: once a process successfully terminates on one server, the now meaningless redundant processes are forced to terminate on the other servers. In the evaluation, we show that the IRPCLB algorithm reduces the total power consumption of servers and the total execution time of processes in homogeneous and heterogeneous clusters compared with the RPCLB and RR algorithms.
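
The saving from terminating redundant processes early can be shown with a toy calculation. It is not the IRPCLB algorithm itself. The sketch assumes that power consumption is proportional to total server busy time, that termination is instantaneous, and that no server fails. The function names and durations are made up for illustration.

```python
def busy_time_without_termination(durations):
    """Every redundant copy runs to completion (RPCLB-style behaviour)."""
    return sum(durations)


def busy_time_with_termination(durations):
    """All copies stop as soon as the fastest one finishes
    (the early-termination idea described for IRPCLB)."""
    first_finish = min(durations)
    return first_finish * len(durations)


# Assumed execution times of one request on three servers, in seconds.
durations = [3.0, 5.0, 4.0]
full = busy_time_without_termination(durations)
trimmed = busy_time_with_termination(durations)
saving = full - trimmed
```

The gap between the two totals grows with the spread of execution times, which suggests why such a scheme could matter most in heterogeneous clusters.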
