Found 20 similar documents; search took 31 ms
1.
V.K. Garg C.M. Chase Richard Kilgore J.Roger Mitchell 《Journal of Parallel and Distributed Computing》1997,45(2):191
This paper discusses efficient detection of global predicates in a distributed program. Previous work in this area required predicates to be specified as a conjunction of predicates defined on individual processes. Many properties in distributed systems, however, use the state of channels, such as “the channel is empty,” or “there is a token in the channel.” In this paper, we introduce the concept of a linear channel predicate and provide efficient centralized and distributed algorithms to detect any conjunction of local and linear channel predicates. The class of linear predicates is fairly broad. For example, classic problems such as detection of termination and computation of global virtual time are instances of conjunctions of linear channel predicates. Linear predicates can be functions of the number of messages in the channel, or can be based upon the actual contents of the messages. The main applications of our results are in debugging and testing of distributed programs. For these applications it is important to detect the first state where some predicate is true. We show that this first state is uniquely defined if and only if linear predicates are used.
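As a rough illustration of what a conjunction of local and channel predicates looks like, the sketch below evaluates a termination predicate on a single, already-known global state. It does not implement the detection algorithms of the paper; the `GlobalState` type, its fields, and the function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalState:
    # Hypothetical representation: passive[i] is True when process i is idle.
    passive: list
    # Hypothetical representation: (sender, receiver) -> messages in transit.
    in_transit: dict = field(default_factory=dict)


def channel_empty(state, channel):
    """Channel predicate based on the number of messages in the channel."""
    return state.in_transit.get(channel, 0) == 0


def terminated(state):
    """Conjunction of local predicates (every process is passive)
    and channel predicates (every channel is empty)."""
    all_passive = all(state.passive)
    all_empty = all(channel_empty(state, ch) for ch in state.in_transit)
    return all_passive and all_empty
```

For instance, `terminated(GlobalState([True, True], {(0, 1): 0, (1, 0): 0}))` holds, while a single message still in transit on `(0, 1)` makes the conjunction false.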
2.
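Program observation studies the behavioral properties of a single execution of a program. Because global information is lacking, observing distributed programs is quite difficult. The core of distributed program observation is the problem of global property detection. This paper proposes a methodological classification of distributed program observation and global property detection, and discusses general methods for global property detection: the snapshot method for detecting stable properties, the lattice method for detecting unstable properties, and the conversion of dynamic property detection into a language recognition problem. It also discusses methods for detecting global properties with special structure: strongly stable and locally stable properties, conjunctive unstable properties, and dynamic properties in flow/wave patterns.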
3.
A chained-matrices approach for parallel computing the nth convergent of continued fractions is presented. The resulting algorithm computes the entire prefix values of any continued fraction in O(log n) time on the EREW PRAM model or a network with O(n/log n) processors connected by the cube-connected cycles, binary tree, perfect shuffle, or hypercube. It can be applied to approximate the transcendental numbers, such as π and e, in O(log m) time by using O(m/log m) processors for a result with m-digit precision. We also use it to cost-optimally solve the second-order linear recurrence, the polynomial evaluation, the recurrence of vector norm, the general class of recurrence equation defined by Kogge and Stone (1973), and the general mth-order linear recurrence. It is easy to implement because there are only some matrix multiplications and a division operation involved. This work was supported in part by National Science Council of the Republic of China under Contract NSC 77-0408-E002-09.
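The chained-matrices idea can be sketched as follows, assuming a simple continued fraction [a0; a1, a2, ...] with positive integer terms after a0. Each term becomes a 2x2 matrix, and the k-th prefix product holds the numerator and denominator of the k-th convergent. This sequential sketch is not the parallel algorithm of the paper; it only shows the matrix formulation, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def matmul2(x, y):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def convergents(terms):
    """Return all convergents of the simple continued fraction `terms`.

    The k-th prefix product of the matrices ((a_i, 1), (1, 0)) has
    p_k in its top-left entry and q_k in its bottom-left entry.
    """
    mats = [((a, 1), (1, 0)) for a in terms]
    prefixes = accumulate(mats, matmul2)  # sequential prefix products
    # One division per convergent, performed exactly with Fraction.
    return [Fraction(m[0][0], m[1][0]) for m in prefixes]
```

Because matrix multiplication is associative, the prefix products computed here one after another could instead be computed by a parallel prefix (scan) procedure, which is what makes a logarithmic-time parallel version possible.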
4.
5.
6.
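This work focuses on applying UML diagrams and dynamic slicing to regression testing. Because the UML diagrams currently used in software development do not meet the needs of software testing well, a sequence state diagram and an improved state diagram are introduced. Both diagrams are formally defined and the definitions are illustrated with examples; the sequence state diagram is used for inter-class testing and the improved state diagram for intra-class testing. Slicing analysis is performed on the defined diagrams to derive test steps and test algorithms. An online shopping example shows that the sequence state diagram and the improved state diagram can improve regression testing efficiency.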
7.
Ranganath Atreya Neeraj Mittal Ajay D. Kshemkalyani Vijay K. Garg Mukesh Singhal 《Journal of Parallel and Distributed Computing》2007
We present an efficient approach to detect a locally stable predicate in a distributed computation. Examples of properties that can be formulated as locally stable predicates include termination and deadlock of a subset of processes. Our algorithm does not require application messages to be modified to carry control information (e.g., vector timestamps), nor does it inhibit events (or actions) of the underlying computation. The worst-case message complexity of our algorithm is O(n(m+1)), where n is the number of processes in the system and m is the number of events executed by the underlying computation. We show that, in practice, its message complexity should be much lower than its worst-case message complexity. The detection latency of our algorithm is O(d) time units, where d is the diameter of the communication topology. Our approach also unifies several known algorithms for detecting termination and deadlock. We also show that our algorithm for detecting a locally stable predicate can be used to efficiently detect a stable predicate that is a monotonic function of other locally stable predicates.
8.
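Testability design and analysis of software for equipment automatic test systems  Total citations: 1, self-citations: 0, citations by others: 1
This work studies design for testability of the software part of a general-purpose automatic test system for missiles, with respect to the system's functions and implementation. Drawing on current research into testing and debugging techniques for test-system software, it proposes several feasible design techniques for improving software testability. Practice shows that these techniques can significantly improve the testability of automatic test system software.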
9.
Testability, the tendency for software to reveal its faults during testing, is an important issue for verification and quality assurance. But testability can also be used to good advantage as a debugging technique. Although this concept is more general, we will illustrate it with a specific example: propagation analysis. Propagation Analysis (PA) is a technique for predicting the probability that a data state error affects program output. PA is a technique that produces information about a piece of software's testability. PA bases its prediction on empirical measurement of the probability that an artificial data state error affects program output. After obtaining propagation analysis information for a program and obtaining a failure probability estimate for the program during execution we build a model that can be used to identify possible sites of missing-assignment faults of the form x f(x). Thus we can apply the testability technique PA as a debugging tool. This work was supported by a National Research Council NASA-Langley Resident Research Associateship and NASA-Langley Grant NAG-1-884.
10.
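To address the immutability of Spark datasets and the excessive overhead in code execution, memory management, and data serialization/deserialization caused by dependence on the Java Virtual Machine (JVM), a lightweight big-data computing system named Helius was designed and implemented in C/C++. Helius supports the basic operations of Spark while allowing a dataset to be modified as a whole. It also uses C/C++ to optimize memory management and network transmission, and adopts a stateless worker mechanism to simplify fault-tolerant recovery of the distributed computing platform. Experimental results show that, over 5 iterations, Helius ran the PageRank algorithm in only 25.12% to 53.14% of the time taken by Spark, and ran TPCH Q6 in only 57.37% of the time taken by Spark. With one PageRank iteration, the amount of IP data received and sent by the master node under Helius was about 40% and 15%, respectively, of that under Spark, and over a 200 s run the total memory used by Helius was about 25% of that used by Spark. The results and analysis indicate that, compared with Spark, Helius saves memory, needs no serialization or deserialization, reduces network interaction, and simplifies fault tolerance.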
11.
Panagiotis Katsaros Lefteris Angelis Constantine Lazos 《Concurrency and Computation》2007,19(1):37-63
Checkpointing has a crucial impact on systems' performance and fault‐tolerance effectiveness: excessive checkpointing results in performance degradation, while deficient checkpointing incurs expensive recovery. In distributed systems with independent checkpoint activities there is no easy way to determine checkpoint frequencies optimizing response‐time and fault‐tolerance costs at the same time. The purpose of this paper is to investigate the potentialities of a statistical decision‐making procedure. We adopt a simulation‐based approach for obtaining performance metrics that are afterwards used for determining a trade‐off between checkpoint interval reductions and efficiency in performance. Statistical methodology including experimental design, regression analysis and optimization provides us with the framework for comparing configurations, which use possibly different fault‐tolerance mechanisms (replication‐based or message‐logging‐based). Systematic research also allows us to take into account additional design factors, such as load balancing. The method is described in terms of a standardized object replication model (OMG FT‐CORBA), but it could also be applied in other (e.g. process‐based) computational models. Copyright © 2006 John Wiley & Sons, Ltd.
12.
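A fault-tolerant file data replication and update mechanism is proposed for distributed environments. Its algorithms/protocols are built on distributed algorithm theory and offer strong fault tolerance, transparent failure recovery, and high efficiency. The mechanism supports both synchronous and asynchronous server replication modes, as well as client-initiated and server-initiated operation. It can be widely applied in Internet distributed file systems, distributed databases, Web mirror servers, distributed software distribution, cluster servers, and similar applications.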
13.
Defect detection activities are generally seen as expensive and time consuming. This article reports on actual experiences from large projects. Based on data, dependencies between the costs of early defect detection, late defect detection and defect fixing are analysed and some guidelines are derived for timing and resource allocation to ensure cost-effective defect detection. Experience reports show that the costs of defect fixing in software development grow dramatically when the defects are found late, i.e. when the defects have already influenced the next phase. This common experience is supported by all the projects, two of which are considered in this paper. However, metrics applied to the data show that it is not always an order of magnitude that lies between the costs of early and late fixing of defects. Rather, the average cost of defects found late decreases with the introduction of early defect detection. This is due to the fact that early document reviews leave only minor defects, e.g. in the requirements or the design. Major and therefore expensive defects tend to be found in those early reviews. Further analysis of the data led to the use of metrics that proved very useful when planning and executing the required defect detection activities. For example, the number of remarks per 100 lines was used to decide how efficient reviews of the design have been and whether more or fewer resources should be allocated to these reviews in a running project. As a conclusion, it is emphasized that a substantial amount of a project's budget and time will be saved by introducing systematic defect detection early in the project. It is suggested that defect detection activities should start as soon as the requirements documents are declared finished.
14.
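Bionic computing and control of immune robots  Total citations: 2, self-citations: 0, citations by others: 2
Traditional research on mobile robots generally assumes that the environment is safe. To strengthen a robot's ability to operate unattended in dangerous, changing environments, and to improve its resistance, fault tolerance, and immunity to external interference, attack, and damage, this paper proposes a self/nonself modeling method for dangerous environments and a bionic computing model and control method for immune robots. Modeled on the biological immune system, an immune computing model and an immune control structure are built for the robot, realizing functions similar to those of the biological immune system: self/nonself detection, discrimination, learning, and repair, as well as robustness and immunity. Immune robot technology is used to detect, recognize, and forecast dangerous and changing environments, to detect and restore the robot's normal state, and to achieve bionic robot control in harsh environments. The authors regard the work as having significant theoretical novelty, clear technical innovation value, and considerable application prospects.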
15.
Detecting causal relationships in distributed computations: In search of the holy grail  Total citations: 3, self-citations: 0, citations by others: 3
Summary: The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect of understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concerning the characterization of causality are presented. Recent work on the detection of causal relationships in distributed computations is surveyed. The issue of observing distributed computations in a causally consistent way and the basic problems of detecting global predicates are discussed. To illustrate the major difficulties, some typical monitoring and debugging approaches are assessed, and it is demonstrated how their feasibility is severely limited by the fundamental problem of mastering the complexity of causal relationships.
Reinhard Schwarz received a diploma in computer science from the University of Kaiserslautern, Germany, in 1990. Since then, he has been working as a research assistant at the computer science department. His research interests include debugging and monitoring of distributed systems, runtime support for object-oriented distributed programming, and distributed algorithms.
Friedemann Mattern received the diploma in computer science from Bonn University, Germany, and the Ph.D. degree from the University of Kaiserslautern, Germany, in 1983 and 1989, respectively. Since 1991 he has been a professor of computer science at the University of Saarland in Saarbrücken, Germany. His current research interests include programming of distributed systems, distributed applications, and distributed algorithms. The work presented in this paper was carried out as part of the PARAWAN project supported by the Bundesministerium für Forschung und Technologie (BMFT).
16.
This paper presents a unified approach to model the reliability growth of software with imperfect debugging and coverage factor. Existing testing coverage‐based software reliability growth models considered that faults present at a particular fault location are detected with certainty during the testing process. Practically, it is very difficult to detect all software faults. To overcome this limitation, a revised software reliability growth model has been developed with the assumption that detection of the faults at a particular fault location is not definite. Furthermore, a new method to model the imperfect debugging phenomenon has been incorporated in the proposed study. A revised model ranking method has been developed to improve the accuracy of model ranking, which is mainly an extension of the existing normalized criteria distance method. Change point analysis has been done with the effect of different environmental factors on the models' parameters. Numerical examples are given to demonstrate the effectiveness of the proposed model.
17.
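This paper first introduces the relevant concepts and the classification of distributed algorithms. Then, based on the characteristics of the synchronous and asynchronous models, it discusses the research methods for each model, focusing on consistent global snapshots and stable property detection in the asynchronous network model. It explains in detail and improves the termination-detection mirroring algorithm for algorithm A in the asynchronous network model, and analyzes the algorithm's time and communication complexity.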
18.
19.
A common approach to fault-tolerant software DSM is to take checkpoints with message logging. Our remote logging has low overhead because each node saves the coherence-related data into the memory of a remote node through a high-speed system area network. For more lightweight fault-tolerant DSM, in this paper, we mainly focused on eliminating shared memory checkpointing during failure-free execution. Each node independently takes the checkpoints of execution states and non-shared data only. When a node fails, it regenerates its pages from the remote copies in live nodes. In order to efficiently reconstruct pages, we also introduced an XOR-diffing technique. The diff logs, which have been created by XOR operations during failure-free execution, can be applied to any version of remote copies either backward or forward for recovery. Our scheme reduces the checkpointing overhead and also alleviates the imbalance in execution times among nodes due to independent checkpointing. This research is supported by KISTEP under the National Research Laboratory program.
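The reason an XOR-based diff can be applied in either direction is that XOR is its own inverse. The sketch below shows this property for two versions of a page held as byte strings; it is a minimal illustration under that assumption, not the logging scheme of the paper, and the function names are hypothetical.

```python
def xor_diff(old_page, new_page):
    """Compute a byte-wise XOR diff between two equally sized page versions."""
    if len(old_page) != len(new_page):
        raise ValueError("page versions must have the same length")
    return bytes(a ^ b for a, b in zip(old_page, new_page))


def apply_diff(page, diff):
    """Apply an XOR diff to a page.

    Applying the diff to the old version yields the new version (forward),
    and applying it to the new version yields the old version (backward).
    """
    if len(page) != len(diff):
        raise ValueError("page and diff must have the same length")
    return bytes(a ^ b for a, b in zip(page, diff))
```

Given `d = xor_diff(v1, v2)`, both `apply_diff(v1, d) == v2` and `apply_diff(v2, d) == v1` hold, so one stored diff serves roll-forward and roll-back alike.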
20.
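Quality management of software projects is important work that runs through the entire software life cycle, and effective quality control is an important means of improving software quality and reducing cost. Addressing the difficulties of quality control, this paper studies quality control processes and techniques. First, it describes the quality control process, including quality control before, during, and after the event. Then it examines the quality control workflow through diagrams. Finally, using diagrams together with text, it examines quality control techniques, covering 5 commonly used ones: cause-and-effect diagrams, Pareto charts, control charts, run charts, and statistical sampling. The results show that this study provides technical and methodological support for quality control and makes quality management more scientific.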