Communication-based prevention of useless checkpoints in distributed computations |
| |
Authors: | J-M Hélary A Mostefaoui RHB Netzer M Raynal |
| |
Affiliation: | (1) IRISA, Université de Rennes, Campus de Beaulieu, F-35042 Rennes Cedex, France (e-mail: {helary,mostefaoui,raynal}@irisa.fr), FR; (2) Computer Science Department, Brown University, Box 1910, Providence, RI 02921, USA (e-mail: rn@cs.brown.edu), US |
| |
Abstract: | Summary. A useless checkpoint is a local checkpoint that cannot be part of a consistent global checkpoint. This paper addresses the following
problem: given a set of processes that take (basic) local checkpoints in an independent and unknown way, design communication-induced checkpointing protocols
that direct processes to take additional (forced) local checkpoints so that no local checkpoint is useless.
The paper first proves two properties related to integer timestamps which are associated with each local checkpoint. The first
property is a necessary and sufficient condition that these timestamps must satisfy for no checkpoint to be useless. The second
property provides an easy timestamp-based determination of consistent global checkpoints. Then, a general communication-induced
checkpointing protocol is proposed. This protocol, derived from the two previous properties, actually defines a family of
timestamp-based communication-induced checkpointing protocols. It is shown that several existing checkpointing protocols for
the same problem are particular instances of the general protocol. The design of this general protocol is motivated by the
use of communication-induced checkpointing protocols in “consistent global checkpoint”-based distributed applications such
as the detection of stable or unstable properties and the determination of distributed breakpoints.
Received: July 1997 / Accepted: August 1999 |
| |
Keywords: | Asynchronous distributed system – Checkpointing protocols – Fault-Tolerance |
This article is indexed in SpringerLink and other databases. |
|
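The abstract's definition of a useless checkpoint can be made concrete with a small brute-force check. The sketch below is not the paper's protocol and does not use its timestamps. It only applies the standard definition that a global checkpoint is consistent when it contains no orphan message, that is, no message recorded as received but not as sent. The encoding (integer event positions, the tuple layout of messages, and the function names `is_consistent` and `useless_checkpoints`) is hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical encoding, for illustration only:
# - each process numbers its own events 0, 1, 2, ...
# - a local checkpoint at position c records every local event with position < c
# - a message is (sender, send_pos, receiver, recv_pos)


def is_consistent(cut, messages):
    """cut[p] is the checkpoint position chosen for process p."""
    for sender, send_pos, receiver, recv_pos in messages:
        received = recv_pos < cut[receiver]
        sent = send_pos < cut[sender]
        if received and not sent:
            return False  # orphan message: received in the cut, never sent
    return True


def useless_checkpoints(checkpoints, messages):
    """Return (process, position) pairs that belong to no consistent cut."""
    useful = set()
    for cut in product(*checkpoints):  # one local checkpoint per process
        if is_consistent(cut, messages):
            useful.update(enumerate(cut))
    return [
        (p, c)
        for p, positions in enumerate(checkpoints)
        for c in positions
        if (p, c) not in useful
    ]


# Two processes. P1 sends m2 (event 0), then receives m1 (event 1).
# P0 receives m2 (event 0), takes a checkpoint at position 1, then sends m1.
messages = [
    (1, 0, 0, 0),  # m2: P1 -> P0
    (0, 1, 1, 1),  # m1: P0 -> P1
]

basic_only = [[0, 1, 2], [0, 2]]
print(useless_checkpoints(basic_only, messages))   # [(0, 1)]

# Same run, but P1 takes a forced checkpoint at position 1,
# just before it delivers m1.
with_forced = [[0, 1, 2], [0, 1, 2]]
print(useless_checkpoints(with_forced, messages))  # []
```

In the first run, P0's checkpoint at position 1 cannot be paired with any checkpoint of P1 without creating an orphan message, so it is useless. Adding one forced checkpoint on P1 before the delivery of m1 makes every local checkpoint part of some consistent global checkpoint, which is the effect the abstract asks of a communication-induced protocol. The enumeration is exponential in the number of processes and is meant only to illustrate the definition.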