Distributed Fault Recovery

Posted by Shane on March 29, 2018

Please read the slides on the fault recovery algorithm prepared by Prof. Gagan first.

Claim: Orphan messages cannot occur under this algorithm.
Proof:

Assume an orphan message M exists between threads Y and Z. Without loss of generality, let Z be the sender and Y the receiver. By the definition of an orphan message, M must be sent after \(t_3\) and received before \(t_2\). By the checkpoint algorithm, no message may be sent in the interval between the tentative point and the commit point. Thus, M can only be sent after \(t_5\). In other words, with \(t_6\) the time M is sent and \(t_7\) the time it is received, \(t_5 < t_6\) and \(t_7 < t_2\).

By the checkpoint algorithm, we also have \(t_2 < t_4 < t_5\). Combining this with the two relations at the end of the previous paragraph gives the chain \(t_7 < t_2 < t_4 < t_5 < t_6\), and hence \(t_7 < t_6\).

Contradiction: the sending time is later than the receiving time, so M would be received before it was sent!
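
As a sanity check on the argument, here is a minimal brute-force sketch in Python. It is not part of the slides, and the reading of the timestamps is an assumption inferred from the proof: \(t_3\) is taken as Z's tentative point, \(t_5\) as its commit point, \(t_2\) as Y's checkpoint, and \(t_6\), \(t_7\) as the send and receive times of M. The names `is_orphan`, `allowed`, and `find_orphans` are hypothetical helpers. The sketch enumerates small integer timestamps and looks for any assignment that satisfies the algorithm's constraints and still yields an orphan.

```python
from itertools import product


def is_orphan(t_send, t_recv, t3, t2):
    """Orphan message: sent after t3 and received before t2."""
    return t_send > t3 and t_recv < t2


def allowed(t2, t3, t4, t5, t6, t7, block_sends=True):
    """Check the constraints used in the proof (interpretation is assumed)."""
    if not (t2 < t4 < t5):      # ordering given by the checkpoint algorithm
        return False
    if not (t3 < t5):           # assumed: tentative point precedes commit point
        return False
    if not (t6 < t7):           # causality: a message is received after it is sent
        return False
    if block_sends and t3 < t6 <= t5:
        return False            # no send between tentative and commit points
    return True


def find_orphans(horizon=6, block_sends=True):
    """Return every (t2, t3, t4, t5, t6, t7) in range(horizon) that is
    allowed by the constraints and still forms an orphan message."""
    return [
        ts
        for ts in product(range(horizon), repeat=6)
        if allowed(*ts, block_sends=block_sends)
        and is_orphan(ts[4], ts[5], ts[1], ts[0])
    ]


print(find_orphans())  # prints []
```

Calling `find_orphans(block_sends=False)` drops the rule that blocks sends between the tentative and commit points. In that case orphans do appear, for example \(t_3 = 0\), \(t_6 = 1\), \(t_7 = 2\), \(t_2 = 3\), which suggests that this rule is what the proof really depends on.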