Leases are a very useful and pervasive technique in distributed systems: a lease lets one node authorize another node in the system for a bounded time. For example, in master election, follower nodes use a lease to promise the current master that they will not elect another master until the lease has expired. The problem is: how can you make sure the current master does not think it still holds the lease after the lease grantors think otherwise?
This is not an easy question in practice, because you have to cope with clock skew and an asynchronous network. You cannot grant a lease in terms of absolute time, and you have to assume that your messages may be delayed arbitrarily.
Normally, we make a conservative guess about the network delay, and to be absolutely safe the guess has to be very conservative. So we might build a system whose lease is valid for 60 seconds, where the master has to assume the message was delayed by, say, 30 seconds, and refreshes its lease every 10 seconds. This works well if the lease duration is long enough, but it breaks down when the lease is short. The benefit of a shorter lease is higher availability, because a long-lived lease prevents the system from working for a longer time if the master crashes.
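To make the arithmetic concrete, here is a minimal sketch of the conservative-guess approach. The function name and the model are my own assumptions for illustration: the grantor starts counting when it sends the grant, and the holder subtracts the worst delay it is willing to assume from the lease duration. Clock drift is ignored.

```python
def usable_lease_seconds(lease_duration: float, assumed_max_delay: float) -> float:
    """Seconds the holder may safely act on a lease after the grant arrives.

    Hypothetical helper: the grant may already have spent up to
    assumed_max_delay seconds in flight, so the holder gives that time up.
    """
    return max(0.0, lease_duration - assumed_max_delay)


# 60-second lease, 30-second assumed delay: half of the lease is usable.
long_lease = usable_lease_seconds(60.0, 30.0)

# 2-second lease with the same assumption: nothing is left to use.
short_lease = usable_lease_seconds(2.0, 30.0)

print(long_lease, short_lease)
```

Under this model a short lease is only usable if the assumed delay is even shorter, and that is exactly the guess we cannot make safely on an asynchronous network.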
I didn’t have a very good solution to this until I read a paper. The paper proposes a novel way to perform reads with high throughput and low latency in a Paxos system without sacrificing consistency, which is especially useful in wide-area scenarios. Apart from its main topic, the paper also presents a new way to grant and refresh leases without depending on external clock synchronization.
As the picture above shows, the protocol uses a guard to bound the promise duration: if the grantor does not receive the promise_ACK within t_guard, the lease expires at T3 + t_guard + t_lease. If the holder does not receive the promise within t_guard, the lease is never activated at all. Receiving the promise_ACK can only shorten the lease duration. When renewing an active lease, there is no need to send the guard again, because the most recent promise_ACK plays the role of the guard.
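The expiry rules above can be sketched as two small functions, one per side. This is my reading of the protocol rather than code from the paper. `promise_sent_at` corresponds to T3; the other parameter names are hypothetical. Two rules are assumptions on my part: that the holder's lease runs for t_lease from the moment the promise arrives, and that an ACK lets the grantor expire the lease t_lease after the ACK arrives if that is earlier. Each node only measures durations on its own clock.

```python
from typing import Optional


def grantor_expiry(promise_sent_at: float, t_guard: float, t_lease: float,
                   promise_ack_at: Optional[float] = None) -> float:
    """Expiry time on the grantor's local clock."""
    # Without an ACK the grantor waits out the full guard plus lease.
    worst_case = promise_sent_at + t_guard + t_lease
    if promise_ack_at is None:
        return worst_case
    # Assumption: an ACK can only shorten the lease, never extend it.
    return min(worst_case, promise_ack_at + t_lease)


def holder_expiry(guard_ack_sent_at: float, promise_received_at: float,
                  t_guard: float, t_lease: float) -> Optional[float]:
    """Expiry time on the holder's local clock, or None if never activated."""
    if promise_received_at - guard_ack_sent_at > t_guard:
        return None  # promise arrived outside the guard window
    # Assumption: the lease runs for t_lease from the promise's arrival.
    return promise_received_at + t_lease


# Illustrative timeline on one reference clock, only so the two sides can
# be compared: guard_ACK sent at 0.0, promise sent at 0.25 (T3),
# promise received at 0.5, promise_ACK received at 0.75.
holder = holder_expiry(0.0, 0.5, t_guard=1.0, t_lease=2.0)
grantor_no_ack = grantor_expiry(0.25, t_guard=1.0, t_lease=2.0)
grantor_acked = grantor_expiry(0.25, t_guard=1.0, t_lease=2.0, promise_ack_at=0.75)
print(holder, grantor_no_ack, grantor_acked)
```

In this timeline the holder gives up the lease before the grantor considers it expired, with or without the ACK, and neither side had to guess how long a message spent in flight.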
With this protocol, we can use very short leases in the system, because we do not have to guess the network delay at all. In the evaluation section of the paper, the authors set the lease duration to 2 seconds and let the grantor renew the current lease after 500 ms. In this case, if the holder crashes or becomes unavailable, the lease won’t prevent the system from working for more than 2 seconds.