Recently, I read many papers & articles about latency, especially tail latency, so this post serves as my notes on that reading.
Latency is a very important metric for a system, because it strongly affects user experience. In a distributed system, tail latency matters much more than in a single-node system. This is based on a simple observation:
if each component exhibits high latency on 1% of requests, and a client request has to touch 100 such components, then about 63% of client requests will exhibit high latency. That's an awful user experience. This is a fundamental property of scaling systems: you need to worry not just about latency, but about tail latency. High performance equals high tolerances. At scale you can't ignore the tail.
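The arithmetic behind that claim is simple: a request is fast only if every one of the 100 independent components it touches is fast. A quick sketch:

```python
# Probability that a request is slow, if it touches `components`
# independent parts and each part is slow for `p_slow` of its requests.
p_slow = 0.01
components = 100

p_slow_request = 1 - (1 - p_slow) ** components
print(f"{p_slow_request:.1%}")  # prints "63.4%"
```

So a 1-in-100 event at the component level becomes a roughly 2-in-3 event at the request level.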
From this article, we can learn that latency can arise from many sources, from hardware to software. It also links to an interesting discussion on TCP vs. SPDY.
The authors of another paper discussed their effort to tune web server software; they found that head-of-line blocking in kernel queues has a great impact on latency. This kind of blocking sometimes helps throughput on disk writes, but it can severely degrade latency. Likewise, a blocking invocation inside an event loop causes head-of-line blocking and makes tail latency higher; by moving blocking invocations out of the event loop, we can solve this kind of problem.
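As a sketch of that fix, here is the pattern in Python's asyncio: a blocking call made directly inside a coroutine stalls the whole event loop, while handing it off to a thread pool via `run_in_executor` keeps other requests flowing. (The `time.sleep` stands in for any blocking syscall; the handler names are made up for the example.)

```python
import asyncio
import time

def blocking_io():
    # Stands in for any blocking call (disk write, sync RPC, ...).
    time.sleep(0.1)
    return "done"

async def bad_handler():
    # Runs the blocking call on the event loop thread: every other
    # coroutine is stuck waiting behind it (head-of-line blocking).
    return blocking_io()

async def good_handler():
    # Hands the blocking call to a worker thread; the event loop
    # stays free to serve other requests in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_io)

async def main():
    # Ten concurrent requests overlap instead of queuing serially.
    results = await asyncio.gather(*(good_handler() for _ in range(10)))
    print(results[0])

asyncio.run(main())
```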
Another article from LinkedIn discussed how they solved high tail latency by installing another network card to serve a specific class of network requests.
The two examples above show how difficult it is to identify the source of latency. As always, optimization requires a lot of insight into how the system works.
Another interesting paper proposed a runtime coordination system that coordinates GC across a distributed system written in a garbage-collected language. GC usually causes hiccups for a system, especially major collections. The paper shows that minor GC has little impact on batch workloads like Spark, but affects real-time applications significantly; GC is a key contributor to stragglers in many interactive systems. With the coordination system deployed, PageRank computation on Spark completed 15% faster, and the 99.9% tail update latency on Cassandra improved from 3.3ms to 1.6ms, the worst case from 83ms to 19ms, which is quite impressive.
High tail latency is not caused only by GC, head-of-line blocking, and the like; it's prevalent. This paper shows that the last 1% of tail latency is unpredictable no matter what the configuration, programming language, or OS is. After removing that 1% from the statistics, the server's behavior is predictable. Therefore, many systems could focus on providing QoS guarantees for statistical measures such as the 99th latency percentile.
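As a toy illustration of such a statistical measure, here is a nearest-rank percentile computed over simulated latencies; the distribution and its numbers are invented for the example, not taken from the paper.

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p% of n)."""
    ordered = sorted(samples)
    rank = max(0, int(len(ordered) * p / 100) - 1)
    return ordered[rank]

# Simulated latencies: mostly fast, with a heavy ~2% tail.
random.seed(0)
latencies_ms = [random.uniform(1, 5) if random.random() > 0.02
                else random.uniform(50, 100)
                for _ in range(100_000)]

# The median looks healthy; the 99th percentile exposes the tail.
print(f"p50 = {percentile(latencies_ms, 50):.1f} ms")
print(f"p99 = {percentile(latencies_ms, 99):.1f} ms")
```

A QoS guarantee stated over p99 ignores the unpredictable last fraction of requests while still bounding what the vast majority of users see.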
Ok, enough talk about background; let's read some really awesome stuff.
Jeff Dean gave a talk about tail latency in 2013; here are the slides and a summary article. Instead of trying to identify the source of latency and eliminate it, he tries to live with it. The talk gives some general techniques to cut the tail, such as hedged requests, tied requests, micro-partitions, selective replication, and latency-induced probation. These techniques are general enough to apply to many scenarios.
The talk compared fault tolerance and variability tolerance. To tolerate variability, one has to:
make a predictable whole out of unpredictable parts
and this is where the title of this post comes from. It's very similar to:
make a reliable whole out of unreliable parts
which is the key point of distributed systems. And since latency is becoming more and more important as a metric, predictable latency should become more and more important too. I think this is the direction the next generation of distributed systems should be heading.