January, 2015

  1. Simple cache using binding & memoize in Clojure

    January 4, 2015 by xudifsd

    In this post I will talk about a simple way to add caching to a Clojure application.

    I am building an API server in Clojure, and many permission checks have to run before a request is served, so I abstracted them into a kind of ACL checker. Our server is built on the Ring SPEC, and because I want the ACL checkers to be independent and reusable, I wrote each one as a Ring handler that passes the request on to the actual handler only if the check succeeds. This is very sweet: I can combine several ACL checkers just like Ring middleware, and it makes the permission requirements very obvious in the router.
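    As a rough illustration of that shape, here is a minimal sketch of a checker written as Ring-style middleware. The names `wrap-acl`, `logged-in?`, `has-token?` and `show-profile` are hypothetical and not taken from the real server; requests and responses are plain maps, so no Ring dependency is needed to try it.

```clojure
;; Hypothetical sketch: an ACL checker as Ring-style middleware.
;; All names here are illustrative assumptions, not the original code.

(defn wrap-acl
  "Wrap `handler` so it only runs when `(allowed? request)` is truthy;
  otherwise respond with 403."
  [handler allowed?]
  (fn [request]
    (if (allowed? request)
      (handler request)
      {:status 403 :headers {} :body "Forbidden"})))

(defn logged-in? [request]
  (some? (get-in request [:session :user-id])))

(defn has-token? [request]
  (some? (get-in request [:headers "x-api-token"])))

(defn show-profile [_request]
  {:status 200 :headers {} :body "profile"})

;; Checkers compose like any other Ring middleware, so the permission
;; requirements of a route are visible where the route is defined.
(def protected-profile
  (-> show-profile
      (wrap-acl logged-in?)
      (wrap-acl has-token?)))
```

    Calling `protected-profile` with a request map that has both a session user id and a token reaches `show-profile`; anything else gets the 403 response.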

    The problem with this, however, is that the ACL checkers do many redundant database reads. For example, we have the following ACL checkers:
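    A hypothetical pair of checkers with this problem might look like the sketch below. The in-memory `users` map and the `db-reads` counter stand in for the real database, and `wrap-active-user` and `wrap-admin` are made-up names used only to show the duplicated read.

```clojure
;; Hypothetical stand-in for the model layer: an in-memory "database"
;; plus a counter so the number of reads is visible.
(def db-reads (atom 0))

(def users {1 {:id 1 :active? true :role :admin}})

(defn get-user
  "Pretend database read; increments `db-reads` on every call."
  [user-id]
  (swap! db-reads inc)
  (get users user-id))

(defn user-id-of [request]
  (get-in request [:session :user-id]))

(defn wrap-active-user
  "Checker 1: only pass the request on if the user is active."
  [handler]
  (fn [request]
    (if (:active? (get-user (user-id-of request)))
      (handler request)
      {:status 403 :headers {} :body "inactive user"})))

(defn wrap-admin
  "Checker 2: only pass the request on if the user is an admin."
  [handler]
  (fn [request]
    (if (= :admin (:role (get-user (user-id-of request))))
      (handler request)
      {:status 403 :headers {} :body "admin only"})))

(def admin-page
  (-> (fn [_request] {:status 200 :headers {} :body "ok"})
      wrap-admin
      wrap-active-user))

;; One request, two identical reads of the same user.
(reset! db-reads 0)
(admin-page {:session {:user-id 1}})
@db-reads ; => 2
```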

    Many APIs need both checkers at the same time, but both clearly have to fetch the user from the database, and what they get is identical because the same request is passed to each of them. We have many cases like this. The most obvious way to reduce database reads is to add a cache server in front of the database and use a write-back or write-through policy to handle writes, but that requires changing the architecture and a lot of model code. I just wanted a simple cache that works for this case and does not require global changes.

    Then I realized I could combine binding & memoize to build a simple cache. The basic idea is not to invoke the model function directly: before the check runs, I bind dynamic vars to memoized versions of the model functions, and the checkers invoke the dynamic vars instead of the model functions:
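    A minimal sketch of that idea, under the same assumptions as before: `get-user` is a stand-in model function that counts its calls, and `*get-user*`, `wrap-request-cache` and `wrap-check` are hypothetical names. The dynamic var defaults to the plain model function, and a middleware rebinds it to a fresh memoized copy for each request.

```clojure
;; Stand-in model layer with a read counter (illustrative only).
(def db-reads (atom 0))

(def users {1 {:id 1 :active? true :role :admin}})

(defn get-user [user-id]
  (swap! db-reads inc)
  (get users user-id))

;; Checkers call this dynamic var instead of `get-user`.
;; Outside any binding it is just the uncached model function.
(def ^:dynamic *get-user* get-user)

(defn active? [request]
  (:active? (*get-user* (get-in request [:session :user-id]))))

(defn admin? [request]
  (= :admin (:role (*get-user* (get-in request [:session :user-id])))))

(defn wrap-check
  "Run `handler` only if `(allowed? request)` is truthy."
  [handler allowed?]
  (fn [request]
    (if (allowed? request)
      (handler request)
      {:status 403 :headers {} :body "Forbidden"})))

(defn wrap-request-cache
  "Bind `*get-user*` to a fresh memoized `get-user` for one request.
  The cache is discarded when the binding ends."
  [handler]
  (fn [request]
    (binding [*get-user* (memoize get-user)]
      (handler request))))

(def admin-page
  (-> (fn [_request] {:status 200 :headers {} :body "ok"})
      (wrap-check admin?)
      (wrap-check active?)
      wrap-request-cache)) ; outermost, so both checks share one cache

(reset! db-reads 0)
(admin-page {:session {:user-id 1}})
@db-reads ; => 1
```

    Note that `binding` is thread-local, so the cached function is only visible to work done synchronously inside the handler call; anything evaluated later or on another thread without binding conveyance sees the plain `get-user` again.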

    The advantage of this is that I don’t need to refactor much code: I just bind the dynamic var and replace the actual function name with it. This optimization was quite successful. I didn’t spend much time restructuring the code, and the code didn’t get more complicated, yet profiling shows that I saved 1 to 2 database reads per request.

    I think we can use this technique whenever the following conditions are met:

    • the code only needs to read
    • there are many duplicated reads
    • the cache is not long-lived
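    The read-only condition matters because a memoized function never notices writes. The sketch below, which uses a hypothetical in-memory `store` and a `rename-user!` function, shows a stale read inside a single binding scope.

```clojure
;; Illustrative in-memory store; `store`, `get-user` and `rename-user!`
;; are assumptions made for this example.
(def store (atom {}))

(defn get-user [id]
  (get @store id))

(defn rename-user! [id new-name]
  (swap! store assoc-in [id :name] new-name))

(def ^:dynamic *get-user* get-user)

(defn stale-read-demo
  "Returns [name-before-write cached-name-after-write real-name-after-write]."
  []
  (reset! store {1 {:id 1 :name "old"}})
  (binding [*get-user* (memoize get-user)]
    (let [before (:name (*get-user* 1))]
      (rename-user! 1 "new")
      ;; The memoized function still returns the value cached before the write.
      [before
       (:name (*get-user* 1))
       (:name (get-user 1))])))

(stale-read-demo) ; => ["old" "old" "new"]
```

    Keeping the cache short-lived, for example one request, limits how long such a stale value can survive.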

    After this optimization, I applied the same technique to our email notification system, because it runs at a fixed rate and also reads the database many times, with many duplicated reads.
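    For a batch job like that, the binding can wrap one whole run instead of one request. The following is only a sketch under assumed names (`run-notifications`, `build-email`, and a fake `get-user` that counts reads); the scheduler itself is left out.

```clojure
;; Fake model function with a read counter (illustrative only).
(def db-reads (atom 0))

(defn get-user [user-id]
  (swap! db-reads inc)
  {:id user-id :email (str "user" user-id "@example.com")})

(def ^:dynamic *get-user* get-user)

(defn build-email [{:keys [user-id message]}]
  {:to (:email (*get-user* user-id)) :body message})

(defn run-notifications
  "One scheduled run: a fresh memoized reader is bound for the duration
  of the run and dropped afterwards."
  [pending]
  (binding [*get-user* (memoize get-user)]
    ;; mapv is eager, so every lookup happens inside the binding.
    (mapv build-email pending)))

(reset! db-reads 0)
(run-notifications [{:user-id 1 :message "a"}
                    {:user-id 2 :message "b"}
                    {:user-id 1 :message "c"}])
@db-reads ; => 2
```

    Using an eager `mapv` rather than a lazy `map` is deliberate here: a lazy sequence realized after `binding` returns would call the uncached function.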