The previous post on caching introduced the basics and briefly summarized the potential role of, and reasons for, caching in a microservices architecture. In this post I’ll dig a little deeper and look at some common cache deployment patterns in distributed systems.
In the monolith world it is easy and convenient to provide local caches, so in-memory caches within a single address space are often implemented with abandon. In a distributed world, getting a caching system right requires careful analysis of data characteristics, usage patterns and evolutionary aspects.
The old adage “Premature optimization is the root of all evil” applies just as well to caching as any other area of design. So it’s best to evaluate hotspots and performance bottlenecks in the data layer sans caching and only then foray into the caching world with a few clear goals of what needs optimizing.
Having said that, there are some low-hanging fruits in the caching tree that can be picked without much effort while still yielding significant gains in latency, throughput and availability. This is in large part due to the standardization and support of caching mechanisms in the HTTP protocol. So if the consumers of services interface over standard HTTP, the built-in caching support can be leveraged at the client side, the server side, somewhere in between, or in hybrid approaches. Plenty has been written about the details, so I’ll just do a quick overview of the relevant headers and their usage.
The HTTP Cache-Control header is used to control the caching mechanism along the request/response path. Any entity involved in the path uses the specifics in the header to determine if and how to cache data. Some of the important cache directives are:
no-cache – Content may be cached, but must be revalidated with the origin server before each reuse
no-store – Content must not be stored by any cache
public – Content is allowed to be cached by all
private – Content is private to specific users and as such can be only cached by clients and not by intermediaries
max-age – Specified in seconds; after this time the content is considered stale and must be refetched or revalidated
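To make the directives concrete, here is a minimal sketch in Python (standard library only) that composes and parses a Cache-Control header value. The directive names are the standard ones above; the function names and API are purely illustrative.

```python
# Minimal sketch: composing and inspecting a Cache-Control header value.
# Directive names follow the HTTP spec; everything else is illustrative.

def build_cache_control(public=False, private=False, no_store=False,
                        no_cache=False, max_age=None):
    """Compose a Cache-Control header value from directive flags."""
    directives = []
    if no_store:
        directives.append("no-store")
    if no_cache:
        directives.append("no-cache")
    if public:
        directives.append("public")
    if private:
        directives.append("private")
    if max_age is not None:
        directives.append(f"max-age={max_age}")
    return ", ".join(directives)

def parse_cache_control(value):
    """Parse a Cache-Control value into a {directive: argument} dict."""
    result = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        result[name.lower()] = arg or None
    return result

header = build_cache_control(public=True, max_age=3600)
print(header)  # public, max-age=3600
```

Any cache along the request/response path would parse the value this way and act on the directives it finds.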
The If-Modified-Since request header and the Last-Modified response header are used to avoid repeatedly sending the same data between the client and the server. When a client initially requests a resource, the server responds with the resource and adds the Last-Modified header carrying the modification timestamp. The client stores this timestamp alongside the resource. When requesting the same resource again in the future, the client adds the If-Modified-Since header, set to the saved timestamp. If the resource has not been modified since that timestamp, the server sends back a ‘304 Not Modified’ response.
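The server-side half of this exchange can be sketched in a few lines of Python. This is an illustration, not a production handler: the resource lookup is a stand-in for a real data store, and only the conditional logic is shown.

```python
# Sketch of server-side conditional GET handling with Last-Modified /
# If-Modified-Since. HTTP dates have one-second resolution, so the
# comparison truncates microseconds.
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def handle_get(resource_modified_at, if_modified_since=None):
    """Return (status, headers) for a GET, honoring If-Modified-Since."""
    if if_modified_since is not None:
        client_ts = parsedate_to_datetime(if_modified_since)
        if resource_modified_at.replace(microsecond=0) <= client_ts:
            return 304, {}  # client copy is still fresh; send no body
    return 200, {"Last-Modified": format_datetime(resource_modified_at, usegmt=True)}

modified = datetime(2023, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
status, headers = handle_get(modified)                       # first request: 200
status2, _ = handle_get(modified, headers["Last-Modified"])  # revalidation: 304
```

The client simply echoes back the Last-Modified value it saved; the server does the comparison.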
The ETag and If-None-Match headers work similarly. When responding to a request, the server adds the ETag header set to a unique value, typically a hash of the resource. The client stores this value. On future requests, the client adds the If-None-Match header set to the stored ETag. The server compares the values and, if the resource has not changed, responds with ‘304 Not Modified’.
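A corresponding sketch of the ETag flow, again standard library only. A content hash is one common way to derive the tag; real systems may use version numbers or timestamps instead, and the function names here are illustrative.

```python
# Sketch of ETag generation and If-None-Match validation.
import hashlib

def etag_for(body: bytes) -> str:
    """Derive a strong ETag from the resource representation."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body: bytes, if_none_match=None):
    """Return (status, headers) for a GET, honoring If-None-Match."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}  # unchanged; skip sending the body
    return 200, {"ETag": tag}

body = b'{"id": 42, "name": "widget"}'
status, headers = handle_get(body)              # 200, with ETag
status2, _ = handle_get(body, headers["ETag"])  # 304 Not Modified
```

If the resource changes, its hash changes, the If-None-Match comparison fails, and the server falls through to a full 200 response with the new ETag.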
So both of these request/response scenarios prevent resending the same data when no modifications have occurred. Although the network round trip between client and server still happens, the savings in serialization/deserialization, bandwidth and latency can be significant.
In addition to the caching support built into all standard browsers, reverse proxies such as Varnish and Squid are heavily used in practice to provide caching at the server side. So it is well worth leveraging these tools if you are using the HTTP protocol.
When the volume/throughput of the data to be cached is too high for one cache server, it can be sharded or partitioned, similar to how backend data stores are sharded. Sharding caches has many of the same advantages, as well as the complexities, that arise from sharding backends. On the one hand it improves scalability, throughput, availability, fault tolerance, geolocation etc. On the other hand, it adds complexity in design, development and operations. Once a sharding scheme is implemented, it is very difficult to change, so careful analysis is needed to pick the right scheme at the outset; getting it right is not a trivial task.
The exact sharding algorithm depends on the data and is highly application specific, but for modern distributed applications there are some common patterns. A cache sharding function maps a particular piece of data to a particular shard; this is typically a consistent hashing function. A good sharding function distributes the data evenly over the set of shards. Some systems instead use range-based sharding, where data is split into ranges and each range is mapped to a shard. Selecting an appropriate sharding key is equally important to ensure an efficient and even mapping of the data.
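A consistent-hash ring can be sketched compactly. This is a toy illustration of the idea, not a production implementation: the node names and virtual-node count are made up, and real systems (e.g. client libraries for Memcached) ship tuned versions of this.

```python
# Minimal consistent-hash ring for cache sharding. Virtual nodes smooth
# out the key distribution across physical cache servers.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Any well-mixed hash works; md5 is used here for illustration only.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, key):
        """Map a cache key to the first node clockwise on the ring."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
shard = ring.shard_for("user:42")  # one of the three nodes, stable per key
```

The appeal of consistent hashing is that adding or removing a node only remaps the keys that belonged to that node's ring segments, rather than reshuffling everything as a naive `hash(key) % n` scheme would.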
Some final thoughts
If the cache is in the critical path of the application, by which I mean that a cache failure could cause major degradation of system performance, then the caches are replicated as well. So now you have sharded and replicated caches. As you can see, the caching system grows into a complex network of caches, layered from the client side to the server side, and each layer introduces additional complexity. If the data being cached is dynamic and changes frequently, caching could even hurt performance, and you might be better off without it in such circumstances. A word of caution: the greater the number of caches, the harder they are to maintain, especially when it comes to cache invalidation. The operational costs can easily outweigh the benefits of caching.
As such, caching (beyond the basic HTTP mechanisms, perhaps) is no silver bullet. Any caching should be added based on actual measured needs rather than assumed ones. When designed on the basis of careful analysis and benchmarking, however, caching can lead to significant improvements. Web-scale companies like Netflix and Twitter achieve their humongous scale in part due to caching.
Note that this article is part of the Microservices series. You can read the previous ones here : Prelude, Introduction, Evolution, Guiding Principles, Ubiquitous Language, Bounded Contexts, Communication Part 1, Communication Part2, Communication Part 3, Communication Part 4, Communication Part 5, Kafka, Time Sense, Containers, API Gateways, Service Mesh, Caching Part1
EVCache, the Netflix caching system – a good read
Redis vs Memcached article
Twitter infrastructure, talks about their caching system