A cache is a data storage layer that typically fronts a durable data storage system and is used to improve speed and performance of data access. The data held in cache is typically transient in nature.
The primary motivation for using a cache is to speed up access to frequently used data. Caches are typically backed by RAM, which is orders of magnitude faster to access than disk-based storage. Beyond raw access time, complex querying and serialization/deserialization of data also contribute significantly to overall latency. Results of expensive or complex computations (even when done in memory) are also good candidates for caching, as is data fetched via remote calls over the network. In all these scenarios, caching provides a significant improvement in speed and performance.
Caching and Microservices
Microservices are typically built with the expectation of high scaling needs, so performance at scale is a key criterion. In a microservices architecture, each service is expected to maintain its own data store, which can in fact result in many more trips to the database than in a traditional monolith with a single DB. If these are handled sequentially as part of a workflow spanning multiple services, the overall latency of processing a request increases sharply. Services interacting over the network also add latency compared to in-process calls. A proven way to deal with these issues is to use caching at various points in the system.
Perhaps a more important factor that comes into play with microservices is scaling. With microservices, scaling the application layer is easy: add more service instances. But how do we scale the data layer in step with the application layer? This requires scaling the backend data stores, which is easier said than done. Scaling them is complex and expensive, especially when auto-scaling on demand, as is common in cloud native deployments. Caching can help here: it is cheaper and easier to add and scale caches fronting the backend data stores. A shared caching layer among all the service instances of the same type, rather than individual in-memory caches per service, can also help avoid fragmenting the data and improve performance. Note that this option requires externalizing session state, which is a whole topic by itself but well worth looking into. Externalizing session state in a shared cache layer confers the additional benefit of resiliency: when an instance fails, another instance can be spun up to replace it and continue processing from where the failed instance left off. It is worth pointing out that this sharing does not violate the ‘share nothing’ principle, since the sharing is between instances of the same service type and not among different services.
Now that I’ve laid out some good reasons why caching can benefit microservices, or in general any distributed system that needs high performance and resiliency at scale, let’s look at some caching basics.
Caching patterns are based around how a cache miss is handled. Which part of your system is responsible for updating the cache when a piece of data is searched for in the cache but not found? Is it the application or the cache itself? The answer to this question forms the crux of the commonly used cache patterns: the read and write operations can either be inlined in the cache or handled outside of it.
Cache Inline patterns
In cache inline patterns, data is read and written through the cache. The cache is responsible for talking to the data source and keeping both itself and the data source up to date. Here is how reads and writes are accomplished.
Read through
When a cache miss occurs, the cache talks to the data source and fetches the data. It responds to the application and also updates itself for future requests.
Write through
When a piece of data is to be updated, the application updates it in the cache. The cache then updates the data source synchronously.
Write behind/Write back
When a piece of data is to be updated, the application updates it in the cache. The cache then updates the data source asynchronously.
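The three inline patterns above can be sketched in a few lines of Python. This is a minimal illustration, not a real cache implementation: the durable data source is simulated with a dict, and the class and method names are made up for this example. The key point is that the application only ever talks to the cache, and the cache decides when and how the backend gets updated.

```python
class InlineCache:
    """Sketch of read-through, write-through, and write-behind caching.

    `backend` stands in for a durable data source (in practice, a
    database client). All names here are illustrative.
    """

    def __init__(self, backend, write_behind=False):
        self.backend = backend        # simulated durable data source
        self.store = {}               # the in-memory cache itself
        self.write_behind = write_behind
        self.pending = []             # writes queued for async flush

    def get(self, key):
        # Read-through: on a miss, the cache itself fetches from the
        # backend and populates itself before answering.
        if key not in self.store:
            self.store[key] = self.backend[key]
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value
        if self.write_behind:
            # Write-behind: queue the update; a background task would
            # flush it to the backend asynchronously.
            self.pending.append((key, value))
        else:
            # Write-through: update the backend synchronously.
            self.backend[key] = value

    def flush(self):
        # Stand-in for the asynchronous flush of queued writes.
        for key, value in self.pending:
            self.backend[key] = value
        self.pending.clear()
```

Note that with write-behind, the backend briefly lags the cache until the flush runs, which is exactly the durability/latency trade-off the pattern makes.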
Cache aside patterns
Read aside
When a cache miss occurs, the application talks to the data source and fetches the data. The application then updates the cache as well.
Write aside
When a piece of data is to be updated, the application updates it directly at the data source. Whether the corresponding cache entry is updated depends on how the cache is set up. If the cached data is not updated but consistency is required, the cache entry must be invalidated upon modification. If consistency is not required and stale data is acceptable, the cache can continue to serve the stale entry until it either expires or is evicted.
Notice that both read through and read aside do lazy initialization: data is read and populated into the cache only when first required, so the first request always has the highest latency. This can sometimes lead to a thundering herd problem, where several concurrent requests for the same data all result in a cache miss and are all routed to the backend, placing redundant and unnecessary load on it. If this is a concern, it can be alleviated by warming (pre-heating) the cache, i.e. pre-emptively loading the target subset of data into the cache.
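Cache warming itself is simple; the harder part is choosing which keys to pre-load. A minimal sketch, assuming the hot keys have been identified from access logs or analytics (the function name and dict-based backend are illustrative):

```python
def warm_cache(cache, backend, hot_keys):
    """Pre-load the entries expected to be requested most, so the first
    wave of requests does not all miss and stampede the backend.

    `hot_keys` would typically come from access logs or analytics;
    this is usually run at startup or on a schedule, before traffic
    is directed at the service.
    """
    for key in hot_keys:
        cache[key] = backend[key]
```

After warming, requests for the hot subset hit the cache immediately instead of all missing at once.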
Expiration
Cached data is usually associated with a time limit that determines its validity. A TTL (time to live) or an expiration time is associated with each entry, and once past that age, the entry in the cache is invalid. Choosing the right TTL requires careful analysis of the application and its data characteristics.
Two important considerations are how frequently the data changes and the constraints around the accuracy of the data being served out of the cache. Do you always need the latest data, or is it acceptable to serve slightly stale data? If the data does not change frequently and/or some inconsistency is tolerable, a longer TTL can be used, which results in better performance. An often overlooked pitfall with TTLs is setting them such that many entries expire simultaneously, which can overload the backend. A good practice that prevents this is to add a small random jitter to each TTL so that the expiration times are staggered.
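TTL jitter is a one-liner in practice. The sketch below pairs it with a minimal expiring cache so the effect is visible; the base TTL and jitter fraction are illustrative values, and a real system would delegate expiry to the cache product (e.g. Redis EXPIRE).

```python
import random
import time

def ttl_with_jitter(base_ttl, jitter_fraction=0.1):
    """Return a TTL with a small random offset, so entries written
    together do not all expire at the same instant."""
    jitter = base_ttl * jitter_fraction
    return base_ttl + random.uniform(-jitter, jitter)

class TTLCache:
    """Minimal cache with per-entry expiration, for illustration."""

    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl):
        self.store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.store[key]  # expired: treat as a miss
            return None
        return value
```

With a 10% jitter on a 300-second TTL, entries written in the same burst expire spread across a minute instead of all at once.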
Invalidation
Using the cache aside pattern for updates can result in data inconsistencies in the cache, especially when using distributed, replicated caches. In such cases, the data in the cache needs to be invalidated by some other means prior to the expiration time. It is well known, thanks to this saying, that cache invalidation is not an easy problem to solve:

“There are only two hard things in Computer Science: cache invalidation and naming things.” – Phil Karlton
There are some well known patterns for dealing with this, such as purging, banning, setting TTLs to low values to force early expiration, or having a background thread check for invalid data. Cache systems usually provide several options (although none are perfect) to deal with this issue, and the right solution depends on the individual application.
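Purge and ban are easy to see in a sketch. The class below is illustrative (modeled loosely on what HTTP caches like Varnish offer): purge removes one known key immediately, while ban invalidates every key matching a pattern, which is useful when one update affects a whole family of cached entries.

```python
import re

class InvalidatingCache:
    """Sketch of purge (one key) and ban (pattern) invalidation."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def purge(self, key):
        # Invalidate a single entry immediately.
        self.store.pop(key, None)

    def ban(self, pattern):
        # Invalidate every entry whose key matches the regex.
        regex = re.compile(pattern)
        for key in [k for k in self.store if regex.search(k)]:
            del self.store[key]
```

For example, after updating a user record, banning `^user:` drops every cached view of user data in one call.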
Eviction
The cache is typically small relative to the size of the backend, so it can fill up and run out of space for newer data. In such cases, existing data is evicted to make room. Eviction is based on a number of well known strategies, and most cache systems support at least a few of these.
Least Frequently Used (LFU) – removes the items that have been accessed the least number of times.
Least Recently Used (LRU) – removes the items that have not been accessed for the longest time. This is one of the most popular eviction strategies.
FIFO, LIFO and random selection are some of the other options. A more comprehensive list can be found on Wikipedia.
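LRU, the most popular of these strategies, is straightforward to sketch in Python using `collections.OrderedDict`, which keeps keys in insertion order and lets us move a key to the end on each access:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: when full, evict the entry that was
    accessed least recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # oldest access first

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the LRU entry
```

Production caches implement the same idea with approximations (e.g. sampled LRU) to avoid the bookkeeping cost, but the behavior is the same: the coldest entry goes first.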
In-memory databases
Although disk-based databases are much slower than in-memory caches, we continue to use them widely owing to their lower cost and durability. What if we could get rid of these restrictions? RAM is getting cheaper over time, so cost is not as much of a limiting factor now, and durability remains the single most important reason still standing in the way. In-memory databases address this issue: they persist the data and offer durability by writing to disk-based storage (or special hardware) or to replicas.
Although these in-memory databases write to disk like regular databases, they are considered in-memory since all of their reads are served from memory and the disk/replica is used only for durability. A few well known products in this space are Oracle TimesTen (which is what I used at my previous company) and VoltDB for relational databases, and Redis and CouchDB for NoSQL. Memcached is an in-memory cache but does not offer durability. Redis is currently the most popular choice and is heavily used by many web-scale applications.
This post has been a quick peek into the extensive world of caching. When designed and used correctly, caching can offer significant improvements in performance, scalability, availability and resilience. This is especially true for distributed, cloud native architectures such as microservices. The key is to analyze the data characteristics and usage patterns to design a good caching solution. I hope to write more on this in my next post.
Note that this article is part of the Microservices series. You can read the previous ones here : Prelude, Introduction, Evolution, Guiding Principles, Ubiquitous Language, Bounded Contexts, Communication Part 1, Communication Part2, Communication Part 3, Communication Part 4, Communication Part 5, Kafka, Time Sense, Containers, API Gateways, Service Mesh