Are service meshes going to be a ‘thing’ this year in the microservices ecosystem? The tides are running in their favor, and at least some seem to think that this is the year they gain prominence as a must-have in the arsenal of tools required to deal with the complexities of microservices. Will we start hearing of micromeshes – a set of microservices with a service mesh? Micromesh – you heard it here first :-). Given that it is such a nascent field, let’s dig in and see what the buzz is all about.
What is a service mesh?
“A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It’s responsible for the reliable delivery of requests through the complex topology of services that comprise a modern, cloud native application. In practice, the service mesh is typically implemented as an array of lightweight network proxies that are deployed alongside application code, without the application needing to be aware.”
This is the definition from Buoyant, the makers of two of the well-known meshes, Linkerd and Conduit. A little-known fact: the term “service mesh” is attributed to William Morgan, the founder of Buoyant.
The service mesh is not just another infrastructure layer. Its distinguishing factor is that it forms a distributed network of interconnected proxies, typically deployed as sidecars alongside the services. The services themselves are agnostic about the proxies; all inter-service communication is routed through them.
So it is an overlay network – a network of proxies on top of the network of microservices. Hmm… as if we didn’t have enough moving parts with microservices, now let’s just double that number. A fair enough concern, so let’s look at the why of a service mesh.
The Backdrop to a Service Mesh
“Smart endpoints and dumb pipes” is one of the core philosophies adopted by a microservices architecture when it comes to integrating services. This philosophy stems from the backlash against the bloated, centralized Enterprise Service Bus (ESB) days of yore. It is of course a step in the right direction, but there are issues in making the endpoints smart – how smart is too smart, and how thick or thin should the services be? Given that in a microservices system the complexity has been pushed into the interactions between services, how and where should we handle this complexity? Should it be inside the individual services themselves, or somewhere outside of them? If the answer is inside, then the question to consider is whether it truly belongs with the service’s business logic or is more of an infrastructure concern. Remember that we want to keep our services clean – focused purely on business functions (the Single Responsibility Principle, the Unix philosophy, etc.). If the answer is outside, then we have to be careful not to fall back into the trap of the likes of an ESB.
So what are some of the concerns that need to be addressed to handle the communication between services?
- Service discovery
- Load balancing
- Traffic control (rate limits, throttling, back pressure etc.)
- Reliability of communication (delivery guarantees, ordering guarantees, retries, QoS etc.)
- Resiliency (circuit breakers, timeouts, health checks etc.)
- Security (authentication/authorization, encryption etc.)
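To make these concerns concrete, here is a toy sketch (in Python, with illustrative names – this is not any real mesh’s or library’s API) of the kind of circuit-breaker logic that, without a mesh, every service would have to implement for itself:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: after max_failures consecutive failures,
    reject calls outright ("fail fast") for reset_timeout seconds."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the circuit
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Multiply this by every concern in the list above, and by every service in the system, and the appeal of moving it into a shared infrastructure layer becomes obvious.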
To put this in perspective, these are not entirely new issues; they are inherent in any distributed system, which has to contend with the fallacies of distributed computing. The difference here is that the issues are magnified by orders of magnitude because of the number of microservices involved.
One option that some have adopted is an API gateway. In my previous post, I touched on this area – the API gateway handles some of the smarts required for the interactions between the outside world and the services, such as service discovery, routing, monitoring, traffic control etc. The API gateway, however, has several drawbacks in addressing the needs listed above. For starters, it can become a single point of failure (SPOF) and bloated with functionality – veering dangerously into ESB territory. Second, API gateways are traditionally and practically designed to handle client-facing traffic. So even though a gateway can solve some of these issues for traffic from external clients to microservices, it does not solve them for traffic between the microservices themselves. We need a complete solution, or at least something complementary to the API gateway.
As another option, an argument could be made that some of the reliability, monitoring, traffic control, etc. could be handled at the lower layers (L3/L4) of the network stack. This is indeed true. However, these lower-level primitives fall far short of dealing with issues that are of concern only at the application layer. Think of the end-to-end argument here: it makes the case that for most of the functions listed above, the application layer is the only layer where they make sense semantically, and thus the only layer where they can be implemented successfully.
The early adopters of SOA/microservices, like Netflix and Twitter, handled these issues by building libraries in-house (Netflix’s Hystrix, Twitter’s Finagle) that were then used by all of their services. The problem with this approach is that it is hard to scale libraries across potentially hundreds or thousands of microservices, and, just as importantly, such libraries are brittle in the sense that a library bound to one language or framework cannot easily be adapted to the technology stack of your choice if there is a mismatch.
The service mesh really is akin to these libraries, except the smarts live in independent processes running adjacent to the services. The services connect to their local proxies, which in turn talk to other proxies (mostly over HTTP/1.1, HTTP/2 or gRPC). The fact that they are independent processes, distributed, and running at or just below the application layer solves the issues that plague the other approaches above.
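A toy sketch of the sidecar idea (again in Python, with made-up names – no real proxy works at this level of simplicity): the service hands its local proxy a logical destination, and the proxy transparently takes care of service discovery, load balancing, and retries.

```python
import random

class SidecarProxy:
    """Toy model of a sidecar proxy: the service addresses peers by
    logical name; the proxy resolves endpoints, load-balances, retries."""

    def __init__(self, registry, max_retries=2):
        self.registry = registry        # {service_name: [endpoints]} from discovery
        self.max_retries = max_retries

    def request(self, service, payload, transport):
        last_error = None
        for _ in range(self.max_retries + 1):
            # Naive random load balancing across known instances.
            endpoint = random.choice(self.registry[service])
            try:
                return transport(endpoint, payload)
            except ConnectionError as e:
                last_error = e              # transient failure: retry
        raise last_error
```

The key point is what is *absent* from the service’s side of this interaction: the service never sees endpoints, retries, or balancing decisions – exactly the transparency the definition above calls for.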
Service Mesh Architecture
The service mesh consists of a data plane, where all service-to-service communication occurs through the sidecar proxies. (The interconnections of all the proxies make a mesh – hence the name.) The mesh also consists of a control plane, which ties all the independent sidecar proxies into a distributed network and sets the policies that are enacted by the data plane.
The control plane is where the various policies for service discovery, routing, traffic control etc. are defined. They can vary in scope from global down to narrow slices of the mesh. The data plane is responsible for applying and enforcing these policies as the services communicate.
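The split can be sketched as follows – a toy model only, with invented names, not how any actual control plane is implemented: the control plane holds declarative policy and pushes it to every proxy, and each proxy enforces whatever it was last given.

```python
class ControlPlane:
    """Toy control plane: holds declarative policy and pushes a copy
    to every registered data-plane proxy whenever it changes."""

    def __init__(self):
        self.proxies = []
        self.policy = {}

    def register(self, proxy):
        self.proxies.append(proxy)
        proxy.policy = dict(self.policy)   # new proxy gets current policy

    def set_policy(self, **rules):         # e.g. deny lists, timeouts
        self.policy.update(rules)
        for proxy in self.proxies:         # push to the data plane
            proxy.policy = dict(self.policy)

class DataPlaneProxy:
    """Toy data-plane proxy: enforces the last policy it was pushed."""

    def __init__(self):
        self.policy = {}

    def allow(self, route):
        return route not in self.policy.get("deny_routes", set())
```

Note the direction of flow: policies are authored once, centrally, but enforced at every hop by the proxies – no per-request round trip to the control plane.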
Service meshes are really a product of their time – containers and container orchestrators like Docker and Kubernetes are both a reason for needing service meshes and an enabler of them, thanks to the ease with which these complex systems can now be deployed and managed.
The latest entrant in the service mesh space, Conduit from Buoyant, is built specifically for Kubernetes. Linkerd is the older service mesh from the same company, and works with other frameworks as well. There is also the Istio platform, deployed with Envoy proxy sidecars.
It will be interesting to see how this evolves in the near future. Ideally I would love to see the different platforms converge on a common standard or protocol, analogous to what happened with TCP/IP. Ideally being the key word there. Regardless of which platform one chooses, the value a service mesh adds to a microservices ecosystem is undeniable: it lets microservices delegate the undifferentiated heavy lifting of managing service-interaction complexities neatly to the mesh. Service mesh adoption case studies are slowly popping up, but I haven’t had the chance to look them up yet. I’ll post follow-ups here when I do.
Until then, the links below are all interesting reads on service meshes.
Note that this article is part of the Microservices series. You can read the previous ones here : Prelude, Introduction, Evolution, Guiding Principles, Ubiquitous Language, Bounded Contexts, Communication Part 1, Communication Part2, Communication Part 3, Communication Part 4, Communication Part 5, Kafka, Time Sense, Containers, API Gateways
The blog post by Phil Calcado is a good read.