Service Mesh Comparison

What is a Service Mesh?

A service mesh is a dedicated infrastructure layer that adds features to the network between services. It allows you to control traffic and gain insights throughout the system. Observability, traffic shifting (for canary releasing), resiliency features (such as circuit breaking and retry/timeout) and automatic mutual TLS can be configured once and enforced in a decentralized fashion. In contrast to libraries, which are used for similar functionality, a service mesh does not require code changes. Instead, it adds a layer of additional containers (sidecar proxies) that implement these features reliably and independently of the technology or programming language.
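
To illustrate what a service mesh takes off the application's hands, here is a minimal sketch of the retry/timeout logic each service would otherwise have to implement itself, typically through a library. The function `call_with_retries` and its parameters are hypothetical and not taken from any particular library; the exponential backoff schedule is just one common choice.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[float], T],
    attempts: int = 3,
    timeout_s: float = 1.0,
    backoff_s: float = 0.1,
) -> T:
    """Run `call(timeout_s)` up to `attempts` times with exponential backoff.

    Hypothetical helper: `call` is expected to enforce the per-attempt
    timeout it receives and to raise an exception when an attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return call(timeout_s)
        except Exception as err:  # every failure is treated as retryable here
            last_error = err
            if attempt < attempts - 1:
                # Wait backoff_s, 2 * backoff_s, 4 * backoff_s, ... between attempts.
                time.sleep(backoff_s * 2 ** attempt)
    assert last_error is not None
    raise last_error
```

Every service, in every language it is written in, would need an equivalent of this code. With a service mesh, a comparable retry and timeout policy is declared once in the mesh configuration and enforced by the proxies.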

Who needs a Service Mesh?

The value of a service mesh grows with the number of services an application consists of. Naturally, microservices architectures are the most common use case for a service mesh. However, how the services interact matters more than their number when it comes to how much a service mesh can improve their control, reliability, security, and observability. Even a monolith could benefit from a service mesh, while some microservice applications might not.


Service Mesh Implementations


Service Mesh Interface

Given the wide variety of service mesh implementations, a group of companies, including Microsoft, Buoyant (developer of Linkerd), and HashiCorp (developer of Consul), joined forces to create a common standard for service mesh features. The result, the Service Mesh Interface (SMI) specification, is meant to let tools built on service mesh features (such as Flagger for automating canary releases) work with any service mesh rather than being bound to a specific set of implementations. Service mesh users also benefit from being able to switch their service mesh implementation without changing their configuration.
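
A traffic split, one of the features the Service Mesh Interface standardizes, essentially sends each request to one of several backends with a probability proportional to configured weights. The sketch below shows only that core idea; the function `pick_backend` and the service names `shop-v1`/`shop-v2` are hypothetical, and in a mesh the proxies perform this selection based on declarative configuration rather than application code.

```python
import random
from collections import Counter
from typing import Dict, Optional


def pick_backend(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Choose a backend with probability proportional to its weight."""
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    if rng is None:
        rng = random.Random()
    backends = list(weights)
    return rng.choices(backends, weights=[weights[b] for b in backends], k=1)[0]


# Hypothetical canary release: 90 % of requests to v1, 10 % to v2.
split = {"shop-v1": 90, "shop-v2": 10}
rng = random.Random(42)  # fixed seed so the run is reproducible
counts = Counter(pick_backend(split, rng) for _ in range(10_000))
print(counts)  # roughly 9000 requests for v1 and 1000 for v2
```

A tool like Flagger automates a canary release by gradually shifting such weights towards the new version while watching metrics, and rolling back if the metrics degrade.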


How to choose a Service Mesh Implementation

While service meshes have no impact on the code, they change operations procedures and require familiarization with new concepts and technology. So, especially until the Service Mesh Interface is widely supported, adopting a service mesh implementation is a long-term decision. Therefore, the implementations should be compared and tested carefully in advance. Choosing the most flexible service mesh with the most features seems logical at first. But the opposite could be true, because features and flexibility often come at the cost of cognitive and technical complexity.

The goal of the evaluation is to figure out which features are important to you and how you benefit from them. As service meshes add latency and consume additional resources, these costs have to be measured, too.
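
The latency part of that measurement can be as simple as running the same load test once without and once with the mesh and comparing latency percentiles. The helper below is a minimal sketch that assumes you have already collected per-request latencies in milliseconds from your own load-testing tool; `latency_overhead_ms` is a hypothetical name, and resource consumption (CPU and memory of the proxies) has to be measured separately.

```python
from statistics import quantiles
from typing import Dict, Sequence


def latency_overhead_ms(
    baseline_ms: Sequence[float], with_mesh_ms: Sequence[float]
) -> Dict[str, float]:
    """Added latency at p50, p90 and p99 between two load-test runs."""
    # quantiles(..., n=100) returns the 99 cut points p1 .. p99.
    base = quantiles(baseline_ms, n=100)
    mesh = quantiles(with_mesh_ms, n=100)
    return {f"p{p}": mesh[p - 1] - base[p - 1] for p in (50, 90, 99)}
```

Tail percentiles such as p99 only become meaningful with many samples, so collect thousands of requests per run and keep the test conditions identical for both runs.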

We recommend including the following steps in your decision process:


Service Mesh Comparison

Feature | Istio | Linkerd 2 | AWS App Mesh | Consul Connect | Maesh | Kuma
Current version 1.4 2.6 1.6 1.0 0.3
License | Apache License 2.0 | Apache License 2.0 | Closed Source | Mozilla Public License 2.0 | Apache License 2.0 | Apache License 2.0
Developed by | Google, IBM, Lyft | Buoyant | AWS | HashiCorp | Containous | Kong
Service Proxy | Envoy | linkerd-proxy | Envoy | defaults to Envoy, exchangeable | Traefik | Envoy
Ingress Controller Envoy / Own Concept any any any any
Governance | see Istio Community | see Linkerd Governance and CNCF Charter | AWS | see Contributing to Consul | see Contributing notice | see Contributing notice
Tutorial | Istio Tasks | Linkerd Tasks | AWS App Mesh Getting Started | HashiCorp Learn platform | Maesh Example | Kuma Kubernetes Quickstart
Platform | Kubernetes | Kubernetes | ECS, Fargate, EKS, EC2 | Kubernetes, Nomad, VMs (Universal) | Kubernetes | Kubernetes, VMs (Universal)
Automatic Sidecar Injection | yes | yes | yes | yes | yes (per node) | yes
Used in production yes yes
Advantages | Istio can be adapted and extended like no other mesh. Its many features are available for Kubernetes and other platforms. | Linkerd 2 is designed to be non-invasive and is optimized for performance and usability. Therefore, it requires little time to adopt. | AWS App Mesh is integrated into the AWS landscape and is fully managed for you. | Consul Connect can be used in any Consul environment and therefore does not require a scheduler. The proxy can be changed and extended. | Maesh focuses on a selection of features to achieve good usability and performance. | Kuma supports both Kubernetes and plain VMs and allows you to customize the Envoy proxy.
Drawbacks | Istio's flexibility can be overwhelming for teams who don't have the capacity for more complex technology. Also, Istio takes control of the ingress controller. | Linkerd 2 is deeply integrated with Kubernetes and cannot be expanded beyond it. Since Linkerd 2 does not rely on a third-party proxy, it cannot be extended easily. | AWS App Mesh configuration cannot be migrated to an environment outside AWS. | Consul Connect can only be used in combination with Consul. | Maesh currently does not support transparent TLS encryption. | Kuma is still at an early stage, which might be a risk for production use.
Supported Protocols
TCP | yes | yes | yes | yes | yes | yes
HTTP/1.1+ | yes | yes | yes | yes | yes | yes
HTTP/2 | yes | yes | yes | yes | yes | yes
gRPC | yes | yes | yes | yes | yes | yes
Service Mesh Interface compatibility
Traffic Access Control | yes | no | no | yes | yes | no
Traffic Specs | yes | no | no | no | yes | no
Traffic Split | yes | yes | no | no | yes | no
Traffic Metrics | yes | yes | no | no | no | no
Monitoring Features
Access Log Generation | yes | no (tap feature instead) | yes | yes | yes | yes
“Golden Signal” Metrics Generation | yes | yes | yes | yes, depending on the proxy used | yes | no*
Integrated, pre-configured Prometheus | yes | yes | no | no | yes | no
Integrated, pre-configured Grafana | yes | yes | no | no | yes | no
Per-Route Metrics no yes depending on the proxy used no
Dashboard | yes, Kiali | yes | yes, AWS CloudWatch | yes, showing configuration and availability only | no | yes, showing configuration only
Compatible Tracing Backends | Jaeger, Zipkin, SolarWinds | all backends supporting OpenCensus | AWS X-Ray | Datadog, Jaeger, Zipkin, OpenTracing, Honeycomb | Jaeger | all backends supporting OpenTracing
Routing Features
Load Balancing | yes (Round Robin, Random, Least Connection) | yes (EWMA, exponentially weighted moving average) | yes | yes | yes | yes
Percentage-based Traffic Splits | yes | yes, through SMI | yes | yes | yes | yes
Header- and Path-based Traffic Splits | yes | no | yes | yes | no | no*
Resilience Features
Circuit Breaking | yes | no | no | no* | yes | no*
Retry & Timeout | yes | yes | yes | Timeout yes, Retry no* | yes | no*
Path-based Retry & Timeout | no | yes | yes | no | no | no
Fault Injection yes yes, by adding a deployment and a traffic split config no* no no*
Delay Injection yes no no* no no*
Security Features
mTLS | yes | yes, not for TCP | in preview | yes, optional integration with Vault | no | yes
Authorization Rules yes no yes no yes

*Might be possible through manual configuration/templating of the proxy.
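
The table lists EWMA (exponentially weighted moving average) as Linkerd 2's load balancing strategy. The following simplified sketch shows the basic idea behind latency-aware balancing with an EWMA: each endpoint keeps a smoothed average of its observed latencies, and the next request goes to the endpoint with the lowest average. It is an illustration under our own assumptions (the class name `EwmaBalancer`, the endpoint names, and the smoothing factor `alpha` are hypothetical), not Linkerd's actual algorithm, which is more elaborate.

```python
from typing import Dict, Iterable, Optional


class EwmaBalancer:
    """Send each request to the endpoint with the lowest latency EWMA."""

    def __init__(self, endpoints: Iterable[str], alpha: float = 0.3) -> None:
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        # None means "no latency observed yet" for that endpoint.
        self.ewma: Dict[str, Optional[float]] = {e: None for e in endpoints}
        if not self.ewma:
            raise ValueError("at least one endpoint is required")

    def pick(self) -> str:
        # Try unmeasured endpoints first so every endpoint gets a score.
        for endpoint, score in self.ewma.items():
            if score is None:
                return endpoint
        scores = {e: s for e, s in self.ewma.items() if s is not None}
        return min(scores, key=scores.__getitem__)

    def observe(self, endpoint: str, latency_ms: float) -> None:
        old = self.ewma[endpoint]
        # EWMA update: new = alpha * sample + (1 - alpha) * old
        self.ewma[endpoint] = (
            latency_ms if old is None
            else self.alpha * latency_ms + (1 - self.alpha) * old
        )


lb = EwmaBalancer(["pod-a", "pod-b"], alpha=0.5)
lb.observe("pod-a", 100.0)
lb.observe("pod-b", 10.0)
print(lb.pick())  # pod-b
```

A larger `alpha` reacts faster to latency spikes, while a smaller one smooths out noise at the cost of reacting more slowly.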

Found a mistake? Or have something to add? We appreciate your issues or pull requests on GitHub!

Alternatives to Service Meshes

Undoubtedly, the service mesh is a useful pattern, and some current implementations are very promising. But they also come with challenges such as cognitive and technical complexity. Like any tool, they are not useful in every situation. Sometimes it might be wise to keep existing, well-known "boring" technology or to go with alternative solutions.

Libraries

Libraries are included in the microservices themselves. Their drawbacks are dependencies on specific technologies and languages, potentially inconsistent implementations, and a missing separation between service infrastructure and business logic.

However, developer productivity can be higher (at least in the short term) because developers are already familiar with using libraries. Also, domain knowledge is sometimes needed, for example, to configure the fallback for a circuit breaker or to define business metrics. In these cases, a service mesh is of no use.
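
The circuit breaker fallback shows why domain knowledge can favor a library: only the application knows what a sensible substitute answer is. Below is a minimal library-style sketch, assuming a simple policy (open after a number of consecutive failures, allow a trial call after a cool-down); the class `CircuitBreaker`, its parameters, and the recommendations/bestsellers example are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; allow a trial call after `reset_s`."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_s = reset_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, action: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_s:
                return fallback()  # open: fail fast without calling the service
            # Half-open: let one trial call through; one more failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = action()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result


def fetch_recommendations() -> list:
    raise ConnectionError("recommendation service unavailable")


breaker = CircuitBreaker(max_failures=2)
# Domain decision: show bestsellers when personalized recommendations fail.
print(breaker.call(fetch_recommendations, lambda: ["bestseller-1", "bestseller-2"]))
```

The fallback (bestsellers instead of personalized recommendations) is a business decision. A proxy in a service mesh can stop sending traffic to a failing service, but it cannot come up with such a substitute answer.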

Service meshes require a change to the infrastructure, so they cannot be used if the infrastructure cannot or should not be changed. Sometimes the risk of changing the infrastructure is deemed too high, even though service meshes can be applied to specific services only.

No (synchronous) Microservices

Service meshes are particularly helpful for synchronous communication. They usually rely on HTTP to transfer additional information and, for example, to detect whether a call failed.
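
HTTP status codes are what lets a proxy recognize a failed call without knowing anything about the application. A minimal sketch of such a classification, assuming the common convention that 5xx responses count as failures and that 429 Too Many Requests may optionally be retried; the function names are hypothetical, and meshes typically make these rules configurable.

```python
from http import HTTPStatus


def is_failed_call(status: int) -> bool:
    """Treat any 5xx response as a failed call (4xx are client errors)."""
    return 500 <= status <= 599


def should_retry(status: int, retry_on_429: bool = True) -> bool:
    """Retry server errors and, optionally, 429 Too Many Requests."""
    if retry_on_429 and status == HTTPStatus.TOO_MANY_REQUESTS:
        return True
    return is_failed_call(status)


for code in (200, 404, 429, 503):
    print(code, is_failed_call(code), should_retry(code))
```

An asynchronous message that a consumer processes later carries no such status code back to the sender, which is one reason the mesh's resilience features help less there.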

One of the reasons for adopting microservices is their potential to reduce the time-to-market for software. Despite drawbacks such as high latency and tight coupling, it is common practice to implement microservice communication synchronously.

However, it is often overlooked that there are other approaches to microservice communication, or ways to avoid dependencies between services in the first place. (Read more in the free Microservices Recipes Book.) Patterns like SCS (Self-contained Systems) and asynchronous communication aim to mitigate many problems of classic, synchronously communicating microservices. Of course, you can also build asynchronous microservices with HTTP, e.g. by polling a feed for new events. As service meshes rely on HTTP, they would still be of some use there. However, features such as those for resilience are less useful, because asynchronous communication already supports resilience.
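
Asynchronous communication over HTTP can be as simple as a consumer that periodically polls the producer's event feed and remembers the last event it has processed. A minimal sketch, assuming a hypothetical `fetch_events(since_id)` callable that returns events with increasing integer ids (for instance backed by an HTTP GET on the producer's feed); the class `FeedPoller` and the event types in the comments are likewise hypothetical.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # e.g. {"id": 1, "type": "order-placed"}


class FeedPoller:
    """Consume an event feed by polling, remembering the last processed id."""

    def __init__(
        self,
        fetch_events: Callable[[int], List[Event]],
        handle: Callable[[Event], None],
    ) -> None:
        self.fetch_events = fetch_events  # hypothetical: returns events with id > since_id
        self.handle = handle
        self.last_id = 0

    def poll_once(self) -> int:
        """Handle all new events in id order and return how many were processed."""
        try:
            events = self.fetch_events(self.last_id)
        except OSError:
            # Producer unreachable: nothing is lost, the next poll catches up.
            return 0
        processed = 0
        for event in sorted(events, key=lambda e: int(e["id"])):
            if int(event["id"]) <= self.last_id:
                continue  # already handled (e.g. duplicate delivery)
            self.handle(event)
            self.last_id = int(event["id"])
            processed += 1
        return processed
```

A real consumer would call `poll_once()` in a loop with a pause between polls. If the producer is down, the consumer simply catches up later, so resilience comes from the pattern itself rather than from retries in a proxy.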

Monolithic architectures are often, unjustifiably, not even considered as a solution. Obviously, a service mesh can only help a monolith with communication to other systems, not with its internal communication.

Service Mesh Primer

Our free Service Mesh Primer explains the service mesh pattern and its features in detail and contains examples for Istio.