Service Mesh Comparison

What is a Service Mesh?

A service mesh is a dedicated infrastructure layer that adds features to the network between services. It lets you control traffic and gain insights throughout the system. Observability, traffic shifting (for canary releases), resiliency features (such as circuit breaking and retry/timeout), and automatic mutual TLS can be configured once and enforced in a decentralized fashion. In contrast to libraries, which are used for similar functionality, a service mesh does not require code changes. Instead, it adds a layer of additional containers that implement the features reliably and independently of technology or programming language.

Service Mesh Architecture

Figure: A two-service microservice architecture compared with and without a Service Mesh.

Without a Service Mesh,
... each Microservice implements Business Logic and Cross Cutting Concerns (CCC) by itself.

With a Service Mesh,
... many CCCs like traffic metrics, routing, and encryption are moved out of the Microservice and into a proxy. Business logic and business metrics stay in the Microservices. Incoming and outgoing requests are transparently routed through the proxies. In addition to a layer of proxies (Data Plane), a Service Mesh adds a so-called Control Plane. It distributes configuration updates to all proxies and receives metrics collected by the proxies for further processing, e.g. by a monitoring infrastructure such as Prometheus.
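The split between Data Plane and Control Plane described above can be illustrated with a toy model. This is only a minimal sketch in Python, not how any real mesh is implemented: the names `ProxyConfig`, `SidecarProxy`, `ControlPlane`, and `make_flaky_service` are hypothetical, and a real proxy intercepts network traffic instead of wrapping a function call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProxyConfig:
    """Settings a control plane might push to every proxy (hypothetical)."""
    max_retries: int = 0


@dataclass
class SidecarProxy:
    """Toy data-plane proxy: wraps calls to one service and counts outcomes."""
    service: Callable[[str], str]
    config: ProxyConfig = field(default_factory=ProxyConfig)
    metrics: Dict[str, int] = field(
        default_factory=lambda: {"requests": 0, "failures": 0}
    )

    def handle(self, request: str) -> str:
        # Traffic metrics are recorded here, not in the business logic.
        self.metrics["requests"] += 1
        attempts = self.config.max_retries + 1
        for attempt in range(attempts):
            try:
                return self.service(request)
            except ConnectionError:
                if attempt == attempts - 1:
                    self.metrics["failures"] += 1
                    raise
        raise AssertionError("unreachable: attempts is always at least 1")


@dataclass
class ControlPlane:
    """Toy control plane: distributes config and gathers proxy metrics."""
    proxies: List[SidecarProxy] = field(default_factory=list)

    def push_config(self, config: ProxyConfig) -> None:
        for proxy in self.proxies:
            proxy.config = config

    def collect_metrics(self) -> Dict[str, int]:
        totals = {"requests": 0, "failures": 0}
        for proxy in self.proxies:
            for key, value in proxy.metrics.items():
                totals[key] += value
        return totals


def make_flaky_service() -> Callable[[str], str]:
    """Business logic stand-in whose first call fails (for illustration)."""
    calls = {"count": 0}

    def service(request: str) -> str:
        calls["count"] += 1
        if calls["count"] == 1:
            raise ConnectionError("first call fails")
        return "ok: " + request

    return service
```

In this sketch the service function contains no retry or metrics code. Enabling one retry is a single `push_config` call on the control plane, which mirrors the "configured once, enforced in a decentralized fashion" idea.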


Who needs a Service Mesh?

The value of a service mesh grows with the number of services an application consists of. Logically, microservices architectures are the most common use case for a service mesh. However, how the services actually interact may matter more than the architecture style when judging how much a service mesh can improve their control, reliability, security, and observability. Even a monolith could benefit from a service mesh, and some concrete microservice applications might not.


Service Mesh Implementations



How to choose a Service Mesh Implementation

While service meshes have no impact on the code, they change operations procedures and require familiarization with new concepts and technology. So, especially until the Service Mesh Interface is widely supported, adopting a service mesh implementation is a long-term decision. Therefore, the implementations should be compared and tested carefully in advance. Choosing the most flexible service mesh with the most features seems logical at first. But the contrary could be true, because features and flexibility are often paid for with cognitive and technical complexity.

The goal of the evaluation is to figure out which features are important to you and how you benefit from them. As service meshes impact the latency and resource consumption, these disadvantages have to be measured, too.
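One way to quantify the latency cost mentioned above is to run the same load test with and without the mesh and compare percentiles. The following is a minimal sketch under the assumption that you already have two lists of request latencies in milliseconds; the function names `percentile` and `latency_overhead` are illustrative, and the nearest-rank percentile is just one of several common definitions.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples (pct between 0 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_overhead(
    baseline_ms: Sequence[float], meshed_ms: Sequence[float]
) -> Dict[str, float]:
    """Difference (meshed minus baseline) for median and 99th percentile."""
    return {
        "median_ms": statistics.median(meshed_ms) - statistics.median(baseline_ms),
        "p99_ms": percentile(meshed_ms, 99) - percentile(baseline_ms, 99),
    }
```

Looking at a high percentile in addition to the median is a deliberate choice here: added proxies tend to matter most for the slowest requests, which an average can hide. Resource consumption (CPU and memory of the proxies) would need a separate measurement.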

We recommend including the following steps in your decision process:


Service Mesh Comparison

Istio Linkerd 2 AWS App Mesh Consul Connect Traefik Mesh (formerly Maesh) Kuma Open Service Mesh (OSM)
Current version 1.8 2.9 1.9 1.4 0.7 0.5
License Apache License 2.0 Apache License 2.0 Closed Source Mozilla License Apache License 2.0 Apache License 2.0 MIT License
Developed by Google, IBM, Lyft Buoyant AWS HashiCorp Containous Kong Microsoft
Service Proxy Envoy linkerd-proxy Envoy defaults to Envoy, exchangeable Traefik Envoy Envoy
Ingress Controller Envoy / Own Concept any Envoy and Ambassador in Kubernetes any any Nginx, Azure Application Gateway Ingress Controller
Governance see Istio Community and Open Usage Commons see Linkerd Governance and CNCF Charter AWS see Contributing to Consul see Contributing notice see Contributing notice, Governance, and CNCF Charter see Microsoft OpenSource
Tutorial Istio Tasks Linkerd Tasks AWS App Mesh Getting Started HashiCorp Learn platform Traefik Mesh Example Install Kuma on Kubernetes Install OSM on Kubernetes
Used in production yes yes
Advantages Istio can be adapted and extended like no other Mesh. Its many features are available for Kubernetes and other platforms. Linkerd 2 is designed to be non-invasive and is optimized for performance and usability. Therefore, it requires little time to adopt. AWS App Mesh is integrated into the AWS landscape and it is fully managed for you. Consul Connect can be used in any Consul environment and therefore does not require a scheduler. The proxy can be changed and extended. Traefik Mesh focuses on a selection of features to achieve good usability and performance. Kuma supports both Kubernetes and VMs - including hybrid multi-zone deployments - and allows you to customize the Envoy Proxy.
Drawbacks Istio's flexibility can be overwhelming for teams who don't have the capacity for more complex technology. Also, Istio takes control of the ingress controller. Linkerd 2 is deeply integrated with Kubernetes and cannot be expanded. Since Linkerd 2 does not rely on a third-party proxy, it cannot be extended easily. AWS App Mesh configuration cannot be migrated to an environment outside AWS. Consul Connect can only be used in combination with Consul. Traefik Mesh currently does not support transparent TLS encryption. Kuma is still in an early state. That might be a risk for production.
Supported Protocols
TCP yes yes yes yes yes yes yes
HTTP/1.1+ yes yes yes yes yes yes yes
HTTP/2 yes yes yes yes yes yes yes
gRPC yes yes yes yes yes yes yes
Sidecar / Data Plane
Automatic Sidecar Injection yes yes yes yes yes (per Node) yes yes
CNI plugin to avoid pod network privileges yes yes yes no no yes no
Platform and Extensibility
Platform Kubernetes Kubernetes ECS, Fargate, EKS, EC2 Kubernetes, Nomad, VMs (Universal) Kubernetes Kubernetes, VMs (Universal) Kubernetes
Cloud Integrations Google Cloud, Alibaba Cloud, IBM Cloud DigitalOcean AWS Microsoft Azure
Mesh Expansion (extension of the mesh by containers/VMs outside the cluster) yes no yes, within AWS yes no yes
Multi-Cluster Mesh (control and observe multiple clusters) yes yes yes no yes no
Service Mesh Interface Compatibility
Traffic Access Control yes no no yes yes no yes
Traffic Specs yes no no no yes no yes
Traffic Split yes yes no no yes no yes
Traffic Metrics yes yes no no no no yes
Monitoring Features
Access Log Generation yes no (tap-Feature instead) yes yes yes yes
"Golden Signal" Metrics Generation yes yes yes yes, depending on the proxy used yes no* yes
Integrated, pre-configured Prometheus yes yes, option to use own installation no no yes yes yes
Integrated, pre-configured Grafana yes yes no no yes yes yes
Per-Route Metrics (collect values for each HTTP endpoint individually) experimental yes depending on the proxy used
Dashboard yes, Kiali yes yes, AWS CloudWatch yes no yes, showing configuration only no
Compatible Tracing-Backends Jaeger, Zipkin, Solarwinds all Backends supporting OpenCensus AWS X-Ray Datadog, Jaeger, Zipkin, OpenTracing, Honeycomb Jaeger all Backends supporting OpenTracing Jaeger
Integrated, pre-configured Tracing-Backends yes, Jaeger or Zipkin for nonprod environments yes, Jaeger yes, AWS X-Ray no yes, Jaeger yes, Jaeger
Routing Features
Load Balancing yes (Round Robin, Random, Weighted, Least Request) yes (EWMA, exponentially weighted moving average) yes yes (Round Robin, Random, Weighted, Least Request, Consistent Hash) yes yes yes
Percentage-based Traffic Splits yes yes, through SMI yes yes yes, through SMI yes yes, through SMI
Header- and Path-based Traffic Splits (routing rules based on request header and path) yes no yes yes no no* Header-based via SMI
Resilience Features
Circuit Breaking yes no yes yes yes yes no
Retry & Timeout yes yes yes yes yes no* no
Path-based Retry & Timeout (different retry and timeout config for each endpoint) yes yes yes yes no no no
Fault Injection yes yes, by adding a deployment and a traffic split config no* no yes no
Delay Injection yes no no* no yes no
Security Features
mTLS yes yes yes yes no yes yes
External CA certificate and key pluggable e.g. Vault, cert-manager yes, CA cert pluggable and CA integration (experimental) yes yes HashiCorp Vault, ACM Private CA, custom CA no yes HashiCorp Vault, cert-manager
Authorization Rules yes no no yes no yes yes
*Might be possible through manual configuration/templating of proxy
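To make the "Percentage-based Traffic Splits" row in the table more concrete: conceptually, a proxy picks a backend version for each request at random according to configured weights. The sketch below shows that idea in Python. It is an assumption-laden illustration, not the algorithm of any listed mesh; `pick_backend` and the backend names `stable` and `canary` are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_backend(
    weights: Dict[str, int], rng: Optional[random.Random] = None
) -> str:
    """Choose one backend name with probability proportional to its weight."""
    if not weights:
        raise ValueError("at least one backend is required")
    if any(weight < 0 for weight in weights.values()):
        raise ValueError("weights must not be negative")
    if sum(weights.values()) == 0:
        raise ValueError("at least one weight must be positive")
    rng = rng or random.Random()
    names = list(weights)
    # random.choices performs the weighted draw; k=1 returns a one-item list.
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]
```

For a canary release, a call such as `pick_backend({"stable": 90, "canary": 10})` would send roughly one in ten requests to the new version. Because the choice is random per request, the observed share only approaches the configured percentage over many requests.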

Found a mistake? Or have something to add? We appreciate your issues or pull requests on GitHub!




Alternatives to Service Meshes

Undoubtedly, the service mesh is a useful pattern and some current implementations are very promising. But they also come with challenges such as cognitive and technical complexity. Like any tool, they are not useful in every situation. Sometimes it might be wise to keep existing, well-known "boring" technology or to go with alternative solutions.

Libraries

Libraries are included in the microservices. The drawbacks are dependencies on specific technologies/languages, potential inconsistency in implementations and missing separation of service infrastructure and business logic.

However, developer productivity can (at least in the short term) be better through the familiar use of libraries. Also, sometimes domain knowledge is needed, for example, to configure the fallback for a circuit breaker or to define business metrics. In these cases, a service mesh is of no use.
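The circuit breaker fallback is a good example of why domain knowledge can be required. A mesh proxy can stop forwarding calls to a failing service, but only the application knows what a sensible substitute answer is. Below is a minimal library-style sketch in Python; the class `CircuitBreaker`, its parameters, and the "cached price" fallback in the usage note are hypothetical, and production libraries offer more states and options.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after consecutive failures and serves a fallback while open."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        """True while the breaker is open or waiting for a trial call."""
        return self._opened_at is not None

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                # Open: skip the remote call and answer from the fallback.
                return fallback()
            # Reset time has passed: let one trial call through (half-open).
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller might write `breaker.call(fetch_live_price, lambda: cached_price)`, where both names stand for application code. Whether a cached price is an acceptable answer is a business decision, which is exactly the part a service mesh cannot make.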

Service meshes require a change to the infrastructure. So it is not possible to use them if the infrastructure cannot or should not be changed. Sometimes the risk of changing the infrastructure is deemed too high even though service meshes can be applied to specific services only.

No (synchronous) Microservices

Service meshes are particularly helpful for synchronous communication. They usually rely on the HTTP protocol to transfer additional information and, for example, to detect whether a call failed.

One of the reasons for adopting microservices is their potential to reduce the time-to-market for software. Despite several drawbacks such as high latency and tight coupling, it's a common practice to implement microservice communication synchronously.

However, it is often overlooked that there are other ways for microservices to communicate, or even to avoid dependencies in the first place. (Read more in the free Microservices Recipes Book.) Patterns like SCS (Self-Contained Systems) and asynchronous communication aim to mitigate many problems of classic (synchronously communicating) microservices. Of course, you can have asynchronous microservices with HTTP, e.g. by polling a feed for new events. As service meshes rely on HTTP, they would still be of some use. However, features such as those for resilience are less useful because asynchronous communication already provides resilience.
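The feed-polling approach mentioned above can be sketched as follows. This is a simplified illustration in Python with hypothetical names (`FeedPoller`, `fetch_feed`, `handle`), and it assumes that each event carries a monotonically increasing integer `id`. A real consumer would fetch the feed over HTTP and persist the last seen position.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class FeedPoller:
    """Consumes new events from a feed and remembers the last one handled."""

    def __init__(
        self,
        fetch_feed: Callable[[], List[Event]],
        handle: Callable[[Event], None],
    ) -> None:
        self._fetch_feed = fetch_feed
        self._handle = handle
        self._last_seen_id = 0

    def poll_once(self) -> int:
        """Handle all events newer than the last seen one; return their count."""
        try:
            events = self._fetch_feed()
        except ConnectionError:
            # Producer unavailable: nothing is lost, the next poll catches up.
            return 0
        new_events = sorted(
            (event for event in events if event["id"] > self._last_seen_id),
            key=lambda event: event["id"],
        )
        for event in new_events:
            self._handle(event)
            self._last_seen_id = event["id"]
        return len(new_events)
```

The sketch shows why resilience features matter less in this style: if the producer is down, the consumer simply processes the missed events on a later poll, so no retry or circuit breaker has to protect an in-flight user request.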

Unjustifiably, monolithic architectures are often not even considered as a solution. Obviously, service meshes can only help a monolith with communication to other systems but not with internal communication.

Service Mesh Primer

Our free Service Mesh Primer explains the service mesh pattern and its features in detail and contains examples for Istio.