Service Mesh Comparison

What is a Service Mesh?

A service mesh is a dedicated infrastructure layer that adds features to the network between services. It lets you control traffic and gain insight into the whole system. Observability, traffic shifting (for canary releases), resilience features (such as circuit breaking and retries/timeouts), and automatic mutual TLS can be configured once and enforced in a decentralized fashion. In contrast to libraries that provide similar functionality, a service mesh does not require code changes. Instead, it adds a layer of additional containers that implement the features reliably and independently of technology or programming language.

Service Mesh Architecture

An image showing a comparison of a two-service microservice architecture with and without a Service Mesh.

Without a service mesh,
... each microservice implements business logic and cross-cutting concerns (CCCs) by itself.

With a service mesh,
... many CCCs like traffic metrics, routing, and encryption are moved out of the microservice and into a proxy. Business logic and business metrics stay in the microservices. Incoming and outgoing requests are transparently routed through the proxies. In addition to a layer of proxies (the data plane), a service mesh adds a so-called control plane. It distributes configuration updates to all proxies and receives the metrics collected by the proxies for further processing, e.g. by a monitoring infrastructure such as Prometheus.
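To make the difference concrete, here is a minimal Python sketch of one cross-cutting concern (retries) implemented inside a service, as it would be without a mesh. The names `call_with_retry`, `make_flaky_call`, and `UpstreamError` are hypothetical and exist only for this illustration. With a service mesh, the proxy would apply an equivalent retry policy and the service would make only the plain call.

```python
import time


class UpstreamError(Exception):
    """Raised when the (hypothetical) downstream service fails."""


def call_with_retry(call, attempts=3, backoff_seconds=0.0):
    """Run `call` up to `attempts` times and re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except UpstreamError as error:
            last_error = error
            # Linear backoff; zero by default so the sketch runs instantly.
            time.sleep(backoff_seconds * (attempt + 1))
    raise last_error


def make_flaky_call(failures_before_success):
    """Build a fake downstream call that fails a fixed number of times."""
    state = {"calls": 0}

    def flaky_inventory_call():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise UpstreamError("inventory service unavailable")
        return {"item": "book", "in_stock": True}

    return flaky_inventory_call, state


# Without a mesh: business code has to wrap every remote call itself.
call, state = make_flaky_call(failures_before_success=2)
result = call_with_retry(call, attempts=3)
```

The wrapper has to be written, configured, and maintained in every service and every language in use. This is the duplication a proxy layer is meant to remove.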

Who Needs a Service Mesh?

The value of a service mesh grows with the number of services an application consists of. Logically, microservices architectures are the most common use case for a service mesh. However, the way the services interact may matter more than their number when judging how much a service mesh can improve control, reliability, security, and observability. Even a monolith could benefit from a service mesh, and some microservice applications might not.

Service Mesh Implementations

How to Choose a Service Mesh Implementation

While service meshes have no impact on the code, they change operational procedures and require familiarity with new concepts and technology. So, especially until the Service Mesh Interface is widely supported, adopting a service mesh implementation is a long-term decision. The implementations should therefore be compared and tested carefully in advance. Choosing the most flexible service mesh with the most features seems logical at first. But the opposite could be true, because features and flexibility are often paid for with cognitive and technical complexity.

The goal of the evaluation is to figure out which features are important to you and how you benefit from them. Because service meshes add latency and consume resources, these costs have to be measured, too.
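One way to quantify the latency cost is to run the same load test with and without the mesh and compare percentiles. The following Python sketch assumes you have already collected request latencies in milliseconds. The sample values are invented for illustration and are not measurements of any mesh, and the helper names `percentile` and `mesh_overhead` are hypothetical.

```python
import math


def percentile(samples, fraction):
    """Nearest-rank percentile of `samples` for a fraction in (0, 1]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def mesh_overhead(baseline_ms, meshed_ms):
    """Latency added by the mesh at the median and at the tail."""
    return {
        name: percentile(meshed_ms, fraction) - percentile(baseline_ms, fraction)
        for name, fraction in (("p50_ms", 0.50), ("p99_ms", 0.99))
    }


# Invented example data: latencies without and with sidecar proxies.
baseline_ms = [10, 11, 12, 13, 50]
meshed_ms = [12, 13, 14, 16, 58]

overhead = mesh_overhead(baseline_ms, meshed_ms)
```

Looking at a tail percentile as well as the median matters because proxy overhead often shows up more strongly in slow requests than in typical ones.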

We recommend including the following steps in your decision process:

Service Mesh Comparison

Istio | Linkerd | AWS App Mesh | Consul | Traefik Mesh (formerly Maesh) | Kuma | Open Service Mesh (OSM)
Current version 1.11 2.10 1.10 1.4 1.3 0.9
License Apache License 2.0 Apache License 2.0 Closed Source Mozilla License Apache License 2.0 Apache License 2.0 Apache License 2.0
Initiated by Google, IBM, Lyft Buoyant AWS HashiCorp Traefik Labs Kong Microsoft
Service Proxy Envoy Linkerd2-proxy Envoy defaults to Envoy, exchangeable Traefik Proxy Envoy Envoy
Ingress Controller Envoy / Own Concept any Envoy and Ambassador in Kubernetes any any Nginx, Azure Application Gateway Ingress Controller
Governance see Istio Community and Open Usage Commons see Linkerd Governance and CNCF Charter AWS see Contributing to Consul see Contributing notice see Contributing notice, Governance, and CNCF Charter see Contributing notice and CNCF Charter
Tutorial Istio Tasks Linkerd Getting Started Guide AWS App Mesh Getting Started HashiCorp Learn platform Traefik Mesh Example Install Kuma on Kubernetes Install OSM on Kubernetes
Used in production yes yes
Advantages Istio can be adapted and extended like no other mesh. Its many features are available for Kubernetes and other platforms. Linkerd is designed to be non-invasive and is optimized for performance and usability. Therefore, it requires little time to adopt. AWS App Mesh is integrated into the AWS landscape and it is fully managed for you. Consul service mesh can be used in any Consul environment and therefore does not require a scheduler. The proxy can be changed and extended. Traefik Mesh focuses on a selection of features to achieve good usability and performance. Kuma supports both Kubernetes and VMs - including hybrid multi-zone deployments - and allows you to customize the Envoy Proxy. Open Service Mesh is driven by Microsoft and therefore expected to be well integrated with Azure. It also supports the SMI API.
Drawbacks Istio's flexibility can be overwhelming for teams who don't have the capacity for more complex technology. Also, Istio takes control of the ingress controller. Linkerd is deeply integrated with Kubernetes and does not currently support non-Kubernetes workloads. It also does not currently support data plane extensions. AWS App Mesh configuration cannot be migrated to an environment outside AWS. Consul uses its own internal storage and does not rely on Kubernetes for persistent storage. Traefik Mesh currently does not support transparent TLS encryption. Kuma is possibly the most flexible service mesh. Teams should thoroughly consider whether their project can handle the complexity involved. Open Service Mesh (OSM) is the newest service mesh implementation and simply too young to be production-ready.
Supported Protocols
TCP yes yes yes yes yes yes yes
HTTP/1.1+ yes yes yes yes yes yes yes
HTTP/2 yes yes yes yes yes yes yes
gRPC yes yes yes yes yes yes yes
Sidecar / Data Plane
Automatic Sidecar Injection yes yes yes yes yes (per Node) yes yes
CNI plugin to avoid pod network privileges yes, in beta yes yes no no yes no
Platform and Extensibility
Platform Kubernetes Kubernetes ECS, Fargate, EKS, EC2 ECS, Kubernetes, Nomad, VMs Kubernetes Kubernetes, VMs Kubernetes
Cloud Integrations Google Cloud, Alibaba Cloud, IBM Cloud DigitalOcean AWS HCP Consul on AWS, Microsoft Azure Microsoft Azure
Mesh Expansion (extension of the mesh by containers/VMs outside the cluster) yes no yes, within AWS yes no yes no
Multi-Cluster Mesh (control and observe multiple clusters) yes yes yes no yes no
Service Mesh Interface Compatibility
Traffic Access Control yes (unofficial/3rd party support) no no yes yes no yes
Traffic Specs yes (unofficial/3rd party support) no no no yes no yes
Traffic Split yes (unofficial/3rd party support) yes no no yes no yes
Traffic Metrics yes (unofficial/3rd party support) yes (unofficial/3rd party support) no no no no yes
Monitoring Features
Service Log Collection no no no, use AWS FireLens for ECS and Fargate instead no no no yes, using Fluent Bit
Access Log Generation yes no (tap feature instead) yes yes yes yes no
"Golden Signal" Metrics Generation yes yes yes yes, depending on the proxy used yes yes yes
Integrated, pre-configured Prometheus yes yes, in an extension no yes, for non-prod environments yes yes yes
Integrated, pre-configured Grafana yes yes, in an extension no no yes yes yes
Per-Route Metrics (collect values for each HTTP endpoint individually) experimental yes depending on the proxy used no no no
Dashboard yes, Kiali yes yes, AWS CloudWatch yes no yes, with a service topology map in Grafana no
Compatible Tracing-Backends Jaeger, Zipkin, SolarWinds all backends supporting OpenCensus AWS X-Ray Datadog, Jaeger, Zipkin, OpenTracing, Honeycomb Jaeger Jaeger, Zipkin Jaeger
Integrated, pre-configured Tracing-Backends yes, Jaeger or Zipkin for nonprod environments Jaeger, in an extension yes, AWS X-Ray no yes, Jaeger yes, Jaeger yes (install with flag), Jaeger
Routing Features
Load Balancing yes (Round Robin, Random, Weighted, Least Request) yes (EWMA, exponentially weighted moving average) yes yes (Round Robin, Random, Weighted, Least Request, Ring Hash, Maglev) yes yes (Round Robin, Least Request, Ring Hash, Random, Maglev) yes
Percentage-based Traffic Splits yes yes, through SMI yes yes yes, through SMI yes yes, through SMI
Header- and Path-based Traffic Splits (routing rules based on request header and path) yes planned yes yes no yes, with transformations Header-based via SMI
Resilience Features
Circuit Breaking yes no, planned for 2.12.0 yes yes yes yes no
Retry & Timeout yes yes yes yes yes yes, retry and timeout no
Path- & Method-based Retry & Timeout (different retry and timeout config for each endpoint) yes yes yes yes no no no
Fault Injection yes yes, by adding a deployment and a traffic split config no* no yes no
Delay Injection yes no no* no yes no
Security Features
mTLS yes yes, on by default yes yes, on by default no yes yes
mTLS Enforcement yes yes, 2.11 yes, via client policies yes no yes
mTLS Permissive Mode yes yes yes no yes no
mTLS by default yes, permissive mode yes, permissive mode no yes, permissive mode no no yes
External CA certificate and key pluggable e.g. Vault, cert-manager yes, CA cert pluggable and CA integration (experimental) yes yes yes, HashiCorp Vault, ACM Private CA, custom CA no yes HashiCorp Vault, cert-manager and Azure Key Vault
Service-to-Service Authorization Rules yes yes, 2.11.0 no, but support for IAM for user-authorization yes no yes yes
*Might be possible through manual configuration/templating of the proxy
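As an illustration of the "Percentage-based Traffic Splits" row in the table above, the following Python sketch shows the basic idea behind a weighted split such as a 90/10 canary release. It is a deliberately simplified, deterministic model with hypothetical names (`pick_backend`, `reviews-v1`, `reviews-v2`). Real proxies typically choose a backend at random according to the weights, and none of the listed meshes is claimed to work exactly like this.

```python
from collections import Counter


def pick_backend(request_number, weights):
    """Map a request to a backend so that each backend gets its share.

    `weights` maps backend names to integer weights, e.g. percentages.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    slot = request_number % total
    upper_bound = 0
    for backend, weight in weights.items():
        upper_bound += weight
        if slot < upper_bound:
            return backend
    raise ValueError("weights must not be negative")


# Hypothetical canary release: 10% of requests go to the new version.
weights = {"reviews-v1": 90, "reviews-v2": 10}
distribution = Counter(pick_backend(n, weights) for n in range(1000))
```

Shifting more traffic to the new version then only means changing the weights, which in a mesh is a configuration update rather than a code change.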

Found a mistake? Or have something to add? We appreciate your issues or pull requests on GitHub!


Alternatives to Service Meshes

Undoubtedly, the service mesh is a useful pattern and some current implementations are very promising. But service meshes also come with challenges such as cognitive and technical complexity. Like any tool, they are not useful in every situation. Sometimes it might be wise to keep existing well-known "boring" technology or to go with alternative solutions.


Libraries that provide similar functionality are included directly in the microservices. Their drawbacks are dependencies on specific technologies or languages, potentially inconsistent implementations, and a missing separation of service infrastructure and business logic.

However, developer productivity can be better, at least in the short term, because using libraries is familiar. Also, sometimes domain knowledge is needed, for example, to configure the fallback for a circuit breaker or to define business metrics. In these cases, a service mesh is of no use.
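The circuit breaker fallback is a good example of such domain knowledge. The Python sketch below is a deliberately reduced circuit breaker: it has no half-open state and no reset timer, and all names (`CircuitBreaker`, `fetch_recommendations`, `bestsellers_fallback`) are hypothetical. A proxy could decide to stop calling a failing service, but which substitute answer makes sense for the business is something only the application knows.

```python
class CircuitBreaker:
    """Opens after a number of consecutive failures and then serves
    the fallback without calling the failing operation again."""

    def __init__(self, failure_threshold, fallback):
        self.failure_threshold = failure_threshold
        self.fallback = fallback
        self.consecutive_failures = 0

    @property
    def is_open(self):
        return self.consecutive_failures >= self.failure_threshold

    def call(self, operation, *args):
        if self.is_open:
            return self.fallback(*args)
        try:
            result = operation(*args)
        except ConnectionError:
            self.consecutive_failures += 1
            return self.fallback(*args)
        self.consecutive_failures = 0
        return result


attempted_calls = []


def fetch_recommendations(customer_id):
    """Stand-in for a remote call to a service that is currently down."""
    attempted_calls.append(customer_id)
    raise ConnectionError("recommendation service is down")


def bestsellers_fallback(customer_id):
    """Domain decision: show bestsellers when no personal data is available."""
    return ["bestseller-1", "bestseller-2"]


breaker = CircuitBreaker(failure_threshold=2, fallback=bestsellers_fallback)
responses = [breaker.call(fetch_recommendations, "customer-42") for _ in range(3)]
```

After two failures the breaker is open, so the third request is answered from the fallback without contacting the failing service at all.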

Service meshes require a change to the infrastructure. So it is not possible to use them if the infrastructure cannot or should not be changed. Sometimes the risk of changing the infrastructure is deemed too high, even though service meshes can be applied to specific services only.

No (synchronous) Microservices

Service meshes are particularly helpful for synchronous communication. They usually rely on the HTTP protocol to transfer additional information and, for example, to detect whether a call failed.

One of the reasons for adopting microservices is their potential to reduce the time-to-market for software. Despite several drawbacks such as high latency and tight coupling, it's a common practice to implement microservice communication synchronously.

However, it is often overlooked that there are other ways to implement communication between microservices, or even to avoid dependencies in the first place. (Read more in the free Microservices Recipes Book.) Patterns like Self-contained Systems (SCS) and asynchronous communication aim to mitigate many problems of classic (synchronously communicating) microservices. Of course, you can build asynchronous microservices on HTTP, e.g. by polling a feed for new events. As service meshes rely on HTTP, they would still be of some use. However, features such as those for resilience are less useful, because asynchronous communication already supports resilience.
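Polling a feed for new events can be sketched in a few lines of Python. The example uses an in-memory class as a stand-in for the HTTP endpoint so that it runs without a network. The class names and the idea of a request like `GET /events?after=<id>` are assumptions made for this illustration, not part of any particular product.

```python
class EventFeed:
    """In-memory stand-in for an HTTP feed endpoint (hypothetical)."""

    def __init__(self):
        self._events = []

    def publish(self, payload):
        self._events.append({"id": len(self._events) + 1, "payload": payload})

    def get_events_after(self, last_seen_id):
        """What a request like GET /events?after=<id> might return."""
        return [event for event in self._events if event["id"] > last_seen_id]


class FeedPoller:
    """Consumer that remembers the last event it has processed."""

    def __init__(self, feed):
        self.feed = feed
        self.last_seen_id = 0
        self.processed = []

    def poll_once(self):
        new_events = self.feed.get_events_after(self.last_seen_id)
        for event in new_events:
            self.processed.append(event["payload"])
            self.last_seen_id = event["id"]
        return len(new_events)


feed = EventFeed()
poller = FeedPoller(feed)

feed.publish("order-created")
feed.publish("order-paid")
first_poll = poller.poll_once()   # picks up both events
second_poll = poller.poll_once()  # nothing new yet

feed.publish("order-shipped")
third_poll = poller.poll_once()   # picks up only the new event
```

Because the consumer keeps its own position in the feed, it can simply poll again later if the producer is temporarily unavailable. That is why mesh-level retries and circuit breakers add less value in this style of communication.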

Monolithic architectures are often, and unjustifiably, not even considered as a solution. A service mesh can help a monolith only with communication to other systems, not with its internal communication.

Service Mesh Primer

Our free Service Mesh Primer explains the service mesh pattern and features in detail and contains examples for Istio.