Chapter 1. Introducing Istio Ambient Mesh
Istio ambient mesh, originally developed by Solo.io and Google, is a new sidecar-less data plane option for the Istio service mesh. The goal of Istio ambient is to improve the operational experience of adopting, deploying, upgrading, and generally managing Istio throughout its life as critical infrastructure. Compared with Istio’s existing sidecar deployments, ambient mode also offers resource cost savings, performance improvements, and improved security, all while maintaining Istio’s core feature set of zero-trust security, resilience, observability, traffic routing, and policy enforcement.
Before we get too deep into what Istio ambient mesh is and how it works, we should understand the motivation for its creation. The current sidecar-based model for implementing mesh functionality has been battle-tested and used successfully at scale, delivering a lot of value. So, why introduce an alternative approach?
We (the creators of Istio) have always intended to make the service mesh transparent and incrementally adoptable, but in practice the sidecar approach has had drawbacks. The first drawback is in Kubernetes: the sidecar container is not a first-class citizen in a pod (i.e., the sidecar has no lifecycle or ordering controls). As a result, the workload container may become ready before the Istio sidecar proxy. If the workload tries to make an outgoing connection before the sidecar is ready, that connection fails, creating a race condition. A second problem appears with Kubernetes Job resources. A Job whose pod is injected with a sidecar may run its workload to completion, but the pod will not get cleaned up because the sidecar keeps running indefinitely.
Another drawback comes from how the sidecar interprets Layer 7 (L7) protocols like HTTP or gRPC. Not all applications use libraries that correctly implement the HTTP spec, which can lead to parsing errors or misinterpreted L7 traffic. Another example of this L7 mishandling involves server-send-first protocols such as MySQL. The Istio sidecar proxy (on both the client and server side) assumes that the client sends first, so connections using these protocols can break when the server speaks first.
The last drawback we’ll point out is that adopting Istio features is an “all-or-nothing” proposition. For example, users who wish to adopt mutual transport layer security (mTLS) for compliance reasons must inject a sidecar proxy to get mTLS, but that proxy also implements complex L7 handling (retries, traffic splitting, complex load balancing, observability collection, etc.). This co-location of features introduces the risk of unexpected behaviors (similar to the L7 mishandling described previously). Istio is a powerful service mesh with many capabilities, but without the ability to adopt features incrementally and absorb risk gradually, the adoption curve can be very steep.
Injecting a proxy and tying the application lifecycle to the infrastructure lifecycle also sacrifices transparency. When a sidecar proxy is injected, the deployment descriptors that go through continuous integration/continuous deployment (CI/CD) may not match exactly what is deployed in production. Additionally, when upgrading Istio’s data plane, applications need to be redeployed or restarted, and this churn in the cluster needs to be coordinated to avoid unplanned outages.
These challenges make it harder to adopt a service mesh at scale, cause turbulence in a rollout of mesh functionality, and introduce L7 behavior risk that may be unnecessary for the features being adopted. Istio ambient mesh aims to solve these transparency and incremental adoption problems while also introducing cost, performance, and security improvements.
Benefits of Istio Ambient Mesh
Istio ambient mesh addresses the challenges of transparency and incremental adoption by introducing a sidecar-less data plane and splitting the network’s behaviors into two separate layers. Each layer handles its own concerns, and the two can be combined to provide the full feature set of the service mesh.
Moving the data plane out of the application pod means workloads are no longer susceptible to the container race conditions caused by injecting a sidecar proxy, and Job workloads run to completion correctly. This model also keeps workloads from going around the data plane, whether by ignoring the sidecar, forgetting to inject it, or maliciously removing it.
Application onboarding and adoption are easier when you don’t need to add data plane components to the workload. For example, applications that are already running can be dynamically added to the ambient mesh by applying labels, either per workload or at the namespace level. Workloads can be removed from the mesh just as dynamically, which makes it possible to add workloads to the mesh experimentally or incrementally without disturbing the pods.
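To make the label-based enrollment concrete, here is a minimal sketch that builds the JSON merge patches (RFC 7386) for adding a namespace to the ambient mesh and removing it again. It assumes the `istio.io/dataplane-mode=ambient` namespace label used by current Istio ambient releases; check the key and value against the Istio version you run. The helpers `enroll_patch`, `unenroll_patch`, and `apply_merge_patch` are hypothetical names for illustration only; in a real cluster the Kubernetes API server applies the patch, and `apply_merge_patch` merely re-implements the RFC 7386 merge locally so you can see its effect.

```python
import json

# Namespace label that opts workloads into ambient mode. Assumed to match
# current Istio ambient releases; verify against the version you run.
AMBIENT_LABEL_KEY = "istio.io/dataplane-mode"
AMBIENT_LABEL_VALUE = "ambient"


def enroll_patch() -> str:
    """Hypothetical helper: merge patch that adds the ambient label."""
    patch = {"metadata": {"labels": {AMBIENT_LABEL_KEY: AMBIENT_LABEL_VALUE}}}
    return json.dumps(patch)


def unenroll_patch() -> str:
    """Hypothetical helper: merge patch that removes the ambient label.

    In a JSON merge patch, a null value deletes the key from the target.
    """
    patch = {"metadata": {"labels": {AMBIENT_LABEL_KEY: None}}}
    return json.dumps(patch)


def apply_merge_patch(target, patch):
    """Local RFC 7386 merge, used only to show what the patch changes.

    Returns a new object; the input target is not modified.
    """
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


if __name__ == "__main__":
    namespace = {"metadata": {"name": "default", "labels": {"team": "payments"}}}
    enrolled = apply_merge_patch(namespace, json.loads(enroll_patch()))
    print(enrolled["metadata"]["labels"])
    removed = apply_merge_patch(enrolled, json.loads(unenroll_patch()))
    print(removed["metadata"]["labels"])
```

The string returned by `enroll_patch()` could be passed to `kubectl patch namespace default --type merge -p '<patch>'`, or you can run the equivalent `kubectl label namespace default istio.io/dataplane-mode=ambient`. Either way, only the namespace’s metadata changes; the pod specs are untouched, which is why enrollment does not require redeploying the workloads.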
Upgrades become easier when no data plane component is intermingled with the application. The data plane components can be upgraded independently without restarting the workloads, or fleets of workloads, making upgrades safer.
Istio ambient splits the data plane into two layers: the secure overlay layer and the L7 waypoint proxy layer. We’ll discuss each layer in more detail in Chapter 2. This layering allows incremental adoption of mesh features. For example, users who want zero-trust networking properties may opt to use mTLS. By starting with Istio ambient mesh’s secure overlay layer, which handles only the network’s Layer 4 (L4) behaviors, such as establishing mTLS and collecting telemetry, users can adopt the features they want without introducing unnecessary risk.
We originally designed Istio ambient for ease of adoption, operations, and maintenance; however, the design of the data plane provides users with a number of additional benefits:
Better resource usage for data plane components because there are fewer proxies
Right-sized scaling of L7 proxies based on actual traffic, rather than overprovisioning
Better performance for workloads needing only mTLS because all L7 processing is bypassed
Separation of application code from the data plane for security improvements
Better support for server-send-first protocols and nonconformant HTTP implementations
What About the Sidecar?
The sidecar has been a convenient approach to building mesh functionality, and it is not going away. Even as ambient mesh matures, we see the sidecar playing a role in use cases where the client needs dedicated policy enforcement or reserved resources for performance or compliance reasons, or simply where teams prefer the familiarity of sidecars. The sidecar deployment model will continue to be a first-class citizen of the Istio data plane and can even interoperate with ambient workloads.