Chapter 4. Service Resiliency

Remember that your services and applications will be communicating over unreliable networks. In the past, developers often tried to use frameworks (EJBs, CORBA, RMI, etc.) to make network calls appear like local method invocations, which gave developers a false peace of mind. Without ensuring that the application actively guarded against network failures, the entire system was susceptible to cascading failures. Therefore, you should never assume that a remote dependency that your application or microservice is accessing across a network is guaranteed to respond with a valid payload within a particular timeframe, or at all. (As Douglas Adams, author of The Hitchhiker’s Guide to the Galaxy, once said, “A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.”) You do not want the misbehavior of a single service to become a catastrophic failure that hamstrings your business objectives.

Istio comes with many capabilities for implementing resilience within applications, but just as we noted earlier, the actual enforcement of these capabilities happens in the sidecar. This means that the resilience features listed here are not targeted toward any specific runtime; they’re applicable regardless of which library or framework you choose to write your service:

Client-side load balancing

Istio augments Kubernetes’ out-of-the-box load balancing.

Timeout

Wait only N seconds for a response and then give up.

Retry

If one pod returns an error (e.g., 503), retry the request against another pod.

Simple circuit breaker

Instead of overwhelming the degraded service, open the circuit and reject further requests.

Pool ejection

This provides automatic removal of error-prone pods from the load-balancing pool.

Let’s take a look at each capability with an example. Here, we use the same set of services from the previous examples.

Load Balancing

A core capability for increasing throughput and lowering latency is load balancing. A straightforward way to implement this is to have a centralized load balancer with which all clients communicate; the load balancer knows how to distribute load to the backend systems. This is a great approach, but it can become both a bottleneck and a single point of failure. Load-balancing capabilities can instead be distributed to clients with client-side load balancers. These client load balancers can use sophisticated, cluster-specific load-balancing algorithms to increase availability, lower latency, and increase overall throughput. The Istio proxy has the capability to provide client-side load balancing through the following configurable algorithms:

ROUND_ROBIN

This algorithm evenly distributes the load, in order, across the endpoints in the load-balancing pool.

RANDOM

This evenly distributes the load across the endpoints in the load-balancing pool but without any order.

LEAST_CONN

This algorithm picks two random hosts from the load-balancing pool and sends the request to whichever of the two has fewer outstanding requests. This is an implementation of weighted least-request load balancing.

In the previous chapters on routing, you saw the use of RouteRules to control how traffic is routed to specific clusters. In this chapter, we show you how to control the behavior of communicating with a particular cluster using DestinationPolicy rules. To begin, we discuss how to configure load balancing with Istio DestinationPolicy rules.

First, make sure there are no RouteRules that might interfere with how traffic is load balanced across v1 and v2 of our recommendation service. You can delete all RouteRules like this:

istioctl delete routerule --all

Next, scale the v2 deployment of the recommendation service to three replicas:

oc scale deployment recommendation-v2 --replicas=3 -n tutorial

Wait a moment for all containers to become healthy and ready for traffic. Now, send traffic to your cluster using the same script you used earlier:

#!/bin/bash
while true
do curl customer-tutorial.$(minishift ip).nip.io
sleep .1
done

You should see a round-robin-style distribution of load in the output:

customer => preference => recommendation v1 from '99634814': 1145
customer => preference => recommendation v2 from '2819441432': 1
customer => preference => recommendation v2 from '2819441432': 2
customer => preference => recommendation v2 from '2819441432': 181
customer => preference => recommendation v1 from '99634814': 1146
customer => preference => recommendation v2 from '2819441432': 3
customer => preference => recommendation v2 from '2819441432': 4
customer => preference => recommendation v2 from '2819441432': 182

Now, change the load-balancing algorithm to RANDOM. Here’s what the Istio DestinationPolicy would look like for that:

apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: recommendation-loadbalancer
  namespace: tutorial
spec:
  source:
    name: preference
  destination:
    name: recommendation
  loadBalancing:
    name: RANDOM

This destination policy configures traffic from the preference service to the recommendation service to be sent using a random load-balancing algorithm.

Let’s create this destination policy:

istioctl create -f istiofiles/recommendation_lb_policy_app.yml -n tutorial

You should now see a more random distribution when you call your service:

customer => preference => recommendation v2 from '2819441432': 10
customer => preference => recommendation v2 from '2819441432': 3
customer => preference => recommendation v2 from '2819441432': 11
customer => preference => recommendation v1 from '99634814': 1153
customer => preference => recommendation v1 from '99634814': 1154
customer => preference => recommendation v1 from '99634814': 1155
customer => preference => recommendation v2 from '2819441432': 12
customer => preference => recommendation v2 from '2819441432': 4
customer => preference => recommendation v2 from '2819441432': 5
customer => preference => recommendation v2 from '2819441432': 13
customer => preference => recommendation v2 from '2819441432': 14
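
Before cleaning up, note that if you wanted to try LEAST_CONN instead, only the loadBalancing stanza changes. The following is a sketch, not one of the tutorial’s istiofiles, and the policy name is purely illustrative:

apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  # hypothetical name; not part of the tutorial's istiofiles
  name: recommendation-leastconn
  namespace: tutorial
spec:
  source:
    name: preference
  destination:
    name: recommendation
  loadBalancing:
    name: LEAST_CONN

LEAST_CONN tends to shine when response times vary widely across endpoints, because it steers new requests away from hosts that are already busy.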

Because you’ll be creating more destination policies throughout the remainder of this chapter, now is a good time to clean up:

istioctl delete -f istiofiles/recommendation_lb_policy_app.yml \
-n tutorial

Timeout

Timeouts are crucial to making systems resilient and available. Calls to services over a network can result in lots of unpredictable behavior, but the worst behavior is latency. Did the service fail? Is it just slow? Is it not even available? Unbounded latency means any of those things could have happened. But what does your service do? Just sit around and wait? Waiting is not a good solution if there is a customer on the other end of the request. Waiting also uses resources, causes other systems to potentially wait, and is usually a contributor to cascading failures. Your network traffic should always have timeouts in place, and you can use the Istio service mesh to do this.

In your recommendation service, find the RecommendationVerticle.java class and uncomment the line that introduces a delay in the service (it is shown commented out in the following listing). Save your changes before continuing:

@Override
public void start() throws Exception {
  Router router = Router.router(vertx);
  // Uncomment this line to introduce the three-second delay
//router.get("/").handler(this::timeout);
  router.get("/").handler(this::logging);
  router.get("/").handler(this::getRecommendations);
  router.get("/misbehave").handler(this::misbehave);
  router.get("/behave").handler(this::behave);

  HealthCheckHandler hc = HealthCheckHandler.create(vertx);
  hc.register("dummy-health-check", future ->
         future.complete(Status.OK()));
  router.get("/health").handler(hc);

  vertx.createHttpServer().requestHandler(router::accept).listen(8080);
}

You can now build the service and deploy it:

cd recommendation
mvn clean package
docker build -t example/recommendation:v2 .
oc delete pod -l app=recommendation,version=v2 -n tutorial

The last step here restarts the v2 pod with the latest Docker image of your recommendation service (deleting the pod causes its deployment to re-create it). Now, if you call your customer service endpoint, you should experience the delay when the call hits the recommendation v2 service:

$  time curl customer-tutorial.$(minishift ip).nip.io

customer => preference => recommendation v2 from '751265691-qdznv': 2

real    0m3.054s
user    0m0.003s
sys     0m0.003s

Note that you might need to make the call a few times for it to route to the v2 service. The v1 version of recommendation does not have the delay.

Let’s take a look at a RouteRule that imposes a one-second timeout when making calls to the recommendation service:

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: recommendation-timeout
spec:
  destination:
    namespace: tutorial
    name: recommendation
  precedence: 1
  route:
  - labels:
      app: recommendation
  httpReqTimeout:
    simpleTimeout:
      timeout: 1s

You can now create this route rule:

istioctl create -f istiofiles/route-rule-recommendation-timeout.yml \
-n tutorial

Now when you send traffic to your customer service, you should see either a successful request (if it was routed to v1 of recommendation) or a 504 upstream request timeout error if routed to v2:

$  time curl customer-tutorial.$(minishift ip).nip.io

customer => 503 preference => 504 upstream request timeout

real    0m1.151s
user    0m0.003s
sys     0m0.003s

You can clean up by deleting this route rule:

istioctl delete routerule recommendation-timeout -n tutorial

Retry

Because you know the network is not reliable, you might experience transient, intermittent errors. This can be even more pronounced with distributed microservices that deploy rapidly, several times a week or even several times a day. The service or pod might have gone down only briefly. With Istio’s retry capability, you can make a few more attempts before having to truly deal with the error, potentially falling back to default logic. Here, we show you how to configure Istio to do this.

The first thing you need to do is simulate transient network errors. You could do this in your Java code, but you’re going to use Istio instead to inject transient HTTP 503 errors into calls to the recommendation service. We cover fault injection in more detail in Chapter 5, but for the moment, trust that installing the following route rule will introduce HTTP 503 errors:

istioctl create -f istiofiles/route-rule-recommendation-v2_503.yml \
-n tutorial
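
That file isn’t reproduced in this chapter. Assuming the v1alpha2 fault-injection schema, its contents likely resemble the following sketch; the rule name and the abort percentage are illustrative guesses based on the roughly alternating failures in the output below:

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  # illustrative name; the actual istiofile may differ
  name: recommendation-v2-503
spec:
  destination:
    namespace: tutorial
    name: recommendation
  precedence: 2
  route:
  - labels:
      version: v2
  httpFault:
    abort:
      # abort roughly half the requests with an HTTP 503
      percent: 50
      httpStatus: 503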

Now when you send traffic to the customer service, you should see intermittent 503 errors:

#!/bin/bash
while true
do
curl customer-tutorial.$(minishift ip).nip.io
sleep .1
done

customer => preference => recommendation v2 from '2036617847': 190
customer => preference => recommendation v2 from '2036617847': 191
customer => preference => recommendation v2 from '2036617847': 192
customer => 503 preference => 503 fault filter abort
customer => preference => recommendation v2 from '2036617847': 193
customer => 503 preference => 503 fault filter abort
customer => preference => recommendation v2 from '2036617847': 194
customer => 503 preference => 503 fault filter abort
customer => preference => recommendation v2 from '2036617847': 195
customer => 503 preference => 503 fault filter abort

Let’s take a look at a RouteRule that specifies your retry configuration:

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: recommendation-v2-retry
spec:
  destination:
    namespace: tutorial
    name: recommendation
  precedence: 3
  route:
  - labels:
      version: v2
  httpReqRetries:
    simpleRetry:
      perTryTimeout: 2s
      attempts: 3

This rule sets your retry attempts to 3 and uses a 2s timeout for each retry, so the retries alone can take up to six seconds (3 × 2s) on top of the timeout of the original call. (To specify an overall timeout, see the previous section on timeouts.)

Let’s create your retry rule and try the traffic again:

istioctl create -f istiofiles/route-rule-recommendation-v2_retry.yml \
-n tutorial

Now when you send traffic, you shouldn’t see any errors. This means that even though you are experiencing 503s, Istio is automatically retrying the request for you, as shown here:

customer => preference => recommendation v2 from '751265691-n65j9': 35
customer => preference => recommendation v2 from '751265691-n65j9': 36
customer => preference => recommendation v2 from '751265691-n65j9': 37
customer => preference => recommendation v2 from '751265691-n65j9': 38
customer => preference => recommendation v2 from '751265691-n65j9': 39
customer => preference => recommendation v2 from '751265691-n65j9': 40
customer => preference => recommendation v2 from '751265691-n65j9': 41
customer => preference => recommendation v2 from '751265691-n65j9': 42
customer => preference => recommendation v2 from '751265691-n65j9': 43

Now you can clean up all of the route rules you’ve installed:

oc delete routerule --all

Circuit Breaker

Much like the electrical safety mechanism in the modern home (we used to have fuse boxes, and “blew a fuse” is still part of our vernacular), the circuit breaker ensures that any specific appliance does not overdraw electrical current through a particular outlet. If you have ever lived with someone who plugged a radio, a hair dryer, and perhaps a portable heater into the same circuit, you have likely seen this in action. The overdraw of current creates a dangerous situation because it can overheat the wire, which can result in a fire. The circuit breaker opens and disconnects the electrical current flow.

Note

The concepts of the circuit breaker and bulkhead for software systems were first proposed in Michael Nygard’s book Release It!. The book was first published in 2007, long before the term microservices was even coined. A second edition of the book was released in 2018.

The circuit breaker and bulkhead patterns were popularized with the release of Netflix’s Hystrix library in 2012. Netflix libraries such as Eureka (service discovery), Ribbon (load balancing), and Hystrix (circuit breaker and bulkhead) rapidly became very popular as many folks in the industry began to focus on microservices and cloud-native architecture. Netflix OSS was built before Kubernetes and OpenShift existed, and it does have some downsides: one, it is Java-only, and two, it requires the application developer to embed and use the library correctly. Figure 4-1 provides a timeline, from when the software industry attempted to break up monolithic application development teams and massive multimonth waterfall workflows, to the birth of Netflix OSS and the coining of the term “microservices.”

Figure 4-1. Microservices timeline

Istio puts more of the resilience implementation into the infrastructure so that you can focus your valuable time and energy on code that differentiates your business from the ever-growing competitive field.

Istio implements circuit breaking at the connection pool level and at the load-balancing host level. We’ll show you examples of both.

To explore connection-pool circuit breaking, prepare by ensuring that the recommendation v2 service has the three-second delay enabled (from the previous section). The RecommendationVerticle.java file should look similar to this:

    Router router = Router.router(vertx);
    router.get("/").handler(this::logging);
    router.get("/").handler(this::timeout);
    router.get("/").handler(this::getRecommendations);
    router.get("/misbehave").handler(this::misbehave);
    router.get("/behave").handler(this::behave);

You will route traffic to both v1 and v2 of recommendation using this Istio RouteRule:

istioctl create -f \
istiofiles/route-rule-recommendation-v1_and_v2_50_50.yml -n tutorial
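
The contents of that file aren’t shown in the chapter. Assuming the weighted-route form of a v1alpha2 RouteRule, it likely looks something like the following sketch (the rule name matches the recommendation-v1-v2 rule deleted during cleanup later):

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: recommendation-v1-v2
spec:
  destination:
    namespace: tutorial
    name: recommendation
  precedence: 2
  route:
  # split traffic 50/50 between v1 and v2
  - labels:
      version: v1
    weight: 50
  - labels:
      version: v2
    weight: 50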

In the initial installation instructions, we recommended that you install the siege command-line tool. You can use it for load testing with a simple command-line interface (CLI).

We will use 20 clients sending two requests each (concurrently). Use the following command to do so:

siege -r 2 -c 20 -v customer-tutorial.$(minishift ip).nip.io

You should see output similar to this:

[Siege output: all requests successful; calls routed to v2 take three seconds or more]

All of the requests to your system were successful, but it took some time to run the test because the v2 instance or pod was a slow performer. Note that for each call to v2, it took three seconds or more to complete (this is from the delay functionality you enabled).

But suppose that in a production system this three-second delay was caused by too many concurrent requests to the same instance or pod. You don’t want multiple requests getting queued or making that instance or pod even slower. So, we’ll add a circuit breaker that will open whenever you have more than one request being handled by any instance or pod.

To create circuit breaker functionality for our services, we use an Istio DestinationPolicy that looks like this:

apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: recommendation-circuitbreaker
spec:
  destination:
    namespace: tutorial
    name: recommendation
    labels:
      version: v2
  circuitBreaker:
    simpleCb:
      maxConnections: 1
      httpMaxPendingRequests: 1
      sleepWindow: 2m
      httpDetectionInterval: 1s
      httpMaxEjectionPercent: 100
      httpConsecutiveErrors: 1
      httpMaxRequestsPerConnection: 1

Here, you’re configuring the circuit breaker for any client calling into v2 of the recommendation service. Remember that the previous RouteRule splits traffic 50/50 between v1 and v2, so this DestinationPolicy should be in effect for half the traffic. You are limiting both the number of connections and the number of pending requests to one. (We discuss the other settings in the next section, in which we look at outlier detection.) Let’s create this circuit breaker policy:

istioctl create -f istiofiles/recommendation_cb_policy_version_v2.yml \
-n tutorial

Now try the siege load generator one more time:

siege -r 2 -c 20 -v customer-tutorial.$(minishift ip).nip.io
[Siege output: almost all calls complete in under a second, some succeeding and some failing]

You can now see that almost all calls completed in less than a second, with either a success or a failure. You can try this a few times to see that the behavior is consistent. The circuit breaker will short-circuit any pending requests or connections that exceed the specified threshold (in this case, an artificially low number, 1, to demonstrate these capabilities).

You can clean up these destination policies and route rules like this:

istioctl delete routerule recommendation-v1-v2 -n tutorial
istioctl delete -f istiofiles/recommendation_cb_policy_version_v2.yml

Pool Ejection

The last of the resilience capabilities that we discuss has to do with identifying badly behaving cluster hosts and not sending any more traffic to them for a cool-off period. Because the Istio proxy is based on Envoy and Envoy calls this implementation outlier detection, we’ll use the same terminology for discussing Istio.

Pool ejection, or outlier detection, is a resilience strategy that takes place whenever you have a pool of instances or pods to serve a client request. If a request is forwarded to a certain instance and it fails (e.g., returns a 5xx error code), Istio will eject that instance from the pool for a certain sleep window. In our example, the sleep window is configured to be 15s. This increases the overall availability by making sure that only healthy pods participate in the pool of instances.

First, you need to ensure that you have a RouteRule in place. Let’s use a 50/50 split of traffic:

oc create -f istiofiles/route-rule-recommendation-v1_and_v2_50_50.yml \
-n tutorial

Next, you can scale the number of pods for the v2 deployment of recommendation so that you have some hosts in the load balancing pool with which to work:

oc scale deployment recommendation-v2 --replicas=2 -n tutorial

Wait a moment for all of the pods to get to the ready state. You can watch their progress with the following:

oc get pods -w

Now, let’s generate some simple load against the customer service:

#!/bin/bash
while true
do curl customer-tutorial.$(minishift ip).nip.io
sleep .1
done

You will see the load balanced 50/50 between the two versions of the recommendation service. And within version v2, you will also see that some requests are handled by one pod and some by the other:

customer => preference => recommendation v1 from '2039379827': 447
customer => preference => recommendation v2 from '2036617847': 26
customer => preference => recommendation v1 from '2039379827': 448
customer => preference => recommendation v2 from '2036617847': 27
customer => preference => recommendation v1 from '2039379827': 449
customer => preference => recommendation v1 from '2039379827': 450
customer => preference => recommendation v2 from '2036617847': 28
customer => preference => recommendation v1 from '2039379827': 451
customer => preference => recommendation v1 from '2039379827': 452
customer => preference => recommendation v2 from '2036617847': 29
customer => preference => recommendation v2 from '2036617847': 30
customer => preference => recommendation v2 from '2036617847': 216

To test outlier detection, you’ll want one of the pods to misbehave. Find one of them, log in to it, and instruct it to misbehave. First, list the v2 pods:

oc get pods -l app=recommendation,version=v2

You should see something like this:

recommendation-v2-2036617847         2/2       Running   0          1h
recommendation-v2-2036617847-spdrb   2/2       Running   0          7m

Now you can get into one of the pods and add some erratic behavior to it. Get one of the pod names from your system and substitute it in the following command accordingly:

oc exec -it recommendation-v2-2036617847-spdrb -c recommendation /bin/bash

You will be inside the application container of your pod recommendation-v2-2036617847-spdrb. Now execute:

curl localhost:8080/misbehave
exit

This is a special endpoint that will make our application return only 503s. Now, restart the load-generation script:

#!/bin/bash
while true
do curl customer-tutorial.$(minishift ip).nip.io
sleep .1
done

You’ll see that whenever the pod recommendation-v2-2036617847-spdrb receives a request, you get a 503 error:

customer => preference => recommendation v1 from '2039379827': 495
customer => preference => recommendation v2 from '2036617847': 248
customer => preference => recommendation v1 from '2039379827': 496
customer => preference => recommendation v1 from '2039379827': 497
customer => 503 preference => 503 recommendation misbehavior from
'2036617847-spdrb'
customer => preference => recommendation v2 from '2036617847': 249
customer => preference => recommendation v1 from '2039379827': 498
customer => 503 preference => 503 recommendation misbehavior from
'2036617847-spdrb'

Now let’s see what happens when you configure Istio to eject misbehaving hosts. Take a look at the DestinationPolicy in istiofiles/recommendation_cb_policy_pool_ejection.yml:

apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: recommendation-poolejector-v2
  namespace: tutorial
spec:
  destination:
    namespace: tutorial
    name: recommendation
    labels:
      version: v2
  loadBalancing:
    name: RANDOM
  circuitBreaker:
    simpleCb:
      httpConsecutiveErrors: 1
      sleepWindow: 15s
      httpDetectionInterval: 5s
      httpMaxEjectionPercent: 100

In this DestinationPolicy, you’re configuring Istio to check every five seconds for misbehaving hosts and to remove from the load-balancing pool any host that returns even a single consecutive error (an artificially low threshold for this example). You are willing to eject up to 100% of the hosts, effectively temporarily suspending any traffic to the cluster. Create the policy as follows:

istioctl create -f istiofiles/recommendation_cb_policy_pool_ejection.yml \
-n tutorial

Let’s put some load on the service now and see its behavior:

#!/bin/bash
while true
do curl customer-tutorial.$(minishift ip).nip.io
sleep .1
done

You will see that whenever you get a failing 503 request from the pod recommendation-v2-2036617847-spdrb, it is ejected from the pool and doesn’t receive any more requests until the sleep window expires, which takes at least 15 seconds:

customer => preference => recommendation v1 from '2039379827': 509
customer => 503 preference => 503 recommendation misbehavior from
'2036617847'
customer => preference => recommendation v1 from '2039379827': 510
customer => preference => recommendation v1 from '2039379827': 511
customer => preference => recommendation v1 from '2039379827': 512
customer => preference => recommendation v1 from '2039379827': 513
customer => preference => recommendation v1 from '2039379827': 514
customer => preference => recommendation v2 from '2036617847': 256
customer => preference => recommendation v2 from '2036617847': 257
customer => preference => recommendation v1 from '2039379827': 515
customer => preference => recommendation v2 from '2036617847': 258
customer => preference => recommendation v2 from '2036617847': 259
customer => preference => recommendation v2 from '2036617847': 260
customer => preference => recommendation v1 from '2039379827': 516
customer => preference => recommendation v1 from '2039379827': 517
customer => preference => recommendation v1 from '2039379827': 518
customer => 503 preference => 503 recommendation misbehavior from
'2036617847'
customer => preference => recommendation v1 from '2039379827': 519
customer => preference => recommendation v1 from '2039379827': 520
customer => preference => recommendation v1 from '2039379827': 521
customer => preference => recommendation v2 from '2036617847': 261
customer => preference => recommendation v2 from '2036617847': 262
customer => preference => recommendation v2 from '2036617847': 263
customer => preference => recommendation v1 from '2039379827': 522
customer => preference => recommendation v1 from '2039379827': 523
customer => preference => recommendation v2 from '2036617847': 264
customer => preference => recommendation v1 from '2039379827': 524
customer => preference => recommendation v1 from '2039379827': 525
customer => preference => recommendation v1 from '2039379827': 526
customer => preference => recommendation v1 from '2039379827': 527
customer => preference => recommendation v2 from '2036617847': 265
customer => preference => recommendation v2 from '2036617847': 266
customer => preference => recommendation v1 from '2039379827': 528
customer => preference => recommendation v2 from '2036617847': 267
customer => preference => recommendation v2 from '2036617847': 268
customer => preference => recommendation v2 from '2036617847': 269
customer => 503 preference => 503 recommendation misbehavior
from '2036617847'
customer => preference => recommendation v1 from '2039379827': 529
customer => preference => recommendation v2 from '2036617847': 270

Combination: Circuit Breaker + Pool Ejection + Retry

Even with pool ejection, your application doesn’t look that resilient, probably because you’re still letting some errors propagate to your clients. But you can improve this. If you have enough instances or versions of a specific service running in your system, you can combine multiple Istio capabilities to achieve the ultimate backend resilience:

  • Circuit Breaker to avoid multiple concurrent requests to an instance

  • Pool Ejection to remove failing instances from the pool of responding instances

  • Retries to forward the request to another instance just in case you get an open circuit breaker or pool ejection

By simply adding a retry configuration to our current RouteRule, we are able to get rid of our 503 responses completely. This means that whenever you receive a failed request from an ejected instance, Istio will forward the request to another, presumably healthy, instance:

istioctl replace -f istiofiles/route-rule-recommendation-v1_and_v2_retry.yml
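
Again, the file’s contents aren’t reproduced in the chapter. Assuming it combines the 50/50 weighted routes with the simpleRetry stanza from the Retry section, a sketch would look like this:

apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: recommendation-v1-v2
spec:
  destination:
    namespace: tutorial
    name: recommendation
  precedence: 2
  route:
  - labels:
      version: v1
    weight: 50
  - labels:
      version: v2
    weight: 50
  # retry failed requests, e.g., against an ejected or short-circuited host
  httpReqRetries:
    simpleRetry:
      perTryTimeout: 2s
      attempts: 3

Using istioctl replace (rather than create) updates the existing recommendation-v1-v2 rule in place instead of creating a second one.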

Throw some requests at the customer endpoint:

#!/bin/bash
while true
do curl customer-tutorial.$(minishift ip).nip.io
sleep .1
done

You will no longer receive 503s, although the requests served by recommendation v2 still take longer to get a response:

customer => preference => recommendation v1 from '2039379827': 538
customer => preference => recommendation v1 from '2039379827': 539
customer => preference => recommendation v1 from '2039379827': 540
customer => preference => recommendation v2 from '2036617847': 281
customer => preference => recommendation v1 from '2039379827': 541
customer => preference => recommendation v2 from '2036617847': 282
customer => preference => recommendation v1 from '2039379827': 542
customer => preference => recommendation v1 from '2039379827': 543
customer => preference => recommendation v1 from '2039379827': 544
customer => preference => recommendation v2 from '2036617847': 283
customer => preference => recommendation v2 from '2036617847': 284
customer => preference => recommendation v1 from '2039379827': 545
customer => preference => recommendation v1 from '2039379827': 546
customer => preference => recommendation v1 from '2039379827': 547
customer => preference => recommendation v2 from '2036617847': 285
customer => preference => recommendation v2 from '2036617847': 286
customer => preference => recommendation v1 from '2039379827': 548
customer => preference => recommendation v2 from '2036617847': 287
customer => preference => recommendation v2 from '2036617847': 288
customer => preference => recommendation v1 from '2039379827': 549
customer => preference => recommendation v2 from '2036617847': 289
customer => preference => recommendation v2 from '2036617847': 290
customer => preference => recommendation v2 from '2036617847': 291
customer => preference => recommendation v2 from '2036617847': 292
customer => preference => recommendation v1 from '2039379827': 550
customer => preference => recommendation v1 from '2039379827': 551
customer => preference => recommendation v1 from '2039379827': 552
customer => preference => recommendation v1 from '2039379827': 553
customer => preference => recommendation v2 from '2036617847': 293
customer => preference => recommendation v2 from '2036617847': 294
customer => preference => recommendation v1 from '2039379827': 554

Your misbehaving pod recommendation-v2-2036617847-spdrb never shows up in the console, thanks to pool ejection and retry.

Clean up:

oc scale deployment recommendation-v2 --replicas=1 -n tutorial
oc delete pod -l app=recommendation,version=v2
oc delete routerule recommendation-v1-v2 -n tutorial
istioctl delete -f istiofiles/recommendation_cb_policy_pool_ejection.yml \
-n tutorial
