Istio: Fault Injection, Retries and Circuit Breaker

Continuation of Kubernetes Istio.

I mentioned previously that I would be sticking with HTTPRoute, but the features discussed here are (for now) only supported by the Istio API.

Ingress Gateway

Istio deploys a default ingress gateway, and this example uses it. It is selected with the label istio: ingressgateway.

If you want a custom ingress gateway instead, you can define one through an IstioOperator resource.

istio-gateway.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-control-plane
  namespace: istio-system
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway-prod
        namespace: istio-system
        enabled: true
        label:
          istio: ingressgateway-prod

      - name: istio-ingressgateway-dev
        namespace: istio-system
        enabled: true
        label:
          istio: ingressgateway-dev

This creates two ingress gateways:

  • istio-ingressgateway-prod
  • istio-ingressgateway-dev
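
To send traffic through one of these instead of the default, a Gateway resource (covered in the next section) would point its selector at the matching label. A minimal sketch, assuming the prod gateway from the manifest above; the Gateway name demo-app-gateway-prod is hypothetical and not part of the original setup.

```yaml
# Sketch only: binds a Gateway to the custom "prod" ingress gateway.
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: demo-app-gateway-prod    # hypothetical name
  namespace: demo
spec:
  selector:
    istio: ingressgateway-prod   # label set on the custom gateway above
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```

The only difference from a Gateway bound to the default controller is the selector value.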

Gateway

This Gateway attaches to the ingress gateway’s LoadBalancer Service in the istio-system namespace.

demo-app-gateway.yaml

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: demo-app-gateway
  namespace: demo
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

kubectl create -f routing/istio-api/demo-app-gateway.yaml -n demo

Fault Injection

Fault injection is a good tool for testing the resiliency of your application, but don’t apply it in production.

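Review the manifest below.

Logic:

  • 50% chance → delayed by 2 seconds
  • 50% chance → request is aborted (error returned: 401 for /api/, 500 for /status and /app)

The two percentages are evaluated independently, so a single request can be delayed, aborted, both, or neither.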

fault_injection.yaml

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: round-robin-fault-injection
  namespace: demo
spec:
  hosts:
    - "*"
  gateways:
    - demo-app-gateway

  http:
    # /api
    - match:
        - uri:
            prefix: /api/
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 401
          percentage:
            value: 50
      route:
        - destination:
            host: backend
            port:
              number: 3000

    # /status
    - match:
        - uri:
            prefix: /status
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 500
          percentage:
            value: 50
      route:
        - destination:
            host: monitor
            port:
              number: 8000

    # /app
    - match:
        - uri:
            prefix: /app
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 500
          percentage:
            value: 50
      route:
        - destination:
            host: frontend
            port:
              number: 80

kubectl create -f security/retries_circuitbreaker_faultinjection/fault_injection.yaml -n demo

Verify

Notice that about half of the requests pick up a 2-second delay.

curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
0.013159
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
2.008432
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
0.005950
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
2.004389

Now let’s access it several times in a row. Some of the responses come back as fault filter abort.

curl http://192.168.254.220/app
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.29.7</center>
</body>
</html>

curl http://192.168.254.220/app
fault filter abort%

curl http://192.168.254.220/app
fault filter abort%
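
The faults above hit every request that matches a prefix. One way to keep such a test contained is to add a header match so that only tagged requests are affected. This is a minimal sketch rather than part of the original setup: the header x-fault-test and the resource name are hypothetical, and it assumes this VirtualService is applied in place of the one above, since both bind "*" on the same gateway.

```yaml
# Sketch only: fault injection limited to requests carrying a test header.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: header-scoped-fault-injection   # hypothetical name
  namespace: demo
spec:
  hosts:
    - "*"
  gateways:
    - demo-app-gateway
  http:
    # Requests under /app that carry the (hypothetical) header get the delay.
    - match:
        - uri:
            prefix: /app
          headers:
            x-fault-test:
              exact: "true"
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
      route:
        - destination:
            host: frontend
            port:
              number: 80
    # All other /app requests are routed without any fault.
    - match:
        - uri:
            prefix: /app
      route:
        - destination:
            host: frontend
            port:
              number: 80
```

With this in place, curl -H "x-fault-test: true" http://192.168.254.220/app should be the only kind of request that can see the delay, because routes are evaluated in order and the first match wins.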

Retries

This rule gives a request multiple chances to succeed while making sure it is not sent to broken instances.

retries.yaml

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: demo-app-retries
  namespace: demo
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 15s
      maxEjectionPercent: 50
    # Caution: the Istio API reference documents retry settings (attempts,
    # perTryTimeout, retryOn) under VirtualService http[].retries rather than
    # DestinationRule trafficPolicy; verify your Istio version accepts this block.
    retry:
      attempts: 3           # retry 3 times
      perTryTimeout: 2s     # each try max 2 seconds
      retryOn: gateway-error,connect-failure,refused-stream

kubectl create -f security/retries_circuitbreaker_faultinjection/retries.yaml -n demo

Allows up to 100 simultaneous TCP connections.

maxConnections: 100

Up to 50 requests can queue while waiting for a connection, and each connection handles at most 10 requests before it is closed.

http1MaxPendingRequests: 50
maxRequestsPerConnection: 10

This is the core of the config:

  • Istio retries a failed request up to 3 times
  • Each attempt can take at most 2 seconds

Retries happen only for these failures:

  • gateway-error → upstream returned 502/503/504
  • connect-failure → cannot connect to backend
  • refused-stream → upstream reset the stream with a REFUSED_STREAM error (HTTP/2)

retry:
  attempts: 3
  perTryTimeout: 2s
  retryOn: gateway-error,connect-failure,refused-stream
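
As far as I can tell from the Istio API reference, these three settings belong to the retries field of a VirtualService HTTP route, so the same policy could be expressed on the route itself. The following is a minimal sketch under that assumption, not the manifest used in this post: the resource name is hypothetical, and it assumes the fault injection VirtualService has been removed first, since both would bind "*" on the same gateway.

```yaml
# Sketch only: the retry policy expressed on a VirtualService route.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: demo-app-retries-route   # hypothetical name
  namespace: demo
spec:
  hosts:
    - "*"
  gateways:
    - demo-app-gateway
  http:
    - match:
        - uri:
            prefix: /api/
      route:
        - destination:
            host: backend
            port:
              number: 3000
      retries:
        attempts: 3              # retry up to 3 times
        perTryTimeout: 2s        # each attempt may take at most 2 seconds
        retryOn: gateway-error,connect-failure,refused-stream
```

The connectionPool and outlierDetection settings would stay in the DestinationRule, which is where ejected instances get excluded from the retried attempts.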

Circuit Breaker

Now let’s add a rule that keeps the pods from being overloaded and temporarily avoids bad pods.

circuit_breaker.yaml

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: demo-app-circuit-breaker
  namespace: demo
spec:
  host: frontend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10       # max 10 TCP connections
      http:
        http1MaxPendingRequests: 5
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 2s
      baseEjectionTime: 10s
      maxEjectionPercent: 50

kubectl create -f security/retries_circuitbreaker_faultinjection/circuit_breaker.yaml -n demo

Limits how much traffic is sent to frontend. Each client Envoy proxy may hold at most 10 active TCP connections to the frontend host.

tcp:
  maxConnections: 10

At most 5 requests can queue while waiting for a connection. Anything beyond that is rejected with a 503. Each TCP connection serves at most 2 requests before it is closed.

http:
  http1MaxPendingRequests: 5
  maxRequestsPerConnection: 2

This part removes unhealthy frontend pods automatically.

  • If a frontend pod returns 3 errors in a row (5xx), it is marked unhealthy.
  • Pods are evaluated every 2 seconds. This is passive detection based on live traffic, not an active health check.
  • A bad pod is removed from load balancing for at least 10 seconds, and the ejection time grows each time the same pod is ejected again.
  • At most 50% of frontend pods can be ejected at once.

outlierDetection:
  consecutive5xxErrors: 3
  interval: 2s
  baseEjectionTime: 10s
  maxEjectionPercent: 50