Istio: Fault Injection, Retries and Circuit Breaker

Mark Taguiad Mar 28, 2026 · 4 min read

Continuation of Kubernetes Istio.

I mentioned in that post that I would be sticking with HTTPRoute, but the features discussed here are only supported (for now) by the Istio API.

Ingress Gateway

Istio deploys a default ingress gateway, and this example uses it (selector label istio: ingressgateway).

If you want to create custom ingress gateways, you can define them with an IstioOperator resource:

istio-gateway.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-control-plane
  namespace: istio-system
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway-prod
        namespace: istio-system
        enabled: true
        label:
          istio: ingressgateway-prod

      - name: istio-ingressgateway-dev
        namespace: istio-system
        enabled: true
        label:
          istio: ingressgateway-dev

This will create two ingress gateways:

istio-ingressgateway-prod
istio-ingressgateway-dev
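
To send traffic through one of these instead of the default, a Gateway's selector has to match the label set above. Here is a minimal sketch, assuming a hypothetical helper named render_gateway (not part of Istio) that reuses the demo-app-gateway definition from the next section and only swaps the selector label:

```shell
# Hypothetical helper: prints a Gateway manifest whose selector targets
# the ingress gateway labelled istio=<label>.
render_gateway() {
  label="$1"   # e.g. ingressgateway-prod or ingressgateway-dev
  cat <<EOF
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: demo-app-gateway
  namespace: demo
spec:
  selector:
    istio: ${label}
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
EOF
}
```

Something like `render_gateway ingressgateway-prod | kubectl apply -f -` would then apply it.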

Gateway

This Gateway attaches to the ingress gateway's LoadBalancer Service in the istio-system namespace.

demo-app-gateway.yaml

apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: demo-app-gateway
  namespace: demo
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

kubectl create -f routing/istio-api/demo-app-gateway.yaml -n demo
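
The curl commands later in this post hit the gateway's external IP. One way to look it up is sketched below, assuming the Service is named istio-ingressgateway in istio-system (adjust if your install names it differently); gateway_ip is just an illustrative helper name:

```shell
# Prints the external IP of the ingress gateway Service.
# Assumption: the Service is called istio-ingressgateway in istio-system.
gateway_ip() {
  kubectl get svc istio-ingressgateway -n istio-system \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

On clusters where the load balancer publishes a hostname instead of an IP, the jsonpath would need `.hostname` in place of `.ip`.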

Fault Injection

This is a good tool for testing the resiliency of your application, but don’t apply it in production.

Review the manifest below.

Logic:

50% chance → delayed by 2 seconds
50% chance → request is aborted (error returned)

Each fault has its own percentage, so a single request can be delayed, aborted, both, or neither.
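
To put numbers on the logic above: assuming the delay and abort percentages are sampled independently per request (my reading of how Envoy's fault filter behaves, so treat it as an assumption), the combined outcomes can be worked out like this. fault_outcomes is a hypothetical helper, not an Istio tool:

```shell
# Prints the chance of each outcome for the given delay and abort
# percentages, assuming the two faults are sampled independently.
fault_outcomes() {
  awk -v d="$1" -v a="$2" 'BEGIN {
    d /= 100; a /= 100
    printf "delay only:  %.0f%%\n", d * (1 - a) * 100
    printf "abort only:  %.0f%%\n", (1 - d) * a * 100
    printf "both:        %.0f%%\n", d * a * 100
    printf "neither:     %.0f%%\n", (1 - d) * (1 - a) * 100
  }'
}

fault_outcomes 50 50   # each of the four outcomes comes out at 25%
```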

fault_injection.yaml

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: round-robin-fault-injection
  namespace: demo
spec:
  hosts:
    - "*"
  gateways:
    - demo-app-gateway

  http:
    # /api
    - match:
        - uri:
            prefix: /api/
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 401
          percentage:
            value: 50
      route:
        - destination:
            host: backend
            port:
              number: 3000

    # /status
    - match:
        - uri:
            prefix: /status
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 500
          percentage:
            value: 50
      route:
        - destination:
            host: monitor
            port:
              number: 8000

    # /app
    - match:
        - uri:
            prefix: /app
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 500
          percentage:
            value: 50
      route:
        - destination:
            host: frontend
            port:
              number: 80

kubectl create -f security/retries_circuitbreaker_faultinjection/fault_injection.yaml -n demo

Verify

Notice that about half of the requests take roughly 2 seconds longer.

curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
0.013159
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
2.008432
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
0.005950
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
2.004389

Now let’s access it several times in a row. Some of the requests return fault filter abort.

curl http://192.168.254.220/app
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.29.7</center>
</body>
</html>

curl http://192.168.254.220/app
fault filter abort

curl http://192.168.254.220/app
fault filter abort
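
Eyeballing a handful of requests only goes so far. Here is a small sketch for tallying status codes over many requests; tally_status is a hypothetical helper name, and the URL is the same gateway address used above:

```shell
# Sends N sequential requests to URL and prints how often each
# HTTP status code came back.
tally_status() {
  url="$1"; n="$2"; i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, `tally_status http://192.168.254.220/app 20` should show a mix of 500 (injected aborts) and 301 (the normal nginx response), with the split hovering around the configured percentage.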

Retries

This rule gives a request multiple chances to succeed while making sure it does not connect to broken instances.

retries.yaml

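apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: demo-app-retries
  namespace: demo
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 15s
      maxEjectionPercent: 50
---
# Retry settings are a VirtualService field (http[].retries);
# DestinationRule trafficPolicy has no retry block.
# Without a gateways list this applies to in-mesh calls to backend; for
# gateway traffic, put the same retries block on the gateway-bound route.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: demo-app-retries
  namespace: demo
spec:
  hosts:
    - backend
  http:
    - route:
        - destination:
            host: backend
            port:
              number: 3000
      retries:
        attempts: 3           # retry 3 times
        perTryTimeout: 2s     # each try max 2 seconds
        retryOn: gateway-error,connect-failure,refused-stream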

kubectl create -f security/retries_circuitbreaker_faultinjection/retries.yaml -n demo

Allows up to 100 simultaneous TCP connections.

maxConnections: 100

Up to 50 requests can queue while waiting for a connection, and each connection handles 10 requests before it is closed.

http1MaxPendingRequests: 50
maxRequestsPerConnection: 10

This is the core of the config:

Istio retries a failed request up to 3 times
Each attempt can take at most 2 seconds

Retries happen only for these failures:

gateway-error → upstream returned 502/503/504
connect-failure → cannot connect to backend
refused-stream → upstream reset the stream with REFUSED_STREAM (HTTP/2)

In the Istio API these settings live in the retries block of a VirtualService HTTP route:

retries:
  attempts: 3
  perTryTimeout: 2s
  retryOn: gateway-error,connect-failure,refused-stream
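
It helps to know how long a caller might wait before giving up. Below is a rough upper bound, assuming attempts counts retries on top of the initial try (as the Istio API reference describes it) and ignoring the short backoff between retries; worst_case_seconds is a hypothetical helper:

```shell
# Upper bound on time spent in tries: the initial try plus `attempts`
# retries, each capped by perTryTimeout (in whole seconds).
worst_case_seconds() {
  attempts="$1"; per_try="$2"
  echo $(( (attempts + 1) * per_try ))
}

worst_case_seconds 3 2   # prints 8
```

If the route also sets an overall timeout, that would cap the total before all retries are used.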

Circuit Breaker

Now let’s add a rule that keeps the pods from being overloaded and temporarily avoids bad pods.

circuit_breaker.yaml

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: demo-app-circuit-breaker
  namespace: demo
spec:
  host: frontend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10       # max 10 TCP connections
      http:
        http1MaxPendingRequests: 5
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 2s
      baseEjectionTime: 10s
      maxEjectionPercent: 50

kubectl create -f security/retries_circuitbreaker_faultinjection/circuit_breaker.yaml -n demo

Throttles how much traffic is sent to frontend. Each client Envoy proxy may hold at most 10 active TCP connections to the frontend destination.

tcp:
  maxConnections: 10

At most 5 requests can wait in the queue for a connection. Once the queue is full, further requests are rejected with 503. Each TCP connection serves at most 2 requests before it is closed.

http:
  http1MaxPendingRequests: 5
  maxRequestsPerConnection: 2
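
To see the limits kick in, the requests have to arrive at the same time. Here is a sketch that fires a concurrent burst and tallies the status codes; burst is a hypothetical helper, and how many 503s show up depends on how fast frontend answers:

```shell
# Fires N requests at URL concurrently and prints how often each
# HTTP status code came back.
burst() {
  url="$1"; n="$2"; i=0
  {
    while [ "$i" -lt "$n" ]; do
      curl -s -o /dev/null -w '%{http_code}\n' "$url" &
      i=$((i + 1))
    done
    wait   # let every background request finish before tallying
  } | sort | uniq -c
}
```

For example, `burst http://192.168.254.220/app 50` pushes well past 10 connections plus 5 queued requests, so some 503 responses would be expected once the pool overflows.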

This part temporarily removes unhealthy frontend pods from load balancing.

A frontend pod that returns 3 consecutive 5xx errors is marked unhealthy.
Pods are evaluated for ejection every 2 seconds. This is passive analysis of real traffic, not an active health check.
An ejected pod is removed from load balancing for at least 10 seconds, and the duration grows each time the same pod is ejected again.
At most 50% of frontend pods can be ejected at once.

outlierDetection:
  consecutive5xxErrors: 3
  interval: 2s
  baseEjectionTime: 10s
  maxEjectionPercent: 50
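
How long a pod stays out depends on how often it has been ejected before. The sketch below assumes Envoy's documented behaviour of multiplying baseEjectionTime by the number of ejections, with a cap that I believe defaults to 300 seconds; ejection_seconds is a hypothetical helper:

```shell
# Ejection duration in seconds for a pod's Nth ejection.
# Assumption: duration = baseEjectionTime * ejection count, capped at
# a maximum (300s unless a third argument overrides it).
ejection_seconds() {
  base="$1"; count="$2"; cap="${3:-300}"
  t=$((base * count))
  if [ "$t" -gt "$cap" ]; then
    t="$cap"
  fi
  echo "$t"
}

ejection_seconds 10 1   # prints 10
ejection_seconds 10 3   # prints 30
```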

marktaguiad.dev