Istio: Fault Injection, Retries and Circuit Breaker
Continuation of Kubernetes Istio.
I mentioned in this post that I would stick with HTTPRoute, but the features discussed here are (for now) only supported by the Istio API.
Ingress Gateway
Istio deploys a default ingress gateway, and this example uses it (its pods carry the label istio: ingressgateway).
If you want to create custom ingress gateways instead, you can define them in an IstioOperator resource:
istio-gateway.yaml
```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-control-plane
  namespace: istio-system
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway-prod
        namespace: istio-system
        enabled: true
        label:
          istio: ingressgateway-prod

      - name: istio-ingressgateway-dev
        namespace: istio-system
        enabled: true
        label:
          istio: ingressgateway-dev
```
This creates two ingress gateways:
- istio-ingressgateway-prod
- istio-ingressgateway-dev
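A Gateway resource picks the ingress gateway it runs on through its selector, so serving traffic on one of these custom gateways means matching the label set above. A minimal sketch for the prod gateway, assuming a hypothetical Gateway named demo-app-gateway-prod in the demo namespace; the server section mirrors the default-gateway example that follows:

```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: demo-app-gateway-prod   # hypothetical name for this sketch
  namespace: demo
spec:
  selector:
    istio: ingressgateway-prod  # matches the label of istio-ingressgateway-prod
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```

A VirtualService would then reference demo-app-gateway-prod in its gateways list to receive traffic from the prod gateway only.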
Gateway
This Gateway attaches, through its selector, to the default ingress gateway (exposed as a LoadBalancer service in the istio-system namespace).
demo-app-gateway.yaml
```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: demo-app-gateway
  namespace: demo
spec:
  selector:
    istio: ingressgateway # select Istio's default ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```

```shell
kubectl create -f routing/istio-api/demo-app-gateway.yaml -n demo
```
Fault Injection
Fault injection is a good tool for testing your application's resiliency, but don't apply it in production.
Review the manifest below.
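Logic for each route:
- 50% chance → the request is delayed by 2 seconds
- 50% chance → the request is aborted (401 for /api/, 500 for /status and /app)
The two percentages are evaluated independently, so a single request can be delayed, aborted, both, or neither.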
fault_injection.yaml
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: round-robin-fault-injection
  namespace: demo
spec:
  hosts:
    - "*"
  gateways:
    - demo-app-gateway

  http:
    # /api
    - match:
        - uri:
            prefix: /api/
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 401
          percentage:
            value: 50
      route:
        - destination:
            host: backend
            port:
              number: 3000

    # /status
    - match:
        - uri:
            prefix: /status
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 500
          percentage:
            value: 50
      route:
        - destination:
            host: monitor
            port:
              number: 8000

    # /app
    - match:
        - uri:
            prefix: /app
      fault:
        delay:
          fixedDelay: 2s
          percentage:
            value: 50
        abort:
          httpStatus: 500
          percentage:
            value: 50
      route:
        - destination:
            host: frontend
            port:
              number: 80
```

```shell
kubectl create -f security/retries_circuitbreaker_faultinjection/fault_injection.yaml -n demo
```
Verify
Notice that some requests take about 2 seconds: those are the ones that got the injected delay.
```
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
0.013159
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
2.008432
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
0.005950
curl -s -o /dev/null -w "%{time_total}\n" http://192.168.254.220/app
2.004389
```
Now access it a few times in a row: some requests go through normally (here nginx answers with a 301 redirect), while others are rejected with fault filter abort.
```
curl http://192.168.254.220/app
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.29.7</center>
</body>
</html>

curl http://192.168.254.220/app
fault filter abort

curl http://192.168.254.220/app
fault filter abort
```
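Since the fault applies to every request that matches the route, one way to limit its blast radius while testing is to add a header condition to the match, so only requests that carry the header are affected. A sketch of the /app routes, assuming a hypothetical x-fault-test header; conditions inside one match entry must all hold, and routes are evaluated in order, so requests without the header fall through to the second, fault-free route:

```yaml
# Fragment of spec.http in the gateway-bound VirtualService.
# x-fault-test is a hypothetical header name chosen for this sketch.
http:
  # /app requests that carry "x-fault-test: true" get the injected faults
  - match:
      - uri:
          prefix: /app
        headers:
          x-fault-test:
            exact: "true"
    fault:
      delay:
        fixedDelay: 2s
        percentage:
          value: 50
      abort:
        httpStatus: 500
        percentage:
          value: 50
    route:
      - destination:
          host: frontend
          port:
            number: 80
  # every other /app request is routed normally, without faults
  - match:
      - uri:
          prefix: /app
    route:
      - destination:
          host: frontend
          port:
            number: 80
```

With this in place, a test client would send curl -H "x-fault-test: true" http://192.168.254.220/app to hit the faults, while regular requests are left alone.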
Retries
This rule gives each request multiple chances to succeed while making sure it is not sent to broken instances.
retries.yaml
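```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: demo-app-retries
  namespace: demo
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 15s
      maxEjectionPercent: 50
---
# Retries are not a DestinationRule field; Istio sets them per HTTP route
# in a VirtualService. This one covers in-mesh traffic to backend.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: demo-app-retries
  namespace: demo
spec:
  hosts:
    - backend
  http:
    - route:
        - destination:
            host: backend
            port:
              number: 3000
      retries:
        attempts: 3           # retry up to 3 times
        perTryTimeout: 2s     # each try gets at most 2 seconds
        retryOn: gateway-error,connect-failure,refused-stream
```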
```shell
kubectl create -f security/retries_circuitbreaker_faultinjection/retries.yaml -n demo
```
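Each Envoy proxy opens at most 100 simultaneous TCP connections to backend (across all backend pods).
```yaml
maxConnections: 100
```
Up to 50 requests can wait in the queue for a connection, and each connection serves at most 10 requests before it is closed (new connections are opened as needed).
```yaml
http1MaxPendingRequests: 50
maxRequestsPerConnection: 10
```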
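This is the core of the retry config. Note that retries are not a DestinationRule setting: Istio configures them per HTTP route in a VirtualService, under the retries field.
- Istio retries a failed request up to 3 times (up to 4 attempts in total, counting the original)
- Each attempt can take at most 2 seconds
Retries happen only for these failures:
- gateway-error → upstream returned 502, 503 or 504
- connect-failure → the connection to the backend could not be established (including connect timeouts)
- refused-stream → the upstream reset the stream with a REFUSED_STREAM error (HTTP/2)
```yaml
retries:
  attempts: 3
  perTryTimeout: 2s
  retryOn: gateway-error,connect-failure,refused-stream
```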
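A retry policy only applies to traffic handled by the VirtualService it lives in. Requests to /api/ that enter through demo-app-gateway are routed by the gateway-bound VirtualService from the fault injection section, so one way to retry them is to put the retries block on that route. A sketch of that route, shown without its fault block; the 10s timeout is a hypothetical overall budget:

```yaml
# Fragment: the /api/ route of the gateway-bound VirtualService
# (round-robin-fault-injection), with a retry policy added.
http:
  - match:
      - uri:
          prefix: /api/
    route:
      - destination:
          host: backend
          port:
            number: 3000
    timeout: 10s          # hypothetical overall budget, covering every attempt
    retries:
      attempts: 3         # up to 3 retries after the first attempt
      perTryTimeout: 2s   # 4 attempts x 2s = 8s, within the 10s timeout
      retryOn: gateway-error,connect-failure,refused-stream
```

The route timeout bounds the whole request including retries, so it should leave room for all attempts; otherwise the later retries never get a chance to run.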
Circuit Breaker
Now let's add a rule that keeps the frontend pods from being overloaded and temporarily stops sending traffic to bad pods.
circuit_breaker.yaml
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: demo-app-circuit-breaker
  namespace: demo
spec:
  host: frontend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10 # max 10 TCP connections
      http:
        http1MaxPendingRequests: 5
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 2s
      baseEjectionTime: 10s
      maxEjectionPercent: 50
```
```shell
kubectl create -f security/retries_circuitbreaker_faultinjection/circuit_breaker.yaml -n demo
```
This throttles how much traffic each client-side Envoy proxy sends to frontend: at most 10 active TCP connections from each proxy. The limit covers all frontend pods together, not each pod separately.
```yaml
tcp:
  maxConnections: 10
```
At most 5 requests can queue while waiting for a connection; beyond that, new requests are rejected immediately with a 503. Each TCP connection serves at most 2 requests before it is closed.
```yaml
http:
  http1MaxPendingRequests: 5
  maxRequestsPerConnection: 2
```
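Besides the pending queue and the per-connection limit, connectionPool.http has two more circuit-breaking limits that fit this setup: http2MaxRequests caps the number of active requests to the destination (despite the name, Istio applies it to HTTP/1.1 as well), and maxRetries caps how many retries can be outstanding at the same time. A sketch with hypothetical values for both:

```yaml
# Fragment of trafficPolicy in the frontend DestinationRule;
# the http2MaxRequests and maxRetries values are hypothetical.
connectionPool:
  tcp:
    maxConnections: 10
  http:
    http1MaxPendingRequests: 5
    maxRequestsPerConnection: 2
    http2MaxRequests: 20   # max active requests to frontend (HTTP/1.1 and HTTP/2)
    maxRetries: 3          # max retries outstanding across all frontend pods at once
```

The maxRetries limit keeps a burst of retries, such as those from the retry policy above, from piling extra load onto a service that is already failing.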
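This part removes unhealthy frontend pods from load balancing automatically. It is passive: it watches real responses instead of sending health-check probes.
- If a frontend pod returns 3 consecutive 5xx errors, it is ejected.
- The ejection analysis runs every 2 seconds.
- An ejected pod is removed from load balancing for 10 seconds; the time grows with each repeated ejection (10s multiplied by the number of times it has been ejected).
- At most 50% of frontend pods can be ejected at once.
```yaml
outlierDetection:
  consecutive5xxErrors: 3
  interval: 2s
  baseEjectionTime: 10s
  maxEjectionPercent: 50
```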
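If only gateway-style failures should count toward ejection, outlierDetection also has a consecutiveGatewayErrors field, which counts 502, 503 and 504 responses. Per the Istio reference, consecutive5xxErrors defaults to 5 and is disabled by setting it to 0, so a sketch that ejects only on gateway errors turns it off explicitly:

```yaml
# Fragment of trafficPolicy in the frontend DestinationRule:
# eject a pod only after 3 consecutive 502/503/504 responses.
outlierDetection:
  consecutive5xxErrors: 0        # disable the generic 5xx counter
  consecutiveGatewayErrors: 3    # count only 502, 503 and 504
  interval: 2s
  baseEjectionTime: 10s
  maxEjectionPercent: 50
```

This keeps a pod in rotation when it returns application-level 500s, and ejects it only when it looks unreachable or overloaded.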