Kubernetes Autoscaling
In Kubernetes, scaling is how you adjust your application to handle more or less traffic. There are two main types: horizontal scaling and vertical scaling.
Horizontal Scaling vs Vertical Scaling
| Feature | Horizontal Scaling | Vertical Scaling |
|---|---|---|
| Method | Add/remove pods | Increase/decrease resources |
| Tool | HPA | VPA |
| Best for | Stateless apps | Stateful or legacy apps |
| Downtime | None | Possible restart |
| Limit | Cluster size | Node capacity |
Horizontal Scaling
This uses the Horizontal Pod Autoscaler (HPA) to scale the number of pods automatically.
- You increase the number of pod replicas.
- Traffic gets distributed across more pods.
The HPA measures utilization as a percentage of each container's resource requests, so make sure the containers define resource requests (and, ideally, limits).
deploy.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
```
hpa.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```
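Apply both manifests, then check the HPA status. The command below queries the web namespace, so apply the manifests to that namespace too.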
```
kubectl get hpa -n web
NAME          REFERENCE            TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   cpu: 0%/60%   2         8         2          2m7s
```
CPU utilization is at 0%, so the replica count stays at the minimum of 2. If average utilization across the pods rises above the 60% target, the HPA scales up by adding replicas, up to the maximum of 8.
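The scaling decision follows the replica formula described in the Kubernetes HPA documentation. The worked line below uses an assumed reading of 90% average CPU utilization, which is a hypothetical number and not taken from the output above.

```latex
\[
\text{desiredReplicas} =
\left\lceil \text{currentReplicas} \times
\frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Assumed example: 2 replicas averaging 90% CPU against the 60% target
\[
\left\lceil 2 \times \frac{90}{60} \right\rceil = \lceil 3 \rceil = 3
\]
```

The result is then clamped to the range between minReplicas and maxReplicas, which is why the deployment stays at 2 replicas even at 0% utilization. By default the HPA also skips scaling when the ratio of current to target value is within a small tolerance of 1.0, so a reading just above 60% may not add a pod.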
Vertical Scaling
This means increasing or decreasing resources of a single pod. You give a pod more CPU or RAM instead of adding more pods.
In-place pod resize graduated to stable in Kubernetes 1.35. We'll get to that, but first let's look at the older, immutable approach, where pods are evicted and recreated with new resource values.
When a pod's resource requests drift too far from what it actually uses, the Vertical Pod Autoscaler (VPA) will:
- calculate a new CPU/memory recommendation
- evict the running pod
- recreate the pod with the updated resources
deploy.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
```
vpa.yaml
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
```
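In Auto mode the VPA is free to choose any request values, so it can help to bound its recommendations. The sketch below extends the vpa.yaml manifest with a resourcePolicy section. The minAllowed and maxAllowed numbers are illustrative assumptions rather than values from the original example, and the manifest assumes the VPA components are installed in the cluster, since they do not ship with Kubernetes by default.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: web          # matches the container name in deploy.yaml
        controlledResources: ["cpu", "memory"]
        minAllowed:                 # assumed floor for recommendations
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:                 # assumed ceiling for recommendations
          cpu: "1"
          memory: "512Mi"
```

Keeping maxAllowed below the capacity of a single node avoids recommendations that no node could schedule, which ties back to the "Node capacity" limit in the comparison table.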
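In Kubernetes v1.35, in-place resize means a pod's CPU and memory can change without the pod being evicted. Adding resizePolicy to a container declares whether it must restart when each resource is resized; NotRequired means it keeps running. resizePolicy does not trigger a resize by itself: the new values still come from a manual resize of the pod or from a VPA that supports in-place updates. With a VPA doing the work, the flow is:
- the VPA monitors usage
- it recommends better CPU/memory values
- it applies the update to the running pod
- the pod is not evicted and recreated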
deploy.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          resizePolicy:
            - resourceName: cpu
              restartPolicy: NotRequired
            - resourceName: memory
              restartPolicy: NotRequired
```
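The resizePolicy entries only describe how the container reacts to a resize, so something still has to request the new values. One option, sketched here, is a VPA whose update mode tries an in-place resize first and falls back to eviction if the resize cannot be done. This assumes a VPA release that supports the InPlaceOrRecreate update mode and has it enabled; check the version installed in your cluster before relying on it.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    # Assumption: this mode is available in the installed VPA version.
    # It attempts an in-place resize and evicts the pod only as a fallback.
    updateMode: "InPlaceOrRecreate"
```

Compared with the "Auto" manifest shown earlier, only the updateMode value changes; the target Deployment and its resizePolicy stay as they are.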