Kubernetes Autoscaling

In Kubernetes, scaling is how you adjust your application to handle more or less traffic. There are two main types: horizontal scaling and vertical scaling.

Horizontal Scaling vs Vertical Scaling

Feature     Horizontal Scaling   Vertical Scaling
Method      Add/remove pods      Increase/decrease resources
Tool        HPA                  VPA
Best for    Stateless apps       Stateful or legacy apps
Downtime    None                 Possible restart
Limit       Cluster size         Node capacity

Horizontal Scaling

Horizontal scaling uses the Horizontal Pod Autoscaler (HPA) to adjust the number of pod replicas automatically.

  • You increase the number of pod replicas.
  • Traffic gets distributed across more pods.

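HPA measures CPU utilization as a percentage of each container's resource request, so the target Deployment must define resource requests. The manifest below sets both requests and limits.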

deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"

hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

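Apply both manifests to the web namespace (which must already exist), then check the HPA status.

kubectl apply -n web -f deploy.yaml -f hpa.yaml
kubectl get hpa -n web
NAME          REFERENCE            TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   cpu: 0%/60%   2         8         2          2m7s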

CPU utilization is at 0%, so the replica count stays at the minimum of 2. If average utilization across the pods rises above the 60% target, HPA adds replicas, up to the maximum of 8.
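To make the scale-up decision concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation. The function names utilization_percent and desired_replicas are hypothetical helpers written for this post, and the sketch ignores the controller's tolerance (10% by default) and stabilization windows, so treat it as an approximation and not the controller's actual code.

```shell
#!/usr/bin/env bash
# Sketch of the scaling rule from the Kubernetes HPA documentation:
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# Both function names are hypothetical helpers, not kubectl features.

# CPU utilization as a percentage of the container's CPU request (millicores).
utilization_percent() {
  local usage_m=$1 request_m=$2
  echo $(( usage_m * 100 / request_m ))
}

# Desired replica count, clamped to [minReplicas, maxReplicas].
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local desired=$(( (current * util + target - 1) / target ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# Values from hpa.yaml: target 60%, minReplicas 2, maxReplicas 8.
desired_replicas 2 0 60 2 8                                # idle pods
desired_replicas 2 "$(utilization_percent 90 100)" 60 2 8  # 90m used of a 100m request
desired_replicas 2 300 60 2 8                              # heavy load
```

Run as written, this prints 2, 3, and 8. Utilization can exceed 100% because it is measured against the 100m request, not the 500m limit.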

Vertical Scaling

This means increasing or decreasing resources of a single pod. You give a pod more CPU or RAM instead of adding more pods.

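In-place pod resize graduates to stable in Kubernetes 1.35. Let’s get into that, but let’s first demonstrate the older approach with the Vertical Pod Autoscaler (VPA), where pods are evicted and recreated to change their resources. VPA is not part of core Kubernetes and has to be installed in the cluster separately.

With updateMode set to Auto, VPA:

  • calculates a new CPU/memory recommendation from observed usage
  • evicts the running pod when its requests are too far from the recommendation
  • recreates the pod with the updated resources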

deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"

vpa.yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"

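In Kubernetes v1.35, in-place pod resize lets a pod's CPU and memory change without recreating the pod. Add a resizePolicy to each container to control whether a resize restarts it. Something still has to request the new values, either a manual resize or a VPA version that supports the InPlaceOrRecreate update mode. With VPA, the flow becomes:

  • VPA monitors usage
  • recommends better CPU/memory values
  • applies the update to the running pod
  • no pod eviction or recreation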

deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
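A resizePolicy on its own does not change any resources; it only says how the container reacts when a resize arrives. As a rough sketch of triggering one by hand, the helper below builds the JSON body for a CPU resize of the web container. resize_patch is a hypothetical function written for this post, the 200m/800m values are arbitrary examples, and the commented kubectl call assumes a kubectl version that supports the resize subresource.

```shell
#!/usr/bin/env bash
# resize_patch is a hypothetical helper that prints a JSON patch body
# setting the CPU request and limit for one container.
resize_patch() {
  local container=$1 cpu_request=$2 cpu_limit=$3
  printf '{"spec":{"containers":[{"name":"%s","resources":{"requests":{"cpu":"%s"},"limits":{"cpu":"%s"}}}]}}\n' \
    "$container" "$cpu_request" "$cpu_limit"
}

# Print the patch for the "web" container from deploy.yaml.
resize_patch web 200m 800m

# To apply it to a running pod (POD is a placeholder for a real pod name):
#   kubectl patch pod "$POD" -n web --subresource resize \
#     --patch "$(resize_patch web 200m 800m)"
```

Patching a pod this way leaves the Deployment template untouched, so any pod the Deployment creates later starts again with the values from deploy.yaml.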