Cilium ClusterMesh: Connecting Kubernetes Clusters
Cilium is a Kubernetes CNI built on eBPF, replacing the traditional iptables-heavy networking model with kernel-level packet processing. Instead of relying on large iptables chains for routing, filtering, and service load balancing, Cilium injects eBPF programs directly into the Linux kernel datapath for lower latency and better scalability.
Explaining eBPF properly would take a whole book, so we will not dwell on it here. Let's focus first on connecting two Kubernetes clusters.
Why Cilium?
Cilium’s main architectural advantage is that it shifts networking and security enforcement from userspace and iptables into the kernel:
- eBPF dataplane: packet filtering, forwarding, NAT, and service LB happen in-kernel
- Identity-based security: policies are enforced on workload identity (labels) instead of only IP/CIDR
- Observability-first: Hubble exposes flow-level visibility without sidecars or packet mirroring
- kube-proxy replacement: service translation and load balancing can run entirely in eBPF
- Sidecar-free service mesh: optional L7 traffic management and mTLS without Envoy sidecars per pod
Prerequisites
- Cluster
  - I'm using two K3s clusters running on VMs with bridge networking.
- Networking
  - Use bridge networking
  - We'll use Cilium LoadBalancer IPAM instead of MetalLB
  - Cluster A; pod_cidr: 10.42.0.0/16, service_cidr: 10.96.0.0/12
  - Cluster B; pod_cidr: 10.43.0.0/16, service_cidr: 10.97.0.0/12
- Merge K3s config/context
KUBECONFIG=cluster-a.yaml:cluster-b.yaml kubectl config view --flatten > ~/.kube/config
export KUBECONFIG=~/.kube/config

kubectl config get-contexts
CURRENT   NAME        CLUSTER     AUTHINFO    NAMESPACE
*         cluster-a   cluster-a   cluster-a
          cluster-b   cluster-b   cluster-b
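ClusterMesh expects the pod CIDRs of the connected clusters not to overlap, so it can be worth checking the ranges listed above before installing anything. The following is a minimal pure-bash sketch; the helper names (`ip_to_int`, `cidr_range`, `cidrs_overlap`) are made up for this example and are not part of any Cilium tooling.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  # 10# forces base 10 so an octet like "08" is not parsed as octal.
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Print "first last" (as integers) for a CIDR such as 10.42.0.0/16.
# Host bits are masked off, so 10.97.0.0/12 is treated as 10.96.0.0/12.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/}
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local first=$(( $(ip_to_int "$ip") & mask ))
  echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# Exit status 0 if the two CIDRs share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_range "$1")"
  read -r b_first b_last <<< "$(cidr_range "$2")"
  (( a_first <= b_last && b_first <= a_last ))
}
```

Usage: `cidrs_overlap 10.42.0.0/16 10.43.0.0/16 && echo "overlap" || echo "disjoint"`. With the values from this setup the two pod CIDRs come out disjoint. The two service CIDRs, 10.96.0.0/12 and 10.97.0.0/12, fall inside the same /12 block, so the sketch reports them as overlapping.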
Install Cilium
First install in cluster-a.
cilium install \
  --context cluster-a \
  --version 1.19.3 \
  --set kubeProxyReplacement=true \
  --set ipam.mode=cluster-pool \
  --set loadBalancerIPAM.enabled=true \
  --set cluster.name=cluster-a \
  --set cluster.id=1 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.42.0.0/16 \
  --set l2announcements.enabled=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"
Export the Cilium CA certificate from cluster-a. You can skip this step, but some multicluster features will not work and you will get the warning below:
⚠️ Cilium CA certificates do not match between clusters. Multicluster features will be limited!
To export:
kubectl --context cluster-a get secret -n kube-system cilium-ca -o yaml > cilium-ca.yaml
sed -i '/resourceVersion/d;/uid/d;/creationTimestamp/d' cilium-ca.yaml
Now apply the certificate in cluster-b.
kubectl --context cluster-b create -f cilium-ca.yaml -n kube-system
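To confirm that both clusters really hold the same CA after the copy, the Secret data can be compared directly. This is a rough sketch, assuming the `cilium-ca` Secret stores the certificate under the key `ca.crt`; the function names are hypothetical helpers for this post.

```shell
# Print the base64-encoded CA certificate stored in the cilium-ca Secret.
# Assumption: the Secret keeps the certificate under the key "ca.crt".
cilium_ca_cert() {
  kubectl --context "$1" -n kube-system get secret cilium-ca \
    -o jsonpath='{.data.ca\.crt}'
}

# Exit status 0 if both contexts hold the same, non-empty CA certificate.
# Exit status 2 if the Secret could not be read from one of the clusters.
same_cilium_ca() {
  local a b
  a=$(cilium_ca_cert "$1") || return 2
  b=$(cilium_ca_cert "$2") || return 2
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `same_cilium_ca cluster-a cluster-b && echo "CA matches" || echo "CA differs"`.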
Install Cilium in cluster-b.
cilium install \
  --context cluster-b \
  --version 1.19.3 \
  --set kubeProxyReplacement=true \
  --set ipam.mode=cluster-pool \
  --set loadBalancerIPAM.enabled=true \
  --set cluster.name=cluster-b \
  --set cluster.id=2 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.43.0.0/16 \
  --set l2announcements.enabled=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}"
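Before moving on it helps to wait until Cilium reports healthy in both clusters. A small sketch using `cilium status --wait`; the wrapper function `wait_for_cilium` is a hypothetical helper, not something the CLI ships.

```shell
# Block until Cilium is ready in every given kubectl context.
# Stops at the first context where `cilium status --wait` fails.
wait_for_cilium() {
  local ctx
  for ctx in "$@"; do
    echo "waiting for Cilium in $ctx"
    cilium status --context "$ctx" --wait || return 1
  done
}
```

Usage: `wait_for_cilium cluster-a cluster-b`.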
LoadBalancer Pool
ClusterMesh pool
You can create one pool for everything in a cluster, or a separate pool just for the ClusterMesh service.
I went with the latter, so first create the ClusterMesh IP pool.
cluster-pool-a-lb.yaml
apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: cluster-pool
spec:
  blocks:
    - start: 192.168.254.220
      stop: 192.168.254.220
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-policy
spec:
  loadBalancerIPs: true
  externalIPs: true
cluster-pool-b-lb.yaml
apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: cluster-pool
spec:
  blocks:
    - start: 192.168.254.221
      stop: 192.168.254.221
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-policy
spec:
  loadBalancerIPs: true
  externalIPs: true
Apply the ClusterMesh IP pool.
kubectl --context cluster-a create -f cluster-pool-a-lb.yaml
kubectl --context cluster-b create -f cluster-pool-b-lb.yaml
App pool
Also create an IP pool for the apps that will be deployed in each cluster.
app-lb-pool-a.yaml
apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: app-pool
spec:
  blocks:
    - start: 192.168.254.230
      stop: 192.168.254.239
  serviceSelector:
    matchLabels:
      lb-pool: app
app-lb-pool-b.yaml
apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: app-pool
spec:
  blocks:
    - start: 192.168.254.240
      stop: 192.168.254.249
  serviceSelector:
    matchLabels:
      lb-pool: app
Apply the app IP pool.
kubectl --context cluster-a create -f app-lb-pool-a.yaml
kubectl --context cluster-b create -f app-lb-pool-b.yaml
ClusterMesh
Enable ClusterMesh
We will use the LoadBalancer service type here; you can also set it to NodePort.
cilium --context cluster-a clustermesh enable --service-type LoadBalancer
cilium --context cluster-b clustermesh enable --service-type LoadBalancer
The clustermesh-apiserver pod stays in the Init state for now, since we still need to connect the clusters.
kubectl --context cluster-a get pods -n kube-system
NAMESPACE     NAME                                                   READY   STATUS      RESTARTS   AGE
kube-system   cilium-bkv2p                                           1/1     Running     0          10m
kube-system   cilium-envoy-fmphf                                     1/1     Running     0          10m
kube-system   cilium-envoy-fqw6f                                     1/1     Running     0          10m
kube-system   cilium-envoy-k5r62                                     1/1     Running     0          10m
kube-system   cilium-g9mtc                                           1/1     Running     0          10m
kube-system   cilium-lglld                                           1/1     Running     0          10m
kube-system   cilium-operator-5784844fd8-mx4wl                       1/1     Running     0          10m
kube-system   clustermesh-apiserver-6f7f876799-j89jz                 0/3     Init:0/1    0          117s
kube-system   clustermesh-apiserver-generate-certs-08c2b7ed3e-24plq  0/1     Completed   0          117s
kube-system   coredns-7566b5ff58-mc2j8                               1/1     Running     0          9m55s
kube-system   local-path-provisioner-6bc6568469-7vdj8                1/1     Running     0          147m
kube-system   metrics-server-786d997795-l6wkf                        0/1     Running     0          21s
Connect the Cluster
Run this once, against cluster-a; the command configures both directions.
cilium clustermesh connect --context cluster-a --destination-context cluster-b

✨ Extracting access information of cluster cluster-a...
🔑 Extracting secrets from cluster cluster-a...
ℹ️ Found ClusterMesh service IPs: [192.168.254.220]
✨ Extracting access information of cluster cluster-b...
🔑 Extracting secrets from cluster cluster-b...
ℹ️ Found ClusterMesh service IPs: [192.168.254.221]
ℹ️ Configuring Cilium in cluster cluster-a to connect to cluster cluster-b
ℹ️ Configuring Cilium in cluster cluster-b to connect to cluster cluster-a
✅ Connected cluster cluster-a <=> cluster-b!
Verify.
cilium --context cluster-a clustermesh status

✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
  - 192.168.254.220:2379
✅ Deployment clustermesh-apiserver is ready
ℹ️ KVStoreMesh is enabled

✅ All 3 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
✅ All 1 KVStoreMesh replicas are connected to all clusters [min:1 / avg:1.0 / max:1]

🔌 Cluster Connections:
  - cluster-b: 3/3 configured, 3/3 connected - KVStoreMesh: 1/1 configured, 1/1 connected
Troubleshooting: you will get the error below if the clustermesh-apiserver Services in both clusters are assigned the same IP.
cilium --context cluster-a clustermesh status
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
  - 192.168.254.220:2379
✅ Deployment clustermesh-apiserver is ready
ℹ️ KVStoreMesh is enabled

⚠️ 3/3 nodes are not connected to all clusters [min:0 / avg:0.0 / max:0]
⚠️ 1/1 KVStoreMesh replicas are not connected to all clusters [min:0 / avg:0.0 / max:0]

🔌 Cluster Connections:
  - cluster-b: 3/3 configured, 0/3 connected - KVStoreMesh: 1/1 configured, 0/1 connected

❌ 4 Errors:
  ❌ cilium-6zzjf is not connected to cluster cluster-b: remote cluster configuration required but not found
     💡 This is likely caused by KVStoreMesh not being connected to the given cluster
  ❌ cilium-gk2m5 is not connected to cluster cluster-b: remote cluster configuration required but not found
     💡 This is likely caused by KVStoreMesh not being connected to the given cluster
  ❌ cilium-kz4ls is not connected to cluster cluster-b: remote cluster configuration required but not found
     💡 This is likely caused by KVStoreMesh not being connected to the given cluster
  ❌ clustermesh-apiserver-566589c7bd-8wjp2 is not connected to cluster cluster-b: remote cluster configuration required but not found
     💡 Double check if the cluster name matches the one configured in the remote cluster
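A quick way to rule out the duplicate-IP case is to read the LoadBalancer IP of the clustermesh-apiserver Service from each cluster and compare them. A sketch with hypothetical helper names (`clustermesh_ip`, `check_clustermesh_ips`), assuming each Service has a single IPv4 ingress address:

```shell
# Print the first LoadBalancer IP of the clustermesh-apiserver Service.
clustermesh_ip() {
  kubectl --context "$1" -n kube-system get svc clustermesh-apiserver \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Print both addresses; fail when one is missing or both are identical.
check_clustermesh_ips() {
  local ip_a ip_b
  ip_a=$(clustermesh_ip "$1")
  ip_b=$(clustermesh_ip "$2")
  echo "$1: ${ip_a:-<none>}"
  echo "$2: ${ip_b:-<none>}"
  if [ -z "$ip_a" ] || [ -z "$ip_b" ]; then
    echo "a cluster has no LoadBalancer IP yet" >&2
    return 1
  fi
  if [ "$ip_a" = "$ip_b" ]; then
    echo "both clusters use $ip_a; give each its own pool address" >&2
    return 1
  fi
}
```

Usage: `check_clustermesh_ips cluster-a cluster-b`. In this setup the expected addresses are 192.168.254.220 and 192.168.254.221, one from each cluster's pool.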
Failover Example
Deployment
Let's first deploy an application in both clusters. Create the echo namespace.
kubectl --context cluster-a create ns echo
kubectl --context cluster-b create ns echo
We use a different image tag in each cluster so the two deployments are easy to tell apart.
- cluster-a
- cluster-b
echo-deploy-a.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: echo
  name: echo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: mcbtaguiad/web-echo:cluster-a
          imagePullPolicy: Always
          ports:
            - containerPort: 80
      restartPolicy: Always
Apply in cluster-a.
kubectl --context cluster-a apply -f echo-deploy-a.yaml -n echo
echo-deploy-b.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: echo
  name: echo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: mcbtaguiad/web-echo:cluster-b
          imagePullPolicy: Always
          ports:
            - containerPort: 80
      restartPolicy: Always
Apply in cluster-b.
kubectl --context cluster-b apply -f echo-deploy-b.yaml -n echo
Global Service
Create a global service in both clusters.
global-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: echo
  labels:
    lb-pool: app
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/affinity: "local"
spec:
  type: LoadBalancer
  loadBalancerClass: io.cilium/l2-announcer
  selector:
    app: echo
  ports:
    - port: 80
      targetPort: 80
Apply on cluster-a and cluster-b, in the echo namespace.
kubectl --context cluster-a apply -f global-service.yaml -n echo
kubectl --context cluster-b apply -f global-service.yaml -n echo
This Service combines two layers of failover:
- Layer 1: External VIP failover
  - The Cilium L2 announcer keeps the LoadBalancer IP reachable on the LAN by moving VIP ownership between nodes if needed.
- Layer 2: Cross-cluster backend failover
  - ClusterMesh ensures requests can still be served by remote cluster endpoints if the local cluster has no healthy pods.
Other configuration:
| Annotation | Use Case | Behavior |
|---|---|---|
| service.cilium.io/global: "true" | Basic service export | Exposes the service across the ClusterMesh and distributes traffic across all participating clusters. |
| service.cilium.io/affinity: "local" | Latency / cost optimization | Prefers endpoints in the local cluster and only forwards to remote clusters when no local endpoints are available. |
| service.cilium.io/affinity: "remote" | Maintenance / traffic shifting | Prioritizes remote cluster endpoints, allowing traffic to be drained away from the local cluster during upgrades or maintenance. |
| service.cilium.io/shared: "false" | Service isolation | Keeps the service local to the cluster by preventing its endpoints from being advertised to other clusters in the mesh. |
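The affinity can also be flipped on a live Service without editing the manifest, which is handy for the maintenance case in the table. A small sketch built on `kubectl annotate`; the wrapper `set_affinity` is a hypothetical helper, and it assumes the Service is named echo and lives in the echo namespace as in this post.

```shell
# Set the ClusterMesh affinity of the echo Service in one cluster.
# Usage: set_affinity <context> <local|remote|none>
set_affinity() {
  local ctx=$1 mode=$2
  case "$mode" in
    local|remote|none) ;;
    *) echo "unknown affinity: $mode" >&2; return 1 ;;
  esac
  # --overwrite replaces the annotation if it is already present.
  kubectl --context "$ctx" -n echo annotate service echo \
    "service.cilium.io/affinity=$mode" --overwrite
}
```

Usage: `set_affinity cluster-a remote` before maintenance on cluster-a, then `set_affinity cluster-a local` once it is done.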
With loadBalancerClass: io.cilium/l2-announcer, Cilium advertises the Service VIP directly on the local Layer 2 network using ARP/NDP.
kubectl --context cluster-a get svc -n echo
NAME   TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
echo   LoadBalancer   10.111.71.163   192.168.254.230   80:31515/TCP   23h

kubectl --context cluster-b get svc -n echo
NAME   TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
echo   LoadBalancer   10.96.201.70    192.168.254.240   80:31837/TCP   23h
Verify
curl 192.168.254.230
<html>
<!-- <h2> </h2> -->
<h3>Cluster A</h3>
<p>Look at me Mom, I'm a DevOps.</p>
curl 192.168.254.240
<html>
<!-- <h2> </h2> -->
<h3>Cluster B</h3>
<p>Look at me Mom, I'm a DevOps.</p>
Failover
Scale down the echo Deployment in cluster-b.
kubectl --context cluster-b scale deployment/echo --replicas=0 -n echo
deployment.apps/echo scaled
Curl the LoadBalancer IP of the echo Service in cluster-b. It should still respond, but the response should now come from cluster-a.
curl 192.168.254.240
<html>
<!-- <h2> </h2> -->
<h3>Cluster A</h3>
<p>Look at me Mom, I'm a DevOps.</p>
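To watch the failover as it happens, the VIP can be polled in a loop while the Deployment is scaled down and back up. A rough sketch; `which_cluster` and `watch_vip` are hypothetical helpers, and the parsing assumes the echo page keeps the cluster name in an `<h3>` tag as in the output above.

```shell
# Extract the cluster name from the echo page read on stdin, e.g. "Cluster A".
which_cluster() {
  sed -n 's:.*<h3>\(.*\)</h3>.*:\1:p' | head -n 1
}

# Poll a VIP and print one timestamped line per request.
# Usage: watch_vip <ip> [count] [interval-seconds]
watch_vip() {
  local ip=$1 count=${2:-10} interval=${3:-1} i answer
  for (( i = 0; i < count; i++ )); do
    # --max-time keeps the loop moving if the VIP stops answering.
    answer=$(curl -s --max-time 2 "http://$ip" | which_cluster)
    echo "$(date +%T) $ip -> ${answer:-no response}"
    sleep "$interval"
  done
}
```

Usage: run `watch_vip 192.168.254.240 30 1` in one terminal and scale the cluster-b Deployment in another.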