Kubernetes Monitoring & Logging
If you have used Prometheus in a Docker or Podman environment, setting it up in Kubernetes is much easier. With Docker/Podman you have to write the configuration and scrape config from scratch; in k8s, good Helm charts are already available that do all of that for you. Prometheus in Kubernetes also has dynamic service discovery, so resources added to the cluster are automatically discovered and monitored.
For logging, let’s integrate Loki, which is built by Grafana Labs and uses Grafana as its front end.
I won’t go into much detail about the architecture. It is better to deploy the monitoring and logging stack and explore the dashboards yourself.
Deploy
Monitoring
Install via Helm.
With the chart's default values, this deploys:
- Prometheus
- Alertmanager
- Grafana
kubectl create ns monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring
To customize the installation, export the chart's values and edit them to your liking.
helm show values prometheus-community/kube-prometheus-stack > value.yaml
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f value.yaml -n monitoring
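After editing a full dump of the defaults, it is easy to lose track of what you actually changed. The sketch below is one way to see only your overrides; `values_diff` is a hypothetical helper name, and it assumes `diff` is available on your workstation.

```shell
# Print the lines of a custom values file that differ from the chart defaults.
# Arguments: <chart> <values-file>
values_diff() {
  # `helm show values` writes the chart's default values to stdout.
  # diff exits non-zero when the inputs differ, so that status is ignored.
  helm show values "$1" | diff - "$2" || true
}
```

For example, `values_diff prometheus-community/kube-prometheus-stack value.yaml` prints nothing right after the export and shows only your edits afterwards.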
This is a minimal values file that enables persistent storage and adds email alerting in Alertmanager. Other receiver types are available as well; Slack and Discord are popular choices and worth checking out.
value.yaml
alertmanager:
  enabled: true
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: rook-cephfs
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

  config:
    global:
      resolve_timeout: 5m

    route:
      receiver: alert-emailer
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      routes:
        - receiver: alert-emailer
          group_wait: 2m
          continue: true

    receivers:
      - name: alert-emailer
        email_configs:
          - to: alerts@yourdomain.com
            send_resolved: true
            from: alerts@yourdomain.com
            smarthost: mail.yourdomain.com:587
            auth_username: alerts@yourdomain.com
            auth_identity: alerts@yourdomain.com
            auth_password: AveryStongPassword123!
            require_tls: true

prometheus:
  enabled: true
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: rook-cephfs
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: rook-cephfs
    accessModes: ["ReadWriteOnce"]
    size: 5Gi
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -f value.yaml -n monitoring
If the installation succeeds, Helm prints output similar to this:
NAME: kube-prometheus-stack
LAST DEPLOYED: Mon Mar 23 21:10:53 2026
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=kube-prometheus-stack"

Get Grafana 'admin' user password by running:

  kubectl --namespace monitoring get secrets kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo

Access Grafana local instance:

  export POD_NAME=$(kubectl --namespace monitoring get pod -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=kube-prometheus-stack" -oname)
  kubectl --namespace monitoring port-forward $POD_NAME 3000

Get your grafana admin user password by running:

  kubectl get secret --namespace monitoring -l app.kubernetes.io/component=admin-secret -o jsonpath="{.items[0].data.admin-password}" | base64 --decode ; echo


Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
List the pods in the namespace.
kubectl get pods -n monitoring
NAME                                                        READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          81s
kube-prometheus-stack-grafana-85cdd5d48d-5mdv9              3/3     Running   0          92s
kube-prometheus-stack-kube-state-metrics-567d49447b-g9l7x   1/1     Running   0          92s
kube-prometheus-stack-operator-5b6c67b7b-cbdkq              1/1     Running   0          92s
kube-prometheus-stack-prometheus-node-exporter-gxkbh        1/1     Running   0          92s
kube-prometheus-stack-prometheus-node-exporter-m7jms        1/1     Running   0          92s
kube-prometheus-stack-prometheus-node-exporter-ww2td        1/1     Running   0          92s
prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0          81s
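Instead of re-running `kubectl get pods` until everything shows Running, you can let kubectl block until the pods are ready. This is a small sketch; `wait_for_stack` is a hypothetical helper name and the 300s default timeout is an arbitrary choice of mine.

```shell
# Block until every pod in the namespace reports the Ready condition.
# Arguments: [namespace] [timeout], defaulting to "monitoring" and "300s".
wait_for_stack() {
  kubectl wait --namespace "${1:-monitoring}" \
    --for=condition=Ready pods --all --timeout="${2:-300s}"
}
```

Call it as `wait_for_stack` right after the Helm install. Note that `--all` also waits on any pod that can never become Ready (a finished Job pod, for example), so narrow it with a label selector if your namespace contains such pods.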
In a production environment, make sure to add persistent volumes. If no PVC is defined, Prometheus data is lost when the pod is deleted or rescheduled.
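A quick way to confirm that the claims were actually created and bound is to filter the `kubectl get pvc` output. The helper below is only a sketch (`unbound_pvcs` is a name I made up); it relies on STATUS being the second column of that output.

```shell
# Read `kubectl get pvc` output on stdin and print the names of
# claims whose STATUS column is anything other than "Bound".
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1 }'
}
```

Run it as `kubectl get pvc -n monitoring | unbound_pvcs`. No output means every claim is bound; a name in the output usually points to a storage class problem.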
Logging
Loki is usually deployed alongside its own Grafana, but in my dev environment I’ll install it in the same namespace and reuse the Grafana instance deployed earlier.
Loki is more complicated to configure than the Prometheus stack; the right setup depends on the environment you are working in.
For a production environment, use either High Availability Mode or Microservices Mode, and make sure object storage is available. Since I’m using a small cluster, we’ll use SingleBinary Mode.
Here’s a quick summary table comparing the modes.
| Deployment Mode | Scale | Storage Requirement | Environment |
|---|---|---|---|
| SingleBinary | Small, 1 pod | Local filesystem OK | Dev/test, small clusters |
| HA (High Availability) | Medium-large | Object storage preferred | Production, critical workloads |
| Microservices | Large, multi-component | Object storage required | Large enterprise, multi-tenant |
Export the loki chart values to inspect the deployment options.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm show values grafana/loki > values.yaml
This is a minimal install using SingleBinary Mode.
value.yaml
deploymentMode: SingleBinary

loki:
  auth_enabled: false

  commonConfig:
    replication_factor: 1

  storage:
    type: filesystem

  schemaConfig:
    configs:
      - from: 2024-01-01
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: index_
          period: 24h

singleBinary:
  replicas: 1

  persistence:
    enabled: true
    storageClass: rook-cephfs
    accessModes:
      - ReadWriteMany
    size: 10Gi

# explicitly disable scalable components
write:
  replicas: 0
read:
  replicas: 0
backend:
  replicas: 0
Deploy.
helm upgrade --install loki grafana/loki -n monitoring -f value.yaml
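Since Loki has no dashboard to look at, a simple smoke test is to call its `/ready` HTTP endpoint. This is a sketch under assumptions: `loki_ready` is a hypothetical helper, and I am assuming the release exposes a service named `loki` on port 3100, so check `kubectl get svc -n monitoring` for the real name in your cluster.

```shell
# Query Loki's readiness endpoint.
# Argument: [base URL], defaulting to a local port-forward on 3100.
loki_ready() {
  curl -fsS "${1:-http://127.0.0.1:3100}/ready"
}
```

In one terminal run `kubectl port-forward -n monitoring svc/loki 3100:3100`, then call `loki_ready` in another. Because of `-f`, curl exits non-zero while Loki is still starting up.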
Expose Endpoint
MetalLB
With MetalLB (or any other LoadBalancer implementation), edit each service and change its type from ClusterIP to LoadBalancer.
kubectl get svc -n monitoring
NAME                                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                            ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   5m27s
kube-prometheus-stack-alertmanager               ClusterIP   10.43.163.166   <none>        9093/TCP,8080/TCP            5m39s
kube-prometheus-stack-grafana                    ClusterIP   10.43.46.120    <none>        80/TCP                       5m39s
kube-prometheus-stack-kube-state-metrics         ClusterIP   10.43.224.131   <none>        8080/TCP                     5m39s
kube-prometheus-stack-operator                   ClusterIP   10.43.116.141   <none>        443/TCP                      5m39s
kube-prometheus-stack-prometheus                 ClusterIP   10.43.212.61    <none>        9090/TCP,8080/TCP            5m39s
kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.43.61.90     <none>        9100/TCP                     5m39s
prometheus-operated                              ClusterIP   None            <none>        9090/TCP                     5m27s
# Grafana
kubectl edit svc kube-prometheus-stack-grafana -n monitoring
service/kube-prometheus-stack-grafana edited

# Prometheus
kubectl edit svc kube-prometheus-stack-prometheus -n monitoring
service/kube-prometheus-stack-prometheus edited

# Alertmanager
kubectl edit svc kube-prometheus-stack-alertmanager -n monitoring
service/kube-prometheus-stack-alertmanager edited
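If you would rather not open an editor three times, the same change can be made non-interactively with `kubectl patch`. A minimal sketch, where `expose_services` is a hypothetical helper name and the service names are the ones listed above:

```shell
# Switch the Grafana, Prometheus and Alertmanager services to type LoadBalancer.
# Argument: [namespace], defaulting to "monitoring".
expose_services() {
  for svc in kube-prometheus-stack-grafana \
             kube-prometheus-stack-prometheus \
             kube-prometheus-stack-alertmanager; do
    # A merge patch touches only spec.type and leaves the rest of the service as is.
    kubectl patch svc "$svc" -n "${1:-monitoring}" \
      -p '{"spec":{"type":"LoadBalancer"}}'
  done
}
```

Running `expose_services` has the same effect as the three manual edits and is easier to repeat when you rebuild the cluster.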
Verify endpoints.
kubectl get svc -n monitoring
NAME                                             TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                         AGE
alertmanager-operated                            ClusterIP      None            <none>            9093/TCP,9094/TCP,9094/UDP      12m
kube-prometheus-stack-alertmanager               LoadBalancer   10.43.161.11    192.168.254.222   9093:30198/TCP,8080:31683/TCP   12m
kube-prometheus-stack-grafana                    LoadBalancer   10.43.74.225    192.168.254.220   80:32594/TCP                    12m
kube-prometheus-stack-kube-state-metrics         ClusterIP      10.43.209.174   <none>            8080/TCP                        12m
kube-prometheus-stack-operator                   ClusterIP      10.43.112.212   <none>            443/TCP                         12m
kube-prometheus-stack-prometheus                 LoadBalancer   10.43.30.133    192.168.254.221   9090:32616/TCP,8080:30470/TCP   12m
kube-prometheus-stack-prometheus-node-exporter   ClusterIP      10.43.52.148    <none>            9100/TCP                        12m
prometheus-operated                              ClusterIP      None            <none>            9090/TCP                        12m
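With Alertmanager reachable, this is a good moment to check that the email receiver from value.yaml really delivers. One way is to push a hand-made alert through the Alertmanager v2 API. The helper names and the alert labels below are my own illustration, not something the chart provides.

```shell
# Build a single-alert JSON payload for POST /api/v2/alerts.
# Argument: [alert name], defaulting to "TestAlert".
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"Manual test alert"}}]' "${1:-TestAlert}"
}

# Send the test alert to an Alertmanager instance.
# Argument: <base URL>, e.g. http://192.168.254.222:9093
send_test_alert() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "$(test_alert_payload)" "$1/api/v2/alerts"
}
```

For the cluster above that would be `send_test_alert http://192.168.254.222:9093`. The email is not instant: Alertmanager first waits for the `group_wait` configured on the matching route.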
Loki has no UI of its own and relies on Grafana. Make sure to add Loki as a data source once you can access Grafana.
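The data source can be added by hand in the Grafana UI, or scripted through Grafana's HTTP API. The sketch below takes the second route. The helper names are hypothetical, and the default Loki URL assumes the chart's gateway service is enabled and named `loki-gateway` in the `monitoring` namespace, so adjust it to whatever `kubectl get svc -n monitoring` shows.

```shell
# Build the JSON body for POST /api/datasources.
# Argument: [Loki URL as seen from inside the cluster]
loki_datasource_payload() {
  printf '{"name":"Loki","type":"loki","access":"proxy","url":"%s"}' \
    "${1:-http://loki-gateway.monitoring.svc.cluster.local}"
}

# Register Loki as a Grafana data source.
# Arguments: <grafana URL> <admin password> [Loki URL]
add_loki_datasource() {
  curl -fsS -X POST -u "admin:$2" \
    -H 'Content-Type: application/json' \
    -d "$(loki_datasource_payload "$3")" "$1/api/datasources"
}
```

With the LoadBalancer address from the listing above, that is `add_loki_datasource http://192.168.254.220 "<admin password>"`, using the password retrieved with the command from the Helm notes.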
Ingress
Sample Ingress manifests are shown below; Ingress can also be enabled in the value.yaml file.
ingress-grafana.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - grafana.yourdomain.com
      secretName: grafana-tls
  rules:
    - host: grafana.yourdomain.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: kube-prometheus-stack-grafana
                port:
                  number: 80
ingress-prometheus.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - prometheus.yourdomain.com
      secretName: prometheus-tls
  rules:
    - host: prometheus.yourdomain.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: kube-prometheus-stack-prometheus
                port:
                  number: 9090
ingress-alertmanager.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - alertmanager.yourdomain.com
      secretName: alertmanager-tls
  rules:
    - host: alertmanager.yourdomain.com
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: kube-prometheus-stack-alertmanager
                port:
                  number: 9093
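The manifests above do not set a namespace, and an Ingress can only reference services in its own namespace, so they need to be applied into `monitoring`. A small sketch, assuming the three files are saved under the names shown above in the current directory (`apply_ingresses` is a hypothetical helper name):

```shell
# Apply the three Ingress manifests and list the resulting objects.
# Argument: [namespace], defaulting to "monitoring".
apply_ingresses() {
  for f in ingress-grafana.yaml ingress-prometheus.yaml ingress-alertmanager.yaml; do
    kubectl apply -n "${1:-monitoring}" -f "$f"
  done
  kubectl get ingress -n "${1:-monitoring}"
}
```

After `apply_ingresses`, the ADDRESS column of the listing should fill in once the ingress controller has picked up the new objects.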