Create GKE Cluster with Terraform/OpenTofu 0x01
Still a not-so-comprehensive guide to creating a GKE cluster with Terraform/OpenTofu.
Visit the previous post before proceeding with this one.
Repo for this post: g-ulap-demo.
Overview
This topic is quite big, and no single blog post will answer all your questions. I advise you to read the documentation and other blog posts as well; this post would be too long if I explained every detail.
As a curious (and sometimes reckless) engineer, I suggest you deploy first and then fix whatever errors pop up. Just don't be reckless with a company account or in production, hahaha.
We'll cover creating both Standard and Autopilot GKE clusters. I won't be able to show every code block here (variables.tf, for example), so make sure to look at the repo for reference.
I also structured the project to be modular, so if you are not yet familiar with Terraform modules, I suggest you look them up as well.
Project Structure
├── env
│   ├── dev
│   │   ├── main.tf
│   │   ├── provider.tf
│   │   ├── terraform.tfvars
│   │   └── variables.tf
│   └── prod
│       ├── main.tf
│       ├── provider.tf
│       ├── terraform.tfvars
│       └── variables.tf
└── modules
    ├── addons
    │   ├── cert-manager
    │   │   └── main.tf
    │   └── ingress-nginx
    │       └── main.tf
    ├── gke
    │   ├── autopilot
    │   │   ├── main.tf
    │   │   ├── outputs.tf
    │   │   └── variables.tf
    │   └── standard
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── variables.tf
    ├── iam
    │   ├── api
    │   │   ├── main.tf
    │   │   └── variables.tf
    │   └── service-account
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── variables.tf
    ├── network
    │   ├── firewall
    │   │   ├── main.tf
    │   │   └── variables.tf
    │   ├── main.tf
    │   ├── output.tf
    │   └── variables.tf
    ├── nodepool
    │   ├── main.tf
    │   └── variables.tf
    └── storage
        ├── main.tf
        └── variables.tf
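To make the layout a bit more concrete, here is a minimal sketch of how an environment's main.tf might call two of the modules. The module labels, `var.project_id`, `var.region`, and the `depends_on` ordering are my assumptions for illustration; check the repo for the actual wiring.

```hcl
# env/dev/main.tf (sketch, not the repo's actual file)
# Source paths are relative to the environment directory.

module "api" {
  source     = "../../modules/iam/api"
  project_id = var.project_id # assumed to be declared in env/dev/variables.tf
}

module "network" {
  source = "../../modules/network"
  region = var.region # assumed to be declared in env/dev/variables.tf

  # Assumption: create network resources only after the APIs are enabled.
  depends_on = [module.api]
}
```

The prod environment would call the same modules and differ only in the values set in its terraform.tfvars.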
Services & Roles
First, enable the container service in your project and grant your Terraform service account the roles it needs. We are using the same service account as in the previous post.
gcloud services enable container.googleapis.com --project=<project_id>

gcloud projects add-iam-policy-binding <project_id> \
  --member="serviceAccount:tofu-sa@<project_id>.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageAdmin"

gcloud projects add-iam-policy-binding <project_id> \
  --member="serviceAccount:tofu-sa@<project_id>.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountAdmin"
IAM Module
API
Enables required Google Cloud APIs.
| API | Purpose |
|---|---|
| compute | Required for networking and VM instances |
| container | Core GKE service |
| logging | Collects cluster logs |
| monitoring | Provides metrics and observability |
| secretmanager | Secure storage for secrets |
| artifactregistry | Stores container images |
modules/iam/api/main.tf
resource "google_project_service" "api" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
    "logging.googleapis.com",
    "secretmanager.googleapis.com",
    "monitoring.googleapis.com",
    "artifactregistry.googleapis.com"
  ])

  project            = var.project_id
  service            = each.key
  disable_on_destroy = false
}
Service Account
Creates a service account for the node pool with logging and metrics roles.
- logging.logWriter - Allows nodes to send logs
- monitoring.metricWriter - Allows nodes to send metrics
modules/iam/service-account/main.tf
resource "google_service_account" "node" {
  account_id = var.node_sa
}

resource "google_project_iam_member" "gke_logging" {
  project = var.project_id
  role    = "roles/logging.logWriter"
  member  = "serviceAccount:${google_service_account.node.email}"
}

resource "google_project_iam_member" "gke_metrics" {
  project = var.project_id
  role    = "roles/monitoring.metricWriter"
  member  = "serviceAccount:${google_service_account.node.email}"
}
Export the service account email so the nodepool module can use it.
modules/iam/service-account/outputs.tf
output "node_sa_email" {
  value = google_service_account.node.email
}
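As a rough sketch of how that output could be consumed, the environment's main.tf might pass it to the nodepool module like this. The module labels, the `var.*` names on the right-hand side, and the `module.gke.cluster_id` output are assumptions for illustration; only `node_sa_email` and the input names come from the snippets in this post.

```hcl
# env/dev/main.tf (sketch, not the repo's actual file)

module "service_account" {
  source     = "../../modules/iam/service-account"
  project_id = var.project_id
  node_sa    = var.node_sa # here node_sa is the account_id of the new service account
}

module "nodepool" {
  source = "../../modules/nodepool"

  cluster_id   = module.gke.cluster_id # hypothetical output of the cluster module
  region       = var.region
  min_node     = var.min_node
  max_node     = var.max_node
  machine_type = var.machine_type
  disk_size_gb = var.disk_size_gb
  disk_type    = var.disk_type

  # In the nodepool module, node_sa expects the service account email.
  node_sa = module.service_account.node_sa_email
}
```

Note that `node_sa` means two different things in the two modules: an account ID going into the service-account module, and an email going into the nodepool module.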
Network Module
Skip this module if you are okay with your nodes and cluster being exposed to the public internet. In that case GCP creates a network with auto-generated subnets.
We'll create a private network and a Cloud NAT gateway for outbound internet access (pulling container images, calling APIs, updates). Cloud NAT only covers outbound traffic; inbound traffic for Ingress comes in through a load balancer instead.
VPC
modules/network/main.tf
resource "google_compute_network" "vpc" {
  name                            = "gke-network"
  routing_mode                    = "REGIONAL"
  auto_create_subnetworks         = false
  delete_default_routes_on_create = true
}

# -----------------------------
# Subnet (with secondary ranges)
# -----------------------------
resource "google_compute_subnetwork" "private" {
  name                     = "private"
  ip_cidr_range            = "10.0.32.0/19"
  region                   = var.region
  network                  = google_compute_network.vpc.id
  private_ip_google_access = true

  secondary_ip_range {
    range_name    = "k8s-pods"
    ip_cidr_range = "172.16.0.0/14"
  }

  secondary_ip_range {
    range_name    = "k8s-services"
    ip_cidr_range = "172.20.0.0/18"
  }
}

# -----------------------------
# Cloud Router
# -----------------------------
resource "google_compute_router" "router" {
  name    = "router"
  region  = var.region
  network = google_compute_network.vpc.id
}

# -----------------------------
# Cloud NAT
# -----------------------------
resource "google_compute_router_nat" "nat" {
  name   = "nat"
  region = var.region
  router = google_compute_router.router.name

  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"

  subnetwork {
    name                    = google_compute_subnetwork.private.self_link
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}

# -----------------------------
# Firewall
# -----------------------------
module "firewall" {
  source = "./firewall"

  network = google_compute_network.vpc.name
}
modules/network/output.tf
output "network" {
  value = google_compute_network.vpc.id
}

output "subnetwork" {
  value = google_compute_subnetwork.private.id
}

output "network_name" {
  value = google_compute_network.vpc.name
}
Firewall
modules/network/firewall/main.tf
# -----------------------------
# Default route (required!)
# -----------------------------
resource "google_compute_route" "internet" {
  name             = "egress-internet"
  network          = var.network
  dest_range       = "0.0.0.0/0"
  next_hop_gateway = "default-internet-gateway"
}

# -----------------------------
# Firewall: Internal traffic
# -----------------------------
resource "google_compute_firewall" "allow_internal" {
  name    = "gke-allow-internal"
  network = var.network

  direction = "INGRESS"

  source_ranges = [
    "10.0.0.0/8",    # node subnet
    "172.16.0.0/14", # pods
    "172.20.0.0/18"  # services
  ]

  allow {
    protocol = "all"
  }
}

# -----------------------------
# Firewall: Egress (internet access)
# -----------------------------
resource "google_compute_firewall" "allow_egress" {
  name    = "gke-allow-egress"
  network = var.network

  direction = "EGRESS"

  destination_ranges = ["0.0.0.0/0"]

  allow {
    protocol = "all"
  }
}

# -----------------------------
# Firewall: Control plane → nodes
# -----------------------------
resource "google_compute_firewall" "allow_master_to_nodes" {
  name    = "gke-master-to-nodes"
  network = var.network

  direction = "INGRESS"

  source_ranges = ["192.168.0.0/28"]

  allow {
    protocol = "tcp"
    ports    = ["443", "10250"]
  }
}
Node Pool Module
This is where your workloads actually run. Change the configuration based on your app's requirements; this can get costly if not managed correctly.
Disable this module if you are planning to use Autopilot.
In this module, we define the compute layer of our GKE setup:
- configured a scalable node pool
- enabled automatic repair and upgrades
- defined machine and disk configurations
- connected nodes to a secure service account
modules/nodepool/main.tf
resource "google_container_node_pool" "gke-node" {
  name     = "gke-node"
  cluster  = var.cluster_id
  location = var.region

  autoscaling {
    total_min_node_count = var.min_node
    total_max_node_count = var.max_node
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    disk_size_gb = var.disk_size_gb
    machine_type = var.machine_type
    disk_type    = var.disk_type

    # labels = {
    #   role = "worker"
    # }

    # taint {
    #   key    = "instance_type"
    #   value  = "spot"
    #   effect = "NO_SCHEDULE"
    # }

    service_account = var.node_sa
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
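Since variables.tf is not shown here, this is a minimal sketch of what the inputs for this module could look like. The variable names match the ones used in main.tf above, but the descriptions, default values, and validation rules are my own assumptions, so treat the repo as the source of truth.

```hcl
# modules/nodepool/variables.tf (sketch; defaults are illustrative assumptions)

variable "cluster_id" {
  description = "ID of the GKE cluster the node pool attaches to."
  type        = string
}

variable "region" {
  description = "Location of the cluster."
  type        = string
}

variable "min_node" {
  description = "Minimum total number of nodes in the pool."
  type        = number
  default     = 1

  validation {
    condition     = var.min_node >= 0
    error_message = "min_node must be zero or greater."
  }
}

variable "max_node" {
  description = "Maximum total number of nodes in the pool."
  type        = number
  default     = 3

  validation {
    condition     = var.max_node >= 1
    error_message = "max_node must be at least 1."
  }
}

variable "machine_type" {
  description = "Compute Engine machine type for the nodes."
  type        = string
  default     = "e2-medium"
}

variable "disk_size_gb" {
  description = "Boot disk size per node, in GB."
  type        = number
  default     = 50
}

variable "disk_type" {
  description = "Boot disk type for the nodes."
  type        = string
  default     = "pd-standard"
}

variable "node_sa" {
  description = "Email of the service account the nodes run as."
  type        = string
}
```

Keeping `max_node` small by default is one cheap guard against the cost surprises mentioned above; each environment can then raise it in its own terraform.tfvars.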