It’s already been a year since my first Kubernetes journey. My initial clusters—where I started learning and understanding more about Kubernetes—are now all taken down. This time, I want to build a fully functional, highly available (HA) cluster.
Over the past weeks, I’ve done more research in Kubernetes communities, as well as on subreddits like [k3s], [homelab], and [selfhosted]. I discovered that one of the best ways to deploy a cluster these days is by following guides and content from Techno Tim, so I decided to write this blog and share my own approach.
What I Want to Achieve#
- A fully organized HA cluster on my hardware, so if any of my machines go down, the cluster remains functional. Specifically:
  - 1 x DELL R720 → k3s-master-1 and k3s-worker-1
  - 1 x DELL Optiplex Micro 3050 → k3s-master-2 and k3s-worker-2
  - 1 x DELL Optiplex Micro 3050 → k3s-master-3 and k3s-worker-3
  - 1 x DELL R720
How I Will Deploy#
I will create six virtual machines (VMs) on a Proxmox cluster:
- 3 x Ubuntu 22.04 Master Nodes
- 3 x Ubuntu 22.04 Worker Nodes
The goal is to run K3s on these VMs to set up a solid Kubernetes environment with redundancy.
Let’s Begin!#
In the upcoming sections, I’ll detail each step, from setting up Proxmox VMs to installing and configuring K3s, managing networking, storage, and beyond.
Chapter 1: Preparing DNS and IP Addresses#
When setting up a Kubernetes cluster, DNS and IP management are crucial. Below is how I handle DHCP, static IP assignments, and DNS entries in my homelab environment.
DHCP Configuration#
There are two possible scenarios for assigning IP addresses to your VMs:
1. Use IP addresses outside of your DHCP range. This method is often preferred, as your machines will keep their manually configured network settings even if your DHCP server goes down.
2. Use DHCP static mappings. You can map MAC -> IP in your network services to allocate IP addresses to VMs based on their MAC addresses.
Tip: If you choose the second scenario, make sure you document your static leases carefully. Proper documentation avoids conflicts and confusion later.
My Approach#
I chose the first scenario, where I use IPs outside the DHCP range. This ensures my network remains stable if the DHCP service is unavailable.
- IP Range: 10.57.57.30/24 → 10.57.57.35/24 for my VMs
DNS Setup#
I also set up DNS entries in my Unbound service on pfSense to easily manage and access my machines. For instance, you can create an A record (or a similar DNS record type) pointing to your VM’s IP address. Below is a simple example:
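The entries themselves live in pfSense under Services → DNS Resolver (Host Overrides). As a hedged sketch, the equivalent Unbound local-data lines might look like this, assuming illustrative hostnames under the merox.dev domain used later in this guide and the IP range above:
server:
  local-data: "k3s-master-1.merox.dev. A 10.57.57.30"
  local-data: "k3s-master-2.merox.dev. A 10.57.57.31"
  local-data: "k3s-master-3.merox.dev. A 10.57.57.32"
  local-data: "k3s-worker-1.merox.dev. A 10.57.57.33"
  local-data: "k3s-worker-2.merox.dev. A 10.57.57.34"
  local-data: "k3s-worker-3.merox.dev. A 10.57.57.35"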
Chapter 2: Automated VM Deployment on Proxmox with Cloud-Init#
To streamline the next steps, I’ve created a bash script that automates crucial parts of the process, including:
- Creating a Cloud-Init template
- Deploying multiple VMs with static or DHCP-based IP addresses
- Destroying the VMs if needed
If you prefer an even more automated approach using tools like Packer or Terraform, I suggest checking out this related post: Homelab as Code and adapting it to your specific scenario. However, for this blog, I’ll demonstrate a simpler, more direct approach using the script below.
This script can create or destroy VMs. Use it carefully and always keep backups of critical data.
Prerequisites#
- Make sure you have Proxmox up and running.
- You’ll need to place your SSH public key (e.g., /root/.ssh/id_rsa.pub) on the Proxmox server before running the script.
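If your key isn’t on the Proxmox host yet, one way to get it there is sketched below. The hostname is a placeholder, and you can point the script at a different target file if you don’t want to touch the host’s own key:
# Run on your workstation; generate a key pair first if you don't have one yet
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
# Copy the public key to the Proxmox host (placeholder hostname); if you use a
# different target path, enter that path at the script's SSH key prompt
scp ~/.ssh/id_rsa.pub root@proxmox-host:/root/.ssh/id_rsa.pub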
Script Overview#
Option 1: Create Cloud-Init Template
- Downloads the Ubuntu Cloud image (currently Ubuntu 24.04, code-named “noble”)
- Creates a VM based on the Cloud-Init image
- Converts it into a template
Option 2: Deploy VMs
- Clones the Cloud-Init template to create the desired number of VMs
- Configures IP addressing, gateway, DNS, search domain, SSH key, etc.
- Adjusts CPU, RAM, and disk size to fit your needs
Option 3: Destroy VMs
- Stops and removes VMs created by this script
During the VM creation process, you’ll be prompted to enter the VM name for each instance (e.g., k3s-master-1, k3s-master-2, etc.).
To fully automate naming, you could edit the script to increment VM names automatically. However, prompting ensures you can organize VMs with custom naming.
The Bash Script#
Below is the full script. Feel free to customize it based on your storage, networking, and naming preferences.
#!/bin/bash

# Function to get user input with a default value
get_input() {
    local prompt=$1
    local default=$2
    local input
    read -p "$prompt [$default]: " input
    echo "${input:-$default}"
}

# Ask the user whether they want to create a template, deploy or destroy VMs
echo "Select an option:"
echo "1) Create Cloud-Init Template"
echo "2) Deploy VMs"
echo "3) Destroy VMs"
read -p "Enter your choice (1, 2, or 3): " ACTION

if [[ "$ACTION" != "1" && "$ACTION" != "2" && "$ACTION" != "3" ]]; then
    echo "❌ Invalid choice. Please run the script again and select 1, 2, or 3."
    exit 1
fi

# === OPTION 1: CREATE CLOUD-INIT TEMPLATE ===
if [[ "$ACTION" == "1" ]]; then
    TEMPLATE_ID=$(get_input "Enter the template VM ID" "300")
    STORAGE=$(get_input "Enter the storage name" "local")
    TEMPLATE_NAME=$(get_input "Enter the template name" "ubuntu-cloud")
    IMG_URL="https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
    IMG_FILE="/root/noble-server-cloudimg-amd64.img"

    echo "📥 Downloading Ubuntu Cloud image..."
    cd /root
    wget -O $IMG_FILE $IMG_URL || { echo "❌ Failed to download the image"; exit 1; }

    echo "🖥️ Creating VM $TEMPLATE_ID..."
    qm create $TEMPLATE_ID --memory 2048 --cores 2 --name $TEMPLATE_NAME --net0 virtio,bridge=vmbr0

    echo "💾 Importing disk to storage ($STORAGE)..."
    qm disk import $TEMPLATE_ID $IMG_FILE $STORAGE || { echo "❌ Failed to import disk"; exit 1; }

    echo "🔗 Attaching disk..."
    qm set $TEMPLATE_ID --scsihw virtio-scsi-pci --scsi0 $STORAGE:vm-$TEMPLATE_ID-disk-0

    echo "☁️ Adding Cloud-Init drive..."
    qm set $TEMPLATE_ID --ide2 $STORAGE:cloudinit

    echo "🛠️ Configuring boot settings..."
    qm set $TEMPLATE_ID --boot c --bootdisk scsi0

    echo "🖧 Adding serial console..."
    qm set $TEMPLATE_ID --serial0 socket --vga serial0

    echo "📌 Converting VM to template..."
    qm template $TEMPLATE_ID

    echo "✅ Cloud-Init Template created successfully!"
    exit 0
fi

# === OPTION 2: DEPLOY VMs ===
if [[ "$ACTION" == "2" ]]; then
    TEMPLATE_ID=$(get_input "Enter the template VM ID" "300")
    START_ID=$(get_input "Enter the starting VM ID" "301")
    NUM_VMS=$(get_input "Enter the number of VMs to deploy" "6")
    STORAGE=$(get_input "Enter the storage name" "dataz2")
    IP_PREFIX=$(get_input "Enter the IP prefix (e.g., 10.57.57.)" "10.57.57.")
    IP_START=$(get_input "Enter the starting IP last octet" "30")
    GATEWAY=$(get_input "Enter the gateway IP" "10.57.57.1")
    DNS_SERVERS=$(get_input "Enter the DNS servers (space-separated)" "8.8.8.8 1.1.1.1")
    DOMAIN_SEARCH=$(get_input "Enter the search domain" "merox.dev")
    DISK_SIZE=$(get_input "Enter the disk size to add (e.g., 100G)" "100G")
    RAM_SIZE=$(get_input "Enter the RAM size in MB" "16384")
    CPU_CORES=$(get_input "Enter the number of CPU cores" "4")
    CPU_SOCKETS=$(get_input "Enter the number of CPU sockets" "4")
    SSH_KEY_PATH=$(get_input "Enter the SSH public key file path" "/root/.ssh/id_rsa.pub")

    if [[ ! -f "$SSH_KEY_PATH" ]]; then
        echo "❌ Error: SSH key file not found at $SSH_KEY_PATH"
        exit 1
    fi

    for i in $(seq 0 $((NUM_VMS - 1))); do
        VM_ID=$((START_ID + i))
        IP="$IP_PREFIX$((IP_START + i))/24"
        VM_NAME=$(get_input "Enter the name for VM $VM_ID" "ubuntu-vm-$((i+1))")

        echo "🔹 Creating VM: $VM_ID (Name: $VM_NAME, IP: $IP)"

        if qm status $VM_ID &>/dev/null; then
            echo "⚠️ VM $VM_ID already exists, removing..."
            qm stop $VM_ID &>/dev/null
            qm destroy $VM_ID
        fi

        if ! qm clone $TEMPLATE_ID $VM_ID --full --name $VM_NAME --storage $STORAGE; then
            echo "❌ Failed to clone VM $VM_ID, skipping..."
            continue
        fi

        qm set $VM_ID --memory $RAM_SIZE \
            --cores $CPU_CORES \
            --sockets $CPU_SOCKETS \
            --cpu host \
            --serial0 socket \
            --vga serial0 \
            --ipconfig0 ip=$IP,gw=$GATEWAY \
            --nameserver "$DNS_SERVERS" \
            --searchdomain "$DOMAIN_SEARCH" \
            --sshkey "$SSH_KEY_PATH"

        qm set $VM_ID --delete ide2 || true
        qm set $VM_ID --ide2 $STORAGE:cloudinit,media=cdrom
        qm cloudinit update $VM_ID

        echo "🔄 Expanding disk by $DISK_SIZE..."
        qm resize $VM_ID scsi0 +$DISK_SIZE

        qm start $VM_ID
        echo "✅ VM $VM_ID ($VM_NAME) created and started!"
    done
    exit 0
fi

# === OPTION 3: DESTROY VMs ===
if [[ "$ACTION" == "3" ]]; then
    START_ID=$(get_input "Enter the starting VM ID to delete" "301")
    NUM_VMS=$(get_input "Enter the number of VMs to delete" "6")

    echo "⚠️ Destroying VMs from $START_ID to $((START_ID + NUM_VMS - 1))..."
    for i in $(seq 0 $((NUM_VMS - 1))); do
        VM_ID=$((START_ID + i))

        if qm status $VM_ID &>/dev/null; then
            echo "🛑 Stopping and destroying VM $VM_ID..."
            qm stop $VM_ID &>/dev/null
            qm destroy $VM_ID
        else
            echo "ℹ️ VM $VM_ID does not exist. Skipping..."
        fi
    done
    echo "✅ Specified VMs have been destroyed."
    exit 0
fi
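To use it, save the script on the Proxmox host and run it as root; the filename below is just a placeholder:
chmod +x k3s-vm-deploy.sh
./k3s-vm-deploy.sh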
Verifying Your Deployment#
After running the script under Option 2, you should see your new VMs listed in the Proxmox web interface. You can now log in via SSH from the machine that holds the corresponding private key:
ssh ubuntu@k3s-master-1
Note: Adjust the hostname or IP as configured during the script prompts.
Chapter 3: Installing K3s with Ansible#
This chapter will guide you through setting up K3s using Ansible on your Proxmox-based VMs. Ansible helps automate the process across multiple nodes, making the deployment faster and more reliable.
Prerequisites#
Ensure Ansible is installed on your management machine (Debian/Ubuntu or macOS):
- Debian/Ubuntu: sudo apt update && sudo apt install -y ansible
- macOS: brew install ansible
Clone the k3s-ansible repository
In this guide we’ll use a fork of Techno Tim’s k3s-ansible repository:
git clone https://github.com/mer0x/k3s-ansible
Pre-Deployment Configuration#
Set up the Ansible environment:
cd k3s-ansible
cp ansible.example.cfg ansible.cfg
ansible-galaxy install -r ./collections/requirements.yml
cp -R inventory/sample inventory/my-cluster
Edit inventory/my-cluster/hosts.ini
Modify this file to match your cluster’s IP addresses. Example:
[master]
10.57.57.30
10.57.57.31
10.57.57.32

[node]
10.57.57.33
10.57.57.34
10.57.57.35

[k3s_cluster:children]
master
node
Edit inventory/my-cluster/group_vars/all.yml
Some critical fields to modify:
ansible_user:#
- The default VM user is ubuntu, with sudo privileges.
system_timezone:#
- Set this to your local timezone (e.g., Europe/Bucharest).
Networking (Calico vs. Flannel):#
- Comment out #flannel_iface: eth0 and use calico_iface: "eth0" for better network policies.
- Flannel is the simpler alternative if you prefer an easier setup.
apiserver_endpoint: 10.57.57.100:#
- Ensure this is an unused IP in your local network.
- It serves as the VIP (Virtual IP) for the k3s control plane.
k3s_token:#
- Use any alphanumeric string.
metal_lb_ip_range:#
- 10.57.57.80-10.57.57.90
- Make sure the range belongs to your local network (LAN), is not already in use by other network services, and sits outside your DHCP pool to avoid conflicts.
- This setup enables exposing K3s container services to your network, similar to how Docker ports are exposed on the host IP.
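Put together, the relevant fragment of my all.yml ends up looking roughly like this; treat it as a hedged sketch covering only the fields discussed above, with a placeholder token:
ansible_user: ubuntu
system_timezone: "Europe/Bucharest"

# flannel_iface: eth0
calico_iface: "eth0"

apiserver_endpoint: "10.57.57.100"
k3s_token: "replace-with-your-own-alphanumeric-string"

metal_lb_ip_range: "10.57.57.80-10.57.57.90"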
Deploy the Cluster#
Run the following command to deploy the cluster:
ansible-playbook ./site.yml -i ./inventory/my-cluster/hosts.ini
Once the playbook execution completes, you can verify the cluster’s status:
# Copy the kubeconfig file from the first master node
scp ubuntu@10.57.57.30:~/.kube/config .

# Move it to the correct location
mkdir -p ~/.kube
mv config ~/.kube/

# Check if the cluster nodes are properly registered
kubectl get nodes
If the setup was successful, kubectl get nodes should display the cluster’s nodes and their statuses.
What’s Next?#
With K3s successfully deployed, the next steps involve setting up additional tools such as Rancher, Traefik, and Longhorn for cluster management, ingress control, and persistent storage.
Chapter 4: K3S Apps Deployment#
Deploying Traefik#
Install Helm Package Manager for Kubernetes#
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
Create Namespace for Traefik#
kubectl create namespace traefik
Add Helm Repository and Update#
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
Clone TechnoTim Launchpad Repository#
git clone https://github.com/techno-tim/launchpad
Configure values.yaml for Traefik#
Open the launchpad/kubernetes/traefik-cert-manager/ directory and check values.yaml. Most configurations are already set; you only need to specify the IP for the LoadBalancer service. Choose an IP from the MetalLB range defined in your setup here.
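For reference, the fragment that usually needs attention looks roughly like this; it is a hedged sketch of the chart's service section rather than the full file, so verify it against the values.yaml in the repo:
service:
  enabled: true
  type: LoadBalancer
  spec:
    loadBalancerIP: 10.57.57.80   # an address from the MetalLB range defined earlier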
Install Traefik Using Helm#
helm install --namespace=traefik traefik traefik/traefik --values=values.yaml
Verify Deployment#
kubectl get svc --all-namespaces -o wide
Expected output:
NAMESPACE       NAME           TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                    AGE     SELECTOR
calico-system   calico-typha   ClusterIP      10.43.80.131   <none>        5473/TCP                                   2d20h   k8s-app=calico-typha
traefik         traefik        LoadBalancer   10.43.185.67   10.57.57.80   80:32195/TCP,443:31598/TCP,443:31598/UDP   53s     app.kubernetes.io/instance=traefik,app.kubernetes.io/name=traefik
Apply Middleware#
kubectl apply -f default-headers.yaml
kubectl get middleware
Expected output:
NAME AGE
default-headers 4s
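For context, default-headers.yaml in the launchpad repo defines a Traefik Middleware that injects common security headers. A hedged sketch of such a manifest is shown below; the exact options and namespace may differ in the repo:
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: default-headers
  namespace: default        # the repo's file may target a different namespace
spec:
  headers:
    browserXssFilter: true
    contentTypeNosniff: true
    forceSTSHeader: true
    stsIncludeSubdomains: true
    stsPreload: true
    stsSeconds: 15552000
    customFrameOptionsValue: SAMEORIGIN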
Deploying Traefik Dashboard#
Install htpasswd#
sudo apt-get update
sudo apt-get install apache2-utils
Generate a Base64-Encoded Credential#
htpasswd -nb merox password | openssl base64
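The base64 string you get here goes into secret-dashboard.yaml in the traefik/dashboard folder. A hedged sketch is shown below; the secret name is illustrative and must match whatever the dashboard middleware references:
apiVersion: v1
kind: Secret
metadata:
  name: traefik-dashboard-auth   # illustrative name; keep whatever the repo file uses
  namespace: traefik
type: Opaque
data:
  users: <paste the base64 output from the htpasswd command here>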
Configure DNS Resolver#
Ensure that your DNS server resolves the dashboard hostname to the MetalLB IP specified in values.yaml here (the Traefik LoadBalancer IP, 10.57.57.80 in this setup). For example, add a Host Override in the pfSense DNS Resolver for the hostname used by the dashboard route:
routes:
  - match: Host(`traefik.k3s.merox.dev`)
Apply Kubernetes Resources#
From the traefik/dashboard folder:
kubectl apply -f secret-dashboard.yaml
kubectl get secrets --namespace traefik
kubectl apply -f middleware.yaml
kubectl apply -f ingress.yaml
At this point, you should be able to access the DNS entry you created. However, it will use a self-signed SSL certificate generated by Traefik. In the next steps, we will configure Let’s Encrypt certificates using Cloudflare as the provider.
Deploying Cert-Manager#
The following files are located in the traefik-cert-manager/cert-manager folder.
Add Jetstack Helm Repository#
helm repo add jetstack https://charts.jetstack.io
helm repo update
Create Namespace for Cert-Manager#
kubectl create namespace cert-manager
Apply CRDs (Custom Resource Definitions)#
Note: Ensure you use the latest version of Cert-Manager.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.0/cert-manager.crds.yaml
Install Cert-Manager Using Helm#
helm install cert-manager jetstack/cert-manager --namespace cert-manager --values=values.yaml --version v1.17.0
Apply Cloudflare API Secret#
Make sure you generate the correct API token if using Cloudflare (use an API Token, not a global key).
kubectl apply -f issuers/secret-cf-token.yaml
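As a hedged sketch, secret-cf-token.yaml typically boils down to something like the following; the secret and key names are illustrative and must match what the issuer references:
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-token-secret   # illustrative; keep the name your issuer references
  namespace: cert-manager
type: Opaque
stringData:
  cloudflare-token: <your-cloudflare-api-token>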
Deploy Production Certificates#
Edit these fields first:
- issuers/letsencrypt-production.yaml: email, dnsZones
- certificates/production/your-domain-com.yaml: name, secretName, commonName, dnsNames
kubectl apply -f issuers/letsencrypt-production.yaml
kubectl apply -f certificates/production/your-domain-com.yaml
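For orientation, a production issuer backed by the Cloudflare DNS-01 solver generally looks like the hedged sketch below; the email, zone, and secret reference are placeholders you adapt to your own files:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com              # your email
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-token-secret   # must match the secret applied above
              key: cloudflare-token
        selector:
          dnsZones:
            - "example.com"                   # your dnsZones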
Verify Logs and Challenges#
kubectl logs -n cert-manager -f cert-manager-(your-instance-name)
kubectl get challenges
With these steps completed, your K3s cluster now runs Traefik as an ingress controller, supports HTTPS with Let’s Encrypt, and manages certificates automatically. This setup ensures secure traffic routing and efficient load balancing for your Kubernetes applications.
Deploying Rancher#
Add Rancher Helm Repository and Create Namespace#
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
kubectl create namespace cattle-system
Since Traefik is already deployed, Rancher will utilize it for ingress. Deploy Rancher with Helm:
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.k3s.merox.dev \
  --set tls=external \
  --set replicas=3
Create Ingress for Rancher#
Create an ingress.yml file with the following configuration:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: rancher
  namespace: cattle-system
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`rancher.k3s.merox.dev`)
      kind: Rule
      services:
        - name: rancher
          port: 443
      middlewares:
        - name: default-headers
  tls:
    secretName: k3s-merox-dev-tls
Apply the ingress configuration:
kubectl apply -f ingress.yml
Now, you should be able to manage your cluster from https://rancher.k3s.merox.dev.
Deploying Longhorn#
If you want distributed, cloud-native block storage shared across your nodes, follow these steps:
Install Required Packages#
Run this only on the VMs where you want to deploy Longhorn:
sudo apt update && sudo apt install -y open-iscsi nfs-common
Enable iSCSI#
sudo systemctl enable iscsid
sudo systemctl start iscsid
Add Longhorn Label on Nodes#
A minimum of three nodes is required for High Availability. In this setup, we will use the three worker nodes:
kubectl label node k3s-worker-1 storage.longhorn.io/node=true
kubectl label node k3s-worker-2 storage.longhorn.io/node=true
kubectl label node k3s-worker-3 storage.longhorn.io/node=true
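You can confirm the labels landed on the intended nodes with:
kubectl get nodes -l storage.longhorn.io/node=true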
Deploy Longhorn#
This manifest has been modified to use the storage.longhorn.io/node=true label:
kubectl apply -f https://raw.githubusercontent.com/mer0x/merox.docs/refs/heads/master/K3S/cluster-deployment/longhorn.yaml
Verify Deployment#
kubectl get pods --namespace longhorn-system --watch
Print Confirmation#
kubectl get nodes
kubectl get svc -n longhorn-system
Exposing Longhorn with Traefik#
Create Middleware Configuration#
Create a middleware.yml file:
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: longhorn-headers
  namespace: longhorn-system
spec:
  headers:
    customRequestHeaders:
      X-Forwarded-Proto: "https"
Setup Ingress#
Create an ingress.yml file:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.middlewares: longhorn-system-longhorn-headers@kubernetescrd
spec:
  rules:
    - host: storage.k3s.merox.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
  tls:
    - hosts:
        - storage.k3s.merox.dev
      secretName: k3s-merox-dev-tls
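Apply both files, and the Longhorn UI should become reachable at the hostname configured above:
kubectl apply -f middleware.yml
kubectl apply -f ingress.yml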
Using NFS Storage#
If you want to use NFS storage in your cluster, follow this guide: Merox Docs - NFS Storage Guide
Monitoring Your Cluster#
A great monitoring tool for your cluster is Netdata.
You can also try deploying Prometheus and Grafana from Rancher. However, if you don’t fine-tune the setup, you might notice high resource usage due to the large number of queries processed by Prometheus.
Continuous Deployment with ArgoCD#
ArgoCD is an excellent tool for continuous deployment. You can find more details here.
Upgrading Your Cluster#
If you need to upgrade your cluster, I put some notes here: How to Upgrade K3s.
Final Thoughts#
When I first deployed a K3s/RKE2 cluster about a year ago, I struggled to find a single source of documentation that covered everything needed, at least for a homelab if not for production use. Since I couldn’t find anything comprehensive, I decided to write this article to consolidate all the necessary information in one place.
If this guide helped you and you’d like to see more information added, please leave a comment, and I will do my best to update this post.
How Have You Deployed Your Clusters?#
Let me know in the comments!