HA Kubernetes Cluster

My first setup of a Kubernetes cluster lacked HA features because I didn't completely understand the procedure and ran straight into the abyss. With a little more experience on K8s, and after rereading the official guide, this time I managed to set it up correctly.

Setting up the infrastructure

I started with the same infrastructure as my first attempt. That is a huge benefit of Infrastructure as Code: you can reuse your definitions and spin up an environment in seconds, and that is exactly what I did.

But for setting up an HA Kubernetes cluster I needed a load balancer, and the fastest way for me was running a container with HAProxy on my host machine using a very basic configuration file. I changed the example to TCP mode so the TLS connections were tunneled straight to the API servers and their certificates remained valid.

My haproxy.cfg:

global
defaults
	# explicit TCP mode so TLS connections are passed through to the API servers untouched
	mode			tcp
	timeout client		30s
	timeout server		30s
	timeout connect		30s

frontend MyFrontend
	bind 0.0.0.0:6443
	default_backend		TransparentBack

backend TransparentBack
	mode			tcp
	server			node1 192.168.123.201:6443 check port 6443
	server			node2 192.168.123.202:6443 check port 6443
	server			node3 192.168.123.203:6443 check port 6443

And I ran it using podman:

podman run -p 6443:6443 -v ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro -it haproxy
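
For anything longer than a quick test, the same container can run detached with a restart policy. A minimal sketch (the container name is my own choice, the rest are standard podman flags):

podman run -d --name k8s-apiserver-lb --restart=always \
	-p 6443:6443 \
	-v ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
	haproxy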

I made a couple of changes to my Ansible playbook. First, I added my host machine's network address to each node's /etc/hosts file so they could resolve the load balancer's name (a sketch of that task follows the diff below). Second, as suggested on this Stack Overflow thread, I added a couple of parameters to the kubelet service by editing the Jinja2 template 09-extra-args.conf.j2:

--- k8s/templates/09-extra-args.conf.j2	2021-08-23 17:32:57.752915581 +0200
+++ k8sha/templates/09-extra-args.conf.j2	2021-10-30 17:55:01.641828075 +0200
@@ -1,3 +1,3 @@
 # {{ ansible_managed }}
 [Service]
-Environment=KUBELET_EXTRA_ARGS="--address={{ ansible_facts[internalip_iface]['ipv4']['address'] }} --node-ip={{ ansible_facts[internalip_iface]['ipv4']['address'] }}"
+Environment=KUBELET_EXTRA_ARGS="--address={{ ansible_facts[internalip_iface]['ipv4']['address'] }} --node-ip={{ ansible_facts[internalip_iface]['ipv4']['address'] }} --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
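
For the first change, the playbook task boils down to adding one line per node. A rough ad-hoc equivalent with the lineinfile module, assuming the balancer runs on my host machine at 192.168.123.100 (the same address used for the registry later on):

# add the load balancer name to every node's /etc/hosts
ansible all -m lineinfile -a "path=/etc/hosts line='192.168.123.100 anthrax.garmo.local'"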

Setting up the first master

Once all nodes were up and configured at the OS level, it was time to set up the first master. The script from my first attempt could not configure a multi-master cluster, so I used the command-line parameters directly:

kubeadm init --upload-certs --node-name=k8snode1 --control-plane-endpoint anthrax.garmo.local:6443 --apiserver-advertise-address 10.255.255.101

As Vagrant forced the use of a NAT network as the primary interface, I had to specify the address of the private network. Without this parameter both the API server and etcd tried to advertise the 10.0.2.15 address, which not only was unreachable from the other VMs, but was also assigned to every VM's own primary interface.
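
To double-check that the flag did its job, the advertised addresses can be read straight from the static pod manifests that kubeadm writes under /etc/kubernetes/manifests:

# on the first master
grep advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml
grep advertise-client-urls /etc/kubernetes/manifests/etcd.yaml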

After the few tries that led me to the final kubeadm command above, everything went smoothly and my first master node came up:

...
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join anthrax.garmo.local:6443 --token u75068.un3ci99ox6qsvh5k \
	--discovery-token-ca-cert-hash sha256:f116b5a382265af12636f7b35a3ac454e0e4febc5ce9661163455e06b783c4ea \
	--control-plane --certificate-key 70a5bf2a3c5eb7d73114020483f796365e1f937bbf112bf24cc8ff45452cef66

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join anthrax.garmo.local:6443 --token u75068.un3ci99ox6qsvh5k \
	--discovery-token-ca-cert-hash sha256:f116b5a382265af12636f7b35a3ac454e0e4febc5ce9661163455e06b783c4ea

Installing weave network plugin

The next step was to deploy a network plugin. I chose weave for two reasons:

  1. It has support for network policies.
  2. Its installation is covered in the official documentation, so it can be used in certification exams.

Obviously the first reason was the important one. Here is the installation command, copied directly from the Kubernetes documentation:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
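
Once the DaemonSet is applied, the weave pod should come up on the first master and the node should switch from NotReady to Ready. A quick check:

kubectl -n kube-system get pods -l name=weave-net -o wide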

Configuring additional control plane nodes

As I had to specify the advertise address for each node to prevent them from using the NAT interface, I ran the following commands:

# On master 2
kubeadm join anthrax.garmo.local:6443 --token u75068.un3ci99ox6qsvh5k --discovery-token-ca-cert-hash sha256:f116b5a382265af12636f7b35a3ac454e0e4febc5ce9661163455e06b783c4ea --control-plane --certificate-key 70a5bf2a3c5eb7d73114020483f796365e1f937bbf112bf24cc8ff45452cef66 --apiserver-advertise-address 10.255.255.102

# On master 3
kubeadm join anthrax.garmo.local:6443 --token u75068.un3ci99ox6qsvh5k --discovery-token-ca-cert-hash sha256:f116b5a382265af12636f7b35a3ac454e0e4febc5ce9661163455e06b783c4ea --control-plane --certificate-key 70a5bf2a3c5eb7d73114020483f796365e1f937bbf112bf24cc8ff45452cef66 --apiserver-advertise-address 10.255.255.103

The only difference was the --apiserver-advertise-address value.
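
With three control-plane nodes joined, it is worth confirming that etcd also sees three members. A hedged sketch, using the certificate paths kubeadm creates under /etc/kubernetes/pki/etcd and the etcd pod named after my first node:

kubectl -n kube-system exec etcd-k8snode1 -- etcdctl \
	--endpoints=https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/server.crt \
	--key=/etc/kubernetes/pki/etcd/server.key \
	member list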

Configuring worker nodes

After waiting a couple of minutes for my second and third masters to register correctly on the cluster, I joined two additional worker nodes. In this case, as I didn't have to specify the advertise address, I was able to use the command suggested by kubeadm verbatim:

ansible cluster_workers -a 'kubeadm join anthrax.garmo.local:6443 --token u75068.un3ci99ox6qsvh5k --discovery-token-ca-cert-hash sha256:f116b5a382265af12636f7b35a3ac454e0e4febc5ce9661163455e06b783c4ea'
k8snode5 | CHANGED | rc=0 >>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
k8snode4 | CHANGED | rc=0 >>
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
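
As the kubeadm output suggests, a quick listing from the first master should now show all five nodes (they may stay NotReady until the network plugin is healthy everywhere):

kubectl get nodes -o wide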

Troubleshooting network plugin

After all nodes joined the cluster, I noticed the weave pods were failing on all nodes but the first one:

# kubectl get pods -n kube-system -l name=weave-net
NAME              READY   STATUS             RESTARTS        AGE
weave-net-4cz7b   1/2     CrashLoopBackOff   8 (2m ago)      24m
weave-net-dg2fh   1/2     CrashLoopBackOff   7 (4m55s ago)   16m
weave-net-qdhvw   2/2     Running            1 (32m ago)     32m
weave-net-qfz66   1/2     CrashLoopBackOff   8 (84s ago)     22m
weave-net-tw2qq   1/2     CrashLoopBackOff   7 (4m41s ago)   16m

So after googling a bit, I found an issue on weave's GitHub: the problem was caused by the Vagrant network setup (again), and the fix was forcing the route through the private interface:

ansible all -a 'ip r a 10.96.0.0/16 dev eth1'
k8snode1 | CHANGED | rc=0 >>

k8snode4 | CHANGED | rc=0 >>

k8snode2 | CHANGED | rc=0 >>

k8snode5 | CHANGED | rc=0 >>

k8snode3 | CHANGED | rc=0 >>

As this route was not going to survive reboots, I added a new task to my setup playbook:

    - name: Add a custom route for k8s networking
      copy:
        content: |
          network:
            version: 2
            renderer: networkd
            ethernets:
              eth1:
                routes:
                  - to: 10.96.0.0/16
                    via: 0.0.0.0          
        dest: /etc/netplan/99-k8s-routes.yaml
      notify: 'Apply netplan'

and a handler:

    - name: Apply netplan
      command:
        cmd: netplan apply

That made the route persistent across reboots.
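
After a reboot the route can be verified with another ad-hoc command; 10.96.0.1 is the cluster's kubernetes service IP, so it should resolve via eth1:

ansible all -a 'ip route get 10.96.0.1'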

Testing the HA

First I launched a pod with the helloworld image I had built with buildah in my Jenkins post:

# kubectl run helloworld --image 192.168.123.100:8081/juanjovlc2/gunitest:dev
pod/helloworld created

And exposed its port as a NodePort service:

# kubectl expose pod helloworld --target-port=9090 --port 9090 --type NodePort
service/helloworld exposed
# kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
helloworld   NodePort    10.96.147.41   <none>        9090:32599/TCP   4s
kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP          98m

Then I tested it from my host machine:

$ for i in {201..205}; do curl 192.168.123.$i:32599; done
Hello, World!
Hello, World!
Hello, World!
Hello, World!
Hello, World!

Once I was sure everything was working, a simple way to test whether the HA setup actually worked was powering off the first node.
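
In my case that simply meant halting the first VM from the host. A sketch, assuming the Vagrant machine name matches the node name:

vagrant halt k8snode1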

After a short while, the first node was marked as NotReady:

root@k8snode2:/home/vagrant# kubectl get nodes
NAME       STATUS     ROLES                  AGE    VERSION
k8snode1   NotReady   control-plane,master   108m   v1.22.3
k8snode2   Ready      control-plane,master   98m    v1.22.3
k8snode3   Ready      control-plane,master   97m    v1.22.3
k8snode4   Ready      <none>                 91m    v1.22.3
k8snode5   Ready      <none>                 91m    v1.22.3

Then I deleted my helloworld service and pod, and created them again.

root@k8snode2:/home/vagrant# kubectl delete service helloworld
service "helloworld" deleted
root@k8snode2:/home/vagrant# kubectl delete pod helloworld
pod "helloworld" deleted
root@k8snode2:/home/vagrant# kubectl run helloworld --image 192.168.123.100:8081/juanjovlc2/gunitest:dev
pod/helloworld created
root@k8snode2:/home/vagrant# kubectl expose pod helloworld --target-port=9090 --port 9090 --type NodePort
service/helloworld exposed
root@k8snode2:/home/vagrant# kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
helloworld   1/1     Running   0          24s
root@k8snode2:/home/vagrant# kubectl get services
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
helloworld   NodePort    10.108.172.121   <none>        9090:32035/TCP   66s
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP          124m

And it worked even though the cluster was missing one master.

root@k8snode2:/home/vagrant# kubectl get nodes
NAME       STATUS     ROLES                  AGE    VERSION
k8snode1   NotReady   control-plane,master   123m   v1.22.3
k8snode2   Ready      control-plane,master   114m   v1.22.3
k8snode3   Ready      control-plane,master   112m   v1.22.3
k8snode4   Ready      <none>                 106m   v1.22.3
k8snode5   Ready      <none>                 106m   v1.22.3
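
To close the test, the curl loop from before can be repeated against the new NodePort, skipping the powered-off node; a sketch using the port shown above:

for i in {202..205}; do curl 192.168.123.$i:32035; done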

Security warning

I've posted my token and my certificate hash only because my nodes were destroyed before publishing this article. They are sensitive data, especially if you are using cloud providers that give external access to your resources, so KEEP YOUR SECRETS SECRET!

References