Rancher on K3s
This post provides instructions on how to create a K3s Kubernetes cluster and deploy Rancher Manager on it, along with other applications useful in any on-prem cluster that hosts complex solutions: cert-manager to manage certificates (including enrollment with Let's Encrypt), Longhorn to provide block storage, MinIO to provide object storage through an S3-compatible API, and Zalando's Postgres Operator to provide PostgreSQL databases.
Motivation
Sometimes, I'm asked to provide a solution in an environment that can be managed by our client. Working with major cloud providers like AWS or Azure is not a problem because most of the time it is the client who provides the subscription or the cluster, and we deploy on it, so there is no need for a real handover. In those cases, the client already has experience managing clusters in the cloud.
But there are cases where the client wants to host the solution in their own datacenter. For those cases, the solution should be understandable and easy to manage, and that's where Rancher Manager shows its might.
Rancher Manager provides a friendly interface where people with less experience can check the status of the workloads, the cluster nodes and even update the underlying kubernetes version. That’s why I wanted to have a way to automate the deployments.
This is going to be a boring post, with a lot of text, some lines of YAML and maybe no screenshots at all.
Provisioning
Provisioning the VMs is done using Vagrant as usual; in fact, I'm reusing the Vagrantfile and the provisioning from other posts, but with one difference: I had to create a box from the official cloud images.
The process for creating a Vagrant box from the cloud image is well documented in Paul Beltrani's post, so I won't go deep into it here, but the steps are pretty simple (sketched right after the list):
- Download cloud image
- Untar OVA
- Rename .ovf file to box.ovf
- Create a metadata file
- Tar the contents in a new OVA
- Register the box in Vagrant
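A condensed sketch of those steps, under my own assumptions (the image URL and box names are examples; see Paul Beltrani's post for the authoritative walkthrough):

# download an Ubuntu cloud image in OVA format (example URL)
curl -LO https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.ova
# unpack it, rename the .ovf and add the Vagrant metadata file
mkdir box && tar -xf noble-server-cloudimg-amd64.ova -C box
mv box/*.ovf box/box.ovf
echo '{"provider": "virtualbox"}' > box/metadata.json
# repack everything as a box and register it with Vagrant
tar -czf noble-cloudimg.box -C box .
vagrant box add --name ubuntu-noble-cloudimg noble-cloudimg.box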
I didn't include the Vagrant user in the box, so, as indicated in the post, I had to create a cloud-init user-data.yaml file which creates the user with the default SSH key.
config.vm.cloud_init content_type: "text/cloud-config", path: "./user-data.yaml"
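A minimal sketch of what that user-data.yaml could contain (the key line is a placeholder; substitute the Vagrant insecure public key or your own):

#cloud-config
users:
  - name: vagrant
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      # placeholder: paste the Vagrant insecure public key (or your own key) here
      - ssh-rsa AAAA... vagrant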
After using the image for a while, I discovered the base system image didn't provide enough space, so I had to instruct Vagrant to increase the size of the root disk by adding this to the Vagrantfile, inside the loop that creates the VMs:
node.vm.disk :disk, size: "20GB", primary: true
Now, Vagrant will expand the root disk when creating the VMs.
Install K3S
Installing K3s is usually done using the shell script, which takes care of creating the systemd k3s.service file and starting it. The k3s executable is self-contained and can run a cluster on its own, but that doesn't persist across reboots, so I invoked the installation script.
As I said, this deployment is opinionated and focused on having something maintainable and easy to use, so I decided to disable traefik and servicelb.
I’ve created a role that will:
- Download the installation script
- Copy the installation script to the targets
- Download the k3s binary
- Copy the k3s binary to the targets.
- Install k3s on the first control plane node
- Install k3s on the rest of the control plane nodes
- Fetch the k3s kubeconfig file
- Fix the kubeconfig file to point to the load balancer instead of localhost
Downloading and copying files holds no secrets in Ansible (a sketch of those tasks follows), so let's focus on the k3s installation.
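For completeness, the download and copy steps could look roughly like this. This is a sketch under my assumptions: the destination paths and the k3s-installer name are taken from the install tasks below, and the URLs are the upstream defaults, not necessarily what the role uses.

- name: Download the k3s installation script to the control host
  ansible.builtin.get_url:
    url: https://get.k3s.io
    dest: /tmp/k3s-installer
    mode: "0755"
  delegate_to: localhost
  become: false
  run_once: true

- name: Download the k3s binary to the control host, verifying its checksum
  ansible.builtin.get_url:
    url: "https://github.com/k3s-io/k3s/releases/download/{{ k3s_version }}/k3s"
    checksum: "{{ k3s_sha256 }}"
    dest: /tmp/k3s
    mode: "0755"
  delegate_to: localhost
  become: false
  run_once: true

- name: Copy the installer and the binary to the targets
  ansible.builtin.copy:
    src: "{{ item.src }}"
    dest: "{{ item.dest }}"
    mode: "0755"
  loop:
    - { src: /tmp/k3s-installer, dest: /usr/local/bin/k3s-installer }
    - { src: /tmp/k3s, dest: /usr/local/bin/k3s }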
This is the task for installing k3s on the first node:
- name: Install K3s on the first control plane node
  shell: >
    k3s-installer server
    --cluster-init
    --token {{ k3s_token }}
    --disable traefik
    --disable servicelb
    --tls-san {{ lb_ip }}
    --tls-san {{ lb_hostname }}
    --advertise-address {{ ansible_host }}
    --default-local-storage-path /data/k3s
  environment:
    INSTALL_K3S_SKIP_DOWNLOAD: "true"
  any_errors_fatal: true
If the task fails, it will abort the complete playbook, as without the first node, there won’t be any cluster.
Other nodes will use a slightly different task, as they will connect to the first one to init themselves:
- name: Install K3s on other control plane nodes
  shell: >
    k3s-installer server
    --server https://{{ lb_hostname }}:6443
    --token {{ k3s_token }}
    --disable traefik
    --disable servicelb
    --advertise-address {{ ansible_host }}
    --default-local-storage-path /data/k3s
  environment:
    INSTALL_K3S_SKIP_DOWNLOAD: "true"
In case this task fails, the playbook can continue because a single node cluster is still a cluster, and more nodes can be added later.
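The last two steps of the role, fetching the kubeconfig and pointing it at the load balancer, could look roughly like this (a sketch under my assumptions about paths; /etc/rancher/k3s/k3s.yaml is where k3s writes its kubeconfig, and the regexp assumes the default 127.0.0.1 server address):

- name: Fetch the k3s kubeconfig file from the first control plane node
  ansible.builtin.fetch:
    src: /etc/rancher/k3s/k3s.yaml
    dest: "{{ kubeconfig }}"
    flat: true
  run_once: true

- name: Point the kubeconfig to the load balancer instead of localhost
  ansible.builtin.replace:
    path: "{{ kubeconfig }}"
    regexp: 'https://127\.0\.0\.1:6443'
    replace: "https://{{ lb_hostname }}:6443"
  delegate_to: localhost
  become: false
  run_once: true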
This role is invoked as its own play in the playbook, because these tasks run on the target hosts while the component installation runs on the control host.
The play looks like follows:
- name: Setup K3s Cluster with Rancher
  hosts: all
  gather_facts: true
  become: true
  vars:
    lb_ip: "192.168.123.100"
    lb_hostname: "anthrax.garmo.local"
    kubeconfig: ~/.kube/anthrax.yaml
    k3s_version: "v1.32.8+k3s1"
    k3s_sha256: "sha256:639a113cfc5465082fd98ff2947a3d962039f78adddefbf4a4aecf4a1f485e79"
  tasks:
    - name: Include K3s installation tasks
      import_role:
        name: setup-k3s
      tags:
        - setup-k3s
In this example I've overridden the k3s version and its sha256sum because I wanted to install a specific version, different from the one in the role.
Install Rancher and other components
The second play in the playbook installs several components that we will need to deploy applications in the cluster; it's defined as:
- name: Setup Rancher on K3s
  hosts: localhost
  gather_facts: false
  become: false
  vars:
    kubeconfig: ~/.kube/anthrax.yaml
  tasks:
    - name: Include Rancher installation tasks
      import_role:
        name: rancher-on-k3s
      tags:
        - rancher-on-k3s
Install cert-manager
cert-manager allows us to create custom certificates. It can create self-signed certificates and manage Let's Encrypt certificates, but using it as an issuing CA is a particularly powerful feature.
In my case, as I already have a custom Root CA trusted by my personal computer, I used EasyRSA to create a SubCA for this project, so all certificates issued in this cluster will be signed with that CA.
But first, we use the kubernetes.core.helm_repository Ansible module to register the Jetstack chart repository and kubernetes.core.helm to install a release of the cert-manager Helm chart.
- name: Add cert-manager Helm repository
  kubernetes.core.helm_repository:
    name: jetstack
    repo_url: https://charts.jetstack.io
  tags:
    - cert-manager

- name: Install/Upgrade cert-manager
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: cert-manager
    chart_ref: jetstack/cert-manager
    chart_version: "{{ certmanager_chart_version }}"
    namespace: cert-manager
    create_namespace: true
    values:
      installCRDs: true
    wait: true
    wait_timeout: 600s
  tags:
    - cert-manager
This is the same process we'll follow to install every component that is distributed as a Helm chart.
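All of those chart_version values come from variables; a hedged sketch of how they might be declared (the version strings below are placeholders, not the versions I pinned, while certmanager_cluster_issuer_name matches the ClusterIssuer shown later):

certmanager_chart_version: "v1.x.y"             # placeholder, pin a real chart version
ingress_nginx_chart_version: "4.x.y"            # placeholder
rancher_chart_version: "2.x.y"                  # placeholder
longhorn_chart_version: "1.x.y"                 # placeholder
minio_chart_version: "5.x.y"                    # placeholder
postgres_operator_chart_version: "1.x.y"        # placeholder
certmanager_cluster_issuer_name: rancher-subca  # name used for the ClusterIssuer and its secret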
Then, to create a subCA using EasyRSA, do the following:
cd ~/pki #Work directory of the existing pki
easyrsa gen-req rancher-subca nopass
easyrsa sign-req ca rancher-subca
Your subCA's cert and key will then be in pki/issued/rancher-subca.crt and pki/private/rancher-subca.key. Create an ansible-vault encrypted variables.yaml file and put their contents in the certmanager_subca_crt and certmanager_subca_key variables respectively.
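A sketch of the shape of that vault file, created for example with "ansible-vault create variables.yaml" (the PEM bodies below are placeholders for the actual file contents):

certmanager_subca_crt: |
  -----BEGIN CERTIFICATE-----
  ...contents of pki/issued/rancher-subca.crt...
  -----END CERTIFICATE-----
certmanager_subca_key: |
  -----BEGIN PRIVATE KEY-----
  ...contents of pki/private/rancher-subca.key...
  -----END PRIVATE KEY-----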
Those variables are then used to create our cluster issuer in the following tasks:
- name: Create secret for Rancher cluster issuer CAs.
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: "{{ certmanager_cluster_issuer_name }}"
        namespace: cert-manager
      type: kubernetes.io/tls
      data:
        tls.crt: "{{ certmanager_subca_crt | b64encode }}"
        tls.key: "{{ certmanager_subca_key | b64encode }}"
  tags:
    - cert-manager

- name: Create ClusterIssuer for Rancher
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: ClusterIssuer
      metadata:
        name: "{{ certmanager_cluster_issuer_name }}"
      spec:
        ca:
          secretName: "{{ certmanager_cluster_issuer_name }}"
  tags:
    - cert-manager
We can check the status of the ClusterIssuer:
juanjo@lab pki $ kubectl get clusterissuer
NAME            READY   AGE
rancher-subca   True    74m
Install nginx ingress
Installing the nginx ingress controller is basically the same as described in the official instructions, but done with Ansible. In this case, I already have the load balancer pointing to specific node ports, so the role ensures that the service is defined as NodePort and that the right ports are used.
- name: Add ingress-nginx Helm repository
  kubernetes.core.helm_repository:
    name: ingress-nginx
    repo_url: https://kubernetes.github.io/ingress-nginx
  tags:
    - ingress-nginx

- name: Install/Upgrade ingress-nginx
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: ingress-nginx
    chart_ref: ingress-nginx/ingress-nginx
    chart_version: "{{ ingress_nginx_chart_version }}"
    namespace: kube-system
    values:
      controller:
        service:
          type: NodePort
          nodePorts:
            http: "{{ ingress_nginx_nodeport_http }}"
            https: "{{ ingress_nginx_nodeport_https }}"
    wait: true
    wait_timeout: 600s
  tags:
    - ingress-nginx
Install Rancher Manager
Rancher Manager is a very complex piece of software, but it comes packaged in a Helm chart that makes installing it a breeze. The default values file is immense, yet very few values need to be changed from the defaults for it to work.
- name: Install/Upgrade Rancher
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: rancher
    chart_ref: rancher-stable/rancher
    chart_version: "{{ rancher_chart_version }}"
    namespace: cattle-system
    create_namespace: false
    values:
      hostname: "{{ rancher_hostname }}"
      bootstrapPassword: "{{ rancher_bootstrap_password }}"
      replicas: 1
      auditLog:
        enabled: true
        maxBackup: 10
      ingress:
        ingressClassName: nginx
        tls:
          source: secret
          secretName: tls-rancher-ingress
        annotations:
          kubernetes.io/ingress.class: nginx
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
      privateCA: true
      additionalTrustedCAs: false
But, as we're using a private CA, we also need to create a secret for it. I put the root CA in a file under files/tls/anthraxca.crt, but you can set the rancher_privateca_pem variable directly in your variables file; either way, the secret is created by this task:
- name: Create secret for Private CAs.
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: tls-ca
        namespace: cattle-system
      type: Opaque
      data:
        cacerts.pem: "{{ rancher_privateca_pem | b64encode }}"
  tags:
    - rancher
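For reference, one way to populate rancher_privateca_pem from that file is a file lookup in the role or play variables; this is a sketch that assumes the lookup runs inside the role, so relative paths resolve against its files/ directory:

# read the root CA PEM from files/tls/anthraxca.crt into the variable used by the task above
rancher_privateca_pem: "{{ lookup('ansible.builtin.file', 'tls/anthraxca.crt') }}"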
The Rancher Helm chart requires the TLS secret to exist before installing the release, so it's created from our subCA by the following task:
- name: Create secret for Rancher ingress TLS.
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: Certificate
      metadata:
        name: tls-rancher-ingress
        namespace: cattle-system
      spec:
        secretName: tls-rancher-ingress
        dnsNames:
          - "{{ rancher_hostname }}"
        commonName: rancher.cattle-system.svc.cluster.local
        issuerRef:
          name: "{{ certmanager_cluster_issuer_name }}"
          kind: ClusterIssuer
        duration: 24h
        renewBefore: 1h
  tags:
    - rancher
The certificate lifespan is set to a very short value, but this is for demonstration purposes.
Install Longhorn
Longhorn uses the available space on the cluster nodes to provide Persistent Volumes for Kubernetes workloads, provides replication to ensure data availability, and supports volume snapshots for recovering data.
It integrates with Rancher Manager, providing a visual representation of the volumes' state and easing some maintenance tasks like removing dead replicas.
The tasks that deploy it are as usual: registering the helm chart repository and installing the release:
- name: Add longhorn Helm repository
  kubernetes.core.helm_repository:
    name: longhorn
    repo_url: https://charts.longhorn.io
  tags:
    - longhorn

- name: Install/Upgrade Longhorn
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: longhorn
    chart_ref: longhorn/longhorn
    chart_version: "{{ longhorn_chart_version }}"
    namespace: longhorn-system
    create_namespace: true
    values:
      ingress:
        enabled: true
        ingressClassName: nginx
        host: "{{ longhorn_hostname }}"
        tls: true
        tlsSecret: longhorn-rancher-tls
        annotations:
          kubernetes.io/ingress.class: nginx
          kubernetes.io/tls-acme: "true"
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
      defaultSettings:
        defaultDataPath: /data/longhorn
    wait: true
    wait_timeout: 600s
  tags:
    - longhorn
In this case, we're instructing the chart to use our secondary disk mounted on /data to store the volumes, preventing them from eating up all the space on the root disk. We're also using our cluster issuer to issue a certificate signed by our subCA.
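Once Longhorn is installed, workloads can request storage through the longhorn storage class; a minimal sketch of such a claim (the claim name and size are just examples):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data          # example name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn  # storage class provided by the Longhorn chart
  resources:
    requests:
      storage: 5Gi            # example size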
Install MinIO
MinIO provides S3-compatible object storage. It mimics not only the API, so any language with S3 client libraries can use MinIO, but also the policy syntax, so any S3 administrator is able to manage MinIO policies.
As MinIO encrypts all traffic among its replicas, it needs a certificate to exist before being deployed. Again, we use our cluster issuer:
- name: Create a secret for minio tls using cert-manager
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: Certificate
      metadata:
        name: minio-tls
        namespace: minio
      spec:
        secretName: minio-tls
        dnsNames:
          - minio.minio.svc.cluster.local
          - "*.minio-svc.minio.svc"
          - "*.minio-svc.minio.svc.cluster.local"
        commonName: minio.minio.svc.cluster.local
        issuerRef:
          name: "{{ certmanager_cluster_issuer_name }}"
          kind: ClusterIssuer
        duration: 24h
        renewBefore: 1h
  tags:
    - minio
The installation tasks are:
- name: Add minio helm repository
  kubernetes.core.helm_repository:
    name: minio
    repo_url: https://charts.min.io/
  tags:
    - minio

- name: Install/Upgrade MinIO
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: minio
    chart_ref: minio/minio
    chart_version: "{{ minio_chart_version }}"
    namespace: minio
    values:
      replicas: 2
      tls:
        enabled: true
        certSecret: minio-tls
        publicCrt: tls.crt
        privateKey: tls.key
      persistence:
        enabled: true
        storageClass: "longhorn"
        size: "{{ minio_storage_size }}"
      ingress:
        enabled: true
        annotations:
          nginx.ingress.kubernetes.io/secure-backends: "true"
          nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          kubernetes.io/ingress.class: nginx
        ingressClassName: nginx
        hosts:
          - "{{ minio_hostname }}"
        tls:
          - secretName: minio-ingress-tls
            hosts:
              - "{{ minio_hostname }}"
      consoleIngress:
        enabled: true
        annotations:
          nginx.ingress.kubernetes.io/secure-backends: "true"
          nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          kubernetes.io/ingress.class: nginx
        ingressClassName: nginx
        hosts:
          - "{{ minio_console_hostname }}"
        tls:
          - secretName: minio-console-ingress-tls
            hosts:
              - "{{ minio_console_hostname }}"
      resources:
        requests:
          memory: 1Gi
    wait: true
    wait_timeout: 600s
  tags:
    - minio
For MinIO, two ingresses are defined: one for the API and another for the management console.
MinIO will use longhorn as the storage class for its volumes, so we can have several replication layers here: at the object level, defined in MinIO, and at the volume level, defined in Longhorn. In my experience, we can configure MinIO buckets without replication and rely on Longhorn replication instead.
Install Zalando’s postgres-operator
Zalando's postgres operator allows us to quickly deploy PostgreSQL database clusters, with automatic failover, backups to an S3-compatible bucket and restores from it. We will use MinIO for that in our cluster.
Its installation is pretty clean as we will use the default values:
- name: Add postgres-operator Helm repository
  kubernetes.core.helm_repository:
    name: postgres-operator-charts
    repo_url: https://opensource.zalando.com/postgres-operator/charts/postgres-operator
  tags:
    - postgres-operator

- name: Install/Upgrade Postgres Operator
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: postgres-operator
    chart_ref: postgres-operator-charts/postgres-operator
    chart_version: "{{ postgres_operator_chart_version }}"
    namespace: postgres-operator
    create_namespace: true
  tags:
    - postgres-operator
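With the operator running, a database cluster is requested through a postgresql custom resource. A minimal sketch, with names, sizes and version chosen as examples (see the operator quickstart in the references for the full spec):

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-example          # example cluster name; "acid" is the default team id prefix
  namespace: default
spec:
  teamId: "acid"
  numberOfInstances: 2        # one leader plus one replica, with automatic failover
  postgresql:
    version: "16"             # example major version
  volume:
    size: 5Gi                 # backed by Longhorn through the default storage class
  users:
    example_owner:            # example role created in the cluster
      - superuser
      - createdb
  databases:
    exampledb: example_owner  # database name -> owner role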
Running the playbook
First, provision the VMs:
vagrant up
As I said before, I put the sensitive information in an ansible-vault variables file, so to run the playbook I did:
ansible-playbook site.yaml -e @variables.yaml --ask-vault-password
I was prompted for the password and, in about ten minutes, the playbook completed and my Rancher Manager instance was available at the URL I defined, with all the components working.
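Since every task is tagged, individual components can also be (re)installed on their own; for example, to run only the Longhorn tasks (any of the tags shown above works the same way):

ansible-playbook site.yaml -e @variables.yaml --ask-vault-password --tags longhorn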
Conclusions
This cluster provides the basic features usually needed in an on-prem cluster to deploy complex solutions, for example a Django application which uses a PostgreSQL database and stores files in S3-compatible object storage. The application can also provide a React frontend running in a separate container, with both exposed over HTTPS using certificates issued by a subCA.
Future works
This cluster just works, but there are some nice additions that could improve it and make it more production friendly, so the following tasks would be:
- Adding support for SOPS secrets.
- Adding keycloak for identity management.
- Adding continuous delivery with ArgoCD.
- Adding monitoring with grafana stack.
References
- K3S https://k3s.io
- Rancher Manager https://rancher.com
- cert-manager https://cert-manager.io
- Longhorn https://longhorn.io
- MinIO https://minio.io
- Zalando’s Postgres Operator https://opensource.zalando.com/postgres-operator/docs/quickstart.html
- Files in my homelab repo https://github.com/juanjo-vlc/homelab/tree/main/k8s-rancher
- Vagrant Boxes from Official Ubuntu Cloud Images, by Paul Beltrani https://beltrani.com/vagrant-boxes-from-official-ubuntu-cloud-images/