Rancher on K3s

This post explains how to create a K3s Kubernetes cluster and deploy Rancher Manager on it, along with other applications useful in any on-prem cluster that hosts complex solutions: cert-manager to manage certificates, including enrollment with Let’s Encrypt; Longhorn to provide block storage; MinIO to provide object storage through an S3-compatible API; and Zalando’s Postgres Operator to provide PostgreSQL databases.

Motivation

Sometimes, I’m asked to deliver a solution in an environment that can be managed by our client. Working with major cloud providers like AWS or Azure is not a problem because, most of the time, it is the client who provides the subscription or the cluster and we deploy on it, so there is no need for a real handover. Also, in those cases, the client already has experience managing clusters in the cloud.

But there are cases where the client wants to host the solution in their own datacenter. In those cases, the solution should be understandable and easy to manage, and that’s where Rancher Manager shows its might.

Rancher Manager provides a friendly interface where people with less experience can check the status of the workloads and the cluster nodes, and even update the underlying Kubernetes version. That’s why I wanted a way to automate the deployments.

This is going to be a boring post, with a lot of text, a few lines of YAML and maybe no screenshots at all.

Provisioning

Provisioning the VMs is done using Vagrant as usual; in fact, I’m reusing the Vagrantfile and the provisioning from other posts, but with one difference: I had to create a box from the official cloud images.

The process for creating a Vagrant box from the cloud image is well documented in Paul Beltrani’s post, so I won’t go deep into it, but the steps are pretty simple:

  1. Download cloud image
  2. Untar OVA
  3. Rename .ovf file to box.ovf
  4. Create a metadata file
  5. Tar the contents in a new OVA
  6. Register the box in Vagrant

I didn’t include the Vagrant user in the box, so, as indicated in the post, I had to create a cloud-init.yaml file which creates the user with the default SSH key and reference it from the Vagrantfile:

config.vm.cloud_init content_type: "text/cloud-config", path: "./user-data.yaml"
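
A minimal user-data.yaml for this could look like the following sketch (this is an assumption about its shape, not the exact file from the post; the public key is a placeholder for the one Vagrant will use to connect):

#cloud-config
users:
  - name: vagrant
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock_passwd: true
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...replace-with-your-public-key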

After using the image for a while, I discovered that the base image didn’t provide enough space, so I had to instruct Vagrant to increase the size of the root disk by adding this to the Vagrantfile, inside the loop that creates the VMs:

node.vm.disk :disk, size: "20GB", primary: true

Now, Vagrant will expand the root disk when creating the VMs.

Install K3S

Installing K3s is usually done using the shell script, which takes care of creating the systemd k3s.service unit and starting it. The k3s executable is self-contained and enough to run a cluster, but on its own it isn’t persisted across reboots, so I invoked the installation script.

As I said, this deployment is opinionated and focused on having something maintainable and easy to use, so I decided to disable Traefik and ServiceLB.

I’ve created a role that will:

  1. Download the installation script
  2. Copy the installation script to the targets
  3. Download the k3s binary
  4. Copy the k3s binary to the targets.
  5. Install k3s on the first control plane node
  6. Install k3s on the rest of the control plane nodes
  7. Fetch the k3s kubeconfig file
  8. Fix the kubeconfig file to point to the load balancer instead of localhost.

Downloading and copying files with Ansible holds no secrets, so let’s focus on the K3s installation.

This is the task for installing K3s on the first node:

  - name: Install K3s on the first control plane node
    shell: >
      k3s-installer server
      --cluster-init
      --token {{ k3s_token }}
      --disable traefik
      --disable servicelb
      --tls-san {{ lb_ip }}
      --tls-san {{ lb_hostname }}
      --advertise-address {{ ansible_host }}
      --default-local-storage-path /data/k3s      
    environment:
      INSTALL_K3S_SKIP_DOWNLOAD: "true"
    any_errors_fatal: true

If this task fails, it will abort the whole playbook, since without the first node there won’t be any cluster.

The other nodes use a slightly different task, as they connect to the first one to join the cluster:

  - name: Install K3s on other control plane nodes
    shell: >
      k3s-installer server 
      --server https://{{ lb_hostname }}:6443
      --token {{ k3s_token }}
      --disable traefik
      --disable servicelb
      --advertise-address {{ ansible_host }}
      --default-local-storage-path /data/k3s      
    environment:
      INSTALL_K3S_SKIP_DOWNLOAD: "true"

If this task fails, the playbook can continue, because a single-node cluster is still a cluster and more nodes can be added later.
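
Steps 7 and 8, fetching the kubeconfig and pointing it at the load balancer, could be sketched like this (an approximation using the play variables shown below; the actual role may do it differently):

  - name: Fetch the kubeconfig from the first control plane node
    fetch:
      src: /etc/rancher/k3s/k3s.yaml
      dest: "{{ kubeconfig }}"
      flat: true
    run_once: true
  - name: Point the kubeconfig at the load balancer instead of localhost
    replace:
      path: "{{ kubeconfig }}"
      regexp: 'https://127\.0\.0\.1:6443'
      replace: "https://{{ lb_hostname }}:6443"
    delegate_to: localhost
    become: false
    run_once: true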

This role is invoked as its own play in the playbook, because these tasks run on the target hosts while the component installation runs on the control host.

The play looks like follows:

- name: Setup K3s Cluster with Rancher
  hosts: all
  gather_facts: true
  become: true
  vars:
    lb_ip: "192.168.123.100"
    lb_hostname: "anthrax.garmo.local"
    kubeconfig: ~/.kube/anthrax.yaml
    k3s_version: "v1.32.8+k3s1"
    k3s_sha256: "sha256:639a113cfc5465082fd98ff2947a3d962039f78adddefbf4a4aecf4a1f485e79"
  tasks:
    - name: Include K3s installation tasks
      import_role:
        name: setup-k3s
      tags: 
        - setup-k3s

In this example I’ve overridden the K3s version and its sha256 checksum because I wanted to install a specific version, different from the one defined in the role.

Install Rancher and other components

The second play in the playbook installs several components that we need in order to deploy applications in the cluster; it’s defined as:

- name: Setup Rancher on K3s
  hosts: localhost
  gather_facts: false
  become: false
  vars:
    kubeconfig: ~/.kube/anthrax.yaml
  tasks:
    - name: Include Rancher on K3s installation tasks
      import_role:
        name: rancher-on-k3s
      tags: 
        - rancher-on-k3s

Install cert-manager

cert-manager allows us to create custom certificates. It can create self-signed certificates and manage Let’s Encrypt certificates, but using it as an issuing CA is also a powerful feature.

In my case, as I already have a custom Root CA trusted by my personal computer, I used EasyRSA to create a SubCA for this project, so all certificates issued in this cluster will be signed with that CA.

But first, we use the kubernetes.core.helm_repository Ansible module to register the jetstack chart repository and the kubernetes.core.helm module to install a release of the cert-manager helm chart.

- name: Add cert-manager Helm repository
  kubernetes.core.helm_repository:
    name: jetstack
    repo_url: https://charts.jetstack.io
  tags:
    - cert-manager
- name: Install/Upgrade cert-manager
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: cert-manager
    chart_ref: jetstack/cert-manager
    chart_version: "{{ certmanager_chart_version }}"
    namespace: cert-manager
    create_namespace: true
    values:
      installCRDs: true
    wait: true
    wait_timeout: 600s
  tags:
    - cert-manager

This is the process we will follow to install every component that is distributed as a helm chart.

Then, to create a SubCA using EasyRSA, do the following:

cd ~/pki #Work directory of the existing pki
easyrsa gen-req rancher-subca nopass
easyrsa sign-req ca rancher-subca

Your SubCA’s certificate and key will then be in pki/issued/rancher-subca.crt and pki/private/rancher-subca.key. Create an ansible-vault encrypted variables.yaml file and put their contents in the certmanager_subca_crt and certmanager_subca_key variables, respectively.
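
Before encrypting it, the shape of that variables file would be roughly this (the PEM bodies are placeholders for the actual file contents):

certmanager_subca_crt: |
  -----BEGIN CERTIFICATE-----
  ...contents of pki/issued/rancher-subca.crt...
  -----END CERTIFICATE-----
certmanager_subca_key: |
  -----BEGIN PRIVATE KEY-----
  ...contents of pki/private/rancher-subca.key...
  -----END PRIVATE KEY-----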

Those variables are then used to create our ClusterIssuer in the following tasks:

- name: Create secret for Rancher cluster issuer CAs.
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: "{{ certmanager_cluster_issuer_name }}"
        namespace: cert-manager
      type: kubernetes.io/tls
      data:
        tls.crt: "{{ certmanager_subca_crt | b64encode }}"
        tls.key: "{{ certmanager_subca_key | b64encode }}"
  tags:
    - cert-manager 
- name: Create ClusterIssuer for Rancher
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: ClusterIssuer
      metadata:
        name: "{{ certmanager_cluster_issuer_name }}"
      spec:
        ca:
          secretName: "{{ certmanager_cluster_issuer_name }}"
  tags:
    - cert-manager

We can check the status of the ClusterIssuer:

juanjo@lab pki $ kubectl get clusterissuer
NAME            READY   AGE
rancher-subca   True    74m
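
To verify issuance end to end, you can request a throwaway certificate against it (the names here are only illustrative):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: issuer-smoke-test
  namespace: default
spec:
  secretName: issuer-smoke-test-tls
  commonName: smoke-test.example.internal
  dnsNames:
    - smoke-test.example.internal
  issuerRef:
    name: rancher-subca
    kind: ClusterIssuer

If the Certificate reaches Ready, the issuer-smoke-test-tls secret will contain a certificate signed by the SubCA, and both can be deleted afterwards.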

Install nginx ingress

Installing the nginx ingress is basically the same as what the official instructions provide, but using Ansible. In this case, I already have the load balancer pointing to specific node ports, so the role ensures that the service is defined as NodePort and that the right ports are used.

- name: Add ingress-nginx Helm repository
  kubernetes.core.helm_repository:
    name: ingress-nginx
    repo_url: https://kubernetes.github.io/ingress-nginx
  tags:
    - ingress-nginx
- name: Install/Upgrade ingress-nginx
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: ingress-nginx
    chart_ref: ingress-nginx/ingress-nginx
    chart_version: "{{ ingress_nginx_chart_version }}"
    namespace: kube-system
    values:
      controller:
        service:
          type: NodePort
          nodePorts:
            http: "{{ ingress_nginx_nodeport_http }}"
            https: "{{ ingress_nginx_nodeport_https }}"
    wait: true
    wait_timeout: 600s
  tags:
    - ingress-nginx

Install Rancher Manager

Rancher Manager is a very complex piece of software, but it comes packaged in a helm chart that makes installing it a breeze. The default values file is immense, but very few values need to be changed from the defaults for it to work.
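
The rancher-stable chart repository isn’t shown in the tasks below, so I assume a task like this one runs first, following the same pattern as the other charts (the URL is Rancher’s stable charts repository):

- name: Add rancher-stable Helm repository
  kubernetes.core.helm_repository:
    name: rancher-stable
    repo_url: https://releases.rancher.com/server-charts/stable
  tags:
    - rancher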

- name: Install/Upgrade Rancher
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: rancher
    chart_ref: rancher-stable/rancher
    chart_version: "{{ rancher_chart_version }}"
    namespace: cattle-system
    create_namespace: false
    values:
      hostname: "{{ rancher_hostname }}"
      bootstrapPassword: "{{ rancher_bootstrap_password }}"
      replicas: 1
      auditLog:
        enabled: true
        maxBackup: 10
      ingress:
        ingressClassName: nginx
        tls:
          source: secret
          secretName: tls-rancher-ingress
        annotations:
          kubernetes.io/ingress.class: nginx
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
      privateCA: true
      additionalTrustedCAs: false

But, as we’re using a private CA, we need to create a secret for it. I put the root CA in a file under files/tls/anthraxca.crt, but you can also set the rancher_privateca_pem variable in your variables file; either way, the secret is created by this task:

- name: Create secret for Private CAs.
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: tls-ca
        namespace: cattle-system
      type: Opaque
      data:
        cacerts.pem: "{{ rancher_privateca_pem | b64encode }}"
  tags:
    - rancher 

The Rancher helm chart requires the TLS secret to exist before installing the release, so it’s created using our SubCA with the following task:

- name: Create secret for Rancher ingress TLS.
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: Certificate
      metadata:
        name: tls-rancher-ingress
        namespace: cattle-system
      spec:
        secretName: tls-rancher-ingress
        dnsNames: 
          - "{{ rancher_hostname }}"
        commonName: rancher.cattle-system.svc.cluster.local
        issuerRef:
          name: "{{ certmanager_cluster_issuer_name }}"
          kind: ClusterIssuer
        duration: 24h
        renewBefore: 1h
  tags:
    - rancher

The certificate lifespan is set to a very short value, but this is for demonstration purposes.
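
For a real deployment you would set something longer in the Certificate spec above; for example, values in the range of cert-manager’s defaults (these numbers are just an illustration):

        duration: 2160h    # 90 days
        renewBefore: 720h  # start renewing 30 days before expiry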

Install Longhorn

Longhorn uses the available space on the cluster nodes to provide Persistent Volumes for Kubernetes workloads, replicates them to ensure data availability, and supports volume snapshots for recovering data.

It integrates with Rancher Manager, providing a visual representation of the volumes’ state and easing some maintenance tasks like removing dead replicas.

The tasks that deploy it are as usual: registering the helm chart repository and installing the release:

- name: Add longhorn Helm repository
  kubernetes.core.helm_repository:
    name: longhorn
    repo_url: https://charts.longhorn.io
  tags:
    - longhorn
- name: Install/Upgrade Longhorn
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: longhorn
    chart_ref: longhorn/longhorn
    chart_version: "{{ longhorn_chart_version }}"
    namespace: longhorn-system
    create_namespace: true
    values:
      ingress:
        enabled: true
        ingressClassName: nginx
        host: "{{ longhorn_hostname }}"
        tls: true
        tlsSecret: longhorn-rancher-tls
        annotations:
          kubernetes.io/ingress.class: nginx
          kubernetes.io/tls-acme: "true"
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
      defaultSettings:
        defaultDataPath: /data/longhorn
    wait: true
    wait_timeout: 600s
  tags:
    - longhorn

In this case, we’re instructing the chart to use our secondary disk mounted on /data to store the volumes, preventing it from eating up all the space on the root disk. Also, we’re using our cluster issuer to issue a certificate signed by our SubCA.

Install MinIO

MinIO provides S3-compatible object storage. It mimics not only the API, so any language that has libraries to access S3 can use MinIO, but also the policy syntax, so any S3 administrator is able to manage MinIO policies.

As MinIO encrypts all traffic among its replicas, it needs a certificate to exist before being deployed. Again, we use our cluster issuer:

- name: Create a secret for minio tls using cert-manager
  kubernetes.core.k8s:
    kubeconfig: "{{ kubeconfig }}"
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: Certificate
      metadata:
        name: minio-tls
        namespace: minio
      spec:
        secretName: minio-tls
        dnsNames:
          - minio.minio.svc.cluster.local
          - "*.minio-svc.minio.svc"
          - "*.minio-svc.minio.svc.cluster.local"
        commonName: minio.minio.svc.cluster.local
        issuerRef:
          name: "{{ certmanager_cluster_issuer_name }}"
          kind: ClusterIssuer
        duration: 24h
        renewBefore: 1h
  tags:
    - minio

The installation tasks are:

- name: Add minio helm repository
  kubernetes.core.helm_repository:
    name: minio
    repo_url: https://charts.min.io/
  tags:
    - minio
- name: Install/Upgrade MinIO
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: minio
    chart_ref: minio/minio
    chart_version: "{{ minio_chart_version }}"
    namespace: minio
    values:
      replicas: 2
      tls:
        enabled: true
        certSecret: minio-tls
        publicCrt: tls.crt
        privateKey: tls.key
      persistence:
        enabled: true
        storageClass: "longhorn"
        size: "{{ minio_storage_size}}"
      ingress:
        enabled: true
        annotations:
          nginx.ingress.kubernetes.io/secure-backends: "true"
          nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          kubernetes.io/ingress.class: nginx
        ingressClassName: nginx
        hosts:
          - "{{ minio_hostname }}"
        tls:
          - secretName: minio-ingress-tls
            hosts:
              - "{{ minio_hostname }}"
      consoleIngress:
        enabled: true
        annotations:
          nginx.ingress.kubernetes.io/secure-backends: "true"
          nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
          cert-manager.io/cluster-issuer: "{{ certmanager_cluster_issuer_name }}"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          kubernetes.io/ingress.class: nginx
        ingressClassName: nginx
        hosts:
          - "{{ minio_console_hostname }}"
        tls:
          - secretName: minio-console-ingress-tls
            hosts:
              - "{{ minio_console_hostname }}"
      resources:
        requests:
          memory: 1Gi
    wait: true
    wait_timeout: 600s
  tags:
    - minio

For MinIO, two ingresses are defined: one for the API and another for the management console.

MinIO will use “longhorn” as the storage class for its volumes, so we can have several replication layers here: at the object level, defined in MinIO, and at the volume level, defined in Longhorn. In my experience, MinIO buckets can be configured not to use replication and rely on Longhorn replication instead.

Install Zalando’s postgres-operator

Zalando’s Postgres Operator allows us to quickly deploy PostgreSQL database clusters, with automatic failover, backups to an S3-compatible bucket, and restores from those backups. We will use MinIO for that in our cluster.

Its installation is pretty clean as we will use the default values:

- name: Add postgres-operator Helm repository
  kubernetes.core.helm_repository:
    name: postgres-operator-charts
    repo_url: https://opensource.zalando.com/postgres-operator/charts/postgres-operator
  tags:
    - postgres-operator
- name: Install/Upgrade Postgres Operator
  kubernetes.core.helm:
    kubeconfig: "{{ kubeconfig }}"
    name: postgres-operator
    chart_ref: postgres-operator-charts/postgres-operator
    chart_version: "{{ postgres_operator_chart_version }}"
    namespace: postgres-operator
    create_namespace: true
  tags:
    - postgres-operator
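
With the operator running, a database cluster is created by applying a postgresql custom resource. This is a minimal sketch based on the operator’s minimal example; the names, sizes and PostgreSQL version are illustrative, and wiring the backups to MinIO requires extra operator configuration not shown here:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-demo-cluster
  namespace: default
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 5Gi
    storageClass: longhorn
  users:
    demo_owner:
      - superuser
      - createdb
  databases:
    demo: demo_owner
  postgresql:
    version: "16"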

Running the playbook

First, provision the VMs:

vagrant up

As I said before, I put the sensitive information in an ansible-vault encrypted variables file; then, to run the playbook, I did:

ansible-playbook site.yaml -e @variables.yaml --ask-vault-password

I was prompted for the vault password and, in about ten minutes, the playbook completed and my Rancher Manager instance was available at the URL I defined, with all the components working.

Conclusions

This cluster provides the basic features usually needed in an on-prem cluster to deploy complex solutions, for example a Django application which uses a PostgreSQL database and stores files in S3-compatible object storage. The application can also provide a React frontend running in a separate container, with both exposed over HTTPS using certificates issued by a SubCA.

Future works

This cluster just works, but there are some nice additions that could improve it and make it more production-friendly, so the next steps would be:

  • Adding support for SOPS secrets.
  • Adding Keycloak for identity management.
  • Adding continuous delivery with ArgoCD.
  • Adding monitoring with the Grafana stack.

References