Context

Often, I want to play with a Kubernetes cluster without having to pay a cloud provider for compute or set up a home lab cluster with kubeadm. At times like these, I reach for Kind (although I'd still love to have a home lab cluster).

Simply put, Kind allows me to run a small Kubernetes cluster within Docker/Podman/whatever you choose. Whilst the number of pods I can run is limited by the power of my machine, it's great for running small proof-of-concepts (PoCs) and exploring the configuration of a product before deploying it to a real cluster.

I do have a small tip that I can share with you now to increase that pod limit if you're on a Debian-based machine:

sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

These commands raise the number of inotify file watchers and instances available for the duration of your session; the values return to their defaults once you reboot. More file watches = more pods.
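
If you'd rather not re-run those commands after every reboot, you can make the same settings persistent via a sysctl configuration file. The filename below is just an example of my own, not anything Kind-specific:

# /etc/sysctl.d/99-inotify.conf (hypothetical filename)
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512

Running sudo sysctl --system (or rebooting) will then load those values.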

If Docker/Others were already running before you ran these commands, you might want to restart those services.

But that's not what I really want to share...

Problem

Recently, I decided to spend far too much time writing YAML files while playing with the Argo Project, mostly to explore some features and generally experiment with GitOps-style processes. As part of my test-bed setup, I deployed Forgejo to replicate a scenario where there's a Git server and container registry hosted alongside the GitOps tooling.

My goals and reasons aren't really important here. The problem I came up against appeared after I had code repos set up, events firing off workflows to build and push container images, and finally automatic deployment to a target Kubernetes namespace.

All of that worked, as defined in the glorious YAML files I had painstakingly created. The problem was at deployment time: ImagePullBackOff - Kubernetes couldn't pull my container images because it couldn't verify the self-signed SSL certificate I'd used for the Forgejo Ingress rule.

How did I get around that problem? Well, first, I destroyed my Kind cluster.

Was I having a tantrum? Maybe. Would I have needed to nuke everything I created anyway? Unfortunately so.

Resolution

The reason I had to destroy my Kind cluster is that the container image used by Kind as a virtual Kubernetes node doesn't seem to contain a text editor. No Vim, no Vi, not even Nano. So I couldn't simply shell into the node, make a few edits and service containerd restart my way to happiness.

The first modification I needed to make was in the ContainerD config file. This is typically located at /etc/containerd/config.toml on a Kubernetes node.

A bit of reading pointed me to a way to modify this file upon cluster creation. Cool. I modified my Kind cluster definition (another YAML file) to look a little like this:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: plaything
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP

The magic is under the containerdConfigPatches key. It's basically telling ContainerD that per-registry configuration (like the hosts.toml we'll create in a moment) lives under the defined directory.
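
For reference, spinning up a cluster from that definition is a single command. I'm assuming here that the file is saved as kind-config.yaml:

kind create cluster --config kind-config.yaml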

The next thing I needed to do was to create some config under that config_path for my self-hosted container registry. I had my Forgejo instance running under the git.benzo.test hostname, thanks to some Dnsmasq magic that could have also been achieved with an extra entry in my machine's hosts file.

Note: git.benzo.test is a local-only hostname, which isn't internet facing - it won't work for you.

I'm telling you the hostname because it defines the location and the contents of the next file we need to create: /etc/containerd/certs.d/<hostname>/hosts.toml. Because I wanted this configuration to be repeatable, I chose to create this file locally and then mount it to the Kind node at cluster creation time.

I fired up my text editor and smashed the following keys:

server = "git.benzo.test"

[host."git.benzo.test"]
    capabilities = ["pull", "resolve"]
    skip_verify = true

This config tells ContainerD, "Hey! There's this server...you can pull and resolve container images there...but keep your eyes off my dodgy self-signed SSL certificate. It's good, I promise."

The last thing I needed to do was to place that config file in the right location, via an additional modification to my Kind cluster definition:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: plaything
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /home/ben/code/kind/my-cr.toml
        containerPath: /etc/containerd/certs.d/git.benzo.test/hosts.toml
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP

Focusing on the extraMounts section, I'm instructing Kind to take a file from my local filesystem (hostPath) and place it in a specific location on the Kubernetes node (containerPath).
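
If you want to sanity-check the mount once the cluster is up, you can exec into the node container. Kind names node containers after the cluster, so with this definition the control plane should be called plaything-control-plane:

docker exec plaything-control-plane cat /etc/containerd/certs.d/git.benzo.test/hosts.toml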

With all that done, I spooled up my cluster with just go. Thankfully, I had configured all the deployments as ArgoCD applications, which meant that once ArgoCD was deployed, everything else I had previously configured and deployed just came back to life, without any additional command-line input from myself. The container images hosted in Forgejo pulled without any issues this time around.

For a Real Cluster

I would not recommend using the above hosts.toml for a production deployment. Setting skip_verify = true means blindly trusting whatever certificate the server presents, which leaves you open to man-in-the-middle attacks.

It's quite common for enterprises to have their own SSL CA certificate and require it to be used for signing your Ingress SSL certs, instead of those you might get from a public CA like Let's Encrypt.

For this scenario, I'd suggest creating the following (there's a rough sketch of the manifests just after this list):

  • A Secret
    • Containing the CA (or self-signed server) certificate, which needs to be trusted by ContainerD
  • A ConfigMap containing a shell script to:
    • Modify the ContainerD config.toml if necessary
    • Copy over the SSL cert to a location on a Kubernetes Node
    • Create the required hosts.toml file (echo 'blah' > /path/to/hosts.toml)
  • A DaemonSet (Runs on each Node)
    • With the above Secret and ConfigMap mounted
    • Runs script defined in above ConfigMap
    • Note: You may need to configure the DaemonSet with tolerations if your nodes have taints (link)
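
Here's a rough sketch of what those manifests might look like. Every name, namespace, hostname and path in it is a placeholder of my own (registry-ca, registry-ca-setup, git.example.internal and friends), and it assumes config.toml already points config_path at /etc/containerd/certs.d as shown earlier:

# All names, namespaces, hostnames and paths below are hypothetical - adjust for your environment.
apiVersion: v1
kind: Secret
metadata:
  name: registry-ca
  namespace: kube-system
stringData:
  ca.pem: |
    -----BEGIN CERTIFICATE-----
    ...your CA certificate goes here...
    -----END CERTIFICATE-----
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: registry-ca-setup
  namespace: kube-system
data:
  setup.sh: |
    #!/bin/sh
    set -e
    # Assumes config.toml already sets config_path = "/etc/containerd/certs.d";
    # if not, you'd also need to patch it and restart ContainerD here.
    DIR=/host/etc/containerd/certs.d/git.example.internal
    mkdir -p "$DIR"
    cp /certs/ca.pem "$DIR/ca.pem"
    cat > "$DIR/hosts.toml" <<EOF
    server = "git.example.internal"

    [host."git.example.internal"]
        capabilities = ["pull", "resolve"]
        ca = "ca.pem"
    EOF
    # Keep the pod alive so the DaemonSet doesn't restart it in a loop.
    while true; do sleep 3600; done
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: registry-ca-setup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: registry-ca-setup
  template:
    metadata:
      labels:
        app: registry-ca-setup
    spec:
      tolerations:
        - operator: Exists   # tolerate any taints, e.g. control-plane nodes
      containers:
        - name: setup
          image: busybox:1.36
          command: ["/bin/sh", "/scripts/setup.sh"]
          volumeMounts:
            - name: certs
              mountPath: /certs
            - name: scripts
              mountPath: /scripts
            - name: containerd-etc
              mountPath: /host/etc/containerd
      volumes:
        - name: certs
          secret:
            secretName: registry-ca
        - name: scripts
          configMap:
            name: registry-ca-setup
        - name: containerd-etc
          hostPath:
            path: /etc/containerd

It's a blunt pattern - each pod runs the script once and then just sleeps - but it keeps the node configuration declarative, and the DaemonSet will re-apply it automatically whenever a new node joins the cluster.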

Finally, I would be using a hosts.toml file that looks a little like this:

server = "hostname"

[host."hostname"]
    capabilities = ["pull", "resolve"]
    ca = "path/to/cert.pem"

This file tells ContainerD, "Hey! There's this server...you can pull and resolve container images there...it has an SSL certificate you might not know about but don't worry, here's a pem file you can use for validation."

The path to the cert.pem file can be relative to the hosts.toml file. I would recommend storing the certificate alongside the hosts.toml file, in the same directory, to keep things visible.

A little more reading on the use of this file can be found here.
