Enough cribbing about tooling, let’s run some containers!

Archetype

At $DAYJOB I maintain a lot of Python code, so to make everything transferable I’m going to create a Python package called archetype, your archetypal Azure Python REST service. The only design constraint is that archetype must not do anything interesting! WYSIWYG, so to speak. Its source code can be found here, and for my sanity I’ll document a little of its structure:

  1. scripts directory holds the project scripts according to the scripts-to-rule-them-all pattern.
  2. pyproject.toml defines metadata for the project. This file is used when packaging the project into a wheel (a Python package), and if I were publishing on PyPI it would drive the metadata that others see about this package.
  3. src holds the source code of the project. By default, the build backend Hatchling decides which files go into wheels and which go into source distributions. That would be important to understand if I were publishing on PyPI, but for deploying a service it doesn’t seem to matter much.
  4. requirements.txt and dev-requirements.txt are generated by pip-compile using pyproject.toml and are not meant to be edited by hand. They are checked in and modified whenever the dependency set of the project changes.
  5. Dockerfile simply copies the code, runs scripts/setup to install the dependencies and the package itself (as an editable install), and then runs the project using uvicorn as the server (a sketch of those commands follows this list).
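
For concreteness, here’s a sketch of what scripts/setup and the pip-compile step look like. This is my paraphrase based on the descriptions above, not the verbatim scripts, and the dev extra name is an assumption:

# Install the pinned runtime and dev dependencies, then the package itself
# as an editable install (what scripts/setup is described as doing).
python -m pip install -r requirements.txt -r dev-requirements.txt
python -m pip install -e .

# Regenerate the pinned requirements files from pyproject.toml with pip-tools.
pip-compile -o requirements.txt pyproject.toml
pip-compile --extra dev -o dev-requirements.txt pyproject.toml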

This is the output I see when running the project:

❯ ./scripts/server 
INFO:     Will watch for changes in these directories: ['/Users/gustavo/code/ghidalgo3.github.io/k8s']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [39552] using StatReload
Importing archetype module!
INFO:     Started server process [39554]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
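
As a quick sanity check, the dev server answers on port 8000 (assuming the root route is the same JSON endpoint we hit later through the ingress):

curl http://127.0.0.1:8000/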

Dockerizing Archetype

Dockerizing this project is pretty straightforward:

FROM python:3.11-bookworm
WORKDIR /app
COPY . /app
RUN scripts/setup
ENTRYPOINT [ "uvicorn", "archetype.main:app", "--host=0.0.0.0"]

Line by line:

  1. Start with an image that has Python pre-installed, python:3.11-bookworm
  2. Create a working directory /app
  3. Copy everything in the build context (the directory containing the Dockerfile) to /app
  4. Run the same scripts/setup script inside the container. This installs our dependencies!
  5. Declare an entry point that runs the uvicorn server, listening on all interfaces

Then I publish this container to my private ACR by tagging it gustavo2acr.azurecr.io/archetype:latest and running docker push gustavo2acr.azurecr.io/archetype:latest. Pretty simple!
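
Spelled out, that’s three steps: authenticate to the registry, build the image tagged with the registry’s login server, and push. A sketch (the registry name gustavo2acr is inferred from the login server):

# Authenticate the local docker CLI against the private registry
az acr login --name gustavo2acr

# Build the image tagged with the registry's login server, then push it
docker build -t gustavo2acr.azurecr.io/archetype:latest .
docker push gustavo2acr.azurecr.io/archetype:latest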

ARM?

Eventually I’d come to find out that Docker images are built for specific ISAs! When I build images on my Mac, the image is an ARM build, not an x86 build. If I wanted the image as-is to run on the Kubernetes nodes I created, I would have needed to create a cluster with ARM-CPU VMs. The solution is simply to ask Docker to build an x86 image by passing --platform=linux/amd64 to the docker build call.
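
So the build I actually push looks something like this; the latest-amd64 tag is the one the deployment manifest below references:

# Build an x86_64 image on an ARM Mac (Docker emulates the target platform), then push it
docker build --platform=linux/amd64 \
    -t gustavo2acr.azurecr.io/archetype:latest-amd64 .
docker push gustavo2acr.azurecr.io/archetype:latest-amd64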

Deploying Archetype to AKS

I’ll do this in the most verbose way possible first to motivate using Helm later. Verbosity mandates tool purity, so kubectl is my only choice. To authenticate against the cluster, I simply run:

az aks get-credentials \
    --resource-group $(terraform output -raw resource_group_name) \
    --name $(terraform output -raw k8s_cluster_name)

Double-checking that worked…

❯ kubectl get services
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   192.168.0.1   <none>        443/TCP   20h

Cool, I have a configured kubectl.

Building this from the bottom up, my understanding is that I need three Kubernetes resources:

  1. A Deployment object
  2. A Service object
  3. An Ingress object

Deployment

According to the docs, a Deployment resource declaratively controls Pods and ReplicaSets. I think that since 90% of all Kubernetes applications just need the cluster to make sure N replicas of the application are running, Deployments were created to satisfy this common use case.

Here’s our deployment definition:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: archetype-deployment
  labels:
    app: archetype
spec:
  replicas: 2
  selector:
    matchLabels:
      app: archetype
  template:
    metadata:
      labels:
        app: archetype
    spec:
      containers:
      - name: archetype
        image: gustavo2acr.azurecr.io/archetype:latest-amd64
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

After it’s applied, the deployment looks like this:

❯ kubectl describe deployment
Name:                   archetype-deployment
Namespace:              default
CreationTimestamp:      Sun, 25 Feb 2024 12:35:33 -0500
Labels:                 app=archetype
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=archetype
Replicas:               2 desired | 2 updated | 2 total | 0 available | 2 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=archetype
  Containers:
   archetype:
    Image:      gustavo2acr.azurecr.io/archetype:latest-amd64
    Port:       8000/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:        250m
      memory:     64Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   archetype-deployment-75cc984b68 (2/2 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  3m23s  deployment-controller  Scaled up replica set archetype-deployment-75cc984b68 to 2

Resource Limits

I’m cheap, so my cluster only has 1 node and it’s a little 2-core Azure VM. It’s remarkable how many pods are running by default in a cluster doing nothing:

❯ kubectl describe node
Name:               aks-default-42366886-vmss000000
Roles:              agent
Labels:             agentpool=default
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_DS2_v2
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eastus2
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.azure.com/agentpool=default
                    kubernetes.azure.com/cluster=MC_gustavo2-rg_gustavo2-aks-cluster_eastus2
                    kubernetes.azure.com/consolidated-additional-properties=91882a8a-d34a-11ee-bd45-2a30a704d842
                    kubernetes.azure.com/kubelet-identity-client-id=abb742dc-7a98-47b0-8d03-cc3d662c02c6
                    kubernetes.azure.com/mode=system
                    kubernetes.azure.com/node-image-version=AKSUbuntu-2204gen2containerd-202402.07.0
                    kubernetes.azure.com/nodepool-type=VirtualMachineScaleSets
                    kubernetes.azure.com/os-sku=Ubuntu
                    kubernetes.azure.com/role=agent
                    kubernetes.azure.com/storageprofile=managed
                    kubernetes.azure.com/storagetier=Premium_LRS
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=aks-default-42366886-vmss000000
                    kubernetes.io/os=linux
                    kubernetes.io/role=agent
                    node-role.kubernetes.io/agent=
                    node.kubernetes.io/instance-type=Standard_DS2_v2
                    storageprofile=managed
                    storagetier=Premium_LRS
                    topology.disk.csi.azure.com/zone=
                    topology.kubernetes.io/region=eastus2
                    topology.kubernetes.io/zone=0
Annotations:        csi.volume.kubernetes.io/nodeid:
                      {"disk.csi.azure.com":"aks-default-42366886-vmss000000","file.csi.azure.com":"aks-default-42366886-vmss000000"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 24 Feb 2024 14:28:09 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  aks-default-42366886-vmss000000
  AcquireTime:     <unset>
  RenewTime:       Sun, 25 Feb 2024 13:06:03 -0500
Conditions:
  Type                          Status  LastHeartbeatTime                 LastTransitionTime                Reason                          Message
  ----                          ------  -----------------                 ------------------                ------                          -------
  ContainerRuntimeProblem       False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   ContainerRuntimeIsUp            container runtime service is up
  VMEventScheduled              False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   NoVMEventScheduled              VM has no scheduled event
  FrequentDockerRestart         False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   NoFrequentDockerRestart         docker is functioning properly
  ReadonlyFilesystem            False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   FilesystemIsNotReadOnly         Filesystem is not read-only
  FrequentKubeletRestart        False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   NoFrequentKubeletRestart        kubelet is functioning properly
  KubeletProblem                False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   KubeletIsUp                     kubelet service is up
  FilesystemCorruptionProblem   False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   FilesystemIsOK                  Filesystem is healthy
  KernelDeadlock                False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   KernelHasNoDeadlock             kernel has no deadlock
  FrequentUnregisterNetDevice   False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   NoFrequentUnregisterNetDevice   node is functioning properly
  FrequentContainerdRestart     False   Sun, 25 Feb 2024 13:02:16 -0500   Sat, 24 Feb 2024 14:29:40 -0500   NoFrequentContainerdRestart     containerd is functioning properly
  MemoryPressure                False   Sun, 25 Feb 2024 13:04:21 -0500   Sat, 24 Feb 2024 14:28:09 -0500   KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure                  False   Sun, 25 Feb 2024 13:04:21 -0500   Sat, 24 Feb 2024 14:28:09 -0500   KubeletHasNoDiskPressure        kubelet has no disk pressure
  PIDPressure                   False   Sun, 25 Feb 2024 13:04:21 -0500   Sat, 24 Feb 2024 14:28:09 -0500   KubeletHasSufficientPID         kubelet has sufficient PID available
  Ready                         True    Sun, 25 Feb 2024 13:04:21 -0500   Sat, 24 Feb 2024 14:28:10 -0500   KubeletReady                    kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.1.4
  Hostname:    aks-default-42366886-vmss000000
Capacity:
  cpu:                2
  ephemeral-storage:  129886128Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             7097684Ki
  pods:               30
Allocatable:
  cpu:                1900m
  ephemeral-storage:  119703055367
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4652372Ki
  pods:               30
System Info:
  Machine ID:                 91d02b7297d74d2290142148194c2bec
  System UUID:                e0c121d3-1932-4eb1-b50f-fbec6960eccd
  Boot ID:                    36431874-31af-4b99-84d5-11c14da820a8
  Kernel Version:             5.15.0-1054-azure
  OS Image:                   Ubuntu 22.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.7-1
  Kubelet Version:            v1.27.9
  Kube-Proxy Version:         v1.27.9
ProviderID:                   azure:///subscriptions/394dccd4-6f06-4003-8c9a-671ba0d665c7/resourceGroups/mc_gustavo2-rg_gustavo2-aks-cluster_eastus2/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-42366886-vmss/virtualMachines/0
Non-terminated Pods:          (16 in total)
  Namespace                   Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                     ------------  ----------  ---------------  -------------  ---
  app-routing-system          nginx-5d4cbcf56b-qwkn4                   500m (26%)    0 (0%)      127Mi (2%)       0 (0%)         76m
  app-routing-system          nginx-5d4cbcf56b-rkrx4                   500m (26%)    0 (0%)      127Mi (2%)       0 (0%)         77m
  default                     archetype-deployment-69945d5cf9-klrff    0 (0%)        0 (0%)      0 (0%)           0 (0%)         109s
  default                     archetype-deployment-69945d5cf9-wnh7p    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3m28s
  kube-system                 azure-ip-masq-agent-lbpkh                100m (5%)     500m (26%)  50Mi (1%)        250Mi (5%)     22h
  kube-system                 cloud-node-manager-qbxww                 50m (2%)      0 (0%)      50Mi (1%)        512Mi (11%)    22h
  kube-system                 coredns-789789675-nn9qz                  100m (5%)     3 (157%)    70Mi (1%)        500Mi (11%)    22h
  kube-system                 coredns-789789675-ss2hf                  100m (5%)     3 (157%)    70Mi (1%)        500Mi (11%)    22h
  kube-system                 coredns-autoscaler-649b947bbd-xr74t      20m (1%)      200m (10%)  10Mi (0%)        500Mi (11%)    22h
  kube-system                 csi-azuredisk-node-69clt                 30m (1%)      0 (0%)      60Mi (1%)        400Mi (8%)     22h
  kube-system                 csi-azurefile-node-mwcpv                 30m (1%)      0 (0%)      60Mi (1%)        600Mi (13%)    22h
  kube-system                 konnectivity-agent-7565b9c9b8-bmjb2      20m (1%)      1 (52%)     20Mi (0%)        1Gi (22%)      22h
  kube-system                 konnectivity-agent-7565b9c9b8-t55pk      20m (1%)      1 (52%)     20Mi (0%)        1Gi (22%)      22h
  kube-system                 kube-proxy-nct72                         100m (5%)     0 (0%)      0 (0%)           0 (0%)         22h
  kube-system                 metrics-server-5bd48455f4-6cmbn          50m (2%)      145m (7%)   85Mi (1%)        355Mi (7%)     22h
  kube-system                 metrics-server-5bd48455f4-n8km8          50m (2%)      145m (7%)   85Mi (1%)        355Mi (7%)     22h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1670m (87%)  8990m (473%)
  memory             834Mi (18%)  6020Mi (132%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>

The interesting bit is that the node only has 1900m of allocatable CPU (it’s a 2-core VM) and the pods already running on it (none of which I created, BTW!) request 1670m of that. The deployment wants 2 archetype pods that each request 250m of CPU, so naturally Kubernetes cannot schedule them, because 1670m + 2 * 250m = 2170m > 1900m. I could add more VMs to give the cluster more headroom, but I’m cheap, so instead I’ll remove the resource requests and limits from the pod spec. Maybe this is dangerous in production, but for my learning purposes I don’t really care about violating resource constraints.
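
Before deleting the requests, a quick way to confirm this really is a scheduling problem is to look at the pending pods and their events; the scheduler should report a FailedScheduling event along the lines of “Insufficient cpu” (exact wording varies by version):

# The archetype pods should be stuck in Pending
kubectl get pods -l app=archetype

# The Events section should contain a FailedScheduling entry
kubectl describe pods -l app=archetype | grep -A 5 -i "events"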

Service

The service definition looks like this:

apiVersion: v1
kind: Service
metadata:
  name: archetype-service
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 8000
  selector:
    app: archetype

Note that the service’s spec.selector.app matches the deployment’s spec.template.metadata.labels.app. This is how Kubernetes finds archetype pods to expose behind archetype-service. If the labels don’t match, the pods simply aren’t exposed!
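
A quick way to verify that wiring is to look at the service’s endpoints; the list only contains pods that both match the selector and are Ready, so an empty list (like the <none> in the describe output below) means no matching pod is Ready yet:

# Shows the pod IP:port pairs currently backing the service
kubectl get endpoints archetype-service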

❯ kubectl describe service archetype-service
Name:              archetype-service
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          app=archetype
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                192.168.90.238
IPs:               192.168.90.238
Port:              <unset>  80/TCP
TargetPort:        8000/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

ClusterIP services are only reachable from inside the cluster. That means a pod within the cluster can talk to the service either at the IP address 192.168.90.238 or at the DNS name archetype-service (the service’s metadata.name; the fully qualified form is archetype-service.default.svc.cluster.local). The outside world cannot reach this service! For that we need to define an Ingress object.
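
One way to check that from inside the cluster is to run a throwaway pod and curl the service by its DNS name. A sketch; curlimages/curl is just a convenient public image that ships curl:

# Start a temporary pod, run curl against the service by name, and clean the pod up afterwards
kubectl run curl-test --rm -it --restart=Never \
    --image=curlimages/curl --command -- \
    curl http://archetype-service.default.svc.cluster.local/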

Ingress

This is where the Azure documentation falls apart.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: archetype-ingress
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
  - host: archetype.gustavohidalgo.com
    http:
      paths:
      - backend:
          service:
            name: archetype-service
            port:
              number: 80
        path: /
        pathType: Prefix

Looking at the limitations:

All global Azure DNS zones integrated with the add-on have to be in the same resource group.

Ok, that’s kind of a stupid limitation. I already have a DNS zone that I created long ago for gustavohidalgo.com in a different resource group, and I don’t know whether I need to (or should) create a new one.

Maybe we don’t need to create a DNS zone, because actually this worked:

❯ curl 4.152.12.223/ -H "Host: archetype.gustavohidalgo.com"
{"Hello":"NAMELESS!"}

And the address 4.152.12.223 was allocated either when I enabled the app routing feature or when I created the ingress. Who knows? AKS definitely doesn’t tell you when new billable resources are created under your nose!
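
If you want to see where that address actually lives, the ingress reports the IP it was assigned, and the app routing add-on’s managed NGINX runs in the app-routing-system namespace (its pods are visible in the node listing above); its controller is exposed through a LoadBalancer-type service, which is presumably what got the public IP:

# The ingress shows the address it was assigned
kubectl get ingress archetype-ingress

# The add-on's nginx controller service should be the LoadBalancer holding the public IP
kubectl get services -n app-routing-system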

Conclusion

After all of this, I have a dockerized Python application running in a Kubernetes cluster. Cool, but I still feel like AKS is overly complicated.

Reference

  1. Python Project TOML
  2. Python Packaging
  3. Relieving Python packaging pain
  4. AKS Ingress Options
  5. Kubernetes Label Selectors