
One Year of Kubernetes


Rich Keenan

In October last year, we flipped the switch to run our backend on Kubernetes. This was the culmination of months of research, experiments and incremental migrations and it went without a hitch - not one customer would have noticed that the App transparently switched over from one backend to an entirely different one. This isn't a detailed post about the migration, rather it's a retrospective on the last year – what did we do right? What did we get wrong? What's next for us?

First though, I want to establish some context on how things were at Countingup before Kubernetes.

Pre-Kubernetes

We've always run Countingup in a containerised environment. Originally this was orchestrated by Rancher v1, which is very different from the Rancher you may know today. That version wasn't built on Kubernetes; it was a custom orchestration solution with everything you'd expect:

  • Service definitions. Docker images, environment variables, scaling factor
  • Service health checks.
  • Secrets management. Secrets defined through the UI were mounted to the file system of the container at runtime
  • Logs. You could tail the logs of an instance of a container. We've always had an aggregated log solution through logz.io but it was often helpful to see real-time logs in Rancher.

Rancher was working well for us, but it was end-of-life, so we had to migrate to another solution and we chose AWS's managed Kubernetes service, EKS. Why Kubernetes instead of, for example, ECS, Nomad, or Rancher v2? We had a chance to start with a clean slate, and given that Kubernetes has emerged as the industry standard and AWS has a managed offering, it was the sensible choice for us.

Kubernetes Migration

The Kubernetes ecosystem is vast. It's very easy to get caught up in all the tools, ingress possibilities, networking options, deployment mechanisms, namespace strategies 🤯. We were keen to Keep It Simple: migrate to some 'simple' version of Kubernetes and iterate over time as and when we spotted problems or opportunities. In real terms this meant:

  • Using EKS. There's no appetite for running our own Kubernetes cluster.
  • Use tools we're familiar with. That meant using Terraform and avoiding eksctl despite Amazon recommending its use.
  • Not using Helm. We didn't see ourselves having many third-party deployments running on the cluster and there was no need to use it for managing our own service deployments.
  • No custom networking overlays. EKS has a networking option called VPC CNI, which works 'out of the box'. IPs are assigned from the VPC, which means we can use familiar tools for managing security groups and access control. There are third-party options here that provide lots of useful features at the cost of extra complexity.

What Went Well?

In no particular order, here are some of the things that have worked well for us in the past year.

The Tooling

One of the reasons we chose Kubernetes was the number of amazing open source tools for interfacing with your cluster. This has probably been the biggest day-to-day change for us.

  • kubectl. The de-facto CLI for using Kubernetes. I use this all the time for checking the status of pods, watching nodes get replaced when we perform an upgrade, and a whole bunch of one-off tasks. For me it's a bit like Git: I probably use less than 10% of what it's capable of, but I've got a few established commands burned into memory that I use again and again.
  • stern. This lets you tail pod logs in real-time. Originally we used kubectl for this but it's clunky for viewing more than one pod. Stern just works and has a surprising number of options.
  • The API. Kubernetes exposes an "API Server" for clients to query and issue commands – this is what kubectl uses. I've occasionally used the API and the Go client to run some especially gnarly queries, not often, but having the option has been great.
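
As an illustration only (not our actual queries), here's a minimal sketch of what using the Go client looks like – it builds a client from the local kubeconfig and lists any pods in the default namespace that aren't Running.

// Minimal client-go sketch – illustration only, not our real code
package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from the local kubeconfig – the same file kubectl uses
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // List pods in the default namespace and flag anything that isn't Running
    pods, err := clientset.CoreV1().Pods("default").List(context.Background(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, pod := range pods.Items {
        if pod.Status.Phase != corev1.PodRunning {
            fmt.Printf("%s is %s\n", pod.Name, pod.Status.Phase)
        }
    }
}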

Terraform

We manage everything in AWS (and a few other things besides) with Terraform, and managing EKS is no different.

It wasn't easy though.

EKS is complex, so much so that there's a very popular Terraform module for managing it. We started with this, but it became apparent that the module supports far more than we needed and didn't support everything we did need. Writing the Terraform from scratch took a long time and required lots of trial and error to get our setup in a place we were happy with, but ultimately this has worked out great for us. Some of the resources we need to define include:

  • aws_eks_cluster - The actual cluster object
  • aws_eks_addon - EKS has a concept of 'add-ons' for networking and DNS
  • aws_eks_node_group - The managed worker node group
  • aws_iam_openid_connect_provider - To enable IAM roles for service accounts
  • aws_security_group and aws_security_group_rule - So, so many rules to keep everything secure

Deploying an update to the VPC CNI or the AMI version is as simple as changing a variable and running terraform apply.

Secrets Management

Rancher's secret management solution was in beta – it worked but it wasn't great. Kubernetes has a secret management solution too, which is also not without problems. By default, secrets in etcd are stored as base64 strings (encoded, not encrypted), and secrets attached to pods are readable by anyone with admin access to the pod.
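
To illustrate the 'encoded, not encrypted' point, the value you'd see in a Secret's data field (a hypothetical example below) can be recovered by anyone who can read the object:

// Base64 is an encoding, not encryption – illustration only
package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    // A hypothetical value as it would appear in a Secret's data field
    encoded := "c3VwZXItc2VjcmV0LXBhc3N3b3Jk"
    plaintext, _ := base64.StdEncoding.DecodeString(encoded)
    fmt.Println(string(plaintext)) // super-secret-password
}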

We opted to avoid using Kubernetes secrets entirely (where possible) by using AWS Secrets Manager and changing our service code to read from Secrets Manager using the AWS Go SDK. We created a reusable function to make this easy to manage.

// Very simplified version of the real code
func GetConfigurationValue(envVar string, secretName string) string {
    // Return environment variable if set.
    // Useful for non-secret configuration and local development
    envValue := os.Getenv(envVar)
    if envValue != "" {
        return envValue
    }

    // Use AWS SDK to read the secret from Secrets Manager
    return getSecretFromSecretsManager(secretName)
}
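
For completeness, here's a sketch of what getSecretFromSecretsManager might look like using the AWS SDK for Go (v1 shown here, as an assumption) – the real code returns errors to the caller rather than panicking.

// Sketch only – assumes the AWS SDK for Go v1
import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/secretsmanager"
)

func getSecretFromSecretsManager(secretName string) string {
    // Credentials come from the pod's IAM role (IRSA) in the cluster,
    // or the developer's AWS profile when running locally
    svc := secretsmanager.New(session.Must(session.NewSession()))

    out, err := svc.GetSecretValue(&secretsmanager.GetSecretValueInput{
        SecretId: aws.String(secretName),
    })
    if err != nil {
        panic(err) // the real code returns the error to the caller
    }
    return aws.StringValue(out.SecretString)
}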

We define our secrets in Terraform as aws_secretsmanager_secret resources. Following the principle of least privilege, we only allow access to a secret by the pod(s) that need it.

An example of setting up a pod with limited secret access in Terraform looks a little like this for us:

module "tax" {
  source            = "./pod"
  service_name      = "tax"
  secret_ids_read_access = [
    "tax_database",
    "segment_write_key", # 'tax' can make calls to Segment using the secret write key
  ]
}

The 'pod' module has a policy resource that is attached to the pod's IAM role, granting it read access to those secrets.

locals {
  role_name = "${var.service_name}.pod"
}

# Enable IAM roles for service accounts by allowing the pod to authenticate with the OIDC endpoint
data "aws_iam_policy_document" "pod_irsa_role_doc" {
  statement {
    effect = "Allow"
    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }
    actions = ["sts:AssumeRoleWithWebIdentity"]
    condition {
      test     = "StringEquals"
      variable = "${split("oidc-provider/", var.oidc_provider_arn)[1]}:sub"
      values   = ["system:serviceaccount:default:${var.service_name}"]
    }
  }
}

# Each pod assumes a single IAM Role. This makes it very easy to enforce the principle of least privilege
resource "aws_iam_role" "pod_irsa_role" {
  name               = local.role_name
  assume_role_policy = data.aws_iam_policy_document.pod_irsa_role_doc.json
}

# Grant read access to secrets
data "aws_iam_policy_document" "read_secrets_policy_document" {
  count = length(var.secret_ids_read_access) == 0 ? 0 : 1
  statement {
    effect = "Allow"
    actions = [
      "secretsmanager:GetSecretValue",
      "secretsmanager:DescribeSecret",
    ]
    # Secrets Manager adds `-xxxxxx` random string to secrets so we need to add that suffix here
    resources = [for secret_id in var.secret_ids_read_access : "arn:aws:secretsmanager:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:secret:${secret_id}-??????"]
  }
}

resource "aws_iam_policy" "read_secret_policy" {
  count  = length(var.secret_ids_read_access) == 0 ? 0 : 1
  name   = "secrets_manager_read_access_for_role_${local.role_name}"
  policy = data.aws_iam_policy_document.read_secrets_policy_document[0].json
}

resource "aws_iam_role_policy_attachment" "read_secret_policy_attachment" {
  count      = length(var.secret_ids_read_access) == 0 ? 0 : 1
  role       = aws_iam_role.pod_irsa_role.name
  policy_arn = aws_iam_policy.read_secret_policy[0].arn
}

AWS has an alternative to using Secrets Manager directly from the code: the verbosely named AWS Secrets Manager and Config Provider for Secret Store CSI Driver. However, this wasn't available until after we migrated, and we're already very happy with our solution.

VPC CNI

VPC CNI is the cluster networking solution. This one has had a more turbulent path to success – I mentioned above that we chose it over third-party options because it's simple, but until recently it had one major issue.

Pod density

Kubernetes requires each pod to have its own IP address. If a node (EC2 instance) is running 20 pods it needs to have access to 20 IP addresses through its attached ENI/s. The number of IPs available to an EC2 instance varies based on the instance type.

AWS has a page with a table showing the limits for each instance type.

So if we wanted to run more than 30 pods on an m5.large we couldn't; we'd need to bump up to an m5.xlarge. This wasn't a problem for our production environment, where we run big enough instances that there's no issue, but our development and QA instances are t3.medium, which only supports 18 IPv4 addresses (actually 17, because EKS needs one). It didn't take long for us to get incredibly close to that limit given our number of pods and the number of replicas of each.
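
To make the arithmetic concrete, here's a quick sketch of the max-pods formula AWS documents for the default VPC CNI behaviour; the ENI and IP-per-ENI figures are AWS's published limits for these instance types.

package main

import "fmt"

// Max pods per node with the default VPC CNI behaviour (before prefix delegation):
// maxPods = ENIs * (IPv4 addresses per ENI - 1) + 2
func maxPods(enis, ipsPerENI int) int {
    return enis*(ipsPerENI-1) + 2
}

func main() {
    // ENI / IP-per-ENI figures are AWS's published limits for these instance types
    fmt.Println("t3.medium:", maxPods(3, 6))  // 17
    fmt.Println("m5.large: ", maxPods(3, 10)) // 29
    fmt.Println("m5.xlarge:", maxPods(4, 15)) // 58
}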

Luckily, AWS delivered a much-requested feature just in time: 'prefix delegation', which massively increases the number of pods that can run on an instance.

This bumped the pod capacity of our t3.medium nodes from 17 to a whopping 110. I'm really glad we held off on using a third-party networking option, and a big thanks to Amazon for listening to the community and implementing this feature.

Cost

EKS costs $72 a month. That's it. We have 3 clusters, 1 per environment, so $216 a month. It's unbelievably cheap for what it gives us and the burden it takes away from us. We don't host any of the Kubernetes-specific services (etcd, API server, master nodes, etc.); running these ourselves would cost well over $216 a month in developer time alone.

What Didn't Work Out?

We didn't get everything right; some decisions that made sense then don't make sense now, and vice versa.

Kubernetes Dashboard

The Kubernetes team ship a dashboard product which can be deployed into a Kubernetes cluster. It shows what you'd expect: deployments, pods, node status, CPU/memory usage etc.

It seems like most folks install this – it's a first-party product, so we figured we may as well install it too. Over the last year I can count on one hand the number of times I've used it: three, one of which was to create a screenshot for this post.

kubectl is so powerful and easy to use that I always find myself reaching for it instead. The metrics server adds some nice details on CPU and memory usage in the dashboard, but that's not something I find myself needing to check.

This is especially annoying because we spent a fair bit of time getting the dashboard working. There are several ways to authenticate, and we originally opted for oauth2-proxy, which let us use our GitHub accounts to access the dashboard. This worked but, again, the extra complexity went against our Keep It Simple strategy. Post migration we came across kauthproxy, which runs a proxy server locally so you can access the dashboard securely via localhost. It's so simple, It Just Works, and it reduces our publicly exposed surface area, so we immediately removed oauth2-proxy in favour of it.

We also had to use a different secret management system for oauth2-proxy, which leads on to my next point.

Secrets Management

Yes, this is in the "What Went Well" section too, but it's a big topic and some of it didn't go as well as we'd have liked. For third-party pods that need secrets we couldn't use our GetConfigurationValue function to read from AWS Secrets Manager; we were forced to use Kubernetes Secrets. We knew we still wanted to keep all secrets in Secrets Manager as the single source of truth, so we opted for a tool called External Secrets.

external-secrets polls a given source of secrets (AWS Secrets Manager in our case) and creates Kubernetes Secrets that pods can access. It's a really cool idea; we liked it a lot.

We used it for a couple of third-party dependencies but eventually phased it out by either removing those dependencies or managing the Kubernetes Secret in Terraform instead. So we don't use external-secrets anymore. It was great while it lasted though.

That link for external-secrets points to the "new" in-development version. We were running a now-deprecated version written in JavaScript. This deprecation, whilst the new version was still in alpha, was another reason we were keen to remove it from our cluster.

Upgrading Third Party dependencies

I mentioned at the top that we chose to avoid Helm. At the time this was the correct decision: there were 1001 decisions that needed to be made, Keep It Simple guided us very well, and opting not to use Helm made sense.

I'm not sure this decision still holds a year later though.

We're now in a much better place in terms of understanding how we use Kubernetes, what our dependencies are, and what our upgrade cadence looks like, and having a tool to manage all of this makes a lot of sense. Our upgrade process mostly involves grabbing YAML files from upstream GitHub repos, pasting them into our own repo, and deploying with CI – except there are always some manual tweaks to namespaces, RBAC rules, etc. that make this more painful than it should be, especially for more complex dependencies like Ingress NGINX.

There's a roadmap item for us to explore using Helm. I'm optimistic about it.

One Year Of Kubernetes

The migration was a great success:

  • Excellent tooling
  • Improved secrets management
  • Low cost

There are a few things to improve on, particularly around third-party dependency management that we're looking into.

I'm really pleased we stuck with Keep It Simple and even simplified things further after the migration with the removal of oauth2-proxy and external-secrets.

The large upfront cost of managing everything in Terraform with hand-rolled HCL was a pain at first, but we've barely touched it since migrating, and we have a complete understanding of our infrastructure that wouldn't have come so easily with a community module.

I wonder how things will change a year from now.