Nodeless EKS Clusters

Ryan Graham

6 min readAug 26, 2024

with Fargate and Karpenter

Sick of updating AMIs on managed node groups? This is how I got rid of them.

Why

I saw three major benefits when I moved to Karpenter.

Cluster upgrades became a breeze. I could update the EKS control plane. Karpenter would automatically detect drift and roll all of my nodes for me.
Maintaining compliant AMIs on nodes became automatic. No more waiting for the ASG to roll each node at an agonizing pace. No more 40+ minute terraform runs.
Cost savings! Instances were continuously right-sized and spot instances could be utilized and managed automatically.

But there was still a problem. I still had one managed node group in every cluster in order to bootstrap Karpenter. I still had to update that managed node group at least once a month to stay ahead of patch compliance goals and it was toilsome.

This is where Fargate came in.

What

Spin up a new cluster WITHOUT the customary base managed node group.
Deploy Karpenter and CoreDNS to Fargate in the cluster.
Deploy Flux, Argo, etc and all of your actual workloads to run on Karpenter-managed nodes.

How

Terraform the cluster

Start by terraforming your cluster with Fargate profiles for CoreDNS and Karpenter.

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "~> 20.22"
  cluster_name    = local.cluster_name
  cluster_version = local.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  vpc_id                   = data.aws_vpc.default.id
  control_plane_subnet_ids = data.aws_subnets.other.ids
  subnet_ids               = data.aws_subnets.private.ids

  enable_cluster_creator_admin_permissions = true
  authentication_mode                      = "API"

  cluster_addons = {
    kube-proxy = {}
    vpc-cni    = {}
    coredns = {
      configuration_values = jsonencode({
        computeType = "fargate"
        resources = {
          limits = {
            cpu    = "0.25"
            memory = "256M"
          }
          requests = {
            cpu    = "0.25"
            memory = "256M"
          }
        }
      })
    }
  }

  fargate_profiles = {
    kube_system = {
      name = "kube-system"
      selectors = [
        {
          namespace = "kube-system"
          labels    = { k8s-app = "kube-dns" }
        },
        {
          namespace = "kube-system"
          labels    = { "app.kubernetes.io/name" = "karpenter" }
        }
      ]
    }
  }

  access_entries = {
    admin = {
      principal_arn     = "<Insert your SSO role ARN>"
      kubernetes_groups = ["cluster-admin"]
      policy_associations = {
        single = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  cluster_security_group_additional_rules = {
    ingress_self_all = {
      description = "Allow Atlantis and VPN to reach control plane"
      protocol    = "tcp"
      from_port   = 443
      to_port     = 443
      type        = "ingress"
      cidr_blocks = ["10.0.0.0/8", "172.16.0.0/12"]
    }
  }

  node_security_group_tags = {
    "karpenter.sh/discovery" = local.cluster_name
  }
}

Terraform IAM and SQS for Karpenter

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.22"

  # fargate only works with IRSA
  enable_pod_identity             = false
  create_pod_identity_association = false
  enable_irsa                     = true
  irsa_oidc_provider_arn          = module.eks.oidc_provider_arn
  irsa_namespace_service_accounts = ["kube-system:karpenter"]

  cluster_name = module.eks.cluster_name

  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }
}

Add providers for helm and kubectl

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

provider "kubectl" {
  apply_retry_count      = 5
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

data "aws_ecrpublic_authorization_token" "token" {}

Helm release Karpenter CRDs

resource "helm_release" "karpenter_crd" {
  namespace           = "kube-system"
  name                = "karpenter-crd"
  repository          = "oci://public.ecr.aws/karpenter"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart               = "karpenter-crd"
  version             = "1.0.1"
  wait                = false

  values = [
    <<-EOT
    webhook:
      enabled: true
      serviceName: karpenter
      serviceNamespace: kube-system
      port: 8443
    EOT
  ]
}

Helm release Karpenter itself

resource "helm_release" "karpenter" {
  namespace           = "kube-system"
  name                = "karpenter"
  repository          = "oci://public.ecr.aws/karpenter"
  repository_username = data.aws_ecrpublic_authorization_token.token.user_name
  repository_password = data.aws_ecrpublic_authorization_token.token.password
  chart               = "karpenter"
  version             = "1.0.1"
  wait                = false

  values = [
    <<-EOT
    serviceAccount:
      name: ${module.karpenter.service_account}
      annotations:
        eks.amazonaws.com/role-arn: ${module.karpenter.iam_role_arn}
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
      interruptionQueue: ${module.karpenter.queue_name}
    tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: eks.amazonaws.com/compute-type
        operator: Equal
        value: fargate
        effect: NoSchedule
    controller:
      resources:
        requests:
          cpu: 1000m
          memory: 1024Mi
        limits:
          cpu: 1000m
          memory: 1024Mi
    EOT
  ]

  depends_on = [
    helm_release.karpenter_crd
  ]
}

Add a default NodeClass

resource "kubectl_manifest" "karpenter_node_class" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiSelectorTerms:
        - alias: al2023@latest
      role: ${module.karpenter.node_iam_role_name}
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${local.env}
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: ${module.eks.cluster_name}
        - id: ${module.eks_community.cluster_primary_security_group_id}
      tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}

Add a default NodePool

resource "kubectl_manifest" "karpenter_node_pool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
          requirements:
            - key: "karpenter.k8s.aws/instance-category"
              operator: In
              values: ["c", "m", "r"]
            - key: "karpenter.k8s.aws/instance-cpu"
              operator: In
              values: ["4", "8", "16", "32"]
            - key: "karpenter.k8s.aws/instance-hypervisor"
              operator: In
              values: ["nitro"]
            - key: "karpenter.k8s.aws/instance-generation"
              operator: Gt
              values: ["2"]
      limits:
        cpu: 1000
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: 30s
  YAML

  depends_on = [
    kubectl_manifest.karpenter_node_class
  ]
}

Add a test deployment

resource "kubectl_manifest" "karpenter_example_deployment" {
  yaml_body = <<-YAML
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: inflate
    spec:
      replicas: 0
      selector:
        matchLabels:
          app: inflate
      template:
        metadata:
          labels:
            app: inflate
        spec:
          terminationGracePeriodSeconds: 0
          containers:
            - name: inflate
              image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
              resources:
                requests:
                  cpu: 1
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}

Testing…

Once you’ve run all of the above, you’ll have an EKS cluster running 4 pods (2x CoreDNS, 2x Karpenter) on 4 minimal fargate nodes.

❯ kubectl get nodes
NAME                                  STATUS   ROLES    AGE    VERSION
fargate-ip-10-0-24-69.ec2.internal    Ready    <none>   17d    v1.30.0-eks-404b9c6
fargate-ip-10-0-45-231.ec2.internal   Ready    <none>   4h2m   v1.30.0-eks-404b9c6
fargate-ip-10-0-49-95.ec2.internal    Ready    <none>   17d    v1.30.0-eks-404b9c6
fargate-ip-10-0-57-54.ec2.internal    Ready    <none>   4h2m   v1.30.0-eks-404b9c6
❯ kubectl get pods -A
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
kube-system   coredns-5b87ff9b5-64jch      1/1     Running   0          17d
kube-system   coredns-5b87ff9b5-ss4wf      1/1     Running   0          17d
kube-system   karpenter-666d8f76ff-vr849   1/1     Running   0          4h11m
kube-system   karpenter-666d8f76ff-x59tw   1/1     Running   0          4h11m

Try spinning up some pods in our test deployment.

❯ kubectl scale deployment inflate --replicas 5
deployment.apps/inflate scaled

Within a few minutes you should see Karpenter spin up a new node. Daemonset pods for that node should show up as well.

❯ kubectl get nodes
NAME                                  STATUS   ROLES    AGE    VERSION
fargate-ip-10-0-24-69.ec2.internal    Ready    <none>   17d    v1.30.0-eks-404b9c6
fargate-ip-10-0-45-231.ec2.internal   Ready    <none>   4h4m   v1.30.0-eks-404b9c6
fargate-ip-10-0-49-95.ec2.internal    Ready    <none>   17d    v1.30.0-eks-404b9c6
fargate-ip-10-0-57-54.ec2.internal    Ready    <none>   4h4m   v1.30.0-eks-404b9c6
ip-10-0-104-83.ec2.internal           Ready    <none>   57s    v1.30.2-eks-1552ad0
❯ kubectl get pods -A
NAMESPACE     NAME                         READY   STATUS    RESTARTS   AGE
default       inflate-66fb68585c-drc69     1/1     Running   0          92s
default       inflate-66fb68585c-lvlxl     1/1     Running   0          92s
default       inflate-66fb68585c-qf42p     1/1     Running   0          92s
default       inflate-66fb68585c-qmz5j     1/1     Running   0          92s
default       inflate-66fb68585c-wqn6n     1/1     Running   0          92s
kube-system   aws-node-5rkkl               2/2     Running   0          67s
kube-system   coredns-5b87ff9b5-64jch      1/1     Running   0          17d
kube-system   coredns-5b87ff9b5-ss4wf      1/1     Running   0          17d
kube-system   karpenter-666d8f76ff-vr849   1/1     Running   0          4h5m
kube-system   karpenter-666d8f76ff-x59tw   1/1     Running   0          4h5m
kube-system   kube-proxy-z2tbk             1/1     Running   0          67s

Finish the test by spinning the deployment back down. If you’ve deployed nothing else, the Karpenter node will disappear within minutes as well.

❯ kubectl scale deployment inflate --replicas 0
deployment.apps/inflate scaled

Whats next?

Go bootstrap Flux or ArgoCD on your new cluster and watch Karpenter spin up new nodes to accommodate them. Then maybe drop some more nuanced NodeClasses and NodePools into your GitOps repo.

Lets talk about cost

Its true that four minimal Fargate nodes will still costs tens of dollars more per cluster per month than your old managed node group. But that is a drop in the bucket compared to what you’ll save on maintenance.

Notable Updates

Things I’ve changed since publishing…

Added label selectors to fargate profile
Added primary cluster SG ID to karpenter node class (this ensures pods on karpenter nodes can hit coredns on fargate nodes)
Increased requests/limits for Karpenter