Our migration from Kubernetes Built-in NLB to ALB Controller

Working with Kubernetes Services is convenient, especially when you can deploy Load Balancers via cloud providers like AWS.

At Qovery, we initially started with Kubernetes’ built-in Network Load Balancer (NLB). However, we decided to move to the AWS Load Balancer Controller (ALB Controller). In this article, I explain why we made this switch and how it benefits our infrastructure.

We will discuss the reasons for the transition, the features of the ALB Controller, and provide a guide for deploying it. This shift has helped us simplify management, reduce costs, and enhance performance. By understanding these points, you can decide if the ALB Controller is right for your Kubernetes setup.

Pierre Mavro

August 3, 2024 · 5 min read

Important consideration: we should have been using the ALB Controller from the beginning, as migrating can be painful depending on your configuration. Adopting it from day one, or as early as possible, would have been best!

#Why did we start with the built-in NLB

For our customers, and for several technical people we have discussed this with, the built-in NLB is the default choice because:

  1. Ease of Use: Simple to configure and use with built-in Kubernetes Service annotations (see the minimal example after this list).
  2. Kubernetes Native: Uses Kubernetes-native objects, reducing the need for AWS-specific knowledge.
  3. Cloud-Agnostic: Easier to migrate to other cloud providers or on-premises environments without deep AWS integration. Since we support multiple cloud providers in our managed offering, we must stay as transparent as possible for our customers and be able to port functionality to every supported cloud provider; this compatibility is only possible by forgoing some NLB-specific features.
  4. Maintenance: Minimal maintenance overhead compared to managing additional AWS services and controllers.
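
To illustrate the first point, here is a minimal sketch of what provisioning an NLB looks like with the built-in integration (the Service name, selector, and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # the single annotation that asks the in-tree cloud provider for an NLB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080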

#Why did we move to the ALB Controller

Migration to the ALB Controller came late (four years after we first adopted the built-in NLB), and we were able to live without it for a long time.

However, during this time, we faced issues like NLBs not being cleaned up correctly after the corresponding Service was deleted on the Kubernetes side.

NLB deletion issue with Kubernetes

We contacted AWS support and looked at GitHub issues on Kubernetes, and the conclusion is that this is a legacy part of the Kubernetes code base that is no longer maintained. AWS support told us they wouldn't invest in fixes since they now develop the ALB Controller.

When using the Kubernetes built-in NLB, be prepared to handle this kind of issue manually 😅.

This is what we did: we instrumented our Qovery Engine to manage this kind of issue.

// fix for NLB not properly removed https://discuss.qovery.com/t/why-provision-nlbs-for-container-databases/1114/10?u=pierre_mavro
pub fn clean_up_deleted_k8s_nlb(
    event_details: EventDetails,
    target: &DeploymentTarget,
) -> Result<(), Box<EngineError>> {
    // DO SOME NASTY STUFF TO DEAL WITH NLB DELETION ISSUE -_-'
}
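
For reference, the kind of check this function automates can be approximated with the AWS CLI. A rough sketch (not Qovery's actual logic), relying on the kubernetes.io/service-name tag the in-tree controller sets on the load balancers it creates:

# List each NLB and the Kubernetes Service it was created for;
# any tag value with no matching Service left in the cluster is an orphan.
for arn in $(aws elbv2 describe-load-balancers \
    --query "LoadBalancers[?Type=='network'].LoadBalancerArn" --output text); do
  aws elbv2 describe-tags --resource-arns "$arn" \
    --query "TagDescriptions[].Tags[?Key=='kubernetes.io/service-name'].Value" --output text
done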

But recently, we wanted to leverage some NLB features that are not exposed by Kubernetes' built-in NLB integration, so we had to move to the ALB Controller.

The move to the ALB Controller brings useful features such as IP target mode, custom load balancer naming, proxy protocol, health check tuning, and target group attributes, all driven by Service annotations (several of these appear in the deployment example later in this article).

Moving to the ALB Controller is an old topic at Qovery: we had already raised it a few years back when we encountered the issues discussed above.

#Why didn’t we move to the ALB Controller sooner

The main reason is that it was one more thing to maintain on our end. But this time, we had no choice since some features requested by our customers required specific configurations that the native Kubernetes NLB implementation couldn’t handle.

Even if you’re not changing the load balancer type (it remains an NLB), you unfortunately can't migrate in place from the Kubernetes "built-in NLB" to an "ALB Controller NLB". Per the documentation:

You can't easily migrate from "built-in NLB" to "ALB Controller NLB"

This has obviously been confirmed by AWS support as well 😭.

So if you have a DNS CNAME pointing directly to the NLB DNS name (xxx.elb.eu-west-3.amazonaws.com), TLS/SSL certificates attached to it, or anything else directly connected to it, you will have to manage the switch carefully to avoid or reduce downtime as much as possible.

Anticipating this, at Qovery we use our own domain on top of the NLB domain name, so we're not concerned about TLS issues, only about the CNAME target change and the time for the new NLB to become available 🤩.
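
In practice, the cutover looks something like the following hedged sketch (app.yourdomain.com and the nginx-ingress load balancer name are placeholders): deploy the new Service managed by the ALB Controller, wait for the new NLB to be active, then repoint the CNAME.

# wait for the ALB-Controller-managed NLB to be active before switching DNS
aws elbv2 describe-load-balancers --names nginx-ingress \
  --query 'LoadBalancers[0].State.Code' --output text   # expect: active

# then update the CNAME to the new NLB DNS name and verify propagation
dig +short CNAME app.yourdomain.com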

However, it’s a shame that the AWS ALB Controller does not handle this migration transparently. The consequences in terms of reliability and time investment are high, and for most companies that's a real problem; I'm surprised AWS didn't consider it. Luckily for our customers, the migration is fully transparent on our side.

Yes, at Qovery, we manage thousands of EKS clusters for our customers. If you are interested in this, read this article.

#Deployment

I won’t go into details because several tutorials are already available on the internet. To summarize, here is the Terraform configuration to prepare ALB Controller permissions:

resource "aws_iam_policy" "aws_load_balancer_controller_policy" {
  name = "qovery-alb-controller-${var.kubernetes_cluster_id}"
  description = "Policy for AWS Load Balancer Controller"

  // https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/deploy/installation/#option-b-attach-iam-policies-to-nodes 
  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
       ...REDACTED...
  })
}

resource "aws_iam_role" "aws_load_balancer_controller" {
  name = "qovery-eks-alb-controller-${var.kubernetes_cluster_id}"
  description = "ALB controller role for EKS cluster ${var.kubernetes_cluster_id}"
  tags = local.tags_eks

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "${aws_iam_openid_connect_provider.oidc.arn}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${replace(aws_iam_openid_connect_provider.oidc.url, "https://", "")}:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}
POLICY
}

resource "aws_iam_instance_profile" "aws_load_balancer_controller" {
  name = "qovery-eks-alb-controller-${var.kubernetes_cluster_id}"
  role = aws_iam_role.eks_cluster.name
  tags = local.tags_eks
}

resource "aws_iam_role_policy_attachment" "aws_load_balancer_controller" {
  policy_arn = aws_iam_policy.aws_load_balancer_controller_policy.arn
  role       = aws_iam_role.aws_load_balancer_controller.name
}

Then, deploying the ALB Controller Helm chart is straightforward:

helm repo add eks https://aws.github.io/eks-charts
# install into kube-system so the service account matches the IAM trust policy above
helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=qovery-clusterid \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::xxx:role/qovery-eks-alb-controller-clusterid"
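
A quick sanity check after installation (assuming the default names the chart uses):

# the controller Deployment and its service account should exist in kube-system,
# and the service account should carry the IAM role annotation
kubectl -n kube-system get deployment aws-load-balancer-controller
kubectl -n kube-system describe serviceaccount aws-load-balancer-controller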

Now you’re ready to deploy an NLB managed by the ALB Controller:

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-ingress-nginx-controller
  namespace: nginx-ingress
  annotations:
    external-dns.alpha.kubernetes.io/hostname: 'xxx.yourdomain.com'
    # health check interval in seconds
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
    # explicit NLB name instead of an auto-generated one
    service.beta.kubernetes.io/aws-load-balancer-name: nginx-ingress
    # register pod IPs directly as targets
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    # enable proxy protocol v2 on all target groups to preserve client IPs
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: target_health_state.unhealthy.connection_termination.enabled=false
    # "external" hands the Service to the ALB Controller instead of the legacy in-tree controller
    service.beta.kubernetes.io/aws-load-balancer-type: external
spec:
  externalTrafficPolicy: Local
  internalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: xxxxx
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: xxxxx
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: nginx-ingress
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

If you look in the EC2 console, you should see your named load balancer (here: nginx-ingress).
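
You can also verify it from the CLI:

# confirm the ALB-Controller-managed NLB exists and is active
aws elbv2 describe-load-balancers --names nginx-ingress \
  --query 'LoadBalancers[0].{Name:LoadBalancerName,DNS:DNSName,State:State.Code,Type:Type}'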

#Conclusion

The ALB Controller offers many more features than the built-in NLB and also lets you extend usage to ALBs (via Ingress resources) if desired.

We hope AWS will include it in the EKS add-ons to simplify lifecycle management and shorten the setup and deployment phase.

We strongly encourage companies to move to the ALB Controller sooner rather than later, to avoid a lengthy migration process down the road.

Your Favorite DevOps Automation Platform

Qovery is a DevOps Automation Platform Helping 200+ Organizations To Ship Faster and Eliminate DevOps Hiring Needs

Try it out now!
Engineering · Kubernetes · AWS