Deploying Multus CNI into Amazon’s Elastic Kubernetes Service (EKS)

Joe Alford
Feb 17, 2022

EKS logo and Multus logo

Back in August 2021, Amazon announced support for the Multus CNI on their managed EKS platform. As we were just starting to migrate workloads over to EKS, this was great news: we were looking for a way to assign static IP addresses to a selection of pods (so that we could use them as outbound web traffic proxies and add firewall rules accordingly), and we had already used Multus on some local clusters.

However, once we actually started down the path of deployment, we soon ran into numerous issues with the Amazon-provided deployment approach. While the finer points of some of these issues have slipped my mind in the months between starting this work and finding the time to write it up, they largely boiled down to bugs in the provided Lambda/Python code that caused:

  • an inability to have Multus in more than one availability zone (AZ)
  • an inability to properly dynamically add Multus to nodes added to the cluster
  • attempts to reboot a node after attaching an interface (which is not needed)

This solution was co-designed with Alan Hollis

To resolve this, we set about creating our own deployment strategy for Multus on EKS, which supports:

  • more than one node per AWS AZ
  • dynamically adding and removing nodes from the cluster
  • dynamically deriving cluster-specific config, meaning it is entirely portable (we have several clusters and didn’t want to write per-cluster config where possible)
  • more than one Multus instance (or EKS cluster) per AWS account
  • no separate Lambda functions: all functionality exists within the K8s cluster
  • no node reboots required

In order to deploy this, we took the following approach:

  • use the AWS-provided CNI and Multus files to deploy the relevant CRDs, accounts etc. into the cluster, as this part does work
  • create a K8s DaemonSet (DS) that runs in the cluster, looks for ‘Multus ENIs’, and attaches them to a cluster node if the ENI is not already in use

In order for the DS to work, we make use of:

  • tags on the AWS ENIs to denote that an ENI is to be consumed by Multus
  • tags on the AWS EC2 instance to which the Multus network is attached
  • K8s node labels to denote that a Multus ENI is attached and that the node can be considered for scheduling Multus-consuming workloads (see the example queries below)
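
As a concrete example of how these fit together (using the tag and label names that appear later in this post), you can query both sides from the command line:

# ENIs that have been earmarked for Multus in this cluster
aws ec2 describe-network-interfaces \
  --filters "Name=tag:multus,Values=true" "Name=tag:cluster,Values=EKS_CLUSTER_NAME"

# nodes that already have a Multus ENI attached and can schedule Multus workloads
kubectl get nodes -l cluster.custom.tags/multus-attached=true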

The following diagram explains what’s going on a little more clearly.

Diagram showing relationship between Multus, ENIs and nodes
This shows the relationship of ENIs to nodes with/without attachments from Multus, and how the Multus ENI is always attached to the node running the DS

Deployment

Now, let’s walk through how to deploy all of this, and how to make it functional. The following assumptions have been made to get to this point:

  • you have a functional EKS cluster
  • you are able to deploy resources/manifests to the cluster
  • you have the permissions to create subnets and ENIs, and to apply tags to ENIs/EC2 instances

Creating ENIs

If you’re reading this, you probably have some background on Multus and what it does, so you’ll know that you need to add another ENI to your cluster.

All of our infrastructure is deployed using Terraform and custom modules, so rather than sharing complete code that won’t port nicely, here are rough steps to create something that can be used:

  • create a subnet that your Multus ENI can consume IPs from
  • create a new ENI, making sure to use the tags below (you can change these, but you’ll need to update the code used in the DS image if you do)
  • create the relevant routes/Transit Gateways etc. to fit your environment

This ENI is one resource that our Terraform creates without using a custom module, so the code is included below for reference:

resource "aws_network_interface" "multus_eni_az_a_prod" {
subnet_id = module.prod_vpc.private_subnets[0] #the subnet created above. You'd use a `data` object to retrieve this
private_ips = ["10.231.4.4"] #range from the subnet above - we only needed one for our use-case, hence the single IP
tags = {
"multus": "true"
"Zone": "eu-west-2a"
"node.k8s.amazonaws.com/no_manage": "true"
"cluster": "EKS_CLUSTER_NAME"
}
description = "Multus network interface availability zone a"
}
Tags needed for multus ENI
Once the ENI is created, it should have tags roughly like the above
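
If you’re not managing this with Terraform, a rough AWS CLI equivalent would look something like the below (the subnet ID is a placeholder, and you should adjust the IP and tag values to suit your environment):

aws ec2 create-network-interface \
  --subnet-id subnet-0123456789abcdef0 \
  --private-ip-address 10.231.4.4 \
  --description "Multus network interface availability zone a" \
  --tag-specifications 'ResourceType=network-interface,Tags=[{Key=multus,Value=true},{Key=Zone,Value=eu-west-2a},{Key=node.k8s.amazonaws.com/no_manage,Value=true},{Key=cluster,Value=EKS_CLUSTER_NAME}]'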

Installing Multus

Now that all of the AWS resources (EKS cluster, subnet, ENI) are in place, we need to install Multus into our cluster. For this, rather than reinvent the wheel, I used the existing config provided by Amazon: the AWS VPC CNI manifest and the Multus DaemonSet manifest. You can find the latest versions of these two files at the links below. Install both of them into your cluster:

Validate that all required resources/pods etc. are installed before proceeding.
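
As a rough sketch of the install and validation steps (the file names below are placeholders for the manifests linked above):

# apply the AWS VPC CNI and Multus DaemonSet manifests from the links above
kubectl apply -f aws-vpc-cni.yaml
kubectl apply -f multus-daemonset.yaml

# check that the CNI and Multus pods are running on every node
kubectl get pods -n kube-system -o wide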

Installing the Daemon Set

Next, it is time to install the DS that is responsible for attaching an ENI to an EKS node. We’ll look at how this DS works shortly, but for now, you will need to head to this GitHub repo and either build the container image yourself, or otherwise get a copy of it into a registry that your EKS cluster can access.

Once you have a suitable copy of the image that your EKS cluster can access, deploy the DS by installing this yaml file into your cluster (be sure to update the image repo if you’re self-hosting).

Ensure that the pods all start as expected. They log to stdout, so kubectl logs will show you any output they produce.
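
For example (the DaemonSet name here is inferred from the pod names shown later in this post, so adjust it to whatever yours is actually called):

# wait for the attacher pods to come up on every node
kubectl rollout status daemonset/kube-multus-eni-attacher-ds -n kube-system

# check the output of one of the pods
kubectl logs -n kube-system kube-multus-eni-attacher-ds-xxxxx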

Understanding the Daemon Set

In order to get a better understanding of what this DS is doing, you can either refer to the README for the repo linked above, or read a short summary below. The DS will:

  • start by generating a kubeconfig file dynamically, based on the cluster it’s deployed into (this is a one-time operation for the lifetime of the pod)

Then, every 5 minutes, it will (a rough sketch of this loop follows the list):

  • get the ENIs tagged with multus:true, cluster:YOUR_CLUSTER_NAME and Zone:DERIVED_FROM_CURRENT_NODE
  • if there is no available ENI, it will wait 5 minutes and check again
  • if there is a returned ENI with a status of available, it will proceed to:
    • create an EC2 NetworkInterfaceAttachment between the node the DS pod is running on and the ENI found above
    • tag the EC2 instance with multus-network-attached:true, and add a label to the K8s node like so: cluster.custom.tags/multus-attached=true (this label is how the pods that need Multus know where to run)
    • rename the interface from something like eth0 to multus. This is important, as the interface name is used by the NetworkAttachmentDefinition, and if they don’t match, Multus will not work
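
To make that loop a little more concrete, here is a minimal sketch of the logic in shell. This is not the code from the repo: the interface name (eth1), the device index, and the CLUSTER_NAME/NODE_NAME environment variables (which the real DS derives itself or receives, e.g. via the downward API) are all assumptions for illustration.

#!/bin/bash
# sketch only - the real DaemonSet also handles errors, IMDSv2 tokens, restarted nodes etc.
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

while true; do
  # look for an unattached Multus ENI for this cluster and AZ
  ENI_ID=$(aws ec2 describe-network-interfaces \
    --filters "Name=tag:multus,Values=true" \
              "Name=tag:cluster,Values=${CLUSTER_NAME}" \
              "Name=tag:Zone,Values=${AZ}" \
              "Name=status,Values=available" \
    --query 'NetworkInterfaces[0].NetworkInterfaceId' --output text)

  if [ "${ENI_ID}" != "None" ]; then
    # attach the ENI to this node
    aws ec2 attach-network-interface \
      --network-interface-id "${ENI_ID}" \
      --instance-id "${INSTANCE_ID}" \
      --device-index 1

    # mark the EC2 instance and the K8s node so Multus workloads can find it
    aws ec2 create-tags --resources "${INSTANCE_ID}" \
      --tags Key=multus-network-attached,Value=true
    kubectl label node "${NODE_NAME}" cluster.custom.tags/multus-attached=true --overwrite

    # rename the new interface so it matches the NetworkAttachmentDefinition
    ip link set dev eth1 down
    ip link set dev eth1 name multus
    ip link set dev multus up
  fi

  sleep 300  # poll again in 5 minutes
done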

There is some other logic around handling restarted nodes etc., but that is essentially the crux of what the DS does. Its use of tags to support multiple AZs, and more than one cluster per AWS account, is what sets it apart from the AWS-provided solution, with its unreliable Lambda.

Verifying Multus

So now that Multus is installed, and we have a DS that will assign EC2 ENIs to EC2 instances, we should start seeing some attachment logs. Use kubectl logs -n kube-system kube-multus-eni-attacher-ds-**** and you should see it attach the ENI to the instance, and then issue logs like:

Thu Feb 17 11:43:51 UTC 2022:Found interface in eu-west-2b with tag multus, and status of in-use
Thu Feb 17 11:43:52 UTC 2022: Interface is currently attached or in-use with the correct name - waiting for 600 seconds to poll again...
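
You can also confirm the side effects of the attachment directly against AWS (using the tags described earlier):

# the ENI should now be in-use and attached to one of the cluster instances
aws ec2 describe-network-interfaces \
  --filters "Name=tag:multus,Values=true" \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Status:Status,Instance:Attachment.InstanceId}'

# the EC2 instance should now carry the multus-network-attached tag
aws ec2 describe-instances \
  --filters "Name=tag:multus-network-attached,Values=true" \
  --query 'Reservations[].Instances[].InstanceId'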

Consuming Multus

At this point, Multus is ready to be used — we can create some pods that start to take IPs from the subnet we created earlier. To do this, we need two things:

  • a Multus NetworkAttachmentDefinition (NAD), which essentially defines the virtual K8s network to be used by Multus. An example for our outbound-proxy is below. How you design your network is up to you, but we have one NAD per availability zone (and one proxy per AZ, each with a static IP, thanks to Multus). Two things to note in the config (JSON can’t carry comments): "device": "multus" is the name the DS renames the ENI to, and rangeStart/rangeEnd match the private_ips we assigned when creating the ENI above.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: squid-proxy-multus-network-definition-az-a
namespace: squid-proxy-multi-az
spec:
config: '{
"cniVersion": "0.3.0",
"plugins": [
{
"type": "host-device",
"device": "multus", #note: this is what the ENI is renamed too by the DS
"ipam": {
"type": "host-local",
"subnet": "10.231.4.0/29",
"rangeStart": "10.231.4.4", #you'll note this matches
the private_ips assigned when we created the ENI above
"rangeEnd": "10.231.4.4",
"gateway": "10.231.4.1"
}
},
{
"type": "sbr"
}
]
}'
  • config for our pod to tell it to use the Multus network above. For this example NAD, it will look something like this:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: squid-proxy-az-a
  namespace: squid-proxy-multi-az
spec:
  values:
    nodeSelector:
      # these node selectors tell this pod where to run,
      # so that we have one proxy pod per AZ
      topology.kubernetes.io/zone: eu-west-2a
      # remember the label the DS added to the node? This tells our
      # pod that the Multus ENI is present here, so it can run
      cluster.custom.tags/multus-attached: "true"
    podAnnotations:
      # this defines which NAD to use
      k8s.v1.cni.cncf.io/networks: '[{
        "name": "squid-proxy-multus-network-definition-az-a"
      }]'

In the above, we use a patch to take our existing HelmRelease (although you could deploy your pod via other means) and set it to use the node label/NAD as described in the inline comments above.
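
Once the NAD has been applied, you can check that it exists in the right namespace before rolling out the proxy itself (the file name here is a placeholder):

kubectl apply -f squid-proxy-nad-az-a.yaml
kubectl get network-attachment-definitions -n squid-proxy-multi-az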

Once that pod is deployed, you should be able to use kubectl describe pod to see that the annotations include the Multus networking we’ve just defined, and that there is an Event where Multus applied the networking.
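
For example (the pod name is taken from the events further down; substitute your own):

kubectl -n squid-proxy-multi-az describe pod squid-proxy-az-a-helm-squid-proxy-0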

Annotations (non-Multus parts redacted, formatting changed for readability):

Annotations:  {
"name": "squid-proxy-multi-az/squid-proxy-multus-network-definition-az-a",
"interface": "net1",
"ips": [
"10.232.4.4"
],
"mac": "",
"dns": {}
}]

Deployment events (pay attention to the event for net1, which is our Multus ENI):

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned squid-proxy-multi-az/squid-proxy-az-a-helm-squid-proxy-0 to ip-10-232-1-109.eu-west-2.compute.internal
Normal AddedInterface 10s multus Add eth0 [10.232.1.183/32] from aws-cni
Normal AddedInterface 10s multus Add net1 [10.232.4.4/29] from squid-proxy-multi-az/squid-proxy-multus-network-definition-az-a
Normal Created 10s kubelet Created container helm-squid-proxy
Normal Started 10s kubelet Started container helm-squid-proxy
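
Finally, assuming the container image ships the ip utility, you can confirm the net1 interface and its static IP from inside the pod:

kubectl -n squid-proxy-multi-az exec squid-proxy-az-a-helm-squid-proxy-0 -- ip addr show net1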

Summary

There you have it — a quick look at how to use the AWS-provided config files to install Multus, and then use a custom DS in place of the AWS Lambda approach to attach the ENIs to cluster nodes.

If it looks much easier than the AWS provided method, that’s because it is!
