1. Node Feature Discovery for AI Hardware Optimization


    Node Feature Discovery (NFD) is a Kubernetes add-on that detects the hardware features available on each node and advertises them through the Kubernetes API as node labels. This is particularly useful for AI workloads, where you want pods to be scheduled onto nodes with specific hardware such as GPUs, TPUs, or other specialized accelerators that are critical for machine learning tasks.

    To achieve hardware optimization for AI on Kubernetes with Pulumi, you deploy NFD to your cluster, which exposes the hardware features of the nodes as labels. You can then reference these labels when scheduling pods to ensure they land on nodes with the desired hardware. In this Pulumi program, we deploy NFD as a Kubernetes Deployment. Treat it as a starting point that you will need to customize according to your specific hardware optimization needs.
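
    As a rough illustration of scheduling against NFD labels, the sketch below builds a pod spec whose nodeSelector requires a given set of feature labels. The helper name pod_spec_for_features, the container name, and the image example.com/ai/trainer:latest are hypothetical, and the cpu-cpuid.AVX512F label is only an example of the kind of label NFD can publish; check which labels actually appear on your nodes.

```python
from typing import Any, Dict

# Prefix NFD uses for the feature labels it publishes on nodes.
NFD_LABEL_PREFIX = "feature.node.kubernetes.io/"


def pod_spec_for_features(image: str, features: Dict[str, str]) -> Dict[str, Any]:
    """Build a pod spec dict restricted to nodes that carry the given NFD labels.

    `features` maps short feature names (without the NFD prefix) to the
    label value the node must have.
    """
    node_selector = {NFD_LABEL_PREFIX + name: value for name, value in features.items()}
    return {
        # Hypothetical single-container workload.
        "containers": [{"name": "trainer", "image": image}],
        # The scheduler only places the pod on nodes matching every label here.
        "nodeSelector": node_selector,
    }


# Example: require a CPU that advertises AVX-512 (label name is illustrative).
spec = pod_spec_for_features(
    "example.com/ai/trainer:latest",
    {"cpu-cpuid.AVX512F": "true"},
)
print(spec["nodeSelector"])
```

    The resulting dict can be passed as the spec of a k8s.core.v1.Pod, or used as the pod template spec of a Deployment, in the same way the program below passes plain dicts to Pulumi.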

    In this example, we'll assume you have a Kubernetes cluster already provisioned and Pulumi is configured with access to it.

    import pulumi
    import pulumi_kubernetes as k8s

    # Assuming the use of an existing Kubernetes cluster.
    # The kubeconfig file is often found at ~/.kube/config,
    # but it might be in a different location if you have
    # customized your setup.

    # Name of the NFD deployment
    nfd_deployment_name = "node-feature-discovery"

    # Kubernetes Deployment
    nfd_deployment = k8s.apps.v1.Deployment(
        nfd_deployment_name,
        metadata={
            "name": nfd_deployment_name,
            # This example deploys NFD in the `kube-system` namespace
            "namespace": "kube-system",
        },
        spec={
            "selector": {"matchLabels": {"app": "node-feature-discovery"}},
            "template": {
                "metadata": {"labels": {"app": "node-feature-discovery"}},
                "spec": {
                    "containers": [
                        {
                            "name": "node-feature-discovery",
                            "image": "k8s.gcr.io/nfd/node-feature-discovery:v0.6.0",
                            # Assumption: this release ships separate nfd-master and
                            # nfd-worker binaries, so the worker is started explicitly.
                            "command": ["nfd-worker"],
                            # The arguments might vary depending on the version and your specific hardware
                            "args": [
                                # Dry run: features are detected but not published
                                # as node labels. Remove once nfd-master is deployed.
                                "--no-publish",
                                # cpuid, pstate and rdt were separate sources in early
                                # releases; they are covered by the cpu source here.
                                "--sources=cpu,pci,iommu,memory",
                            ],
                        }
                    ]
                },
            },
        },
    )

    # Export the deployment name
    pulumi.export("nfd_deployment_name", nfd_deployment.metadata["name"])

    What is happening here?

    • We declare that we will use the Kubernetes provider by importing pulumi_kubernetes.
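    • We set up a Kubernetes deployment for Node Feature Discovery. This involves defining a pod with a container using the k8s.gcr.io/nfd/node-feature-discovery image. You will need to change the version (v0.6.0) to match the release you need. Note that the k8s.gcr.io registry has been frozen, and newer NFD releases are published under registry.k8s.io.
    • The args field in the container specification is used to pass arguments to NFD. The --sources argument selects which feature sources are scanned, and you might need to adjust this list to match your hardware and NFD version. The --no-publish argument makes this a dry run: features are detected but not published as node labels, so remove it once you want to schedule pods against the labels.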
    • We recommend deploying NFD in the kube-system namespace because this is a cluster-wide system component.
    • Finally, the deployment's name is exported as a stack output, so it can be easily retrieved if needed.

    Please note that this Pulumi program only creates the Deployment resource. NFD also needs RBAC permissions that let it update node labels, so make sure your cluster grants them. Because a Deployment runs only as many pods as its replica count (one by default), it does not cover every node; NFD's worker is typically run as a DaemonSet so that each node is inspected. The exact components and configuration depend on the NFD version you are using, so always refer to the official NFD documentation for the most up-to-date deployment instructions.
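
    To make the DaemonSet point concrete, here is a minimal sketch of a helper that builds a DaemonSet spec for the NFD worker as a plain dict. The function name nfd_worker_daemonset_spec and the nfd-worker label value are hypothetical, the image and argument are simply reused from the program above, and the sketch leaves out RBAC as well as the host mounts that the upstream manifests use, so it is not a complete NFD deployment.

```python
from typing import Any, Dict, List


def nfd_worker_daemonset_spec(image: str, worker_args: List[str]) -> Dict[str, Any]:
    """Build a DaemonSet spec dict that runs one NFD worker pod per node.

    Unlike a Deployment spec, a DaemonSet spec has no replica count: the
    controller places one pod on every eligible node.
    """
    labels = {"app": "nfd-worker"}  # hypothetical label, used only for pod selection
    return {
        "selector": {"matchLabels": dict(labels)},
        "template": {
            "metadata": {"labels": dict(labels)},
            "spec": {
                "containers": [
                    {
                        "name": "nfd-worker",
                        "image": image,
                        # Copy the list so later changes by the caller do not leak in.
                        "args": list(worker_args),
                    }
                ]
            },
        },
    }


# Reuses the image and the dry-run argument from the program above.
daemonset_spec = nfd_worker_daemonset_spec(
    "k8s.gcr.io/nfd/node-feature-discovery:v0.6.0",
    ["--no-publish"],
)
```

    The returned dict could then be passed as the spec argument of k8s.apps.v1.DaemonSet, alongside the same metadata used for the Deployment.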

    To apply this Pulumi program to your cluster, save it as the entry point of a Pulumi Python project (__main__.py by default), navigate to the project directory, and run pulumi up. Pulumi then executes the program and provisions the resources on your Kubernetes cluster. Ensure Pulumi can reach the cluster through your kubeconfig file.
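
    Once NFD is publishing labels, one way to check the result is to inspect a node object, for example the JSON printed by kubectl get node <name> -o json. The sketch below extracts the NFD feature labels from such an object. The helper name nfd_labels is hypothetical, and the sample node is made-up illustrative data, not output from a real cluster.

```python
import json
from typing import Any, Dict

# Prefix NFD uses for the feature labels it publishes on nodes.
NFD_LABEL_PREFIX = "feature.node.kubernetes.io/"


def nfd_labels(node: Dict[str, Any]) -> Dict[str, str]:
    """Return the NFD feature labels of a node object, with the prefix stripped."""
    labels = node.get("metadata", {}).get("labels", {})
    return {
        key[len(NFD_LABEL_PREFIX):]: value
        for key, value in labels.items()
        if key.startswith(NFD_LABEL_PREFIX)
    }


# Made-up sample standing in for the JSON of a single node.
sample_node = json.loads("""
{
  "metadata": {
    "name": "worker-1",
    "labels": {
      "kubernetes.io/hostname": "worker-1",
      "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
      "feature.node.kubernetes.io/pci-0300_10de.present": "true"
    }
  }
}
""")

found = nfd_labels(sample_node)
for name, value in sorted(found.items()):
    print(f"{name}={value}")
```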