Deploy the nvidia helm chart on AWS EKS

Question

Pulumi · Accepted Answer

To deploy the NVIDIA Helm chart on AWS EKS, we will follow these steps:

1. **Set up an EKS Cluster**: We will use the `eks.Cluster` resource from the Pulumi EKS package, which creates the EKS control plane together with the worker nodes and supporting resources it needs. We will configure the worker nodes as GPU instances, since NVIDIA workloads typically require them.

2. **Install the Helm Chart**: After the cluster is up and running, we will install the NVIDIA Helm chart using the `kubernetes.helm.v3.Chart` resource from the Pulumi Kubernetes provider. This resource will deploy the specified Helm chart to our EKS cluster, and we'll need to ensure the correct Helm repository and chart name are specified.

Here is the program written in TypeScript that accomplishes these two steps:

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// Create an EKS cluster whose worker nodes are GPU instances.
const cluster = new eks.Cluster("my-cluster", {
    instanceType: "p2.xlarge", // GPU instance type (one NVIDIA K80); adjust as necessary for your workloads.
    gpu: true, // Use the EKS-optimized AMI with GPU support so the nodes ship with NVIDIA drivers.
    desiredCapacity: 2, // Desired number of worker nodes; adjust as necessary.
    minSize: 1, // Lower bound for the worker node group.
    maxSize: 3, // Upper bound for the worker node group.
    // Further cluster customization can be added here.
});

// Export the cluster's kubeconfig.
export const kubeconfig = cluster.kubeconfig;

// Create a provider for the EKS cluster.
const clusterProvider = new k8s.Provider("my-cluster-provider", {
    kubeconfig: cluster.kubeconfig,
});

// Install the NVIDIA device plugin using a Helm chart.
const nvidiaHelmChart = new k8s.helm.v3.Chart("nvidia-device-plugin", {
    chart: "nvidia-device-plugin",
    version: "<chart version>", // Replace this with the version you want to install.
    fetchOpts: {
        repo: "https://nvidia.github.io/k8s-device-plugin", // Helm repository for the NVIDIA device plugin chart.
    },
}, { provider: clusterProvider });

// Export values that might be useful.
export const clusterName = cluster.eksCluster.name;
export const clusterEndpoint = cluster.eksCluster.endpoint;
```

#### Detailed Explanation:

- We import the necessary Pulumi packages at the top of the program. This includes AWS, EKS, and Kubernetes packages that allow us to describe our cloud resources using TypeScript.

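- We create an EKS cluster with the `eks.Cluster` constructor. The `instanceType` property is set to a GPU instance type (in this case `p2.xlarge`) so that the worker nodes have GPUs attached. The instance type alone is not enough: the nodes must also run an AMI that includes the NVIDIA drivers, otherwise the device plugin has nothing to detect. We also set the desired, minimum, and maximum number of worker nodes.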

- After creating the cluster, we export the `kubeconfig`. This is a configuration file necessary for connecting to the Kubernetes cluster with tools like `kubectl` or other Kubernetes clients.

- Next, we create a Pulumi Kubernetes provider for the newly created cluster. This provider uses the `kubeconfig` of the cluster to communicate with it.

- Using the `k8s.helm.v3.Chart` resource, we deploy the NVIDIA device plugin to the cluster. We specify the chart name and version. The device plugin runs as a DaemonSet and advertises each node's GPUs to Kubernetes as the `nvidia.com/gpu` resource, which is what lets Pods request GPUs. The `repo` under `fetchOpts` is the Helm repository that hosts the chart.

- Finally, we export the cluster name and endpoint as convenience outputs that could be useful for accessing the cluster afterwards.
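
The chart installation described above can also carry a namespace and chart values. The following is only a sketch: `devicePluginChartArgs` is a hypothetical helper that is not part of the original program, the `kube-system` namespace is an arbitrary choice, and the `tolerations` key is assumed to match the chart's `values.yaml`, so check it against the chart version you install.

```typescript
// Shape of the arguments this sketch builds; it mirrors the fields that
// k8s.helm.v3.Chart accepts (chart, version, namespace, fetchOpts, values).
interface Toleration {
    key: string;
    operator: "Exists" | "Equal";
    effect: "NoSchedule" | "PreferNoSchedule" | "NoExecute";
}

interface DevicePluginChartArgs {
    chart: string;
    version: string;
    namespace: string;
    fetchOpts: { repo: string };
    values: { tolerations: Toleration[] };
}

// Hypothetical helper: the caller supplies the chart version and repository URL.
function devicePluginChartArgs(version: string, repo: string): DevicePluginChartArgs {
    return {
        chart: "nvidia-device-plugin",
        version,
        namespace: "kube-system", // Assumption: any namespace you prefer works here.
        fetchOpts: { repo },
        values: {
            // Assumed values key: lets the plugin DaemonSet run on GPU nodes
            // that carry a `nvidia.com/gpu` NoSchedule taint.
            tolerations: [
                { key: "nvidia.com/gpu", operator: "Exists", effect: "NoSchedule" },
            ],
        },
    };
}

// Usage inside the Pulumi program (shown as a comment, not executed here):
// new k8s.helm.v3.Chart("nvidia-device-plugin",
//     devicePluginChartArgs("<chart version>", "<repo URL>"),
//     { provider: clusterProvider });
```

Keeping the arguments in a plain function makes it easy to reuse the same settings across stacks, and the values can be inspected without touching a cluster.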

#### What Now?

To apply this program, save it in a TypeScript file (e.g., `index.ts`) within a Pulumi project and replace the `<chart version>` placeholder with the chart version you want to install. Ensure that you have the Pulumi CLI installed and configured with your AWS credentials, and that the project's dependencies are installed. Then, run `pulumi up` to create the resources. This will set up an EKS cluster with the NVIDIA device plugin installed, ready for you to deploy GPU-accelerated workloads.
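
Once the device plugin is running, a workload asks for GPUs through the `nvidia.com/gpu` resource limit. As a rough illustration, the sketch below builds a one-shot Pod manifest that requests GPUs and runs `nvidia-smi`. The helper `gpuSmokeTestPod` and the image name passed to it are hypothetical, and the sketch assumes the image you choose can run `nvidia-smi` on your nodes.

```typescript
// Minimal subset of the Pod manifest used by this sketch (not the full Kubernetes schema).
interface GpuPodManifest {
    apiVersion: "v1";
    kind: "Pod";
    metadata: { name: string };
    spec: {
        restartPolicy: "Never";
        containers: Array<{
            name: string;
            image: string;
            command: string[];
            resources: { limits: Record<string, string> };
        }>;
    };
}

// Hypothetical helper: builds a Pod that requests `gpuCount` whole GPUs.
function gpuSmokeTestPod(name: string, image: string, gpuCount: number): GpuPodManifest {
    // GPUs are requested in whole units, so reject fractions and non-positive values.
    if (!Number.isInteger(gpuCount) || gpuCount < 1) {
        throw new Error("gpuCount must be a positive integer");
    }
    return {
        apiVersion: "v1",
        kind: "Pod",
        metadata: { name },
        spec: {
            restartPolicy: "Never", // Run once and stop instead of restarting.
            containers: [{
                name: "gpu-check",
                image,
                command: ["nvidia-smi"],
                // The device plugin exposes GPUs under this resource name.
                resources: { limits: { "nvidia.com/gpu": String(gpuCount) } },
            }],
        },
    };
}

// Usage inside the Pulumi program (shown as a comment, not executed here):
// new k8s.core.v1.Pod("gpu-smoke-test",
//     gpuSmokeTestPod("gpu-smoke-test", "<CUDA-capable image>", 1),
//     { provider: clusterProvider });
```

If the Pod stays in `Pending` with an insufficient `nvidia.com/gpu` message, the device plugin has probably not registered any GPUs on the nodes yet.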