1. Deploy the gpu-operator helm chart on Digital Ocean Kubernetes Service


    To deploy the GPU Operator Helm chart on DigitalOcean Kubernetes Service (DOKS), you will need to follow several steps. I'll guide you through creating the Kubernetes cluster on DigitalOcean, handling the Helm tooling, and deploying the GPU Operator Helm chart with Pulumi.

    First, let's outline the steps:

    1. Create a DOKS Kubernetes Cluster: We will create a new managed Kubernetes cluster using DigitalOcean's managed Kubernetes service.
    2. Install the Helm CLI: Helm is a package manager for Kubernetes; Helm charts help you define, install, and upgrade even the most complex Kubernetes applications.
    3. Deploy the GPU Operator: We will use the Helm CLI within our Pulumi code to deploy the GPU Operator on our DOKS cluster.

    Step 1: Create a DOKS Kubernetes Cluster

    To create a Kubernetes cluster on DigitalOcean using Pulumi, we use the digitalocean.KubernetesCluster resource.

    Here's a Pulumi program in TypeScript that creates a Kubernetes cluster:

    ```typescript
    import * as digitalocean from "@pulumi/digitalocean";
    import * as k8s from "@pulumi/kubernetes";

    // Create a DigitalOcean Kubernetes cluster
    const cluster = new digitalocean.KubernetesCluster("do-cluster", {
        region: "nyc1", // New York datacenter (change to your preferred region)
        version: "1.21.5-do.0", // Use a Kubernetes version currently supported by DOKS
        nodePool: {
            name: "worker-pool",
            size: "s-2vcpu-2gb", // For real GPU workloads, choose a GPU-capable node size
            nodeCount: 2, // Number of worker nodes
            tags: ["gpu-operator"], // Optional tags
        },
    });

    // Export the kubeconfig
    export const kubeconfig = cluster.kubeConfigs[0].rawConfig;

    // Rest of the program...
    ```

    Here we create a DOKS cluster with a pinned Kubernetes version and an example node size. Check DigitalOcean's documentation for the currently supported versions and for node sizes that actually provide GPUs before running GPU workloads.

    Step 2: Install the Helm CLI

    Helm CLI itself is usually installed on your local machine or CI/CD systems, and it's not handled directly inside Pulumi. However, for our use case, we'll assume that Helm is installed and available in your environment.
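
    As a side note, Pulumi can also manage Helm releases without a local `helm` binary: the k8s.helm.v3.Release resource drives the Helm SDK directly, whereas the Chart resource shells out to `helm` to fetch and template the chart. Here is a minimal sketch of that alternative, reusing the chart coordinates (name, version, repo) assumed in Step 3 below:

    ```typescript
    import * as k8s from "@pulumi/kubernetes";

    // Alternative to helm.v3.Chart: helm.v3.Release uses the embedded Helm SDK,
    // so no local `helm` CLI is required. Chart coordinates are illustrative.
    const gpuOperatorRelease = new k8s.helm.v3.Release("gpu-operator", {
        chart: "gpu-operator",
        version: "1.8.2",
        namespace: "gpu-operator",
        createNamespace: true, // Release can create the target namespace for you
        repositoryOpts: {
            repo: "https://nvidia.github.io/gpu-operator",
        },
    });
    ```

    Note that Release supports createNamespace, which the Chart resource does not; if you stick with Chart, you need to create the namespace yourself.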

    Step 3: Deploy the GPU Operator using Helm

    In this step, we'll use Pulumi's k8s.helm.v3.Chart resource to deploy the GPU Operator Helm chart.

    Here's how you add the GPU Operator chart deployment to your Pulumi program:

    ```typescript
    // Use the Pulumi Kubernetes provider to interact with the DOKS cluster
    const provider = new k8s.Provider("do-k8s", {
        kubeconfig: kubeconfig,
    });

    // Deploy the GPU Operator using its Helm chart
    const gpuOperatorChart = new k8s.helm.v3.Chart("gpu-operator", {
        chart: "gpu-operator",
        version: "1.8.2", // Pin the chart version for repeatable deployments
        namespace: "gpu-operator", // Deploy into the `gpu-operator` namespace
        fetchOpts: {
            repo: "https://nvidia.github.io/gpu-operator", // NVIDIA's GPU Operator Helm repository
        },
    }, { provider: provider });

    // Rest of the program...
    ```

    In this Helm chart deployment:

    • We specify the gpu-operator as the chart we want to deploy.
    • We specify the version of the chart to ensure repeatability.
    • We use the official NVIDIA GPU Operator repository.
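
    One caveat: the Chart resource renders manifests into the target namespace but does not create the namespace itself, so `gpu-operator` must exist before the chart is applied. A minimal sketch, assuming the `provider` defined above:

    ```typescript
    import * as k8s from "@pulumi/kubernetes";

    // The Chart resource does not create its target namespace; create it explicitly.
    // `provider` is assumed to be the k8s.Provider defined earlier in the program.
    const gpuOperatorNs = new k8s.core.v1.Namespace("gpu-operator-ns", {
        metadata: { name: "gpu-operator" },
    }, { provider: provider });
    ```

    Pass `{ provider: provider, dependsOn: [gpuOperatorNs] }` as the Chart's resource options so the namespace is created before the chart's resources.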

    Combine these steps into a single Pulumi TypeScript program:

    ```typescript
    import * as digitalocean from "@pulumi/digitalocean";
    import * as k8s from "@pulumi/kubernetes";

    // Create the Kubernetes cluster on DigitalOcean
    const cluster = new digitalocean.KubernetesCluster("do-cluster", {
        region: "nyc1",
        version: "1.21.5-do.0", // Use a version currently supported by DOKS
        nodePool: {
            name: "worker-pool",
            size: "s-2vcpu-2gb", // Use a GPU-capable node size for real GPU workloads
            nodeCount: 2,
            tags: ["gpu-operator"],
        },
    });

    // Export the kubeconfig
    export const kubeconfig = cluster.kubeConfigs[0].rawConfig;

    // Create a Kubernetes provider instance using the cluster's kubeconfig
    const provider = new k8s.Provider("do-k8s", {
        kubeconfig: cluster.kubeConfigs[0].rawConfig,
    });

    // Deploy the GPU Operator using its Helm chart
    const gpuOperatorChart = new k8s.helm.v3.Chart("gpu-operator", {
        chart: "gpu-operator",
        version: "1.8.2",
        namespace: "gpu-operator",
        fetchOpts: {
            repo: "https://nvidia.github.io/gpu-operator",
        },
    }, { provider: provider });
    ```

    Upon running this program with Pulumi, you will have a DigitalOcean Kubernetes cluster with the GPU Operator Helm chart installed. You can then schedule GPU workloads on the cluster.
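
    Once the operator is running, workloads request GPUs through the nvidia.com/gpu resource limit, which the operator's device plugin advertises on GPU nodes. A minimal sketch of such a Pod, with an illustrative image name and the `provider` assumed from the program above:

    ```typescript
    import * as k8s from "@pulumi/kubernetes";

    // Example workload requesting one GPU via the nvidia.com/gpu resource limit.
    // The image is illustrative; `provider` is the k8s.Provider defined earlier.
    const gpuTest = new k8s.core.v1.Pod("cuda-vectoradd", {
        spec: {
            restartPolicy: "OnFailure",
            containers: [{
                name: "cuda-vectoradd",
                image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04",
                resources: {
                    limits: { "nvidia.com/gpu": "1" },
                },
            }],
        },
    }, { provider: provider });
    ```

    If the Pod stays Pending, check that your node pool actually uses a GPU-capable size and that the operator's pods in the `gpu-operator` namespace are healthy.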

    Ensure that you have the correct prerequisites set up, such as Pulumi and Helm installed on your system and proper permissions to create resources on DigitalOcean and deploy Helm charts.