Deploy the nvidia helm chart on Google Kubernetes Engine (GKE)

Question

Pulumi · Accepted Answer

To deploy an Nvidia Helm chart on Google Kubernetes Engine (GKE) using Pulumi, you will need to perform a few high-level steps:

1. **Create a GKE Cluster**: Define and configure a Kubernetes cluster in Google Cloud.
2. **Set Up a Kubernetes Provider**: Build a kubeconfig for the new cluster and pass it to Pulumi's Kubernetes provider. Helm 3 has no server-side component, so Tiller (used only by Helm 2) does not need to be installed.
3. **Point at the Nvidia Helm Repository**: Supply the Nvidia Helm repository URL so that the desired Nvidia chart can be located.
4. **Deploy the Nvidia Helm Chart**: Define and deploy the Nvidia Helm chart to your GKE cluster.

For the purpose of this guide, I'll demonstrate how you can accomplish these steps with Pulumi in TypeScript.

First, you need to create a new GKE cluster. In Pulumi, you use resource classes from the `@pulumi/gcp` package to create cloud resources. The `Cluster` resource class represents a GKE cluster and allows you to configure properties such as the machine type, node count, and network settings.

Once the GKE cluster is created, you typically use the Pulumi Kubernetes provider to interact with the cluster. This provider uses the cluster's kubeconfig file to authenticate with the cluster.
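To make the kubeconfig step more concrete, here is a minimal sketch of a pure helper that assembles a kubeconfig string from plain values. The `buildKubeconfig` function and the `ClusterInfo` interface are hypothetical names introduced for illustration, and the sketch assumes the `gke-gcloud-auth-plugin` binary is installed wherever the kubeconfig is used.

```typescript
// Hypothetical input shape: plain strings describing an existing GKE cluster.
interface ClusterInfo {
    name: string;          // cluster name
    endpoint: string;      // control plane address, without the https:// scheme
    caCertificate: string; // base64-encoded cluster CA certificate
    project: string;       // GCP project ID
    zone: string;          // GCP zone the cluster runs in
}

// Hypothetical helper: returns kubeconfig YAML for the given cluster.
// Assumption: authentication goes through the gke-gcloud-auth-plugin exec plugin.
function buildKubeconfig(info: ClusterInfo): string {
    // One identifier is reused for the cluster, user, and context entries.
    const context = `${info.project}_${info.zone}_${info.name}`;
    const lines = [
        "apiVersion: v1",
        "kind: Config",
        "clusters:",
        `- name: ${context}`,
        "  cluster:",
        `    certificate-authority-data: ${info.caCertificate}`,
        `    server: https://${info.endpoint}`,
        "contexts:",
        `- name: ${context}`,
        "  context:",
        `    cluster: ${context}`,
        `    user: ${context}`,
        `current-context: ${context}`,
        "users:",
        `- name: ${context}`,
        "  user:",
        "    exec:",
        "      apiVersion: client.authentication.k8s.io/v1beta1",
        "      command: gke-gcloud-auth-plugin",
        "      provideClusterInfo: true",
    ];
    return lines.join("\n") + "\n";
}
```

In a Pulumi program, a helper like this would typically be called inside `pulumi.all([...]).apply(...)`, once the cluster's name, endpoint, and CA certificate are known.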

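Next, you'll set up Helm support. Pulumi's Kubernetes provider has first-class support for Helm: the `k8s.helm.v3.Chart` resource renders the chart and applies the resulting manifests through the provider, so nothing extra has to be installed in the cluster.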

Below you can find an illustrative Pulumi TypeScript program that performs these steps. Please note that this assumes you have Pulumi and GCP set up and configured appropriately to deploy resources.

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
import * as k8s from "@pulumi/kubernetes";

// Step 1: Create a GKE cluster
const cluster = new gcp.container.Cluster("demo-cluster", {
    initialNodeCount: 2,
    nodeConfig: {
        preemptible: true,
        machineType: "n1-standard-1",
        oauthScopes: [
            "https://www.googleapis.com/auth/compute",
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ],
    },
});

// Build and export a kubeconfig for the cluster from its name, endpoint, and CA certificate
export const kubeconfig = pulumi.
    all([cluster.name, cluster.endpoint, cluster.masterAuth]).
    apply(([name, endpoint, masterAuth]) => {
        const context = `${gcp.config.project}_${gcp.config.zone}_${name}`;
        return `apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${masterAuth.clusterCaCertificate}
    server: https://${endpoint}
  name: ${context}
contexts:
- context:
    cluster: ${context}
    user: ${context}
  name: ${context}
current-context: ${context}
kind: Config
preferences: {}
users:
- name: ${context}
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: gke-gcloud-auth-plugin
      installHint: Install gke-gcloud-auth-plugin for use with kubectl
      provideClusterInfo: true
`;
});

// Step 2: Setup a Kubernetes provider instance using the kubeconfig from the cluster
const clusterProvider = new k8s.Provider("provider", {
    kubeconfig: kubeconfig,
});

// Step 3: No separate resource is needed for the Helm repository.
// The Chart resource below takes the repository URL through fetchOpts.repo.

// Step 4: Deploy the Nvidia Helm chart
const nvidiaHelmChart = new k8s.helm.v3.Chart("nvidia-chart", {
    chart: "nvidia", // placeholder: replace with the actual chart name (for example "gpu-operator")
    version: "version_number", // replace with the desired Nvidia Helm chart version
    fetchOpts: {
        repo: "https://helm.ngc.nvidia.com/nvidia",
    },
}, { provider: clusterProvider });

// Export the cluster name (the Chart resource has no `metadata` property to export)
export const clusterName = cluster.name;
```

In the above program:

- We create a new GKE cluster with two preemptible VMs of type `n1-standard-1`, which cost less but can be reclaimed by GCP at any time. These nodes have no GPUs attached, so GPU workloads would additionally need GPU-backed nodes.
- The kubeconfig needed to interact with the cluster is generated and exported.
- A new instance of a Kubernetes provider is created using the exported kubeconfig, which allows Pulumi to interact with the GKE cluster.
- The Nvidia Helm repository does not need to be added as a separate resource: Pulumi's Helm support takes the repository URL through the chart's `fetchOpts.repo` setting. A `k8s.yaml.ConfigFile` expects Kubernetes manifests, so pointing one at a Helm repository URL would not register the repository.
- An Nvidia Helm chart is then deployed using Pulumi's Helm `Chart` resource. Replace `"version_number"` with the version of the Nvidia Helm chart you want to deploy.

Replace the placeholders (such as `"version_number"`) with values that suit your deployment. Also review the resource requirements of the Nvidia chart and its compatibility with your cluster's configuration, since GPU-enabled charts may need specific Kubernetes configuration or hardware to work correctly.
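Since GPU-enabled charts generally need GPU hardware, one option is a dedicated GPU node pool. The sketch below builds a node configuration object with the `machineType`, `guestAccelerators`, and `oauthScopes` fields that `@pulumi/gcp` node configurations accept. The `gpuNodeConfig` helper is a hypothetical name, and the `nvidia-tesla-t4` accelerator and `n1-standard-4` machine type are assumptions; check what is available in your zone and quota.

```typescript
// Hypothetical shape mirroring a subset of a GKE node configuration.
interface GpuNodeConfig {
    machineType: string;
    guestAccelerators: { type: string; count: number }[];
    oauthScopes: string[];
}

// Hypothetical helper: returns a node configuration that requests GPUs on each node.
// Assumption: the default machine type is compatible with the chosen accelerator.
function gpuNodeConfig(
    acceleratorType: string,
    gpusPerNode: number,
    machineType: string = "n1-standard-4",
): GpuNodeConfig {
    // Reject zero, negative, or fractional GPU counts before anything is deployed.
    if (gpusPerNode < 1 || Math.floor(gpusPerNode) !== gpusPerNode) {
        throw new Error("gpusPerNode must be a positive integer");
    }
    return {
        machineType,
        guestAccelerators: [{ type: acceleratorType, count: gpusPerNode }],
        oauthScopes: ["https://www.googleapis.com/auth/cloud-platform"],
    };
}

// Example: one T4 GPU per node (accelerator name is an assumption).
const exampleGpuConfig = gpuNodeConfig("nvidia-tesla-t4", 1);
```

The result could then be passed as `nodeConfig` to a `gcp.container.NodePool` whose `cluster` is set to `cluster.name`, alongside a `nodeCount` of your choosing.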

After configuring the above code with your specific details, you can deploy it using the following Pulumi CLI commands:

```bash
pulumi up # Deploy the stack
pulumi stack output # Show the stack outputs
```

This program provides a very basic example of deploying a Helm chart. Depending on your specific needs and configurations, you might need to set additional property values, handle dependencies, or configure other services.