Deploy the gpu-operator Helm chart on OpenShift

Question

Pulumi · Accepted Answer

To deploy the GPU Operator Helm chart on OpenShift using Pulumi, we use the Pulumi Kubernetes provider. The GPU Operator automates the management of the NVIDIA software components needed to provision GPUs. It deploys the NVIDIA driver, the NVIDIA container runtime, device plugins, and monitoring services across your cluster.

Here is a step-by-step explanation of how to write a Pulumi program to achieve this:

1. **Set up Pulumi Kubernetes Provider**: We'll start by setting up the Pulumi Kubernetes provider to interact with your OpenShift cluster. You'll need to have `kubeconfig` configured for the target OpenShift cluster so that Pulumi can communicate with it.

2. **Install the GPU Operator using Helm**: The `kubernetes.helm.v3.Chart` resource installs Helm charts. Here it installs the GPU Operator chart, so you specify the chart name, the repository, and any values the GPU Operator configuration requires. By default, Pulumi waits for every resource in the chart to become ready before it considers the chart installed. Charts that ship CRDs together with resources that depend on those CRDs can be slow to settle, so the program sets `skipAwait: true` to tell Pulumi not to wait for readiness. Note that `skipAwait` only skips the readiness checks; it does not change the order in which resources are created.

Below is a TypeScript program that deploys the GPU Operator to an OpenShift cluster:

```typescript
import * as k8s from "@pulumi/kubernetes";

// Create a Kubernetes provider for the OpenShift cluster.
// KUBECONFIG should point to the cluster's kubeconfig; if it is unset,
// the provider falls back to its default kubeconfig lookup.
const k8sProvider = new k8s.Provider("openshiftK8s", {
    kubeconfig: process.env.KUBECONFIG,
});

// The Chart resource does not create the target namespace, so create it explicitly.
// Remove this resource (and use a plain string below) if the namespace already exists.
const gpuOperatorNamespace = new k8s.core.v1.Namespace("gpu-operator", {
    metadata: { name: "gpu-operator" },
}, { provider: k8sProvider });

// Define the GPU Operator Helm chart.
const gpuOperatorChart = new k8s.helm.v3.Chart("gpu-operator", {
    chart: "gpu-operator",
    version: "1.9.0", // Chart version to deploy; check compatibility with your OpenShift release.
    namespace: gpuOperatorNamespace.metadata.name, // Namespace the chart's resources are placed in.
    fetchOpts: {
        // Repository URL of the GPU Operator Helm chart. NVIDIA's current documentation lists
        // https://helm.ngc.nvidia.com/nvidia; confirm which repository hosts the version you need.
        repo: "https://nvidia.github.io/gpu-operator",
    },
    values: {
        // Include any necessary configuration values here. For example:
        operator: {
            defaultRuntime: "crio",
        },
    },
    skipAwait: true, // Do not wait for the chart's resources to report ready.
}, { provider: k8sProvider });

// Export the name of the namespace where the GPU Operator is installed.
// (The Chart resource itself has no `namespace` output property.)
export const namespaceName = gpuOperatorNamespace.metadata.name;
```

This program creates the resources needed to run the GPU Operator in the OpenShift cluster using Helm. Before running it, make sure the OpenShift CLI (`oc`) is installed, that you are logged in to the target cluster, and that your user has permission to create cluster-scoped resources such as CRDs and cluster roles.
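Because the provider takes its connection details from a kubeconfig, it can help to resolve that path explicitly before the deployment starts. The following is a minimal sketch and not part of the original program: `resolveKubeconfigPath` is a hypothetical helper, and it assumes the common convention that `KUBECONFIG` may list several paths separated by `:` (`;` on Windows), with `~/.kube/config` as the fallback location.

```typescript
// Hypothetical helper: choose the kubeconfig path the provider should use.
// Assumption: KUBECONFIG may list several paths; this sketch takes the first one.
function resolveKubeconfigPath(
    env: Record<string, string | undefined>,
    homeDir: string,
    delimiter: string = ":",
): string {
    const raw = (env["KUBECONFIG"] ?? "").trim();
    const candidates = raw
        .split(delimiter)
        .map((p) => p.trim())
        .filter((p) => p.length > 0);
    if (candidates.length > 0) {
        return candidates[0];
    }
    // Fall back to the conventional default location under the home directory.
    return `${homeDir.replace(/\/+$/, "")}/.kube/config`;
}

// Example with explicit inputs so the behaviour is easy to follow.
const kubeconfigPath = resolveKubeconfigPath(
    { KUBECONFIG: "/tmp/openshift.kubeconfig" },
    "/home/dev",
);
console.log(kubeconfigPath);
```

In a real program the inputs would likely be `process.env` and `os.homedir()`, and the result could be passed as the `kubeconfig` argument of `k8s.Provider`, assuming your provider version accepts a file path there as well as file contents.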

Remember, the values in the `Chart` resource can be adjusted based on your requirements. This includes the version of the GPU Operator, which should be chosen based on compatibility with your Kubernetes/OpenShift version and your workloads. Before running the program, check the Helm chart's values file for additional options you might want to configure.
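One way to keep those adjustable settings in one place is to build the chart arguments from a small options object. The sketch below is illustrative only: `GpuOperatorOptions` and `buildGpuOperatorChartArgs` are hypothetical names, `my-gpu-namespace` is a placeholder, and the defaults simply mirror the values used in the program above. The function returns a plain object with the same fields that the program passes to `k8s.helm.v3.Chart`.

```typescript
// Hypothetical options type; every field is optional and falls back to a default.
interface GpuOperatorOptions {
    version?: string;
    namespace?: string;
    repo?: string;
    values?: Record<string, unknown>;
}

// Defaults mirror the values used in the program above; adjust them for your cluster.
const defaultValues: Record<string, unknown> = {
    operator: { defaultRuntime: "crio" },
};

function buildGpuOperatorChartArgs(opts: GpuOperatorOptions = {}) {
    return {
        chart: "gpu-operator",
        version: opts.version ?? "1.9.0",
        namespace: opts.namespace ?? "gpu-operator",
        fetchOpts: { repo: opts.repo ?? "https://nvidia.github.io/gpu-operator" },
        // Shallow merge: a top-level key in opts.values replaces the default for that key.
        values: { ...defaultValues, ...(opts.values ?? {}) },
        skipAwait: true,
    };
}

// Example: override only the namespace (placeholder name) and keep the other defaults.
const chartArgs = buildGpuOperatorChartArgs({ namespace: "my-gpu-namespace" });
console.log(JSON.stringify(chartArgs, null, 2));
```

The merge is deliberately shallow, so overriding `operator` replaces the whole `operator` block rather than combining nested keys; a deep merge would be a reasonable alternative if you override many nested values.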

To execute this Pulumi program:

1. Save the code in your Pulumi TypeScript project's entry point, typically `index.ts`, and install the provider package with `npm install @pulumi/kubernetes`.
2. Run `pulumi up` to preview and deploy the changes. Pulumi shows the planned resources first and applies them once you confirm.

Please ensure that you have Pulumi installed and configured for TypeScript, and that you're authenticated against your OpenShift cluster. For more details on how to use Pulumi with TypeScript, you can visit the [Pulumi TypeScript documentation](https://www.pulumi.com/docs/intro/languages/typescript/).