1. High-Performance GPU Workloads with DigitalOcean Node Pools


    When deploying high-performance GPU workloads, it's important to choose infrastructure that actually offers GPU capabilities. DigitalOcean is one cloud provider that does: its GPU-optimized droplets are suitable for machine learning, scientific computing, video encoding, and other server-side GPU tasks.

    In the Pulumi infrastructure as code framework, we can define and provision DigitalOcean resources using the pulumi_digitalocean Python package. Below, we'll create a Pulumi program that sets up a DigitalOcean Kubernetes Cluster with a Node Pool that includes GPU-optimized droplets. This setup will allow you to run high-performance GPU workloads.

    The key resources we'll be using are KubernetesCluster to create the Kubernetes cluster itself and KubernetesNodePool to define a pool of GPU-optimized droplets within the cluster:

    1. KubernetesCluster: Represents a managed Kubernetes cluster where the container workloads can be deployed and orchestrated.
    2. KubernetesNodePool: Represents a group of droplets that serve as worker nodes for the Kubernetes cluster. It can be configured with specific sizes that provide GPU capabilities.
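
    As an alternative to defining the pool inline on the cluster, a KubernetesNodePool can also be declared as a standalone resource attached to an existing cluster. A minimal sketch (the `k8s_cluster` variable, droplet size, and node count here are illustrative assumptions):

    ```python
    import pulumi_digitalocean as digitalocean

    # Hypothetical standalone node pool attached to an existing cluster.
    # Assumes a KubernetesCluster resource named k8s_cluster is defined elsewhere.
    gpu_pool = digitalocean.KubernetesNodePool(
        "extra-gpu-pool",
        cluster_id=k8s_cluster.id,  # ID of the existing KubernetesCluster
        size="g-32vcpu-128gb",      # Verify GPU-capable sizes available to your account
        node_count=2,               # Number of worker droplets in this pool
    )
    ```

    Declaring pools separately makes it easy to add, resize, or remove a GPU pool later without touching the cluster resource itself.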

    Here's the Pulumi program in Python that will accomplish this:

    import pulumi
    import pulumi_digitalocean as digitalocean

    # Create a new DigitalOcean Kubernetes cluster with a GPU node pool
    k8s_cluster = digitalocean.KubernetesCluster(
        "gpu-k8s-cluster",
        region="nyc1",          # The region where the cluster will be created
        version="1.21.5-do.0",  # Specify a Kubernetes version currently offered by DigitalOcean
        # Node pool using GPU-optimized droplet sizes
        node_pool={
            "name": "gpu-node-pool",
            "size": "g-32vcpu-128gb",  # Verify this size provides the GPU capabilities you need
            "node_count": 3,           # The number of nodes (droplets) in the node pool
        },
    )

    # Export the cluster's kubeconfig and endpoint
    pulumi.export("kubeconfig", k8s_cluster.kube_configs[0].raw_config)
    pulumi.export("cluster_endpoint", k8s_cluster.endpoint)

    In the program above:

    • A Kubernetes cluster (gpu-k8s-cluster) is created in the nyc1 region with the specified Kubernetes version.
    • A node pool (gpu-node-pool) is defined with a droplet size (g-32vcpu-128gb) intended to provide the GPU capabilities required for our high-performance workloads. The pool starts with 3 nodes.
    • We export the kubeconfig, which will be used to interact with the Kubernetes cluster using kubectl or other Kubernetes management tools.
    • We also export the cluster_endpoint, which gives us the API endpoint to interact with our Kubernetes cluster directly.

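    Once the stack is up, the exported kubeconfig can be retrieved from the Pulumi CLI and handed to kubectl. For example (the filename is an arbitrary choice, and --show-secrets is needed if the provider marks the kubeconfig as a secret):

    ```shell
    # Write the exported kubeconfig to a local file
    pulumi stack output kubeconfig --show-secrets > kubeconfig.yaml

    # Point kubectl at the new cluster and check that the GPU nodes registered
    export KUBECONFIG=$PWD/kubeconfig.yaml
    kubectl get nodes
    ```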
    After you run this Pulumi program, it will output the Kubernetes configuration and endpoint that you can use to deploy GPU-enabled applications to the cluster.
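    To actually schedule a pod onto the GPU nodes, the pod spec typically requests GPU resources, which assumes the appropriate device plugin (e.g. NVIDIA's) is installed on the nodes. A hypothetical sketch using the pulumi_kubernetes package, where the image name and resource key are illustrative assumptions:

    ```python
    import pulumi_kubernetes as k8s

    # Hypothetical GPU test pod; requires the NVIDIA device plugin on the nodes.
    gpu_test = k8s.core.v1.Pod(
        "gpu-test-pod",
        spec={
            "containers": [{
                "name": "cuda-test",
                "image": "nvidia/cuda:12.2.0-base-ubuntu22.04",  # illustrative image
                "command": ["nvidia-smi"],
                "resources": {"limits": {"nvidia.com/gpu": "1"}},  # request one GPU
            }],
            "restart_policy": "Never",
        },
    )
    ```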

    Please ensure you have the DigitalOcean access token configured for Pulumi, either via environment variable DIGITALOCEAN_TOKEN or using the Pulumi configuration system, before running this program.
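    For example (the token value shown is a placeholder):

    ```shell
    # Option 1: environment variable
    export DIGITALOCEAN_TOKEN=your-token-here

    # Option 2: Pulumi configuration (stored encrypted in the stack)
    pulumi config set digitalocean:token your-token-here --secret
    ```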

    To create the infrastructure with Pulumi, save the code as __main__.py inside a Pulumi project (for example, one created with pulumi new python), and then run the Pulumi CLI:

    pulumi up

    This command will prompt you to confirm the details of the resources that will be created. Once confirmed, Pulumi will provision the resources in your DigitalOcean account.

    When you'd like to remove the resources, you can run:

    pulumi destroy

    This will de-provision the resources from your DigitalOcean account.