1. GPU-enabled Kubernetes Clusters for LLM Training Workloads

    To set up a GPU-enabled Kubernetes cluster suitable for large language model (LLM) training workloads, you will need to create a Kubernetes cluster and then configure it with a node pool of GPU-enabled instances. For this example, I'm going to show you how to do this using Google Cloud Platform (GCP) and Pulumi.

    Google Cloud offers GPU-enabled virtual machines that can be used as nodes in a Kubernetes cluster managed by Google Kubernetes Engine (GKE). The pulumi_gcp Pulumi package provides resources to create and manage Kubernetes clusters in GCP.

    Here's how to create a GPU-enabled GKE cluster using Pulumi with Python:

    1. Google Kubernetes Engine (GKE) Cluster: You will start by creating a Kubernetes cluster resource, defining the basic parameters such as the location, initial node count, and Kubernetes version.

    2. GKE Node Pool: You will create a separate node pool whose machine configuration attaches NVIDIA Tesla GPUs suitable for your LLM training workloads.

    3. Pulumi Exports: At the end of the Pulumi program, you will export key information, such as the cluster name and a kubeconfig, which you will need to interact with the cluster once the deployment is complete.

    The following Pulumi Python program assumes you have already set up the gcloud CLI with the appropriate authentication and project configuration. It also assumes you have installed the Pulumi CLI and the Pulumi Python SDKs used below (pulumi and pulumi_gcp).

    Before you begin, you will need to enable the Kubernetes Engine API (container.googleapis.com) and the Compute Engine API (compute.googleapis.com) in the Google Cloud Console.

    import pulumi
    import pulumi_gcp as gcp

    # Cluster configuration variables
    project_id = "your-gcp-project-id"      # Google Cloud project ID
    zone = "us-west1-b"                     # Google Cloud zone
    cluster_name = "llm-gpu-cluster"
    kubernetes_version = "1.20.9-gke.1001"  # desired Kubernetes version (use a version currently supported by GKE)
    node_pool_name = "gpu-node-pool"
    gpu_type = "nvidia-tesla-v100"          # specify the GPU type for the cluster
    gpu_count_per_node = 1                  # GPUs per node

    # Create a GKE cluster
    cluster = gcp.container.Cluster(cluster_name,
        initial_node_count=1,  # one node in the default node pool (change or remove the default pool if needed)
        min_master_version=kubernetes_version,
        location=zone,
        project=project_id,
    )

    # Create a GKE node pool with GPUs
    gpu_node_pool = gcp.container.NodePool(node_pool_name,
        cluster=cluster.name,
        location=cluster.location,
        node_count=1,  # number of nodes in the GPU node pool
        node_config=gcp.container.NodePoolNodeConfigArgs(
            preemptible=False,
            machine_type="n1-standard-4",  # specify the machine type
            oauth_scopes=[
                "https://www.googleapis.com/auth/compute",
                "https://www.googleapis.com/auth/devstorage.read_only",
                "https://www.googleapis.com/auth/logging.write",
                "https://www.googleapis.com/auth/monitoring",
            ],
            guest_accelerators=[gcp.container.NodePoolNodeConfigGuestAcceleratorArgs(
                type=gpu_type,
                count=gpu_count_per_node,
            )],
            metadata={"disable-legacy-endpoints": "true"},
            labels={"llm-node": "true"},
            taints=[
                gcp.container.NodePoolNodeConfigTaintArgs(
                    key="llmworkload",
                    value="true",
                    effect="NO_SCHEDULE",
                )
            ],
        ),
        autoscaling=gcp.container.NodePoolAutoscalingArgs(
            min_node_count=1,
            max_node_count=4,  # maximum number of nodes for autoscaling
        ),
        management=gcp.container.NodePoolManagementArgs(
            auto_repair=True,
            auto_upgrade=True,
        ),
        project=project_id,
    )

    # Export the cluster name and a kubeconfig for accessing the cluster
    pulumi.export('cluster_name', cluster.name)

    kubeconfig = pulumi.Output.all(cluster.name, cluster.endpoint, cluster.master_auth).apply(lambda args: '''
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: {1}
        server: https://{0}
      name: {2}
    contexts:
    - context:
        cluster: {2}
        user: {2}
      name: {2}
    current-context: {2}
    kind: Config
    preferences: {{}}
    users:
    - name: {2}
      user:
        auth-provider:
          config:
            cmd-args: config config-helper --format=json
            cmd-path: gcloud
            expiry-key: '{{.credential.token_expiry}}'
            token-key: '{{.credential.access_token}}'
          name: gcp
    '''.format(args[1], args[2]['cluster_ca_certificate'], args[0]))

    pulumi.export('kubeconfig', kubeconfig)

    Replace 'your-gcp-project-id' with your actual Google Cloud project ID.
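
    If you prefer not to hard-code these values, you can read them from Pulumi stack configuration instead. Here is a minimal, optional sketch; gcp:project and gcp:zone are the standard provider configuration keys, and the fallback zone is only an example.

    import pulumi

    # Optional: read the project and zone from Pulumi config instead of hard-coding them,
    # e.g. `pulumi config set gcp:project your-gcp-project-id` and
    # `pulumi config set gcp:zone us-west1-b`.
    gcp_config = pulumi.Config("gcp")
    project_id = gcp_config.require("project")
    zone = gcp_config.get("zone") or "us-west1-b"  # fall back to an example zone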

    This program will:

    • Create a GKE cluster.
    • Add a GPU-enabled node pool to the cluster.
    • Configure the required OAuth scopes and machine types.
    • Set labels and taints so that only LLM workloads that tolerate the taint are scheduled onto these nodes (see the workload sketch at the end of this section).
    • Enable autoscaling for the node pool to add or remove nodes based on the workload.
    • Export the cluster name and generate a kubeconfig file that can be used to interact with the Kubernetes API.

    After running pulumi up with this program, you will have a Kubernetes cluster with a GPU-enabled node pool that can be used to train LLMs.
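
    As a quick illustration of how a training workload would target the GPU node pool, here is a minimal, hypothetical sketch using the pulumi_kubernetes provider (not part of the program above). It defines a Job that tolerates the llmworkload taint, selects the llm-node label, and requests one GPU; the image name is a placeholder. Note that on GKE you may also need to install the NVIDIA device drivers (for example via Google's driver-installer DaemonSet) before pods can consume nvidia.com/gpu.

    import pulumi
    import pulumi_kubernetes as k8s

    # Hypothetical: build a Kubernetes provider from the kubeconfig exported above.
    k8s_provider = k8s.Provider("gke-k8s", kubeconfig=kubeconfig)

    # A minimal training Job that is scheduled only onto the GPU node pool.
    training_job = k8s.batch.v1.Job(
        "llm-training-job",
        spec=k8s.batch.v1.JobSpecArgs(
            backoff_limit=0,
            template=k8s.core.v1.PodTemplateSpecArgs(
                spec=k8s.core.v1.PodSpecArgs(
                    restart_policy="Never",
                    # Match the label set on the GPU node pool.
                    node_selector={"llm-node": "true"},
                    # Tolerate the taint so the pod can land on the GPU nodes.
                    tolerations=[k8s.core.v1.TolerationArgs(
                        key="llmworkload",
                        operator="Equal",
                        value="true",
                        effect="NoSchedule",
                    )],
                    containers=[k8s.core.v1.ContainerArgs(
                        name="trainer",
                        image="your-training-image:latest",  # placeholder image
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            # Request one GPU; must not exceed gpu_count_per_node.
                            limits={"nvidia.com/gpu": "1"},
                        ),
                    )],
                ),
            ),
        ),
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )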