Auto-Scaling AI Workloads on Google Kubernetes Engine

Pulumi · Accepted Answer

To auto-scale AI workloads on Google Kubernetes Engine (GKE), you typically create a Kubernetes cluster whose node pools have autoscaling enabled, so that GKE adds or removes nodes as the resource demands of your workload change.

The relevant resources in the Pulumi Registry include GKE clusters and node pools, along with their autoscaling settings. AI workloads often also need GPU or TPU resources, which can be critical for model training and inference.

Here’s a high-level overview of the steps we will follow:

1. **Create a GKE Cluster**: A GKE cluster consists of a control plane and the compute nodes it manages on Google Cloud.
2. **Configure Node Pools**: A node pool is a group of nodes in the cluster that share the same configuration; the AI workload runs on these nodes.
3. **Set up Auto-Scaling**: GKE provides a way to automatically resize the number of nodes in a given node pool, based on the demands of your workloads.
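
The resizing described in step 3 is bounded by a minimum and maximum node count set on the node pool. As a rough illustration only, the sketch below models how many nodes a pool would need for a given number of requested GPUs. It is a simplified model, not GKE's actual cluster autoscaler algorithm, and the function name and parameters are hypothetical.

```python
import math


def target_node_count(requested_gpus: int, gpus_per_node: int,
                      min_nodes: int, max_nodes: int) -> int:
    """Return the node count needed for the requested GPUs, clamped to the pool bounds.

    Hypothetical helper for illustration; the real autoscaler also considers
    CPU, memory, scheduling constraints, and scale-down delays.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    # Round up: a partially used node still counts as a whole node.
    needed = math.ceil(requested_gpus / gpus_per_node)
    # Never go below the pool minimum or above the pool maximum.
    return max(min_nodes, min(needed, max_nodes))


# With 2 GPUs per node and a pool bounded to 1..5 nodes:
print(target_node_count(0, 2, 1, 5))   # 1 (idle pool stays at the minimum)
print(target_node_count(7, 2, 1, 5))   # 4
print(target_node_count(40, 2, 1, 5))  # 5 (capped at the maximum)
```

The cap matters for cost control: once the pool reaches its maximum, further pods stay pending rather than triggering more GPU nodes.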

Now let's write a Pulumi program in Python that accomplishes these tasks. The program:

- Creates a GKE cluster.
- Configures a node pool with auto-scaling enabled and attaches specialized hardware such as GPUs for the AI workloads.
- Optionally applies other settings, such as enabling network policies or pinning the Kubernetes version.

```python
import pulumi
import pulumi_gcp as gcp

# Initialize GCP project and location configuration.
config = pulumi.Config()
project = config.require("project")
zone = config.require("zone")

# Create a GKE cluster.
cluster = gcp.container.Cluster("ai-cluster",
    initial_node_count=1,
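    # No Kubernetes version is pinned here, so GKE uses its default version.
    # Fuzzy values such as "latest" can cause spurious diffs on later updates;
    # to pin a version, set min_master_version to an explicit version string.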
    node_config={
        "oauth_scopes": [
            "https://www.googleapis.com/auth/compute",
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring"
        ],
        "machine_type": "n1-standard-1",
        "labels": {"workload": "ai-processing"},
        "metadata": {"disable-legacy-endpoints": "true"},
    },
    # Node autoscaling is configured on the node pool below;
    # gcp.container.Cluster does not accept an "autoscaling" argument.
    description="Auto-scaling AI workloads on GKE",
    # Enabling network policy on cluster allows you to restrict and control network access to and from container Pods.
    network_policy={"enabled": True},
    resource_labels={"usage": "ai-workload-scaling"},
    location=zone,
    project=project)

# Configure a node pool with auto-scaling and optional GPU/TPU resources for AI workloads.
ai_node_pool = gcp.container.NodePool("ai-node-pool",
    cluster=cluster.name,
    initial_node_count=1,
    autoscaling={
        "min_node_count": 1,
        "max_node_count": 5,
    },
    management={
        "auto_repair": True,
        "auto_upgrade": True
    },
    node_config={
        # Sample machine type and GPU configuration; adjust both to your workload requirements and budget.
        "machine_type": "n1-standard-4",
        "oauth_scopes": [
            "https://www.googleapis.com/auth/compute",
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring"
        ],
        # Accelerators such as NVIDIA GPUs can significantly speed up AI-related tasks.
        # guest_accelerators takes a list of entries, each with a GPU type and a count per node.
        "guest_accelerators": [{
            "type": "nvidia-tesla-v100",
            "count": 2,
        }],
        "labels": {"usage": "gpu-accelerated-workload"},
        "metadata": {"disable-legacy-endpoints": "true"},
    },
    location=zone,
    project=project)

pulumi.export('cluster_name', cluster.name)
pulumi.export('ai_node_pool_name', ai_node_pool.name)
```

In the code above, we're doing the following:

- Creating a GKE cluster (`ai-cluster`) with a small default node pool, OAuth scopes for the nodes, and network policy enforcement enabled.
- Creating a node pool (`ai-node-pool`) for the cluster that auto-scales between 1 and 5 nodes. The sample uses `n1-standard-4` machines and attaches two Nvidia Tesla V100 GPUs per node, a GPU type commonly used for compute-heavy work such as AI training jobs.
- Enabling auto-repair and auto-upgrade on the node pool so that its nodes stay healthy and up to date.

Please note that the right machine types and accelerators depend on your workload and on what is available in your Google Cloud zone, since GPU availability and quota vary by location. Adjust the `machine_type` and the accelerator type to match your AI workload and performance needs.
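
The node pool only scales up when pods that need its resources cannot be scheduled, so a workload has to request GPUs explicitly. As a minimal sketch, the helper below builds a plain Kubernetes Pod manifest as a Python dictionary that requests GPUs through the `nvidia.com/gpu` resource and targets nodes by GKE's `cloud.google.com/gke-accelerator` label. The function name, pod name, and image are hypothetical placeholders, and GPU drivers must also be present on the nodes; how they get installed depends on your GKE version and node configuration.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpu_count: int,
                     accelerator_type: str = "nvidia-tesla-v100") -> dict:
    """Build a Pod manifest that requests GPUs (hypothetical helper for illustration)."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            # Schedule only onto nodes that carry the matching GPU type label.
            "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator_type},
            "containers": [{
                "name": name,
                "image": image,
                # GPUs are requested through resource limits.
                "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
            }],
        },
    }


# Placeholder name and image; replace them with your own training container.
manifest = gpu_pod_manifest("train-job", "example.com/trainer:latest", 2)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, or the same dictionary structure can be adapted to whatever deployment tooling you already use.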

Before deploying, set the two required configuration values with `pulumi config set project <your-project-id>` and `pulumi config set zone <your-zone>`, since the program reads both with `config.require`. Then run `pulumi up` in the directory containing this program, assuming you have Pulumi installed and Google Cloud credentials configured. The Pulumi CLI uses those credentials to create the resources and prints the exported values, such as the cluster and node pool names.