Kubernetes Cluster Autoscaling for AI Model Training

Pulumi · Accepted Answer

AI model training workloads place highly variable demands on compute, so a Kubernetes cluster that runs them needs a way to adapt. Autoscaling dynamically adjusts the number of pods or nodes to match the current workload, which helps control costs while keeping enough resources available for training jobs.

In Kubernetes, the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler are often used in conjunction to manage autoscaling:

1. **Horizontal Pod Autoscaler (HPA)**: Automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed CPU utilization or other selected metrics.

2. **Cluster Autoscaler**: Automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
    - There are pods that cannot be scheduled in the cluster due to insufficient resources.
    - There are nodes in the cluster that have been underutilized for an extended period and their pods can be placed on other existing nodes.
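The HPA's scaling decision can be sketched with the replica formula from the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The function below is a simplified illustration, not the controller's actual code. The function and parameter names are hypothetical, and it ignores details such as stabilization windows and pods that are not yet ready. The 0.1 tolerance mirrors the documented default, within which the HPA skips scaling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative helper, not a Kubernetes API)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 2 pods averaging 90% CPU against a 50% target: ceil(2 * 1.8) = 4 pods.
print(desired_replicas(2, 90, 50, min_replicas=2, max_replicas=5))
```

If the extra pods requested this way do not fit on the existing nodes, they stay pending, which is the first condition that triggers the Cluster Autoscaler.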

Below is a Pulumi program in Python that deploys a training workload with pod autoscaling to an existing Kubernetes cluster. It uses the `pulumi_kubernetes` package to create a `Deployment` and a `HorizontalPodAutoscaler`. CPU-based scaling relies on a metrics source such as the Kubernetes Metrics Server running in the cluster; the program does not install one.

Make sure the Pulumi CLI is installed and configured for a Kubernetes cluster where you have permission to deploy resources.

```python
import pulumi
import pulumi_kubernetes as k8s

# Define the application name.
app_name = "ai-model-training"

# Create a Kubernetes Deployment that deploys a specified number of pod replicas to run the application.
# It's assumed that you have a container image (`my-image:latest`) ready for deployment which contains your AI model training application.
app_labels = {"app": app_name}
deployment = k8s.apps.v1.Deployment(
    "ai-model-deployment",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=2,  # Starting with 2 replicas.
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(
                    name=app_name,
                    image="my-image:latest",  # Replace with your actual image.
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        requests={"cpu": "500m", "memory": "512Mi"},
                        limits={"cpu": "1000m", "memory": "1024Mi"},
                    ),
                )],
            ),
        ),
    ),
)

# Define a Horizontal Pod Autoscaler that adjusts the number of pods in the deployment based on CPU utilization.
hpa = k8s.autoscaling.v2.HorizontalPodAutoscaler(
    "ai-model-hpa",
    spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
        scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
            api_version="apps/v1",
            kind="Deployment",
            name=deployment.metadata.name,
        ),
        min_replicas=2,
        max_replicas=5,  # Autoscale up to 5 replicas.
        metrics=[k8s.autoscaling.v2.MetricSpecArgs(
            type="Resource",
            resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                name="cpu",
                target=k8s.autoscaling.v2.MetricTargetArgs(
                    type="Utilization",
                    average_utilization=50,  # Target 50% of the requested CPU.
                ),
            ),
        )],
    ),
)

# Export the name of the deployment and the HPA.
pulumi.export("deployment_name", deployment.metadata.name)
pulumi.export("hpa_name", hpa.metadata.name)
```

In the above program:
- A Kubernetes Deployment is created with two replicas to start. Replace `my-image:latest` with your actual container image that hosts your AI training application.
- Each pod's container requests 0.5 CPU (`500m`) and 512Mi of memory, and is limited to 1 CPU (`1000m`) and 1Gi (`1024Mi`) of memory.
- A Horizontal Pod Autoscaler (HPA) is created targeting the deployment.
- The HPA adjusts the number of replicas in the deployment based on CPU utilization, measured as a percentage of the CPU request. Here it targets an average of 50%, which is about 250m of CPU per pod given the 500m request.
- The minimum number of replicas is set to 2 and can scale up to a maximum of 5 replicas.
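To make the 50% target concrete, the sketch below shows how utilization relates to the `500m` CPU request in the program. It is an approximation under stated assumptions: the helper names and the per-pod usage figures are hypothetical, and every pod is assumed to have the same request, as in the Deployment above.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '1' to millicores (illustrative helper)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def average_utilization_percent(usages_millicores, request_millicores):
    """Total CPU usage across pods as a percentage of their total CPU requests."""
    total_requested = request_millicores * len(usages_millicores)
    return 100 * sum(usages_millicores) / total_requested


request_m = parse_cpu_millicores("500m")  # CPU request from the Deployment
target_percent = 50                       # average_utilization in the HPA

# Per-pod usage that corresponds to the target: 50% of 500m.
per_pod_target_m = request_m * target_percent / 100

# Hypothetical usage samples for the two starting replicas, in millicores.
usages = [400, 300]
utilization = average_utilization_percent(usages, request_m)

print(per_pod_target_m, utilization, utilization > target_percent)
```

With these sample values the average utilization is above the target, so the HPA would add replicas, up to the configured maximum of 5.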

The HPA only scales pods within the cluster. If the cluster needs additional nodes to accommodate the new pods, you also need a Cluster Autoscaler, which is typically provided by the cloud service that hosts the cluster. Its setup varies by cloud provider and Kubernetes version.

Because AI model training workloads are highly variable and computationally intensive, set up monitoring and alerting to track the performance and cost implications of autoscaling.