Kubernetes for AI Workload Resiliency and High Availability

Question

Pulumi · Accepted Answer

To make AI workloads in a Kubernetes cluster highly available and resilient, combine Kubernetes' built-in features with a few best practices so that the system tolerates failures and keeps serving requests.

These are the main features to configure so that your AI workloads are highly available and resilient:

1. **Replicas**: Running more than one replica of a pod (for example, through a `Deployment`) means that if one replica fails, the others keep serving requests.

2. **PodDisruptionBudget (PDB)**: A PDB limits how many pods of an application can be down at the same time during voluntary disruptions such as node drains and cluster upgrades.

3. **Resource Requests and Limits**: Requests tell the scheduler how much CPU and memory to reserve for each container, and limits cap what the container may consume, so your AI workload gets the resources it needs and stays stable under load.

4. **Liveness and Readiness Probes**: Readiness probes keep traffic away from pods that are not ready to handle it, and liveness probes restart containers that have become unresponsive.

5. **Node Affinity and Pod Anti-affinity**: Use node affinity to schedule your AI workloads on suitable nodes (for example, GPU nodes) and pod anti-affinity to spread replicas across different nodes, so that a single node failure does not take down every replica.

6. **Priority Class**: Assign a priority to your AI workload's pods so that the scheduler places them ahead of lower-priority pods and, when resources are short, can preempt lower-priority pods to make room for them.

7. **Horizontal Pod Autoscaler (HPA)**: The HPA automatically adjusts the number of pod replicas based on CPU utilization or other metrics to accommodate changes in load.
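The Kubernetes documentation describes the HPA's core scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`: no scaling happens when the ratio is within a tolerance (0.1 by default), and the result is clamped to the HPA's minimum and maximum replica counts. The sketch below mirrors that rule in plain Python. `desired_replicas` is a hypothetical helper, the default `max_replicas=10` is only illustrative, and it deliberately ignores details the real controller handles, such as unready pods, missing metrics, and scale-down stabilization.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified single-metric HPA replica calculation (illustrative only)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is clamped to the configured minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Three replicas averaging 90% CPU against a 60% utilization target.
print(desired_replicas(3, 90, 60))  # 5
```

A metric that is only slightly off target (for example, 63% against 60%) falls inside the tolerance band, which keeps the replica count from flapping on small fluctuations.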

The following Pulumi Python program implements several of these features (replicas, resource requests and limits, probes, a PodDisruptionBudget, and a PriorityClass) for a hypothetical AI application:

```python
import pulumi
import pulumi_kubernetes as k8s

# Labels shared by the pod template, the Deployment selector, and the PDB selector
app_labels = {"app": "ai-app"}

# Define the application container
app_container = k8s.core.v1.ContainerArgs(
    name="ai-app",
    image="ai-app:latest",  # Replace with your actual image (a pinned tag is safer than :latest)
    resources=k8s.core.v1.ResourceRequirementsArgs(
        # Adjust to your workload's real CPU and memory needs
        requests={"cpu": "500m", "memory": "1Gi"},
        limits={"cpu": "1", "memory": "2Gi"},
    ),
    # Restart the container if /healthz stops responding
    liveness_probe=k8s.core.v1.ProbeArgs(
        http_get=k8s.core.v1.HTTPGetActionArgs(path="/healthz", port=8080),
        initial_delay_seconds=60,
        period_seconds=60,
    ),
    # Only send traffic to the pod once /ready succeeds
    readiness_probe=k8s.core.v1.ProbeArgs(
        http_get=k8s.core.v1.HTTPGetActionArgs(path="/ready", port=8080),
        initial_delay_seconds=30,
        period_seconds=30,
    ),
)

# Define a PriorityClass for the AI application pods (cluster-scoped)
ai_app_priority_class = k8s.scheduling.v1.PriorityClass(
    "ai-app-priority-class",
    value=1000000,  # Arbitrary high value; higher values mean higher priority
    global_default=False,
    description="High priority class for AI app pods",
)

# Create a Deployment for the AI application
ai_app_deployment = k8s.apps.v1.Deployment(
    "ai-app-deployment",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=3,  # Multiple replicas for redundancy
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(
                # Attach the PriorityClass; Pulumi auto-names it, so use the generated name
                priority_class_name=ai_app_priority_class.metadata.apply(lambda m: m.name),
                containers=[app_container],
            ),
        ),
    ),
)

# Define a PodDisruptionBudget for the AI application.
# policy/v1 replaces policy/v1beta1, which was removed in Kubernetes 1.25.
ai_app_pdb = k8s.policy.v1.PodDisruptionBudget(
    "ai-app-pdb",
    spec=k8s.policy.v1.PodDisruptionBudgetSpecArgs(
        min_available=1,  # Voluntary evictions (e.g. node drains) must leave at least one pod
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
    ),
)

# Export the Deployment name
pulumi.export("deployment_name", ai_app_deployment.metadata.apply(lambda m: m.name))
```
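The requests and limits in the program also determine the pods' Quality of Service (QoS) class. Kubernetes assigns `Guaranteed` when every container sets CPU and memory limits with requests equal to them, `BestEffort` when no container sets any CPU or memory request or limit, and `Burstable` otherwise. The kubelet takes the QoS class into account under node resource pressure (for example, in the OOM score adjustment it assigns), so it helps to know which class an AI pod lands in. The sketch below is a simplified, hypothetical classifier: `parse_quantity` and `qos_class` are illustrative names, init containers are ignored, and only common quantity suffixes are parsed.

```python
from decimal import Decimal
from typing import Dict, List, Optional

# Multipliers for a subset of Kubernetes quantity suffixes (assumption: enough
# for typical CPU and memory values; exponent forms like "1e3" are not handled).
_SUFFIXES = {
    "Ki": Decimal(2**10),
    "Mi": Decimal(2**20),
    "Gi": Decimal(2**30),
    "Ti": Decimal(2**40),
    "m": Decimal("0.001"),
    "k": Decimal(10**3),
    "M": Decimal(10**6),
    "G": Decimal(10**9),
    "T": Decimal(10**12),
}

Resources = Dict[str, Dict[str, str]]  # {"requests": {...}, "limits": {...}}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as "500m" or "2Gi" to a plain number."""
    # Binary suffixes are listed first so that "Mi" is matched before "M".
    for suffix, multiplier in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * multiplier
    return Decimal(quantity)


def qos_class(containers: List[Resources]) -> str:
    """Simplified QoS classification for a pod's regular containers."""
    any_set = False
    guaranteed = True
    for res in containers:
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for resource in ("cpu", "memory"):
            limit: Optional[str] = limits.get(resource)
            request: Optional[str] = requests.get(resource)
            if limit is not None or request is not None:
                any_set = True
            # When only a limit is set, Kubernetes defaults the request to it.
            if request is None:
                request = limit
            if limit is None or parse_quantity(request) != parse_quantity(limit):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Values from the Pulumi program: requests are lower than limits.
ai_app = {"requests": {"cpu": "500m", "memory": "1Gi"}, "limits": {"cpu": "1", "memory": "2Gi"}}
print(qos_class([ai_app]))  # Burstable
```

With the program's values the pods are `Burstable`; setting requests equal to limits (for example, `"1"` CPU and `"2Gi"` memory for both) would make them `Guaranteed`, at the cost of reserving the full amount on the node.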

The Pulumi program above creates a `Deployment`, a `PodDisruptionBudget`, and a `PriorityClass` that together provide the following:

- **Replicas**: The AI application is deployed with 3 replicas for redundancy.
- **PodDisruptionBudget**: The PDB keeps at least one replica of the AI app running during voluntary disruptions such as node drains. It does not protect against involuntary failures like node crashes.
- **Resources**: CPU and memory requests and limits are defined for the AI application containers. Adjust the values according to your AI workload's actual requirements.
- **Probes**: Liveness and readiness probes are configured to check the health and readiness of each container.
- **PriorityClass**: A high-value PriorityClass gives the AI application pods scheduling priority over lower-priority pods and lets the scheduler preempt those pods when the cluster is short on resources.
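To see what `min_available=1` actually permits, the sketch below reproduces the PDB arithmetic: the number of voluntary evictions allowed is the count of healthy pods minus the desired healthy count, where the desired count is `minAvailable`, or `expectedPods - maxUnavailable` when `maxUnavailable` is used instead. `disruptions_allowed` is a hypothetical helper, and it handles only integer values, not percentages.

```python
from typing import Optional


def disruptions_allowed(
    current_healthy: int,
    expected_pods: int,
    min_available: Optional[int] = None,
    max_unavailable: Optional[int] = None,
) -> int:
    """Voluntary evictions a PDB currently permits (integer values only)."""
    # A PDB sets exactly one of minAvailable or maxUnavailable.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    # Never negative: with too few healthy pods, no evictions are allowed.
    return max(0, current_healthy - desired_healthy)


# The program's settings: 3 healthy replicas, min_available=1.
print(disruptions_allowed(3, 3, min_available=1))  # 2
```

With three healthy replicas, `min_available=1` lets a drain evict two pods, leaving a single replica to serve all inference traffic. If one replica cannot handle your load, `max_unavailable=1` (or `min_available=2`) is the stricter choice.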

To run this Pulumi program, install Pulumi and make sure it can reach your Kubernetes cluster (by default, the Kubernetes provider uses your current kubeconfig context). Create a Pulumi Python project, for example with `pulumi new kubernetes-python`, replace the contents of `__main__.py` with the code above, and run `pulumi up` from the project directory to deploy the resources to your cluster.