1. Auto-Scaling AI Applications with Ingress Controllers

    To auto-scale AI applications in Kubernetes, you typically need a few key components:

    1. Deployment or StatefulSet: Runs your AI application as a set of pods.
    2. Service: Exposes your AI application pods inside the cluster.
    3. Ingress Controller: Manages external access to services in the cluster; NGINX is a common choice.
    4. Horizontal Pod Autoscaler (HPA): Automatically scales the number of pods in a deployment or replica set based on observed CPU utilization or custom metrics.

    For an Ingress Controller, the NGINX Ingress Controller is often used due to its flexibility and functionality. Pulumi has a kubernetes-ingress-nginx.IngressController resource you can use to deploy this in your cluster.
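
    For example, a minimal sketch of deploying the controller with that resource, assuming the pulumi-kubernetes-ingress-nginx Python package is installed (the resource name here is arbitrary):

    import pulumi_kubernetes_ingress_nginx as ingress_nginx

    # Deploy the NGINX Ingress Controller into the cluster.
    # publish_service tells the controller to report its load balancer
    # address on the Ingress resources it manages.
    nginx_controller = ingress_nginx.IngressController(
        "nginx-ingress-controller",
        controller=ingress_nginx.ControllerArgs(
            publish_service=ingress_nginx.ControllerPublishServiceArgs(
                enabled=True,
            ),
        ),
    )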

    The HPA will watch the CPU load, or a custom metric that your application exposes, and will scale the number of replica pods up or down. If the load increases, it deploys more pods to handle the traffic; if the load decreases, it reduces the number of pods.
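
    Concretely, the scaling rule the HPA applies is the one documented for the Kubernetes autoscaler; here is a small illustration (the function name is just for exposition):

    from math import ceil

    def desired_replicas(current_replicas: int, current_cpu_pct: float, target_cpu_pct: float) -> int:
        # Kubernetes computes: desired = ceil(current * observed_metric / target_metric)
        return ceil(current_replicas * current_cpu_pct / target_cpu_pct)

    # 3 replicas averaging 90% CPU against a 50% target scales out to 6 pods:
    print(desired_replicas(3, 90, 50))  # 6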

    Let's walk through how you might set this up with Pulumi and Python.

    First, we'd create a deployment with a specified number of replicas. This deployment sets resource requests and limits on its containers, which the HPA needs to make autoscaling decisions: CPU utilization is measured as a percentage of the pods' requested CPU and compared against the targetCPUUtilizationPercentage field.

    Next, we would define a service of type ClusterIP to act as an internal load balancer to distribute traffic among pods of our application.

    Then, we'd set up an NGINX Ingress, which requires the NGINX Ingress Controller to be running in your cluster. We'll assume the controller is already installed, whether by hand or with the Pulumi resource sketched above.

    Finally, we'd create the HPA resource tied to our deployment, triggering scaling actions based on the CPU utilization.

    Here's a Pulumi program that does all the above:

    import pulumi
    import pulumi_kubernetes as k8s
    from pulumi_kubernetes.apps.v1 import Deployment, DeploymentSpecArgs
    from pulumi_kubernetes.autoscaling.v1 import HorizontalPodAutoscaler
    from pulumi_kubernetes.core.v1 import ResourceRequirementsArgs, Service, ServiceSpecArgs
    from pulumi_kubernetes.networking.v1 import Ingress

    # Labels shared by the deployment, service, and HPA.
    app_labels = {"app": "ai-application"}

    # Define the application deployment.
    app_deployment = Deployment(
        "ai-app-deployment",
        spec=DeploymentSpecArgs(
            replicas=3,  # Starting with 3 replicas; the HPA adjusts this at runtime.
            selector={"match_labels": app_labels},
            template={
                "metadata": {"labels": app_labels},
                "spec": {
                    "containers": [{
                        "name": "ai-application",
                        "image": "ai-application:latest",  # Replace with your application's container image.
                        # Requests and limits are what the HPA measures CPU
                        # utilization against.
                        "resources": ResourceRequirementsArgs(
                            requests={"cpu": "500m", "memory": "512Mi"},
                            limits={"cpu": "1000m", "memory": "1024Mi"},
                        ),
                    }],
                },
            },
        ),
    )
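
    Building on that deployment, here's a sketch of the remaining pieces: the ClusterIP Service, the Ingress, and the HPA. The hostname ai.example.com, the service port 80, the container port 8080, and the 3-to-10 replica bounds are placeholders to replace with your own values:

    # Expose the pods inside the cluster with a ClusterIP service.
    app_service = Service(
        "ai-app-service",
        spec=ServiceSpecArgs(
            type="ClusterIP",
            selector=app_labels,
            ports=[{"port": 80, "target_port": 8080}],  # Assumes the container listens on 8080.
        ),
    )

    # Route external traffic to the service through the NGINX Ingress Controller.
    app_ingress = Ingress(
        "ai-app-ingress",
        spec={
            "ingress_class_name": "nginx",
            "rules": [{
                "host": "ai.example.com",  # Placeholder hostname.
                "http": {
                    "paths": [{
                        "path": "/",
                        "path_type": "Prefix",
                        "backend": {
                            "service": {
                                "name": app_service.metadata["name"],
                                "port": {"number": 80},
                            },
                        },
                    }],
                },
            }],
        },
    )

    # Scale the deployment between 3 and 10 replicas, targeting 50% average CPU.
    app_hpa = HorizontalPodAutoscaler(
        "ai-app-hpa",
        spec={
            "scale_target_ref": {
                "api_version": "apps/v1",
                "kind": "Deployment",
                "name": app_deployment.metadata["name"],
            },
            "min_replicas": 3,
            "max_replicas": 10,
            "target_cpu_utilization_percentage": 50,
        },
    )

    pulumi.export("app_name", app_deployment.metadata["name"])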