1. Orchestrating AI Model Serving Workloads with Kubernetes


    Orchestrating AI model serving workloads with Kubernetes typically involves creating multiple resources that work together to manage the lifecycle and scaling of your model serving application. For such tasks, you might define a Deployment to manage the pods that run your containers, Service objects to expose your application within the cluster or to the outside world, an optional Ingress object if you need HTTP routing, HorizontalPodAutoscaler for scaling, and perhaps CronJobs for running periodic tasks.

    In a Pulumi program, you'd write Python code that uses the Pulumi Kubernetes SDK to define these resources. The SDK provides classes and functions that correspond to Kubernetes resources, which you can assemble into a program to declaratively define the desired state of your cluster.
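As a minimal sketch of this pattern (the resource names here are illustrative), a single Kubernetes resource defined in a Pulumi Python program looks like this:

```python
import pulumi
from pulumi_kubernetes.core.v1 import Namespace

# A single declaratively defined resource: when you run `pulumi up`,
# Pulumi creates, updates, or deletes it to match this description.
demo_namespace = Namespace(
    "demo-ns",                           # Pulumi resource name (illustrative)
    metadata={"name": "demo-workloads"}  # Kubernetes object name (illustrative)
)

# Stack outputs surface values from the deployed resources
pulumi.export("namespace_name", demo_namespace.metadata["name"])
```

Each class in the SDK maps to a Kubernetes resource kind, and its constructor arguments mirror the fields you would otherwise write in YAML. Note that this fragment only executes inside a Pulumi deployment (`pulumi up`), not as a standalone script.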

    Below is a Python program using Pulumi to orchestrate an AI model serving workload on Kubernetes. The program:

    1. Sets up a Namespace for organizing resources.
    2. Defines a Deployment for running model serving containers.
    3. Creates a Service to expose the deployment within the cluster.
    4. Sets up a HorizontalPodAutoscaler to automatically scale the number of pods.
    5. Optionally, creates a CronJob for periodic AI model retraining tasks.

    Let's look at a sample Pulumi program:

    import pulumi
    from pulumi_kubernetes.core.v1 import Namespace, Service
    from pulumi_kubernetes.apps.v1 import Deployment
    from pulumi_kubernetes.autoscaling.v2 import HorizontalPodAutoscaler
    from pulumi_kubernetes.batch.v1 import CronJob

    # Labels shared by the Deployment's pod template and the Service selector
    app_labels = {"app": "ai-model-serving"}

    # Define a Namespace for your AI workloads for better isolation and management
    ai_namespace = Namespace(
        "ai-model-serving-ns",
        metadata={"name": "ai-model-serving"})

    # Define a Deployment for the AI model serving workload
    ai_model_deployment = Deployment(
        "ai-model-serving-deployment",
        metadata={
            "namespace": ai_namespace.metadata["name"],
        },
        spec={
            "selector": {
                "match_labels": app_labels,
            },
            "replicas": 2,  # Start with 2 replicas
            "template": {
                "metadata": {
                    "labels": app_labels,
                },
                "spec": {
                    "containers": [{
                        "name": "model-container",
                        "image": "my-registry/my-model-serving-image:latest",  # Replace with your container image
                        "ports": [{"container_port": 8080}],
                    }],
                },
            },
        })

    # Create a Service to expose the Deployment in the cluster
    ai_model_service = Service(
        "ai-model-serving-service",
        metadata={
            "namespace": ai_namespace.metadata["name"],
            "labels": app_labels,
        },
        spec={
            "ports": [{"port": 8080}],
            "selector": app_labels,
            "type": "ClusterIP",  # Use "LoadBalancer" for external access if needed
        })

    # Define a HorizontalPodAutoscaler for automatic scaling based on CPU usage
    ai_model_hpa = HorizontalPodAutoscaler(
        "ai-model-serving-hpa",
        metadata={
            "namespace": ai_namespace.metadata["name"],
        },
        spec={
            "scale_target_ref": {
                "api_version": "apps/v1",
                "kind": "Deployment",
                "name": ai_model_deployment.metadata["name"],
            },
            "min_replicas": 2,
            "max_replicas": 10,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "average_utilization": 80,
                    },
                },
            }],
        })

    # Optional: Define a CronJob for periodic tasks like retraining the AI model
    ai_model_cron_job = CronJob(
        "ai-model-serving-cronjob",
        metadata={
            "namespace": ai_namespace.metadata["name"],
        },
        spec={
            "schedule": "0 2 * * *",  # Run every day at 2 am
            "job_template": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [{
                                "name": "retraining-container",
                                "image": "my-registry/my-model-retraining-image:latest",  # Replace with your retraining image
                            }],
                            "restart_policy": "OnFailure",
                        },
                    },
                },
            },
        })

    # Export the Service name and ClusterIP to access the model serving API
    pulumi.export("service_name", ai_model_service.metadata["name"])
    pulumi.export("service_cluster_ip", ai_model_service.spec["cluster_ip"])

    In the program above, ai_namespace creates a separate namespace for our AI workloads, providing a way to group related resources for easier management and access control. The ai_model_deployment defines the desired state for deploying our model serving application. It specifies the container image, initial replica count, and the container port that should be exposed.

    The ai_model_service resource creates a Kubernetes service that allows other pods in the cluster to communicate with the pods managed by ai_model_deployment. The ClusterIP service type makes it reachable only from within the cluster; if external access is needed, LoadBalancer could be used instead.
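For reference, an externally reachable variant of the service might look like the sketch below. The resource name is illustrative, and the namespace and labels are assumed to match the program above; on most cloud providers, a LoadBalancer service provisions a cloud load balancer with a public address.

```python
from pulumi_kubernetes.core.v1 import Service

# Illustrative external-facing variant of the model serving service
external_service = Service(
    "ai-model-serving-external",
    metadata={"namespace": "ai-model-serving"},
    spec={
        # Map external port 80 to the container's port 8080
        "ports": [{"port": 80, "target_port": 8080}],
        "selector": {"app": "ai-model-serving"},
        "type": "LoadBalancer",
    })
```

Like the rest of the program, this fragment only runs under `pulumi up`.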

    ai_model_hpa sets up a Horizontal Pod Autoscaler that automatically scales our serving pods between 2 and 10 replicas based on average CPU utilization, targeting 80%.
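The autoscaling/v2 API accepts a list of metrics, and the HPA scales to whichever metric demands the most replicas. As a sketch, a metrics list that considers memory utilization alongside CPU could look like this (the 75% memory target is an assumed value, not from the program above):

```python
# Sketch: metrics list for an autoscaling/v2 HorizontalPodAutoscaler.
# The HPA computes a desired replica count per metric and uses the maximum.
hpa_metrics = [
    {
        "type": "Resource",
        "resource": {
            "name": "cpu",
            "target": {"type": "Utilization", "average_utilization": 80},
        },
    },
    {
        "type": "Resource",
        "resource": {
            "name": "memory",
            "target": {"type": "Utilization", "average_utilization": 75},  # assumed target
        },
    },
]
```

This list would replace the single-entry "metrics" list in the HPA spec. Note that utilization targets are computed against the pods' resource requests, so the containers must declare requests for these metrics to work.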

    The optional ai_model_cron_job schedules periodic jobs, useful for tasks like retraining your model. It is set to run once a day at a specified time.
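CronJob schedules use the standard five-field cron format. A small helper (hypothetical, purely for illustration and not part of the Pulumi SDK) makes the fields of the schedule above explicit:

```python
def cron_fields(schedule: str) -> dict:
    """Split a five-field cron expression into named fields."""
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, schedule.split()))

fields = cron_fields("0 2 * * *")  # the retraining schedule used above
# fields == {"minute": "0", "hour": "2", "day_of_month": "*",
#            "month": "*", "day_of_week": "*"}  -> daily at 02:00
```

Reading the fields left to right: minute 0, hour 2, any day of month, any month, any day of week, i.e. every day at 2 am.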

    At the end, we export the name and the internal IP of the service as stack outputs, so that other workloads in the cluster (or other Pulumi programs) can use them to reach the model serving API after deployment.