1. Kubernetes-based Canary Deployments for AI Services


    Canary deployments are a pattern for rolling out releases to a subset of users or servers. The idea is to deploy a new version of your application to a small percentage of your production infrastructure and then gradually increase this percentage as you gain confidence in the release. Kubernetes is a great platform for this kind of deployment strategy due to its powerful orchestration capabilities.

    When it comes to AI services, or any service running within a Kubernetes cluster, canary deployments can help ensure that new versions are stable and perform as expected before a full rollout.

    To implement a canary deployment in Kubernetes, you typically need:

    1. Multiple versions of your application running in the cluster, typically distinguished by image tags and Pod labels.
    2. A Kubernetes service that acts as an entry point to your application.
    3. A way to route traffic to different versions of your application based on certain criteria (like weights to adjust traffic percentages).
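    The routing in these steps ultimately rests on Kubernetes label selectors: a Service forwards traffic to every Pod whose labels contain all of the Service's selector key/value pairs (extra Pod labels are ignored). A minimal sketch of that matching rule in plain Python (the label dictionaries are illustrative, not part of any API):

    ```python
    def selector_matches(selector: dict, pod_labels: dict) -> bool:
        """A Service selects a Pod when every selector key/value pair
        appears in the Pod's labels; extra Pod labels are ignored."""
        return all(pod_labels.get(k) == v for k, v in selector.items())

    service_selector = {"app": "ai-service"}
    stable_pod = {"app": "ai-service", "version": "stable"}
    canary_pod = {"app": "ai-service", "version": "canary"}
    other_pod = {"app": "frontend"}

    print(selector_matches(service_selector, stable_pod))  # True
    print(selector_matches(service_selector, canary_pod))  # True
    print(selector_matches(service_selector, other_pod))   # False
    ```

    This is why a Service selecting only on the shared app label can reach both the stable and canary Pods at once.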

    In our setup, we'll use a simplified scenario of an AI service where we'll have two deployments:

    • The stable version of our service (primary).
    • The new version of our service to be evaluated (canary).

    We will then create a Service that routes traffic to both deployments. For simplicity, we will not implement sophisticated traffic-shifting logic that sends a specific percentage of traffic to the canary; instead, we will simply ensure that both versions are running and accessible.
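    Even without explicit traffic-shifting rules, a plain Service spreads connections roughly evenly across all matching Pods, so the replica counts act as an implicit weight. A quick back-of-the-envelope calculation with the replica counts used in this setup (3 stable, 1 canary):

    ```python
    def traffic_split(stable_replicas: int, canary_replicas: int) -> tuple:
        """Approximate share of traffic each version receives when a
        Service load-balances roughly evenly across all matching Pods."""
        total = stable_replicas + canary_replicas
        return stable_replicas / total, canary_replicas / total

    stable_share, canary_share = traffic_split(3, 1)
    print(f"stable ~{stable_share:.0%}, canary ~{canary_share:.0%}")
    # stable ~75%, canary ~25%
    ```

    Adjusting the replica counts is therefore the simplest (if coarse) way to change how much traffic the canary sees in this setup.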

    Here's a basic Pulumi program in Python that sets up a canary deployment in Kubernetes. We will use two Deployments and one Service resource. The deployments will each run a version of an AI service, and the service will expose both deployments under a single IP address.

    import pulumi
    import pulumi_kubernetes as k8s

    # Define the stable version of your AI application.
    primary_app_labels = {"app": "ai-service", "version": "stable"}
    primary_deployment = k8s.apps.v1.Deployment(
        "primary-deployment",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=3,
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels=primary_app_labels,
            ),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    labels=primary_app_labels,
                ),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name="ai-service",
                        image="my-ai-service:stable",  # Replace with your stable image tag
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
                    )],
                ),
            ),
        ),
    )

    # Define the canary version of your AI application.
    canary_app_labels = {"app": "ai-service", "version": "canary"}
    canary_deployment = k8s.apps.v1.Deployment(
        "canary-deployment",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=1,
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels=canary_app_labels,
            ),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    labels=canary_app_labels,
                ),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name="ai-service",
                        image="my-ai-service:canary",  # Replace with your canary image tag
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=80)],
                    )],
                ),
            ),
        ),
    )

    # Create a Kubernetes Service to expose both deployments under a single IP address.
    service = k8s.core.v1.Service(
        "ai-service",
        spec=k8s.core.v1.ServiceSpecArgs(
            selector={"app": "ai-service"},  # This will match both primary and canary pods.
            ports=[k8s.core.v1.ServicePortArgs(
                port=80,
                target_port=80,
            )],
            type="LoadBalancer",  # Use LoadBalancer for external access (could be ClusterIP or NodePort).
        ),
    )

    # Export the Service name and IP address for easy access after deployment.
    pulumi.export("service_name", service.metadata["name"])
    pulumi.export("service_ip", service.status.apply(
        lambda status: status.load_balancer.ingress[0].ip
        if status.load_balancer.ingress else "Not assigned yet"))

    This Pulumi program creates two deployments and a service in your Kubernetes cluster:

    1. primary-deployment is the Deployment that runs the stable version of the AI service.
    2. canary-deployment is the Deployment that runs the new, "canary" version of the AI service that needs to be tested.
    3. ai-service is the Service that provides access to both the primary and canary versions of the AI service. Because it selects only on the shared app label, it load-balances across all Pods from both deployments, so each version receives a share of traffic roughly proportional to its replica count.

    Remember to replace the image repository and tag (my-ai-service:stable and my-ai-service:canary) with the actual path and version of your container images.

    Also note that while this setup gets both versions running, real-world scenarios often require a more sophisticated setup for splitting traffic (e.g., using Istio's traffic management capabilities) and for automating promotions or rollbacks based on monitoring and alerting.
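    As one illustration, a weighted split could be expressed with an Istio VirtualService created through Pulumi's generic CustomResource. This is a config sketch, not part of the program above: it assumes Istio is already installed in the cluster, and the host name, subsets, and weights are placeholders that would need a matching DestinationRule to define the stable and canary subsets.

    ```python
    import pulumi_kubernetes as k8s

    # Hypothetical weighted routing rule: 90% of traffic to the stable
    # subset, 10% to the canary subset. Requires Istio in the cluster and
    # a DestinationRule that defines the "stable" and "canary" subsets.
    virtual_service = k8s.apiextensions.CustomResource(
        "ai-service-routes",
        api_version="networking.istio.io/v1beta1",
        kind="VirtualService",
        spec={
            "hosts": ["ai-service"],
            "http": [{
                "route": [
                    {"destination": {"host": "ai-service", "subset": "stable"},
                     "weight": 90},
                    {"destination": {"host": "ai-service", "subset": "canary"},
                     "weight": 10},
                ],
            }],
        },
    )
    ```

    With a mechanism like this, promoting the canary becomes a matter of gradually shifting the weights rather than changing replica counts.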