Automated Canary Deployments for AI Applications with Flagger

Question

Pulumi · Accepted Answer

A canary deployment reduces the risk of releasing a new software version by first sending a small share of production traffic to it, and widening the rollout only while the new version stays healthy. This lets you observe its performance and behavior under real load. If problems appear during the canary stage, you can halt the rollout, investigate, and roll back without affecting most users.
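To make the traffic split concrete, here is a minimal, generic sketch of sending a fixed percentage of users to a canary. It is an illustration of the idea only: Flagger does not route traffic itself but delegates weighted routing to a service mesh or ingress controller, which typically splits per request rather than per user. The function name `routes_to_canary` is hypothetical.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically decide whether a user is served by the canary.

    Hashing the user ID (instead of choosing randomly per request) keeps each
    user on the same version for the whole canary stage.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in [0, 99]
    return bucket < canary_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(routes_to_canary(u, 5) for u in users) / len(users)
    print(f"{share:.1%} of users routed to the canary")  # close to 5%
```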

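To automate canary deployments, we can use a progressive delivery tool such as Flagger. Flagger shifts traffic to the new version in increments while checking metrics and, optionally, running conformance tests through webhooks; if the checks keep failing, it rolls the release back automatically. The weighted traffic split itself is carried out by a service mesh or ingress controller that Flagger supports (for example Istio, Linkerd, or NGINX), so one of these must be running in the cluster.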
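The control loop Flagger runs can be approximated as follows. This is a simplified model written for illustration, not Flagger's source: it assumes failed checks accumulate across the run, that a failing interval holds the current weight, and that a passing check at `max_weight` triggers promotion. The parameter names mirror the `analysis` fields (`stepWeight`, `maxWeight`, `threshold`) of a Flagger Canary resource.

```python
from typing import Iterable, Tuple


def run_canary(check_results: Iterable[bool], step_weight: int = 5,
               max_weight: int = 50, threshold: int = 10) -> Tuple[str, int]:
    """Simplified model of a progressive canary analysis.

    Each item in check_results is the outcome of one analysis interval
    (True means every metric check passed). Returns (outcome, canary_weight).
    """
    weight = 0  # percent of traffic currently sent to the canary
    failed = 0  # failed checks so far (not reset on success in this model)
    for passed in check_results:
        if not passed:
            failed += 1
            if failed >= threshold:
                return "rolled back", 0  # all traffic returns to the stable version
            continue  # hold the current weight and re-check next interval
        if weight >= max_weight:
            return "promoted", 100  # the canary becomes the new stable version
        weight = min(weight + step_weight, max_weight)
    return "in progress", weight
```

With the defaults, ten passing intervals bring the canary to 50% of traffic and an eleventh promotes it, while ten failed checks roll it back.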

In the context of AI applications, you want to ensure that the new version of your AI service does not introduce regressions or negatively impact the user experience. For example, if you have a machine learning model serving predictions, you could gradually increase the traffic to the new model version while monitoring its prediction accuracy and response times.
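A canary gate for a model server could compare those two signals against fixed budgets, as in this minimal sketch. It assumes you have ground-truth labels for a sample of the canary's predictions (in practice labels often arrive late, so proxy signals may be needed instead); `CanaryGate`, `evaluate`, and the threshold values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CanaryGate:
    min_accuracy: float        # e.g. 0.95 means at least 95% correct predictions
    max_p99_latency_ms: float  # latency budget for the 99th percentile


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(predictions: Sequence[int], labels: Sequence[int],
             latencies_ms: Sequence[float], gate: CanaryGate) -> List[str]:
    """Return the reasons the canary fails the gate; an empty list means healthy."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need exactly one prediction per label")
    failures = []
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    if accuracy < gate.min_accuracy:
        failures.append(f"accuracy {accuracy:.3f} below {gate.min_accuracy}")
    p99 = percentile(latencies_ms, 99)
    if p99 > gate.max_p99_latency_ms:
        failures.append(f"p99 latency {p99:.0f} ms above {gate.max_p99_latency_ms} ms")
    return failures
```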

The program below sets up an automated canary deployment for an AI application on Kubernetes, with Flagger managing the rollout and Prometheus supplying metrics. It assumes the application's container image has already been built and pushed to a registry, and that the traffic-routing provider Flagger is configured to use (Istio unless you change it) is already installed in the cluster.

```python
import pulumi
import pulumi_kubernetes as k8s

# Configure the Kubernetes provider; kubeconfig takes the contents of (or a path to) your kubeconfig file
k8s_provider = k8s.Provider("k8s_provider", kubeconfig="your-kubeconfig-here")

# Create a Kubernetes Deployment for the AI application
app_labels = {"app": "ai-application"}
ai_app_deployment = k8s.apps.v1.Deployment("aiAppDeployment",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="ai-application",
        labels=app_labels
    ),
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=1,
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(
                    name="ai-application-container",
                    image="dockerhub/your-ai-application:version1",
                    ports=[k8s.core.v1.ContainerPortArgs(container_port=8080)]
                )]
            )
        )
    ),
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)

# Create a Kubernetes Service to expose the AI application
ai_app_service = k8s.core.v1.Service("aiAppService",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="ai-application",
        labels=app_labels
    ),
    spec=k8s.core.v1.ServiceSpecArgs(
        selector=app_labels,
        ports=[k8s.core.v1.ServicePortArgs(
            port=80,
            target_port=8080
        )],
        type="ClusterIP"
    ),
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)

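# Install Flagger into the Kubernetes cluster (helm.v3.Chart takes ChartOpts and FetchOpts)
flagger_chart = k8s.helm.v3.Chart("flagger",
    k8s.helm.v3.ChartOpts(
        chart="flagger",
        fetch_opts=k8s.helm.v3.FetchOpts(
            repo="https://flagger.app"
        ),
        values={
            # Traffic-routing provider Flagger uses to shift weighted traffic;
            # change this to match your mesh or ingress (e.g. "linkerd", "nginx")
            "meshProvider": "istio",
            "prometheus": {
                "install": True  # Installs a Prometheus instance for Flagger; set False if you already run one
            }
        }
    ),
    opts=pulumi.ResourceOptions(provider=k8s_provider)
)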

# Create a canary custom resource for the AI application
ai_app_canary = k8s.apiextensions.CustomResource("aiAppCanary",
    api_version="flagger.app/v1beta1",
    kind="Canary",
    metadata=k8s.meta.v1.ObjectMetaArgs(
        name="ai-application",
        namespace="default"
    ),
    spec={
        "targetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "ai-application"
        },
        "service": {
            "port": 80,
            "targetPort": 8080
        },
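        "analysis": {
            "interval": "1m",   # run the metric checks every minute
            "threshold": 10,    # failed checks tolerated before an automatic rollback
            "maxWeight": 50,    # stop increasing canary traffic at 50%
            "stepWeight": 5,    # add 5% of traffic to the canary per passing check
            "metrics": [
                {
                    # Built-in metric: percentage of requests without a 5xx response
                    "name": "request-success-rate",
                    "thresholdRange": {
                        "min": 99
                    },
                    "interval": "1m"
                },
                {
                    # Built-in metric: P99 request duration in milliseconds
                    "name": "request-duration",
                    "thresholdRange": {
                        "max": 500
                    },
                    "interval": "30s"
                }
            ]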
        }
    },
    opts=pulumi.ResourceOptions(provider=k8s_provider, depends_on=[flagger_chart])
)

# Export the Service endpoint to access the AI application
pulumi.export("ai_app_service_endpoint", ai_app_service.metadata.apply(lambda m: f"{m.name}.{m.namespace}.svc.cluster.local"))
```

This program performs the following actions:

1. Defines a Pulumi Kubernetes provider instance to connect to your Kubernetes cluster.
2. Creates a Kubernetes Deployment for the AI application.
3. Exposes the AI application through a Kubernetes Service of type `ClusterIP`.
4. Installs Flagger using a Helm chart.
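5. Creates a Canary resource that defines the canary analysis: how often metrics are checked (`interval`), how many failed checks trigger a rollback (`threshold`), how much traffic shifts per step (`stepWeight`), and the limits for request success rate and request duration. Once the Canary exists, Flagger creates an `ai-application-primary` Deployment to serve stable traffic and manages routing between it and the canary.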
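As a rough sanity check on how long a healthy rollout takes, the arithmetic below derives the traffic-weight schedule from the analysis settings. It uses `interval: 1m` and `stepWeight: 5` from the spec and a `maxWeight` of 50 (a common choice, assumed here); it ignores re-checks after failures and the final promotion step, so treat the result as a lower bound. `parse_duration` handles only the `s`/`m`/`h` suffixes used here, a simplification of the duration strings Flagger accepts.

```python
from typing import List

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a simple duration such as '30s', '1m' or '2h' to seconds."""
    number, unit = value[:-1], value[-1]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(number) * _UNITS[unit]


def weight_schedule(step_weight: int, max_weight: int) -> List[int]:
    """Canary traffic weights (percent) after each passing analysis interval."""
    weights = list(range(step_weight, max_weight + 1, step_weight))
    if not weights or weights[-1] != max_weight:
        weights.append(max_weight)  # the last step is capped at max_weight
    return weights


schedule = weight_schedule(step_weight=5, max_weight=50)
seconds_to_max = len(schedule) * parse_duration("1m")
print(schedule)             # [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
print(seconds_to_max / 60)  # 10.0 minutes before the canary reaches 50%
```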

Remember that Pulumi needs a correctly configured kubeconfig and sufficient access rights to manage resources in your cluster. The `kubeconfig` argument of `k8s_provider` accepts either the contents of, or the path to, the kubeconfig file that holds your cluster credentials.
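Rather than pasting credentials into the program, you can read the kubeconfig at run time. This minimal sketch, with a hypothetical helper name, honors the `KUBECONFIG` environment variable (using only its first entry, a simplification, since kubectl merges every listed file) and otherwise falls back to `~/.kube/config`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_kubeconfig_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the kubeconfig path: first KUBECONFIG entry, else ~/.kube/config."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("KUBECONFIG", "").split(os.pathsep) if p]
    if entries:
        return Path(entries[0]).expanduser()
    return Path.home() / ".kube" / "config"


# In the Pulumi program, the file contents could then be passed to the provider:
# k8s_provider = k8s.Provider("k8s_provider",
#                             kubeconfig=resolve_kubeconfig_path().read_text())
```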

To reach the AI application from inside the cluster, use the Service's DNS name, which the program exports at the end. If the application must be reachable from outside the cluster, you would typically add an Ingress resource to manage external access.
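Once the Canary is initialized, Flagger manages ClusterIP Services derived from the target name: the apex service plus `-primary` and `-canary` variants. The hypothetical helper below spells out their in-cluster DNS names; it assumes the default `cluster.local` cluster domain, which some clusters override.

```python
from typing import Dict


def flagger_service_fqdns(name: str, namespace: str = "default",
                          cluster_domain: str = "cluster.local") -> Dict[str, str]:
    """In-cluster DNS names of the Services Flagger manages for a Canary."""
    suffix = f"{namespace}.svc.{cluster_domain}"
    return {
        "apex": f"{name}.{suffix}",             # the name clients call
        "primary": f"{name}-primary.{suffix}",  # stable version
        "canary": f"{name}-canary.{suffix}",    # version under analysis
    }


print(flagger_service_fqdns("ai-application")["canary"])
# ai-application-canary.default.svc.cluster.local
```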

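Before running this program, make sure the Pulumi CLI is installed and that your kubeconfig targets the cluster you intend to deploy to. You also need the `pulumi` and `pulumi_kubernetes` Python packages; Helm chart support is included in `pulumi_kubernetes`, so no separate Helm provider is required.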
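A quick preflight check along these lines can confirm the tooling is in place before you run `pulumi up`. It is a minimal sketch with a hypothetical function name; it only checks that the executables are on `PATH` and that the package is importable, not that anything is correctly configured.

```python
import importlib.util
import shutil
from typing import Dict


def preflight() -> Dict[str, bool]:
    """Report whether the tools this program relies on appear to be installed."""
    return {
        "pulumi CLI on PATH": shutil.which("pulumi") is not None,
        "pulumi_kubernetes importable": importlib.util.find_spec("pulumi_kubernetes") is not None,
        "kubectl on PATH (optional)": shutil.which("kubectl") is not None,
    }


for check, ok in preflight().items():
    print(f"{'ok     ' if ok else 'MISSING'}  {check}")
```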

Replace `"dockerhub/your-ai-application:version1"` with the image reference of your actual AI application. If Prometheus is already installed separately, set `"install": False` under the `prometheus` values and point Flagger at your server through the chart's `metricsServer` value.
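If you switch between a bundled and an existing Prometheus, a small helper can build the Flagger chart `values` for either case. This is a sketch: `metricsServer`, `meshProvider`, and `prometheus.install` are Flagger chart values, but the helper name and the Prometheus address in the example are hypothetical placeholders.

```python
from typing import Any, Dict, Optional


def flagger_chart_values(prometheus_url: Optional[str] = None,
                         mesh_provider: str = "istio") -> Dict[str, Any]:
    """Build Helm values for the Flagger chart.

    With no prometheus_url, ask the chart to install its own Prometheus;
    otherwise point Flagger at the existing server instead.
    """
    values: Dict[str, Any] = {"meshProvider": mesh_provider}
    if prometheus_url is None:
        values["prometheus"] = {"install": True}
    else:
        values["prometheus"] = {"install": False}
        values["metricsServer"] = prometheus_url
    return values


# Hypothetical address of an existing Prometheus in a "monitoring" namespace
external = flagger_chart_values("http://prometheus-server.monitoring:9090")
```

The returned dictionary can be passed as the `values` argument when installing the Flagger chart.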

Keep in mind that Flagger, Prometheus, and the deployment strategies are highly configurable, and you may need to adjust the configurations to suit your specific needs. The analysis, for example, should include metrics that are relevant to your AI application's health and performance.
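For model-specific health signals, Flagger lets you define custom Prometheus queries as `MetricTemplate` resources and reference them from the Canary's `analysis.metrics` through `templateRef`. The sketch below builds both pieces as plain dictionaries that could be passed to `k8s.apiextensions.CustomResource` and appended to the metrics list. The metric names (`model_low_confidence_total`, `model_predictions_total`), the template name, and the Prometheus address are hypothetical; Flagger substitutes the `{{ namespace }}`, `{{ target }}`, and `{{ interval }}` placeholders at analysis time.

```python
from typing import Any, Dict

# Hypothetical metrics exported by the model server; the query is the share of
# low-confidence predictions served by the pods of the Canary's target.
LOW_CONFIDENCE_QUERY = (
    'sum(rate(model_low_confidence_total{namespace="{{ namespace }}",'
    'pod=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"}[{{ interval }}]))'
    " / "
    'sum(rate(model_predictions_total{namespace="{{ namespace }}",'
    'pod=~"{{ target }}-[0-9a-zA-Z]+(-[0-9a-zA-Z]+)"}[{{ interval }}]))'
)


def metric_template(name: str, namespace: str, prometheus_url: str,
                    query: str) -> Dict[str, Any]:
    """Manifest for a Flagger MetricTemplate backed by Prometheus."""
    return {
        "apiVersion": "flagger.app/v1beta1",
        "kind": "MetricTemplate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"provider": {"type": "prometheus", "address": prometheus_url},
                 "query": query},
    }


def metric_ref(name: str, namespace: str, max_value: float,
               interval: str = "1m") -> Dict[str, Any]:
    """Entry for the Canary's analysis.metrics list that uses the template."""
    return {
        "name": name,
        "templateRef": {"name": name, "namespace": namespace},
        "thresholdRange": {"max": max_value},
        "interval": interval,
    }


template = metric_template("low-confidence-rate", "default",
                           "http://prometheus.default:9090", LOW_CONFIDENCE_QUERY)
ref = metric_ref("low-confidence-rate", "default", max_value=0.05)  # fail above 5%
```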