1. Rolling Updates for AI Model Serving Services


    To perform a rolling update for AI model serving services, you typically rely on an orchestration platform or managed cloud service that supports this feature, such as Kubernetes Deployments, AWS SageMaker endpoints, Google Compute Engine instance group managers, or Azure Machine Learning. These services facilitate the deployment and management of your models while providing mechanisms to update them with minimal downtime.

    Below is a sample Pulumi program in Python that demonstrates how you can create a Kubernetes Deployment, for example on AWS Elastic Kubernetes Service (EKS), to serve an AI model with the capability to perform rolling updates. The Deployment resource is used because it offers a rolling update strategy out of the box: during an update it creates new Pods and gradually terminates the old ones, keeping the service available throughout.

    Explanation:

    Kubernetes Deployment:

    • A Deployment resource manages a replicated application on your cluster. When you describe the desired state in a Deployment, the Deployment Controller changes the actual state to the desired state at a controlled rate. This mechanism allows for rolling updates by updating Pod instances incrementally with new ones.

    AWS SageMaker Endpoint:

    • AWS SageMaker Endpoint is a fully managed service that allows you to deploy your machine learning models for inference. You can update these models using SageMaker's update mechanisms, which include blue/green or rolling updates to ensure high availability during your deployments.
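
    For illustration, here is a hedged sketch of how a SageMaker endpoint's update behavior can be configured with Pulumi. It assumes an existing aws.sagemaker.EndpointConfiguration bound to the variable endpoint_config, and the long Args class names follow pulumi_aws's usual generated naming; treat them as an assumption to verify against your provider version:

    import pulumi_aws as aws

    # Sketch: attach a blue/green deployment policy to a SageMaker endpoint.
    # "endpoint_config" is assumed to be an aws.sagemaker.EndpointConfiguration
    # defined elsewhere in the program.
    endpoint = aws.sagemaker.Endpoint(
        "ai-model-endpoint",
        endpoint_config_name=endpoint_config.name,
        deployment_config=aws.sagemaker.EndpointDeploymentConfigArgs(
            blue_green_update_policy=aws.sagemaker.EndpointDeploymentConfigBlueGreenUpdatePolicyArgs(
                traffic_routing_configuration=aws.sagemaker.EndpointDeploymentConfigBlueGreenUpdatePolicyTrafficRoutingConfigurationArgs(
                    type="ALL_AT_ONCE",            # or "CANARY" / "LINEAR" for gradual traffic shifting
                    wait_interval_in_seconds=300,  # bake time before the old fleet is terminated
                ),
            ),
        ),
    )

    Updating the endpoint configuration then triggers a blue/green rollout: SageMaker provisions a new fleet, shifts traffic according to the routing configuration, and terminates the old fleet only after the wait interval passes.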

    Google Compute Engine Instance Group Manager:

    • In Google Cloud, instance group managers manage groups of instances based on a common template. They support rolling updates: changes to instances in the group are applied gradually, and you can control how much disruption is allowed while the update proceeds.
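
    A hedged Pulumi sketch of the same idea, assuming an existing gcp.compute.InstanceTemplate bound to the variable template (the zone and resource names are placeholders):

    import pulumi_gcp as gcp

    # Sketch: a zonal managed instance group with a proactive rolling update policy.
    group_manager = gcp.compute.InstanceGroupManager(
        "ai-model-group",
        base_instance_name="ai-model",
        zone="us-central1-a",  # placeholder zone
        target_size=3,
        versions=[gcp.compute.InstanceGroupManagerVersionArgs(
            instance_template=template.self_link,  # "template" is assumed to exist elsewhere
        )],
        update_policy=gcp.compute.InstanceGroupManagerUpdatePolicyArgs(
            type="PROACTIVE",          # roll the new template out to existing instances automatically
            minimal_action="REPLACE",
            max_surge_fixed=1,         # at most one extra instance during the update
            max_unavailable_fixed=1,   # at most one instance down at a time
        ),
    )

    Pointing the version at a new instance template then causes the manager to replace instances gradually, mirroring the max_surge/max_unavailable semantics of a Kubernetes Deployment.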

    Azure Machine Learning Services:

    • Azure Machine Learning Services allows you to deploy machine learning models into production. The service supports rolling updates to models deployed as web services on AKS (Azure Kubernetes Service), enabling you to update your model without downtime.
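
    With the current Azure ML Python SDK (azure-ai-ml), the analogous pattern is blue/green traffic shifting on a managed online endpoint rather than an AKS web service. The sketch below assumes such an endpoint with two deployments named "blue" and "green" already exists; all identifiers are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.ai.ml import MLClient

    # Sketch: gradually shift traffic from the old ("blue") to the new ("green") model.
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",     # placeholder
        resource_group_name="<resource-group>",  # placeholder
        workspace_name="<workspace>",            # placeholder
    )

    endpoint = ml_client.online_endpoints.get("ai-model-endpoint")  # placeholder name
    endpoint.traffic = {"blue": 90, "green": 10}  # send 10% of traffic to the new model
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    Once the new deployment looks healthy, you repeat the call with a larger share for "green" until it receives all traffic.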

    Since Kubernetes is a common way to deploy containerized applications, including AI model servers like TensorFlow Serving, TorchServe, or your own custom Flask server, we'll proceed with a Kubernetes Deployment.

    Pulumi Program for Kubernetes Deployment with Rolling Updates:

    import pulumi
    import pulumi_kubernetes as k8s

    # Configuration for the Kubernetes Deployment
    app_labels = {"app": "ai-model-server"}
    container_image = "YOUR_CONTAINER_IMAGE_HERE"  # Replace with your AI model serving container image
    container_port = 80  # Use the port your app is designed to listen on; it could be something other than 80.

    # Kubernetes Deployment
    deployment = k8s.apps.v1.Deployment(
        "ai-model-server-deployment",  # Logical name within Pulumi for the Deployment
        spec=k8s.apps.v1.DeploymentSpecArgs(
            replicas=3,  # Number of replicas of your application
            selector=k8s.meta.v1.LabelSelectorArgs(
                match_labels=app_labels,
            ),
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(
                    labels=app_labels,
                ),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[
                        k8s.core.v1.ContainerArgs(
                            name="ai-model-server",
                            image=container_image,
                            ports=[k8s.core.v1.ContainerPortArgs(container_port=container_port)],
                        ),
                    ],
                ),
            ),
            strategy=k8s.apps.v1.DeploymentStrategyArgs(
                type="RollingUpdate",  # Define the type of update strategy
                rolling_update=k8s.apps.v1.RollingUpdateDeploymentArgs(
                    max_unavailable=1,  # Only one pod at a time will be taken down during the update
                    max_surge=1,        # At most one extra pod will be created during the update
                ),
            ),
        ),
    )

    # Export the Deployment name
    pulumi.export("deployment_name", deployment.metadata["name"])

    This program sets up a Kubernetes Deployment with a rolling update strategy. When you push a new container image tag and update the Deployment, Kubernetes ensures that no more than a specified number of Pods are down at any time while a corresponding number of new Pods are created; these bounds are controlled by max_unavailable and max_surge. The result is a seamless transition for users as the AI model service is updated. A convenient way to drive such updates is sketched below.
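
    For example, instead of hard-coding the image, you can read it from Pulumi configuration (the config key modelImage is a hypothetical name, not part of the program above). Rolling out a new model version then becomes a pulumi config set followed by pulumi up, and Kubernetes starts the rolling update as soon as the pod template's image changes:

    import pulumi

    config = pulumi.Config()
    # Hypothetical key; set it with: pulumi config set modelImage my-registry/model-server:v2
    container_image = config.require("modelImage")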

    Replace "YOUR_CONTAINER_IMAGE_HERE" with the actual image of your AI model serving container, and make sure your Pulumi stack is configured with the required cloud provider, Kubernetes context, and other necessary credentials.