1. Scalable ML Model Serving with Azure Kubernetes Service


    To serve a machine learning (ML) model at scale with Azure Kubernetes Service (AKS), you set up a Kubernetes cluster and deploy your model to it wrapped in a container. Once the cluster is running, you can use Kubernetes features such as auto-scaling to handle varying load, which is crucial for maintaining performance as demand for your ML model's predictions changes.
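    In practice, the container usually wraps the model in a small HTTP service. As a minimal, illustrative sketch (the framework, file names, and model-loading code here are assumptions, not part of the Pulumi program below):

    # app.py -- hypothetical HTTP wrapper around a trained model
    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Assumed: a pre-trained model serialized into the container image
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expect a JSON body like {"features": [1.0, 2.0, 3.0]}
        features = request.get_json()["features"]
        prediction = model.predict([features]).tolist()
        return jsonify({"prediction": prediction})

    if __name__ == "__main__":
        # Bind to all interfaces so the container port can be published
        app.run(host="0.0.0.0", port=8080)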

    Here's a breakdown of the steps you will take with Pulumi and Python:

    1. Create a new AKS cluster.
    2. Deploy your containerized model to the AKS cluster.
    3. Configure horizontal pod auto-scaling to handle workload changes.

    Below is a Pulumi program in Python that creates an Azure Kubernetes Service cluster suitable for ML model serving. Please note that you'll need to have Docker images with your ML model packaged and stored in a container registry that the AKS cluster can access. This program doesn't cover the containerization of your ML model or the creation of a container registry, but focuses on standing up the infrastructure for serving the model.

    import base64

    import pulumi
    import pulumi_azure_native as azure_native

    # Define the AKS cluster as a reusable component resource
    class AksCluster(pulumi.ComponentResource):
        def __init__(self, name: str, opts: pulumi.ResourceOptions = None):
            super().__init__('custom:resource:AksCluster', name, {}, opts)

            # Create a new resource group for the AKS cluster
            resource_group = azure_native.resources.ResourceGroup(
                f"{name}-rg",
                resource_group_name=f"{name}-resources",
                opts=pulumi.ResourceOptions(parent=self),
            )

            # Create the AKS cluster; a system-assigned managed identity is
            # required so the cluster can manage its own Azure resources
            managed_cluster = azure_native.containerservice.ManagedCluster(
                f"{name}-aks",
                resource_group_name=resource_group.name,
                identity=azure_native.containerservice.ManagedClusterIdentityArgs(
                    type="SystemAssigned",
                ),
                agent_pool_profiles=[
                    azure_native.containerservice.ManagedClusterAgentPoolProfileArgs(
                        count=3,                    # Start with 3 nodes
                        max_pods=110,               # Max pods per node
                        mode="System",              # System node pool
                        name="agentpool",           # Name of the agent pool
                        vm_size="Standard_DS2_v2",  # VM size of the nodes
                    ),
                ],
                dns_prefix=name,
                kubernetes_version="1.21.2",  # Specify your desired Kubernetes version
                sku=azure_native.containerservice.ManagedClusterSKUArgs(
                    name="Basic",  # Use the Basic SKU
                    tier="Free",   # No additional fee for cluster management
                ),
                opts=pulumi.ResourceOptions(parent=self),
            )

            # The azure-native provider exposes the kubeconfig through an
            # invoke rather than a resource property; decode it from base64
            creds = azure_native.containerservice.list_managed_cluster_user_credentials_output(
                resource_group_name=resource_group.name,
                resource_name=managed_cluster.name,
            )

            # Expose outputs from this component
            self.cluster_name = managed_cluster.name
            self.kubeconfig = creds.kubeconfigs[0].value.apply(
                lambda encoded: base64.b64decode(encoded).decode()
            )
            self.resource_group_name = resource_group.name
            self.register_outputs({
                "cluster_name": self.cluster_name,
                "kubeconfig": self.kubeconfig,
                "resource_group_name": self.resource_group_name,
            })

    # Create the AKS cluster
    aks_cluster = AksCluster("my-ml-serving-cluster")

    # Export the cluster name and kubeconfig as stack outputs; the kubeconfig
    # contains credentials, so mark it as a secret
    pulumi.export("cluster_name", aks_cluster.cluster_name)
    pulumi.export("kubeconfig", pulumi.Output.secret(aks_cluster.kubeconfig))

    This Pulumi program does the following:

    • Defines a custom component (AksCluster) to create an AKS cluster inside a resource group. The cluster will have an initial node count of 3, which is a good starting point for a scalable ML application.
    • Inside the custom component, it defines the agent pool with configuration like the node size, count, and the maximum number of pods that can be scheduled on a node.
    • It exports two crucial elements as stack outputs: the cluster_name and kubeconfig. These are what you use to interact with the AKS cluster, for example when deploying Kubernetes resources such as your ML model, as sketched below.
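    For example, the kubeconfig output can drive a Kubernetes provider in the same Pulumi program (a minimal sketch, assuming the pulumi_kubernetes package is installed):

    import pulumi_kubernetes as k8s

    # Point a Kubernetes provider at the cluster created above
    k8s_provider = k8s.Provider(
        "aks-provider",
        kubeconfig=aks_cluster.kubeconfig,
    )

    # Any resource created with this provider lands on the AKS cluster
    namespace = k8s.core.v1.Namespace(
        "ml-serving",
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )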

    This is the fundamental infrastructure needed to host a scalable ML model on AKS. The next steps, which the program above doesn't cover, are deploying your containerized ML model to AKS and configuring auto-scaling based on load. You can use kubectl or the Kubernetes API directly, or stay in Pulumi and manage those resources from the same program, as in the sketch below.
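    To make those next steps concrete, here is a hedged sketch of a Deployment and a HorizontalPodAutoscaler written with pulumi_kubernetes, reusing the k8s_provider from the snippet above. The image name, container port, and scaling thresholds are placeholders to replace with your own:

    import pulumi
    import pulumi_kubernetes as k8s

    app_labels = {"app": "ml-model"}

    # Run the containerized model; the image reference is a placeholder
    deployment = k8s.apps.v1.Deployment(
        "ml-model-deployment",
        spec=k8s.apps.v1.DeploymentSpecArgs(
            selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
            replicas=3,
            template=k8s.core.v1.PodTemplateSpecArgs(
                metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
                spec=k8s.core.v1.PodSpecArgs(
                    containers=[k8s.core.v1.ContainerArgs(
                        name="ml-model",
                        image="myregistry.azurecr.io/ml-model:latest",  # placeholder
                        ports=[k8s.core.v1.ContainerPortArgs(container_port=8080)],
                        # CPU requests are needed for CPU-based autoscaling
                        resources=k8s.core.v1.ResourceRequirementsArgs(
                            requests={"cpu": "500m", "memory": "512Mi"},
                        ),
                    )],
                ),
            ),
        ),
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )

    # Scale between 3 and 10 replicas based on average CPU utilization;
    # the HPA relies on metrics-server, which AKS deploys by default
    hpa = k8s.autoscaling.v2.HorizontalPodAutoscaler(
        "ml-model-hpa",
        spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
            scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
                api_version="apps/v1",
                kind="Deployment",
                name=deployment.metadata.name,
            ),
            min_replicas=3,
            max_replicas=10,
            metrics=[k8s.autoscaling.v2.MetricSpecArgs(
                type="Resource",
                resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                    name="cpu",
                    target=k8s.autoscaling.v2.MetricTargetArgs(
                        type="Utilization",
                        average_utilization=70,
                    ),
                ),
            )],
        ),
        opts=pulumi.ResourceOptions(provider=k8s_provider),
    )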