1. Auto-Scaling ML Model Serving with Kubernetes on AWS


    Auto-scaling an ML model serving infrastructure involves creating a Kubernetes cluster on AWS and then deploying your ML model with the ability to scale based on demand. Auto-scaling helps optimize resource utilization and ensures that the serving infrastructure can handle varying loads effectively.

    To achieve this, we'll create an AWS EKS (Elastic Kubernetes Service) cluster and then set up a managed node group whose Auto Scaling group controls the number of worker nodes, while a Horizontal Pod Autoscaler controls the number of pods running our ML model. This will be orchestrated using Pulumi, which allows you to define infrastructure as code.

    Here's what we're going to do in our Pulumi program:

    1. Set Up EKS Cluster: Use Pulumi AWS provider to create an EKS cluster. This will serve as the environment for running our Kubernetes deployments.

    2. Define the Node Group with Auto-Scaling: Define an Auto Scaling Group (ASG) for the EKS cluster. The ASG will automatically adjust the number of nodes (EC2 instances) in the cluster.

    3. Deploy the Kubernetes Resources: Deploy necessary Kubernetes resources, such as Deployments with a Pod template for your ML model. Also, set up the Horizontal Pod Autoscaler, which automatically scales the number of pods in a deployment based on observed CPU utilization or other selected metrics.

    4. Export the Outputs: At the end of the program, we'll export certain outputs like the cluster name and the node group's scaling configuration, which can be helpful for future reference or CI/CD integrations.
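    As background for Step 3, the Horizontal Pod Autoscaler's scaling decision (per the Kubernetes documentation) roughly follows the rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). A quick sketch of that arithmetic:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Approximate the HPA scaling rule from the Kubernetes docs:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    (The real controller also applies tolerances and stabilization windows.)
    """
    return math.ceil(current_replicas * current_metric / target_metric)

# 2 pods averaging 90% CPU against a 50% utilization target -> scale out to 4 pods
print(hpa_desired_replicas(2, 90, 50))  # 4
# 4 pods averaging 25% CPU against a 50% target -> scale in to 2 pods
print(hpa_desired_replicas(4, 25, 50))  # 2
```

    This is why the pod template in Step 3 must declare CPU resource requests: without them, the HPA cannot compute a utilization percentage.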

    The following program achieves the above objectives in Python using Pulumi:

    import pulumi
    import pulumi_aws as aws

    # Step 1: Create an EKS cluster
    # This cluster will be the foundation for deploying the Kubernetes resources
    # needed for ML model serving.

    # Networking: EKS requires subnets in at least two Availability Zones.
    # Adjust the CIDR blocks and AZ names for your region.
    vpc = aws.ec2.Vpc("example-vpc", cidr_block="10.0.0.0/16")
    subnet_a = aws.ec2.Subnet("example-subnet-a",
        vpc_id=vpc.id,
        cidr_block="10.0.1.0/24",
        availability_zone="us-west-2a",
    )
    subnet_b = aws.ec2.Subnet("example-subnet-b",
        vpc_id=vpc.id,
        cidr_block="10.0.2.0/24",
        availability_zone="us-west-2b",
    )

    # IAM role that the EKS control plane assumes.
    cluster_role = aws.iam.Role("eks-cluster-role",
        assume_role_policy=aws.iam.get_policy_document(
            statements=[aws.iam.GetPolicyDocumentStatementArgs(
                actions=["sts:AssumeRole"],
                principals=[aws.iam.GetPolicyDocumentStatementPrincipalArgs(
                    type="Service",
                    identifiers=["eks.amazonaws.com"],
                )],
            )],
        ).json,
    )
    aws.iam.RolePolicyAttachment("eks-cluster-policy",
        role=cluster_role.name,
        policy_arn="arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
    )

    eks_cluster = aws.eks.Cluster("eks-cluster",
        role_arn=cluster_role.arn,
        vpc_config=aws.eks.ClusterVpcConfigArgs(
            subnet_ids=[subnet_a.id, subnet_b.id],
        ),
    )

    # Step 2: Define the node group and enable auto-scaling
    # The managed node group's Auto Scaling group adjusts the number of
    # worker nodes (EC2 instances) within the configured bounds.

    # IAM role that the worker nodes assume, with the standard managed policies.
    node_role = aws.iam.Role("eks-node-group-role",
        assume_role_policy=aws.iam.get_policy_document(
            statements=[aws.iam.GetPolicyDocumentStatementArgs(
                actions=["sts:AssumeRole"],
                principals=[aws.iam.GetPolicyDocumentStatementPrincipalArgs(
                    type="Service",
                    identifiers=["ec2.amazonaws.com"],
                )],
            )],
        ).json,
    )
    for i, policy_arn in enumerate([
        "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
        "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
        "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    ]):
        aws.iam.RolePolicyAttachment(f"eks-node-policy-{i}",
            role=node_role.name,
            policy_arn=policy_arn,
        )

    node_group = aws.eks.NodeGroup("eks-node-group",
        cluster_name=eks_cluster.name,
        node_role_arn=node_role.arn,
        subnet_ids=[subnet_a.id, subnet_b.id],
        scaling_config=aws.eks.NodeGroupScalingConfigArgs(
            desired_size=2,
            max_size=5,
            min_size=1,
        ),
    )

    # Step 3: Deploy Kubernetes resources
    # At this step, you would typically use `pulumi_kubernetes` to deploy the model
    # serving application as a Deployment resource and configure autoscaling
    # parameters using a HorizontalPodAutoscaler resource. This is a placeholder
    # for those details, which require information about the container image and
    # other configuration specific to the ML model serving use case.

    # Step 4: Export the outputs
    # These outputs can be useful for interacting with the cluster, such as from a CI/CD system.
    pulumi.export('eks_cluster_name', eks_cluster.name)
    pulumi.export('node_group_scaling_config', node_group.scaling_config)

    This program defines the necessary AWS and Kubernetes resources to get a basic ML model serving infrastructure up and running with auto-scaling features. For the Kubernetes deployment and auto-scaler setup that utilize the ML model, you would substitute actual configuration details for the placeholder code presented in Step 3.
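    As a rough illustration of what Step 3 could look like, here is a minimal `pulumi_kubernetes` sketch of a Deployment plus a v2 HorizontalPodAutoscaler. The image name, labels, and resource numbers are all hypothetical placeholders, and in a real program you would also construct a `kubernetes.Provider` from the EKS cluster's kubeconfig and pass it to these resources via `pulumi.ResourceOptions(provider=...)`:

```python
import pulumi_kubernetes as k8s

# Hypothetical labels and image -- substitute your model server's details.
app_labels = {"app": "ml-model-server"}

deployment = k8s.apps.v1.Deployment("ml-model-deployment",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=2,
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(containers=[
                k8s.core.v1.ContainerArgs(
                    name="model-server",
                    image="my-registry/ml-model-server:latest",  # hypothetical image
                    # CPU requests are required for the HPA to compute utilization.
                    resources=k8s.core.v1.ResourceRequirementsArgs(
                        requests={"cpu": "500m"},
                    ),
                ),
            ]),
        ),
    ),
)

hpa = k8s.autoscaling.v2.HorizontalPodAutoscaler("ml-model-hpa",
    spec=k8s.autoscaling.v2.HorizontalPodAutoscalerSpecArgs(
        scale_target_ref=k8s.autoscaling.v2.CrossVersionObjectReferenceArgs(
            api_version="apps/v1",
            kind="Deployment",
            name=deployment.metadata.name,
        ),
        min_replicas=2,
        max_replicas=10,
        metrics=[k8s.autoscaling.v2.MetricSpecArgs(
            type="Resource",
            resource=k8s.autoscaling.v2.ResourceMetricSourceArgs(
                name="cpu",
                target=k8s.autoscaling.v2.MetricTargetArgs(
                    type="Utilization",
                    average_utilization=70,  # target 70% average CPU utilization
                ),
            ),
        )],
    ),
)
```

    Note that CPU-based autoscaling also assumes the Kubernetes Metrics Server is installed in the cluster, which EKS does not include by default.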

    It's important to note that setting up a CI/CD system for deploying your ML model into the EKS cluster and managing versioning, scaling, and other operational aspects are beyond the scope of this code and require additional considerations such as container image registries, deployment strategies, and monitoring.