Scalable ML Model Serving with Kubernetes Karpenter
To achieve scalable ML model serving with Kubernetes, you'll want to focus on two key components: a Kubernetes cluster that can scale with workload demand, and an ML model serving solution that is containerized and ready to deploy on Kubernetes.
For the cluster itself, we can use Amazon EKS (Elastic Kubernetes Service), a managed Kubernetes service that lets you run Kubernetes without installing and operating your own cluster control plane.
Within the EKS cluster, we can use Karpenter, an open-source, flexible, high-performance Kubernetes node autoscaler designed to launch nodes quickly in response to application demand.
On Amazon EKS, Karpenter can automatically adjust the types and quantities of EC2 instances in use, ensuring that the cluster scales efficiently. To use Karpenter, we need to set it up on the cluster, grant it the permissions to manage the underlying EC2 instances, and describe the nodes it is allowed to provision, as in the sketch below.
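As a concrete sketch of describing the nodes Karpenter may provision, the snippet below defines a Karpenter NodePool using the Pulumi Kubernetes provider. The karpenter.sh/v1beta1 API version, the GPU instance families, and the EC2NodeClass named "default" are assumptions to verify against the Karpenter release you install, since the schema has changed across releases:

```python
import pulumi_kubernetes as k8s

# Hypothetical NodePool for GPU-backed ML serving. Field names follow the
# karpenter.sh/v1beta1 schema; verify them against your Karpenter version.
ml_node_pool = k8s.apiextensions.CustomResource(
    "ml-serving-node-pool",
    api_version="karpenter.sh/v1beta1",
    kind="NodePool",
    metadata={"name": "ml-serving"},
    spec={
        "template": {
            "spec": {
                "requirements": [
                    # Restrict Karpenter to GPU-capable instance families
                    # (an assumed choice for ML serving workloads).
                    {
                        "key": "karpenter.k8s.aws/instance-family",
                        "operator": "In",
                        "values": ["g5", "g4dn"],
                    },
                    {
                        "key": "karpenter.sh/capacity-type",
                        "operator": "In",
                        "values": ["on-demand"],
                    },
                ],
                # Assumes an EC2NodeClass named "default" already exists.
                "nodeClassRef": {
                    "apiVersion": "karpenter.k8s.aws/v1beta1",
                    "kind": "EC2NodeClass",
                    "name": "default",
                },
            }
        },
        # Upper bound on the total capacity this pool may provision.
        "limits": {"cpu": "200", "memory": "800Gi"},
    },
)
```

With a pool like this in place, Karpenter picks a concrete instance type from the allowed families based on the resource requests of pending pods, rather than you pre-selecting a fixed node size.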
For serving ML models, we can use a containerized solution. Several options are available, such as TensorFlow Serving, NVIDIA Triton Inference Server, Seldon, or a custom server, depending on the specific use case and requirements. These model servers are typically packaged as Docker containers, which can be deployed on Kubernetes as a Deployment with a Service in front for accessing the model server's API.

Below is a program block that illustrates how you could set up an EKS cluster and prepare it for use with Karpenter for scalable ML model serving. For brevity, we'll not include the entire setup process for Karpenter and the ML serving component; the following example sets up the prerequisites:
```python
import json

import pulumi
from pulumi_aws import eks, iam

# Create an IAM role that the EKS control plane will assume. Note: the
# managed policy ARNs below are AWS-defined policies. Attach policies
# according to the least-privilege principle and your requirements.
eks_role = iam.Role(
    "eks-cluster-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": "eks.amazonaws.com"},
        }],
    }),
    managed_policy_arns=[
        # Amazon EKS cluster policy
        "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
        # Amazon EKS VPC Resource Controller policy (required for using security groups)
        "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController",
    ],
)

# Create the Amazon EKS cluster control plane. The cluster must be placed
# in existing subnets; replace the placeholder IDs with your own.
eks_cluster = eks.Cluster(
    "eks-cluster",
    role_arn=eks_role.arn,
    tags={"Name": "pulumi-eks-cluster"},
    vpc_config=eks.ClusterVpcConfigArgs(
        subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],  # placeholders
        public_access_cidrs=["0.0.0.0/0"],
    ),
)

# Create an IAM policy and role for Karpenter so it can manage EC2 instances.
karpenter_policy = iam.Policy(
    "karpenter-policy",
    policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:RunInstances",
                "ec2:CreateTags",
                "iam:PassRole",
                "ec2:TerminateInstances",
            ],
            "Resource": "*",
        }],
    }),
)

karpenter_role = iam.Role(
    "karpenter-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
        }],
    }),
)

iam.RolePolicyAttachment(
    "karpenter-role-policy-attachment",
    role=karpenter_role.name,
    policy_arn=karpenter_policy.arn,
)

# Export the cluster details needed to build a kubeconfig. (aws.eks.Cluster
# has no `kubeconfig` output; export the endpoint and CA data instead.)
pulumi.export("eks_cluster_name", eks_cluster.name)
pulumi.export("eks_cluster_endpoint", eks_cluster.endpoint)
pulumi.export(
    "eks_cluster_ca_data",
    pulumi.Output.secret(eks_cluster.certificate_authority.apply(lambda ca: ca.data)),
)
```
This Pulumi program does the following:
- Creates a new EKS cluster control plane. (In practice you would also add a small managed node group or Fargate profile so that system pods, including the Karpenter controller itself, have somewhere to run.)
- Defines an IAM role, with the necessary managed policies attached, that the EKS control plane assumes for cluster management.
- Defines another IAM role and attaches a policy to it that grants Karpenter the permissions it needs to create and manage EC2 instances. (A sketch of installing Karpenter itself follows below.)
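Continuing from the program above, installing Karpenter itself is typically done from its Helm chart. The following is a minimal sketch using Pulumi's Helm support; it assumes your Pulumi Kubernetes provider is configured against the new cluster's kubeconfig, the OCI chart location and value keys should be verified against the release you pin, and in production the controller should assume its IAM role via IRSA rather than the simplified karpenter_role shown here:

```python
import pulumi_kubernetes as k8s

# Hedged sketch: install Karpenter from its public OCI Helm chart.
karpenter = k8s.helm.v3.Release(
    "karpenter",
    chart="oci://public.ecr.aws/karpenter/karpenter",
    # Pin a chart version you have tested in practice.
    namespace="karpenter",
    create_namespace=True,
    values={
        # Tell Karpenter which cluster it manages.
        "settings": {"clusterName": eks_cluster.name},
        "serviceAccount": {
            "annotations": {
                # In production, point this at an IRSA role for the controller.
                "eks.amazonaws.com/role-arn": karpenter_role.arn,
            }
        },
    },
)
```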
The workflow for deploying your ML models on this cluster typically follows these steps:
- Push your containerized ML model serving application to a container registry (e.g., Amazon ECR).
- Write Kubernetes manifests (as code, or actual YAML configuration) for your ML application, referencing the pushed image.
- Deploy these manifests to your EKS cluster, as in the sketch after this list.
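As a sketch of steps 2 and 3 in Pulumi form, the snippet below deploys a model server and exposes it with a Service. The image URL, container port, and resource requests are hypothetical placeholders; substitute the image you pushed and your server's actual serving port:

```python
import pulumi_kubernetes as k8s

labels = {"app": "ml-model-server"}

# Deployment running the containerized model server. The image is a
# hypothetical ECR URL; replace it with the one you pushed.
deployment = k8s.apps.v1.Deployment(
    "ml-model-server",
    spec={
        "replicas": 2,
        "selector": {"matchLabels": labels},
        "template": {
            "metadata": {"labels": labels},
            "spec": {
                "containers": [{
                    "name": "model-server",
                    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-model-server:latest",
                    "ports": [{"containerPort": 8080}],  # assumed serving port
                    "resources": {
                        # Resource requests are what drive Karpenter's
                        # decisions about which nodes to launch.
                        "requests": {"cpu": "2", "memory": "4Gi"},
                    },
                }],
            },
        },
    },
)

# Service exposing the model server's API inside the cluster.
service = k8s.core.v1.Service(
    "ml-model-server-svc",
    spec={
        "selector": labels,
        "ports": [{"port": 80, "targetPort": 8080}],
        "type": "ClusterIP",
    },
)
```

When a replica's requests cannot be satisfied by existing nodes, Karpenter launches a suitably sized instance from the node pool you defined and schedules the pod onto it.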
Please note that the above Pulumi code assumes you're already familiar with configuring your cloud resources and permissions securely. When working with IAM roles and policies, ensure you follow AWS security best practices and that you review and understand the policies you attach to roles.