Deploying Scalable ML Model Serving with EKS

Question

Pulumi · Accepted Answer

Deploying a scalable machine learning (ML) model serving solution on Amazon Elastic Kubernetes Service (EKS) involves setting up an EKS cluster that is optimized for ML workloads, with the ability to scale based on demand. For this purpose, we will need to:

1. Create an EKS cluster
2. Set up IAM roles and permissions for EKS
3. Configure the Amazon VPC CNI plugin, which handles pod networking in the cluster
4. Deploy the ML model serving application, possibly using a solution like TensorFlow Serving or a similar technology, depending on the specifics of the ML model.

In the Pulumi program below, we'll focus on the infrastructure portion: setting up an EKS cluster and the necessary permissions. The ML model deployment itself is application-specific and is not covered directly here. Once the infrastructure exists, the Kubernetes resources for model serving (Deployments, Services, and so on) can be deployed with `kubectl` or with a Pulumi program that uses the Kubernetes provider.

Here's a Pulumi program that sets up an EKS cluster suitable for serving ML models. It uses the `pulumi_eks` package because it provides high-level abstractions that simplify setting up and managing EKS clusters.

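```python
import pulumi
import pulumi_aws as aws
from pulumi_eks import Cluster

# Placeholder network IDs; replace with your own VPC and subnets.
vpc_id = "vpc-12345"
subnet_ids = ["subnet-12345", "subnet-67890"]

# Create an IAM role that the EKS service (control plane) will assume.
eks_role = aws.iam.Role("eksRole",
    assume_role_policy="""{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "eks.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }]
    }"""
)

# Attach the AmazonEKSClusterPolicy to the role created above.
eks_policy_attachment = aws.iam.RolePolicyAttachment("eksPolicyAttachment",
    role=eks_role.name,
    policy_arn="arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
)

# Create a separate IAM role for the worker nodes. EC2 instances assume it,
# so the trusted principal is ec2.amazonaws.com, not eks.amazonaws.com.
node_role = aws.iam.Role("eksNodeRole",
    assume_role_policy="""{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }]
    }"""
)

# Attach the managed policies that let nodes join the cluster, configure
# pod networking, and pull images from ECR.
for i, policy in enumerate([
    "AmazonEKSWorkerNodePolicy",
    "AmazonEKS_CNI_Policy",
    "AmazonEC2ContainerRegistryReadOnly",
]):
    aws.iam.RolePolicyAttachment(f"eksNodePolicyAttachment{i}",
        role=node_role.name,
        policy_arn=f"arn:aws:iam::aws:policy/{policy}",
    )

# Create a Security Group in the same VPC that allows HTTP(S) ingress.
sec_group = aws.ec2.SecurityGroup("secGroup",
    vpc_id=vpc_id,
    description="Allow all HTTP(s) traffic to EKS",
    ingress=[
        {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr_blocks": ["0.0.0.0/0"]},
        {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    ],
)

# Set up the EKS cluster itself with the required config.
eks_cluster = Cluster("eksCluster",
    service_role=eks_role,  # role assumed by the EKS control plane
    vpc_id=vpc_id,
    subnet_ids=subnet_ids,
    instance_type="m5.large",  # general-purpose CPU instance; confirm based on your model's needs
    desired_capacity=2,  # start with 2 worker nodes, adjust as necessary for your workload
    min_size=1,
    max_size=10,  # allows scaling up to 10 worker nodes
    create_oidc_provider=True,
    instance_role=node_role,  # role assumed by the worker nodes
)

# Export the cluster's kubeconfig and the security group id for access
pulumi.export("kubeconfig", eks_cluster.kubeconfig)
pulumi.export("security_group_id", sec_group.id)
```

In the above Pulumi program, we first set up an IAM role that the EKS service assumes to manage the cluster on our behalf, and attach the AWS managed policy `AmazonEKSClusterPolicy` to it. We then create a second role for the worker nodes. Because the nodes are EC2 instances, this role trusts `ec2.amazonaws.com` and carries the managed policies `AmazonEKSWorkerNodePolicy`, `AmazonEKS_CNI_Policy`, and `AmazonEC2ContainerRegistryReadOnly`.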

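Next, we create a security group that allows inbound traffic on ports 80 and 443 (HTTP and HTTPS) from any address. In a real-world scenario, you would lock down the ingress rules to known CIDR ranges. Note that the program only creates and exports this group. It is not attached to the cluster or its nodes, so it takes effect only once you associate it with a resource such as a load balancer.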
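As a rough illustration of tightening those rules, the helper below builds ingress entries in the same dictionary shape used in the program, restricted to an allowlist. The function name `build_ingress_rules` and the CIDR ranges are hypothetical and chosen only for this sketch. The ranges are reserved documentation addresses, not real networks.

```python
import ipaddress


def build_ingress_rules(allowed_cidrs, ports=(80, 443)):
    """Return one TCP ingress rule per port, limited to the given CIDR blocks."""
    # ip_network() raises ValueError for malformed CIDRs or stray host bits.
    cidrs = [str(ipaddress.ip_network(cidr)) for cidr in allowed_cidrs]
    if "0.0.0.0/0" in cidrs:
        raise ValueError("refusing to open ingress to the whole internet")
    return [
        {"protocol": "tcp", "from_port": port, "to_port": port, "cidr_blocks": list(cidrs)}
        for port in ports
    ]


# Hypothetical allowlist; replace with the networks that should reach the service.
rules = build_ingress_rules(["203.0.113.0/24", "198.51.100.0/24"])
print(rules)
```

The resulting list could be passed as the `ingress` argument of the security group in place of the two open rules.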

We then create an EKS cluster, specifying the VPC and subnets it should operate in, as well as the type and number of worker nodes. The instance type (`m5.large`) is a general-purpose CPU instance that can handle lightweight inference. Larger or GPU-backed instances may be needed depending on the model. We also set the minimum, desired, and maximum size of the worker node Auto Scaling group. These bounds alone do not trigger scaling: a component such as the Kubernetes Cluster Autoscaler has to adjust the desired capacity in response to load.
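Node-group sizes only cap how many workers can exist. Scaling the serving pods themselves is usually expressed separately, for example with a HorizontalPodAutoscaler. The sketch below generates such a manifest as JSON, which `kubectl apply -f` accepts. The Deployment name `model-server`, the replica bounds, and the 70% CPU target are illustrative assumptions, and the autoscaler only works if a metrics source such as the Kubernetes Metrics Server is running in the cluster.

```python
import json


def build_hpa(deployment_name, min_replicas=2, max_replicas=10, cpu_target=70):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment_name}-hpa"},
        "spec": {
            # The Deployment whose replica count the autoscaler adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization across the pods.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_target},
                },
            }],
        },
    }


# "model-server" is a hypothetical Deployment name.
hpa = build_hpa("model-server")
print(json.dumps(hpa, indent=2))
```

When pods scale out beyond what the current nodes can hold, a node-level autoscaler can then add workers up to the `max_size` configured on the cluster.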

Finally, we export the `kubeconfig` needed to interact with the cluster using tools like `kubectl`, and the security group ID, which could be used to set up additional network configurations.

Keep in mind that the VPC and subnets need to be pre-configured or also defined in Pulumi. They are specified in this example as placeholders.

Remember, this program only sets up the infrastructure. The deployment of the actual ML model serving application, which would run as Kubernetes workloads, would need additional Kubernetes resources like Deployments, Services, and possibly Ingress objects, which are handled separately.
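To give a sense of what those separate resources might look like, here is a minimal sketch that generates a Deployment and a Service for TensorFlow Serving as a JSON `List`. The names `model-server` and `my_model`, the replica count, and the choice of a `LoadBalancer` Service are assumptions made for illustration. The sketch also leaves out how the model files reach the container, which in practice means baking them into a custom image or mounting them as a volume.

```python
import json


def build_serving_manifests(name, model_name, image="tensorflow/serving", replicas=2):
    """Build a Deployment and Service for a TensorFlow Serving container."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    # TensorFlow Serving reads the model name from MODEL_NAME.
                    "env": [{"name": "MODEL_NAME", "value": model_name}],
                    # 8501 is TensorFlow Serving's REST API port.
                    "ports": [{"containerPort": 8501}],
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 8501}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


# Hypothetical names; replace with your own service and model.
manifest = build_serving_manifests("model-server", "my_model")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it, with the exported `kubeconfig` active, would create both resources in the cluster.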